Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hiroshi Sankoh is active.

Publication


Featured research published by Hiroshi Sankoh.


Multimedia Signal Processing | 2010

Robust background subtraction method based on 3D model projections with likelihood

Hiroshi Sankoh; Akio Ishikawa; Sei Naito; Shigeyuki Sakazawa

We propose a robust background subtraction method for multi-view images, which is essential for realizing free viewpoint video, where an accurate 3D model is required. Most conventional methods determine the background using only visual information from a single camera image, so a precise silhouette cannot be obtained. Our method instead integrates multi-view images taken by multiple cameras: the background region is determined using a 3D model generated from the multi-view images. We compute a background likelihood for each pixel of the camera images and derive an integrated likelihood for each voxel of the 3D model. The background region is then determined by minimizing energy functions over the voxel likelihoods. Furthermore, the proposed method applies a robust refining process in which a foreground region obtained by projecting the 3D model is improved using geometric as well as visual information. A 3D model is finally reconstructed from the improved foreground silhouettes. Experimental results show the effectiveness of the proposed method compared with conventional methods.
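The likelihood-integration step above (per-pixel background likelihoods fused into a per-voxel likelihood by projecting each voxel into every camera) can be sketched as follows. This is a minimal illustration under assumed inputs (3x4 projection matrices, per-camera likelihood maps, simple averaging), not the authors' implementation, which additionally minimizes energy functions over the voxel likelihoods.

```python
import numpy as np

def project(P, X):
    """Project (N, 3) world points X through a 3x4 camera matrix P to (N, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

def voxel_background_likelihood(voxels, cams, likelihood_maps):
    """Fuse per-pixel background likelihoods from all cameras into one
    likelihood per voxel by projecting each voxel center into each view
    and averaging the sampled values."""
    acc = np.zeros(len(voxels))
    for P, L in zip(cams, likelihood_maps):
        uv = np.rint(project(P, voxels)).astype(int)
        h, w = L.shape
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        acc[inside] += L[uv[inside, 1], uv[inside, 0]]
    return acc / len(cams)
```

A voxel with a high integrated likelihood would then be labeled background by the subsequent energy minimization.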


Multimedia Signal Processing | 2015

Accurate silhouette extraction of multiple moving objects for free viewpoint sports video synthesis

Qiang Yao; Hiroshi Sankoh; Houari Sabirin; Sei Naito

In this paper, we propose a new method for automatic, highly accurate silhouette extraction of multiple moving objects for free viewpoint stadium sports video synthesis. The proposed method consists of three parts: global extraction based on temporal background subtraction, a classification step based on constraints on the extracted object candidates, and local refinement based on statistics of the chrominance component of each extracted object. Experimental results show that the proposed method outperforms both the temporal background subtraction model and the Gaussian Mixture Model (GMM) in objective and subjective evaluations. In addition, the quality of the synthesized free viewpoint sports video is also enhanced by adopting the more accurate object silhouettes extracted by our method. Furthermore, since the proposed method requires no manual operation, fully automatic extraction of multiple silhouettes is achieved.
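The first stage, global extraction by temporal background subtraction, can be approximated with a per-pixel temporal median model. The sketch below is a generic version with an assumed threshold, not the authors' exact formulation; their classification and chrominance-based refinement stages are omitted.

```python
import numpy as np

def temporal_background_subtraction(frames, thresh=30):
    """Estimate a static background as the per-pixel temporal median over
    a stack of frames (T, H, W), then mark pixels that deviate from it
    by more than `thresh` as foreground candidates."""
    bg = np.median(frames, axis=0)
    diff = np.abs(frames.astype(np.int32) - bg.astype(np.int32))
    return diff > thresh  # boolean foreground masks, one per frame
```

The median is robust to objects that occupy a pixel in only a minority of frames, which is why it serves as a reasonable stand-in for a learned background model here.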


ACM Multimedia | 2012

Interactive music video application for smartphones based on free-viewpoint video and audio rendering

Toshiharu Horiuchi; Hiroshi Sankoh; Tsuneo Kato; Sei Naito

This paper presents a novel interactive music video application for smartphones based on free-viewpoint video technology combined with three-dimensional positional audio technology. A user can enjoy a music video from a moving viewpoint, manipulated via the touch screen, with positional audio through headphones. The user can even manipulate the positions of the performers on the stage as well as the viewpoint. The application, consisting of our audio rendering engine for multiple AAC ADTS files and our video rendering engine for multiple H.264 ES files, runs on a smartphone in stand-alone mode. It has been released as official content from a music label for Android and iOS.


International Conference on Image Processing | 2016

Robust moving camera calibration for synthesizing free viewpoint soccer video

Qiang Yao; Keisuke Nonaka; Hiroshi Sankoh; Sei Naito

In this paper, a robust moving camera calibration method is proposed to synthesize free viewpoint soccer video with a high degree of accuracy. The main problem in video registration-based moving camera calibration is that calibration accuracy is very low when the detected feature points lie on moving objects. To solve this problem, the proposed method tracks feature points across video frames to construct a trajectory matrix, which is then decomposed into a low-rank matrix representing global camera motion and a sparse matrix representing local individual motion. Through this decomposition, the individual motions of dynamic feature points on moving objects can be suppressed and removed. Experimental results show that the proposed method achieves a more accurate calibration result and also improves the visual quality of the synthesized free viewpoint soccer video.
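The decomposition described here is, in spirit, Robust PCA (Principal Component Pursuit): the trajectory matrix M is split into a low-rank part L (global camera motion) plus a sparse part S (individual object motion). A minimal ADMM-style solver is sketched below; the parameter choices are common defaults, and the paper's actual solver may differ.

```python
import numpy as np

def svd_shrink(X, tau):
    """Singular-value soft-thresholding (proximal step for the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Elementwise soft-thresholding (proximal step for the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, lam=None, mu=None, iters=200):
    """Split M into low-rank L plus sparse S by alternating proximal
    updates with a dual variable Y enforcing M = L + S."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
    return L, S
```

Rows of S with large entries flag feature points whose motion deviates from the global camera motion, which is the suppression criterion the abstract describes.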


Multimedia Signal Processing | 2016

Automatic camera self-calibration for immersive navigation of free viewpoint sports video

Qiang Yao; Hiroshi Sankoh; Keisuke Nonaka; Sei Naito

In recent years, the demand for immersive experiences has triggered a great revolution in multimedia applications and formats. In particular, immersive navigation of free viewpoint sports video has become increasingly popular: viewers want to actively select different viewpoints while watching sports video to enhance the ultra-realistic experience. In the practical realization of immersive free viewpoint navigation, camera calibration is of vital importance. Automatic calibration is essential for real-time implementation, and the accuracy of the camera parameters directly determines the final navigation experience. In this paper, we propose an automatic camera self-calibration method based on a field model for free viewpoint navigation of sports events. The proposed method is composed of three parts: extraction of field lines in a camera image, calculation of their crossing points, and determination of the optimal camera parameters. Experimental results show that the camera parameters can be estimated automatically and with high accuracy by the proposed method for a fixed camera, a dynamic camera, and multi-view cameras. Furthermore, immersive free viewpoint navigation of sports events can be fully realized based on the estimated camera parameters.
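Once crossing points of the field lines have been matched to crossings in the field model, the mapping between image and model is a homography, which can be estimated with the standard Direct Linear Transform. The sketch below shows only that step, assuming at least four image-model point correspondences; the paper's line extraction and parameter search are not reproduced.

```python
import numpy as np

def apply_h(H, pts):
    """Map (N, 2) points through homography H with perspective division."""
    p = np.hstack([pts, np.ones((len(pts), 1))])
    q = (H @ p.T).T
    return q[:, :2] / q[:, 2:3]

def homography_dlt(src, dst):
    """Direct Linear Transform: estimate the 3x3 homography mapping src
    points (crossings in the camera image) onto dst points (the matching
    crossings in the field model). Requires at least 4 correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

With exact correspondences the null space of A is one-dimensional and the homography is recovered up to scale, which the final normalization fixes.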


International Conference on Image Processing | 2014

Joint gaze-correction and beautification of DIBR-synthesized human face via dual sparse coding

Xianming Liu; Gene Cheung; Deming Zhai; Debin Zhao; Hiroshi Sankoh; Sei Naito

Gaze mismatch is a common problem in video conferencing, where the viewpoint captured by a camera (usually located above or below a display monitor) is not aligned with the gaze direction of the human subject, who typically looks at his counterpart in the center of the screen. This means that the two parties cannot converse eye-to-eye, hampering the quality of visual communication. One conventional approach to the gaze mismatch problem is to synthesize a gaze-corrected face image, as viewed from the center of the screen, via depth-image-based rendering (DIBR), assuming texture and depth maps are available at the camera-captured viewpoint(s). Due to self-occlusion, however, there will be missing pixels in the DIBR-synthesized view image that require satisfactory filling. In this paper, we propose to jointly solve the hole-filling problem and the face beautification problem (subtle modifications of facial features to enhance the attractiveness of the rendered face) via a unified dual sparse coding framework. Specifically, we first train two dictionaries separately: one for face images of the intended conference subject, and one for images of “beautiful” human faces. During synthesis, we simultaneously seek two code vectors: one is sparse in the first dictionary and explains the available DIBR-synthesized pixels; the other is sparse in the second dictionary and matches the first vector up to a restricted linear transform. This ensures a good match with the intended target face, while increasing proximity to “beautiful” facial features to improve attractiveness. Experimental results show naturally rendered human faces with noticeably improved attractiveness.
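The hole-filling half of this framework rests on sparse coding: express the observed pixels as a sparse combination of dictionary atoms, then read off the reconstruction at the missing pixels. The toy sketch below uses Orthogonal Matching Pursuit over a single dictionary; the paper's joint dual-dictionary formulation and the beautification term are omitted, and the dictionary and sparsity level are assumptions.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily pick k atoms of D whose
    combination best explains y, refitting coefficients at each step."""
    idx, r = [], y.copy()
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ r))))
        Ds = D[:, idx]
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        r = y - Ds @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def inpaint(D, y, mask, k=3):
    """Fill missing entries of y: fit a sparse code on the observed rows
    of D only, then evaluate the full reconstruction at the holes."""
    x = omp(D[mask], y[mask], k)
    out = y.copy()
    out[~mask] = (D @ x)[~mask]
    return out
```

The observed pixels constrain the code; if the signal is genuinely sparse in the dictionary, the same code extrapolates plausibly into the occluded region.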


International Conference on Image Processing | 2013

Occlusion robust free-viewpoint video synthesis based on inter-camera/-frame interpolation

Kentaro Yamada; Hiroshi Sankoh; Masaru Sugano; Sei Naito

In this paper, we propose a novel free-viewpoint video synthesis method that adaptively extracts texture even from occluded areas. The conventional method, based on object segmentation and inter-camera interpolation, has two major problems. First, textures of incorrectly segmented objects degrade the image quality of the synthesized free-viewpoint video: some object textures have missing regions, while others include unwanted regions. Second, inter-camera interpolation often causes inconsistency between an object's appearance and its direction of motion. To overcome these problems, we propose a new texture acquisition scheme based on inter-frame interpolation to handle cases where object segmentation and inter-camera interpolation are both insufficient. The proposed method then selects adaptively among three texture acquisition schemes: segmentation, inter-camera interpolation, and inter-frame interpolation. This selection is performed optimally, considering the segmentation results and the direction of the virtual viewpoint. The experimental results show that the proposed method acquires appropriate textures; consequently, the subjective quality of the generated free-viewpoint video is improved while the original motion properties are maintained even for occluded objects.


IEEE MultiMedia | 2018

Semi-Automatic Generation of Free Viewpoint Video Contents for Sport Events: Toward Real-time Delivery of Immersive Experience

Houari Sabirin; Qiang Yao; Keisuke Nonaka; Hiroshi Sankoh; Sei Naito

Free viewpoint technology makes it possible to view video of sports content from any angle or position, but creating such content is currently a time-consuming process that can prevent real-time delivery. To address this problem, the authors present an application framework that implements semi-automatic camera calibration, object extraction, object tracking, and object separation to seamlessly generate high-quality free viewpoint sports videos for handheld devices.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Fast camera self-calibration for synthesizing free viewpoint soccer video

Qiang Yao; Akira Kubota; Kaoru Kawakita; Keisuke Nonaka; Hiroshi Sankoh; Sei Naito

Recently, free viewpoint sports video synthesis from non-fixed cameras has become very popular. Camera calibration is an indispensable step in free viewpoint video synthesis, and for a non-fixed camera the calibration has to be done frame by frame, so calibration speed is of great significance for real-time applications. In this paper, a fast self-calibration method for a non-fixed camera is proposed to estimate the homography matrix between a camera image and a soccer field model. To the best of our knowledge, this is the first method to construct feature vectors by analyzing crossing points of field lines in both the camera image and the field model. Unlike previous methods that evaluate all possible homography matrices and select the best one, our method evaluates only a small number of homography matrices based on the matching result of the constructed feature vectors. Experimental results show that the proposed method is much faster than other methods, with only a slight loss of calibration accuracy that is negligible in the final synthesized videos.
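The final selection among the few candidate homographies that survive feature-vector matching amounts to keeping the one with the lowest reprojection error over the matched crossing points. The scorer below is schematic, not the paper's exact criterion.

```python
import numpy as np

def reprojection_score(H, img_pts, model_pts):
    """Mean distance between model crossings and image crossings mapped by H."""
    p = np.hstack([img_pts, np.ones((len(img_pts), 1))])
    q = (H @ p.T).T
    q = q[:, :2] / q[:, 2:3]
    return np.linalg.norm(q - model_pts, axis=1).mean()

def best_homography(candidates, img_pts, model_pts):
    """Evaluate only the short list of candidate homographies and keep the best."""
    scores = [reprojection_score(H, img_pts, model_pts) for H in candidates]
    return candidates[int(np.argmin(scores))]
```

Because only the short candidate list is scored, rather than every hypothesis, this step is cheap, which is the source of the speed-up the abstract claims.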


ACM Multimedia | 2014

Color Transfer based on Spatial Structure for Telepresence

Kentaro Yamada; Hiroshi Sankoh; Sei Naito

In this paper, we propose a novel color transfer method based on spatial structure. This work targets an immersive telepresence system in which distant users feel as if they are present at a place other than their true location. In preceding studies, the region of an attendee in a remote room is extracted and synthesized on the display of a local room over a preset background image that resembles the local room. However, differences in structure between the preset background image and the remote room image often degrade the quality of the synthesized video. For example, when part of the human region is occluded by an object that does not exist in the preset background image, the incomplete human region is shown even though the occluding object is absent. To solve this problem, instead of using a preset background image, we propose a method that applies color transfer to images of the remote room. The proposed method lets users feel as if they are in the same room as the other telepresence participant by changing the colors of the remote room to match those of the local room. Furthermore, we improve the similarity between the rooms based on spatial structure, which conventional color transfer methods have overlooked. Experimental results show that the proposed method provides the same-room experience for users of the telepresence system through the color similarity of the remote and local rooms.
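Color transfer of this kind typically builds on Reinhard-style statistics matching: shift and scale each channel of the remote-room image so its mean and standard deviation match those of the local-room image. The sketch below is a minimal per-channel version; the paper's spatial-structure weighting is not shown, and operating directly in RGB rather than a decorrelated color space is a simplification.

```python
import numpy as np

def color_transfer(source, target):
    """Match per-channel mean and standard deviation of `source` (the
    remote-room image) to those of `target` (the local-room image)."""
    s = source.astype(np.float64)
    t = target.astype(np.float64)
    out = np.empty_like(s)
    for c in range(s.shape[2]):
        sc, tc = s[..., c], t[..., c]
        std = sc.std() or 1.0  # guard against a constant channel
        out[..., c] = (sc - sc.mean()) / std * tc.std() + tc.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```

Statistics matching alone ignores where colors occur in the image; the spatial-structure component of the paper addresses exactly that limitation.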

Collaboration


Dive into Hiroshi Sankoh's collaborations.

Top Co-Authors

Keisuke Nonaka (Tokyo Institute of Technology)
Shigeyuki Sakazawa (Osaka Institute of Technology)
Gene Cheung (National Institute of Informatics)