
Publications


Featured research published by Mahito Fujii.


Multimedia Tools and Applications | 2013

Human gesture recognition system for TV viewing using time-of-flight camera

Masaki Takahashi; Mahito Fujii; Masahide Naemura; Shin'ichi Satoh

We developed a new device-free user interface for TV viewing that uses a human gesture recognition technique. Although many motion recognition technologies have been reported, no man–machine interface that recognizes a large enough variety of gestures has been developed. The difficulty was the lack of spatial information that could be acquired from normal video sequences. We overcame the difficulty by using a time-of-flight camera and novel action recognition techniques. The main functions of this system are gesture recognition and posture measurement. The former is performed using the bag-of-features approach, which uses key-point trajectories as features. The use of 4-D spatiotemporal trajectory features is the main technical contribution of the proposed system. The latter is obtained through face detection and object tracking technology. The interface is useful because it does not require any contact-type devices. Several experiments proved the effectiveness of our proposed method and the usefulness of the system.
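
The paper's exact pipeline is not reproduced here, but the bag-of-features step it describes can be sketched as follows: each key-point trajectory yields a descriptor, descriptors are quantized against a pre-learned codebook of "visual words," and the resulting histogram represents the gesture. The codebook and 4-D (x, y, t, depth) descriptor values below are hypothetical toy data, not from the paper.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each trajectory descriptor to its nearest codeword (visual word)."""
    # pairwise distances: (n_descriptors, n_codewords)
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def bag_of_features(descriptors, codebook):
    """L1-normalized histogram of visual-word occurrences."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# toy example: 4-D spatiotemporal descriptors (x, y, t, depth), 3-word codebook
codebook = np.array([[0.0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]])
desc = np.array([[0.1, 0, 0, 0], [0.9, 1, 1, 1], [1.1, 1, 1, 1]])
hist = bag_of_features(desc, codebook)
```

In a full system, such histograms would then be fed to a trained classifier (e.g., an SVM) to label the gesture.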


Computer Vision and Pattern Recognition | 2011

Human action recognition in crowded surveillance video sequences by using features taken from key-point trajectories

Masaki Takahashi; Masahide Naemura; Mahito Fujii; Shin'ichi Satoh

There is a need for systems that can automatically detect specific human actions in surveillance video. However, almost all of the human action recognition techniques proposed so far detect relatively large actions within simple video sequences. To alleviate this shortcoming, we propose a method that can detect specific actions within crowded scenes of real surveillance video. Our action recognition method is based on the bag-of-features approach, and key-point trajectories are used as its features. One problem is that key-point trajectories cannot be directly input to the bag-of-features approach, because they have various time lengths. To overcome this difficulty, our method extracts a fixed-length feature descriptor from each key-point trajectory and uses it for event classification. In addition, feature weights are calculated to reduce the interference from noise trajectories in the background regions. Our method detected specific actions more precisely than other conventional methods, and it performed well in the TRECVID 2010 Surveillance Event Detection task.
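
The paper's actual descriptor is not given here, but one common way to turn a variable-length trajectory into a fixed-length vector, as the abstract describes, is to resample the track to a fixed number of points and describe it by its normalized displacements. This sketch, with hypothetical parameters, illustrates the idea:

```python
import numpy as np

def fixed_length_descriptor(trajectory, n=8):
    """Resample a variable-length (x, y) key-point trajectory to n points along
    its arc length, then describe it by normalized inter-point displacements."""
    traj = np.asarray(trajectory, dtype=float)
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)   # segment lengths
    t = np.concatenate([[0.0], np.cumsum(seg)])           # arc-length positions
    total = t[-1] if t[-1] > 0 else 1.0
    samples = np.linspace(0.0, total, n)
    xs = np.interp(samples, t, traj[:, 0])
    ys = np.interp(samples, t, traj[:, 1])
    disp = np.diff(np.stack([xs, ys], axis=1), axis=0)    # (n-1, 2)
    norm = np.linalg.norm(disp) or 1.0
    return (disp / norm).ravel()                          # fixed length 2*(n-1)

# two tracks of different lengths tracing the same path
short = [(0, 0), (1, 0), (2, 0)]
long_ = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0)]
```

Because the descriptor length is fixed (here 14 values), such vectors can be quantized into a bag-of-features histogram regardless of how long each track was observed.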


International Conference on Acoustics, Speech, and Signal Processing | 2007

Distributed Particle Filtering for Multiocular Soccer-Ball Tracking

Toshihiko Misu; Atsushi Matsui; Masahide Naemura; Mahito Fujii; Nobuyuki Yagi

This paper proposes a distributed state estimation architecture for multi-sensor fusion. The system consists of networked subsystems that cooperatively estimate the state of a common target from their own observations. Each subsystem is equipped with a self-contained particle filter that can operate in stand-alone as well as in network mode with a particle exchange function. We applied this flexible architecture to 3D soccer-ball tracking by modeling the imaging processes related to the centroid, size, and motion-blur of a target, and by modeling the dynamics with ballistic motion, bounce, and rolling. To evaluate the precision and robustness of the system, we conducted experiments using multiocular images of a professional soccer match.
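
The paper's multiocular, networked architecture is not reproduced here, but the self-contained particle filter each subsystem carries follows the standard predict-weight-resample cycle. Below is a minimal 1-D bootstrap-filter sketch with hypothetical noise parameters, standing in for the 3-D ball state:

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, observation, motion_std=0.1, obs_std=0.5):
    """One predict-weight-resample cycle of a bootstrap particle filter
    for a 1-D position (a stand-in for the paper's 3-D ball state)."""
    # predict: propagate particles through a noisy motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # weight: Gaussian likelihood of the observation given each particle
    w = np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    w = w / w.sum()
    # resample: draw particles in proportion to their weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = rng.normal(0.0, 2.0, size=500)   # diffuse initial belief
weights = np.full(500, 1.0 / 500)
for obs in [1.0, 1.0, 1.0]:                  # repeated observations near 1.0
    particles, weights = pf_step(particles, weights, obs)
estimate = particles.mean()
```

In the distributed setting the paper describes, subsystems additionally exchange particles over the network so that each filter can also run in stand-alone mode.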


International Symposium on Multimedia | 2008

Automatic Pitch Type Recognition from Baseball Broadcast Videos

Masaki Takahashi; Mahito Fujii; Nobuyuki Yagi

An automatic pitch type recognition system has been developed. It is difficult to determine the pitch type automatically from a baseball broadcast video, so the decision is currently made by specialists who have expertise and experience in baseball. We developed a system that incorporates this professional expertise. The system identifies the pitch type, such as a straight ball or a curveball, from single-view pitching sequences in a live baseball broadcast. It analyzes ball trajectories by using automatic ball tracking, estimates the catcher's stance by tracking the mitt region, and recognizes the ball-speed numbers superimposed on the screen; it then classifies the pitch type using the random forests ensemble-learning algorithm. The system achieved about 90% recognition accuracy in our experiments.
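
The system uses full random forests; as a much-simplified illustration of the same idea, the sketch below votes over an ensemble of randomized one-feature stumps. The features (vertical break, ball speed) and the training points are hypothetical, not from the paper:

```python
import random

random.seed(7)

def train_stump(data):
    """Simplified randomized stump: pick a random feature, then split at the
    midpoint between the two class means on that feature."""
    f = random.randrange(len(data[0][0]))
    classes = sorted({y for _, y in data})
    means = {c: sum(x[f] for x, y in data if y == c) / sum(y == c for _, y in data)
             for c in classes}
    a, b = sorted(classes, key=lambda c: means[c])   # a: lower-mean class
    thr = (means[a] + means[b]) / 2.0
    return f, thr, a, b

def predict(forest, x):
    """Majority vote over the ensemble, as in a random forest."""
    votes = [(a if x[f] <= thr else b) for f, thr, a, b in forest]
    return max(set(votes), key=votes.count)

# hypothetical features: (vertical break, ball speed in km/h)
data = [((0.1, 150.0), "straight"), ((0.2, 148.0), "straight"),
        ((0.9, 120.0), "curve"), ((0.8, 125.0), "curve")]
forest = [train_stump(data) for _ in range(15)]
```

A real random forest would grow full decision trees on bootstrap samples with random feature subsets at each node; the randomized-feature voting above is only the core intuition.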


EURASIP Journal on Advances in Signal Processing | 2010

Robust recognition of specific human behaviors in crowded surveillance video sequences

Masaki Takahashi; Mahito Fujii; Masahiro Shibata; Shin'ichi Satoh

We describe a method that can detect specific human behaviors even in crowded surveillance video scenes. Our developed system recognizes specific behaviors based on the trajectories created by detecting and tracking people in a video. It detects people using an HOG descriptor and SVM classifier, and it tracks the regions by calculating the two-dimensional color histograms. Our system identifies several specific human behaviors, such as running and meeting, by analyzing the similarities to the reference trajectory of each behavior. Verification techniques such as backward tracking and calculating optical flows contributed to robust recognition. Comparative experiments showed that our system could track people more robustly than a baseline tracking algorithm even in crowded scenes. Our system precisely identified specific behaviors and achieved first place for detecting running people in the TRECVID 2009 Surveillance Event Detection Task.
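
The paper analyzes similarities between tracked trajectories and per-behavior reference trajectories; as a toy stand-in for that matching step, the sketch below labels a track by comparing its mean speed against hypothetical reference values (the reference speeds and frame rate are illustrative assumptions):

```python
import numpy as np

def mean_speed(traj, fps=25.0):
    """Average displacement per second along an (x, y) track, in pixels/s."""
    traj = np.asarray(traj, dtype=float)
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return steps.mean() * fps

def classify(traj, references):
    """Label a track by the reference behavior with the closest speed."""
    s = mean_speed(traj)
    return min(references, key=lambda name: abs(references[name] - s))

# hypothetical reference speeds (pixels/s) learned from labeled examples
references = {"walking": 40.0, "running": 150.0}
track = [(0.0, 0.0), (6.0, 0.0), (12.0, 0.0)]   # 6 px/frame at 25 fps
```

The full system compares entire trajectories, not just a scalar speed, and adds the backward-tracking and optical-flow verification described above.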


Conference on Multimedia Modeling | 2009

Probabilistic Integration of Tracking and Recognition of Soccer Players

Toshie Misu; Atsushi Matsui; Simon Clippingdale; Mahito Fujii; Nobuyuki Yagi

This paper proposes a method for integrating player trajectories tracked in wide-angle images with identities obtained by face and back-number recognition from images taken by a motion-controlled camera. To recover from tracking failures efficiently, the motion-controlled camera scans and follows players who are judged likely to undergo heavy occlusions several seconds in the future. The candidate identities for each tracked trajectory are probabilistically modeled and updated at every identification, and the degradation due to the passage of time and occlusions is also modeled. Experiments showed the system's feasibility for automatic real-time formation estimation, which will be applied to metadata production with semantic and dynamic information on sports scenes.
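
The probabilistic update the abstract describes can be sketched as a Bayes update of each track's identity distribution, plus a decay toward uniform to model degradation over time and occlusions. The player IDs, likelihood values, and decay rate below are hypothetical:

```python
def update(belief, likelihood):
    """Bayes update of a track's identity distribution after one recognition."""
    post = {pid: belief[pid] * likelihood.get(pid, 1e-6) for pid in belief}
    z = sum(post.values())
    return {pid: p / z for pid, p in post.items()}

def decay(belief, rate=0.1):
    """Blend toward uniform to model degradation over time and occlusions."""
    n = len(belief)
    return {pid: (1 - rate) * p + rate / n for pid, p in belief.items()}

belief = {"player7": 1 / 3, "player9": 1 / 3, "player11": 1 / 3}
# a back-number recognition that strongly favors player 7
belief = update(belief, {"player7": 0.8, "player9": 0.1, "player11": 0.1})
top = max(belief, key=belief.get)
```

Each new face or back-number observation sharpens the distribution, while the decay step lets a long-occluded track gradually return to uncertainty.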


ACM Multimedia | 2010

Human gesture recognition using 3.5-dimensional trajectory features for hands-free user interface

Masaki Takahashi; Mahito Fujii; Masahide Naemura; Shin'ichi Satoh

We present a new human motion recognition technique for a hands-free user interface. Although many motion recognition technologies for video sequences have been reported, no man-machine interface that recognizes a large enough variety of motions has been developed. The difficulty was the lack of spatial information that could be acquired from video sequences captured by a normal camera. The proposed system uses a depth image in addition to a normal grayscale image from a time-of-flight camera that measures the depth to objects, so various motions are accurately recognized. The main functions of this system are gesture recognition and posture measurement. The former is performed using the bag-of-words approach. The trajectories of tracked key points around the human body are used as features in this approach. The main technical contribution of the proposed method is the use of 3.5D spatiotemporal trajectory features, which contain horizontal, vertical, time, and depth information. The latter is obtained through face detection and object tracking technology. The proposed user interface is useful and natural because it does not require any contact-type devices, such as a motion sensor controller. The effectiveness of the proposed 3.5D spatiotemporal features was confirmed through a comparative experiment with conventional 3D spatiotemporal features. The generality of the system was proven by an experiment with multiple people. The usefulness of the system as a pointing device was also proven by a practical simulation.


International Symposium on Multimedia | 2009

Multimedia Databases for Video Indexing: Toward Automatic Face Image Registration

Simon Clippingdale; Mahito Fujii; Masahiro Shibata

Pose-invariant face recognition systems for multimedia indexing require the prior registration of face images at multiple poses in a database, but it can be problematic and laborious to obtain and register appropriate imagery. We aim to automate the process by constructing 3D face models from the imagery available for registration, and then using the constructed models to generate templates for the face recognition system. The first step in the model construction process is the estimation of the generalized pose (scale, position, 3D orientation relative to the camera) of the face in each frame of the registration imagery, and the 3D positions of a number of feature points on the face. This is followed by warping the 3D model to fit the estimated 3D feature points, and mapping facial texture from the registration imagery onto the model. In this paper we outline (i) an algorithm for estimating the generalized pose and shape (3D feature point locations) from 2D feature point tracking data, and (ii) a texture mapping algorithm that combines texture regions from all of the available imagery. We show experimental results and discuss issues that remain in applying the method in practice as part of a multimedia indexing system.


LKR'08: Proceedings of the 3rd International Conference on Large-Scale Knowledge Resources: Construction and Application | 2008

Soccer formation classification based on fisher weight map and Gaussian mixture models

Toshie Misu; Masahide Naemura; Mahito Fujii; Nobuyuki Yagi

This paper proposes a method that analyzes player formations in order to classify kick and throw-in events in soccer matches. Formations are described in terms of local head counts and mean velocities, which are converted into canonical variates using a Fisher weight map in order to select effective variates for discriminating between events. The map is acquired by supervised learning. The distribution of the variates for each event class is modeled by Gaussian mixtures in order to handle its multimodality in canonical space. Our experiments showed that the Fisher weight map extracted semantically explicable variates related to such situations as players at corners and left/right separation. Our experiments also showed that characteristically formed events, such as kick-offs and corner-kicks, were successfully classified by the Gaussian mixture models. The effects of spatial nonlinearity and of the fuzziness of local head counts are also evaluated.
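
The classification stage the abstract describes, choosing the event class whose Gaussian mixture gives the canonical variate the highest likelihood, can be sketched in one dimension. The mixture parameters below are hypothetical; the bimodal corner-kick model only illustrates why mixtures (rather than single Gaussians) are needed, e.g., play from the left or right corner:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, components):
    """Likelihood under a 1-D Gaussian mixture: [(weight, mean, variance), ...]."""
    return sum(w * gauss(x, mu, var) for w, mu, var in components)

def classify(x, class_models):
    """Assign canonical variate x to the event class with highest likelihood."""
    return max(class_models, key=lambda c: mixture_pdf(x, class_models[c]))

# hypothetical 1-D canonical variates; the corner-kick class is bimodal
class_models = {
    "kick-off":    [(1.0, 0.0, 0.5)],
    "corner-kick": [(0.5, -3.0, 0.4), (0.5, 3.0, 0.4)],
}
```

In the paper, the variates come from the learned Fisher weight map and the mixture parameters would be fitted to labeled formations (e.g., by EM) rather than fixed by hand.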


International Conference on Consumer Electronics | 2013

Level-of-interest estimation for personalized TV program recommendation

Simon Clippingdale; Makoto Okuda; Masaki Takahashi; Masahide Naemura; Mahito Fujii

We describe a prototype system that analyzes video from a camera mounted on a TV receiver or set-top box, showing viewers watching the TV. The system recognizes the faces of registered viewers and estimates the level of interest that each viewer displays in the program being viewed. This information, along with receiver operation history, can be used to build viewer profiles and to offer personalized program recommendations reflecting each viewer's perceived interests.

Collaboration


Mahito Fujii's top co-authors:

Shin'ichi Satoh (National Institute of Informatics)
Takayuki Ito (Nagoya Institute of Technology)
James J. Little (University of British Columbia)
Ichiro Yamada (National Institute of Information and Communications Technology)