Publications


Featured research published by Atsuto Maki.


Computer Vision and Pattern Recognition | 2015

From generic to specific deep representations for visual recognition

Hossein Azizpour; Ali Sharif Razavian; Josephine Sullivan; Atsuto Maki; Stefan Carlsson

Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset, and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors, including parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Factors of Transferability for a Generic ConvNet Representation

Hossein Azizpour; Ali Sharif Razavian; Josephine Sullivan; Atsuto Maki; Stefan Carlsson

Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source), and the feed-forward unit activations of the trained network at a certain layer are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. These include parameters for training the source ConvNet, such as its architecture and the distribution of the training data, as well as parameters of feature extraction, such as the layer of the trained ConvNet and dimensionality reduction. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task, such that a correlation is observed between the performance of tasks and their similarity to the source task w.r.t. the proposed factors.
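
The transfer scenario can be sketched end-to-end with toy numbers: fixed made-up weights stand in for a layer of the source-trained ConvNet, and a nearest-centroid classifier plays the role of a hypothetical target-task learner. This is a minimal illustration of the pipeline shape, not the paper's actual networks or datasets.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, W):
    # W @ v for a list-of-rows matrix
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Fixed made-up weights standing in for a layer of a ConvNet trained on a
# large labeled source dataset (a real representation would come from such
# a trained network, not these toy numbers).
W1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
      [1, -1, 0, 0], [0, 0, 1, -1], [1, 0, -1, 0], [0, 1, 0, -1]]

def features(x):
    """Feed-forward activations at a chosen layer, used as a
    generic representation of the input."""
    return relu(linear(x, W1))

# Target task with a small training set: a nearest-centroid classifier
# trained on the transferred features.
train = {"a": [[1, 0, 0, 0], [0.9, 0.1, 0, 0]],
         "b": [[0, 0, 0, 1], [0, 0, 0.1, 0.9]]}
centroids = {c: [sum(col) / len(xs) for col in zip(*(features(x) for x in xs))]
             for c, xs in train.items()}

def classify(x):
    f = features(x)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centroids[c])))

print(classify([0.95, 0.05, 0, 0]))  # -> a
```

Swapping which layer's activations `features` returns is the kind of feature-extraction factor the paper varies.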


Computer Vision and Image Understanding | 2000

Attentional Scene Segmentation

Atsuto Maki; Peter Nordlund; Jan-Olof Eklundh

We present an approach to attention in active computer vision. The notion of attention plays an important role in biological vision. In recent years, and especially with the emerging interest in active vision, computer vision researchers have been increasingly concerned with attentional mechanisms as well. The basic principles behind these efforts are greatly influenced by psychophysical research. That is also the case in the work presented here, which adapts the model of Treisman (1985, Comput. Vision Graphics Image Process.: Image Understanding 31, 156–177), with an early parallel stage with preattentive cues followed by a later serial stage where the cues are integrated. The contributions of our approach are (i) the incorporation of depth information from stereopsis, (ii) the simple implementation of low-level modules such as disparity and flow by local phase, and (iii) the cue integration along pursuit and saccade modes that allows proper target selection based on nearness and motion. We demonstrate the technique by experiments in which a moving observer selectively masks out different moving objects in real scenes.
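
The local-phase idea in (ii) can be sketched in one dimension: a complex Gabor (quadrature) filter gives a local phase at each point, and the phase difference between the two views divided by the filter frequency estimates disparity. The signal and filter parameters below are made-up toy values.

```python
import cmath
import math

omega = 0.5          # spatial frequency of the test pattern (rad/sample)
sigma = 8.0          # Gaussian window width of the quadrature filter
true_disp = 3.0      # horizontal shift between the two views

left = lambda x: math.cos(omega * x + 0.7)
right = lambda x: left(x - true_disp)   # right view is a shifted copy

def local_phase(signal, x0):
    """Local phase from a complex Gabor filter response at x0."""
    resp = sum(signal(x) * math.exp(-((x - x0) / sigma) ** 2)
               * cmath.exp(-1j * omega * (x - x0))
               for x in range(int(x0) - 40, int(x0) + 41))
    return cmath.phase(resp)

x0 = 100
dphi = local_phase(left, x0) - local_phase(right, x0)
dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # wrap to (-pi, pi]
disparity = dphi / omega
print(round(disparity, 2))  # -> 3.0
```

This only works while the true phase shift stays within one wrap (|omega * disparity| < pi), which is why phase-based methods operate at coarse-to-fine scales.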


International Conference on Pattern Recognition | 1996

A computational model of depth-based attention

Atsuto Maki; Peter Nordlund; Jan-Olof Eklundh

We present a computational model for attention. It consists of an early parallel stage with preattentive cues followed by a later serial stage, where the cues are integrated. We base the model on disparity, image flow, and motion. As one of several possibilities, we choose a depth-based criterion to integrate these cues, in such a way that attention is maintained on the closest moving object. We demonstrate the technique by experiments in which a moving observer selectively masks out different moving objects in real scenes.
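
The depth-based integration criterion can be sketched as follows, with disparity as the nearness cue (larger means nearer) and a motion flag from the parallel stage; the regions and cue values are invented for illustration.

```python
# Candidate regions with preattentive cues from the parallel stage:
# disparity (larger = nearer, from stereopsis) and a motion flag.
regions = [
    {"name": "far walker",   "disparity": 0.2, "moving": True},
    {"name": "near pillar",  "disparity": 0.9, "moving": False},
    {"name": "near cyclist", "disparity": 0.7, "moving": True},
]

def select_target(regions):
    """Serial stage: integrate the cues with a depth-based criterion
    so attention goes to the closest moving region."""
    moving = [r for r in regions if r["moving"]]
    return max(moving, key=lambda r: r["disparity"]) if moving else None

print(select_target(regions)["name"])  # -> near cyclist
```

Note that the nearest region overall (the pillar) is skipped because it fails the motion cue, which is exactly the behavior the criterion encodes.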


International Conference on Computer Vision | 1995

Towards an active visual observer

Tomas Uhlin; Peter Nordlund; Atsuto Maki; Jan-Olof Eklundh

We present a binocular active vision system that can attend to and fixate a moving target. Our system has an open and expandable design and it forms the first steps of a long term effort towards developing an active observer using vision to interact with the environment, in particular capable of figure-ground segmentation. We also present partial real-time implementations of this system and show their performance in real-world situations together with motor control. In pursuit we particularly focus on occlusions of other targets, both stationary and moving, and integrate three cues, ego-motion, target motion and target disparity, to obtain an overall robust behavior. An active vision system must be open, expandable, and operate with whatever data are available momentarily. It must also be equipped with means and methods to direct and change its attention. This system is therefore equipped with motion detection for changing attention and pursuit for maintaining attention, both of which run concurrently.


Computer Vision and Image Understanding | 2009

Difference sphere: An approach to near light source estimation

Takeshi Takai; Atsuto Maki; Koichiro Niinuma; Takashi Matsuyama

We present a novel approach for estimating lighting sources from a single image of a scene that is illuminated by near point light sources, directional light sources and ambient light. We propose to employ a pair of reference spheres as light probes and introduce the difference sphere, which we acquire by differencing the intensities of two image regions of the reference spheres. Since the effect of directional light sources and ambient light is eliminated by differencing, the key advantage of the difference sphere is that it enables us to estimate near point light sources, including their radiance, which has been difficult to achieve in previous efforts where only distant directional light sources were assumed. We also show that analysis of gray level contours on spherical surfaces facilitates separate identification of multiple combined light sources and is well suited to the difference sphere. Once we estimate point light sources with the difference sphere, we update the input image by eliminating their influence and then estimate the remaining light sources, that is, directional light sources and ambient light. We demonstrate the effectiveness of the entire algorithm with experimental results.
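
The cancellation that motivates the difference sphere can be checked with a toy Lambertian model: two probe points with the same surface normal receive identical ambient and directional contributions, so their intensity difference retains only the position-dependent near-point-source term. All scene values below are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

ambient = 0.3
directional = ([0.0, 0.0, 1.0], 0.8)   # direction, irradiance
point_src = ([0.0, 2.0, 2.0], 5.0)     # position, radiant intensity

def shade(p, n):
    """Lambertian intensity at point p with unit normal n."""
    Ld, Ed = directional
    I = ambient + max(0.0, dot(n, Ld)) * Ed
    # near point source: direction and 1/r^2 falloff depend on p
    v = [s - x for s, x in zip(point_src[0], p)]
    r = math.sqrt(dot(v, v))
    I += max(0.0, dot(n, [c / r for c in v])) * point_src[1] / r ** 2
    return I

n = [0.0, 0.0, 1.0]
probe_a = [0.0, 0.0, 0.0]   # two probe points with the same normal
probe_b = [4.0, 0.0, 0.0]   # but different positions
diff = shade(probe_a, n) - shade(probe_b, n)
# ambient and directional terms are identical at both points, so they
# cancel exactly; only the near point source contribution remains
print(round(diff, 4))  # -> 0.3569
```

With a distant directional source only (set the point-source intensity to zero), `diff` would be exactly zero, which is why differencing isolates the near source.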


IEEE Transactions on Circuits and Systems for Video Technology | 2009

The Multiple-Camera 3-D Production Studio

Jonathan Starck; Atsuto Maki; Shohei Nobuhara; Adrian Hilton; Takashi Matsuyama

Multiple-camera systems are currently widely used in research and development as a means of capturing and synthesizing realistic 3-D video content. Studio systems for 3-D production of human performance are reviewed from the literature, and the practical experience gained in developing prototype studios is reported across two research laboratories. System design should consider the studio backdrop for foreground matting, lighting for ambient illumination, camera acquisition hardware, the camera configuration for scene capture, and accurate geometric and photometric camera calibration. A ground-truth evaluation is performed to quantify the effect of different constraints on the multiple-camera system in terms of geometric accuracy and the requirement for high-quality view synthesis. As changing camera height has only a limited influence on surface visibility, multiple camera sets or an active vision system may be required for wide-area capture. Accurate reconstruction requires a camera baseline of 25°, and the achievable accuracy is 5-10 mm at current camera resolutions. Accuracy is inherently limited, and view-dependent rendering is required for view synthesis with sub-pixel accuracy where display resolutions match camera resolutions. The two prototype studios are contrasted and state-of-the-art techniques for 3-D content production demonstrated.
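
The baseline/accuracy trade-off can be illustrated with the standard first-order stereo triangulation error model, dZ ≈ Z² · d_disp / (f · B); the numbers below are illustrative assumptions, not the studios' calibration values.

```python
def depth_error(Z, f_px, baseline, match_err_px=0.5):
    """First-order triangulation depth error (same units as Z):
    dZ ~ Z^2 / (f * B) * d_disp, with Z the depth, f_px the focal
    length in pixels, baseline the camera separation, and
    match_err_px the stereo matching error in pixels."""
    return Z ** 2 / (f_px * baseline) * match_err_px

# Widening the baseline shrinks the depth error, at the cost of
# harder correspondence matching and reduced surface visibility.
for B in (0.5, 1.0, 2.0):
    print(B, round(depth_error(Z=3.0, f_px=1500, baseline=B) * 1000, 1), "mm")
```

At a 3 m working distance and half-pixel matching error, these illustrative numbers land in the same few-millimetre regime the evaluation reports.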


International Journal of Computer Vision | 2014

Demisting the Hough Transform for 3D Shape Recognition and Registration

Oliver Woodford; Minh-Tri Pham; Atsuto Maki; Frank Perbet; Björn Stenger

In applying the Hough transform to the problem of 3D shape recognition and registration, we develop two new and powerful improvements to this popular inference method. The first, intrinsic Hough, solves the problem of exponential memory requirements of the standard Hough transform by exploiting the sparsity of the Hough space. The second, minimum-entropy Hough, explains away incorrect votes, substantially reducing the number of modes in the posterior distribution of class and pose, and improving precision. Our experiments demonstrate that these contributions make the Hough transform not only tractable but also highly accurate for our example application. Both contributions can be applied to other tasks that already use the standard Hough transform.
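
As a minimal illustration of the sparsity idea behind the intrinsic Hough transform, the sketch below accumulates weighted pose votes in a hash map keyed by quantized pose cells, so only cells that actually receive votes consume memory; the vote values and cell size are made up for the example.

```python
from collections import defaultdict

def sparse_hough(votes, cell=1.0):
    """Sparse Hough accumulator: rather than allocating a dense grid
    over the whole pose space, store only the cells that receive
    votes, exploiting the sparsity of the Hough space."""
    acc = defaultdict(float)
    for pose, weight in votes:
        key = tuple(int(p // cell) for p in pose)
        acc[key] += weight
    return acc

# Three weighted 2-D pose votes: two consistent, one stray.
votes = [((0.2, 0.3), 1.0), ((0.4, 0.1), 1.0), ((5.1, 7.9), 1.0)]
acc = sparse_hough(votes)
best = max(acc, key=acc.get)
print(best, acc[best])  # -> (0, 0) 2.0
```

Only two cells exist in the accumulator here, versus the dense grid a conventional implementation would allocate over the full pose range; the minimum-entropy reweighting of votes is a separate contribution not shown in this sketch.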


International Conference on Computer Vision | 2011

A new distance for scale-invariant 3D shape recognition and registration

Minh-Tri Pham; Oliver Woodford; Frank Perbet; Atsuto Maki; Björn Stenger; Roberto Cipolla

This paper presents a method for vote-based 3D shape recognition and registration, in particular using mean shift on 3D pose votes in the space of direct similarity transforms for the first time. We introduce a new distance between poses in this space—the SRT distance. It is left-invariant, unlike Euclidean distance, and has a unique, closed-form mean, in contrast to Riemannian distance, so is fast to compute. We demonstrate improved performance over the state of the art in both recognition and registration on a real and challenging dataset, by comparing our distance with others in a mean shift framework, as well as with the commonly used Hough voting approach.
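
Mean shift on pose votes can be sketched in one dimension with a Gaussian kernel; note that plain Euclidean distance on scalars stands in here for the paper's SRT distance on direct similarity transforms, and the votes are invented toy values.

```python
import math

def mean_shift(votes, start, bandwidth=0.5, iters=50):
    """Gaussian-kernel mean shift on scalar votes: repeatedly move the
    estimate to the kernel-weighted mean of the votes until it settles
    on a mode of the vote density."""
    x = start
    for _ in range(iters):
        w = [math.exp(-((v - x) / bandwidth) ** 2) for v in votes]
        x = sum(wi * vi for wi, vi in zip(w, votes)) / sum(w)
    return x

votes = [0.9, 1.0, 1.1, 3.0]   # three consistent pose votes and one outlier
mode = mean_shift(votes, start=0.9)
print(round(mode, 2))  # -> 1.0
```

The outlier at 3.0 gets negligible kernel weight near the cluster, so the mode lands on the consistent votes, which is the robustness that motivates running mean shift on pose votes instead of simply averaging them.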


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

Hyperpatches for 3D model acquisition and tracking

Charles Wiles; Atsuto Maki; Natsuko Matsuda

Automatic 3D model acquisition and 3D tracking of simple objects under motion using a single camera are often difficult due to the sparsity of information from which to establish the model. We developed an automatic scheme that first computes a simple Euclidean model of the object and then enriches this model using hyperpatches. These hyperpatches contain information on both the orientation and intensity pattern variation of roughly planar patches on an object. This information allows both the spatial and intensity distortions of the projected patch to be modeled accurately under 3D object motion. Considering human tracking as a specific application, we show that hyperpatches can not only be computed automatically during model acquisition from a monocular image sequence, but that they are also extremely appropriate for the task of visual tracking.
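
The spatial distortion of a roughly planar patch under 3D motion can be modeled with the standard plane-induced homography H = R + t nᵀ / d; the sketch below computes it for made-up motion and plane parameters (this is textbook multi-view geometry, not the hyperpatch estimation scheme itself).

```python
def plane_homography(R, t, n, d):
    """Homography induced by a world plane with unit normal n at
    distance d, under camera motion (R, t): H = R + (t n^T) / d.
    It predicts how a planar patch distorts in the image as the
    object or camera moves."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # no rotation
t = [0.1, 0.0, 0.0]                     # small sideways translation
n, d = [0.0, 0.0, 1.0], 2.0             # fronto-parallel plane at depth 2
H = plane_homography(R, t, n, d)
# A patch point (x, y, 1) maps to H @ (x, y, 1): here a pure horizontal
# shift of t_x / d = 0.05 in normalized image coordinates.
```

Hyperpatches additionally model how the patch's intensity pattern varies with viewpoint, which a purely geometric warp like this does not capture.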

Collaboration


Dive into Atsuto Maki's collaborations.

Top Co-Authors

Jan-Olof Eklundh, Royal Institute of Technology
Ali Sharif Razavian, Royal Institute of Technology
Peter Nordlund, Royal Institute of Technology
Tomas Uhlin, Royal Institute of Technology