Yukinobu Taniguchi
Tokyo University of Science
Publication
Featured research published by Yukinobu Taniguchi.
Conference on Multimedia Computing and Networking | 1994
Yoshinobu Tonomura; Akihito Akutsu; Yukinobu Taniguchi; Gen Suzuki
Video is becoming increasingly important for multimedia applications, but computers should let us do more than just watch. We propose a way for computers to structure video, together with several new interfaces that make it easier to browse and search.
ACM Multimedia | 1997
Yukinobu Taniguchi; Akihito Akutsu; Yoshinobu Tonomura
Browsing is a fundamental function in multimedia systems. This paper presents PanoramaExcerpts, a video browsing interface that shows a catalogue of two types of video icons: panoramic icons and keyframe icons. A panoramic icon is synthesized from a video segment taken with camera pan or tilt, extracted using a camera operation estimation technique; it represents the entire visible contents of a scene extended by camera pan or tilt, which is difficult to summarize using a single keyframe. Keyframe icons are extracted, using a shot-change detection algorithm, to supplement the panoramic icons. For the automatic generation of PanoramaExcerpts, we propose an approach that integrates: (a) a shot-change detection method that detects instantaneous cuts as well as dissolves, with adaptive control over the sampling rate for efficient processing; (b) a method for locating segments that contain smooth camera pans or tilts, from which panoramic icons can be synthesized; and (c) a layout method for packing icons in a space-efficient manner. We also describe experimental results for these three methods and potential applications of PanoramaExcerpts.
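As a rough illustration of the cut-detection component, the following is a minimal sketch of a colour-histogram shot-change detector; the histogram size, the L1 distance, and the fixed threshold are illustrative assumptions, not the adaptive-sampling, dissolve-aware method described in the paper.

```python
# Minimal sketch of histogram-based hard-cut detection (illustrative assumptions,
# not the paper's adaptive method).
import numpy as np

def frame_histogram(frame, bins=16):
    """Per-channel intensity histogram, normalised to sum to 1."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def detect_cuts(frames, threshold=0.4):
    """Flag frame indices where consecutive histograms differ sharply (hard cuts)."""
    cuts = []
    prev = frame_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = frame_histogram(frames[i])
        if 0.5 * np.abs(cur - prev).sum() > threshold:  # L1 histogram distance in [0, 1]
            cuts.append(i)
        prev = cur
    return cuts
```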
International Conference on Multimedia and Expo | 2000
Hidetaka Kuwano; Yukinobu Taniguchi; Hiroyuki Arai; Minoru Mori; Shoji Kurakake; Haruhiko Kojima
The paper presents a telop-on-demand system that automatically recognizes text in video frames to create the indices needed for content-based video browsing and retrieval. Superimposed texts are important because they provide semantic information about scene contents. Their attributes, such as font, size, and position in a frame, are also important: they are carefully designed by the video editor and so reflect the intent of captioning. In news programs, for instance, headline text is displayed in larger fonts than subtitles. Our system takes into account not only the texts themselves but also their attributes for structuring videos. We describe: (i) novel methods for detecting and extracting text that are robust against complex backgrounds and intensity degradation of the character patterns, and (ii) a method for structuring a video based on text attributes.
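For illustration only, the sketch below flags candidate caption regions by edge density, the kind of cue superimposed-text detectors commonly exploit; the thresholds and connected-component post-processing are assumptions and do not reproduce the paper's detection and extraction methods.

```python
# Crude edge-density caption-region detector (illustrative assumptions only).
import numpy as np
from scipy import ndimage

def detect_text_regions(gray, edge_thresh=40, min_area=200):
    """Return bounding slices of high edge-density connected regions in a grayscale frame."""
    gx, gy = np.gradient(gray.astype(float))
    edges = np.hypot(gx, gy) > edge_thresh                 # strong-edge mask
    edges = ndimage.binary_dilation(edges, iterations=2)   # merge nearby character strokes
    labels, _ = ndimage.label(edges)
    boxes = ndimage.find_objects(labels)
    return [b for b in boxes
            if (b[0].stop - b[0].start) * (b[1].stop - b[1].start) >= min_area]
```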
International Conference on Computer Vision | 2015
Go Irie; Hiroyuki Arai; Yukinobu Taniguchi
This paper addresses the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. Many unimodal hashing studies have shown that both similarity preservation of data and maintenance of quantization quality are essential for improving retrieval performance with binary hash codes. However, most existing cross-modal hashing methods have mainly focused on the former, while the latter remains almost untouched. We propose a method to minimize the binary quantization errors, tailored to cross-modal hashing. Our approach, named Alternating Co-Quantization (ACQ), alternately seeks binary quantizers for each modality space with the help of connections to other modality data, so that they give minimal quantization errors while preserving data similarities. ACQ can be coupled with various existing cross-modal dimension reduction methods such as Canonical Correlation Analysis (CCA) and substantially boosts their retrieval performance in the Hamming space. Extensive experiments demonstrate that ACQ can outperform several state-of-the-art methods, even when combined with simple CCA.
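To make the alternating idea concrete, here is a minimal ITQ-style sketch for two CCA-projected modalities that alternates between a shared binary code and per-modality orthogonal rotations; the shared-code update and the objective are simplifying assumptions, not the published ACQ formulation.

```python
# ITQ-style alternating quantization across two modalities (simplified sketch,
# not the exact ACQ objective).
import numpy as np

def procrustes_rotation(V, B):
    """Orthogonal R minimising ||B - V R||_F (orthogonal Procrustes)."""
    U, _, Wt = np.linalg.svd(V.T @ B)
    return U @ Wt

def co_quantize(X, Y, n_iter=30, seed=0):
    """X, Y: (n, d) zero-centred, CCA-projected features of paired data from two modalities."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Rx, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal initialisations
    Ry, _ = np.linalg.qr(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        B = np.sign(X @ Rx + Y @ Ry)       # shared binary code for the paired samples
        B[B == 0] = 1
        Rx = procrustes_rotation(X, B)     # rotate each modality toward the shared code
        Ry = procrustes_rotation(Y, B)
    return np.sign(X @ Rx), np.sign(Y @ Ry), Rx, Ry
```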
British Machine Vision Conference | 2015
Kota Yamaguchi; Takayuki Okatani; Kyoko Sudo; Kazuhiko Murasaki; Yukinobu Taniguchi
This paper studies clothing and attribute recognition in the fashion domain. Specifically, we turn our attention to the compatibility of clothing items and attributes (Fig 1). For example, people do not wear a skirt and a dress at the same time, whereas a jacket and a shirt are a preferred combination. We consider such inter-object and inter-attribute compatibility and formulate a Conditional Random Field (CRF) that seeks the most probable combination in a given picture. The model takes into account the location-specific appearance with respect to the human body and the semantic correlation between clothing items and attributes, which we learn using the max-margin framework. Fig 2 illustrates our pipeline. We evaluate our model using two datasets that resemble realistic application scenarios: on-line social networks and shopping sites. The empirical evaluation indicates that our model effectively improves recognition performance over various baselines, including a state-of-the-art feature designed exclusively for clothing recognition. The results also suggest that our model generalizes well to different fashion-related applications.
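As a toy illustration of choosing the most probable combination under pairwise compatibility, the snippet below enumerates outfits over a tiny hand-made label set; the label names, scores, and exhaustive search are assumptions, whereas the paper learns a CRF with location-specific appearance terms via the max-margin framework.

```python
# Toy pairwise-compatibility model over clothing items (illustrative values only).
import itertools

items = ["jacket", "shirt", "skirt", "dress"]
unary = {"jacket": 0.6, "shirt": 0.8, "skirt": -0.3, "dress": -0.4}   # toy detector evidence
pairwise = {frozenset({"skirt", "dress"}): -2.0,    # rarely worn together
            frozenset({"jacket", "shirt"}): +1.0}   # preferred combination

def score(combo):
    """Unary evidence plus pairwise compatibility for a set of items."""
    s = sum(unary[i] for i in combo)
    for a, b in itertools.combinations(combo, 2):
        s += pairwise.get(frozenset({a, b}), 0.0)
    return s

# Enumerate all non-empty subsets and keep the highest-scoring outfit.
candidates = (c for r in range(1, len(items) + 1)
              for c in itertools.combinations(items, r))
best = max(candidates, key=score)
print(best)   # ('jacket', 'shirt') under these toy scores
```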
Ubiquitous Computing | 2014
Kyoko Sudo; Kazuhiko Murasaki; Jun Shimamura; Yukinobu Taniguchi
Estimating the nutritional value of food from image recognition is important for health support services on mobile devices. Estimation accuracy can be improved by recognizing the regions of food objects and the ingredients contained in those regions. In this paper, we propose a method that estimates nutritional information based on segmentation and labeling of the food regions in an image by adopting a semantic segmentation method, in which we treat recipes as corresponding sets of food images and ingredient labels. Any food object or ingredient in a test image can be annotated as long as the ingredient is contained in a training image, even if the menu containing that food appears for the first time. Experimental results show that better estimation is achieved through regression analysis using ingredient labels associated with the segmented regions than when using local pixel features as the predictor variable.
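A minimal sketch of the regression step, assuming ingredient-label region areas as predictor variables and toy calorie values; the label set, features, and plain least-squares fit are illustrative, not the paper's exact model.

```python
# Regressing a nutritional value from ingredient-label region areas (toy data).
import numpy as np

labels = ["rice", "chicken", "lettuce", "egg"]   # hypothetical ingredient labels

def label_area_features(segmentation, n_labels=len(labels)):
    """Fraction of image pixels assigned to each ingredient label."""
    seg = np.asarray(segmentation)
    return np.array([(seg == k).mean() for k in range(n_labels)])

# Training: X holds per-image label-area features (e.g. from label_area_features),
# y the known calorie values for those images (toy numbers).
X = np.array([[0.30, 0.20, 0.05, 0.00],
              [0.10, 0.00, 0.30, 0.10],
              [0.25, 0.25, 0.00, 0.05]])
y = np.array([620.0, 240.0, 610.0])
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)   # fit with a bias term

new_image_features = np.array([0.20, 0.10, 0.10, 0.05])
estimate = np.r_[new_image_features, 1.0] @ w
print(f"estimated calories: {estimate:.0f}")
```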
Multimedia Tools and Applications | 2016
Yongqing Sun; Kyoko Sudo; Yukinobu Taniguchi
Because of the huge intra-class variation in visual concept detection, concept learning must collect large-scale training data that covers as wide a variety of samples as possible. This, however, presents great challenges in both how to collect and how to train on such large-scale data. In this paper, we propose a novel web image sampling approach and a novel group sparse ensemble learning approach to tackle these two problems respectively. For data collection, in order to reduce manual labeling effort, we propose a web image sampling approach based on dictionary coherence to select coherent positive samples from web images. We measure coherence in terms of how dictionary atoms are shared, because shared atoms represent features common to a given concept and are robust to occlusion and corruption. For efficient training on large-scale data, in order to exploit the hidden group structure of the data, we propose a novel group sparse ensemble learning approach based on Automatic Group Sparse Coding (AutoGSC). After AutoGSC, we present an algorithm that uses the reconstruction errors of data instances to compute the ensemble gating function for ensemble construction and fusion. Experiments show that our proposed methods achieve promising results and outperform existing approaches.
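As a hedged sketch of the gating idea, the code below weights ensemble members by the reconstruction error of an instance against each group's dictionary; the least-squares coding and softmax gating are simplifying assumptions, not the AutoGSC-based formulation.

```python
# Ensemble gating driven by per-group reconstruction error (simplified sketch).
import numpy as np

def reconstruction_error(x, D):
    """Error of reconstructing x from dictionary D (columns = atoms), via least squares."""
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ a)

def gate_weights(x, dictionaries, temperature=1.0):
    """Softmax over negative reconstruction errors: low error -> high gating weight."""
    errs = np.array([reconstruction_error(x, D) for D in dictionaries])
    logits = -errs / temperature
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

def ensemble_predict(x, dictionaries, member_predictors):
    """Fuse per-group predictions with reconstruction-error gating."""
    w = gate_weights(x, dictionaries)
    preds = np.array([f(x) for f in member_predictors])
    return float(w @ preds)
```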
International Conference on Machine Vision | 2015
Haruka Yonemoto; Kazuhiko Murasaki; Tatsuya Osawa; Kyoko Sudo; Jun Shimamura; Yukinobu Taniguchi
Many studies on action recognition from the third-person viewpoint have shown that articulated human pose can directly describe human motion and is invariant to view change. However, conventional algorithms that estimate articulated human pose cannot handle ego-centric images because they assume the whole figure appears in the image, whereas only a few parts of the body appear in ego-centric images. In this paper, we propose a novel method to estimate human pose for action recognition from ego-centric RGB-D images. Our method extracts the pose by integrating hand detection, camera pose estimation, and time-series filtering under a body-shape constraint. Experiments show that joint positions are well estimated when the detection error of hands and arms decreases. We demonstrate that the skeleton feature improves the accuracy of action recognition when the action contains unintended view changes.
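Purely as an illustration of the time-series filtering step, here is a constant-velocity (alpha-beta) smoother over noisy per-frame joint estimates; the filter form and gains are assumptions, not the paper's body-shape-constrained filter.

```python
# Alpha-beta (constant-velocity) smoothing of per-frame 3D joint positions (illustrative).
import numpy as np

def smooth_joints(joint_obs, alpha=0.5, beta=0.1):
    """joint_obs: (T, J, 3) noisy 3D joint positions; returns smoothed positions."""
    joint_obs = np.asarray(joint_obs, dtype=float)
    pos = joint_obs[0].copy()
    vel = np.zeros_like(pos)
    out = np.empty_like(joint_obs)
    out[0] = pos
    for t in range(1, len(joint_obs)):
        pred = pos + vel                      # constant-velocity prediction
        residual = joint_obs[t] - pred        # innovation from the new observation
        pos = pred + alpha * residual
        vel = vel + beta * residual
        out[t] = pos
    return out
```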
ACM Multimedia | 2013
Shuhei Tarashima; Go Irie; Ken Tsutsuguchi; Hiroyuki Arai; Yukinobu Taniguchi
Image/video collection summarization is an emerging paradigm for providing an overview of contents stored in massive databases. Existing algorithms require at least O(N) time to generate a summary, so they cannot be applied to online scenarios. Assuming that contents are represented as a sparse graph, we propose a fast image/video collection summarization algorithm using local graph clustering. After a query node is specified, our algorithm first finds a small sub-graph near the query without looking at the whole graph, and then selects a small number of nodes that are diverse to each other. Our algorithm thus provides a summary in nearly constant time with respect to the number of contents. Experimental results demonstrate that our algorithm is more than 1500 times faster than a state-of-the-art method, with comparable summarization quality.
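A minimal sketch of the two stages under stated assumptions: an approximate personalized-PageRank push that explores only a small sub-graph around the query node, followed by a greedy selection that penalizes redundancy; the parameters and the diversity rule are illustrative, not the paper's exact algorithm.

```python
# Local graph exploration via approximate personalized PageRank, plus greedy
# diverse selection (illustrative parameters and rules).
from collections import defaultdict

def local_cluster(adj, seed, alpha=0.15, eps=1e-4):
    """adj: dict node -> list of neighbours. Returns approximate PPR scores near seed."""
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    queue = [seed]
    while queue:
        u = queue.pop()
        deg = max(len(adj.get(u, [])), 1)
        if r[u] / deg < eps:
            continue
        p[u] += alpha * r[u]                      # absorb probability mass at u
        share = (1.0 - alpha) * r[u] / deg
        r[u] = 0.0
        for v in adj.get(u, []):                  # push residual mass to neighbours
            r[v] += share
            if r[v] / max(len(adj.get(v, [])), 1) >= eps:
                queue.append(v)
    return dict(p)

def diverse_summary(scores, similarity, k=5, penalty=0.5):
    """Greedily pick k high-score nodes, down-weighting nodes similar to earlier picks."""
    picked, remaining = [], dict(scores)
    while remaining and len(picked) < k:
        best = max(remaining, key=remaining.get)
        picked.append(best)
        del remaining[best]
        for v in list(remaining):
            remaining[v] -= penalty * similarity(best, v)
    return picked
```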
International Conference on Machine Vision | 2015
Jun Shimamura; Taiga Yoshida; Yukinobu Taniguchi; Hiroko Yabushita; Kyoko Sudo; Kazuhiko Murasaki
This paper proposes a novel geometric verification method that handles 3D viewpoint changes in cluttered scenes for robust object recognition. Previous voting-based verification approaches, which enable recognition in cluttered scenes, are based on 2D affine transformation, so verification accuracy is significantly degraded under viewpoint changes for the 3D objects that abound in real-world scenes. Our method, based on view-directional consistency constraints, requires that the 3D angles between the observed directions of all matched feature points on two given images be consistent with the relative pose between the two cameras, whereas conventional methods consider only the consistency of the 2D spatial layout of feature points in the image. To achieve this, we first embed observed 3D angle parameters into local features when extracting them. At the verification stage, after local feature matching, a voting-based approach identifies the clusters of matches that agree on the relative camera pose before full geometric verification. Experimental results demonstrate the superior performance of the proposed method.
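As a simplified analogue of the voting stage, the sketch below bins matched features by a 1-D relative rotation angle and keeps the most-voted bin; the 1-D parameterisation and bin width are assumptions, whereas the paper votes on the consistency of 3D view directions with the relative camera pose.

```python
# Hough-style voting over a 1-D relative-rotation parameter for feature matches
# (simplified analogue of consistency voting before full geometric verification).
import numpy as np

def vote_relative_rotation(matches, bin_deg=10):
    """matches: list of (angle_in_img1_deg, angle_in_img2_deg) for matched local features.
    Returns the matches falling in the most-voted relative-rotation bin, and that bin's angle."""
    rel = np.array([(a2 - a1) % 360.0 for a1, a2 in matches])
    bins = (rel // bin_deg).astype(int)
    counts = np.bincount(bins, minlength=int(360 // bin_deg))
    winner = counts.argmax()
    consistent = [m for m, b in zip(matches, bins) if b == winner]
    return consistent, winner * bin_deg
```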