Publication


Featured research published by Kyoko Sudo.


British Machine Vision Conference | 2015

Mix and Match: Joint Model for Clothing and Attribute Recognition

Kota Yamaguchi; Takayuki Okatani; Kyoko Sudo; Kazuhiko Murasaki; Yukinobu Taniguchi

This paper studies clothing and attribute recognition in the fashion domain. Specifically, we turn our attention to the compatibility of clothing items and attributes. For example, people do not wear a skirt and a dress at the same time, yet a jacket and a shirt are a preferred combination. We consider such inter-object and inter-attribute compatibility and formulate a Conditional Random Field (CRF) that seeks the most probable combination in a given picture. The model takes into account the location-specific appearance with respect to a human body and the semantic correlation between clothing items and attributes, which we learn using the max-margin framework. We evaluate our model using two datasets that resemble realistic application scenarios: online social networks and shopping sites. The empirical evaluation indicates that our model effectively improves recognition performance over various baselines, including a state-of-the-art feature designed exclusively for clothing recognition. The results also suggest that our model generalizes well to different fashion-related applications.
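
To make the compatibility idea concrete, here is a minimal sketch of joint inference over label combinations. The unary and pairwise scores are invented for illustration, and exhaustive search stands in for the paper's CRF inference and max-margin-learned potentials.

```python
import itertools

# Hypothetical unary scores: how well each label matches local image appearance.
unary = {"skirt": 0.8, "dress": 0.7, "jacket": 0.6, "shirt": 0.5, "none": 0.0}

# Hypothetical pairwise compatibility: negative values discourage co-occurrence.
pairwise = {
    frozenset(["skirt", "dress"]): -2.0,   # rarely worn together
    frozenset(["jacket", "shirt"]): +0.5,  # a preferred combination
}

def joint_score(labels):
    """Score of one joint labeling: unary appearance plus pairwise compatibility."""
    score = sum(unary[l] for l in labels)
    for a, b in itertools.combinations(labels, 2):
        score += pairwise.get(frozenset([a, b]), 0.0)
    return score

# Exhaustive MAP inference over two "slots" (a real CRF uses graph inference).
candidates = list(unary)
best = max(itertools.combinations(candidates, 2), key=joint_score)
print(best, joint_score(best))  # jacket + shirt wins; skirt + dress is vetoed
```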


European Conference on Computer Vision | 2016

Automatic Attribute Discovery with Neural Activations

Sirion Vittayakorn; Takayuki Umeda; Kazuhiko Murasaki; Kyoko Sudo; Takayuki Okatani; Kota Yamaguchi

How can a machine learn to recognize visual attributes emerging out of online community without a definitive supervised dataset? This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in the deep network. We characterize the visual property of the attribute word as a divergence within weakly-annotated set of images. We show that the neural activations are useful for discovering and learning a classifier that well agrees with human perception from the noisy real-world Web data. The empirical study suggests the layered structure of the deep neural networks also gives us insights into the perceptual depth of the given word. Finally, we demonstrate that we can utilize highly-activating neurons for finding semantically relevant regions.
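
As a rough illustration of the activation-divergence idea, the sketch below scores how "visual" an attribute word is via the per-unit Gaussian KL divergence between activations of tagged and untagged images. The random arrays stand in for real CNN activations, and this particular divergence is an assumption, not necessarily the paper's exact measure.

```python
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """Per-unit KL divergence KL(N1 || N2) between univariate Gaussians."""
    return 0.5 * np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2 * var2) - 0.5

rng = np.random.default_rng(0)
# Stand-ins for CNN activations of images tagged / not tagged with a word.
tagged = rng.normal(1.0, 1.0, size=(500, 64))    # e.g. images tagged "floral"
untagged = rng.normal(0.0, 1.0, size=(500, 64))

eps = 1e-6  # guard against zero variance
kl = gaussian_kl(tagged.mean(0), tagged.var(0) + eps,
                 untagged.mean(0), untagged.var(0) + eps)

visualness = kl.mean()                 # high divergence: word is visually grounded
top_units = np.argsort(kl)[::-1][:5]   # most attribute-specific neurons
print(visualness, top_units)
```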


Ubiquitous Computing | 2014

Estimating nutritional value from food images based on semantic segmentation

Kyoko Sudo; Kazuhiko Murasaki; Jun Shimamura; Yukinobu Taniguchi

Estimating the nutritional value of food based on image recognition is important for health-support services on mobile devices. Estimation accuracy can be improved by recognizing the regions of food objects and the ingredients contained in those regions. In this paper, we propose a method that estimates nutritional information based on segmentation and labeling of the food regions of an image using a semantic segmentation method, in which we treat recipes as corresponding sets of food images and ingredient labels. Any food object or ingredient in a test image can be annotated as long as the ingredient appears in some training image, even if the dish itself is seen for the first time. Experimental results show that better estimation is achieved through regression analysis using the ingredient labels associated with the segmented regions than when using local pixel features as the predictor variable.
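
The regression stage can be sketched as follows: each image is summarized by the area fraction of each ingredient label in its segmentation map, and a linear regressor maps those fractions to a nutritional value. The ingredient set, segmentation outputs, and calorie targets below are all synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

ingredients = ["rice", "chicken", "lettuce", "egg"]  # hypothetical label set

def area_fractions(label_map, n_labels):
    """Fraction of image pixels assigned to each ingredient by the segmenter."""
    counts = np.bincount(label_map.ravel(), minlength=n_labels)
    return counts / counts.sum()

rng = np.random.default_rng(0)
# Stand-ins for semantic segmentation outputs (H x W ingredient-label maps).
label_maps = [rng.integers(0, len(ingredients), size=(32, 32)) for _ in range(100)]
X = np.stack([area_fractions(m, len(ingredients)) for m in label_maps])

# Fake calorie ground truth for the sketch: denser ingredients contribute more.
kcal_per_unit = np.array([200.0, 300.0, 20.0, 150.0])
y = X @ kcal_per_unit + rng.normal(0, 5, size=len(X))

model = LinearRegression().fit(X, y)
print(model.predict(X[:3]), y[:3])  # predicted vs. "true" calories
```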


Machine Vision Applications | 2008

Estimating Anomality of the Video Sequences for Surveillance Using 1-Class SVM

Kyoko Sudo; Tatsuya Osawa; Kaoru Wakabayashi; Hideki Koike; Kenichi Arakawa

We have proposed a method to detect and quantitatively extract anomalies from surveillance videos. Using our method, anomalies are detected as patterns of spatio-temporal features that are outliers in a new feature space. Conventional anomaly detection methods use features such as tracks or local spatio-temporal features, both of which provide insufficient timing information. In our method, the principal components of spatio-temporal change features are extracted from video sequences several seconds in duration. This enables anomalies based on movement irregularity, in both position and speed, to be determined, and thus permits the automatic detection of anomalous events in sequences of constant length without regard to their start and end. We used a 1-class SVM, an unsupervised outlier detection method; its output indicates the distance between an outlier and the concentrated base pattern. We demonstrated that the anomalies extracted by our method subjectively matched perceived irregularities in movement patterns. Our method is useful in surveillance services because captured images can be shown in order of anomality, which significantly reduces review time.
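
A minimal sketch of the scoring stage, using scikit-learn's OneClassSVM: fit on features of normal clips, then rank test clips by the negated decision function so the most anomalous appear first. The random feature vectors stand in for the paper's spatio-temporal principal components.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-ins for principal components of spatio-temporal change features,
# one vector per fixed-length clip (the paper uses several-second windows).
normal_clips = rng.normal(0, 1, size=(300, 16))
test_clips = np.vstack([rng.normal(0, 1, size=(20, 16)),
                        rng.normal(4, 1, size=(3, 16))])  # injected anomalies

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_clips)

# Larger values mean farther from the concentrated "normal" base pattern.
anomality = -ocsvm.decision_function(test_clips)
ranked = np.argsort(anomality)[::-1]   # review clips in order of anomality
print(ranked[:5], anomality[ranked[:5]])
```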


Conference on Multimedia Modeling | 2016

Attribute Discovery for Person Re-Identification

Takayuki Umeda; Yongqing Sun; Go Irie; Kyoko Sudo; Tetsuya Kinebuchi

An incremental attribute discovery method for person re-identification is proposed in this paper. Recent studies have shown the effectiveness of the attribute-based approach. Unfortunately, the approach has difficulty in discriminating people who are similar in terms of the pre-defined semantic attributes. To solve this problem, we automatically discover and learn new attributes that permit successful discrimination through a pair-wise learning process. We evaluate our method on two benchmark datasets and demonstrate that it significantly improves the performance of the person re-identification task.


Multimedia Tools and Applications | 2016

Visual concept detection of web images based on group sparse ensemble learning

Yongqing Sun; Kyoko Sudo; Yukinobu Taniguchi

Because of the huge intra-class variation in visual concept detection, concept learning must collect large-scale training data that covers as wide a variety of samples as possible. This raises two great challenges: how to collect such data, and how to train on it. In this paper, we propose a novel web image sampling approach and a novel group sparse ensemble learning approach to tackle these two problems respectively. For data collection, to alleviate manual labeling effort, we propose a web image sampling approach based on dictionary coherence that selects coherent positive samples from web images. We propose to measure coherence in terms of how dictionary atoms are shared, because shared atoms represent common features with regard to a given concept and are robust to occlusion and corruption. For efficient training on large-scale data, to exploit the hidden group structure of the data, we propose a novel group sparse ensemble learning approach based on Automatic Group Sparse Coding (AutoGSC). After AutoGSC, we present an algorithm that uses the reconstruction errors of data instances to compute the ensemble gating function for ensemble construction and fusion. Experiments show that our proposed methods achieve promising results and outperform existing approaches.
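
The gating step can be sketched as a softmax over negative reconstruction errors: groups whose dictionary reconstructs a test instance well receive larger ensemble weights. The dictionaries below are random stand-ins (AutoGSC itself is not reimplemented), and least-squares coding stands in for sparse coding.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-group dictionaries learned by AutoGSC (3 groups).
dictionaries = [rng.normal(size=(32, 10)) for _ in range(3)]

def reconstruction_error(x, D):
    """Error of coding x with dictionary D (least squares stands in for sparse coding)."""
    code, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ code)

def gating_weights(x, dictionaries, beta=1.0):
    """Softmax over negative errors: better-fitting groups get larger weights."""
    errs = np.array([reconstruction_error(x, D) for D in dictionaries])
    w = np.exp(-beta * errs)
    return w / w.sum()

x = rng.normal(size=32)
weights = gating_weights(x, dictionaries)
# Fused prediction: weighted sum of per-group classifier scores (faked here).
group_scores = np.array([0.2, 0.7, 0.4])
print(weights, float(weights @ group_scores))
```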


International Conference on Machine Vision | 2015

Egocentric articulated pose tracking for action recognition

Haruka Yonemoto; Kazuhiko Murasaki; Tatsuya Osawa; Kyoko Sudo; Jun Shimamura; Yukinobu Taniguchi

Many studies on action recognition from the third-person viewpoint have shown that articulated human pose directly describes human motion and is invariant to view changes. However, conventional algorithms for estimating articulated human pose cannot handle egocentric images because they assume the whole figure appears in the image, whereas only a few parts of the body appear in egocentric images. In this paper, we propose a novel method to estimate human pose for action recognition from egocentric RGB-D images. Our method extracts the pose by integrating hand detection, camera pose estimation, and time-series filtering under a body-shape constraint. Experiments show that joint positions are estimated well when the detection error of hands and arms decreases. We demonstrate that the skeleton feature improves action recognition accuracy when the action contains unintended view changes.
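
The time-series filtering component can be illustrated with a constant-velocity Kalman filter smoothing one coordinate of a noisy hand-joint track; the paper's filter additionally integrates hand detection, camera pose, and body-shape constraints, which are omitted in this sketch.

```python
import numpy as np

def kalman_smooth_1d(observations, q=1e-3, r=1e-2):
    """Constant-velocity Kalman filter over one joint coordinate."""
    x = np.array([observations[0], 0.0])        # state: [position, velocity]
    P = np.eye(2)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])      # constant-velocity transition
    H = np.array([[1.0, 0.0]])                  # we observe position only
    Q, R = q * np.eye(2), np.array([[r]])
    out = []
    for z in observations:
        x, P = F @ x, F @ P @ F.T + Q           # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)     # correct with the measurement
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)

rng = np.random.default_rng(0)
true_track = np.linspace(0, 1, 50)              # hand moving across the frame
noisy = true_track + rng.normal(0, 0.05, 50)    # detector jitter
print(np.abs(kalman_smooth_1d(noisy) - true_track).mean(),
      np.abs(noisy - true_track).mean())        # filtered error vs. raw error
```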


International Conference on Pattern Recognition | 2008

Monocular 3D tracking of multiple interacting targets

Tatsuya Osawa; Kyoko Sudo; Hiroyuki Arai; Hideki Koike

In this paper, we present a new approach based on Markov Chain Monte Carlo (MCMC) for the stable monocular tracking of a variable number of interacting targets in 3D space. The crucial problem in monocular tracking of multiple targets is that mutual occlusions in the 2D image cause target conflicts (identity switches, merged targets, and so on). We focus on the fact that multiple targets cannot occupy the same position in 3D space and propose to track multiple interacting targets using their relative positions in 3D. Experiments show that our system can stably track multiple humans interacting with each other.
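
The non-overlap idea can be sketched as a Metropolis-Hastings sampler whose log-posterior combines a stand-in likelihood with a penalty whenever two targets come closer than a minimum 3D distance. The observation model here is a toy; the paper's likelihood is computed on the 2D image.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(positions, observations, min_dist=0.5):
    """Stand-in observation likelihood plus a penalty when targets overlap in 3D."""
    ll = -np.sum((positions - observations) ** 2)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = np.linalg.norm(positions[i] - positions[j])
            if d < min_dist:                    # two people can't share a spot
                ll -= 100.0 * (min_dist - d)
    return ll

observations = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0]])  # near-occlusion
state = observations.copy()
for _ in range(2000):                           # Metropolis-Hastings sampling
    proposal = state + rng.normal(0, 0.05, state.shape)
    accept = (log_posterior(proposal, observations)
              - log_posterior(state, observations))
    if np.log(rng.uniform()) < accept:
        state = proposal
print(state)  # samples settle on positions that respect the 3D separation
```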


International Conference on Machine Vision | 2015

The method based on view-directional consistency constraints for robust 3D object recognition

Jun Shimamura; Taiga Yoshida; Yukinobu Taniguchi; Hiroko Yabushita; Kyoko Sudo; Kazuhiko Murasaki

This paper proposes a novel geometric verification method that handles 3D viewpoint changes in cluttered scenes for robust object recognition. Since previous voting-based verification approaches, which enable recognition in cluttered scenes, are based on 2D affine transformations, verification accuracy degrades significantly under viewpoint changes for the 3D objects that abound in real-world scenes. Whereas conventional methods consider the consistency of the 2D spatial layout of feature points in the image, our view-directional consistency constraints require that the 3D angles between the observed directions of all matched feature points on two images be consistent with the relative pose of the two cameras. To achieve this, we embed the observed 3D angle parameters into local features at extraction time. At the verification stage, after local feature matching, a voting-based approach identifies the clusters of matches that agree on the relative camera pose before full geometric verification. Experimental results demonstrate the superior performance of the proposed method.
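
The voting stage can be illustrated as follows: each feature match contributes an angle between observed 3D view directions, the angles are histogrammed, and only the dominant, pose-consistent cluster is passed to full verification. The angle values below are synthetic, with injected inliers and clutter.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-match angle differences (degrees) between observed 3D
# view directions of matched features on the two images.
true_rotation = 30.0
inliers = true_rotation + rng.normal(0, 2.0, size=80)
outliers = rng.uniform(0, 180, size=40)           # clutter / wrong matches
angle_diffs = np.concatenate([inliers, outliers])

# Voting: histogram the angles and keep only the dominant bin's matches,
# then run full geometric verification on that consistent cluster alone.
hist, edges = np.histogram(angle_diffs, bins=36, range=(0, 180))
best = np.argmax(hist)
keep = (angle_diffs >= edges[best]) & (angle_diffs < edges[best + 1])
print(f"estimated rotation ~ {angle_diffs[keep].mean():.1f} deg, "
      f"{keep.sum()}/{len(angle_diffs)} matches kept")
```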


Conference on Multimedia Modeling | 2015

Cross-Domain Concept Detection with Dictionary Coherence by Leveraging Web Images

Yongqing Sun; Kyoko Sudo; Yukinobu Taniguchi

We propose a novel scheme for video concept learning that leverages social media, combining the selection of web training data and transfer subspace learning within a unified framework. Because of the cross-domain incoherence that results from mismatched data distributions, selecting sufficient positive training samples from scattered and diffuse social media resources is a challenging problem when training effective concept detectors. In this paper, given a concept, coherent positive samples are selected from web images for further concept learning based on their degree of image coherence. Then, exploiting both the selected dataset and video keyframes, we train a robust concept classifier by means of a transfer subspace learning method. Experimental results demonstrate that the proposed approach achieves consistent overall improvement despite cross-domain incoherence.
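
One way to picture coherence-based selection: sparse-code candidate web images against a concept dictionary (scikit-learn's SparseCoder with OMP) and score each image by how much of its code mass falls on a set of shared atoms. The dictionary, the shared-atom set, and the features below are synthetic assumptions, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_atoms, n_features = 20, 64
# Stand-in concept dictionary (rows are atoms, unit-normalized).
D = rng.normal(size=(n_atoms, n_features))
D /= np.linalg.norm(D, axis=1, keepdims=True)

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)

# Web image features: some built from a few shared atoms, some random clutter.
shared_atoms = [0, 1, 2]   # assumed known here; discovered in the real method
coherent = np.stack([sum(rng.uniform(0.5, 1.5) * D[a] for a in shared_atoms)
                     for _ in range(10)])
clutter = rng.normal(size=(10, n_features))
codes = coder.transform(np.vstack([coherent, clutter]))

# Coherence score: fraction of each image's code mass on the shared atoms.
mass = np.abs(codes)
score = mass[:, shared_atoms].sum(1) / (mass.sum(1) + 1e-9)
selected = np.argsort(score)[::-1][:10]   # keep the most coherent as positives
print(selected)
```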

Collaboration


Dive into Kyoko Sudo's collaborations.

Top Co-Authors

Yukinobu Taniguchi (Tokyo University of Science)
Hiroko Takahashi (Mitsubishi Chemical Corporation)
Yongqing Sun (Nippon Telegraph and Telephone)
Junji Yamato (Nippon Telegraph and Telephone)
Go Irie (Nippon Telegraph and Telephone)