Publication


Featured research published by Kota Yamaguchi.


Computer Vision and Pattern Recognition | 2012

Parsing clothing in fashion photographs

Kota Yamaguchi; M. Hadi Kiapour; Luis E. Ortiz; Tamara L. Berg

In this paper we demonstrate an effective method for parsing clothing in fashion photographs, an extremely challenging problem due to the large number of possible garment items, variations in configuration, garment appearance, layering, and occlusion. In addition, we provide a large novel dataset and tools for labeling garment items, to enable future research on clothing estimation. Finally, we present intriguing initial results on using clothing estimates to improve pose identification, and demonstrate a prototype application for pose-independent visual garment retrieval.


Computer Vision and Pattern Recognition | 2011

Who are you with and where are you going?

Kota Yamaguchi; Alexander C. Berg; Luis E. Ortiz; Tamara L. Berg

We propose an agent-based behavioral model of pedestrians to improve tracking performance in realistic scenarios. In this model, we view pedestrians as decision-making agents who consider a plethora of personal, social, and environmental factors to decide where to go next. We formulate prediction of pedestrian behavior as an energy minimization on this model. Two of our main contributions are simple, yet effective, estimates of pedestrian destination and social relationships (groups). Our final contribution is to incorporate these hidden properties into an energy formulation that results in accurate behavioral prediction. We evaluate both our estimates of destination and grouping, as well as our accuracy at prediction and tracking, against a state-of-the-art behavioral model and show improvements, especially in the challenging situation of infrequent appearance observations, something that might occur in the thousands of webcams available on the Internet.
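
The energy-minimization formulation lends itself to a compact illustration. Below is a minimal sketch, not the authors' published model, of choosing a pedestrian's next velocity by grid search over a weighted sum of behavioral terms; every term definition and weight here is an assumption chosen to mirror the factors the abstract names (smooth motion, destination, group attraction, collision avoidance).

```python
import numpy as np

def energy(v, pos, v_prev, dest, group_pos, others_pos, w):
    """Toy energy for one pedestrian's candidate 2-D velocity v.

    All terms and weights are illustrative stand-ins for the paper's
    behavioral factors, not the published formulation.
    """
    e_damp = np.sum((v - v_prev) ** 2)                            # prefer smooth velocity changes
    e_speed = (np.linalg.norm(v) - np.linalg.norm(v_prev)) ** 2   # keep a comfortable speed
    to_dest = dest - pos
    to_dest = to_dest / (np.linalg.norm(to_dest) + 1e-8)
    v_dir = v / (np.linalg.norm(v) + 1e-8)
    e_dir = 1.0 - np.dot(v_dir, to_dest)                          # head toward the destination
    e_group = np.sum((pos + v - group_pos.mean(axis=0)) ** 2)     # stay near the group centroid
    dists = np.linalg.norm(others_pos - (pos + v), axis=1)
    e_coll = np.sum(np.exp(-dists))                               # penalize closeness to others
    return (w[0] * e_damp + w[1] * e_speed + w[2] * e_dir
            + w[3] * e_group + w[4] * e_coll)

def predict_next_velocity(pos, v_prev, dest, group_pos, others_pos,
                          w=(0.5, 0.5, 2.0, 0.1, 1.0), n_grid=21, v_max=2.0):
    """Grid-search minimization over candidate velocities."""
    best_v, best_e = v_prev, np.inf
    for vx in np.linspace(-v_max, v_max, n_grid):
        for vy in np.linspace(-v_max, v_max, n_grid):
            v = np.array([vx, vy])
            e = energy(v, pos, v_prev, dest, group_pos, others_pos, w)
            if e < best_e:
                best_v, best_e = v, e
    return best_v

# Example: one pedestrian heading right, with one group member and one stranger ahead.
pos = np.array([0.0, 0.0])
v_prev = np.array([1.0, 0.0])
dest = np.array([10.0, 0.0])
group_pos = np.array([[0.5, 0.5]])
others_pos = np.array([[2.0, 0.0]])
print(predict_next_velocity(pos, v_prev, dest, group_pos, others_pos))
```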


International Conference on Computer Vision | 2013

Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items

Kota Yamaguchi; M. Hadi Kiapour; Tamara L. Berg

Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms the state of the art in parsing accuracy.
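
The three parsing sources can be pictured as per-pixel score maps fused before a final labeling. The sketch below is an illustrative assumption about the overall shape of such a combination, not the published pipeline; the weighted-average fusion and the label set are invented for the example.

```python
import numpy as np

def combine_parses(global_scores, local_scores, transfer_scores,
                   weights=(1.0, 1.0, 1.0)):
    """Fuse per-pixel garment-label scores from three sources.

    Each input has shape (H, W, num_labels); the fusion rule and weights
    are illustrative, not the published method. Returns an (H, W) map
    of predicted garment labels.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = (w[0] * global_scores
             + w[1] * local_scores
             + w[2] * transfer_scores)
    return fused.argmax(axis=-1)

# Example with random scores over 4 hypothetical labels (bg, dress, jacket, shoes).
rng = np.random.default_rng(0)
H, W, L = 8, 8, 4
labels = combine_parses(rng.random((H, W, L)),
                        rng.random((H, W, L)),
                        rng.random((H, W, L)))
print(labels.shape, labels.dtype)
```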


Computer Vision and Pattern Recognition | 2012

Understanding and predicting importance in images

Alexander C. Berg; Tamara L. Berg; Hal Daumé; Jesse Dodge; Amit Goyal; Xufeng Han; Alyssa Mensch; Margaret Mitchell; Aneesh Sood; Karl Stratos; Kota Yamaguchi

What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. The proposed factors fall into three broad types: 1) factors related to composition, e.g., size and location; 2) factors related to semantics, e.g., the category of object or scene; and 3) contextual factors related to the likelihood of attribute-object or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content or image content estimated automatically by recognition systems.
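
Read one way, the three factor types become feature groups for a per-object classifier of "will this be described?". The sketch below is a hypothetical illustration: the specific feature names, weights, and the logistic model are assumptions for exposition, not the paper's models.

```python
import numpy as np

def importance_features(obj):
    """Stack the three factor types into one vector.

    The concrete features are illustrative assumptions: size and
    centrality (composition), a category prior (semantics), and an
    object-scene co-occurrence likelihood (context).
    """
    return np.array([
        obj["relative_size"],            # composition: fraction of image area
        obj["centrality"],               # composition: 1 at center, 0 at corner
        obj["category_prior"],           # semantics: how often the category is mentioned
        obj["object_scene_likelihood"],  # context: P(object | scene)
    ])

def describe_probability(obj, w, b):
    """Logistic model of whether the object gets mentioned in a description."""
    z = w @ importance_features(obj) + b
    return 1.0 / (1.0 + np.exp(-z))

# Example with made-up weights: a large, central, commonly mentioned object.
w = np.array([2.0, 1.5, 1.0, 0.5])
obj = {"relative_size": 0.3, "centrality": 0.9,
       "category_prior": 0.7, "object_scene_likelihood": 0.8}
print(f"P(described) = {describe_probability(obj, w, b=-2.0):.2f}")
```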


European Conference on Computer Vision | 2014

Hipster wars: Discovering elements of fashion styles

M. Hadi Kiapour; Kota Yamaguchi; Alexander C. Berg; Tamara L. Berg

The clothing we wear and our identities are closely tied, revealing to the world clues about our wealth, occupation, and social identity. In this paper we examine what our clothing reveals about our personal style. We first design an online competitive Style Rating Game called Hipster Wars to crowdsource reliable human judgments of style. We use this game to collect a new dataset of clothing outfits with associated style ratings for five style categories: hipster, bohemian, pinup, preppy, and goth. Next, we train models for between-class and within-class classification of styles. Finally, we explore methods to identify clothing elements that are generally discriminative for a style, and methods for identifying items in a particular outfit that may indicate a style.
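
A competitive rating game needs a way to turn pairwise "which outfit is more hipster?" votes into per-image style scores. The Elo-style update below is one simple illustrative scheme for that step; it is an assumption for exposition, not necessarily the rating model Hipster Wars used.

```python
def elo_update(r_winner, r_loser, k=32.0):
    """One Elo-style rating update from a single pairwise style vote.

    The game pits two outfits against each other; an Elo-style scheme
    (an illustrative choice, not necessarily the paper's rating model)
    turns many such votes into per-image style scores.
    """
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Example: outfit A (rated 1500) beats outfit B (rated 1600) in one vote.
ra, rb = elo_update(1500.0, 1600.0)
print(f"A: {ra:.1f}, B: {rb:.1f}")  # an upset win moves ratings more
```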


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015

Retrieving Similar Styles to Parse Clothing

Kota Yamaguchi; M. Hadi Kiapour; Luis E. Ortiz; Tamara L. Berg

Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, and body shape and pose. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse-masks (Paper Doll item transfer) from retrieved examples. We evaluate our approach extensively and show significant improvements over previous state-of-the-art for both localization (clothing parsing given weak supervision in the form of tags) and detection (general clothing parsing). Our experimental results also indicate that the general pose estimation problem can benefit from clothing parsing.


ACM Multimedia | 2014

Chic or Social: Visual Popularity Analysis in Online Fashion Networks

Kota Yamaguchi; Tamara L. Berg; Luis E. Ortiz

From Flickr to Facebook to Pinterest, pictures are increasingly becoming a core content type in social networks. But how important is this visual content, and how does it influence behavior in the network? In this paper we study the effects of visual, textual, and social factors on popularity in a large real-world network focused on fashion. We make use of state-of-the-art computer vision techniques for clothing representation, as well as network and text information, to predict post popularity in both in-network and out-of-network scenarios. Our experiments find significant statistical evidence that social factors dominate the in-network scenario, but that combinations of content and social factors can be helpful for predicting popularity outside of the network. This in-depth study of image popularity in social networks suggests that social factors should be carefully considered for research involving social network photos.
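
One way to combine the three factor families is to stack them into a single design matrix and fit a linear predictor of (log-scaled) popularity. The ridge-regression sketch below is an illustrative stand-in for the paper's predictors; the feature dimensions, regularizer, and data are all invented for the example.

```python
import numpy as np

def fit_popularity_model(X_visual, X_text, X_social, y, alpha=1.0):
    """Ridge regression over concatenated feature groups.

    An illustrative stand-in for the paper's popularity predictors:
    visual (clothing representation), textual, and social features are
    stacked and fit to log-scaled popularity counts.
    """
    X = np.hstack([X_visual, X_text, X_social])
    d = X.shape[1]
    # Closed-form ridge solution: (X^T X + alpha I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)
    return w

# Example with random data: 100 posts, 5 visual / 3 text / 2 social features.
rng = np.random.default_rng(1)
Xv, Xt, Xs = rng.random((100, 5)), rng.random((100, 3)), rng.random((100, 2))
y = np.log1p(rng.integers(0, 500, size=100))  # log-scaled count-like targets
w = fit_popularity_model(Xv, Xt, Xs, y)
print(w.shape)  # one weight per feature; group magnitudes hint at factor importance
```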


British Machine Vision Conference | 2015

Mix and Match: Joint Model for Clothing and Attribute Recognition

Kota Yamaguchi; Takayuki Okatani; Kyoko Sudo; Kazuhiko Murasaki; Yukinobu Taniguchi

This paper studies clothing and attribute recognition in the fashion domain. Specifically, we turn our attention to the compatibility of clothing items and attributes. For example, people do not wear a skirt and a dress at the same time, yet a jacket and a shirt are a preferred combination. We consider such inter-object and inter-attribute compatibility and formulate a Conditional Random Field (CRF) that seeks the most probable combination in a given picture. The model takes into account the location-specific appearance with respect to a human body and the semantic correlation between clothing items and attributes, which we learn using the max-margin framework. We evaluate our model using two datasets that resemble realistic application scenarios: online social networks and shopping sites. The empirical evaluation indicates that our model effectively improves recognition performance over various baselines, including the state-of-the-art feature designed exclusively for clothing recognition. The results also suggest that our model generalizes well to different fashion-related applications.
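
The compatibility idea can be made concrete with a tiny MAP-inference example: unary appearance scores for each candidate item plus pairwise compatibility scores, maximized over binary "item present" vectors. The brute-force sketch below is illustrative only; the scores are made up, and exact enumeration is feasible only for a handful of items.

```python
import itertools
import numpy as np

def best_combination(unary, compat):
    """Brute-force MAP inference over a binary 'item present' vector.

    unary[i]     -- appearance score for item i in the image
    compat[i, j] -- compatibility score (e.g. skirt+dress strongly
                    negative, jacket+shirt positive)

    A toy stand-in for the paper's CRF inference, not its actual solver.
    """
    n = len(unary)
    best, best_score = None, -np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        score = unary @ x + 0.5 * x @ compat @ x  # unary + pairwise terms
        if score > best_score:
            best, best_score = x, score
    return best, best_score

# Example: items [skirt, dress, jacket, shirt] with made-up scores.
unary = np.array([1.0, 1.2, 0.8, 0.9])
compat = np.zeros((4, 4))
compat[0, 1] = compat[1, 0] = -5.0   # skirt and dress rarely co-occur
compat[2, 3] = compat[3, 2] = +1.0   # jacket and shirt go together
x, s = best_combination(unary, compat)
print(x, round(s, 2))  # expect the dress without the skirt, plus jacket+shirt
```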


European Conference on Computer Vision | 2016

Automatic Attribute Discovery with Neural Activations

Sirion Vittayakorn; Takayuki Umeda; Kazuhiko Murasaki; Kyoko Sudo; Takayuki Okatani; Kota Yamaguchi

How can a machine learn to recognize visual attributes emerging from an online community without a definitive supervised dataset? This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in a deep network. We characterize the visual property of an attribute word as a divergence within a weakly annotated set of images. We show that neural activations are useful for discovering and learning a classifier that agrees well with human perception from noisy real-world Web data. The empirical study suggests that the layered structure of deep neural networks also gives insight into the perceptual depth of a given word. Finally, we demonstrate that highly activating neurons can be used to find semantically relevant regions.
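
The "divergence within a weakly annotated set of images" can be illustrated per neuron: compare a unit's activation distribution on images tagged with the word against images without it. The symmetric-KL sketch below is an assumption about the general shape of such a score, with invented binning, smoothing, and data.

```python
import numpy as np

def activation_divergence(acts_with_word, acts_without_word, bins=20):
    """Symmetric KL divergence between a unit's activation histograms.

    A minimal sketch of scoring whether a text label is 'visual': if
    images tagged with the word activate the unit differently from
    untagged images, the divergence is high. The binning and smoothing
    choices here are illustrative.
    """
    lo = min(acts_with_word.min(), acts_without_word.min())
    hi = max(acts_with_word.max(), acts_without_word.max())
    p, _ = np.histogram(acts_with_word, bins=bins, range=(lo, hi))
    q, _ = np.histogram(acts_without_word, bins=bins, range=(lo, hi))
    p = (p + 1e-6) / (p + 1e-6).sum()   # smooth and normalize
    q = (q + 1e-6) / (q + 1e-6).sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

# Example: a unit that fires higher on images tagged "floral" (synthetic data).
rng = np.random.default_rng(2)
with_word = rng.normal(2.0, 1.0, size=500)     # activations on tagged images
without_word = rng.normal(0.0, 1.0, size=500)  # activations on untagged images
print(f"divergence = {activation_divergence(with_word, without_word):.2f}")
```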


International Journal of Computer Vision | 2016

Large Scale Retrieval and Generation of Image Descriptions

Vicente Ordonez; Xufeng Han; Polina Kuznetsova; Girish Kulkarni; Margaret Mitchell; Kota Yamaguchi; Karl Stratos; Amit Goyal; Jesse Dodge; Alyssa Mensch; Hal Daumé; Alexander C. Berg; Yejin Choi; Tamara L. Berg

What is the story of an image? What is the relationship between pictures, language, and information we can extract using state of the art computational recognition systems? In an attempt to address both of these questions, we explore methods for retrieving and generating natural language descriptions for images. Ideally, we would like our generated textual descriptions (captions) to both sound like a person wrote them, and also remain true to the image content. To do this we develop data-driven approaches for image description generation, using retrieval-based techniques to gather either: (a) whole captions associated with a visually similar image, or (b) relevant bits of text (phrases) from a large collection of image + description pairs. In the case of (b), we develop optimization algorithms to merge the retrieved phrases into valid natural language sentences. The end result is two simple, but effective, methods for harnessing the power of big data to produce image captions that are altogether more general, relevant, and human-like than previous attempts.
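
Strategy (a), whole-caption transfer, reduces to nearest-neighbor retrieval in an image feature space. The sketch below assumes precomputed global feature vectors (the toy 3-D vectors and captions are invented) and simply reuses the caption of the closest database image.

```python
import numpy as np

def transfer_caption(query_feat, db_feats, db_captions):
    """Whole-caption transfer by nearest-neighbor image retrieval.

    A minimal sketch of strategy (a) from the abstract: find the
    visually closest database image and reuse its caption. The feature
    representation is assumed to be precomputed elsewhere.
    """
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    return db_captions[int(dists.argmin())]

# Example with toy 3-D "features" standing in for real image descriptors.
db_feats = np.array([[0.9, 0.1, 0.0],   # beach scene
                     [0.0, 0.8, 0.2],   # city street
                     [0.1, 0.1, 0.9]])  # forest trail
db_captions = ["A sunny beach with waves.",
               "A busy street in the city.",
               "A trail winding through the woods."]
print(transfer_caption(np.array([0.85, 0.2, 0.05]), db_feats, db_captions))
```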

Collaboration


Dive into Kota Yamaguchi's collaborations.

Top Co-Authors

Tamara L. Berg
University of North Carolina at Chapel Hill

Alexander C. Berg
University of North Carolina at Chapel Hill

M. Hadi Kiapour
University of North Carolina at Chapel Hill

Alyssa Mensch
Massachusetts Institute of Technology

Jesse Dodge
Carnegie Mellon University