Publication


Featured research published by Yu-Wei Chao.


Computer Vision and Pattern Recognition | 2013

Understanding Indoor Scenes Using 3D Geometric Phrases

Wongun Choi; Yu-Wei Chao; Caroline Pantofaru; Silvio Savarese

Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model, which captures the semantic and geometric relationships between objects that frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
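
To make the hierarchy concrete, here is a minimal sketch of such a parse structure in Python. The class and field names are hypothetical illustrations of the idea, not the paper's implementation.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectDetection:
    """A single detected object with an estimated 3D placement (illustrative fields)."""
    category: str                          # e.g. "sofa", "table"
    location: Tuple[float, float, float]   # 3D position in the room frame
    score: float                           # detector confidence

@dataclass
class GeometricPhrase:
    """A group of objects that frequently co-occur in a common 3D configuration,
    e.g. a sofa facing a coffee table."""
    members: List[ObjectDetection]
    orientation: float                     # shared orientation of the group

@dataclass
class SceneParse:
    """Root of the hierarchy: scene label at the top, phrases in the middle,
    individual object detections at the leaves."""
    scene_type: str                        # e.g. "living room"
    phrases: List[GeometricPhrase] = field(default_factory=list)
    isolated_objects: List[ObjectDetection] = field(default_factory=list)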


Pattern Recognition | 2013

Locality-sensitive dictionary learning for sparse representation based classification

Chia-Po Wei; Yu-Wei Chao; Yi-Ren Yeh; Yu-Chiang Frank Wang

Motivated by image reconstruction, sparse representation based classification (SRC) has been shown to be an effective method for applications like face recognition. In this paper, we propose a locality-sensitive dictionary learning algorithm for SRC, in which the designed dictionary is able to preserve local data structure, resulting in improved image classification. During the dictionary update and sparse coding stages of the proposed algorithm, we provide closed-form solutions and enforce the data locality constraint throughout the learning process. In contrast to previous dictionary learning approaches based on sparse representation, which did not (or only partially) take data locality into consideration, our algorithm produces a more representative dictionary and thus achieves better performance. We conduct experiments on databases designed for face and handwritten digit recognition. For such reconstruction-based classification problems, we confirm that our method achieves performance better than or comparable to that of state-of-the-art SRC methods, while requiring less training time for dictionary learning.
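
For context, the sketch below shows the basic SRC decision rule the paper builds on: code a test sample over a dictionary of training samples, then assign the class whose atoms best reconstruct it. This is generic SRC with an off-the-shelf l1 solver, not the locality-sensitive dictionary learning proposed here.

import numpy as np
from sklearn.linear_model import Lasso

def src_classify(x, D, labels, alpha=0.01):
    """Sparse-representation-based classification (generic SRC).

    x      : (d,)   test sample
    D      : (d, n) dictionary whose columns are training samples
    labels : (n,)   class label of each dictionary atom
    """
    # Sparse coding: minimize ||x - D a||^2 + alpha * ||a||_1
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    coder.fit(D, x)
    a = coder.coef_

    # Class-wise reconstruction residuals: keep only each class's coefficients
    residuals = {}
    for c in np.unique(labels):
        a_c = np.where(labels == c, a, 0.0)
        residuals[c] = np.linalg.norm(x - D @ a_c)

    # Assign the class with the smallest residual
    return min(residuals, key=residuals.get)

A dictionary learning method like the one in this paper would replace the raw training columns D with a learned, locality-preserving dictionary before applying this rule.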


Computer Vision and Pattern Recognition | 2012

Semantic structure from motion with points, regions, and objects

Sid Yingze Bao; Mohit Bagra; Yu-Wei Chao; Silvio Savarese

Structure from motion (SFM) aims at jointly recovering the structure of a scene as a collection of 3D points and estimating the camera poses from a number of input images. In this paper we generalize this concept: we want not only to recover 3D points, but also to recognize and estimate the location of high-level semantic scene components such as regions and objects in 3D. As a key ingredient for this joint inference problem, we seek to model various types of interactions between scene components. Such interactions help regularize our solution and obtain more accurate results than solving these problems in isolation. Experiments on public datasets demonstrate that: 1) our framework estimates camera poses more robustly than SFM algorithms that use points only; 2) our framework is capable of accurately estimating the pose and location of objects, regions, and points in the 3D scene; 3) our framework recognizes objects and regions more accurately than state-of-the-art single-image recognition methods.


International Conference on Image Processing | 2011

Locality-constrained group sparse representation for robust face recognition

Yu-Wei Chao; Yi-Ren Yeh; Yu-Wen Chen; Yuh-Jye Lee; Yu-Chiang Frank Wang

This paper presents a novel sparse representation for robust face recognition. We advance both group sparsity and data locality and formulate a unified optimization framework, which produces a locality and group sensitive sparse representation (LGSR) for improved recognition. Empirical results confirm that LGSR not only outperforms state-of-the-art sparse-coding-based image classification methods but is also robust to variations such as lighting, pose, and facial details (e.g., with or without glasses) that are typically seen in real-world face recognition problems.
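
Schematically, a locality- and group-sensitive coding objective of this kind can be written as a weighted group lasso, where each group's penalty grows with its distance from the test sample; the exact formulation in the paper may differ. In LaTeX:

\min_{\alpha} \; \| \mathbf{x} - D\alpha \|_2^2 \; + \; \lambda \sum_{g=1}^{G} w_g \, \| \alpha_g \|_2 , \qquad w_g \propto \mathrm{dist}(\mathbf{x}, D_g)

Here D_g holds the training atoms of group g (typically one group per class), the mixed l2,1 penalty turns whole groups on or off (group sparsity), and the weights w_g discourage coding with groups far from x (data locality).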


European Conference on Computer Vision | 2014

Discovering Groups of People in Images

Wongun Choi; Yu-Wei Chao; Caroline Pantofaru; Silvio Savarese

Understanding group activities from images is an important yet challenging task. This is because there is an exponentially large number of semantic and geometric relationships among individuals that one must model in order to effectively recognize and localize group activities. Rather than focusing on directly recognizing group activities as most previous work does, we advocate the importance of introducing an intermediate representation for modeling groups of humans, which we call structured groups. Such groups define the way people spatially interact with each other. People might be facing each other to talk, while others sit on a bench side by side, and some might stand alone. In this paper we contribute a method for identifying and localizing these structured groups in a single image despite their varying viewpoints, numbers of participants, and occlusions. We propose to learn an ensemble of discriminative interaction patterns to encode the relationships between people in 3D and introduce a novel, efficient iterative augmentation algorithm for solving this complex inference problem. A nice byproduct of the inference scheme is an approximate 3D layout estimate of the structured groups in the scene. Finally, we contribute an extremely challenging new dataset that contains images each showing multiple people performing multiple activities. Extensive evaluation confirms our theoretical findings.
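
As a concrete, if simplified, illustration of interaction patterns: pairwise geometric features such as distance and mutual facing already separate conversations from side-by-side sitting and standing alone. The sketch below computes such features; the feature set is a hypothetical stand-in for the discriminative patterns learned in the paper.

import numpy as np

def pairwise_interaction_features(positions, orientations):
    """Illustrative pairwise features for grouping people.

    positions    : (n, 2) ground-plane locations of detected people
    orientations : (n,)   facing directions in radians
    Returns an (n, n, 2) array of [distance, mutual-facing score].
    """
    n = len(positions)
    feats = np.zeros((n, n, 2))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = positions[j] - positions[i]
            dist = np.linalg.norm(d)
            # How directly person i faces person j (1 = straight at them)
            facing_i = np.cos(orientations[i] - np.arctan2(d[1], d[0]))
            # Mutual facing: high when both face each other, as in a conversation
            facing_j = np.cos(orientations[j] - np.arctan2(-d[1], -d[0]))
            feats[i, j] = [dist, facing_i * facing_j]
    return feats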


Computer Vision and Pattern Recognition | 2015

Mining semantic affordances of visual object categories

Yu-Wei Chao; Zhan Wang; Rada Mihalcea; Jia Deng

Affordances are fundamental attributes of objects. Affordances reveal the functionalities of objects and the possible actions that can be performed on them. Understanding affordances is crucial for recognizing human activities in visual data and for robots to interact with the world. In this paper we introduce the new problem of mining the knowledge of semantic affordance: given an object, determining whether an action can be performed on it. This is equivalent to connecting verb nodes and noun nodes in WordNet, or filling an affordance matrix encoding the plausibility of each action-object pair. We introduce a new benchmark with crowdsourced ground truth affordances on 20 PASCAL VOC object classes and 957 action classes. We explore a number of approaches including text mining, visual mining, and collaborative filtering. Our analyses yield a number of significant insights that reveal the most effective ways of collecting knowledge of semantic affordances.
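
The affordance matrix is easy to picture: rows are actions (verbs), columns are object classes, and each entry scores whether the action plausibly applies to the object. One standard collaborative-filtering approach to filling missing entries is a low-rank factorization, sketched below; the dimensions and hyperparameters are arbitrary, and this is not necessarily the variant used in the paper.

import numpy as np

def complete_affordance_matrix(A, mask, rank=10, lr=0.05, epochs=500, reg=0.01):
    """Fill unknown entries of an action-by-object affordance matrix.

    A    : (n_actions, n_objects) matrix, e.g. 1 = plausible, 0 = implausible
    mask : boolean matrix, True where the entry is observed (crowdsourced)
    Returns the completed matrix U @ V.T.
    """
    rng = np.random.default_rng(0)
    n_a, n_o = A.shape
    U = 0.1 * rng.standard_normal((n_a, rank))   # action factors
    V = 0.1 * rng.standard_normal((n_o, rank))   # object factors
    for _ in range(epochs):
        E = mask * (U @ V.T - A)                 # error on observed entries only
        U -= lr * (E @ V + reg * U)              # gradient steps on the
        V -= lr * (E.T @ U + reg * V)            # regularized squared error
    return U @ V.T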


Computer Vision and Pattern Recognition | 2017

Forecasting Human Dynamics from Static Images

Yu-Wei Chao; Jimei Yang; Brian L. Price; Scott D. Cohen; Jia Deng

This paper presents the first study on forecasting human dynamics from static images. The problem is to take a single RGB image as input and generate a sequence of upcoming human body poses in 3D. To address the problem, we propose the 3D Pose Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances in single-image human pose estimation and sequence prediction, and converts the 2D predictions into 3D space. We train our 3D-PFNet using a three-step training strategy to leverage diverse sources of training data, including image- and video-based human pose datasets and 3D motion capture (MoCap) data. We demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and 3D structure recovery through quantitative and qualitative results.
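
The overall shape of such a pipeline, an image encoder feeding a recurrent decoder that emits one pose per future timestep, can be sketched in PyTorch as below. The layer sizes, the flatten-based encoder, and the single linear 2D-to-3D lifting layer are placeholders, not the 3D-PFNet architecture.

import torch
import torch.nn as nn

class PoseForecaster(nn.Module):
    """Sketch: single image -> sequence of future poses (J joints)."""
    def __init__(self, feat_dim=512, hidden=256, joints=17, steps=16):
        super().__init__()
        self.steps = steps
        # Placeholder encoder; a real system would use a CNN backbone.
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head_2d = nn.Linear(hidden, joints * 2)      # 2D pose per timestep
        self.lift_3d = nn.Linear(joints * 2, joints * 3)  # naive 2D -> 3D lifting

    def forward(self, image):
        b = image.shape[0]
        feat = self.encoder(image)                     # (b, feat_dim)
        # Feed the same image feature at every timestep of the decoder.
        seq = feat.unsqueeze(1).expand(b, self.steps, -1)
        h, _ = self.rnn(seq)                           # (b, steps, hidden)
        poses_2d = self.head_2d(h)                     # (b, steps, J*2)
        poses_3d = self.lift_3d(poses_2d)              # (b, steps, J*3)
        return poses_2d, poses_3d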


International Conference on Image Analysis and Processing | 2013

Layout Estimation of Highly Cluttered Indoor Scenes Using Geometric and Semantic Cues

Yu-Wei Chao; Wongun Choi; Caroline Pantofaru; Silvio Savarese

Recovering the spatial layout of cluttered indoor scenes is a challenging problem. Current methods generate layout hypotheses from vanishing point estimates produced using 2D image features, an approach that fails in highly cluttered scenes in which most of the image features come from clutter rather than the room's geometric structure. In this paper, we propose to use human detections as cues to more accurately estimate the vanishing points. Our method builds on the fact that people are often the focus of indoor scenes, and that the scene and the people within it should have consistent geometric configurations in 3D space. We contribute a new dataset of highly cluttered indoor scenes containing people, on which we provide baselines and evaluate our method. This evaluation shows that our approach improves 3D interpretation of scenes.
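
The underlying cue can be sketched directly: a standing person's head-to-foot axis is vertical in 3D, so the head-to-foot segments of all detected people should meet near the vertical vanishing point. A least-squares intersection of those segments, shown below, is one plausible way to exploit the cue; it is not the paper's full estimation procedure.

import numpy as np

def vertical_vanishing_point(heads, feet):
    """Least-squares intersection of head-to-foot image segments.

    heads, feet : (n, 2) arrays of image points, one pair per detected person
                  (n >= 2 non-parallel segments needed).
    Returns the point minimizing squared distance to all extended lines.
    """
    A, b = [], []
    for h, f in zip(heads, feet):
        d = f - h
        # Normal of the line through h and f: rotate the direction by 90 degrees.
        n = np.array([-d[1], d[0]])
        n = n / np.linalg.norm(n)
        A.append(n)              # line equation: n . x = n . h
        b.append(n @ h)
    A, b = np.array(A), np.array(b)
    vp, *_ = np.linalg.lstsq(A, b, rcond=None)
    return vp

Weighting each row by the person detector's confidence would be a natural extension when detections are noisy.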


International Journal of Computer Vision | 2015

Indoor Scene Understanding with Geometric and Semantic Contexts

Wongun Choi; Yu-Wei Chao; Caroline Pantofaru; Silvio Savarese

Truly understanding a scene involves integrating information at multiple levels as well as studying the interactions between scene elements. Individual object detectors, layout estimators and scene classifiers are powerful but ultimately confounded by complicated real-world scenes with high variability, different viewpoints and occlusions. We propose a method that can automatically learn the interactions among scene elements and apply them to the holistic understanding of indoor scenes from a single image. This interpretation is performed within a hierarchical interaction model which describes an image by a parse graph, thereby fusing together object detection, layout estimation and scene classification. At the root of the parse graph is the scene type and layout while the leaves are the individual detections of objects. In between is the core of the system, our 3D Geometric Phrases (3DGP). We conduct extensive experimental evaluations on single image 3D scene understanding using both 2D and 3D metrics. The results demonstrate that our model with 3DGPs can provide robust estimation of scene type, 3D space, and 3D objects by leveraging the contextual relationships among the visual elements.


International Conference on Computer Vision | 2015

HICO: A Benchmark for Recognizing Human-Object Interactions in Images

Yu-Wei Chao; Zhan Wang; Yugeng He; Jiaxuan Wang; Jia Deng

Collaboration


Dive into Yu-Wei Chao's collaborations.

Top Co-Authors

Jia Deng, University of Michigan
Wongun Choi, University of Michigan
Zhan Wang, University of Michigan
Yi-Ren Yeh, National Kaohsiung Normal University