Publication


Featured research published by Yuanhao Chen.


Computer Vision and Pattern Recognition | 2010

Latent hierarchical structural learning for object detection

Long Zhu; Yuanhao Chen; Alan L. Yuille; William T. Freeman

We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The models can be trained discriminatively using latent structural SVM learning, where the latent variables are the node positions and the mixture component. But current learning methods are slow, due to the large number of parameters and latent variables, and have been restricted to hierarchies with two layers. In this paper we describe an incremental concave-convex procedure (iCCCP) which allows us to learn both two- and three-layer models efficiently. We show that iCCCP leads to a simple training algorithm which avoids complex multi-stage layer-wise training and careful part selection, and achieves good performance without requiring elaborate initialization. We perform object detection using our learnt models and obtain performance comparable with state-of-the-art methods when evaluated on the challenging public PASCAL datasets. We demonstrate the advantages of three-layer hierarchies, outperforming the two-layer models of Felzenszwalb et al. on all 20 classes.
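
The training procedure described above alternates between imputing the latent variables and solving the resulting convex structural-SVM problem, adding training data incrementally. The sketch below is a minimal illustration of that alternation, not the authors' implementation; impute_latents, loss_augmented_parse, features, and loss are hypothetical callables standing in for model-specific routines.

```python
import numpy as np

def icccp_train(examples, dim, impute_latents, loss_augmented_parse,
                features, loss, n_rounds=5, batch=100, lr=1e-3, C=1.0):
    """Incremental CCCP-style training loop for a latent structural SVM (a sketch).

    examples : list of (image, ground_truth_box) pairs
    Each round adds a new batch of examples, fixes the latent variables (part
    positions, mixture component) with the current weights, then takes
    subgradient steps on the resulting convex upper bound of the objective.
    """
    w = np.zeros(dim)
    active = []
    for r in range(n_rounds):
        active += examples[r * batch:(r + 1) * batch]      # incremental data growth

        # Concave step: impute latent variables under the current weights.
        latents = [impute_latents(w, img, box) for img, box in active]

        # Convex step: subgradient descent on the structural hinge loss.
        for (img, box), h_star in zip(active, latents):
            y_hat, h_hat = loss_augmented_parse(w, img)        # most violating parse
            phi_gt = features(img, box, h_star)
            phi_hat = features(img, y_hat, h_hat)
            if loss(box, y_hat) + w @ (phi_hat - phi_gt) > 0:  # margin violated
                w -= lr * C * (phi_hat - phi_gt)
            w -= lr * w / len(active)                          # regularizer term
    return w
```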


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing

Long Leo Zhu; Yuanhao Chen; Alan L. Yuille

In this paper, we address the tasks of detecting, segmenting, parsing, and matching deformable objects. We use a novel probabilistic object model that we call a hierarchical deformable template (HDT). The HDT represents the object by state variables defined over a hierarchy (with typically five levels). The hierarchy is built recursively by composing elementary structures to form more complex structures. A probability distribution, a parameterized exponential model, is defined over the hierarchy to quantify the variability in shape and appearance of the object at multiple scales. To perform inference, that is, to estimate the most probable states of the hierarchy for an input image, we use a bottom-up algorithm called compositional inference. This algorithm is an approximate version of dynamic programming where approximations (e.g., pruning) are made to ensure that the algorithm is fast while maintaining high performance. We adapt the structure-perceptron algorithm to estimate the parameters of the HDT in a discriminative manner, simultaneously estimating the appearance and shape parameters. More precisely, we specify an exponential distribution for the HDT using a dictionary of potentials which capture the appearance and shape cues. This dictionary can be large, so the potentials do not need to be handcrafted. Instead, structure-perceptron assigns weights to the potentials so that less important potentials receive small weights (a "soft" form of feature selection). Finally, we provide an experimental evaluation of HDTs on different visual tasks, including detection, segmentation, matching (alignment), and parsing. We show that HDTs achieve state-of-the-art performance for these different tasks when evaluated on data sets with ground truth (and when compared to alternative algorithms, which are typically specialized to each task).
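
Compositional inference, as sketched in the abstract, amounts to pruned bottom-up dynamic programming: each node keeps only the best few candidate states formed by combining its children's candidates. The code below is a generic illustration of that idea under simplified assumptions; hierarchy, leaf_proposals, compose, and score are hypothetical inputs, not the paper's actual potentials.

```python
import heapq
import itertools

def compositional_inference(hierarchy, leaf_proposals, compose, score, K=20):
    """Pruned bottom-up dynamic programming over a part hierarchy (a sketch).

    hierarchy       : dict mapping each non-leaf node to its list of children
    leaf_proposals  : dict mapping each leaf node to a list of (state, score) pairs
    compose(states) : derives a parent state (e.g. mean part position) from child states
    score(p, cs)    : shape/appearance compatibility of parent state p with child states cs
    K               : beam width; only the K best candidates survive at each node
    """
    best = dict(leaf_proposals)                       # node -> list of (state, score)

    def candidates(node):
        if node in best:                              # leaf, or already computed
            return best[node]
        child_lists = [candidates(c) for c in hierarchy[node]]
        pool = []
        for combo in itertools.product(*child_lists):
            child_states = [s for s, _ in combo]
            child_score = sum(sc for _, sc in combo)
            parent_state = compose(child_states)
            pool.append((parent_state, child_score + score(parent_state, child_states)))
        best[node] = heapq.nlargest(K, pool, key=lambda x: x[1])   # pruning step
        return best[node]

    children = {c for cs in hierarchy.values() for c in cs}
    root = next(iter(set(hierarchy) - children))      # the node that is nobody's child
    return max(candidates(root), key=lambda x: x[1])  # best root state and its score
```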


Computer Vision and Pattern Recognition | 2008

Max Margin AND/OR Graph learning for parsing the human body

Long Leo Zhu; Yuanhao Chen; Yifei Lu; Chenxi Lin; Alan L. Yuille

We present a novel structure-learning method, the Max Margin AND/OR graph (MM-AOG), for parsing the human body into parts and recovering their poses. Our method represents the human body and its parts by an AND/OR graph, which is a multi-level mixture of Markov random fields (MRFs). Max-margin learning, a generalization of the training algorithm for support vector machines (SVMs), is used to learn the parameters of the AND/OR graph model discriminatively. This combination of AND/OR graphs and max-margin learning has four advantages. Firstly, the AND/OR graph allows us to handle an enormous range of articulated poses with a compact graphical model. Secondly, max-margin learning has more discriminative power than the traditional maximum-likelihood approach. Thirdly, the parameters of the AND/OR graph model are optimized globally; in particular, the weights of the appearance model for individual nodes and the relative importance of spatial relationships between nodes are learnt simultaneously. Finally, the kernel trick can be used to handle high-dimensional features and to enable complex similarity measures of shapes. We perform comparison experiments on the baseball datasets, showing significant improvements over state-of-the-art methods.
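
For readers unfamiliar with max-margin structural learning, the generic objective behind this kind of training can be written as follows; the notation is a standard structural-SVM formulation rather than something lifted from the paper, with phi(x, y) stacking the appearance and spatial-relationship features of a parse y of the AND/OR graph and Delta a loss between parses.

```latex
\min_{\mathbf{w},\,\boldsymbol{\xi}\ge 0} \;\; \tfrac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \xi_{i}
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\!\left[\phi(x_{i}, y_{i}) - \phi(x_{i}, y)\right] \;\ge\; \Delta(y_{i}, y) - \xi_{i}
\qquad \forall i,\; \forall y .
```

The kernel trick mentioned in the abstract enters when the dual of this program is solved, since parses then interact only through inner products of their feature vectors, which can be replaced by kernels.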


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Recursive segmentation and recognition templates for image parsing

Long Leo Zhu; Yuanhao Chen; Yuan Lin; Chenxi Lin; Alan L. Yuille

In this paper, we propose a Hierarchical Image Model (HIM) which parses images to perform segmentation and object recognition. The HIM represents the image recursively by segmentation and recognition templates at multiple levels of the hierarchy. This has advantages for representation, inference, and learning. First, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependencies and exploiting different levels of contextual information (similar to how natural language models represent sentence structure in terms of hierarchical representations such as verb and noun phrases). Second, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which yields the first polynomial-time algorithm for image labeling. Third, we learn the HIM efficiently using machine learning methods from a labeled data set. We demonstrate that the HIM is comparable with state-of-the-art methods by evaluation on the challenging public MSRC and PASCAL VOC 2007 image data sets.
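
The dynamic-programming inference mentioned above can be pictured on a quadtree of image regions, with each node choosing one segmentation-recognition template so as to minimize a local data cost plus a parent-child consistency cost. The following is a toy sketch under those simplifying assumptions; data_cost and consistency are hypothetical inputs, and the real HIM templates and potentials are richer than this.

```python
import numpy as np

def parse_quadtree(data_cost, consistency, depth, n_templates):
    """Bottom-up dynamic programming over a quadtree of region templates (a toy sketch).

    data_cost(level, cell, t)      : cost of assigning template t to a cell at a level
    consistency[t_parent, t_child] : cost of a child template under a parent template
    Returns the minimal total cost at the root and the best root template.
    """
    n_cells = lambda level: 4 ** level
    # cost[level][cell][t] = best cost of the subtree rooted at that cell given template t
    cost = [np.zeros((n_cells(l), n_templates)) for l in range(depth)]

    # Leaves: data cost only.
    for c in range(n_cells(depth - 1)):
        cost[depth - 1][c] = np.array([data_cost(depth - 1, c, t) for t in range(n_templates)])

    # Interior nodes: data cost plus the best child template choice under each parent template.
    for l in range(depth - 2, -1, -1):
        for c in range(n_cells(l)):
            children = [4 * c + k for k in range(4)]
            for t in range(n_templates):
                child_cost = sum(np.min(consistency[t] + cost[l + 1][ch]) for ch in children)
                cost[l][c][t] = data_cost(l, c, t) + child_cost

    root = cost[0][0]
    return float(root.min()), int(root.argmin())
```

In this toy version the work per node is quadratic in the number of templates and linear in the number of nodes, which is what makes polynomial-time parsing plausible.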


International Journal of Computer Vision | 2011

Max Margin Learning of Hierarchical Configural Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation

Long Zhu; Yuanhao Chen; Chenxi Lin; Alan L. Yuille

In this paper we formulate a hierarchical configural deformable template (HCDT) to model articulated visual objects, such as horses and baseball players, for tasks such as parsing, segmentation, and pose estimation. HCDTs represent an object by an AND/OR graph where the OR nodes act as switches which enable the graph topology to vary adaptively. This hierarchical representation is compositional, and the node variables represent positions and properties of subparts of the object. The graph and the node variables are required to obey the summarization principle, which enables an efficient compositional inference algorithm to rapidly estimate the state of the HCDT. We specify the structure of the AND/OR graph of the HCDT by hand and learn the model parameters discriminatively by extending max-margin learning to AND/OR graphs. We illustrate the three main aspects of HCDTs (representation, inference, and learning) on the tasks of segmenting, parsing, and pose (configuration) estimation for horses and humans. We demonstrate that the inference algorithm is fast and that max-margin learning is effective. We show that HCDTs give state-of-the-art results for segmentation and pose estimation when compared to other methods on benchmarked datasets.


Computer Vision and Pattern Recognition | 2008

Structure-perceptron learning of a hierarchical log-linear model

Long Zhu; Yuanhao Chen; Xingyao Ye; Alan L. Yuille

In this paper, we address the problems of deformable object matching (alignment) and segmentation against cluttered backgrounds. We propose a novel hierarchical log-linear model (HLLM) which represents both shape and appearance features at multiple levels of a hierarchy. This model enables us to combine appearance cues at multiple scales directly into the hierarchy and to model shape deformations at short, medium, and long range. We introduce the structure-perceptron algorithm to estimate the parameters of the HLLM in a discriminative way. The learning is able to estimate the appearance and shape parameters simultaneously in a global manner. Moreover, structure-perceptron learning has a feature-selection aspect (similar to AdaBoost) which enables us to specify a class of appearance/shape features and allow the algorithm to select which features to use and how to weight their importance. This method was applied to the tasks of deformable object localization, segmentation, matching (alignment), and parsing. We demonstrate that the algorithm achieves state-of-the-art performance by evaluation on public datasets (horse and multi-view face).
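
As a rough illustration of the structure-perceptron training referred to above, the averaged-perceptron loop below updates a weight vector over a dictionary of shape/appearance features; weights that stay near zero act as the soft feature selection the abstract alludes to. This is a generic sketch, not the paper's code; parse and features are hypothetical callables.

```python
import numpy as np

def structure_perceptron(examples, dim, parse, features, epochs=10):
    """Averaged structure-perceptron training (a sketch).

    examples          : list of (image, ground_truth_parse) pairs
    parse(w, image)   : returns the highest-scoring parse under weights w (inference)
    features(img, y)  : feature vector of parse y over the cue dictionary
    """
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    steps = 0
    for _ in range(epochs):
        for image, y_true in examples:
            y_pred = parse(w, image)                                  # current best parse
            w += features(image, y_true) - features(image, y_pred)   # perceptron update
            w_sum += w                                                # accumulate for averaging
            steps += 1
    return w_sum / max(steps, 1)   # averaged weights; near-zero entries ~ unused features
```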


Computer Vision and Pattern Recognition | 2010

Part and appearance sharing: Recursive Compositional Models for multi-view multi-object detection

Long Zhu; Yuanhao Chen; Antonio Torralba; William T. Freeman; Alan L. Yuille

We propose Recursive Compositional Models (RCMs) for simultaneous multi-view multi-object detection and parsing (e.g. view estimation and determining the positions of the object subparts). We represent the set of objects by a family of RCMs where each RCM is a probability distribution defined over a hierarchical graph which corresponds to a specific object and viewpoint. An RCM is constructed from a hierarchy of subparts/subgraphs which are learnt from training data. Part-sharing is used so that different RCMs are encouraged to share subparts/subgraphs which yields a compact representation for the set of objects and which enables efficient inference and learning from a limited number of training samples. In addition, we use appearance-sharing so that RCMs for the same object, but different viewpoints, share similar appearance cues which also helps efficient learning. RCMs lead to a multi-view multi-object detection system. We illustrate RCMs on four public datasets and achieve state-of-the-art performance.
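
The compactness argument in this abstract rests on different object/viewpoint models pointing at the same subpart models rather than owning private copies. The snippet below is a deliberately simplified, hypothetical data-structure sketch of that idea (the names SubpartModel, RCM, and the example pool are invented for illustration).

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SubpartModel:
    """A reusable subpart/subgraph: its own appearance weights and child layout."""
    name: str
    appearance: List[float]
    children: List[str] = field(default_factory=list)   # names of shared sub-subparts

@dataclass
class RCM:
    """One object/viewpoint model, built by referencing shared subparts by name."""
    object_name: str
    viewpoint: str
    root_parts: List[str]

# A shared pool: different RCMs point into the same dictionary, so a subpart
# learned once (e.g. a 'wheel' subgraph) is reused across objects and viewpoints.
pool: Dict[str, SubpartModel] = {
    "wheel": SubpartModel("wheel", appearance=[0.2, 0.7]),
    "car_front": SubpartModel("car_front", appearance=[0.5, 0.1], children=["wheel"]),
    "car_side": SubpartModel("car_side", appearance=[0.4, 0.3], children=["wheel"]),
}

models = [
    RCM("car", "frontal", root_parts=["car_front"]),
    RCM("car", "side", root_parts=["car_side"]),
]

def count_parameters(models: List[RCM], pool: Dict[str, SubpartModel]) -> int:
    """Parameters are counted once per shared subpart, not once per model."""
    used = set()
    def visit(name: str):
        if name in used:
            return
        used.add(name)
        for child in pool[name].children:
            visit(child)
    for m in models:
        for p in m.root_parts:
            visit(p)
    return sum(len(pool[n].appearance) for n in used)

print(count_parameters(models, pool))   # the shared 'wheel' is counted only once
```

Counting parameters through the shared pool, rather than per model, is what keeps the representation compact and lets a subpart learned for one viewpoint benefit the others.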




Department of Statistics, UCLA | 2011

Max Margin AND/OR Graph Learning for Parsing the Human Body

Long Zhu; Yuanhao Chen; Yifei Lu; Chenxi Lin; Alan L. Yuille


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009

Unsupervised Learning of Probabilistic Object Models (POMs) for Object Classification, Segmentation, and Recognition Using Knowledge Propagation

Yuanhao Chen; Long Leo Zhu; Alan L. Yuille; Hong-Jiang Zhang

Collaboration


Dive into Yuanhao Chen's collaborations.

Top Co-Authors

Alan L. Yuille, Johns Hopkins University
Long Zhu, University of California
Long Leo Zhu, University of California
Yifei Lu, Shanghai Jiao Tong University
Yuan Lin, Shanghai Jiao Tong University
Xingyao Ye, University of California
Antonio Torralba, Massachusetts Institute of Technology