Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Xiaobai Liu is active.

Publication


Featured research published by Xiaobai Liu.


Computer Vision and Pattern Recognition | 2014

The Role of Context for Object Detection and Semantic Segmentation in the Wild

Roozbeh Mottaghi; Xianjie Chen; Xiaobai Liu; Nam-Gyu Cho; Seong Whan Lee; Sanja Fidler; Raquel Urtasun; Alan L. Yuille

In this paper, we study the role of context in existing state-of-the-art detection and segmentation approaches. Towards this goal, we label every pixel of the PASCAL VOC 2010 detection challenge images with a semantic category. We believe this data will provide plenty of challenges to the community, as it contains 520 additional classes for semantic segmentation and object detection. Our analysis shows that nearest-neighbor-based approaches perform poorly on semantic segmentation of contextual classes, showing the variability of PASCAL imagery. Furthermore, the improvements from existing contextual models for detection are rather modest. In order to push forward the performance in this difficult scenario, we propose a novel deformable part-based model which exploits both local context around each candidate detection and global context at the level of the scene. We show that this contextual reasoning significantly helps in detecting objects at all scales.


ACM Multimedia | 2009

Label to region by bi-layer sparsity priors

Xiaobai Liu; Bin Cheng; Shuicheng Yan; Jinhui Tang; Tat-Seng Chua; Hai Jin

In this work, we investigate how to automatically reassign manually annotated image-level labels to their contextually derived semantic regions. First, we propose a bi-layer sparse coding formulation for uncovering how an image or semantic region can be robustly reconstructed from the over-segmented image patches of an image set. We then harness it for automatic label-to-region assignment over the entire image set. The bi-layer sparse coding is solved by convex l1-norm minimization. Its underlying philosophy is that an image or semantic region can be sparsely reconstructed from the atomic image patches belonging to images with common labels, while robustness in label propagation requires that these selected atomic patches come from very few images. Each layer of sparse coding produces the image-label assignment for the selected atomic patches and the candidate regions merged from them based on shared image labels. The results of bi-layer sparse coding over all candidate regions are then fused to obtain the entire label-to-region assignment. Moreover, the presented bi-layer sparse coding framework can naturally be applied to annotate new test images. Extensive experiments on three public image datasets clearly demonstrate the effectiveness of the proposed framework in both label-to-region assignment and image annotation tasks.
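The convex l1-norm minimization at the heart of this formulation can be illustrated with a generic sparse-coding solver. Below is a minimal sketch of ISTA (iterative shrinkage-thresholding), not the paper's actual optimizer; the dictionary D stands in for the pool of atomic patch descriptors and y for the region to be reconstructed.

```python
import numpy as np

def ista_l1(D, y, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||y - D x||_2^2 + lam*||x||_1 by ISTA.

    The soft-thresholding step drives most coefficients to exactly
    zero, so the reconstruction uses only a few dictionary atoms.
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))    # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return x
```

A sparse x then identifies the few atomic patches (drawn from images sharing a label) that reconstruct the region, which is the mechanism the label propagation relies on.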


IEEE Transactions on Image Processing | 2010

Projective Nonnegative Graph Embedding

Xiaobai Liu; Shuicheng Yan; Hai Jin

We present in this paper a general formulation for nonnegative data factorization, called projective nonnegative graph embedding (PNGE), which 1) explicitly decomposes the data into two nonnegative components favoring the characteristics encoded by the so-called intrinsic and penalty graphs, respectively, and 2) explicitly describes how to transform each new testing sample into its low-dimensional nonnegative representation. In the past, such a nonnegative decomposition was often obtained for the training samples only, e.g., by nonnegative matrix factorization (NMF) and its variants, nonnegative graph embedding (NGE) and its refined version, multiplicative nonnegative graph embedding (MNGE). Those conventional approaches to out-of-sample extension either suffer from high computational cost or violate the basic nonnegativity assumption. In this work, PNGE offers a unified solution to the out-of-sample extension problem: the nonnegative coefficient vector of each datum is assumed to be projected from its original feature representation by a universal nonnegative transformation matrix. A convergence-provable multiplicative nonnegative updating rule is then derived to learn the basis matrix and the transformation matrix. Extensive experiments comparing against state-of-the-art nonnegative data factorization algorithms demonstrate the algorithmic properties in convergence, sparsity, and classification power.
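The multiplicative-update machinery PNGE builds on is easiest to see in plain NMF. The sketch below shows the classic Lee-Seung update for X ≈ WH (not PNGE's graph-regularized rule): each factor is multiplied by a ratio of nonnegative terms, so nonnegativity is preserved automatically and the Frobenius error is non-increasing.

```python
import numpy as np

def nmf(X, r, n_iter=300, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for nonnegative X ≈ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r)) + eps
    H = rng.random((r, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H
```

PNGE's extra ingredient is a transformation matrix learned jointly with the basis, so a new sample's nonnegative code is obtained by a single projection rather than by re-running such updates.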


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Layered Graph Matching with Composite Cluster Sampling

Liang Lin; Xiaobai Liu; Song-Chun Zhu

This paper presents a framework of layered graph matching for integrating graph partition and matching. The objective is to find an unknown number of corresponding graph structures in two images. We extract discriminative local primitives from both images and construct a candidacy graph whose vertices are matching candidates (i.e., pairs of primitives) and whose edges are either negative, for mutual exclusion, or positive, for mutual consistency. Then we pose layered graph matching as a multicoloring problem on the candidacy graph and solve it using a composite cluster sampling algorithm. This algorithm assigns some vertices to a number of colors, each being a matched layer, and turns off all the remaining candidates. The algorithm iterates two steps: 1) sampling the positive and negative edges probabilistically to form a composite cluster, which consists of a few mutually conflicting connected components (CCPs) in different colors, and 2) assigning new colors to these CCPs with the consistency and exclusion relations maintained; the assignments are accepted by a Markov chain Monte Carlo (MCMC) mechanism to preserve detailed balance. This framework demonstrates state-of-the-art performance on several applications, such as multi-object matching with large motion, shape matching and retrieval, and object localization in cluttered backgrounds.
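The multicoloring-by-MCMC idea can be sketched on a toy candidacy graph. The sampler below is a plain single-site Metropolis-Hastings chain, far simpler than composite cluster sampling (which flips whole connected components at once), but it uses the same kind of energy: positive edges reward a shared color, negative (mutual-exclusion) edges penalize one. All names and weights here are illustrative.

```python
import numpy as np

def mh_coloring(n_vertices, n_colors, pos_edges, neg_edges,
                n_steps=3000, seed=0):
    """Single-site Metropolis-Hastings over vertex colorings.

    Positive edges lower the energy when their endpoints share a
    color; negative edges raise it when theirs do.
    """
    rng = np.random.default_rng(seed)

    def energy(c):
        e = -sum(1.0 for i, j in pos_edges if c[i] == c[j])
        e += sum(2.0 for i, j in neg_edges if c[i] == c[j])
        return e

    c = rng.integers(n_colors, size=n_vertices)
    best, best_e = c.copy(), energy(c)
    for _ in range(n_steps):
        prop = c.copy()
        prop[rng.integers(n_vertices)] = rng.integers(n_colors)
        d_e = energy(prop) - energy(c)
        # standard MH acceptance; the symmetric proposal keeps detailed balance
        if d_e <= 0 or rng.random() < np.exp(-d_e):
            c = prop
            if energy(c) < best_e:
                best, best_e = c.copy(), energy(c)
    return best, best_e
```

The composite cluster moves in the paper serve exactly to escape the slow mixing such single-site chains exhibit when many candidates conflict.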


International Conference on Computer Vision | 2013

Human Re-identification by Matching Compositional Template with Cluster Sampling

Yuanlu Xu; Liang Lin; Wei-Shi Zheng; Xiaobai Liu

This paper addresses a newly arising task in visual surveillance: re-identifying people at a distance by matching body information, given several reference examples. Most existing works solve this task by matching a reference template with the target individual, but often suffer from large human appearance variability (e.g., different poses/views, illumination) and high false positives in matching caused by conjunctions, occlusions, or surrounding clutter. Addressing these problems, we construct a simple yet expressive template from a few reference images of a certain individual, which represents the body as an articulated assembly of compositional and alternative parts, and propose an effective matching algorithm with cluster sampling. This algorithm is designed over a candidacy graph whose vertices are matching candidates (i.e., pairs of source and target body parts), and iterates two steps until convergence: (i) it generates possible partial matches based on compatible and competitive relations among body parts, and (ii) it confirms the partial matches to generate a new matching solution, which is accepted by a Markov chain Monte Carlo (MCMC) mechanism. In the experiments, we demonstrate the superior performance of our approach on three public databases compared to existing methods.


Computer Vision and Pattern Recognition | 2014

Single-View 3D Scene Parsing by Attributed Grammar

Xiaobai Liu; Yibiao Zhao; Song-Chun Zhu

In this paper, we present an attributed grammar for parsing man-made outdoor scenes into semantic surfaces while simultaneously recovering their 3D models. The grammar takes superpixels as its terminal nodes and uses five production rules to derive a hierarchical parse graph for the scene. Each graph node corresponds to a surface or a composite of surfaces in the 3D world or the 2D image. Nodes are described by attributes for the global scene model, e.g., focal length and vanishing points, or for surface properties, e.g., surface normal, contact lines with other surfaces, and relative spatial location. Each production rule is associated with equations that constrain the attributes of a parent node and those of its child nodes. Given an input image, our goal is to construct a hierarchical parse graph by recursively applying the five grammar rules while preserving the attribute constraints. We develop an effective top-down/bottom-up cluster sampling procedure which can explore this constrained space efficiently. We evaluate our method on both public benchmarks and newly built datasets, and achieve state-of-the-art performance in terms of layout estimation and region segmentation. We also demonstrate that our method is able to recover detailed 3D models with relaxed Manhattan structures, which clearly advances the state of the art in single-view 3D reconstruction.


IEEE Transactions on Circuits and Systems for Video Technology | 2011

Adaptive Object Tracking by Learning Hybrid Template Online

Xiaobai Liu; Liang Lin; Shuicheng Yan; Hai Jin; Wenbin Jiang

This paper presents an adaptive tracking algorithm that learns hybrid object templates online from video. The templates consist of multiple types of features, each of which describes one specific appearance structure, such as flatness, texture, or edge/corner. Our proposed solution consists of three aspects. First, in order to make features of different types comparable with each other, a unified statistical measure is defined to select the most informative features to construct the hybrid template. Second, we propose a simple yet powerful generative model for representing objects; this model is characterized by its simplicity, since it can be efficiently learned from the currently observed frames. Last, we present an iterative procedure to learn the object template from the currently observed frames and to locate every feature of the object template within them. The former step is referred to as feature pursuit and the latter as feature alignment, both of which are performed over a batch of observations. We fuse the results of feature alignment to locate objects within frames. The proposed solution is in essence robust against various challenges, including background clutter, low resolution, scale changes, and severe occlusions. Extensive experiments are conducted over several publicly available databases, and the comparative results show that our tracking algorithm clearly outperforms state-of-the-art methods.


IEEE Transactions on Circuits and Systems for Video Technology | 2011

Integrating Spatio-Temporal Context With Multiview Representation for Object Recognition in Visual Surveillance

Xiaobai Liu; Liang Lin; Shuicheng Yan; Hai Jin; Wenbing Tao

We present in this paper an integrated solution for rapidly recognizing dynamic objects in surveillance videos by exploring various kinds of contextual information. This solution consists of three components. The first is a multi-view object representation. It contains a set of deformable object templates, each of which comprises an ensemble of active features for an object category in a specific view/pose. The template can be efficiently learned from a small set of roughly aligned positive samples without negative samples. The second component is a unified spatio-temporal context model, which integrates two types of contextual information in a Bayesian way. One is the spatial context, including the main surface property (constraints on object type and density) and camera geometric parameters (constraints on object size at a specific location). The other is the temporal context, containing pixel-level and instance-level consistency models used to generate the foreground probability map and local object trajectory predictions. We also combine the above spatial and temporal contextual information to estimate the object pose in the scene and use it as a strong prior for inference. The third component is a robust sampling-based inference procedure. Taking the spatio-temporal contextual knowledge as the prior model and deformable template matching as the likelihood model, we formulate object category recognition as a maximum-a-posteriori problem. The probabilistic inference can be achieved by a simple Markov chain Monte Carlo sampler, owing to the informative spatio-temporal context model, which greatly reduces the computational complexity and the category ambiguities. The system performance and the gains from the spatio-temporal contextual information are quantitatively evaluated on several challenging datasets, and the comparison results clearly demonstrate that our proposed algorithm outperforms other state-of-the-art algorithms.


International Conference on Data Mining | 2009

Unified Solution to Nonnegative Data Factorization Problems

Xiaobai Liu; Shuicheng Yan; Jun Yan; Hai Jin

In this paper, we restudy non-convex data factorization problems (regularized or not, supervised or unsupervised) in which the optimization is confined to the nonnegative orthant, and provide a unified, convergence-provable solution based on multiplicative nonnegative update rules. This solution is general for optimization problems with block-wise quadratic objective functions, so direct update rules can be derived while skipping the tedious problem-specific derivation and convergence proof. Taking this unified solution as a general template, we i) re-explain several existing nonnegative data factorization algorithms, ii) develop a variant of the nonnegative matrix factorization formulation for handling out-of-sample data, and iii) propose a new nonnegative data factorization algorithm, called Correlated Co-Decomposition (CCD), to simultaneously factorize two feature spaces by exploring their inter-correlated information. Experiments on both face recognition and multi-label image annotation tasks demonstrate the wide applicability of the unified solution as well as the effectiveness of the two proposed new algorithms.
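A concrete view of the out-of-sample issue the paper's NMF variant addresses: once a nonnegative basis W is fixed, a new sample y can be encoded by running a multiplicative update on its coefficient vector alone, a block-wise quadratic subproblem of exactly the kind the unified solution covers. This is a generic sketch, not the paper's specific update rule.

```python
import numpy as np

def encode_sample(W, y, n_iter=200, eps=1e-9):
    """Nonnegative coefficients h with y ≈ W @ h, for a fixed basis W.

    Multiplicative updates keep h inside the nonnegative orthant; this
    is the standard coefficient update with the basis held constant.
    """
    h = np.full(W.shape[1], 1.0)
    for _ in range(n_iter):
        h *= (W.T @ y) / (W.T @ W @ h + eps)
    return h
```

Because only the small coefficient vector is updated, encoding a new sample costs far less than re-factorizing the whole augmented data matrix.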


Computer Vision and Pattern Recognition | 2010

Nonparametric Label-to-Region by search

Xiaobai Liu; Shuicheng Yan; Jiebo Luo; Jinhui Tang; Zhongyang Huang; Hai Jin

In this work, we investigate how to propagate the annotated labels of a given single image from the image level to their corresponding semantic regions, namely Label-to-Region (L2R), by utilizing auxiliary knowledge from Internet image search with the annotated image labels as queries. A nonparametric solution is proposed to perform L2R for a single image with complete labels. First, each label of the image is used as a query for online image search engines to obtain a set of semantically related and visually similar images, which, along with the input image, are encoded as bags of hierarchical patches. Then, an efficient two-stage feature mining procedure is presented to discover, for each label, the input-image-specific, salient, and descriptive features from the proposed Interpolation SIFT (iSIFT) feature pool. These features constitute a patch-level representation, and continuity-biased sparse coding is proposed to select a few patches from the online images, with preference for larger patches, to reconstruct a candidate region formed by randomly merging spatially connected patches of the input image. Such candidate regions are further ranked according to their reconstruction errors, and the top regions are used to derive the label confidence vector for each patch of the input image. Finally, a patch clustering procedure is performed as post-processing to finalize L2R for the input image. Extensive experiments on three public databases demonstrate the encouraging performance of the proposed nonparametric L2R solution.

Collaboration


Dive into Xiaobai Liu's collaborations.

Top Co-Authors

Hai Jin (Huazhong University of Science and Technology)
Shuicheng Yan (National University of Singapore)
Liang Lin (Sun Yat-sen University)
Song-Chun Zhu (University of California)
Yuanlu Xu (Sun Yat-sen University)
Alan L. Yuille (Johns Hopkins University)
Hongwei Li (University of Science and Technology of China)
Wenbing Tao (Huazhong University of Science and Technology)
Wenguan Wang (Beijing Institute of Technology)