Xinggang Wang
Huazhong University of Science and Technology
Publications
Featured research published by Xinggang Wang.
computer vision and pattern recognition | 2015
Wei Shen; Xinggang Wang; Yan Wang; Xiang Bai; Zhijiang Zhang
Contour detection serves as the basis of a variety of computer vision tasks such as image segmentation and object recognition. Mainstream work on this problem focuses on designing engineered gradient features. In this work, we show that contour detection accuracy can be improved by instead making use of the deep features learned by convolutional neural networks (CNNs). Rather than using the networks as a black-box feature extractor, we customize the training strategy by partitioning the contour (positive) data into subclasses and fitting each subclass with different model parameters. A new loss function, named positive-sharing loss, in which each subclass shares the loss for the whole positive class, is proposed to learn the parameters. Compared to the softmax loss function, the proposed one introduces an extra regularizer that emphasizes the losses for the positive and negative classes, which facilitates exploring more discriminative features. Our experimental results demonstrate that the learned deep features achieve top performance on the Berkeley Segmentation Dataset and Benchmark (BSDS500) and obtain competitive cross-dataset generalization results on the NYUD dataset.
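A minimal NumPy sketch of how such a positive-sharing loss might be computed, assuming column 0 of the logits is the negative class and columns 1..K are the K contour subclasses (an illustrative reading, not the authors' implementation):

```python
import numpy as np

def positive_sharing_loss(logits, labels):
    """Hypothetical positive-sharing loss.

    logits: (N, K+1) scores; column 0 is the negative class,
            columns 1..K are the K positive (contour) subclasses.
    labels: (N,) ints; 0 for negative, 1..K for positive subclasses.
    A positive sample shares the loss of the whole positive class:
    -log(total probability mass of the K subclasses), instead of
    -log(probability of its own subclass) as in plain softmax.
    """
    z = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)

    pos = labels > 0
    loss = np.empty(len(labels))
    loss[~pos] = -np.log(p[~pos, 0] + 1e-12)             # negative class
    loss[pos] = -np.log(p[pos, 1:].sum(axis=1) + 1e-12)  # shared positive mass
    return loss.mean()
```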
Pattern Recognition | 2014
Xinggang Wang; Bin Feng; Xiang Bai; Wenyu Liu; Longin Jan Latecki
Shape representation is a fundamental problem in computer vision. Current approaches to shape representation mainly focus on designing low-level shape descriptors that are robust to rotation, scaling and deformation of shapes. In this paper, we focus on mid-level modeling of shape representation. We develop a new shape representation called Bag of Contour Fragments (BCF), inspired by the classical Bag of Words (BoW) model. In BCF, a shape is decomposed into contour fragments, each of which is individually described using a shape descriptor, e.g., the Shape Context descriptor, and encoded into a shape code. Finally, a compact shape representation is built by pooling the shape codes in the shape. Shape classification with BCF only requires an efficient linear SVM classifier. In our experiments, we fully study the characteristics of BCF, show that BCF achieves state-of-the-art performance on several well-known shape benchmarks, and show that it can be applied to real image classification problems. Highlights: a new shape representation is proposed by encoding contour fragments in a shape; the proposed representation is compact yet informative; it is robust to shape deformation and occlusion; we obtain state-of-the-art shape classification performance on several benchmark datasets.
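The BCF pipeline can be illustrated with a compact scikit-learn sketch; the vocabulary size, hard assignment and plain max pooling are simplifications (the paper uses a soft encoding and spatial pooling), and the contour-fragment descriptors are assumed to be precomputed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def encode_shape(fragment_descriptors, codebook):
    """Assign each contour-fragment descriptor to its nearest codeword
    (hard assignment for brevity), then max-pool the codes over all
    fragments into one fixed-length shape vector."""
    dists = codebook.transform(fragment_descriptors)  # distances to codewords
    codes = np.zeros_like(dists)
    codes[np.arange(len(dists)), dists.argmin(axis=1)] = 1.0
    return codes.max(axis=0)

def train_bcf(shapes, labels, vocab_size=100):
    """shapes: list of (num_fragments_i, d) descriptor arrays, e.g.,
    Shape Context descriptors of the contour fragments of each shape."""
    codebook = KMeans(n_clusters=vocab_size, n_init=4).fit(np.vstack(shapes))
    X = np.array([encode_shape(s, codebook) for s in shapes])
    return codebook, LinearSVC().fit(X, labels)   # efficient linear SVM
```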
international conference on computer vision | 2009
Xiang Bai; Xinggang Wang; Longin Jan Latecki; Wenyu Liu; Zhuowen Tu
We present a shape-based algorithm for detecting and recognizing non-rigid objects in natural images. The existing literature in this domain often cannot model such objects very well. In this paper, we use the skeleton (medial axis) information to capture the main structure of an object, which has a particular advantage in modeling articulation and non-rigid deformation. Given a set of training samples, a tree-union structure is learned on the extracted skeletons to model the variation in configuration. Each branch of the skeleton is associated with a few part-based templates modeling the object boundary information. We then apply a sum-and-max algorithm to perform rapid object detection by matching the skeleton-based active template to the edge map extracted from a test image. The algorithm reports the detection result as a composition of the local maximum responses. Compared with the alternatives on this topic, our algorithm requires fewer training samples. It is simple, yet efficient and effective. We show encouraging results on two widely used benchmark image sets: the Weizmann horse dataset [7] and the ETHZ dataset [16].
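The sum-and-max composition itself is simple to state in code; the sketch below uses plain cross-correlation of part templates against the edge map as a stand-in for the paper's active-template matching, so it is only a structural illustration:

```python
import numpy as np
from scipy.signal import correlate2d

def branch_response(edge_map, template):
    """Response of one part-based template at every placement; plain
    cross-correlation here, not the paper's active-template matching."""
    return correlate2d(edge_map, template, mode="valid")

def sum_and_max(edge_map, branch_templates):
    """Sum-and-max: take the best placement per skeleton branch (max)
    and compose the branch responses additively (sum)."""
    return sum(branch_response(edge_map, t).max() for t in branch_templates)
```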
computer vision and pattern recognition | 2011
Xinggang Wang; Xiang Bai; Wenyu Liu; Longin Jan Latecki
In this paper, we present a new method to encode the spatial information of local image features, which is a natural extension of Shape Context (SC), so we call it Feature Context (FC). Given a position in an image, SC computes a histogram of the other points belonging to the target binary shape based on their distances and angles to that position. The value of each histogram bin of SC is the number of shape points in the region assigned to the bin. Thus, SC requires knowing the locations of the points of the target shape; in other words, an image point can have only two labels, belonging to the shape or not. In contrast, FC can be applied to the whole image without knowing the location of the target shape, and each image point can have multiple labels depending on its local features. The value of each histogram bin of FC is a histogram of the various features assigned to the points in the bin's region. We also introduce an efficient coding method to encode the local image features, called Radial Basis Coding (RBC). Combining RBC and FC, and using a linear SVM classifier, our method is suitable for both image classification and object detection.
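A sketch of the FC histogram, assuming each image point already carries a quantized feature label (e.g., from RBC); the bin counts and radius are illustrative parameters, not the paper's settings:

```python
import numpy as np

def feature_context(points, labels, center,
                    n_r=3, n_theta=8, n_labels=50, r_max=100.0):
    """Like Shape Context, bin all points by log-distance and angle
    relative to `center`, but accumulate a histogram over each
    point's feature label per bin rather than a plain point count."""
    d = points - center
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    r_bin = np.clip((np.log1p(r) / np.log1p(r_max) * n_r).astype(int), 0, n_r - 1)
    t_bin = (theta / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta, n_labels))
    for rb, tb, lab in zip(r_bin, t_bin, labels):
        hist[rb, tb, lab] += 1
    return hist.ravel()   # one FC descriptor for this position
```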
IEEE Transactions on Image Processing | 2014
Xiang Bai; Cong Rao; Xinggang Wang
In this paper, a learning-based shape descriptor for shape matching is presented. Formulated in a bag-of-words-like framework, the proposed method summarizes the local features extracted from a shape to generate an integrated representation. This speeds up shape matching, since distance metrics from vector space analysis can be applied directly to compare the constructed global descriptors, eliminating the time-consuming stage of local feature matching. Similar to the philosophy of spatial pyramid matching, a feature division strategy is applied in the phases of encoded feature pooling and vocabulary learning, which helps to construct a more discriminative descriptor incorporating both global and local information. In addition, a local contour-based feature extraction method is designed for 2D shapes, and significant properties of the local contours are inspected to design the feature division rules. The designed local feature extraction method and feature division rules reduce the variance of the shape representation due to changes in rotation. Beyond 2D shapes, we also present a simple and natural way to extend the proposed method to 3D shape representation. The proposed shape descriptor is validated on several benchmark datasets for evaluating 2D and 3D shape matching algorithms, and it is observed that it maintains superior discriminative power as well as high time efficiency.
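The speed-up claim is easy to see once shapes become fixed-length vectors: retrieval is then one vector distance per database entry rather than pairwise local-feature matching. A trivial sketch, with Euclidean distance as one possible metric:

```python
import numpy as np

def retrieve(query_desc, database_descs):
    """Rank database shapes by distance between global descriptors;
    no local feature matching is needed at query time."""
    d = np.linalg.norm(database_descs - query_desc, axis=1)
    return np.argsort(d)   # indices from most to least similar
```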
european conference on computer vision | 2010
Xiang Bai; Bo Wang; Xinggang Wang; Wenyu Liu; Zhuowen Tu
In this paper, we propose a new shape/object retrieval algorithm, co-transduction. The performance of a retrieval system is critically determined by the accuracy of the adopted similarity measures (distances or metrics). Different types of measures may focus on different aspects of the objects: e.g., measures computed from contours and skeletons are often complementary to each other. Our goal is to develop an algorithm that fuses different similarity measures for robust shape retrieval through a semi-supervised learning framework. We name our method co-transduction, inspired by the co-training algorithm [1]. Given two similarity measures and a query shape, the algorithm iteratively retrieves the most similar shapes using one measure and assigns them to a pool for the other measure to re-rank, and vice versa. Using co-transduction, we achieved a significantly improved result of 97.72% on the MPEG-7 dataset [2] over the state-of-the-art performances (91% in [3], 93.4% in [4]). Our algorithm is general and works directly on any given similarity measures/metrics; it is not limited to object shape retrieval and can be applied to other ranking/retrieval tasks.
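A simplified sketch of the co-transduction loop; the paper's re-ranking step is a graph transduction (label propagation), for which the minimum distance to the pooled shapes stands in here:

```python
import numpy as np

def co_transduction(dist_a, dist_b, query, k=5, iters=4):
    """dist_a, dist_b: (N, N) distance matrices under two similarity
    measures; query: index of the query shape. Each measure hands its
    current top-k retrievals to the other measure's pool, and vice
    versa; the final ranking fuses both pool distances."""
    pool_a, pool_b = {query}, {query}
    for _ in range(iters):
        da = dist_a[list(pool_a)].min(axis=0)   # distance to A's pool
        db = dist_b[list(pool_b)].min(axis=0)   # distance to B's pool
        pool_b |= set(np.argsort(da)[:k])       # A's retrievals feed B
        pool_a |= set(np.argsort(db)[:k])       # B's retrievals feed A
    fused = dist_a[list(pool_a)].min(axis=0) + dist_b[list(pool_b)].min(axis=0)
    return np.argsort(fused)
```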
Neurocomputing | 2016
Zhuotun Zhu; Xinggang Wang; Song Bai; Cong Yao; Xiang Bai
We study the problem of building a deep learning representation for 3D shapes. Deep learning has been shown to be very effective in a variety of visual applications, such as image classification and object detection. However, it has not been successfully applied to 3D shape recognition, because 3D shapes have complex structure in 3D space and there is only a limited number of 3D shapes available for feature learning. To address these problems, we project 3D shapes into 2D space and use an autoencoder for feature learning on the 2D images. High 3D shape retrieval accuracy is obtained by aggregating the features learned on the 2D images. In addition, we show that the proposed deep learning feature is complementary to conventional local image descriptors. By combining the global deep learning representation and the local descriptor representation, our method obtains state-of-the-art performance on 3D shape retrieval benchmarks.
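A toy version of the projection-and-autoencoder idea, with scikit-learn's MLPRegressor trained to reconstruct its input standing in for the paper's autoencoder, and mean pooling as one possible aggregation (both are assumptions, not the paper's exact choices):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def learn_view_features(views, code_dim=64):
    """views: (num_views, pixels) flattened 2D projections of 3D
    shapes. Train an autoencoder (input -> bottleneck -> input) and
    return the bottleneck codes as the learned features."""
    ae = MLPRegressor(hidden_layer_sizes=(code_dim,), max_iter=500)
    ae.fit(views, views)                  # reconstruct the input
    # bottleneck activation = ReLU(views @ W0 + b0)
    return np.maximum(0.0, views @ ae.coefs_[0] + ae.intercepts_[0])

def shape_descriptor(view_codes):
    """Aggregate the per-view codes of one 3D shape into a single
    retrieval descriptor (mean pooling here)."""
    return view_codes.mean(axis=0)
```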
computer vision and pattern recognition | 2012
Xinggang Wang; Xiang Bai; Tianyang Ma; Wenyu Liu; Longin Jan Latecki
We propose a novel shape model for object detection called the Fan Shape Model (FSM). We model contour sample points as rays of finite length emanating from a reference point. As in a folding fan, its slats, which we call rays, are very flexible. This flexibility allows FSM to tolerate large shape variance. However, the order and the adjacency relation of the slats stay invariant during fan deformation, since the slats are connected by a thin fabric. By analogy, we enforce the order and adjacency relation of the rays to stay invariant during deformation. Therefore, FSM preserves discriminative power while allowing for substantial shape deformation. FSM also allows for precise scale estimation during object detection, so there is no need to scale the shape model or the image in order to perform object detection. Another advantage of FSM is that it can be applied directly to edge images, since it does not require any linking of edge pixels into edge fragments (contours).
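The ray representation and the scale-estimation property can be sketched directly (illustrative only; the deformation model and the matching procedure are omitted):

```python
import numpy as np

def fan_model(contour_points, reference):
    """Represent a contour as rays of finite length from a reference
    point, kept in angular order -- the 'slats' of the fan."""
    d = contour_points - reference
    angles = np.arctan2(d[:, 1], d[:, 0])
    lengths = np.hypot(d[:, 0], d[:, 1])
    order = np.argsort(angles)          # fixed order/adjacency of rays
    return angles[order], lengths[order]

def estimate_scale(model_lengths, matched_lengths):
    """Because each ray carries an absolute length, matched rays give
    a direct scale estimate; the median ratio is a robust choice."""
    return np.median(matched_lengths / model_lengths)
```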
european conference on computer vision | 2014
Xiaojie Guo; Xinggang Wang; Liang Yang; Xiaochun Cao; Yi Ma
Foreground detection plays a core role in a wide spectrum of applications such as tracking and behavior analysis. Especially for videos captured by fixed cameras, it can be posed as a component decomposition problem in which the background is typically assumed to lie in a low-dimensional subspace. However, in real-world cases, dynamic backgrounds like waving trees and water ripples violate this assumption. Besides, noise caused by the image capturing process, as well as camouflaged and lingering foreground objects, also significantly increases the difficulty of accurate foreground detection. That is to say, simply imposing the correlation constraint on the background is no longer sufficient for such cases. To overcome these difficulties, this paper proposes to further take into account foreground characteristics, including 1) smoothness: the foreground object should appear coherently in the spatial domain and move smoothly in the temporal domain, and 2) arbitrariness: the appearance of the foreground may take arbitrary colors or intensities. With the consideration of the smoothness and arbitrariness of the foreground as well as the correlation of the (static) background, we formulate the problem in a unified framework from a probabilistic perspective and design an effective algorithm to seek the optimal solution. Experimental results on both synthetic and real data demonstrate the clear advantages of our method compared to state-of-the-art alternatives.
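A crude stand-in for the decomposition, only to fix ideas: a low-rank background fit is alternated with a thresholded, median-filtered residual, the filter playing the role of the smoothness prior (the actual method is a unified probabilistic formulation, not this heuristic):

```python
import numpy as np
from scipy.ndimage import median_filter

def decompose(frames, rank=3, thresh=25.0, iters=10):
    """frames: (n, h, w) grayscale video from a fixed camera.
    Returns a low-rank background and a coherent foreground mask."""
    n, h, w = frames.shape
    D = frames.reshape(n, -1).astype(float)
    mask = np.zeros_like(D, dtype=bool)
    for _ in range(iters):
        # fit the low-rank background on pixels currently labeled background
        X = np.where(mask, np.nan, D)
        fill = np.nanmean(X, axis=0)
        fill = np.where(np.isnan(fill), D.mean(axis=0), fill)
        U, s, Vt = np.linalg.svd(np.where(mask, fill, D), full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # foreground = large residual, smoothed to stay spatially coherent
        resid = np.abs(D - B).reshape(n, h, w)
        mask = (median_filter(resid, size=(1, 3, 3)) > thresh).reshape(n, -1)
    return B.reshape(n, h, w), mask.reshape(n, h, w)
```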
Neurocomputing | 2016
Yingying Zhu; Chengquan Zhang; Duoyou Zhou; Xinggang Wang; Xiang Bai; Wenyu Liu
Detecting and recognizing traffic signs is a hot topic in computer vision with many applications, e.g., safe driving, path planning and robot navigation. We propose a novel framework with two deep learning components: fully convolutional network (FCN) guided traffic sign proposals and a deep convolutional neural network (CNN) for object classification. Our core idea is to use the CNN to classify traffic sign proposals, performing fast and accurate traffic sign detection and recognition. Due to the complexity of traffic scenes, we improve the state-of-the-art object proposal method, EdgeBox, by incorporating a trained FCN. The FCN-guided object proposals are more discriminative candidates, which help make the whole detection system fast and accurate. In the experiments, we evaluate the proposed method on the publicly available Swedish Traffic Signs Dataset (STSD) and achieve state-of-the-art results.
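A pipeline sketch under stated assumptions: `proposal_fn` (e.g., EdgeBox), `fcn_heatmap_fn` (the trained FCN, returning a per-pixel sign probability) and `classifier_fn` (the CNN) are hypothetical callables, not a real API:

```python
import numpy as np

def detect_signs(image, proposal_fn, fcn_heatmap_fn, classifier_fn,
                 keep=200, conf_thresh=0.5):
    """Re-score generic proposals with the FCN heatmap, keep the most
    sign-like ones, then classify each surviving crop with the CNN."""
    boxes = proposal_fn(image)              # (N, 4) int boxes: x0, y0, x1, y1
    heat = fcn_heatmap_fn(image)            # (h, w) sign probability map
    scores = np.array([heat[y0:y1, x0:x1].mean() for x0, y0, x1, y1 in boxes])
    top = boxes[np.argsort(-scores)[:keep]] # FCN-guided proposal filtering
    detections = []
    for x0, y0, x1, y1 in top:
        label, conf = classifier_fn(image[y0:y1, x0:x1])
        if conf > conf_thresh:
            detections.append(((x0, y0, x1, y1), label, conf))
    return detections
```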