
Publication


Featured research published by Xinchao Wang.


IEEE Transactions on Image Processing | 2011

Subspaces Indexing Model on Grassmann Manifold for Image Search

Xinchao Wang; Zhu Li; Dacheng Tao

Conventional linear subspace learning methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) derive subspaces from the whole data set. These approaches are limited in that they are linear, while the data distribution we are trying to model is typically nonlinear. Moreover, they fail to incorporate local variations of the intrinsic sample distribution manifold, and are therefore ineffective when applied to large-scale data sets. Kernel versions of these approaches can alleviate the problem to a certain degree but face a serious computational challenge when the data set is large, since the computation involves Eigen/QP problems of size N × N; when N is large, kernel versions are not computationally practical. To tackle these problems and improve recognition/search performance, especially on large-scale image data sets, we propose a novel local subspace indexing model for image search termed the Subspace Indexing Model on Grassmann Manifold (SIM-GM). SIM-GM partitions the global space into local patches with a hierarchical structure; the global model is therefore approximated by piecewise linear local subspace models. By further applying the Grassmann manifold distance, SIM-GM is able to organize localized models into a hierarchy of indexed structures and allows fast query-time selection of the optimal ones for classification. Our proposed SIM-GM enjoys a number of merits: 1) it is able to deal with a large number of training samples efficiently; 2) it is a query-driven approach, i.e., it is able to return an effective local space model, so the recognition performance can be significantly improved; 3) it is a common framework that can incorporate many learning algorithms. Theoretical analysis and extensive experimental results confirm the validity of this model.
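
The Grassmann manifold distance at the heart of SIM-GM's model organization can be illustrated with principal angles between two local PCA subspaces. The sketch below is a minimal illustration assuming plain NumPy PCA and the geodesic (arc-length) metric; it is not the authors' indexing implementation.

```python
import numpy as np

def pca_basis(X, k):
    """Orthonormal basis of the top-k principal subspace of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions; keep the leading k as basis columns.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                                   # shape (d, k)

def grassmann_distance(U1, U2):
    """Geodesic distance on the Grassmann manifold via principal angles."""
    # Singular values of U1^T U2 are the cosines of the principal angles.
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return np.linalg.norm(theta)

# Toy example: two local patches of a data set, each with its own PCA subspace.
rng = np.random.default_rng(0)
patch_a = rng.normal(size=(200, 50))
patch_b = rng.normal(size=(200, 50)) + 0.5
d = grassmann_distance(pca_basis(patch_a, 5), pca_basis(patch_b, 5))
print(f"Grassmann distance between local subspaces: {d:.3f}")
```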


Computer Vision and Pattern Recognition | 2017

NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results

Radu Timofte; Eirikur Agustsson; Luc Van Gool; Ming-Hsuan Yang; Lei Zhang; Bee Lim; Sanghyun Son; Heewon Kim; Seungjun Nah; Kyoung Mu Lee; Xintao Wang; Yapeng Tian; Ke Yu; Yulun Zhang; Shixiang Wu; Chao Dong; Liang Lin; Yu Qiao; Chen Change Loy; Woong Bae; Jaejun Yoo; Yoseob Han; Jong Chul Ye; Jae Seok Choi; Munchurl Kim; Yuchen Fan; Jiahui Yu; Wei Han; Ding Liu; Haichao Yu

This paper reviews the first challenge on single-image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable from low- and high-resolution training images. Each competition had about 100 registered participants, and 20 teams competed in the final testing phase. The results gauge the state of the art in single-image super-resolution.
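
For reference, Track 1's "standard bicubic downscaling setup" means the LR inputs are produced by bicubic downscaling of the HR images. A minimal sketch with Pillow follows; the file names and the ×4 scale factor are assumptions for illustration.

```python
from PIL import Image

def make_lr(hr_path, lr_path, scale=4):
    """Generate a low-resolution image by bicubic downscaling (Track 1 style)."""
    hr = Image.open(hr_path)
    # Crop so the size is divisible by the scale factor, then downscale bicubically.
    w, h = (hr.width // scale) * scale, (hr.height // scale) * scale
    hr = hr.crop((0, 0, w, h))
    lr = hr.resize((w // scale, h // scale), resample=Image.BICUBIC)
    lr.save(lr_path)

# Hypothetical usage with a DIV2K-style file name.
# make_lr("0001.png", "0001x4.png", scale=4)
```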


IEEE Transactions on Image Processing | 2013

Grassmannian Regularized Structured Multi-View Embedding for Image Classification

Xinchao Wang; Wei Bian; Dacheng Tao

Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the features are often different in nature, and it is nontrivial to fuse them. In particular, some extracted features are redundant or noisy and are consequently not discriminative for classification. To alleviate these problems in an image classification context, we propose in this paper a novel multi-view embedding framework, termed Grassmannian regularized structured multi-view embedding, or GrassReg for short. GrassReg transfers the graph Laplacian obtained from each view to a point on the Grassmann manifold and penalizes the disagreement between different views according to the Grassmannian distance. Therefore, a view that is consistent with the others is given more importance than a view that disagrees with them when learning a unified subspace for multi-view data representation. In addition, we impose a group sparsity penalty on the low-dimensional embeddings so that they can better capture the group structure of the intrinsic data distribution. Empirically, we compare GrassReg with representative multi-view algorithms and show its effectiveness on a number of multi-view image data sets.
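
The view-disagreement penalty can be pictured as mapping each view's graph Laplacian to the subspace spanned by its bottom eigenvectors and measuring a Grassmannian (projection-metric) distance between views. The sketch below is an illustration under that reading; the k-NN graph construction, bandwidth, and subspace dimension are assumptions, not the paper's exact formulation.

```python
import numpy as np

def view_laplacian_subspace(X, n_neighbors=5, k=10):
    """k-NN graph Laplacian of one view, mapped to its bottom-k eigenvector subspace."""
    # Pairwise squared distances and a symmetric k-NN adjacency with Gaussian weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    W[rows, idx.ravel()] = np.exp(-d2[rows, idx.ravel()] / d2.mean())
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(1)) - W
    # Bottom-k eigenvectors of the Laplacian span a point on the Grassmannian.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

def view_disagreement(U1, U2):
    """Projection-metric distance between two views' Laplacian subspaces."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.sqrt(max(U1.shape[1] - np.sum(s ** 2), 0.0))
```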


Signal Processing | 2010

Entropy controlled Laplacian regularization for least square regression

Xinchao Wang; Dacheng Tao; Zhu Li

Least square regression (LSR) is popular in pattern classification. Compared with other matrix factorization based methods, it is simple yet efficient. However, LSR ignores unlabeled samples in the training stage, so the regression error can be large when the labeled samples are insufficient. To solve this problem, Laplacian regularization can be used to penalize LSR. Extensive theoretical and experimental results have confirmed the validity of Laplacian regularized least squares (LapRLS). However, multiple hyper-parameters are introduced to estimate the intrinsic manifold induced by the regularization, and time-consuming cross-validation must therefore be applied to tune them. To alleviate this problem, we assume the intrinsic manifold is a linear combination of a given set of known manifolds. By further assuming the priors of the given manifolds are equivalent, we introduce an entropy maximization penalty to automatically learn the linear combination coefficients. The entropy maximization trades off smoothness against complexity. The proposed model therefore enjoys the following advantages: (1) it is able to incorporate both labeled and unlabeled data into the training process, (2) it is able to learn the manifold hyper-parameters automatically, and (3) it approximates the true probability distribution with respect to prescribed test data. To test the classification performance of the proposed model, we apply it to three well-known human face datasets, i.e., FERET, ORL, and YALE. Experimental results on these three face datasets suggest the effectiveness and efficiency of the new model compared with traditional LSR and Laplacian regularized least squares.
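
As background, the Laplacian regularized least squares that the paper extends has a closed-form solution. The sketch below solves plain LapRLS with a hand-fixed convex combination of candidate Laplacians; learning those coefficients via the entropy-maximization penalty, which is the paper's contribution, is not reproduced here, and using zero targets for unlabeled rows is a simplification.

```python
import numpy as np

def laprls(X, Y, laplacians, mu, lam_a=1e-2, lam_i=1e-1):
    """
    Laplacian regularized least squares with a combined manifold.
    X: (n, d) features (labeled + unlabeled); Y: (n, c) targets
    (zero rows for unlabeled samples, a simplification of the usual masking);
    laplacians: list of (n, n) candidate graph Laplacians; mu: fixed combination weights.
    """
    n, d = X.shape
    # Combined intrinsic-manifold Laplacian (the paper learns mu automatically).
    L = sum(m * Lk for m, Lk in zip(mu, laplacians))
    A = X.T @ X + lam_a * np.eye(d) + lam_i * X.T @ L @ X
    W = np.linalg.solve(A, X.T @ Y)
    return W                                   # predict with X_new @ W
```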


Computer Vision and Pattern Recognition | 2017

On Compressing Deep Models by Low Rank and Sparse Decomposition

Xiyu Yu; Tongliang Liu; Xinchao Wang; Dacheng Tao

Deep compression refers to removing the redundancy of parameters and feature maps in deep learning models. Low-rank approximation and pruning for sparse structures play a vital role in many compression works. However, weight filters tend to be both low-rank and sparse, and neglecting either part of this structural information, as previous methods do, results in iterative retraining, compromised accuracy, and low compression rates. Here we propose a unified framework integrating the low-rank and sparse decomposition of weight matrices with feature map reconstruction. Our model includes methods such as connection pruning as special cases, and is optimized by a fast SVD-free algorithm. It has been theoretically proven that, with a small sample, our model can, owing to its generalizability, well reconstruct the feature maps on both training and test data, so that less accuracy is compromised prior to the subsequent retraining. With such a warm start for retraining, the compression method possesses several merits: (a) higher compression rates, (b) little loss of accuracy, and (c) fewer rounds to compress deep models. Experimental results on several popular models such as AlexNet, VGG-16, and GoogLeNet show that our model can significantly reduce the parameters of both convolutional and fully-connected layers. As a result, our model reduces the size of VGG-16 by 15×, better than other recent compression methods that use a single strategy.
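
To make the decomposition concrete, the sketch below splits a weight matrix into a truncated-SVD low-rank part plus a thresholded sparse residual by naive alternation. It only illustrates the idea; the paper's fast SVD-free algorithm and feature-map reconstruction term are not reproduced.

```python
import numpy as np

def low_rank_sparse_split(W, rank=8, sparsity=0.05, iters=10):
    """Approximate W ≈ L + S with rank(L) <= rank and S keeping only the largest entries."""
    S = np.zeros_like(W)
    for _ in range(iters):
        # Low-rank step: best rank-r fit of the residual via truncated SVD.
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only the largest-magnitude residual entries.
        R = W - L
        thresh = np.quantile(np.abs(R), 1.0 - sparsity)
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
L, S = low_rank_sparse_split(W)
print(np.linalg.norm(W - L - S) / np.linalg.norm(W))   # relative reconstruction error
```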


International Conference on Multimedia and Expo | 2011

Grassmann Hashing for approximate nearest neighbor search in high dimensional space

Xinchao Wang; Zhu Li; Lei Zhang; Junsong Yuan

Locality-Sensitive Hashing (LSH) approximates nearest neighbors in high dimensions by projecting the original data into low-dimensional subspaces. The basic idea is to hash data samples so that the probability of collision is much higher for samples that are close to each other than for those that are far apart. However, by applying k random hashing functions to the original data, LSH fails to find the most discriminant hashing subspaces, so the nearest neighbor approximation is inefficient. To alleviate this problem, we propose Grassmann Hashing (GRASH) for approximate nearest neighbor search in high dimensions. GRASH first introduces a set of subspace candidates from Linear Discriminant Analysis (LDA). It then applies the Grassmann metric to select the optimal subspaces for hashing. Finally, it generates hashing codes based on a non-uniform bucket size design motivated by Lloyd-Max quantization. The proposed GRASH model enjoys a number of merits: 1) GRASH introduces the Grassmann metric to measure the similarity between different hashing subspaces, so the hashing functions can better capture the data diversity; 2) GRASH obtains the subspace candidates from LDA, so it incorporates discriminant information into the hashing functions; 3) GRASH extends LSH's 1-D hashing subspaces to m-D, i.e., it is a multidimensional extension of the hashing approximation; 4) motivated by Lloyd-Max quantization, GRASH applies non-uniform bucket sizes to generate hashing codes, so the distortion can be minimized. Experimental results on a number of datasets confirm the validity of our proposed model.
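
The non-uniform bucket design can be pictured as a 1-D Lloyd-Max (k-means style) quantizer applied to the values projected onto a hashing subspace. The sketch below uses a random projection purely for illustration; the LDA candidate generation and Grassmann-metric subspace selection steps are not shown.

```python
import numpy as np

def lloyd_max_codes(values, n_buckets=8, iters=50):
    """Quantize 1-D projections into non-uniform buckets (Lloyd-Max / 1-D k-means)."""
    # Initialize centroids at evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0, 1, n_buckets + 2)[1:-1])
    for _ in range(iters):
        codes = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for b in range(n_buckets):
            if np.any(codes == b):
                centroids[b] = values[codes == b].mean()
    # Final assignment to the converged centroids.
    codes = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    return codes, centroids

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
proj = rng.normal(size=64)                       # stand-in for one hashing subspace
codes, _ = lloyd_max_codes(X @ proj)             # non-uniform hashing codes
```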


IEEE Geoscience and Remote Sensing Letters | 2017

Feature Extraction by Rotation-Invariant Matrix Representation for Object Detection in Aerial Image

Guoli Wang; Xinchao Wang; Bin Fan; Chunhong Pan

This letter proposes a novel rotation-invariant feature for object detection in optical remote sensing images. Different from previous rotation-invariant features, the proposed rotation-invariant matrix (RIM) can incorporate partial angular spatial information in addition to radial spatial information. Moreover, it can be further calculated between different rings for a redundant representation of the spatial layout. Based on the RIM, we further propose an RIM_FV_RPP feature for object detection. For an image region, we first densely extract RIM features from overlapping blocks; these RIM features are then encoded into Fisher vectors; finally, a pyramid pooling strategy that hierarchically accumulates Fisher vectors in ring subregions is used to encode richer spatial information while maintaining rotation invariance. Both the RIM and RIM_FV_RPP are rotation invariant. Experiments on airplane and car detection in optical remote sensing images demonstrate the superiority of our feature to the state of the art.
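
A rough intuition for the rotation invariance: any statistic pooled over a full concentric ring around the patch center is unchanged by in-plane rotation. The sketch below pools gradient magnitudes per ring to illustrate that principle only; it is not the RIM construction or the RIM_FV_RPP pipeline.

```python
import numpy as np

def ring_pooled_feature(patch, n_rings=4):
    """Sum gradient magnitudes inside concentric rings; invariant to in-plane rotation."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)   # radius of each pixel
    edges = np.linspace(0, r.max() + 1e-6, n_rings + 1)
    return np.array([mag[(r >= lo) & (r < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])
```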


Computer Vision and Pattern Recognition | 2017

Balanced Two-Stage Residual Networks for Image Super-Resolution

Yuchen Fan; Honghui Shi; Jiahui Yu; Ding Liu; Wei Han; Haichao Yu; Zhangyang Wang; Xinchao Wang; Thomas S. Huang

In this paper, balanced two-stage residual networks (BTSRN) are proposed for single-image super-resolution. The deep residual design with constrained depth achieves the optimal balance between accuracy and speed for super-resolving images. The experiments show that the balanced two-stage structure, together with our lightweight two-layer PConv residual block design, achieves very promising results when both accuracy and speed are considered. We evaluated our models in the New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution (NTIRE SR 2017). Our final model with only 10 residual blocks ranked among the best in terms of not only accuracy (6th among 20 final teams) but also speed (2nd among the top 6 teams in terms of accuracy). The source code for both training and evaluation is available at https://github.com/ychfan/sr_ntire2017.
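
For intuition, a lightweight two-layer residual block with a bottleneck convolution might look like the PyTorch sketch below. The channel counts and the bottleneck reading of "PConv" are assumptions; the released repository linked above contains the actual implementation.

```python
import torch
import torch.nn as nn

class TwoLayerResBlock(nn.Module):
    """A lightweight two-layer residual block (illustrative; not the exact PConv design)."""
    def __init__(self, channels=64, bottleneck=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),          # cheap channel reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)                                      # residual connection

# Stack a handful of blocks, as in a constrained-depth SR trunk.
trunk = nn.Sequential(*[TwoLayerResBlock() for _ in range(10)])
y = trunk(torch.randn(1, 64, 32, 32))
```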


IEEE Transactions on Image Processing | 2018

Learning Temporal Dynamics for Video Super-Resolution: A Deep Learning Approach

Ding Liu; Zhaowen Wang; Yuchen Fan; Xianming Liu; Zhangyang Wang; Shiyu Chang; Xinchao Wang; Thomas S. Huang

Video super-resolution (SR) aims at estimating a high-resolution video sequence from a low-resolution (LR) one. Deep learning has been successfully applied to single-image SR, demonstrating the strong capability of neural networks for modeling spatial relations within a single image; the key challenge in video SR is therefore how to efficiently and effectively exploit the temporal dependence among consecutive LR frames in addition to the spatial relation. This remains challenging because complex motion is difficult to model and can bring detrimental effects if not handled properly. We tackle the problem of learning temporal dynamics from two aspects. First, we propose a temporal adaptive neural network that can adaptively determine the optimal scale of temporal dependence. Inspired by the inception module in GoogLeNet [1], filters of various temporal scales are applied to the input LR sequence before their responses are adaptively aggregated, in order to fully exploit the temporal relation among consecutive LR frames. Second, we reduce the complexity of motion between neighboring frames using a spatial alignment network that can be trained end-to-end with the temporal adaptive network; it increases robustness to complex motion and improves efficiency compared with competing image alignment methods. We provide a comprehensive evaluation of the temporal adaptation and spatial alignment modules. We show that the temporal adaptive design considerably improves SR quality over its plain counterparts, and that the spatial alignment network attains SR performance comparable to a sophisticated optical-flow-based approach while requiring much less running time. Overall, our proposed model with learned temporal dynamics achieves state-of-the-art SR results in terms of not only spatial consistency but also temporal coherence on public video data sets. More information can be found at http://www.ifp.illinois.edu/~dingliu2/videoSR/.
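
The temporal adaptive idea can be sketched as inception-style branches that each see a different number of neighboring LR frames, with learned per-pixel weights aggregating the branch responses. The layer sizes and grayscale input below are assumptions for illustration; the spatial alignment network is omitted.

```python
import torch
import torch.nn as nn

class TemporalAdaptiveSR(nn.Module):
    """Branches over different temporal windows, adaptively weighted (illustrative sketch)."""
    def __init__(self, n_frames=5, feats=32):
        super().__init__()
        # One branch per temporal scale: windows of 1, 3, ..., n_frames consecutive frames.
        self.scales = list(range(1, n_frames + 1, 2))
        self.branches = nn.ModuleList(
            [nn.Conv2d(t, feats, kernel_size=3, padding=1) for t in self.scales]
        )
        # Predict per-pixel aggregation weights from the full frame stack.
        self.gate = nn.Conv2d(n_frames, len(self.scales), kernel_size=3, padding=1)

    def forward(self, frames):                       # frames: (B, T, H, W), grayscale
        mid = frames.shape[1] // 2
        outs = []
        for t, branch in zip(self.scales, self.branches):
            window = frames[:, mid - t // 2: mid + t // 2 + 1]
            outs.append(branch(window))
        w = torch.softmax(self.gate(frames), dim=1)  # (B, n_scales, H, W)
        return sum(w[:, i:i + 1] * o for i, o in enumerate(outs))

model = TemporalAdaptiveSR()
feat = model(torch.randn(2, 5, 32, 32))              # fused feature map, (2, 32, 32, 32)
```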


IEEE Transactions on Image Processing | 2017

Greedy Batch-Based Minimum-Cost Flows for Tracking Multiple Objects

Xinchao Wang; Bin Fan; Shiyu Chang; Zhangyang Wang; Xianming Liu; Dacheng Tao; Thomas S. Huang

Minimum-cost flow algorithms have recently achieved state-of-the-art results in multi-object tracking. However, they rely on the whole image sequence as input. When deployed in real-time applications or in distributed settings, these algorithms first operate on short batches of frames and then stitch the results into full trajectories. This decoupled strategy is prone to errors because batch-based tracking errors may propagate to the final trajectories and cannot be corrected by other batches. In this paper, we propose a greedy batch-based minimum-cost flow approach for tracking multiple objects. Unlike existing approaches that conduct batch-based tracking and stitching sequentially, we optimize consecutive batches jointly so that the tracking results on one batch may benefit the results on the other. Specifically, we apply a generalized minimum-cost flow (MCF) algorithm on each batch and generate a set of conflicting trajectories. These trajectories comprise not only the ones with high probabilities but also those with low probabilities that are potentially missed by detectors and trackers. We then apply the generalized MCF again to obtain the optimal matching between trajectories from consecutive batches. Our proposed approach is simple, effective, and does not require training. We demonstrate the power of our approach on data sets of different scenarios.
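
As a simplified picture of the per-batch step, the sketch below links detections within one batch by solving a small min-cost flow problem with networkx. The node layout, unit capacities, and integer costs are hypothetical; the paper's generalized MCF, low-probability trajectory generation, and batch-to-batch matching stage are not reproduced.

```python
import networkx as nx

def link_detections(frames, link_cost, entry_cost=50, n_tracks=2):
    """
    frames: list of lists of detection ids per frame.
    link_cost(a, b): integer cost of linking detection a to detection b in the next frame.
    Returns the min-cost flow dict routing n_tracks unit trajectories source-to-sink.
    """
    G = nx.DiGraph()
    G.add_node("S", demand=-n_tracks)
    G.add_node("T", demand=n_tracks)
    for t, dets in enumerate(frames):
        for d in dets:
            u, v = ("in", t, d), ("out", t, d)
            G.add_edge(u, v, capacity=1, weight=0)              # unit capacity per detection
            G.add_edge("S", u, capacity=1, weight=entry_cost)   # a trajectory may start here
            G.add_edge(v, "T", capacity=1, weight=entry_cost)   # a trajectory may end here
            if t + 1 < len(frames):
                for d2 in frames[t + 1]:
                    G.add_edge(v, ("in", t + 1, d2), capacity=1,
                               weight=link_cost(d, d2))          # frame-to-frame link
    return nx.min_cost_flow(G)

# Hypothetical batch: two detections per frame, linking cost = |id difference| * 10.
flow = link_detections([[0, 1], [0, 1], [0, 1]], lambda a, b: abs(a - b) * 10)
```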

Collaboration


Dive into Xinchao Wang's collaboration.

Top Co-Authors

Zhu Li | University of Missouri–Kansas City
Bin Fan | Chinese Academy of Sciences
Xianming Liu | Harbin Institute of Technology
Chunhong Pan | Chinese Academy of Sciences
Guoli Wang | Chinese Academy of Sciences
Long Lan | National University of Defense Technology