Zhangyang Wang
Texas A&M University
Publications
Featured research published by Zhangyang Wang.
ACM Multimedia | 2016
Jiahui Yu; Yuning Jiang; Zhangyang Wang; Zhimin Cao; Thomas S. Huang
In current object detection systems, deep convolutional neural networks (CNNs) are used to predict the bounding boxes of object candidates, and have gained performance advantages over traditional region proposal methods. However, existing deep CNN methods treat the object bounds as four independent variables, each regressed separately with an l2 loss. This oversimplification contradicts the well-established observation that these variables are correlated, and it leads to less accurate localization. To address the issue, we first introduce a novel Intersection over Union (IoU) loss function for bounding box prediction, which regresses the four bounds of a predicted box as a whole unit. Combining the IoU loss with deep fully convolutional networks, we introduce UnitBox, which localizes objects accurately and efficiently, is robust to objects of varied shapes and scales, and converges quickly. We apply UnitBox to face detection and achieve the best performance among all published methods on the FDDB benchmark.
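For concreteness, a minimal numpy sketch of the -ln(IoU) objective described above, using the UnitBox parameterization in which each location predicts its distances to the four box edges; the variable names and epsilon guard are our own illustration.

```python
import numpy as np

def iou_loss(pred, target, eps=1e-9):
    """Mean -ln(IoU) for boxes given as (top, bottom, left, right)
    distances from a location to the four box edges, as in UnitBox.
    pred, target: arrays of shape (N, 4) with non-negative entries."""
    t, b, l, r = pred.T
    tg, bg, lg, rg = target.T
    area_p = (t + b) * (l + r)                    # predicted box area
    area_g = (tg + bg) * (lg + rg)                # ground-truth box area
    ih = np.minimum(t, tg) + np.minimum(b, bg)    # intersection height
    iw = np.minimum(l, lg) + np.minimum(r, rg)    # intersection width
    inter = ih * iw
    union = area_p + area_g - inter
    iou = inter / (union + eps)
    return float(np.mean(-np.log(iou + eps)))     # all four bounds coupled
```

Because the four bounds enter the loss jointly through the intersection and union areas, their correlation is respected, unlike four independent l2 terms.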
Computer Vision and Pattern Recognition | 2015
Zhangyang Wang; Yingzhen Yang; Zhaowen Wang; Shiyu Chang; Wei Han; Jianchao Yang; Thomas S. Huang
Deep learning has been successfully applied to image super-resolution (SR). In this paper, we propose a deep joint super resolution (DJSR) model to exploit both external and self similarities for SR. A Stacked Denoising Convolutional Auto Encoder (SDCAE) is first pre-trained on external examples with proper data augmentation. It is then fine-tuned with multi-scale self examples from each input, where the reliability of the self examples is explicitly taken into account. We further enhance the model through sub-model training and selection. The DJSR model is extensively evaluated against state-of-the-art methods and shows noticeable improvements, both quantitatively and perceptually, on a wide range of images.
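The denoising pre-training stage can be sketched as follows; this is a hedged, minimal PyTorch illustration of the idea (corrupt the input, reconstruct the clean target), with layer sizes and noise level chosen by us, not taken from the paper.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Toy stacked convolutional auto-encoder (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pretrain_step(model, clean, optimizer, noise_std=0.1):
    noisy = clean + noise_std * torch.randn_like(clean)  # corrupt input
    loss = nn.functional.mse_loss(model(noisy), clean)   # reconstruct clean
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```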
Computer Vision and Pattern Recognition | 2017
Radu Timofte; Eirikur Agustsson; Luc Van Gool; Ming-Hsuan Yang; Lei Zhang; Bee Lim; Sanghyun Son; Heewon Kim; Seungjun Nah; Kyoung Mu Lee; Xintao Wang; Yapeng Tian; Ke Yu; Yulun Zhang; Shixiang Wu; Chao Dong; Liang Lin; Yu Qiao; Chen Change Loy; Woong Bae; Jaejun Yoo; Yoseob Han; Jong Chul Ye; Jae Seok Choi; Munchurl Kim; Yuchen Fan; Jiahui Yu; Wei Han; Ding Liu; Haichao Yu
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image), with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 used unknown downscaling operators (blur kernel and decimation) that are learnable from paired low- and high-resolution training images. Each competition had about 100 registered participants, and 20 teams competed in the final testing phase. Together, the results gauge the state of the art in single image super-resolution.
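As a point of reference, the Track 1 protocol amounts to synthesizing each LR input by bicubic downscaling of its HR counterpart. A minimal sketch with Pillow, assuming integer magnification factors as in the challenge:

```python
from PIL import Image

def make_lr(hr_path, s):
    """Bicubic-downscale an HR image by integer factor s (e.g. 2, 3, 4)."""
    hr = Image.open(hr_path)
    w, h = hr.size
    return hr.resize((w // s, h // s), Image.BICUBIC)
```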
IEEE Transactions on Image Processing | 2015
Zhangyang Wang; Yingzhen Yang; Zhaowen Wang; Shiyu Chang; Jianchao Yang; Thomas S. Huang
Single image super-resolution (SR) aims to estimate a high-resolution (HR) image from a low-resolution (LR) input. Image priors are commonly learned to regularize the otherwise seriously ill-posed SR problem, using either external LR-HR pairs or internal similar patterns. We propose joint SR to adaptively combine the advantages of both external and internal SR methods. We define two loss functions, one using sparse coding over external examples and one using epitomic matching over internal examples, together with an adaptive weight that automatically balances their contributions according to their reconstruction errors. Extensive SR results demonstrate the effectiveness of the proposed method over existing state-of-the-art methods, which is further verified by our subjective evaluation study.
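The adaptive fusion idea can be sketched as follows: combine the externally and internally super-resolved estimates of a patch, weighting each inversely to its reconstruction error. The softmax-style weighting and the `alpha` temperature are our illustration, not the paper's exact formula.

```python
import numpy as np

def fuse(ext_patch, int_patch, err_ext, err_int, alpha=10.0):
    """Blend two SR estimates; the one with smaller reconstruction
    error gets the larger weight."""
    w = np.exp(-alpha * np.array([err_ext, err_int]))
    w /= w.sum()                          # convex weights summing to 1
    return w[0] * ext_patch + w[1] * int_patch
```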
Computer Vision and Pattern Recognition | 2016
Zhangyang Wang; Shiyu Chang; Yingzhen Yang; Ding Liu; Thomas S. Huang
Visual recognition research often assumes a sufficient resolution of the region of interest (ROI). This assumption is usually violated in practice, which motivates us to explore the Very Low Resolution Recognition (VLRR) problem. Typically, the ROI in a VLRR problem can be smaller than 16 × 16 pixels and is challenging to recognize even for human experts. We attempt to solve the VLRR problem using deep learning methods. Drawing on techniques from super-resolution, domain adaptation, and robust regression, we formulate a dedicated deep learning method and demonstrate how these techniques are incorporated step by step. Any extra complexity, when introduced, is fully justified by both analysis and simulation results. The resulting Robust Partially Coupled Networks achieve feature enhancement and recognition simultaneously, offering both the flexibility to combat the LR-HR domain mismatch and robustness to outliers. Finally, the proposed models are evaluated on three different VLRR tasks, face identification, digit recognition, and font recognition, and obtain very strong performance on all three.
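A hedged sketch of what "partial coupling" can look like: the LR and HR inputs get their own front-end layers while later recognition layers are shared, so features from both domains are pulled into a common space without forcing every layer to be identical. Layer shapes and the two-branch routing are our placeholders, not the paper's architecture.

```python
import torch.nn as nn

class PartiallyCoupledNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # domain-specific (uncoupled) front ends
        self.lr_front = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.hr_front = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        # shared (coupled) recognition layers
        self.shared = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x, domain="lr"):
        front = self.lr_front if domain == "lr" else self.hr_front
        return self.shared(front(x))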
IEEE Transactions on Geoscience and Remote Sensing | 2015
Zhangyang Wang; Nasser M. Nasrabadi; Thomas S. Huang
We present a semisupervised method for single-pixel classification of hyperspectral images. The proposed method is designed to address the characteristic difficulties of hyperspectral images, namely the high dimensionality of hyperspectral pixels, the scarcity of labeled samples, and the spatial variability of spectral signatures. To alleviate these problems, the method features the following components. First, being a semisupervised approach, it exploits the wealth of unlabeled samples in the image by evaluating the confidence probability of the predicted label for each unlabeled sample. Second, we jointly optimize the classifier parameters and the dictionary atoms through a task-driven formulation, ensuring that the learned features (sparse codes) are optimal for the trained classifier. Finally, it incorporates spatial information by adding a Laplacian smoothness regularization to the output of the classifier, rather than to the sparse codes, making the spatial constraint more flexible. The proposed method is compared with several representative methods on several popular data sets and produces significantly better classification results.
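The spatial regularizer can be written compactly: a graph Laplacian L over neighboring pixels penalizes disagreement in the classifier outputs F (n_pixels × n_classes) via tr(FᵀLF). The minimal numpy sketch below assumes some spatial affinity matrix W; that choice is ours, not the paper's.

```python
import numpy as np

def laplacian_penalty(F, W):
    """Smoothness penalty tr(F^T L F) on classifier outputs F,
    where L = D - W is the unnormalized graph Laplacian of the
    spatial affinity matrix W (n_pixels x n_pixels)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(F.T @ L @ F)
```

Applying the penalty to the classifier output rather than the sparse codes, as the abstract notes, leaves the codes free while still encouraging spatially coherent label predictions.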
ACM Multimedia | 2015
Zhangyang Wang; Jianchao Yang; Hailin Jin; Eli Shechtman; Aseem Agarwala; Jonathan Brandt; Thomas S. Huang
As fonts are one of the core design concepts, automatic font identification and similar-font suggestion from an image or photo have long been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem [4] and advance the state of the art remarkably by developing the DeepFont system. First, we build the first available large-scale VFR dataset, named AdobeVFR, consisting of both labeled synthetic data and partially labeled real-world data. Next, to combat the domain mismatch between the available training and testing data, we introduce a Convolutional Neural Network (CNN) decomposition approach: a domain adaptation technique based on a Stacked Convolutional Auto-Encoder (SCAE) that exploits a large corpus of unlabeled real-world text images combined with synthetic data preprocessed in a specific way. Moreover, we study a novel learning-based model compression approach to reduce the DeepFont model size without sacrificing its performance. The DeepFont system achieves a top-5 accuracy higher than 80% on our collected dataset and also produces a good font similarity measure for font selection and suggestion. We further compress the model by around 6× without any visible loss of recognition accuracy.
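One common route to such learning-based compression is low-rank factorization of a layer's weight matrix; whether this matches DeepFont's exact scheme is an assumption on our part, but the sketch illustrates the parameter-count trade-off the abstract describes.

```python
import numpy as np

def low_rank_compress(W, k):
    """Replace a dense weight matrix W (m x n) with a rank-k
    factorization, cutting parameters from m*n to k*(m+n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]   # factors (U', V) with W ~= U' @ V
```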
International Conference on Computer Vision | 2017
Ding Liu; Zhaowen Wang; Yuchen Fan; Xianming Liu; Zhangyang Wang; Shiyu Chang; Thomas S. Huang
Video super-resolution (SR) aims to generate a high-resolution (HR) frame from multiple low-resolution (LR) frames in a local temporal window. The inter-frame temporal relation is as crucial as the intra-frame spatial relation for tackling this problem. However, how to utilize temporal information efficiently and effectively remains challenging, since complex motion is difficult to model and can introduce adverse effects if not handled properly. We address this problem from two aspects. First, we propose a temporal adaptive neural network that can adaptively determine the optimal scale of temporal dependency: filters on various temporal scales are applied to the input LR sequence, and their responses are adaptively aggregated. Second, we reduce the complexity of motion between neighboring frames using a spatial alignment network that is much more robust and efficient than competing alignment methods and can be jointly trained with the temporal adaptive network in an end-to-end manner. Our proposed models with learned temporal dynamics are systematically evaluated on public video datasets and achieve state-of-the-art SR results compared with other recent video SR approaches. Both the temporal adaptation and spatial alignment modules are demonstrated to considerably improve SR quality over their plain counterparts.
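A hedged PyTorch sketch of the temporal-adaptation idea: run branches over windows of different temporal extents and blend their outputs with pixel-wise softmax weights from a small network. The single-conv branches, weight network, and window selection are our placeholders (a real version would center each window on the reference frame), not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalAdaptiveSR(nn.Module):
    def __init__(self, scales=(1, 3, 5)):
        super().__init__()
        # one branch per temporal extent; each treats its window's
        # frames as input channels (illustrative convs only)
        self.branches = nn.ModuleList(
            nn.Conv2d(s, 1, 3, padding=1) for s in scales)
        # small net producing one pixel-wise weight map per branch
        self.weight_net = nn.Conv2d(max(scales), len(scales), 3, padding=1)

    def forward(self, frames):        # frames: (B, T, H, W) with T = 5
        outs = [b(frames[:, :b.in_channels]) for b in self.branches]
        w = torch.softmax(self.weight_net(frames), dim=1)  # per-pixel weights
        return sum(w[:, i:i + 1] * o for i, o in enumerate(outs))
```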
British Machine Vision Conference | 2014
Yingzhen Yang; Zhangyang Wang; Jianchao Yang; Jiawei Han; Thomas S. Huang
l1-Graph has proven effective for data clustering: it partitions the data space using the sparse representation of the data as the similarity measure. However, the sparse representation is computed for each datum independently, without taking the geometric structure of the data into account. Motivated by l1-Graph and manifold learning, we propose Regularized l1-Graph (Rl1-Graph) for data clustering. Compared to l1-Graph, the sparse representations of Rl1-Graph are regularized by the geometric information of the data. In accordance with the manifold assumption, the sparse representations vary smoothly along the geodesics of the data manifold, via a graph Laplacian constructed from the sparse codes. Experimental results on various data sets demonstrate the superiority of our algorithm over l1-Graph and other competing clustering methods.
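Schematically, the objective is l1-Graph's self-expressive sparse coding plus a Laplacian smoothness term over the codes. The LaTeX sketch below is our paraphrase, where A stacks the sparse codes column-wise; λ, γ, and the exact Laplacian construction are schematic rather than the paper's precise formulation.

```latex
% Schematic Rl1-Graph objective (our paraphrase): self-expressive sparse
% coding over the data X, plus Laplacian smoothness on the codes A, with
% the Laplacian L itself built from the sparse codes.
\min_{A}\; \|X - XA\|_F^2 \;+\; \lambda \|A\|_1
        \;+\; \gamma\,\mathrm{tr}\!\left(A L A^{\top}\right)
\quad \text{s.t.}\ \operatorname{diag}(A) = 0
```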
SIAM International Conference on Data Mining | 2016
Zhangyang Wang; Shiyu Chang; Jiayu Zhou; Meng Wang; Thomas S. Huang
While sparse coding-based clustering methods have been shown to be successful, bottlenecks in both efficiency and scalability limit their practical use. In recent years, deep learning has proven to be a highly effective, efficient, and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a carefully crafted deep model that benefits from both. A feed-forward network structure, named TAGnet, is constructed based on a graph-regularized sparse coding algorithm. It is then trained end to end with task-specific loss functions. We discover that connecting deep learning to sparse coding benefits not only the model performance but also its initialization and interpretation. Moreover, by introducing auxiliary clustering tasks into the intermediate feature hierarchy, we formulate DTAGnet and obtain a further performance boost. Extensive experiments demonstrate that the proposed model outperforms several state-of-the-art methods by remarkable margins.
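The unrolling idea behind such feed-forward constructions can be illustrated with a LISTA-style sketch: a fixed number of iterative soft-thresholding steps are rewritten as layers with learnable matrices and thresholds, then trained end to end. This is our simplification; it omits TAGnet's graph-regularization term, and the layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class UnrolledSparseCoder(nn.Module):
    """LISTA-style unrolled sparse coding (illustrative, without the
    graph-regularization term of TAGnet)."""
    def __init__(self, dim_in, dim_code, n_layers=3, theta=0.1):
        super().__init__()
        self.W = nn.Linear(dim_in, dim_code, bias=False)    # input transform
        self.S = nn.Linear(dim_code, dim_code, bias=False)  # recurrent matrix
        self.theta = nn.Parameter(torch.full((dim_code,), theta))
        self.n_layers = n_layers

    def forward(self, x):
        b = self.W(x)
        z = torch.zeros_like(b)
        for _ in range(self.n_layers):   # unrolled ISTA iterations
            u = b + self.S(z)
            z = torch.sign(u) * torch.relu(u.abs() - self.theta)  # soft-threshold
        return z
```

Because every layer corresponds to one sparse-coding iteration, initializing W, S, and theta from a pre-learned dictionary gives the interpretable warm start the abstract alludes to.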