Publication


Featured research published by Zilei Wang.


International Conference on Computer Vision (ICCV) | 2015

Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks

Chunshui Cao; Xianming Liu; Yi Yang; Yinan Yu; Jiang Wang; Zilei Wang; Yongzhen Huang; Liang Wang; Chang Huang; Wei Xu; Deva Ramanan; Thomas S. Huang

While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, the human visual cortex generally contains more feedback than feedforward connections. In this paper, we briefly introduce the background of feedback connections in the human visual cortex, which motivates us to develop a computational feedback mechanism in deep neural networks. In addition to the feedforward inference of traditional neural networks, a feedback loop is introduced to infer the activation status of hidden-layer neurons according to the goal of the network, e.g., high-level semantic labels. We refer to this mechanism as "Look and Think Twice." The feedback networks help better visualize and understand how deep neural networks work, and capture visual attention on expected objects, even in images with cluttered backgrounds and multiple objects. Experiments on the ImageNet dataset demonstrate its effectiveness in tasks such as image classification and object localization.
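The feedback loop described above can be sketched as a gating process: hidden activations are masked by gates that are adjusted greedily to maximize the score of a target class. A minimal NumPy sketch under that assumption (the tiny network, random weights, and greedy coordinate update are all illustrative stand-ins, not the paper's actual optimization):

```python
import numpy as np

def feedback_gates(x, W1, W2, target, iters=5):
    """Greedily open/close gates on hidden neurons to maximize the
    target-class logit, mimicking a top-down feedback pass."""
    h = np.maximum(0, W1 @ x)          # feedforward hidden activations
    g = np.ones_like(h)                # gates start fully open
    for _ in range(iters):
        for j in range(len(h)):
            for v in (0.0, 1.0):       # try closing/opening each gate
                g_try = g.copy()
                g_try[j] = v
                if (W2 @ (g_try * h))[target] > (W2 @ (g * h))[target]:
                    g = g_try
    return g, W2 @ (g * h)             # gates and gated class scores

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(3, 6))
x = rng.normal(size=4)
g, logits = feedback_gates(x, W1, W2, target=1)
```

Because updates are only accepted when they improve the target logit, the gated score never falls below the plain feedforward score; suppressed neurons (gates at 0) indicate where top-down attention withdraws support.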


IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2016

Highway Vehicle Counting in Compressed Domain

Xu Liu; Zilei Wang; Jiashi Feng; Hongsheng Xi

This paper presents a highway vehicle counting method in the compressed domain, aiming at estimation performance approaching that of pixel-domain methods. Such a task is essentially challenging because the information available to describe vehicles in compressed videos (e.g., motion vectors) is quite limited and inaccurate, and the vehicle count in realistic traffic scenes varies greatly. To tackle this issue, we first develop a batch of low-level features, extracted from the encoding metadata of videos, to mitigate the informational insufficiency of compressed videos. We then propose a Hierarchical Classification based Regression (HCR) model to estimate the vehicle count from these features. HCR hierarchically divides traffic scenes into different cases according to vehicle density, so that the broad variation of traffic scenes can be better approximated. Finally, we evaluate the proposed method on real highway surveillance videos. The results show that our method is very competitive with pixel-domain methods, reaching similar performance at lower complexity.
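The HCR idea — route each frame to a density-specific regressor via a classifier, so each regressor only has to cover a narrow count range — can be sketched as follows. The motion-energy feature, the threshold, and the linear coefficients below are hypothetical stand-ins for the learned classifier and per-case models:

```python
class HCR:
    """Hierarchical Classification based Regression (sketch): a
    classifier picks a density case for each frame's features, then a
    case-specific regressor estimates the vehicle count."""
    def __init__(self, classify, regressors):
        self.classify = classify        # features -> density case index
        self.regressors = regressors    # one regressor per case

    def count(self, features):
        case = self.classify(features)
        return max(0.0, self.regressors[case](features))  # counts are non-negative

# Hypothetical stand-ins: route on total motion energy, then apply a
# per-case linear model whose coefficients would be fitted offline.
classify = lambda f: 0 if f["motion_energy"] < 50 else 1
regressors = [
    lambda f: 0.05 * f["motion_energy"],          # sparse-traffic case
    lambda f: 2.0 + 0.08 * f["motion_energy"],    # dense-traffic case
]
hcr = HCR(classify, regressors)
sparse_count = hcr.count({"motion_energy": 20})
dense_count = hcr.count({"motion_energy": 100})
```

Splitting by density first means neither regressor has to fit the full, highly variable count range, which is the core of the hierarchical design.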


IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2016

Deeply Exploit Depth Information for Object Detection

Saihui Hou; Zilei Wang; Feng Wu

This paper addresses the issue of how to more effectively coordinate depth with RGB to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. First, we propose that depth can be utilized not only as extra information besides RGB but also to derive more visual properties that comprehensively describe the objects of interest. A two-stage learning framework consisting of property derivation and fusion is thus constructed. Here the properties can be derived either from the provided color/depth or from their pairs (e.g., the geometry contour adopted in this paper). Second, we explore how different properties should be fused in feature learning, which boils down to, under the CNN model, the question of at which layer the properties should be fused. The analysis shows that different semantic properties should be learned separately and combined before passing into the final classifier. Such a detection scheme is actually in accordance with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.
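The fusion conclusion above — learn each property stream separately, combine only before the final classifier — can be sketched schematically. The single-matrix "streams", the three properties, and all dimensions are illustrative placeholders for the paper's CNN towers:

```python
import numpy as np

def property_stream(x, W):
    """One per-property feature extractor (a stand-in for a CNN tower)."""
    return np.maximum(0, W @ x)

def late_fusion_detect(rgb, depth, contour, Ws, W_cls):
    """Learn each semantic property in its own stream, then concatenate
    the stream outputs and pass the fused vector to one final classifier."""
    feats = [property_stream(x, W) for x, W in zip((rgb, depth, contour), Ws)]
    fused = np.concatenate(feats)       # fusion happens only here, late
    return W_cls @ fused                # class scores

rng = np.random.default_rng(1)
rgb, depth, contour = (rng.normal(size=8) for _ in range(3))
Ws = [rng.normal(size=(5, 8)) for _ in range(3)]   # three property streams
W_cls = rng.normal(size=(4, 15))                   # classifier over fused 3x5 features
scores = late_fusion_detect(rgb, depth, contour, Ws, W_cls)
```

The design choice encoded here is that each stream specializes on one property before any interaction, mirroring the paper's finding that early fusion of semantically different properties hurts.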


Neurocomputing | 2018

Object detection via deeply exploiting depth information

Saihui Hou; Zilei Wang; Feng Wu

This paper addresses the issue of how to more effectively coordinate depth with RGB to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. First, we propose that depth can be utilized not only as extra information besides RGB but also to derive more visual properties that comprehensively describe the objects of interest. A two-stage learning framework consisting of property derivation and fusion is then constructed. Here the properties can be derived either from the provided color/depth or from their pairs (e.g., the geometry contour). Second, we explore the fusion of different properties in feature learning, which boils down to, under the CNN model, the question of at which layer the properties should be fused. The analysis shows that different semantic properties should be learned separately and combined before passing into the final classifier. Such a detection scheme is actually in accordance with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on the challenging NYUD2 and SUN RGB-D datasets, and achieve remarkable performance on both that outperforms the baselines.


IEEE Transactions on Multimedia | 2017

Background-Driven Salient Object Detection

Zilei Wang; Dao Xiang; Saihui Hou; Feng Wu

Background information is a significant prior for salient object detection, especially when images contain cluttered backgrounds and diverse object parts. In this paper, we propose a background-driven salient object detection (BD-SOD) method that more comprehensively exploits the background prior, aiming to generate more accurate and robust saliency maps. Specifically, we first exploit the background prior for saliency estimation, i.e., computing regional saliency values. In this stage, the background prior is used in three ways: restricting the reference regions to only background regions, weighting the contribution of reference regions, and leveraging the importance of different features. Benefiting from such explicit utilization, the proposed model can greatly mitigate the negative interference of cluttered backgrounds and diverse object parts. We then embed the background prior into the optimization graph for saliency refinement. Specifically, two virtual supernodes (representing the background and foreground, respectively) are introduced with extra connections, and nonlocal feature connections between similar regions are also set up. These connections strengthen the optimization graph against perturbations from diverse parts, and thus help achieve uniform saliency values. Finally, we provide systematic studies of the effectiveness of BD-SOD in exploiting the valuable background prior. Experimental results on multiple public benchmark datasets, including MSRA-1000, THUS-10000, PASCAL-S, and ECSSD, clearly show that BD-SOD consistently outperforms well-established baselines and achieves state-of-the-art performance.
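The first stage above — scoring each region only against background reference regions, with weighted contributions — can be sketched as a weighted feature contrast. The region features, background mask, and per-region weights below are illustrative inputs, not the paper's actual region segmentation or learned weighting:

```python
import numpy as np

def bd_saliency(features, is_background, weights):
    """Regional saliency as weighted contrast to background regions only:
    regions that differ most from the weighted background score highest."""
    bg = features[is_background]            # restrict references to background
    w = weights[is_background]
    w = w / w.sum()                         # weight each background region's vote
    # pairwise feature distance from every region to every background region
    contrast = np.linalg.norm(features[:, None, :] - bg[None, :, :], axis=2)
    s = contrast @ w                        # weighted contrast per region
    return (s - s.min()) / (np.ptp(s) + 1e-12)   # normalize to [0, 1]

# Toy example: region 0 stands out; regions 1-3 resemble the border background.
features = np.array([[0.9, 0.1], [0.1, 0.9], [0.12, 0.88], [0.15, 0.9]])
is_background = np.array([False, True, True, True])
weights = np.array([1.0, 1.0, 0.8, 0.6])
saliency = bd_saliency(features, is_background, weights)
```

Restricting the reference set to background regions is what prevents diverse foreground parts from diluting each other's saliency, which is the stated motivation for the prior.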


IEEE Transactions on Multimedia | 2018

Software-Defined Multimedia Streaming System Aided By Variable-Length Interval In-Network Caching

Jian Yang; Zhen Yao; Bowen Yang; Xiaobin Tan; Zilei Wang; Quan Zheng

The explosive growth of video traffic incurs a high percentage of redundancy in today's Internet, following the 80–20 rule. Fortunately, in-network caching is considered an effective scheme for eliminating repetitive traffic by caching popular content in network nodes. Besides, the emerging software-defined networking (SDN) enables centralized control and management, as well as collaboration between network devices and upper-layer applications. Moreover, Network Functions Virtualization (NFV) supports customized network functions, including caching and streaming. This inspires us to design an SDN-assisted multimedia streaming Video-on-Demand system, integrating an in-network cache, to improve quality of service. The designed architecture is capable of reducing redundant traffic via reusable duplications, and can achieve greater performance gains by deploying specific scheduling policies. We further propose a variable-length interval cache strategy for RTP streaming, which self-adaptively adjusts the size of cached video segments based on their access patterns. Our goal is to efficiently utilize limited storage resources and increase the cache hit ratio. We present a theoretical analysis to demonstrate the attainable performance of the proposed algorithm; furthermore, the integrated system design is implemented as a prototype to show its feasibility and applicability. Finally, emulation experiments are conducted to evaluate the achievable performance improvement more comprehensively.
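The variable-length interval strategy — growing or shrinking each cached segment according to how it is accessed — can be sketched with a toy cache. The grow-on-hit/shrink-coldest rule and the fixed step size below are hypothetical, not the paper's actual algorithm:

```python
class IntervalCache:
    """Toy in-network cache that adapts the cached interval length of each
    video to its access pattern: popular prefixes grow, and space is
    reclaimed from the longest entry when capacity is exceeded."""
    def __init__(self, capacity, step=10):
        self.capacity = capacity        # total seconds of video we may store
        self.step = step                # seconds grown/shrunk per adjustment
        self.cached = {}                # video id -> cached prefix length (s)

    def used(self):
        return sum(self.cached.values())

    def request(self, video, position):
        hit = position < self.cached.get(video, 0)
        if hit:                          # popular: try to cache a longer prefix
            if self.used() + self.step <= self.capacity:
                self.cached[video] += self.step
        else:                            # miss: keep/start a short prefix...
            self.cached.setdefault(video, self.step)
            # ...and reclaim space from the longest entry if over capacity
            while self.used() > self.capacity:
                longest = max(self.cached, key=self.cached.get)
                self.cached[longest] -= self.step
                if self.cached[longest] <= 0:
                    del self.cached[longest]
        return hit

cache = IntervalCache(capacity=30)
cache.request("a", 0)    # cold miss caches a 10 s prefix of "a"
cache.request("a", 5)    # hit within the prefix grows it to 20 s
```

The point of the variable length is visible even in this toy: storage migrates toward the segments that actually produce hits instead of being split evenly per video.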


IEEE Transactions on Circuits and Systems for Video Technology | 2017

Compressed-domain Highway Vehicle Counting By Spatial and Temporal Regression

Zilei Wang; Xu Liu; Jiashi Feng; Jian Yang; Hongsheng Xi

Counting on-road vehicles on highways is fundamental to intelligent transportation management. This paper presents the first highway vehicle counting method in the compressed domain, aiming at estimation performance comparable to pixel-domain methods. Counting in the compressed domain is rather challenging due to the limited information about vehicles and the large variance in vehicle numbers. To address this problem, we develop new low-level features, easily extracted from the coding-related metadata, to mitigate the insufficiency of information in compressed videos. We then propose a hierarchical classification-based regression (HCR) model to estimate the number of vehicles in each frame from the compressed-domain low-level features. HCR hierarchically divides traffic scenes into different cases according to vehicle density, so that the large variance of traffic scenes can be effectively captured. Besides the spatial regression in each frame, we propose a locally temporal regression model that exploits the continuous variation of traffic flow to further refine the counting results. We extensively evaluate the proposed method on real highway surveillance videos. The experimental results consistently show that the proposed method is very competitive with pixel-domain methods, reaching similar performance at much lower computational cost.
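The locally temporal refinement — exploiting the smooth variation of traffic flow to correct noisy per-frame estimates — can be sketched as a local linear fit over a sliding window. The window size and the plain least-squares form are illustrative choices, not the paper's exact regression model:

```python
import numpy as np

def temporal_refine(counts, window=5):
    """Refine noisy per-frame vehicle counts by fitting a line to each
    frame's local temporal neighborhood and taking the fitted value,
    relying on traffic flow varying continuously over time."""
    counts = np.asarray(counts, dtype=float)
    refined = np.empty_like(counts)
    half = window // 2
    for t in range(len(counts)):
        lo, hi = max(0, t - half), min(len(counts), t + half + 1)
        ts = np.arange(lo, hi)
        slope, intercept = np.polyfit(ts, counts[lo:hi], deg=1)  # local fit
        refined[t] = slope * t + intercept
    return refined

noisy = [10, 11, 30, 12, 13, 14, 15]     # frame 2 is an outlier estimate
smooth = temporal_refine(noisy)
```

An isolated spike like the 30 at frame 2 is pulled back toward its temporal neighbors, while a genuine gradual trend in the counts passes through largely unchanged.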


International Conference on Computer Vision (ICCV) | 2017

DualNet: Learn Complementary Features for Image Recognition

Saihui Hou; Xu Liu; Zilei Wang


AAAI Conference on Artificial Intelligence | 2018

Lateral Inhibition-inspired Convolutional Neural Network for Visual Attention and Saliency Detection

Chunshui Cao; Yongzhen Huang; Zilei Wang; Liang Wang; Ninglong Xu; Tieniu Tan


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Feedback Convolutional Neural Network for Visual Localization and Segmentation

Chunshui Cao; Yongzhen Huang; Yi Yang; Liang Wang; Zilei Wang; Tieniu Tan

Collaboration


Dive into Zilei Wang's collaborations.

Top Co-Authors

Saihui Hou, University of Science and Technology of China
Chunshui Cao, University of Science and Technology of China
Feng Wu, University of Science and Technology of China
Liang Wang, Chinese Academy of Sciences
Yongzhen Huang, Chinese Academy of Sciences
Hongsheng Xi, University of Science and Technology of China
Jian Yang, University of Science and Technology of China
Tieniu Tan, Chinese Academy of Sciences
Jiashi Feng, National University of Singapore
Yi Yang, University of California