Wenjun Zeng
Microsoft
Publications
Featured research published by Wenjun Zeng.
European Conference on Computer Vision | 2016
Yanghao Li; Cuiling Lan; Junliang Xing; Wenjun Zeng; Chunfeng Yuan; Jiaying Liu
Human action recognition from well-segmented 3D skeleton data has been intensively studied and has attracted increasing attention. Online action detection goes one step further and is more challenging: it identifies the action type and localizes the action positions on the fly from untrimmed streaming data. In this paper, we study the problem of online action detection from streaming skeleton data. We propose a multi-task, end-to-end Joint Classification-Regression Recurrent Neural Network to better explore the action type and temporal localization information. By employing a joint classification and regression optimization objective, this network is capable of automatically localizing the start and end points of actions more accurately. Specifically, by leveraging the merits of the deep Long Short-Term Memory (LSTM) subnetwork, the proposed model automatically captures the complex long-range temporal dynamics, which naturally avoids the typical sliding-window design and thus ensures high computational efficiency. Furthermore, the regression subtask provides the ability to forecast the action prior to its occurrence. To evaluate our proposed model, we build a large annotated streaming video dataset. Experimental results on our dataset and the public G3D dataset both demonstrate very promising performance of our scheme.
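As an illustration of the joint classification-regression idea described above, here is a minimal PyTorch sketch. The layer sizes, the extra background class, the sigmoid start/end confidence targets, and the loss weight `lam` are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a joint classification-regression recurrent network
# for streaming skeleton data. Layer sizes, the soft start/end regression
# target, and the loss weight `lam` are illustrative assumptions.
import torch
import torch.nn as nn

class JointClsRegRNN(nn.Module):
    def __init__(self, in_dim=75, hidden=128, num_classes=21):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=3, batch_first=True)
        self.cls_head = nn.Linear(hidden, num_classes + 1)  # +1 for a background class
        self.reg_head = nn.Linear(hidden, 2)                # start / end confidences

    def forward(self, x):                        # x: (batch, time, in_dim)
        h, _ = self.lstm(x)
        return self.cls_head(h), torch.sigmoid(self.reg_head(h))

def joint_loss(cls_logits, reg_pred, labels, reg_target, lam=1.0):
    # Per-frame classification plus regression of soft start/end confidences.
    cls_loss = nn.functional.cross_entropy(
        cls_logits.reshape(-1, cls_logits.size(-1)), labels.reshape(-1))
    reg_loss = nn.functional.mse_loss(reg_pred, reg_target)
    return cls_loss + lam * reg_loss

# Toy usage: 2 streams, 100 frames, 25 joints x 3 coordinates per frame.
model = JointClsRegRNN()
x = torch.randn(2, 100, 75)
labels = torch.randint(0, 22, (2, 100))
reg_target = torch.rand(2, 100, 2)
cls_logits, reg_pred = model(x)
loss = joint_loss(cls_logits, reg_pred, labels, reg_target)
loss.backward()
```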
International Conference on Multimedia and Expo | 2016
Dong Liu; Lizhi Wang; Li Li; Zhiwei Xiong; Feng Wu; Wenjun Zeng
We propose a pseudo-sequence-based scheme for light field image compression. In our scheme, the raw image captured by a light field camera is decomposed into multiple views according to the lenslet array of that camera. These views constitute a video-like pseudo sequence, and the redundancy between views is exploited by a video encoder. The specific coding order of views, prediction structure, and rate allocation have been investigated for encoding the pseudo sequence. Experimental results show the superior performance of our scheme, which achieves up to a 6.6 dB gain compared with directly encoding the raw image with legacy JPEG.
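A minimal sketch of the decomposition step described above: split the raw lenslet image into sub-aperture views and serialize them as a pseudo video sequence. The regular u x v sampling and the simple serpentine scan are simplifying assumptions; the paper investigates the coding order, prediction structure, and rate allocation in far more detail.

```python
# Sketch: decompose a lenslet image into sub-aperture views and serialize
# them as a pseudo video sequence. The regular u x v grid and the
# serpentine scan order are simplifying assumptions.
import numpy as np

def lenslet_to_views(raw, u, v):
    """Split a raw lenslet image into a u x v grid of sub-aperture views."""
    return [[raw[i::u, j::v] for j in range(v)] for i in range(u)]

def serpentine_order(views):
    """Flatten the view grid row by row, reversing every other row so that
    neighbouring frames in the pseudo sequence stay spatially adjacent."""
    seq = []
    for i, row in enumerate(views):
        seq.extend(row if i % 2 == 0 else row[::-1])
    return seq

raw = np.random.rand(8 * 434, 8 * 625)                 # toy 8x8 lenslet sampling
pseudo_sequence = serpentine_order(lenslet_to_views(raw, 8, 8))
print(len(pseudo_sequence), pseudo_sequence[0].shape)  # 64 frames of 434x625
```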
Computer Vision and Pattern Recognition | 2015
Lizhi Wang; Zhiwei Xiong; Dahua Gao; Guangming Shi; Wenjun Zeng; Feng Wu
We propose a novel dual-camera design to acquire 4D high-speed hyperspectral (HSHS) videos with high spatial and spectral resolution. Our work has two key technical contributions. First, we build a dual-camera system that simultaneously captures a panchromatic video at a high frame rate and a hyperspectral video at a low frame rate, which jointly provide reliable projections for the underlying HSHS video. Second, we exploit the panchromatic video to learn an over-complete 3D dictionary to represent each band-wise video sparsely, and a robust computational reconstruction is then employed to recover the HSHS video based on the joint videos and the self-learned dictionary. Experimental results demonstrate that, for the first time to our knowledge, the hyperspectral video frame rate reaches up to 100 fps with decent quality, even when the incident light is not strong.
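To make the self-learned-dictionary idea concrete, here is a toy sketch using scikit-learn: learn an over-complete dictionary from panchromatic patches, then sparse-code measurement patches against it. The 2D patch size, dictionary size, and sparsity level are assumptions; the actual system works on 3D spatio-temporal patches within a full computational reconstruction, which this example omits.

```python
# Sketch of the self-learned dictionary idea: learn an over-complete patch
# dictionary from panchromatic frames, then sparse-code measurement patches
# against it. Patch size, dictionary size, and sparsity level are
# illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
pan_patches = rng.standard_normal((2000, 64))   # toy 8x8 patches from panchromatic frames
meas_patches = rng.standard_normal((100, 64))   # toy patches to be reconstructed

learner = DictionaryLearning(n_components=128, transform_algorithm='omp',
                             transform_n_nonzero_coefs=8, max_iter=20, random_state=0)
D = learner.fit(pan_patches).components_        # 128 x 64 over-complete dictionary

codes = sparse_encode(meas_patches, D, algorithm='omp', n_nonzero_coefs=8)
reconstruction = codes @ D                      # sparse approximation of the patches
```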
IEEE Transactions on Image Processing | 2016
Hao Wu; Xiaoyan Sun; Jingyu Yang; Wenjun Zeng; Feng Wu
The explosion of digital photos has posed a significant challenge to photo storage and transmission for both personal devices and cloud platforms. In this paper, we propose a novel lossless compression method to further reduce the size of a set of JPEG coded correlated images without any loss of information. The proposed method jointly removes inter/intra image redundancy in the feature, spatial, and frequency domains. For each collection, we first organize the images into a pseudo video by minimizing the global prediction cost in the feature domain. We then present a hybrid disparity compensation method to better exploit both the global and local correlations among the images in the spatial domain. Furthermore, the redundancy between each compensated signal and the corresponding target image is adaptively reduced in the frequency domain. Experimental results demonstrate the effectiveness of the proposed lossless compression method. Compared with the JPEG coded image collections, our method achieves average bit savings of more than 31%.
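The feature-domain ordering step can be illustrated with a small sketch: approximate pairwise prediction costs by distances between global features, build a minimum spanning tree over the collection, and traverse it to obtain a pseudo-video coding order. The colour-histogram feature and the MST-plus-DFS traversal are simplifying assumptions rather than the paper's exact procedure.

```python
# Sketch of the feature-domain ordering step: approximate the prediction
# cost between two images by the distance between global feature vectors,
# build a minimum spanning tree over the collection, and traverse it to
# obtain a pseudo-video coding order.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, depth_first_order
from scipy.spatial.distance import pdist, squareform

def histogram_feature(img, bins=16):
    """Global colour histogram as a cheap stand-in for the feature domain."""
    h = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    return np.concatenate(h).astype(float) / img[..., 0].size

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (120, 160, 3)) for _ in range(6)]   # toy collection
feats = np.stack([histogram_feature(im) for im in images])

cost = squareform(pdist(feats))          # pairwise "prediction cost" estimate
mst = minimum_spanning_tree(cost)        # globally cheapest prediction structure
order, _ = depth_first_order(mst, i_start=0, directed=False)
print("coding order:", order)            # pseudo-video frame order
```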
IEEE Transactions on Multimedia | 2015
Dongliang He; Chong Luo; Cuiling Lan; Feng Wu; Wenjun Zeng
Hybrid digital-analog (HDA) transmission has gained increasing attention recently in the context of wireless video delivery, for its ability to simultaneously achieve high transmission efficiency and smooth quality adaptation. However, previous systems are optimized solely based on the mean squared error criterion, without taking perceptual video quality into consideration. In this work, we propose a structure-preserving HDA video delivery system, named SharpCast, to improve both objective and subjective visual quality. SharpCast decomposes a video into a content part and a structure part. The latter is important to human perception and is therefore protected with a robust digital transmission scheme. The energy-intensive portion of the content information is then extracted and transmitted digitally for energy efficiency, while the residual is transmitted in analog form to achieve the desired smooth adaptation. We formulate the resource (power and bandwidth) allocation problem in SharpCast and solve it with a greedy strategy. Evaluations over nine standard 720p video sequences show that the proposed SharpCast system outperforms the state-of-the-art digital, analog, and HDA schemes by a notable margin in both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
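A greedy resource-allocation step of the kind mentioned above might look like the following sketch, which repeatedly gives the next unit of transmit power to the chunk whose modelled distortion drops the most. The distortion model and the unit-step granularity are illustrative assumptions, not SharpCast's actual formulation.

```python
# Sketch of a greedy power allocation: repeatedly assign one more unit of
# power to the chunk with the largest marginal distortion reduction.
# The distortion model D_i(p) = lambda_i / (1 + p) is an illustrative assumption.
import heapq

def greedy_power_allocation(chunk_energies, total_units):
    """Allocate `total_units` of power greedily by marginal distortion gain."""
    alloc = [0] * len(chunk_energies)

    def gain(i):  # distortion reduction from one more unit on chunk i
        lam = chunk_energies[i]
        return lam / (1 + alloc[i]) - lam / (2 + alloc[i])

    heap = [(-gain(i), i) for i in range(len(chunk_energies))]
    heapq.heapify(heap)
    for _ in range(total_units):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-gain(i), i))
    return alloc

print(greedy_power_allocation([9.0, 4.0, 1.0, 0.25], total_units=10))
```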
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017
Lizhi Wang; Zhiwei Xiong; Guangming Shi; Feng Wu; Wenjun Zeng
Leveraging the compressive sensing (CS) theory, coded aperture snapshot spectral imaging (CASSI) provides an efficient solution to recover 3D hyperspectral data from a 2D measurement. The dual-camera design of CASSI, by adding an uncoded panchromatic measurement, enhances the reconstruction fidelity while maintaining the snapshot advantage. In this paper, we propose an adaptive nonlocal sparse representation (ANSR) model to boost the performance of dual-camera compressive hyperspectral imaging (DCCHI). Specifically, the CS reconstruction problem is formulated as a 3D cube-based sparse representation to make full use of the nonlocal similarity in both the spatial and spectral domains. Our key observation is that the panchromatic image, besides playing the role of direct measurement, can be further exploited to aid the nonlocal similarity estimation. Therefore, we design a joint similarity metric by adaptively combining the internal similarity within the reconstructed hyperspectral image and the external similarity within the panchromatic image. In this way, the fidelity of CS reconstruction is greatly enhanced. Both simulation and hardware experimental results show significant improvement of the proposed method over the state of the art.
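The joint similarity metric can be illustrated with a small sketch that scores a candidate patch by a weighted sum of its internal distance (within the current hyperspectral estimate) and its external distance (within the panchromatic image). The fixed patch size, Euclidean distances, and the constant blending weight `w` are assumptions; the paper's weighting is adaptive.

```python
# Sketch of the joint similarity idea: blend internal (hyperspectral
# estimate) and external (panchromatic) patch distances. The fixed weight
# `w`, patch size, and single-band estimate are illustrative assumptions.
import numpy as np

def extract_patch(img, y, x, size=8):
    return img[y:y + size, x:x + size]

def joint_similarity(hsi_est, pan, ref_yx, cand_yx, w=0.5, size=8):
    """Smaller is more similar; `w` trades internal vs. external similarity."""
    ry, rx = ref_yx
    cy, cx = cand_yx
    internal = np.sum((extract_patch(hsi_est, ry, rx, size)
                       - extract_patch(hsi_est, cy, cx, size)) ** 2)
    external = np.sum((extract_patch(pan, ry, rx, size)
                       - extract_patch(pan, cy, cx, size)) ** 2)
    return w * internal + (1.0 - w) * external

rng = np.random.default_rng(0)
hsi_band = rng.random((64, 64))        # one band of the current reconstruction
pan = rng.random((64, 64))             # co-registered panchromatic measurement
print(joint_similarity(hsi_band, pan, (10, 10), (12, 11)))
```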
IEEE Signal Processing Magazine | 2017
Zhiwei Xiong; Yueyi Zhang; Feng Wu; Wenjun Zeng
Depth information plays an important role in a variety of applications, including manufacturing, medical imaging, computer vision, graphics, and virtual/augmented reality (VR/AR). Depth sensing has thus attracted sustained attention from both the academic and industrial communities for decades. Mainstream depth cameras can be divided into three categories: stereo, time of flight (ToF), and structured light. Stereo cameras require no active illumination and can be used outdoors, but they struggle with homogeneous surfaces. Recently, off-the-shelf light field cameras have demonstrated improved depth estimation capability with a multiview stereo configuration. ToF cameras operate at a high frame rate and fit time-critical scenarios well, but they are susceptible to noise and limited to low resolution [3]. Structured light cameras can produce high-resolution, high-accuracy depth, provided that a number of patterns are used sequentially. Due to its promising and reliable performance, the structured light approach has been widely adopted for three-dimensional (3-D) scanning purposes. However, achieving real-time depth with structured light either requires high-speed (and thus expensive) hardware or sacrifices depth resolution and accuracy by using a single pattern instead.
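For readers unfamiliar with the stereo category, the standard triangulation relation Z = f * B / d (focal length f in pixels, baseline B, disparity d) underlies depth estimation from a rectified stereo pair. The snippet below is a generic illustration with made-up camera parameters, not taken from the article.

```python
# Generic stereo triangulation: depth Z = f * B / d. The camera parameters
# below are made-up values used purely for illustration.
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.inf)
    valid = d > 0                      # zero disparity means no match / infinite depth
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

print(disparity_to_depth([64.0, 32.0, 0.0], focal_px=800.0, baseline_m=0.12))
```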
Modeling, Analysis and Simulation of Wireless and Mobile Systems | 2015
Dongliang He; Chong Luo; Feng Wu; Wenjun Zeng
Efficient and robust wireless stereo video delivery is an enabling technology for various mobile 3D applications. Existing digital solutions have high source coding efficiency but are not robust to channel variations, while analog solutions have the opposite characteristics. In this paper, we design a novel hybrid digital-analog (HDA) solution to embrace the advantages of both approaches and avoid their drawbacks. In each pair of stereo frames, one frame is digitally encoded to ensure basic quality and the other is processed in analog form to opportunistically utilize good channels for better quality. To improve system efficiency, we design a zigzag coding structure such that both intra-view and inter-view correlations can be exploited through prediction in the frames to be analog coded. A reference selection mechanism is proposed to further improve coding efficiency. In addition, we address the problem of optimal power and bandwidth allocation between the digital and analog streams. We implement a system, named Swift, and perform extensive trace-driven evaluations based on a software-defined radio platform. We show that Swift outperforms an omniscient digital scheme under the same bandwidth and power constraints, or achieves comparable performance with around 2x power savings. Subjective quality assessment shows that Swift provides significantly better visual quality than a straightforward HDA extension of SoftCast.
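The zigzag coding structure can be sketched as a schedule that alternates which view is digitally coded at each time instant, so every analog frame has a digitally coded neighbour both in the other view (same instant) and in its own view (previous instant). The reference-selection heuristic below is a simplifying assumption, not Swift's actual mechanism.

```python
# Sketch of a zigzag digital/analog assignment for a stereo pair. The
# reference-selection rule below is an illustrative assumption.
def zigzag_schedule(num_frames):
    """Return per-time-instant (digital_view, analog_view) assignments."""
    return [("L", "R") if t % 2 == 0 else ("R", "L") for t in range(num_frames)]

def analog_references(t, schedule):
    """Candidate predictors for the analog frame at time t."""
    digital_view, analog_view = schedule[t]
    refs = [(t, digital_view)]             # inter-view: digital frame, same instant
    if t > 0:
        refs.append((t - 1, analog_view))  # intra-view: same view, digitally coded at t-1
    return refs

schedule = zigzag_schedule(6)
for t in range(6):
    print(t, schedule[t], "->", analog_references(t, schedule))
```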
IEEE Transactions on Circuits and Systems for Video Technology | 2018
Lizhi Wang; Zhiwei Xiong; Guangming Shi; Wenjun Zeng; Feng Wu
This letter presents a novel approach for simultaneous depth and spectral imaging with a cross-modal stereo system. Two images of the target scene are captured at the same time: one compressively sampled hyperspectral measurement and one panchromatic measurement. The underlying hyperspectral cube is first reconstructed by leveraging the compressive sensing theory, during which a self-adaptive dictionary is learned from the panchromatic measurement to facilitate the reconstruction. The depth information of the scene is then recovered by estimating a disparity map between the hyperspectral cube and the panchromatic measurement through stereo matching. This disparity map, once obtained, is used to align the hyperspectral and panchromatic measurements to boost the hyperspectral reconstruction in an iterative manner. Through hardware experiments, for the first time to our knowledge, we demonstrate a snapshot system that allows for simultaneous depth and spectral imaging. The proposed system is capable of recording depth and spectral videos of dynamic scenes.
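The stereo-matching step that aligns the two modalities can be illustrated with a brute-force block matcher: for each patch of a panchromatic-like projection of the hyperspectral estimate, search along the same row of the panchromatic image for the best match. Patch size, search range, and the SSD cost are illustrative assumptions.

```python
# Sketch of a brute-force block matcher along rectified rows, as a toy
# stand-in for the stereo-matching step. Patch size, search range, and the
# SSD cost are illustrative assumptions.
import numpy as np

def block_match_disparity(left, right, patch=8, max_disp=16):
    H, W = left.shape
    disp = np.zeros((H - patch, W - patch), dtype=int)
    for y in range(H - patch):
        for x in range(W - patch):
            ref = left[y:y + patch, x:x + patch]
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x) + 1):
                cand = right[y:y + patch, x - d:x - d + patch]
                cost = np.sum((ref - cand) ** 2)
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

rng = np.random.default_rng(0)
right = rng.random((40, 60))
left = np.roll(right, 5, axis=1)       # toy scene shifted by a known disparity
print(np.median(block_match_disparity(left, right)))   # ~5 on this toy example
```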
IEEE Transactions on Circuits and Systems for Video Technology | 2018
Cuiling Lan; Jizheng Xu; Wenjun Zeng; Guangming Shi; Feng Wu
The transform, as one of the most important modules of mainstream video coding systems, has remained very stable over the past several decades. However, recent developments indicate that offering more transform options can lead to coding efficiency benefits. In this paper, we go further and investigate how coding efficiency can be improved over the state-of-the-art method by adapting a transform to each block. We present a variable block-size signal-dependent transform (SDT) design based on the High Efficiency Video Coding (HEVC) framework.
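A signal-dependent transform can be sketched as a Karhunen-Loeve transform (KLT) derived from blocks deemed similar to the current one: estimate their covariance and use its eigenvectors as the block transform. The random toy blocks below stand in for the similar blocks that an actual scheme would gather from reconstructed neighbouring content.

```python
# Sketch of a signal-dependent transform: a KLT basis estimated from a set
# of "similar" residual blocks. The random toy blocks are placeholders for
# blocks gathered from reconstructed neighbouring content.
import numpy as np

def signal_dependent_transform(similar_blocks):
    """KLT basis from blocks of shape (N, B, B); rows of T are basis vectors."""
    X = similar_blocks.reshape(len(similar_blocks), -1)      # N x B^2
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1].T                                # descending energy order

rng = np.random.default_rng(0)
blocks = rng.standard_normal((200, 4, 4))           # toy "similar" residual blocks
T = signal_dependent_transform(blocks)

residual = rng.standard_normal((4, 4))
coeffs = T @ residual.reshape(-1)                    # forward transform
reconstructed = (T.T @ coeffs).reshape(4, 4)         # inverse (T is orthonormal)
print(np.allclose(reconstructed, residual))          # True
```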