Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zhigang Tu is active.

Publication


Featured research published by Zhigang Tu.


Pattern Recognition | 2014

A combined post-filtering method to improve accuracy of variational optical flow estimation

Zhigang Tu; Nico van der Aa; Coert van Gemeren; Remco C. Veltkamp

We present a novel combined post-filtering (CPF) method to improve the accuracy of optical flow estimation. Its attractive advantages are that outlier reduction is attained while discontinuities are well preserved, and occlusions are partially handled. The major contributions are the following. First, structure tensor (ST) based edge detection is introduced to extract flow edges. We improve the detection performance by extending the traditional 2D spatial edge detector into the spatial-scale 3D space, and by using a gradient bilateral filter (GBF) instead of the linear Gaussian filter to construct a multi-scale nonlinear ST. The GBF preserves discontinuities but is computationally expensive; to address this efficiency issue, a hybrid GBF and Gaussian filter (HGBGF) approach is proposed by means of a spatial-scale gradient signal-to-noise ratio (SNR) measure. Additionally, a piecewise occlusion detection method is used to extract occlusions. Second, we apply the CPF method, which uses a weighted median filter (WMF), a bilateral filter (BF) and a fast median filter (MF) to post-smooth the detected edges, the occlusions, and the remaining flat regions of the flow field, respectively. Benchmark tests on both synthetic and real sequences demonstrate the effectiveness of our method.
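
As a concrete illustration of the CPF idea, here is a minimal Python sketch that smooths one flow channel with a fast median filter in flat regions and a color-weighted median at detected edge and occlusion pixels. The binary masks are assumed to be precomputed, the image is assumed to be a float RGB array, and all function and parameter names are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

def weighted_median(values, weights):
    """Weighted median: the value at which the cumulative weight crosses one half."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def combined_post_filter(flow_u, image, edge_mask, occ_mask, radius=3, sigma_c=7.0):
    """Post-smooth one flow channel: fast median filter in flat regions,
    color-weighted median at detected edge and occlusion pixels."""
    out = median_filter(flow_u, size=2 * radius + 1)      # flat regions
    h, w = flow_u.shape
    for y, x in zip(*np.nonzero(edge_mask | occ_mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = flow_u[y0:y1, x0:x1].ravel()
        # Bilateral-style weights from color similarity to the center pixel.
        diff = (image[y0:y1, x0:x1] - image[y, x]).reshape(len(patch), -1)
        wts = np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma_c ** 2))
        out[y, x] = weighted_median(patch, wts)
    return out
```

The weighted median picks the value at which the cumulative color-similarity weight crosses one half, which is what lets the filter reject outliers while still respecting motion discontinuities.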


Pattern Recognition | 2016

Weighted local intensity fusion method for variational optical flow estimation

Zhigang Tu; Ronald Poppe; Remco C. Veltkamp

Estimating a dense motion field between successive video frames is a fundamental problem in image processing. The multi-scale variational optical flow method is a key technique for addressing it. Despite considerable progress over the past decades, challenges remain, such as dealing with large displacements and estimating the smoothness parameter. We present a local intensity fusion (LIF) method to tackle these difficulties. By evaluating the local interpolation error, in terms of L1 block matching on the corresponding set of images, we fuse flow proposals obtained from different methods and from different parameter settings under a unified LIF. This approach has two benefits: (1) the incorporated matching information helps to recover large displacements; and (2) the obtained optimal fusion gives a tradeoff between the data term and the smoothness term. In addition, a selective gradient based weight is introduced to improve the performance of the LIF. Finally, we propose a corrected weighted median filter (CWMF), which uses motion information to correct errors in the color distance weight when denoising the intermediate flow fields during optimization. Experiments demonstrate the effectiveness of our method.

Highlights:
- A LIF is proposed to handle both large and small motion, and to estimate the smoothness parameter.
- A selective gradient is introduced to the LIF to reduce errors caused by outliers.
- A CWMF is designed to overcome the defects of the traditional WMF.
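
A minimal sketch of the fusion step as described above: warp the second frame by each candidate flow, score each proposal by its block-averaged L1 interpolation error against the first frame, and keep the per-pixel winner. This winner-take-all selection is a simplified stand-in for the paper's unified LIF; grayscale frames are assumed and the names are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates, uniform_filter

def fuse_proposals(img1, img2, proposals, block=5):
    """Per pixel, keep the proposal whose warped second frame best matches
    img1 in terms of L1 error averaged over a small block."""
    h, w = img1.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    errors = []
    for u, v in proposals:                       # each proposal is a (u, v) pair of fields
        warped = map_coordinates(img2, [yy + v, xx + u], order=1, mode='nearest')
        errors.append(uniform_filter(np.abs(img1 - warped), size=block))
    best = np.argmin(errors, axis=0)             # index of the winning proposal
    u_fused = np.choose(best, [p[0] for p in proposals])
    v_fused = np.choose(best, [p[1] for p in proposals])
    return u_fused, v_fused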


Pattern Recognition | 2017

Variational method for joint optical flow estimation and edge-aware image restoration

Zhigang Tu; Wei Xie; Jun Cao; Coert van Gemeren; Ronald Poppe; Remco C. Veltkamp

The most popular optical flow algorithms rely on optimizing an energy function that integrates a data term and a smoothness term. In contrast to this traditional framework, we derive a new objective function that couples optical flow estimation and image restoration. Our method is inspired by the recent successes of edge-aware constraints (EAC) in preserving edges in general gradient-domain image filtering. By incorporating an EAC image fidelity term (IFT) into the conventional variational model, the new energy function can simultaneously estimate optical flow and restore images with preserved edges, in a bidirectional manner. For the energy minimization, we rewrite the EAC in gradient form and optimize the IFT with Euler-Lagrange equations. We can thus perform the image restoration by analytically solving a system of linear equations. Our EAC-combined IFT is easy to implement and can be seamlessly integrated into various optical flow functions suggested in the literature. Extensive experiments on public optical flow benchmarks demonstrate that our method outperforms the current state-of-the-art in optical flow estimation and image restoration.

Highlights:
- Incorporating an EAC-based IFT into the variational model forms a new energy function that estimates optical flow and restores images jointly.
- The EAC can be rewritten in first-order gradient form, which benefits both edge preservation and minimization.
- Input images can be restored quickly by optimizing the Euler-Lagrange equations of the EAC-integrated IFT.
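
Reading the abstract, the coupled objective plausibly takes the schematic form below. This is a hedged reconstruction for orientation only, not the paper's exact formulation; the weights alpha, beta, lambda and the robust penalty Psi are assumptions.

```latex
% Schematic coupled energy (assumed form): flow field w = (u, v),
% restored images I_1, I_2, observed (degraded) images I_1^0, I_2^0.
E(\mathbf{w}, I_1, I_2) =
  \underbrace{\int_\Omega \Psi\!\big(|I_2(\mathbf{x}+\mathbf{w}) - I_1(\mathbf{x})|^2\big)\,d\mathbf{x}}_{\text{data term}}
+ \alpha \underbrace{\int_\Omega \Psi\!\big(|\nabla \mathbf{w}|^2\big)\,d\mathbf{x}}_{\text{smoothness term}}
+ \beta \sum_{k=1}^{2} \underbrace{\int_\Omega \Big(|I_k - I_k^0|^2
    + \lambda\,|\nabla I_k - \nabla I_k^0|^2\Big)\,d\mathbf{x}}_{\text{edge-aware image fidelity term (IFT)}}
```

Because the fidelity term is quadratic in each restored image, its Euler-Lagrange equations are linear, which matches the abstract's claim that restoration reduces to analytically solving a system of linear equations.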


Asian Conference on Computer Vision | 2014

Improved Color Patch Similarity Measure Based Weighted Median Filter

Zhigang Tu; Coert van Gemeren; Remco C. Veltkamp

Median filtering the intermediate flow fields during optimization has been demonstrated to be very useful for improving estimation accuracy. By formulating the median filtering heuristic as a non-local term in the objective function, and modifying the new term to include flow and image information according to spatial distance, color similarity and occlusion state, a weighted non-local term (a practical weighted median filter) reduces the errors produced by median filtering and better preserves motion details. However, the color similarity measure, which is the most powerful cue, can easily be perturbed by noisy pixels. To increase the robustness of the weighted median filter to noise, we borrow the idea of non-local patch denoising and compute the color similarity in terms of patch differences. Most importantly, we propose an improved color patch similarity measure (ICPSM) that modifies the traditional patch-based measure in three aspects. Comparative experimental results on different optical flow benchmarks show that our method denoises the flow field more effectively and outperforms the state-of-the-art methods, especially on heavily noisy sequences.
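
A minimal sketch of the core idea, patch-based rather than single-pixel color similarity for the weighted median's weights. The ICPSM's three modifications are not reproduced here; names and default values are illustrative.

```python
import numpy as np

def patch_similarity_weight(image, p, q, patch_radius=2, sigma=10.0):
    """Weight between pixels p and q from the L2 distance of the color patches
    centred on them (non-local-means style); assumes both patches lie fully
    inside the image."""
    r = patch_radius
    def patch(c):
        y, x = c
        return image[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    # Averaging over whole patches makes the weight robust to single noisy pixels.
    d2 = np.mean((patch(p) - patch(q)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))
```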


International Conference on Pattern Recognition | 2016

MSR-CNN: Applying motion salient region based descriptors for action recognition

Zhigang Tu; Jun Cao; Yikang Li; Baoxin Li

In recent years, the most popular video-based human action recognition methods rely on extracting feature representations using Convolutional Neural Networks (CNNs) and then using these representations to classify actions. In this work, we propose a fast and accurate video representation derived from the motion-salient region (MSR), which captures the features most useful for action labeling. By improving a well-performing foreground detection technique, the region of interest (ROI) corresponding to actors in the foreground can be detected in both the appearance and the motion field under various realistic challenges. Furthermore, we propose a complementary motion saliency measure to select a secondary ROI: the major moving part of the human. Accordingly, an MSR-based CNN descriptor (MSR-CNN) is formulated to recognize human action, incorporating appearance and motion features along with tracks of the MSR. The computation is efficient for two reasons: 1) only part of the RGB image and the motion field needs to be processed; 2) less data is used as input for the CNN feature extraction. Comparative evaluation on the JHMDB and UCF Sports datasets shows that our method outperforms the state-of-the-art in both efficiency and accuracy.
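
A minimal PyTorch sketch of the MSR idea: crop the motion-salient region from both the RGB frame and the flow field, run each crop through its own CNN stream, and average the class scores. The backbone choice (ResNet-18) and the score-averaging fusion are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torchvision.models as models

class MSRTwoStream(torch.nn.Module):
    """Two CNN streams over the cropped motion-salient region: one for the
    RGB crop, one for the 2-channel flow crop; class scores are averaged."""
    def __init__(self, num_classes):
        super().__init__()
        self.rgb_stream = models.resnet18(num_classes=num_classes)
        self.flow_stream = models.resnet18(num_classes=num_classes)
        # Flow crops have 2 channels (u, v) instead of 3.
        self.flow_stream.conv1 = torch.nn.Conv2d(2, 64, kernel_size=7,
                                                 stride=2, padding=3, bias=False)

    def forward(self, rgb_crop, flow_crop):
        # Only the cropped ROI is processed, which keeps the computation cheap
        # compared to full-frame feature extraction.
        return (self.rgb_stream(rgb_crop) + self.flow_stream(flow_crop)) / 2
```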


Journal of Electronic Imaging | 2015

Estimating accurate optical flow in the presence of motion blur

Zhigang Tu; Ronald Poppe; Remco C. Veltkamp

Spatially varying motion blur in video results from the relative motion of the camera and the scene. How to estimate accurate optical flow in the presence of spatially varying motion blur has received little attention so far. We extend the classical warping-based variational optical flow method to deal with this issue. First, we modify the data term by matching the identified non-uniform motion blur between the input images according to a fast blur detection and deblurring technique. Importantly, a downsample-interpolation technique is proposed to improve the blur detection efficiency, saving 75% or more of the running time. Second, we improve the edge-preserving regularization term at blurry motion boundaries to reduce the boundary errors caused by blur. The proposed method is evaluated on both synthetic and real sequences, and yields improved overall performance compared to the state-of-the-art in handling motion blur.
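
A minimal sketch of the downsample-interpolation speedup: run the costly blur detection on a 2x-downsampled frame, then bilinearly upsample the resulting blur map. Halving each dimension processes a quarter of the pixels, consistent with the ~75% running-time saving mentioned above. The blur detector itself is a stand-in parameter, not the paper's method.

```python
import cv2

def fast_blur_map(gray, detect_blur):
    """Run a per-pixel blur detector on a 2x-downsampled frame and bilinearly
    upsample the result; detect_blur is any per-pixel blur-detection function."""
    small = cv2.resize(gray, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    blur_small = detect_blur(small)          # runs on 1/4 of the pixels
    return cv2.resize(blur_small, (gray.shape[1], gray.shape[0]),
                      interpolation=cv2.INTER_LINEAR)
```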


Signal Processing | 2016

Adaptive guided image filter for warping in variational optical flow computation

Zhigang Tu; Ronald Poppe; Remco C. Veltkamp

The variational optical flow method is considered the standard method for calculating an accurate dense motion field between successive frames. It assumes spatiotemporal continuity and small apparent motions. However, for real image sequences, the temporal continuity assumption is often violated due to outliers and occlusions, causing inaccurate flow vectors in these regions. After each warping operation, errors are generated in the corresponding regions of the warped interpolation image. This results in an inaccurate discrete approximation of the temporal derivative and thus affects the accuracy of the estimated flow field. In this paper, we propose an adaptive guided image filter to correct these errors in the warped interpolation image. A guidance image is reconstructed by considering both the features of the reference image and the difference between the warped interpolation image and the reference image, to guide the filtering of the warped interpolation image. To adjust the degree of smoothing, the regularization parameter in the guided image filter is selected adaptively based on a confidence measure. Extensive experiments on different datasets and comparison with state-of-the-art variational optical flow algorithms demonstrate the effectiveness of our method.

Highlights:
- Introducing an adaptive guided image filter to correct errors in the intermediate warped interpolation image.
- Reconstructing a guidance image as a combination of the reference image and the warped image.
- The regularization parameter is selected adaptively, based on a confidence measure, to adjust the degree of smoothing.
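
A minimal sketch of filtering the warped image with a guided filter whose guidance mixes the reference frame and the warped frame. The guided filter is the standard He et al. box-filter form, included for self-containment; eps is a fixed regularization parameter here, whereas the paper selects it adaptively from a confidence measure, and alpha is an assumed mixing weight.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Standard single-channel guided image filter (box-filter formulation)."""
    size = 2 * radius + 1
    mean_g = uniform_filter(guide, size)
    mean_s = uniform_filter(src, size)
    cov = uniform_filter(guide * src, size) - mean_g * mean_s
    var = uniform_filter(guide * guide, size) - mean_g ** 2
    a = cov / (var + eps)                     # local linear coefficients
    b = mean_s - a * mean_g
    return uniform_filter(a, size) * guide + uniform_filter(b, size)

def correct_warped(reference, warped, alpha=0.5):
    # Guidance mixes the reference with the warped frame so that regions
    # where warping failed are steered back toward the reference image.
    guide = alpha * reference + (1 - alpha) * warped
    return guided_filter(guide, warped)
```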


Pattern Recognition | 2018

Multi-stream CNN: Learning representations based on human-related regions for action recognition

Zhigang Tu; Wei Xie; Qianqing Qin; Ronald Poppe; Remco C. Veltkamp; Baoxin Li; Junsong Yuan

The most successful video-based human action recognition methods rely on feature representations extracted using Convolutional Neural Networks (CNNs). Inspired by the two-stream network (TS-Net), we propose a multi-stream CNN architecture to recognize human actions, additionally considering human-related regions that contain the most informative features. First, by improving foreground detection, the region of interest corresponding to the appearance and the motion of an actor can be detected robustly under realistic circumstances. Based on the entire detected human body, we construct one appearance stream and one motion stream. In addition, we select a secondary region, containing the major moving part of an actor, based on motion saliency. By combining the traditional streams with the novel human-related streams, we introduce a human-related multi-stream CNN (HR-MSCNN) architecture that encodes appearance, motion, and the captured tubes of the human-related regions. Comparative evaluation on the JHMDB, HMDB51, UCF Sports and UCF101 datasets demonstrates that the streams contain features that complement each other. The proposed multi-stream architecture achieves state-of-the-art results on these four datasets.
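
A minimal PyTorch sketch of the multi-stream layout: one backbone per human-related region (full frame, detected body, major moving part), with late fusion by score averaging. For brevity only the appearance streams are shown, whereas the paper pairs each region with a motion stream as well; backbones, region names, and the fusion rule are assumptions.

```python
import torch
import torchvision.models as models

class HRMultiStream(torch.nn.Module):
    """One CNN stream per human-related region; late fusion by score averaging."""
    def __init__(self, num_classes, regions=('frame', 'body', 'part')):
        super().__init__()
        self.streams = torch.nn.ModuleDict(
            {r: models.resnet18(num_classes=num_classes) for r in regions})

    def forward(self, crops):
        # crops: dict mapping region name -> batch of cropped images
        scores = [self.streams[r](x) for r, x in crops.items()]
        return torch.stack(scores).mean(dim=0)
```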


Pattern Recognition | 2017

Fusing disparate object signatures for salient object detection in video

Zhigang Tu; Zuwei Guo; Wei Xie; Mengjia Yan; Remco C. Veltkamp; Baoxin Li; Junsong Yuan

We present a novel spatiotemporal saliency model for object detection in videos. In contrast to previous methods that focus on exploiting or incorporating different saliency cues, the proposed method uses object signatures that can be identified by any kind of object segmentation method. We integrate two distinctive saliency maps, computed respectively from the object proposals of an appearance-dominated method and a motion-dominated algorithm, to obtain a refined spatiotemporal saliency map. This enables the method to achieve good robustness and precision in identifying salient objects in videos under various challenging conditions. First, an improved appearance-based and a modified motion-based segmentation approach are utilized separately to extract two kinds of candidate foreground objects. Second, with these captured object signatures, we design a new approach to filter the extracted noisy object pixels and label foreground superpixels in each object signature channel. Third, we introduce a foreground connectivity saliency measure to compute two types of saliency maps, from which an adaptive fusion strategy produces the final spatiotemporal saliency map for salient object detection in a video. Both quantitative and qualitative experiments on several challenging video benchmarks demonstrate that the proposed method outperforms existing state-of-the-art approaches.
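
A minimal sketch of adaptively fusing the two saliency maps: each map is weighted by a crude global quality score (here, foreground/background contrast), which stands in for the paper's fusion strategy; the quality measure and names are illustrative.

```python
import numpy as np

def fuse_saliency(sal_app, sal_mot, eps=1e-6):
    """Adaptively fuse appearance- and motion-driven saliency maps, weighting
    each by a simple global quality score."""
    def quality(s):
        s = (s - s.min()) / (s.max() - s.min() + eps)   # normalize to [0, 1]
        fg, bg = s[s >= 0.5], s[s < 0.5]
        # Higher contrast between foreground and background -> higher weight.
        contrast = (fg.mean() if fg.size else 0.0) - (bg.mean() if bg.size else 0.0)
        return max(contrast, 0.0)
    w_a, w_m = quality(sal_app), quality(sal_mot)
    return (w_a * sal_app + w_m * sal_mot) / (w_a + w_m + eps)
```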


IEEE Transactions on Circuits and Systems for Video Technology | 2018

Semantic Cues Enhanced Multi-modality Multi-Stream CNN for Action Recognition

Zhigang Tu; Wei Xie; Justin Dauwels; Baoxin Li; Junsong Yuan

This paper addresses video-based action recognition by exploiting an advanced multi-stream convolutional neural network (CNN) to make full use of semantics-derived multiple modalities in both the spatial (appearance) and temporal (motion) domains, since the performance of CNN-based action recognition methods depends heavily on two factors: semantic visual cues and the network architecture. Our work consists of two major parts. First, to extract useful human-related semantics accurately, we propose a novel spatiotemporal saliency-based video object segmentation (STS) model. By fusing distinctive saliency maps, computed according to the object signatures of complementary object detection approaches, a refined STS map can be obtained. In this way, various challenges in realistic video can be handled jointly. Based on the estimated saliency maps, an energy function is constructed to segment two semantic cues: the actor and one distinctive acting part of the actor. Second, we modify the architecture of the two-stream network (TS-Net) to design a multi-stream network consisting of three TS-Nets with respect to the extracted semantics, which is able to exploit deeper abstract visual features of multiple modalities at multiple spatiotemporal scales. Importantly, the performance of action recognition is significantly boosted when the captured human-related semantics are integrated into our framework. Experiments on four public benchmarks, JHMDB, HMDB51, UCF-Sports, and UCF101, demonstrate that the proposed method outperforms state-of-the-art algorithms.
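
A minimal sketch of turning an STS saliency map into the two semantic cues fed to the network: a binary actor mask from a saliency threshold, plus the most salient sub-window as the acting-part cue. The paper segments these cues with an energy function; simple thresholding and a max-mean window are stand-ins here, and all names and defaults are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def semantic_cues(saliency, frame, thresh=0.5, part_size=64):
    """Actor mask from a saliency threshold, plus the part_size x part_size
    window of highest mean saliency as the acting-part crop."""
    actor_mask = saliency >= thresh * saliency.max()
    # Acting part: window whose mean masked saliency is highest.
    means = uniform_filter(saliency * actor_mask, size=part_size)
    y, x = np.unravel_index(np.argmax(means), means.shape)
    half = part_size // 2
    part_crop = frame[max(0, y - half):y + half, max(0, x - half):x + half]
    return actor_mask, part_crop
```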

Collaboration


Dive into Zhigang Tu's collaborations.

Top Co-Authors

Wei Xie

Central China Normal University


Baoxin Li

Arizona State University


Justin Dauwels

Nanyang Technological University
