Canhui Cai
Huaqiao University
Publication
Featured research published by Canhui Cai.
IEEE Transactions on Circuits and Systems for Video Technology | 2009
Huanqiang Zeng; Canhui Cai; Kai-Kuang Ma
Intra-mode and inter-mode prediction are available in H.264/AVC to improve coding efficiency. However, exhaustively checking all prediction modes to identify the best one (commonly referred to as exhaustive mode decision) greatly increases computational complexity. In this paper, a fast mode decision algorithm, called motion activity-based mode decision (MAMD), is proposed to speed up the encoding process by hierarchically reducing the number of modes that must be checked, as follows. For each macroblock, the proposed MAMD algorithm first computes the rate-distortion (RD) cost of the SKIP mode and terminates early if this cost is below a predetermined "low" threshold. If instead the RD cost exceeds a second, "high" threshold, only the intra modes are worth checking. If the RD cost falls between the two thresholds, the remaining seven modes, which are classified into three motion-activity classes in our work, are considered, and only one of the three classes is chosen for further mode checking. Motion activity is measured quantitatively as the maximum city-block length of the motion vectors taken from a set of adjacent macroblocks (i.e., the region of support, ROS). This measurement is then used to determine the most probable motion-activity class for the current macroblock. Experimental results show that, on average, the proposed MAMD algorithm reduces computational complexity by 62.96% while incurring only a 0.059 dB loss in PSNR (peak signal-to-noise ratio) and a 0.19% increase in total bit rate, compared with exhaustive mode decision, the default approach in the JM reference software.
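The hierarchical decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold values `t_low`, `t_high`, the `class_bounds` on motion activity, and the way the seven inter modes are split into three classes (`ACTIVITY_CLASSES`) are all hypothetical placeholders. Only the overall flow (SKIP test, high-cost intra shortcut, then one class chosen by the maximum city-block motion-vector length in the ROS) follows the abstract.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]

# Hypothetical partition of the seven H.264 inter modes into three
# motion-activity classes; the paper's actual grouping may differ.
ACTIVITY_CLASSES = (
    ["16x16"],                      # low motion activity
    ["16x8", "8x16"],               # medium motion activity
    ["8x8", "8x4", "4x8", "4x4"],   # high motion activity
)


def motion_activity(ros_mvs: Sequence[MotionVector]) -> int:
    """Maximum city-block (L1) length of the motion vectors in the ROS."""
    return max((abs(mx) + abs(my) for mx, my in ros_mvs), default=0)


def mamd_candidates(skip_rd_cost: float, ros_mvs: Sequence[MotionVector],
                    t_low: float, t_high: float,
                    class_bounds: Tuple[int, int] = (1, 4)) -> List[str]:
    """Return the modes left to check after the SKIP-mode RD test."""
    if skip_rd_cost < t_low:
        return ["SKIP"]                    # early termination at SKIP
    if skip_rd_cost > t_high:
        return ["INTRA4x4", "INTRA16x16"]  # only intra modes are worth checking
    activity = motion_activity(ros_mvs)
    if activity <= class_bounds[0]:
        return list(ACTIVITY_CLASSES[0])
    if activity <= class_bounds[1]:
        return list(ACTIVITY_CLASSES[1])
    return list(ACTIVITY_CLASSES[2])
```

Because the classes partition the inter modes, at most four of the seven are evaluated once the SKIP cost lands between the two thresholds.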
IEEE Transactions on Circuits and Systems for Video Technology | 2010
Huanqiang Zeng; Kai-Kuang Ma; Canhui Cai
Intra mode prediction via exhaustive mode decision, as used in H.264/Advanced Video Coding, effectively improves coding efficiency, but at the expense of higher computational complexity. In this letter, a fast intra mode decision algorithm, called hierarchical intra mode decision (HIMD), is proposed to speed up the mode decision process by reducing the number of modes that must be checked for each macroblock. The novelty of the proposed HIMD algorithm lies in the following aspects. 1) An early decision with adaptive thresholding is developed for the mode decision of the luma component. 2) Candidate modes are selected according to their Hadamard distances and prediction directions. 3) Only one of the hierarchical paths is chosen for computing the least rate-distortion cost. Experimental results show that the proposed HIMD algorithm reduces computational complexity by 85.75% on average, while incurring only a 0.164 dB loss in peak signal-to-noise ratio (PSNR) and a 2.336% increase in total bit rate, compared with exhaustive mode decision, the default approach in the Joint Model reference software.
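The "Hadamard distance" used to rank candidate modes is commonly computed as the SATD (sum of absolute transformed differences) of a 4x4 residual. The sketch below shows that computation and a simple ranking of candidate predictions; it is an assumption-laden illustration: the mode names, the `keep` parameter, and the lack of normalisation are hypothetical, and the prediction-direction criterion and adaptive thresholds of HIMD are not modelled.

```python
from typing import Dict, List

Block = List[List[int]]

# 4x4 Hadamard matrix; it is symmetric, so H4 equals its own transpose.
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def matmul4(a: Block, b: Block) -> Block:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard-transformed differences (no normalisation)."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in coeffs for v in row)


def rank_candidate_modes(orig: Block, predictions: Dict[str, Block],
                         keep: int = 3) -> List[str]:
    """Keep the `keep` modes whose predictions have the smallest SATD."""
    costs = {mode: satd4x4(orig, pred) for mode, pred in predictions.items()}
    return sorted(costs, key=costs.get)[:keep]
```

Only the retained candidates would then go through full rate-distortion evaluation, which is where most of the complexity saving comes from.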
IEEE Transactions on Circuits and Systems for Video Technology | 2017
Jianqing Zhu; Huanqiang Zeng; Shengcai Liao; Zhen Lei; Canhui Cai; Li Xin Zheng
Person re-identification (Re-ID) aims to match person images captured by two non-overlapping cameras. In this paper, a deep hybrid similarity learning (DHSL) method for person Re-ID based on a convolutional neural network (CNN) is proposed. In our approach, a light CNN simultaneously extracts a pair of learned features from the input image pair. Then, both the elementwise absolute difference and the elementwise multiplication of the feature pair are calculated. Finally, a hybrid similarity function is designed to measure the similarity between the feature pair; it learns a group of weight coefficients that project the elementwise absolute difference and multiplication into a similarity score. Consequently, the proposed DHSL method reasonably allocates complexity between feature learning and metric learning within a CNN, which improves person Re-ID performance. Experiments on three challenging person Re-ID databases, QMUL GRID, VIPeR, and CUHK03, show that the proposed DHSL method outperforms multiple state-of-the-art person Re-ID methods.
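The hybrid similarity function can be illustrated as a weighted projection of the two elementwise terms into a single score. In this sketch the feature vectors are plain lists rather than CNN outputs, and the weights `w_diff`, `w_mult`, the `bias`, and the logistic squashing into (0, 1) are assumptions for illustration; in DHSL the weights are learned jointly with the network.

```python
import math
from typing import Sequence


def hybrid_similarity(f1: Sequence[float], f2: Sequence[float],
                      w_diff: Sequence[float], w_mult: Sequence[float],
                      bias: float = 0.0) -> float:
    """Project |f1 - f2| and f1 * f2 (elementwise) onto weight vectors and
    squash the sum into (0, 1) with a numerically stable logistic function."""
    if not (len(f1) == len(f2) == len(w_diff) == len(w_mult)):
        raise ValueError("features and weights must have the same length")
    z = bias
    for a, b, wd, wm in zip(f1, f2, w_diff, w_mult):
        z += wd * abs(a - b) + wm * (a * b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)
```

With negative weights on the absolute difference and positive weights on the product, matching feature pairs score close to 1 and mismatched pairs close to 0.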
IEEE Transactions on Circuits and Systems for Video Technology | 2011
Huanqiang Zeng; Kai-Kuang Ma; Canhui Cai
Exhaustive mode decision has been exploited in multiview video coding to effectively improve coding efficiency, but at the expense of much higher computational complexity. In this paper, a fast mode decision algorithm, called mode correlation-based mode decision (MCMD), is proposed to speed up the encoding process by reducing the number of modes that must be checked. In our approach, all prediction modes are first categorized into five motion-activity classes, and only one of them is chosen to identify the optimal mode in a hierarchical manner, as follows. For each macroblock (MB), the proposed MCMD algorithm first checks whether the rate-distortion cost computed at the SKIP mode (i.e., Class 1) is below an adaptive threshold, which provides an opportunity for early termination. If this early termination condition is not met, one of the remaining four motion-activity classes is chosen for further mode checking according to an analysis of the predicted motion vector (PMV) of the current MB. The adaptive threshold and the PMV are derived by exploiting the mode correlation between the current MB and a set of adjacent MBs (i.e., the region of support) in the current view and its neighboring view. Experimental results show that, compared with exhaustive mode decision, the default approach in the joint multiview video model (JMVM) reference software, the proposed MCMD algorithm reduces computational complexity by 73.39% on average, while incurring only a 0.07 dB loss in peak signal-to-noise ratio (PSNR) and a 2.22% increase in total bit rate.
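A rough sketch of the two MCMD decisions, under stated assumptions: here the adaptive threshold is taken as the mean RD cost of region-of-support MBs (from both views) whose final mode was SKIP, and the class is picked from the city-block length of the PMV using the hypothetical `bounds`. The paper derives both quantities from mode correlation, but its exact formulas are not reproduced here; `fallback`, `bounds`, and the mode label `"SKIP"` are illustrative names.

```python
from statistics import mean
from typing import Sequence, Tuple


def adaptive_skip_threshold(neighbor_modes: Sequence[str],
                            neighbor_costs: Sequence[float],
                            fallback: float) -> float:
    """Mean RD cost of ROS macroblocks (current and neighbouring view) that
    ended up in SKIP mode; `fallback` if none of them did."""
    skip_costs = [c for m, c in zip(neighbor_modes, neighbor_costs) if m == "SKIP"]
    return mean(skip_costs) if skip_costs else fallback


def mcmd_class(skip_cost: float, pmv: Tuple[int, int], threshold: float,
               bounds: Tuple[int, int, int] = (0, 2, 8)) -> int:
    """Return the motion-activity class (1..5) whose modes will be checked."""
    if skip_cost < threshold:
        return 1  # Class 1: SKIP, early termination
    activity = abs(pmv[0]) + abs(pmv[1])
    for cls, bound in zip((2, 3, 4), bounds):
        if activity <= bound:
            return cls
    return 5
```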
IEEE Transactions on Circuits and Systems for Video Technology | 2014
Huanqiang Zeng; Xiaolan Wang; Canhui Cai; Jing Chen; Yan Zhang
Multiview video coding (MVC) adopts a hierarchical B picture prediction structure and offers many prediction modes to effectively remove the spatial, temporal, and inter-view redundancies inherent in multiview video (MVV), but at the price of extremely high computational complexity. To address this problem, this paper proposes a fast MVC method that jointly uses an adaptive prediction structure (APS) and hierarchical mode decision (HMD). The complexity reduction is achieved by: 1) designing four APSs for different MVV contents, based on the fact that the contribution of inter-view prediction varies from sequence to sequence; and 2) developing an HMD scheme based on the observation that the rate-distortion (RD) cost, as a function of prediction mode size, is unimodal. In particular, for the current group of pictures of the input MVV, the prediction structure is adaptively selected based on its characteristics, measured by the ratio of the average RD cost of the base view frames to the sum of that average and the average RD cost of anchor frames in nonbase views; the HMD scheme is then performed to skip checking unlikely modes. Experimental results show that, compared with exhaustive mode decision in MVC, the proposed algorithm reduces computational complexity by 83.49% on average, while incurring only a 0.086 dB loss in Bjøntegaard delta peak signal-to-noise ratio and a 2.97% increase in Bjøntegaard delta bit rate.
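The group-of-pictures ratio and the unimodal early stop lend themselves to a short sketch. The ratio follows the definition in the abstract; the three `thresholds` mapping it to one of four structure indices are hypothetical, and `rd_cost` stands in for an encoder's RD evaluation. The early stop is only exact when the cost curve over mode sizes is strictly unimodal.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def gop_ratio(base_view_costs: Sequence[float],
              nonbase_anchor_costs: Sequence[float]) -> float:
    """avg(base) / (avg(base) + avg(anchor frames in non-base views))."""
    base = mean(base_view_costs)
    anchor = mean(nonbase_anchor_costs)
    return base / (base + anchor)


def select_structure(ratio: float,
                     thresholds: Tuple[float, float, float] = (0.3, 0.5, 0.7)) -> int:
    """Map the ratio to one of four prediction structures (index 0..3)."""
    for index, t in enumerate(thresholds):
        if ratio < t:
            return index
    return len(thresholds)


def unimodal_mode_search(modes_by_size: Sequence[str],
                         rd_cost: Callable[[str], float]) -> Tuple[str, float]:
    """Walk modes in order of size and stop as soon as the RD cost stops
    decreasing; for a strictly unimodal cost curve this is the minimum."""
    best_mode = modes_by_size[0]
    best_cost = rd_cost(best_mode)
    for mode in modes_by_size[1:]:
        cost = rd_cost(mode)
        if cost >= best_cost:
            break  # past the minimum: skip the remaining modes
        best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

The saving comes from the `break`: every mode after the first cost increase is never evaluated.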
Signal Processing: Image Communication | 2009
Canhui Cai; Huanqiang Zeng; Sanjit K. Mitra
Several specific features have been incorporated into motion estimation (ME) in the H.264 coding standard to improve its coding efficiency. However, they result in a very high computational load. In this paper, a fast ME algorithm is proposed to reduce computational complexity. First, a mode discriminant method is used to free the encoder from checking the small block size modes in homogeneous regions. Second, a condensed hierarchical block matching method and a spatial neighbor searching scheme are employed to find the best full-pixel motion vector. Finally, a direction-based selection rule is used to reduce the search range in the sub-pixel ME process. Experimental results on commonly used QCIF and CIF format test sequences show that the proposed algorithm reduces ME processing time by 88% on average, while incurring only a 0.033 dB loss in PSNR and a 0.50% increase in total bit rate, compared with the exhaustive ME process, the default approach adopted in the JM reference software.
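As an illustration of a spatial neighbor searching step, the sketch below evaluates the zero vector and the motion vectors of already-coded neighbors with SAD, then refines the winner greedily with a one-pixel cross pattern. This is a generic predictor-based search under stated assumptions, not the paper's condensed hierarchical block matching; the function names, the edge-clamping rule, and the cross refinement are illustrative choices, and the mode discriminant and sub-pixel rule are not modelled.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]


def sad(cur: Frame, ref: Frame, bx: int, by: int, mv: MV, size: int) -> int:
    """SAD between the size x size block at (bx, by) in `cur` and the block
    displaced by `mv` in `ref`; reference coordinates are clamped to the frame."""
    dx, dy = mv
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(size):
        for x in range(size):
            ry = min(max(by + y + dy, 0), h - 1)
            rx = min(max(bx + x + dx, 0), w - 1)
            total += abs(cur[by + y][bx + x] - ref[ry][rx])
    return total


def neighbor_search(cur: Frame, ref: Frame, bx: int, by: int, size: int,
                    neighbor_mvs: Sequence[MV]) -> Tuple[MV, int]:
    """Pick the best of the zero MV and the spatial neighbours' MVs, then
    refine it greedily with a one-pixel cross pattern."""
    candidates = [(0, 0)] + list(neighbor_mvs)
    best = min(candidates, key=lambda mv: sad(cur, ref, bx, by, mv, size))
    best_cost = sad(cur, ref, bx, by, best, size)
    improved = True
    while improved:  # terminates: best_cost strictly decreases on each update
        improved = False
        for ddx, ddy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            mv = (best[0] + ddx, best[1] + ddy)
            cost = sad(cur, ref, bx, by, mv, size)
            if cost < best_cost:
                best, best_cost, improved = mv, cost, True
    return best, best_cost
```

Starting from neighbor predictors exploits the spatial correlation of motion fields, so the refinement usually needs only a few SAD evaluations instead of a full search window.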
Signal Processing | 2002
Canhui Cai; Sanjit K. Mitra; Runtao Ding
This paper presents a novel wavelet image coding scheme called X-tree coding. An X-tree is defined as a spatial hierarchical tree all of whose descendants are insignificant, and it is used to represent 2-D clusters of insignificant wavelet coefficients of an image. Two new coding schemes are proposed: the progressive X-tree approach and the stack X-tree approach, which are the X-tree versions of the embedded zerotree wavelet (EZW) algorithm and the stack-run coding algorithm, respectively. Experimental results show that the proposed algorithms outperform the stack-run and EZW algorithms and are highly comparable to the set partitioning in hierarchical trees algorithm.
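The X-tree definition can be checked directly on a wavelet coefficient array. This sketch assumes a square, power-of-two array in the usual Mallat layout with a full decomposition down to a 1x1 LL band, so every coefficient other than (0, 0) has children at (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1); it also adopts the EZW convention that |c| >= T is significant. The function name is hypothetical.

```python
from typing import List


def descendants_insignificant(coeffs: List[List[float]], i: int, j: int,
                              threshold: float) -> bool:
    """True if every descendant of (i, j) satisfies |c| < threshold, i.e. the
    subtree rooted at (i, j) qualifies as an X-tree (the root may be significant)."""
    if (i, j) == (0, 0):
        raise ValueError("the DC coefficient uses a different parent-child rule")
    n = len(coeffs)
    stack = [(2 * i + a, 2 * j + b) for a in (0, 1) for b in (0, 1)]
    while stack:
        r, c = stack.pop()
        if r >= n or c >= n:
            continue  # finest level reached: no children here
        if abs(coeffs[r][c]) >= threshold:
            return False
        stack.extend((2 * r + a, 2 * c + b) for a in (0, 1) for b in (0, 1))
    return True
```

Unlike a zerotree root, the root of an X-tree is not itself required to be insignificant, which is why the check starts from the children.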
International Conference on Image Processing | 2000
Canhui Cai; Sanjit K. Mitra
This paper presents the quaternion color difference edge detector, a new approach to detecting edges in color images. Based on a new type of convolution, the color difference subspace and the proposed edge detector are expressed analytically using quaternion algebra. The proposed color image edge detector generates edges only where sharp changes of color occur in the original image. Experimental results verify the improved performance of the new edge detector compared with several well-known methods.
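The quaternion machinery behind a color difference can be illustrated with one standard construction (not necessarily the paper's convolution): encode an RGB pixel as a pure quaternion, and conjugate it by the unit grey-axis quaternion μ = (i + j + k)/√3. This rotates the color by π about the grey axis, which negates its chromatic part, so c1 + μ c2 μ̄ keeps only the chromatic difference off the grey axis. The helper names below are hypothetical.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z) = w + x i + y j + z k


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product p * q."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def qconj(q: Quat) -> Quat:
    return (q[0], -q[1], -q[2], -q[3])


def rgb(r: float, g: float, b: float) -> Quat:
    """Encode a colour as the pure quaternion r i + g j + b k."""
    return (0.0, r, g, b)


_S = 1.0 / math.sqrt(3.0)
MU: Quat = (0.0, _S, _S, _S)  # unit pure quaternion along the grey axis


def color_difference(c1: Quat, c2: Quat) -> Quat:
    """c1 + mu c2 conj(mu): the conjugation rotates c2 by pi about the grey
    axis, so chromatic parts subtract while grey (intensity) parts add."""
    rotated = qmul(qmul(MU, c2), qconj(MU))
    return tuple(a + b for a, b in zip(c1, rotated))


def chroma(q: Quat) -> float:
    """Distance of the vector part of q from the grey axis."""
    v = q[1:]
    g = sum(v) / 3.0
    return math.sqrt(sum((x - g) ** 2 for x in v))
```

In a homogeneous region the result lies on the grey axis (zero chroma), while across a color edge it leaves the axis, which is the property a color difference edge detector relies on.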
International Symposium on Intelligent Signal Processing and Communication Systems | 2006
Zhe Wei; Canhui Cai; Kai-Kuang Ma
In this paper, a new H.264-based multiple description coding (MDC) scheme, called prediction-based spatial polyphase transform (PSPT) MDC, is proposed to enhance the error resilience and reduce the computational complexity of the H.264 codec. In our approach, each input frame is first subsampled in both the horizontal and vertical directions to form four subframes. Instead of directly coding and transmitting all of these subframes, two of them are predicted from their neighboring subframes before encoding and transmission. To handle the prediction between adjacent subframes, a hybrid interpolation algorithm is proposed that combines bilinear interpolation with gradient tests. Simulation results demonstrate that the proposed MDC scheme is more robust and yields better reconstructed video quality than existing approaches under error-prone conditions. Moreover, the computational complexity of the proposed encoder is reduced by approximately half compared with that of H.264/AVC.
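The polyphase split and a gradient-tested interpolation can be sketched as follows. Assumptions: the transmitted subframes are (0, 0) and (1, 1), so the predicted pixels (subframes (0, 1) and (1, 0)) form a quincunx whose four direct neighbors are always transmitted; the gradient test (interpolate along the direction with the smaller absolute difference, fall back to a four-neighbour average on ties) is a hypothetical stand-in for the paper's hybrid rule.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]


def polyphase_split(frame: Frame) -> Dict[Tuple[int, int], Frame]:
    """Four subframes s[(dy, dx)] made of pixels frame[2i + dy][2j + dx]."""
    return {(dy, dx): [row[dx::2] for row in frame[dy::2]]
            for dy in (0, 1) for dx in (0, 1)}


def predict_missing(frame: Frame) -> Frame:
    """Fill pixels with (r + c) odd, i.e. subframes (0, 1) and (1, 0), from their
    four neighbours, which all lie in the transmitted subframes (0, 0) and (1, 1)."""
    h, w = len(frame), len(frame[0])

    def at(r: int, c: int) -> int:
        # Mirror indices at the borders; this preserves the parity of r + c,
        # so only transmitted pixels are ever read. Requires h, w >= 2.
        r = -r if r < 0 else (2 * (h - 1) - r if r >= h else r)
        c = -c if c < 0 else (2 * (w - 1) - c if c >= w else c)
        return frame[r][c]

    out = [row[:] for row in frame]
    for r in range(h):
        for c in range(w):
            if (r + c) % 2 == 0:
                continue  # transmitted pixel, keep as is
            left, right = at(r, c - 1), at(r, c + 1)
            up, down = at(r - 1, c), at(r + 1, c)
            g_h, g_v = abs(left - right), abs(up - down)
            if g_h < g_v:
                out[r][c] = (left + right + 1) // 2  # interpolate along the row
            elif g_v < g_h:
                out[r][c] = (up + down + 1) // 2  # interpolate along the column
            else:
                out[r][c] = (left + right + up + down + 2) // 4  # four-neighbour average
    return out
```

Interpolating along the smaller gradient avoids averaging across an edge, which is where plain bilinear interpolation blurs the most.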
International Conference on Image Processing | 2009
Yibin Chen; Canhui Cai; Kai-Kuang Ma
In low-bit-rate video communication, packet loss can easily cause whole-frame loss, which in turn leads to annoying frame drops. In this paper, a novel error concealment algorithm called disparity-based frame difference projection (DFDP) is developed specifically for stereoscopic video to recover lost frames at the decoder. The proposed DFDP has three key components: 1) change detection, 2) disparity estimation, and 3) frame difference projection, which exploits both the intra-view frame difference in one view and the inter-view correlation to estimate the lost frame in the other view. The change region computed on the correctly received frame is used to predict, through the estimated disparity, the change region between the current missing frame and its previous frame; the estimated disparity is the sum of the estimated global disparity and the estimated local disparity. Experimental results show that the proposed stereoscopic video error concealment method can effectively restore lost frames at the decoder and delivers attractive performance in terms of both objective measurement (peak signal-to-noise ratio) and subjective visual quality.
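A simplified sketch of frame difference projection, assuming the right-view frame is lost, disparity is purely horizontal with `left[y][x]` matching `right[y][x - d]`, and only a single global disparity is used (the local disparity term from the abstract is omitted). The change region is approximated by a hypothetical absolute-difference threshold `change_thr`; all names are illustrative.

```python
from typing import List

Frame = List[List[int]]


def global_disparity(left: Frame, right: Frame, max_d: int) -> int:
    """Horizontal shift d in [0, max_d] minimising the mean of
    |left[y][x] - right[y][x - d]| over the overlapping columns."""
    h, w = len(left), len(left[0])
    best_d, best_err = 0, float("inf")
    for d in range(min(max_d, w - 1) + 1):
        total = sum(abs(left[y][x] - right[y][x - d])
                    for y in range(h) for x in range(d, w))
        err = total / (h * (w - d))
        if err < best_err:
            best_d, best_err = d, err
    return best_d


def conceal_right(left_prev: Frame, left_cur: Frame, right_prev: Frame,
                  d: int, change_thr: int) -> Frame:
    """Estimate the lost right-view frame as the previous right frame plus the
    left-view frame difference, projected through disparity d and kept only
    inside the detected change region."""
    h, w = len(right_prev), len(right_prev[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            xl = min(x + d, w - 1)  # matching column in the left view
            diff = left_cur[y][xl] - left_prev[y][xl]
            if abs(diff) <= change_thr:
                diff = 0  # outside the change region: copy the previous frame
            row.append(right_prev[y][x] + diff)
        out.append(row)
    return out
```

Static areas are copied from the previous right frame, and only pixels that changed in the received view are updated, which keeps the concealment cheap.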