Publication


Featured research published by Houqiang Li.


ACM Multimedia | 2010

Spatial coding for large scale partial-duplicate web image search

Wengang Zhou; Yijuan Lu; Houqiang Li; Yibing Song; Qi Tian

The state-of-the-art image retrieval approaches represent images with a high-dimensional vector of visual words by quantizing local features, such as SIFT, in the descriptor space. The geometric clues among visual words in an image are usually either ignored or exploited only for full geometric verification, which is computationally expensive. In this paper, we focus on partial-duplicate web image retrieval and propose a novel scheme, spatial coding, to encode the spatial relationships among local features in an image. Our spatial coding is both efficient and effective at discovering false matches of local features between images, and can greatly improve retrieval performance. Experiments in partial-duplicate web image search, using a database of one million images, reveal that our approach achieves a 53% improvement in mean average precision and a 46% reduction in time cost over the baseline bag-of-words approach.
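
The spatial-coding step can be illustrated compactly. Below is a minimal NumPy sketch (the function names, the one-shot inconsistency threshold, and the axis-aligned two-map layout are our simplifications; the paper verifies matches iteratively and at finer angular resolution): each image's matched feature positions are summarized as binary maps of relative layout, and entries where the query and database maps disagree expose false matches.

```python
import numpy as np

def spatial_maps(points):
    # Binary maps of relative layout: xmap[i, j] = 1 if matched feature j
    # lies to the right of feature i; ymap[i, j] = 1 if j lies below i.
    x, y = points[:, 0], points[:, 1]
    xmap = (x[None, :] > x[:, None]).astype(np.uint8)
    ymap = (y[None, :] > y[:, None]).astype(np.uint8)
    return xmap, ymap

def verify_matches(query_pts, db_pts, tol=0.2):
    # query_pts and db_pts are listed in match-correspondence order.
    # Compare the two images' maps entry-wise; a matched feature whose
    # relative layout disagrees too often is flagged as a false match.
    qx, qy = spatial_maps(query_pts)
    dx, dy = spatial_maps(db_pts)
    diff = (qx ^ dx) + (qy ^ dy)          # XOR: 1 where the layouts disagree
    return diff.sum(axis=1) <= tol * len(query_pts)
```

For example, `verify_matches(q_pts, d_pts)` returns a boolean mask over the matched feature pairs; pairs flagged False would be discarded before scoring the candidate image.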


IEEE Transactions on Image Processing | 2007

Adaptive Directional Lifting-Based Wavelet Transform for Image Coding

Wenpeng Ding; Feng Wu; Xiaolin Wu; Shipeng Li; Houqiang Li

We present a novel 2-D wavelet transform scheme of adaptive directional lifting (ADL) for image coding. Instead of alternately applying horizontal and vertical lifting, as in present practice, ADL performs lifting-based prediction in local windows in the direction of high pixel correlation. Hence, it adapts far better to the image orientation features in local windows. The ADL transform is achieved with existing 1-D wavelets and is seamlessly integrated into the global wavelet transform. The prediction and update signals of ADL can be derived even at fractional-pixel precision to achieve high directional resolution, while still maintaining perfect reconstruction. To enhance ADL performance, a rate-distortion optimized directional segmentation scheme is also proposed to form and code a hierarchical image partition adapted to local features. Experimental results show that the proposed ADL-based image coding technique outperforms JPEG 2000 in both PSNR and visual quality, with improvements of up to 2.0 dB on images with rich orientation features.
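
To make the directional lifting idea concrete, here is a rough Python sketch of one prediction step (an assumption-laden toy: real ADL operates per local window, reaches fractional-pixel directions through interpolation, and pairs the prediction with an update step, all omitted here). Odd rows are predicted from the even rows above and below, sheared by a small horizontal displacement d, so the high-pass band measures residual energy across, rather than along, the dominant orientation.

```python
import numpy as np

def adl_predict_step(img, d):
    # One vertical 5/3-style lifting prediction, sheared by d columns.
    out = img.astype(np.float64).copy()
    h, _ = out.shape
    for r in range(1, h - 1, 2):
        above = np.roll(out[r - 1], d)    # even row above, displaced by +d
        below = np.roll(out[r + 1], -d)   # even row below, displaced by -d
        out[r] -= 0.5 * (above + below)   # high-pass: prediction residual
    return out

def pick_direction(img, candidates=(-2, -1, 0, 1, 2)):
    # Choose the shear that minimizes residual energy; the paper makes
    # this choice per window with rate-distortion optimization instead.
    return min(candidates,
               key=lambda d: np.abs(adl_predict_step(img, d)[1::2]).sum())
```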


Computer Vision and Pattern Recognition | 2016

Jointly Modeling Embedding and Translation to Bridge Video and Language

Yingwei Pan; Tao Mei; Houqiang Li; Yong Rui

Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate each word locally from the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct while their semantics (e.g., subjects, verbs, or objects) are untrue. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which simultaneously explores LSTM learning and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and the visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. Experiments on the YouTube2Text dataset show that our proposed LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performance is also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
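
The two-part objective can be written down in a few lines. The sketch below is ours (the weight `lam` and its placement are assumptions, not the published setting): the relevance term is the squared distance between the mapped video and sentence vectors in the shared embedding space, and the coherence term is the usual negative log-likelihood of the sentence under the LSTM.

```python
import numpy as np

def lstm_e_loss(video_emb, sent_emb, word_probs, lam=0.7):
    # video_emb, sent_emb: the video and sentence mapped into the shared
    # visual-semantic embedding space; word_probs: the LSTM's probability
    # of each ground-truth word given the video and the preceding words.
    relevance = np.sum((video_emb - sent_emb) ** 2)   # whole-sentence semantics
    coherence = -np.sum(np.log(word_probs))           # word-by-word generation
    return (1.0 - lam) * relevance + lam * coherence
```

Minimizing the relevance term pulls the full sentence toward the video content, which is exactly the holistic constraint that a purely local word-by-word loss lacks.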


IEEE Transactions on Image Processing | 2014

λ Domain Rate Control Algorithm for High Efficiency Video Coding

Bin Li; Houqiang Li; Li Li; Jinlei Zhang

Rate control is a useful tool for video coding, especially in real-time communication applications. Most existing rate control algorithms are based on the R-Q model, which characterizes the relationship between bitrate R and quantization Q under the assumption that Q is the critical factor in rate control. However, as video coding schemes become more and more flexible, it is very difficult to model the R-Q relationship accurately. In fact, we find that there exists a more robust correspondence between R and the Lagrange multiplier λ. Therefore, in this paper, we propose a novel λ-domain rate control algorithm based on the R-λ model and implement it in the newest video coding standard, High Efficiency Video Coding (HEVC). Experimental results show that the proposed λ-domain rate control achieves the target bitrates more accurately than the original rate control algorithm in the HEVC reference software, and also obtains a significant R-D performance gain. Thanks to the highly accurate rate control, hierarchical bit allocation can be enabled in the implemented video coding scheme, which brings an additional R-D performance gain. Experimental results demonstrate that the proposed λ-domain rate control algorithm is effective for HEVC: it outperforms the R-Q model based rate control in HM-8.0 (the HEVC reference software) by 0.55 dB on average and up to 1.81 dB for the low-delay coding structure, and by 1.08 dB on average and up to 3.77 dB for the random-access coding structure. The proposed λ-domain rate control algorithm has already been adopted by the Joint Collaborative Team on Video Coding and integrated into the HEVC reference software.
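
The core of the method is the hyperbolic R-λ model, λ = α·bpp^β. A minimal sketch follows (the initial parameters and update gains here are illustrative; HM estimates and adapts them per picture and hierarchy level, with clipping that we omit):

```python
import math

class RLambdaModel:
    def __init__(self, alpha=3.2003, beta=-1.367):
        self.alpha, self.beta = alpha, beta

    def lambda_from_rate(self, target_bpp):
        # Map the bit budget (bits per pixel) directly to the Lagrange
        # multiplier that drives quantization and mode decision.
        return self.alpha * target_bpp ** self.beta

    def update(self, actual_bpp, used_lambda, g_alpha=0.1, g_beta=0.05):
        # After coding a picture, pull the model toward the observed
        # (bpp, lambda) point so it tracks the sequence's R-D behaviour.
        err = math.log(used_lambda) - math.log(self.lambda_from_rate(actual_bpp))
        self.alpha *= math.exp(g_alpha * err)
        self.beta += g_beta * err * math.log(actual_bpp)
```

The encoder then derives the quantization parameter from λ (in HM, roughly QP ≈ 4.2005·ln λ + 13.7122), so the whole rate control pipeline is driven by λ rather than by a direct R-Q mapping.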


IEEE Transactions on Image Processing | 2012

Principal Visual Word Discovery for Automatic License Plate Detection

Wengang Zhou; Houqiang Li; Yijuan Lu; Qi Tian

License plate detection is widely considered a solved problem, with many systems already in operation. However, existing algorithms and systems work well only under controlled conditions. Many challenges remain for license plate detection in an open environment, such as varying observation angles, background clutter, scale changes, multiple plates, and uneven illumination. In this paper, we propose a novel scheme to automatically locate license plates by principal visual word (PVW) discovery and local feature matching. Observing that characters in different license plates are duplicates of each other, we bring in the bag-of-words (BoW) model popularly applied in partial-duplicate image search. Unlike the classic BoW model, for each plate character we automatically discover the PVW characterized with geometric context. Given a new image, the license plates are extracted by matching local features with the PVW. Besides license plate detection, our approach can also be extended to the detection of logos and trademarks. Owing to the invariance of the scale-invariant feature transform (SIFT) features, our method can adaptively deal with various changes to the license plates, such as rotation, scaling, and illumination. Promising results of the proposed approach are demonstrated in an experimental study of license plate detection.
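
A hedged sketch of the online matching stage (the threshold, the L2 metric, and all function names are our illustrative choices; the paper's PVWs also carry geometric context that further filters matches): local features close to any principal visual word vote for candidate plate regions.

```python
import numpy as np

def match_pvw(features, pvw_descriptors, thresh=0.6):
    # features: list of ((x, y), 128-d SIFT descriptor) from the test image.
    # pvw_descriptors: (K, 128) array of principal visual words learned
    # offline, a small set per plate character.
    hits = []
    for pos, desc in features:
        dists = np.linalg.norm(pvw_descriptors - desc, axis=1)
        if dists.min() < thresh:
            hits.append(pos)        # this keypoint votes for a plate region
    return np.array(hits)
```

Grouping the returned positions (e.g., runs of roughly co-linear hits of similar scale) would then yield candidate plate bounding boxes.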


IEEE Transactions on Multimedia | 2008

Video Error Concealment Using Spatio-Temporal Boundary Matching and Partial Differential Equation

Yan Chen; Yang Hu; Oscar C. Au; Houqiang Li; Chang Wen Chen

Error concealment techniques are very important for video communication, since compressed video sequences may be corrupted or lost when transmitted over error-prone networks. In this paper, we propose a novel two-stage error concealment scheme for erroneously received video sequences. In the first stage, we propose a novel spatio-temporal boundary matching algorithm (STBMA) to reconstruct the lost motion vectors (MVs). A well-defined cost function is introduced that exploits both the spatial and the temporal smoothness properties of video signals. By minimizing this cost function, the MV of each lost macroblock (MB) is recovered, and the corresponding reference MB in the reference frame is obtained using this MV. In the second stage, instead of directly copying the reference MB as the final recovered pixel values, we use a novel partial differential equation (PDE) based algorithm to refine the reconstruction. We minimize, in a weighted manner, the difference between the gradient field of the reconstructed MB in the current frame and that of the reference MB in the reference frame under the given boundary condition. A weighting factor controls the regularization level according to the local blockiness degree. With this algorithm, annoying blocking artifacts are effectively reduced while the structures of the reference MB are well preserved. Compared with the error concealment feature implemented in the H.264 reference software, our algorithm achieves significantly higher PSNR as well as better visual quality.
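
The first stage admits a compact sketch. The version below is a simplification (it uses only the top and left boundaries, a fixed weight w, integer-pixel search, and no bounds checking; the paper's cost also covers the remaining boundaries and uses smoothness terms rather than raw differences):

```python
import numpy as np

def recover_mv(cur, ref, r, c, B=16, search=8, w=0.5):
    # Recover the motion vector of the lost BxB macroblock at (r, c):
    # pick the candidate whose compensated block best matches both the
    # spatial neighbours in the current frame and the co-located
    # boundary in the reference frame.
    top = cur[r - 1, c:c + B].astype(np.float64)
    left = cur[r:r + B, c - 1].astype(np.float64)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            blk = ref[r + dy:r + dy + B, c + dx:c + dx + B].astype(np.float64)
            spatial = np.abs(blk[0] - top).sum() + np.abs(blk[:, 0] - left).sum()
            temporal = np.abs(blk[0] - ref[r - 1, c:c + B]).sum()
            cost = w * spatial + (1.0 - w) * temporal
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

The second stage then refines the motion-compensated block by matching its gradient field to that of the reference MB under the boundary condition, in the spirit of Poisson editing, rather than copying pixels verbatim.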


ACM Multimedia | 2012

Scalar quantization for large scale image search

Wengang Zhou; Yijuan Lu; Houqiang Li; Qi Tian

The Bag-of-Words (BoW) model based on SIFT has been widely used in large-scale image retrieval applications. Feature quantization plays a crucial role in the BoW model: it generates visual words from the high-dimensional SIFT features so as to fit the inverted file structure used for indexing. Traditional feature quantization approaches suffer from several problems: 1) high computational cost: visual word generation (codebook construction) is time-consuming, especially with a large number of features; 2) limited reliability: different collections of images may produce totally different codebooks, and the quantization error is hard to control; 3) update inefficiency: once the codebook is constructed, it is not easy to update. In this paper, a novel feature quantization algorithm, scalar quantization, is proposed. With scalar quantization, a SIFT feature is quantized to a descriptive and discriminative bit-vector, of which the first tens of bits are taken as the code word. Our quantizer is independent of any collection of images. In addition, the result of scalar quantization naturally lends itself to the classic inverted file structure for image indexing. Moreover, the quantization error can be flexibly reduced and controlled by efficiently enumerating the nearest neighbors of code words. The performance of scalar quantization has been evaluated on partial-duplicate Web image search over a database of one million images. Experiments reveal that the proposed scalar quantization achieves a relative 42% improvement in mean average precision over the baseline (a hierarchical visual vocabulary tree approach), and also outperforms the state-of-the-art Hamming Embedding approach and the soft assignment method.
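
A minimal sketch of the two key operations (the median threshold, the 32-bit code word length, and the function names are illustrative assumptions; the paper specifies its own bit-allocation rule):

```python
import numpy as np
from itertools import combinations

def scalar_quantize(sift, t=32):
    # Threshold each of the 128 dimensions against the descriptor's own
    # median to get a bit-vector; the first t bits serve as the code word
    # that indexes the inverted file. No codebook is trained, so the
    # quantizer is independent of any particular image collection.
    bits = (np.asarray(sift) > np.median(sift)).astype(np.uint8)
    return bits, bits[:t]

def neighbor_code_words(code_word, flips=1):
    # Control quantization error by probing every bucket within a small
    # Hamming distance: enumerate code words with up to `flips` bits flipped.
    out = [tuple(code_word)]
    for k in range(1, flips + 1):
        for idx in combinations(range(len(code_word)), k):
            v = code_word.copy()
            v[list(idx)] ^= 1
            out.append(tuple(v))
    return out
```

At query time, the remaining bits of the full bit-vector can be compared by Hamming distance to rank the candidates pulled from the probed buckets.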


International Conference on Image Processing | 2006

Error Resilient Mode Decision in Scalable Video Coding

Yi Guo; Ye-Kui Wang; Houqiang Li

Error resilient macroblock mode decision has been extensively investigated in the literature for single-layer video coding, where it is also called intra refresh. In this paper, we present a loss-aware rate-distortion optimized macroblock mode decision algorithm for scalable video coding, in which more macroblock coding modes than intra and inter are involved. Owing to its good performance, the proposed method has been adopted into the Joint Scalable Video Model by the Joint Video Team.
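
The decision rule itself is short. Here is a minimal sketch under stated assumptions (a single concealment distortion term; the actual algorithm tracks error propagation recursively across frames and layers):

```python
def loss_aware_cost(d_decoded, d_concealed, rate, p_loss, lam):
    # Expected end-to-end distortion of one macroblock mode: the packet
    # arrives with probability (1 - p_loss), otherwise the decoder conceals.
    expected_d = (1.0 - p_loss) * d_decoded + p_loss * d_concealed
    return expected_d + lam * rate

# Per macroblock, evaluate every available SVC mode and keep the cheapest:
#   best_mode = min(modes, key=lambda m: loss_aware_cost(*measure(m), p, lam))
```

Because concealment distortion enters the cost, modes that are cheap in the loss-free sense but fragile under packet loss (e.g., long inter prediction chains) are penalized automatically, which is how intra refresh emerges without an explicit refresh schedule.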


IEEE Transactions on Circuits and Systems for Video Technology | 2009

Error Resilient Coding and Error Concealment in Scalable Video Coding

Yi Guo; Ying Chen; Ye-Kui Wang; Houqiang Li; Miska Hannuksela; Moncef Gabbouj

Scalable video coding (SVC), the scalable extension of the H.264/AVC standard, was developed by the Joint Video Team (JVT) of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). SVC is designed to provide adaptation capability for heterogeneous network structures and different receiving devices with the help of temporal, spatial, and quality scalability. It is challenging to achieve graceful quality degradation in an error-prone environment, since channel errors can drastically deteriorate video quality. Error resilient coding and error concealment techniques have been introduced into SVC to reduce the quality degradation caused by transmission errors. Some of the techniques are inherited from, or also applicable to, H.264/AVC, while others take advantage of the SVC coding structure and coding tools. In this paper, the error resilient coding and error concealment tools in SVC are first reviewed. Then, several important tools, such as the loss-aware rate-distortion optimized macroblock mode decision algorithm and the error concealment methods in SVC, are discussed, and experimental results are provided to show their benefits. The results demonstrate that PSNR gains can be achieved for the conventional inter prediction (IPPP) coding structure and for the hierarchical bi-predictive (B) picture coding structure with a large group-of-pictures size, for all the tested sequences and under various combinations of packet loss rates, compared with the basic joint scalable video model (JSVM) design that applies no error resilient tools at the encoder and only the picture copy error concealment method at the decoder.


IEEE Transactions on Image Processing | 2013

Multiview-Video-Plus-Depth Coding Based on the Advanced Video Coding Standard

Miska Hannuksela; Dmytro Rusanovskyy; Wenyi Su; Lulu Chen; Ri Li; Payman Aflaki; Deyan Lan; Michal Joachimiak; Houqiang Li; Moncef Gabbouj

This paper presents a multiview-video-plus-depth coding scheme that is compatible with the advanced video coding (H.264/AVC) standard and its multiview video coding (MVC) extension. The scheme introduces several encoding and in-loop coding tools for depth and texture video coding, such as depth-based texture motion vector prediction, depth-range-based weighted prediction, joint inter-view depth filtering, and gradual view refresh. The presented coding scheme was submitted to the 3D video coding (3DV) call for proposals (CfP) of the Moving Picture Experts Group standardization committee. When measured with commonly used objective metrics against the MVC anchor, the proposed scheme provides average bitrate reductions of 26% and 35% for the 3DV CfP test scenarios with two and three views, respectively. A similar bitrate reduction is observed in an analysis of the results of the subjective tests on the 3DV CfP submissions.

Collaboration


Dive into Houqiang Li's collaborations.

Top Co-Authors

Wengang Zhou
University of Science and Technology of China

Qi Tian
University of Texas at San Antonio

Weiping Li
University of Science and Technology of China

Feng Wu
University of Science and Technology of China

Dong Liu
University of Science and Technology of China

Li Li
University of Missouri–Kansas City

Lei Yu
University of Science and Technology of China