Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Qingming Huang is active.

Publication


Featured research published by Qingming Huang.


European Conference on Computer Vision | 2016

The Visual Object Tracking VOT2014 Challenge Results

Matej Kristan; Roman P. Pflugfelder; Aleš Leonardis; Jiri Matas; Luka Cehovin; Georg Nebehay; Tomas Vojir; Gustavo Fernández; Alan Lukezic; Aleksandar Dimitriev; Alfredo Petrosino; Amir Saffari; Bo Li; Bohyung Han; CherKeng Heng; Christophe Garcia; Dominik Pangersic; Gustav Häger; Fahad Shahbaz Khan; Franci Oven; Horst Bischof; Hyeonseob Nam; Jianke Zhu; Jijia Li; Jin Young Choi; Jin-Woo Choi; João F. Henriques; Joost van de Weijer; Jorge Batista; Karel Lebeda

Visual tracking has attracted significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems has made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated data-sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge, which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).
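As an aside on how per-frame attribute labels enable attribute-wise comparison, here is a minimal sketch (not the official VOT toolkit) that averages a tracker's per-frame overlap scores within each attribute; the data layout and numbers are assumed for illustration.

```python
# Minimal sketch (not the official VOT toolkit): average per-frame overlap
# grouped by visual attribute, assuming per-frame IoU scores and boolean
# attribute flags are already available.
import numpy as np

def per_attribute_overlap(overlaps, attributes):
    """overlaps: (num_frames,) IoU between tracker and ground truth.
    attributes: dict mapping attribute name -> (num_frames,) boolean mask."""
    overlaps = np.asarray(overlaps, dtype=float)
    return {name: float(overlaps[np.asarray(mask, dtype=bool)].mean())
            for name, mask in attributes.items()
            if np.any(mask)}

# Toy example with hypothetical numbers.
scores = [0.8, 0.6, 0.3, 0.7, 0.5]
attrs = {"occlusion":     [0, 0, 1, 1, 0],
         "camera_motion": [1, 1, 0, 0, 1]}
print(per_attribute_overlap(scores, attrs))
```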


Image and Vision Computing | 2005

Fast and robust text detection in images and video frames

Qixiang Ye; Qingming Huang; Wen Gao; Debin Zhao

Text in images and video frames carries important information for visual content understanding and retrieval. In this paper, using multiscale wavelet features, we propose a novel coarse-to-fine algorithm that is able to locate text lines even against complex backgrounds. First, in the coarse detection, after the wavelet energy feature is calculated to locate all possible text pixels, a density-based region growing method is developed to connect these pixels into regions, which are further separated into candidate text lines using structural information. Second, in the fine detection, four kinds of texture features are extracted to represent the texture pattern of a text line, and a forward search algorithm is applied to select the most effective features. Finally, an SVM classifier is used to identify true text from the candidates based on the selected features. Experimental results show that this approach can quickly and robustly detect text lines under various conditions.
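A rough sketch of the coarse detection stage follows, with a local-variance map standing in for the multiscale wavelet energy feature and connected-component labeling standing in for the density-based region growing; the SVM fine stage is only indicated in a comment.

```python
# Rough sketch of the coarse detection stage: a texture-energy map is
# thresholded and grouped into candidate regions. Local variance is used here
# as a stand-in for the wavelet energy feature, and connected-component
# labeling replaces the density-based region growing of the paper.
import numpy as np
from scipy.ndimage import uniform_filter, label

def candidate_text_regions(gray, win=9, energy_thresh=300.0):
    gray = gray.astype(float)
    mean = uniform_filter(gray, win)
    mean_sq = uniform_filter(gray * gray, win)
    energy = mean_sq - mean * mean          # local variance as texture energy
    mask = energy > energy_thresh           # possible text pixels
    labels, num = label(mask)               # group pixels into candidate regions
    return labels, num

# The fine stage would extract texture features from each candidate line and
# feed them to an SVM classifier (e.g., sklearn.svm.SVC) to keep true text.
img = np.zeros((40, 40))
img[10:20, 5:35] = np.random.default_rng(0).random((10, 30)) * 255
print(candidate_text_regions(img)[1], "candidate region(s)")
```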


ACM Multimedia | 2009

Descriptive visual words and visual phrases for image applications

Shiliang Zhang; Qi Tian; Gang Hua; Qingming Huang; Shipeng Li

The Bag-of-visual Words (BoW) image representation has been applied to various problems in the fields of multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to the words in texts. However, extensive experiments show that the commonly used visual words are not as expressive as text words, which hinders their effectiveness in various applications. In this paper, Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual correspondences to text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a novel descriptive visual element set can be composed from the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs from classic visual words for various applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain scenes or objects are identified as the DVWs and DVPs. Experiments show that the DVWs and DVPs are compact and descriptive, and thus more comparable to text words than classic visual words. We apply the identified DVWs and DVPs in several applications including image retrieval, image re-ranking, and object recognition. The DVW and DVP combination outperforms the classic visual words by 19.5% and 80% in image retrieval and object recognition tasks, respectively. The DVW- and DVP-based image re-ranking algorithm, DWPRank, outperforms the state-of-the-art VisualRank by 12.4% in accuracy and is about 11 times more efficient.
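Since visual phrases are defined as frequently co-occurring visual word pairs, a minimal sketch of mining such pairs from per-image word assignments with spatial coordinates is given below; the spatial radius and data layout are assumptions, not the paper's exact procedure.

```python
# Minimal sketch of mining descriptive visual phrases as frequently
# co-occurring visual word pairs within a spatial radius. The radius and the
# data layout are illustrative assumptions, not the paper's procedure.
from collections import Counter
from itertools import combinations
import math

def mine_visual_phrases(images, radius=30.0, top_k=5):
    """images: list of [(word_id, x, y), ...] per image."""
    pair_counts = Counter()
    for feats in images:
        for (w1, x1, y1), (w2, x2, y2) in combinations(feats, 2):
            if math.hypot(x1 - x2, y1 - y2) <= radius:
                pair_counts[tuple(sorted((w1, w2)))] += 1
    return pair_counts.most_common(top_k)

# Toy example with hypothetical word assignments.
imgs = [[(3, 10, 10), (7, 20, 15), (3, 200, 200)],
        [(3, 50, 60), (7, 55, 70), (9, 300, 10)]]
print(mine_visual_phrases(imgs))   # the pair (3, 7) co-occurs twice within the radius
```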


Computer Vision and Pattern Recognition | 2010

Measuring visual saliency by Site Entropy Rate

Wei Wang; Yizhou Wang; Qingming Huang; Wen Gao

In this paper, we propose a new computational model for visual saliency derived from the information maximization principle. The model is inspired by a few well-acknowledged biological facts. To compute the saliency spots of an image, the model first extracts a number of sub-band feature maps using learned sparse codes. It adopts a fully-connected graph representation for each feature map, and runs random walks on the graphs to simulate the signal/information transmission among the interconnected neurons. We propose a new visual saliency measure called Site Entropy Rate (SER) to compute the average information transmitted from a node (neuron) to all the others during the random walk on the graphs/network. This saliency definition also explains the center-surround mechanism from a computational perspective. We further extend our model to the spatio-temporal domain so as to detect salient spots in videos. To evaluate the proposed model, we conduct extensive experiments on psychological stimuli, two well-known image datasets, and a public video dataset. The experiments demonstrate that the proposed model achieves state-of-the-art saliency detection performance in both still images and videos.
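One way to read the Site Entropy Rate definition is sketched below: for a random walk on a fully connected weighted graph, each node contributes its stationary probability times the entropy of its outgoing transition distribution. The Gaussian affinity on toy feature values is an illustrative assumption, not necessarily the paper's exact weighting.

```python
# Minimal sketch of a per-site entropy rate on a fully connected graph built
# from feature responses. The Gaussian affinity is an illustrative choice,
# not necessarily the paper's exact weighting scheme.
import numpy as np

def site_entropy_rate(features, sigma=0.5):
    f = np.asarray(features, dtype=float).ravel()
    diff = f[:, None] - f[None, :]
    W = np.exp(-(diff ** 2) / (2 * sigma ** 2))     # pairwise affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)            # random-walk transitions
    pi = W.sum(axis=1) / W.sum()                    # stationary distribution
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.where(P > 0, P * np.log(P), 0.0).sum(axis=1)  # row entropies
    return pi * H                                   # per-site entropy rate

print(site_entropy_rate([0.1, 0.12, 0.9, 0.11]))    # per-site contributions for a toy 1-D feature
```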


IEEE Transactions on Multimedia | 2007

Joint Source-Channel Rate-Distortion Optimization for H.264 Video Coding Over Error-Prone Networks

Yuan Zhang; Wen Gao; Yan Lu; Qingming Huang; Debin Zhao

For a typical video distribution system, the video contents are first compressed and then stored in local storage or transmitted to end users through networks. When the compressed videos are transmitted through error-prone networks, error robustness becomes an important issue. In past years, a number of rate-distortion (R-D) optimized coding mode selection schemes have been proposed for error-resilient video coding, including the recursive optimal per-pixel estimate (ROPE) method. However, the ROPE-related approaches assume integer-pixel motion-compensated prediction rather than subpixel prediction, and their extension to H.264 is not straightforward. Alternatively, an error-robust R-D optimization (ER-RDO) method has been included in the H.264 test model, in which the estimate of pixel distortion is derived by simulating the decoding process multiple times in the encoder. The computational complexity is therefore very high. To address this problem, we propose a new end-to-end distortion model for R-D optimized coding mode selection, in which the overall distortion is taken as the sum of several separable distortion items. Thus, it can suppress the approximation errors caused by pixel averaging operations such as subpixel prediction. Based on the proposed end-to-end distortion model, a new Lagrange multiplier is derived for R-D optimized coding mode selection in packet-loss environments by taking the network conditions into account. Rate control and complexity issues are also discussed in this paper.
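A toy illustration of loss-aware R-D optimized mode selection is given below: each candidate mode's expected end-to-end distortion is modeled as a sum of separable source, error-propagation and concealment terms weighted by the packet-loss rate, and the mode minimizing D + lambda*R is chosen. The terms and numbers are hypothetical placeholders, not the paper's distortion model.

```python
# Toy illustration of loss-aware R-D optimized mode selection. The expected
# end-to-end distortion is modeled as a sum of separable items weighted by the
# packet-loss rate p; the concrete terms and numbers are hypothetical, not the
# paper's exact end-to-end distortion model.
def select_mode(modes, lam, p):
    """modes: list of dicts with rate and source/propagated/concealment distortion."""
    best, best_cost = None, float("inf")
    for m in modes:
        distortion = ((1 - p) * (m["d_source"] + m["d_propagated"])
                      + p * m["d_concealment"])
        cost = distortion + lam * m["rate"]          # Lagrangian J = D + lambda * R
        if cost < best_cost:
            best, best_cost = m, cost
    return best["name"], best_cost

modes = [
    {"name": "intra", "rate": 150, "d_source": 20, "d_propagated": 0,  "d_concealment": 40},
    {"name": "inter", "rate": 60,  "d_source": 25, "d_propagated": 15, "d_concealment": 90},
]
print(select_mode(modes, lam=0.3, p=0.3))   # intra wins here; at p=0 the cheaper inter mode would win
```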


IEEE Transactions on Multimedia | 2008

Using Webcast Text for Semantic Event Detection in Broadcast Sports Video

Changsheng Xu; Yifan Zhang; Guangyu Zhu; Yong Rui; Hanqing Lu; Qingming Huang

Sports video semantic event detection is essential for sports video summarization and retrieval. Extensive research efforts have been devoted to this area in recent years. However, existing sports video event detection approaches rely heavily on either the video content itself, which faces the difficulty of extracting high-level semantic information from video content using computer vision and image processing techniques, or a manually generated video ontology, which is domain specific and difficult to align automatically with the video content. In this paper, we present a novel approach for sports video semantic event detection based on analysis and alignment of Webcast text and broadcast video. Webcast text is a text broadcast channel for sports games which is co-produced with the broadcast video and is easily obtained from the Web. We first analyze Webcast text to cluster and detect text events in an unsupervised way using probabilistic latent semantic analysis (pLSA). Based on the detected text events and video structure analysis, we employ a conditional random field model (CRFM) to align text events and video events by detecting event moments and event boundaries in the video. Incorporating Webcast text into sports video analysis significantly facilitates sports video semantic event detection. We conducted experiments on 33 hours of soccer and basketball games for Webcast analysis, broadcast video analysis and text/video semantic alignment. The results are encouraging when compared with the manually labeled ground truth.
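Since the text events are clustered with pLSA, a compact EM sketch of pLSA on a term-by-document count matrix is shown below; matrix sizes, initialization and iteration count are illustrative, and the CRF-based text/video alignment step is not reproduced.

```python
# Compact pLSA EM sketch for clustering webcast text into latent events.
# Matrix sizes, initialization, and iteration count are illustrative only;
# the CRF-based text/video alignment step is not reproduced here.
import numpy as np

def plsa(counts, num_topics, iters=50, seed=0):
    """counts: (docs, words) term-frequency matrix."""
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_z_d = rng.random((D, num_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((num_topics, W)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z | d, w), shape (D, W, Z)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / joint.sum(axis=2, keepdims=True)
        weighted = counts[:, :, None] * post
        # M-step: re-estimate P(w | z) and P(z | d)
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

counts = np.array([[4, 0, 1, 0], [3, 1, 0, 0], [0, 5, 0, 2], [0, 4, 1, 3]], float)
p_z_d, _ = plsa(counts, num_topics=2)
print(p_z_d.argmax(axis=1))   # documents grouped by dominant latent event
```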


ACM Multimedia | 2010

Building contextual visual vocabulary for large-scale image applications

Shiliang Zhang; Qingming Huang; Gang Hua; Shuqiang Jiang; Wen Gao; Qi Tian

Notwithstanding its great success and wide adoption in the Bag-of-visual-Words representation, a visual vocabulary created from single image local features is often shown to be ineffective, largely for three reasons. First, many detected local features are not stable enough, resulting in many noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among local features, which has been proven to be valuable for visual matching. Third, the distance metric commonly used for generating a visual vocabulary does not take the semantic context into consideration, which renders it prone to noise. To address these three issues, we propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call the discriminant group distance. This group distance is further leveraged to induce a visual vocabulary from groups of local features. We name it the contextual visual vocabulary, which captures both spatial and semantic contexts. We evaluate the proposed local feature refinement strategy and the contextual visual vocabulary in two large-scale image applications: large-scale near-duplicate image retrieval on a dataset containing 1.5 million images, and image search re-ranking. Our experimental results show that the contextual visual vocabulary shows significant improvement over the classic visual vocabulary. Moreover, it outperforms the state-of-the-art Bundled Feature in terms of retrieval precision, memory consumption and efficiency.
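The group-distance idea can be illustrated with a simple stand-in: two groups of local descriptors are compared by a symmetric average nearest-neighbor distance. The paper's learned discriminant group distance is replaced here by plain Euclidean distance for illustration.

```python
# Illustrative stand-in for comparing groups of local features: each group is
# a small set of descriptors, and two groups are compared by a symmetric
# average nearest-neighbor distance. The paper learns a discriminant group
# distance; plain Euclidean distance is used here instead.
import numpy as np

def group_distance(group_a, group_b):
    A, B = np.asarray(group_a, float), np.asarray(group_b, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

g1 = [[0.1, 0.2], [0.15, 0.25]]
g2 = [[0.12, 0.22], [0.9, 0.8]]
print(group_distance(g1, g2))
```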


Computer Vision and Pattern Recognition | 2016

Hedged Deep Tracking

Yuankai Qi; Shengping Zhang; Lei Qin; Hongxun Yao; Qingming Huang; Jongwoo Lim; Ming-Hsuan Yang

In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of trackers trained with features from a single layer (usually the second-to-last layer) can be further improved. In this paper, we propose a novel CNN-based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN-based trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-the-art trackers.
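A minimal sketch of the standard Hedge (multiplicative-weights) update for combining a pool of expert trackers follows; the paper's adaptive Hedge variant and the CNN-feature trackers themselves are not reproduced, and the per-frame losses below are hypothetical.

```python
# Minimal sketch of the standard Hedge (multiplicative-weights) update for
# combining several expert trackers. The paper's adaptive Hedge variant and
# the CNN-feature trackers are not reproduced; losses are hypothetical
# per-frame errors of each expert.
import numpy as np

def hedge_combine(expert_predictions, losses, eta=1.0):
    """expert_predictions: (frames, experts, 2) predicted box centers.
    losses: (frames, experts) per-frame losses of each expert."""
    num_experts = losses.shape[1]
    weights = np.full(num_experts, 1.0 / num_experts)
    fused = []
    for preds, loss in zip(expert_predictions, losses):
        fused.append(weights @ preds)                  # weighted fusion of expert outputs
        weights = weights * np.exp(-eta * loss)        # Hedge update from observed losses
        weights /= weights.sum()
    return np.array(fused)

preds = np.array([[[10, 10], [12, 11]], [[20, 18], [30, 40]]], float)
loss = np.array([[0.1, 0.1], [0.1, 0.9]], float)
print(hedge_combine(preds, loss))
```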


Signal Processing: Image Communication | 2009

Joint video/depth rate allocation for 3D video coding based on view synthesis distortion model

Yanwei Liu; Qingming Huang; Siwei Ma; Debin Zhao; Wen Gao

Joint video/depth rate allocation is an important optimization problem in 3D video coding. To address this problem, this paper proposes a distortion model to evaluate the synthesized view without access to the captured original view. The proposed distortion model is an additive model that accounts for the video-coding-induced distortion and the depth-quantization-induced distortion, as well as the inherent geometry distortion. The depth-quantization-induced distortion not only considers the warping error distortion, which is described by a piecewise linear model with the video power spectral property, but also takes into account the warping error correlation distortion between the two source reference views. The geometry distortion is approximated from that of the adjacent view synthesis. Based on the proposed distortion model, a joint rate allocation method is proposed to seek the optimal trade-off between video bit-rate and depth bit-rate for maximizing the view synthesis quality. Experimental results show that the proposed distortion model is capable of approximately estimating the actual distortion of the synthesized view, and that the proposed rate allocation method achieves nearly identical rate allocation performance to the full-search method at lower computational cost. Moreover, the proposed rate allocation method costs less computation than the hierarchical-search method at high bit-rates while providing almost equivalent rate allocation performance.
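The full-search baseline mentioned above can be illustrated with a small sketch: enumerate video/depth splits of a total bit budget and keep the one minimizing a model-estimated synthesis distortion. The distortion function here is a hypothetical placeholder for the paper's additive video/depth/geometry model.

```python
# Small sketch of full-search joint video/depth rate allocation: enumerate
# splits of a total bit budget and keep the one minimizing an estimated view
# synthesis distortion. The distortion function below is a hypothetical
# placeholder for the paper's additive video/depth/geometry model.
def estimate_synthesis_distortion(r_video, r_depth):
    d_video = 500.0 / (1.0 + r_video)     # toy video-coding-induced term
    d_depth = 300.0 / (1.0 + r_depth)     # toy depth-quantization-induced term
    d_geom = 5.0                          # toy inherent geometry term
    return d_video + d_depth + d_geom

def allocate_rates(total_rate, step=10):
    best = None
    for r_video in range(step, total_rate, step):
        r_depth = total_rate - r_video
        d = estimate_synthesis_distortion(r_video, r_depth)
        if best is None or d < best[0]:
            best = (d, r_video, r_depth)
    return best

print(allocate_rates(1000))   # (estimated distortion, video bit-rate, depth bit-rate)
```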


Pattern Recognition | 2009

A configurable method for multi-style license plate recognition

Jianbin Jiao; Qixiang Ye; Qingming Huang

Despite the success of license plate recognition (LPR) methods in the past decades, few of them can effectively process multi-style license plates (LPs), especially LPs from different nations. In this paper, we propose a new method for multi-style LP recognition by representing the styles with quantitative parameters, i.e., plate rotation angle, plate line number, character type and format. In the recognition procedure these four parameters are managed by the relevant algorithms, i.e., the plate rotation, plate line segmentation, character recognition and format matching algorithms, respectively. To recognize LPs of a particular style, users can configure the method by defining the corresponding parameter values, which are then processed by the relevant algorithms. In addition, the probability of occurrence of each LP style is calculated based on previous LPR results, which leads to faster and more precise recognition. Various LP images were used to test the proposed method and the results demonstrate its effectiveness.
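The configurable style parameters map naturally onto a small configuration object plus a format-matching step; the sketch below is a hypothetical illustration, with the plate format expressed as a regular expression rather than the paper's format-matching algorithm.

```python
# Hypothetical illustration of the configurable style parameters: users define
# rotation handling, line count, character types, and plate format, and the
# format is checked with a regular expression. This is a sketch of the idea,
# not the paper's implementation.
import re
from dataclasses import dataclass

@dataclass
class PlateStyle:
    max_rotation_deg: float     # rotation correction range
    line_number: int            # single- or double-line plates
    character_types: str        # e.g. "latin+digits"
    format_pattern: str         # regex describing the plate format

def matches_style(recognized_text, style):
    return re.fullmatch(style.format_pattern, recognized_text) is not None

# Example style for a hypothetical single-line plate: 3 letters + 4 digits.
style = PlateStyle(max_rotation_deg=15.0, line_number=1,
                   character_types="latin+digits",
                   format_pattern=r"[A-Z]{3}[0-9]{4}")
print(matches_style("ABC1234", style))   # True
print(matches_style("AB12345", style))   # False
```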

Collaboration


Dive into Qingming Huang's collaboration.

Top Co-Authors

Shuqiang Jiang
Chinese Academy of Sciences

Qi Tian
University of Texas at San Antonio

Shuhui Wang
Chinese Academy of Sciences

Weigang Zhang
Harbin Institute of Technology

Guorong Li
Chinese Academy of Sciences

Junbiao Pang
Beijing University of Technology

Chunjie Zhang
Chinese Academy of Sciences

Hongxun Yao
Harbin Institute of Technology