
Publication


Featured research published by Chong-Wah Ngo.


Conference on Image and Video Retrieval | 2007

Towards optimal bag-of-features for object categorization and semantic video retrieval

Yu-Gang Jiang; Chong-Wah Ngo; Jun Yang

Bag-of-features (BoF) representations derived from local keypoints have recently shown promise for object and scene classification. Nevertheless, whether BoF can withstand the reliability and scalability challenges of visual classification remains uncertain, owing to the many implementation choices involved. In this paper, we evaluate the factors that govern the performance of BoF, including the choice of detector, kernel, vocabulary size, and weighting scheme. We offer practical insights into optimizing performance through the choice of keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word to an image, and experimentally show that it consistently outperforms other popular weighting methods. On both the PASCAL-2005 and TRECVID-2006 datasets, our BoF setting yields performance competitive with state-of-the-art techniques. We also show that BoF is highly complementary to global features: incorporating BoF with color and texture features brings an improvement of 50% on the TRECVID-2006 dataset.
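
As an illustration of the soft-weighting idea, here is a minimal Python sketch, assuming a precomputed visual vocabulary. The function names are hypothetical, and the 1/2^rank discount for the top-N nearest words follows the scheme described for soft-weighting, with other details (such as similarity-based weighting) simplified.

```python
import numpy as np

def soft_weighted_bow(descriptors, vocab, n_neighbors=4):
    """Soft-weighting sketch: each keypoint votes for its n nearest
    visual words, with the rank-i nearest word receiving a contribution
    discounted by 1/2**i (the top match gets weight 1).

    descriptors: (num_keypoints, dim) local descriptors of one image.
    vocab:       (vocab_size, dim) visual-word centers.
    """
    hist = np.zeros(len(vocab))
    # Pairwise distances between every keypoint and every visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    for row in dists:
        nearest = np.argsort(row)[:n_neighbors]
        for rank, word in enumerate(nearest):
            hist[word] += 1.0 / (2 ** rank)
    return hist / max(hist.sum(), 1e-12)  # normalize to unit mass
```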


IEEE Transactions on Circuits and Systems for Video Technology | 2005

Video summarization and scene detection by graph modeling

Chong-Wah Ngo; Yu-Fei Ma; HongJiang Zhang

We propose a unified approach for video summarization based on the analysis of video structure and video highlights. The two major components of our approach are scene modeling and highlight detection. Scene modeling is achieved by the normalized cut algorithm and temporal graph analysis, while highlight detection is accomplished by motion attention modeling. In our approach, a video is represented as a complete undirected graph, and the normalized cut algorithm is carried out to globally and optimally partition the graph into video clusters. The resulting clusters form a directed temporal graph, and a shortest-path algorithm is proposed to efficiently detect video scenes. Attention values are then computed and attached to the scenes, clusters, shots, and subshots in the temporal graph. As a result, the temporal graph inherently describes both the evolution and the perceptual importance of a video. In our application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both structure and attention information.
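
The normalized cut step can be approximated with standard spectral machinery. Below is a minimal sketch, assuming a precomputed shot-similarity matrix W; the paper applies the cut recursively to obtain multiple clusters, whereas the sketch shows a single bipartition only.

```python
import numpy as np

def ncut_bipartition(W):
    """Normalized-cut sketch: split a complete undirected graph
    (affinity matrix W) into two clusters using the sign of the second
    eigenvector of the symmetric normalized Laplacian.

    W: (n, n) symmetric shot-similarity matrix.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # ascending eigenvalues
    fiedler = eigvecs[:, 1]                    # second-smallest eigenvector
    return fiedler >= 0                        # boolean cluster assignment
```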


ACM Multimedia | 2007

Practical elimination of near-duplicates from web video search

Xiao Wu; Alexander G. Hauptmann; Chong-Wah Ngo

Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos among the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate from global signatures do we apply a more expensive local-feature-based detection, which provides very accurate duplicate analysis at higher computational cost. Results for 24 queries over a dataset of 12,790 videos retrieved from Google, Yahoo! and YouTube show that this hierarchical approach can dramatically reduce the redundant videos displayed to the user in the top result set, at relatively small computational cost.
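
A rough sketch of the hierarchical triage follows; the lo/hi thresholds, attribute names, and the caller-supplied local matcher are hypothetical placeholders, not the paper's exact decision rule.

```python
import numpy as np

def hierarchical_filter(query, candidates, match_local, lo=0.2, hi=0.6):
    """Two-stage triage sketch: a cheap global color-histogram signature
    settles clear cases; only ambiguous candidates reach the expensive
    local-feature matcher supplied as match_local.
    """
    novel, dupes = [], []
    for cand in candidates:
        d = np.linalg.norm(query.signature - cand.signature)  # global test
        if d < lo:
            dupes.append(cand)                 # clearly near-duplicate
        elif d > hi:
            novel.append(cand)                 # clearly novel
        elif match_local(query, cand):         # ambiguous: use local features
            dupes.append(cand)
        else:
            novel.append(cand)
    return novel, dupes
```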


IEEE Transactions on Multimedia | 2010

Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study

Yu-Gang Jiang; Jun Yang; Chong-Wah Ngo; Alexander G. Hauptmann

Based on local keypoints extracted as salient image patches, an image can be described as a "bag-of-visual-words" (BoW), and this representation has shown promise for object and scene classification. The performance of BoW features in semantic concept detection for large-scale multimedia databases depends on various representation choices. In this paper, we conduct a comprehensive study of these choices, including vocabulary size, weighting scheme, stop-word removal, feature selection, spatial information, and visual bi-grams. We offer practical insights into how to optimize the performance of BoW through appropriate representation choices. For the weighting scheme, we detail a soft-weighting method to assess the significance of a visual word to an image, and experimentally show that soft-weighting outperforms other popular weighting schemes such as TF-IDF by a large margin. Our extensive experiments on TRECVID datasets also indicate that the BoW feature alone, with appropriate representation choices, already produces highly competitive concept detection performance. Based on our empirical findings, we further apply our method to detect a large set of 374 semantic concepts. The detectors, as well as the features and detection scores on several recent benchmark datasets, are released to the multimedia community.
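
For contrast with the soft-weighting sketch above, here is a minimal sketch of the TF-IDF baseline the study compares against, assuming a matrix of raw visual-word counts:

```python
import numpy as np

def tf_idf_bow(counts):
    """TF-IDF weighting sketch for visual words.

    counts: (num_images, vocab_size) raw visual-word counts per image.
    """
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                  # document frequency per word
    idf = np.log(len(counts) / np.maximum(df, 1))  # inverse document frequency
    return tf * idf
```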


European Conference on Computer Vision | 2012

Trajectory-based modeling of human actions with motion reference points

Yu-Gang Jiang; Qi Dai; Xiangyang Xue; Wei Liu; Chong-Wah Ngo

Human action recognition in videos is a challenging problem with wide applications. State-of-the-art approaches often adopt the popular bag-of-features representation based on isolated local patches or temporal patch trajectories, where motion patterns like object relationships are mostly discarded. This paper proposes a simple representation specifically aimed at the modeling of such motion relationships. We adopt global and local reference points to characterize motion information, so that the final representation can be robust to camera movement. Our approach operates on top of visual codewords derived from local patch trajectories, and therefore does not require accurate foreground-background separation, which is typically a necessary step to model object relationships. Through an extensive experimental evaluation, we show that the proposed representation offers very competitive performance on challenging benchmark datasets, and combining it with the bag-of-features representation leads to substantial improvement. On Hollywood2, Olympic Sports, and HMDB51 datasets, we obtain 59.5%, 80.6% and 40.7% respectively, which are the best reported results to date.
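
A much-simplified sketch of the reference-point idea: treat the mean displacement over all trajectories as a crude global reference and describe each trajectory's motion relative to it. The paper's actual construction of global and local reference points is richer; this only illustrates why relative motion is robust to camera movement.

```python
import numpy as np

def reference_compensated_motion(trajectories):
    """Describe trajectory motion relative to a global reference so the
    feature is less sensitive to camera movement.

    trajectories: (num_traj, num_frames, 2) point tracks (assumed shape).
    """
    disp = np.diff(trajectories, axis=1)           # per-frame displacements
    global_ref = disp.mean(axis=0, keepdims=True)  # crude camera-motion proxy
    return disp - global_ref                       # motion relative to reference
```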


IEEE Transactions on Multimedia | 2007

Near-Duplicate Keyframe Identification With Interest Point Matching and Pattern Learning

Wan-Lei Zhao; Chong-Wah Ngo; Hung-Khoon Tan; Xiao Wu

This paper proposes a new approach for near-duplicate keyframe (NDK) identification through matching, filtering, and learning of local interest points (LIPs) with PCA-SIFT descriptors. Issues of matching reliability, filtering efficiency, and learning flexibility are exploited in novel ways to delve into the potential of LIP-based retrieval and detection. For matching, we propose a one-to-one symmetric matching (OOS) algorithm that proves highly reliable for NDK identification, owing to its ability to exclude false LIP matches compared with other matching strategies. For rapid filtering, we address two issues, speed efficiency and search effectiveness, by supporting OOS with a new index structure called LIP-IS. By exploring the properties of PCA-SIFT, the filtering capability and speed of LIP-IS are asymptotically estimated and compared to locality sensitive hashing (LSH). For robustness, the matching of LIPs across keyframes forms vivid patterns that are utilized for discriminative learning and detection with support vector machines. Experimental results on the TRECVID-2003 corpus show that our proposed approach outperforms other popular methods, including techniques based on LSH, in terms of retrieval and detection effectiveness. In addition, the proposed LIP-IS speeds up OOS by more than a factor of ten and possesses several favorable properties compared to LSH.
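
The OOS matching rule itself is easy to state: keep only mutual nearest-neighbor pairs. A brute-force sketch (without the LIP-IS indexing) might look like this:

```python
import numpy as np

def one_to_one_symmetric_matches(desc_a, desc_b):
    """OOS matching sketch: keep a pair (i, j) only when j is i's nearest
    neighbor in B and i is simultaneously j's nearest neighbor in A,
    which discards most false matches.

    desc_a, desc_b: (n, d) and (m, d) PCA-SIFT-style descriptor arrays.
    """
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    a_to_b = dists.argmin(axis=1)   # best match in B for each point in A
    b_to_a = dists.argmin(axis=0)   # best match in A for each point in B
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```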


IEEE Transactions on Multimedia | 2009

Real-Time Near-Duplicate Elimination for Web Video Search With Content and Context

Xiao Wu; Chong-Wah Ngo; Alexander G. Hauptmann; Hung-Khoon Tan

With the exponential growth of social media, there exist huge numbers of near-duplicate web videos, ranging from simple reformatting to complex mixtures of different editing effects. In addition to the abundant video content, the social web provides rich context information associated with web videos, such as thumbnail images and time durations. At the same time, the popularity of Web 2.0 demands timely responses to user queries. To balance speed and accuracy, in this paper we combine contextual information from time duration, number of views, and thumbnail images with content analysis derived from color and local points to achieve real-time near-duplicate elimination. Results for 24 popular queries retrieved from YouTube show that the proposed approach, integrating content and context, achieves real-time novelty re-ranking of web videos with extremely high efficiency, where the majority of duplicates are rapidly detected and removed from the top rankings. The proposed approach can reach a speedup of 164 times over the effective hierarchical method proposed in our earlier work, with only a slight loss of performance.
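
A sketch of the context-first cascade follows, with hypothetical attribute names and thresholds: cheap context cues decide most cases before any frame-level content analysis runs.

```python
def context_prefilter(query, cand, max_dur_ratio=1.1, max_thumb_dist=0.3):
    """Context-first sketch: time duration and thumbnail color histogram
    accept or reject a candidate before expensive content analysis.
    Returns True if cand remains a plausible near-duplicate of query.
    """
    longer = max(query.duration, cand.duration)
    shorter = max(min(query.duration, cand.duration), 1)
    if longer / shorter > max_dur_ratio:
        return False                              # durations too different
    # L1 distance between normalized thumbnail histograms, halved to [0, 1].
    d = 0.5 * sum(abs(a - b) for a, b in zip(query.thumb_hist, cand.thumb_hist))
    return d <= max_thumb_dist
```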


IEEE Transactions on Multimedia | 2010

On the Annotation of Web Videos by Efficient Near-Duplicate Search

Wan-Lei Zhao; Xiao Wu; Chong-Wah Ngo

With the proliferation of Web 2.0 applications, user-supplied social tags are commonly available in social media as a means to bridge the semantic gap. On the other hand, the explosive expansion of the social web makes an overwhelming number of web videos available, among which there exist large numbers of near-duplicate videos. In this paper, we investigate techniques that allow effective annotation of web videos from a data-driven perspective. A novel classifier-free video annotation framework is proposed that first retrieves visual duplicates and then suggests representative tags. The significance of this paper lies in addressing two timely issues in annotating query videos. First, we provide a novel solution for fast near-duplicate video retrieval. Second, based on the outcome of near-duplicate search, we explore the potential for data-driven annotation to succeed when a huge volume of tagged web videos is freely accessible online. Experiments across sources (annotating Google videos and Yahoo! videos using YouTube videos) and across time periods (annotating YouTube videos using historical data) show the effectiveness and efficiency of the proposed classifier-free approach for web video tag annotation.
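
Once near-duplicate neighbors are retrieved, tag suggestion can be as simple as frequency voting. A minimal sketch, assuming each neighbor carries a tag list; the paper's actual tag-selection step may be more elaborate.

```python
from collections import Counter

def annotate_by_neighbors(neighbor_tag_lists, top_k=5):
    """Classifier-free annotation sketch: suggest the tags that occur
    most often among the tagged near-duplicate neighbors.

    neighbor_tag_lists: iterable of tag lists, one per retrieved video.
    """
    votes = Counter(tag for tags in neighbor_tag_lists for tag in tags)
    return [tag for tag, _ in votes.most_common(top_k)]
```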


IEEE Transactions on Multimedia | 2002

On clustering and retrieval of video shots through temporal slices analysis

Chong-Wah Ngo; Ting-Chuen Pong; Hong-Jiang Zhang

Based on the analysis of temporal slices, we propose novel approaches for clustering and retrieval of video shots. Temporal slices are a set of two-dimensional (2-D) images extracted along the time dimension of an image volume. They encode a rich set of visual patterns for similarity measurement. In this paper, we first demonstrate that tensor histogram features extracted from temporal slices are suitable for motion retrieval. We then integrate both tensor and color histograms to construct a two-level hierarchical clustering structure: each cluster in the top level contains shots with similar color, while each cluster in the bottom level consists of shots with similar motion. The constructed structure is then used for cluster-based retrieval. The proposed approaches prove particularly useful for sports games, where motion and color are important visual cues when searching and browsing for desired video shots.
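
Extracting temporal slices from an image volume is straightforward; a minimal sketch, assuming the volume is shaped (time, height, width):

```python
import numpy as np

def temporal_slices(volume):
    """Temporal-slice sketch: cut 2-D slices along the time axis, here
    one horizontal slice per image row. Each slice is a (time, width)
    image whose oriented patterns reflect motion, ready for tensor and
    color histogram features.
    """
    t, h, w = volume.shape
    return [volume[:, y, :] for y in range(h)]   # h slices of shape (t, w)
```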


IEEE Transactions on Image Processing | 2003

Motion analysis and segmentation through spatio-temporal slices processing

Chong-Wah Ngo; Ting-Chuen Pong; Hong-Jiang Zhang

This paper presents new approaches to characterizing and segmenting the content of video, developed through pattern analysis of spatio-temporal slices. While traditional approaches to motion sequence analysis tend to formulate computational methodologies over two or three adjacent frames, spatio-temporal slices provide rich visual patterns over a larger temporal scale. We first describe a motion computation method based on a structure tensor formulation. This method encodes the visual patterns of spatio-temporal slices in a tensor histogram that, on one hand, characterizes the temporal changes of motion over time and, on the other, describes the motion trajectories of different moving objects. By analyzing the tensor histogram of an image sequence, we can temporally segment the sequence into several motion-coherent subunits and, in addition, spatially segment it into various motion layers. The temporal segmentation of image sequences facilitates motion annotation and content representation of a video, while the spatial decomposition leads to a prominent way of reconstructing background panoramic images and computing foreground objects.
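
A simplified sketch of the structure-tensor computation on a single slice: per-pixel gradients give a 2x2 tensor whose dominant orientation approximates local motion, and a histogram of orientations stands in for the paper's tensor histogram (spatial smoothing of the tensor entries is omitted for brevity).

```python
import numpy as np

def slice_orientation_histogram(slice_img, bins=16):
    """Structure-tensor sketch on a (time, space) spatio-temporal slice:
    bin the per-pixel dominant orientations into a crude tensor histogram.
    """
    gt, gx = np.gradient(slice_img.astype(float))  # time and space gradients
    # Per-pixel structure tensor entries.
    jxx, jtt, jxt = gx * gx, gt * gt, gx * gt
    # Dominant orientation of the tensor at each pixel.
    theta = 0.5 * np.arctan2(2 * jxt, jxx - jtt)
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi / 2, np.pi / 2))
    return hist / max(hist.sum(), 1)               # normalized histogram
```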

Collaboration


Dive into Chong-Wah Ngo's collaborations.

Top Co-Authors

Xiao Wu (Southwest Jiaotong University)
Wan-Lei Zhao (City University of Hong Kong)
Ting-Chuen Pong (Hong Kong University of Science and Technology)
Hung-Khoon Tan (City University of Hong Kong)
Lei Pang (City University of Hong Kong)
Feng Wang (Hong Kong University of Science and Technology)
Wei Zhang (Chinese Academy of Sciences)
Shiai Zhu (City University of Hong Kong)
Hao Zhang (City University of Hong Kong)