
Publication


Featured research published by Kongwah Wan.


ACM Multimedia | 2006

Live sports event detection based on broadcast video and web-casting text

Changsheng Xu; Jinjun Wang; Kongwah Wan; Yiqun Li; Ling-Yu Duan

Event detection is essential for sports video summarization, indexing and retrieval, and extensive research effort has been devoted to this area. However, previous approaches rely heavily on the video content itself and require the whole video for event detection. Due to the semantic gap between low-level features and high-level events, it is difficult to come up with a generic framework that achieves high event-detection accuracy. In addition, the dynamic structures of different sports domains further complicate the analysis and impede the implementation of live event detection systems. In this paper, we present a novel approach for event detection in live sports games using web-casting text and broadcast video. Web-casting text is a text broadcast source for sports games that can be captured live from the web. Incorporating web-casting text into sports video analysis significantly improves event-detection accuracy. Compared with previous approaches, the proposed approach is able to: (1) detect events live, based only on the partial content captured from the web and TV; (2) extract detailed event semantics and detect exact event boundaries, which are very difficult or impossible for previous approaches to handle; and (3) create personalized summaries related to a certain event, player or team according to users' preferences. We present the framework of our approach and the details of text analysis, video analysis and text/video alignment. We conducted experiments on both live and recorded games. The results are encouraging and comparable to manually detected events. We also give scenarios that illustrate how to apply the proposed solution to professional and consumer services.
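The text/video alignment step can be illustrated with a minimal sketch: assuming each web-casting text event carries a game-time stamp and the video timestamp of kickoff is known (for example, from clock-overlay recognition), an event can be mapped to an approximate video segment around that time. The event structure, field names and boundary padding below are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class TextEvent:
    game_minute: int      # minute reported by the web-casting text, e.g. "23' Goal"
    event_type: str       # e.g. "goal", "card", "substitution"
    description: str

def align_event_to_video(event: TextEvent,
                         kickoff_video_sec: float,
                         pre_sec: float = 30.0,
                         post_sec: float = 20.0) -> tuple:
    """Map a text event's game time to an approximate video segment.

    kickoff_video_sec: video timestamp (seconds) at which the game clock
    reads 0:00. Returns (start_sec, end_sec) of a candidate segment; a real
    system would refine this coarse boundary with audio/visual analysis.
    """
    anchor = kickoff_video_sec + event.game_minute * 60.0
    return max(0.0, anchor - pre_sec), anchor + post_sec

if __name__ == "__main__":
    goal = TextEvent(game_minute=23, event_type="goal",
                     description="23' Goal by player X")
    print(align_event_to_video(goal, kickoff_video_sec=420.0))
```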


ACM Multimedia | 2005

Automatic generation of personalized music sports video

Jinjun Wang; Changsheng Xu; Eng Siong Chng; Ling-Yu Duan; Kongwah Wan; Qi Tian

In this paper, we propose a novel automatic approach for personalized music sports video generation. Two research challenges, semantic sports video content selection and automatic video composition, are addressed. For the first challenge, we propose multi-modal (audio, video and text) feature analysis and alignment to detect the semantics of events in sports video. For the second challenge, we propose video-centric and music-centric composition schemes to automatically generate personalized music sports video based on user preferences. The experimental results and user evaluations are promising and show that the music sports videos generated by our system are comparable to manually generated ones. The proposed approach greatly facilitates automatic music sports video generation for both professionals and amateurs.
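The music-centric composition idea can be sketched as a duration-matching problem: the music track fixes a time budget and highlight clips are selected to fill it. The greedy selection below is an illustrative assumption, not the paper's composition scheme.

```python
def music_centric_select(clips, music_duration_sec):
    """Greedily pick the highest-scoring clips whose total length fits
    within the music duration (illustrative only)."""
    chosen, total = [], 0.0
    for clip in sorted(clips, key=lambda c: c["score"], reverse=True):
        if total + clip["duration"] <= music_duration_sec:
            chosen.append(clip)
            total += clip["duration"]
    # Preserve the original temporal order of play in the final cut.
    return sorted(chosen, key=lambda c: c["start"])

if __name__ == "__main__":
    clips = [{"start": 10, "duration": 8, "score": 0.9},
             {"start": 55, "duration": 12, "score": 0.7},
             {"start": 120, "duration": 6, "score": 0.8}]
    print(music_centric_select(clips, music_duration_sec=20))
```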


ACM Multimedia | 2003

Real-time goal-mouth detection in MPEG soccer video

Kongwah Wan; Xin Yan; Xinguo Yu; Changsheng Xu

We report our work on real-time detection of goal-mouth appearances in MPEG soccer video. Operating on the sub-optimal-quality images obtained after MPEG decoding, the system constrains the Hough transform-based line-mark detection to the dominant green regions typically seen in soccer video. The vertical goal-posts and horizontal goal-bar are then isolated by color-based region (pole) growing. We demonstrate its application to quick video browsing and virtual content insertion. Extensive tests over a large data set of about 15 hours of MPEG-1 soccer video (1.15 Mbps, CIF resolution) show the robustness of our method.
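A minimal sketch of the green-constrained line detection, using OpenCV in place of the paper's MPEG-domain pipeline: the pitch is isolated with an HSV green mask, and Hough line detection is then run only inside that mask. The HSV thresholds and Hough parameters are assumptions for illustration, not the paper's values.

```python
import cv2
import numpy as np

def detect_goal_candidate_lines(frame_bgr):
    """Detect near-vertical/near-horizontal line candidates (goal-posts/bar)
    restricted to the dominant green (pitch) region."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))   # rough pitch mask
    green = cv2.dilate(green, np.ones((15, 15), np.uint8))   # include goal area
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    edges = cv2.bitwise_and(edges, edges, mask=green)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=40, maxLineGap=5)
    if lines is None:
        return []
    keep = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle < 10 or angle > 80:          # near-horizontal or near-vertical
            keep.append((int(x1), int(y1), int(x2), int(y2)))
    return keep
```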


International Conference on Acoustics, Speech, and Signal Processing | 2003

Real-time camera field-view tracking in soccer video

Kongwah Wan; Joo-Hwee Lim; Changsheng Xu; Xinguo Yu

Content-based analysis of soccer video remains a challenging problem due to the lack of structure in a soccer game. To automate game and tactic analysis, we need to detect and track important activities such as ball possession, which is highly correlated with the camera's field-view. In this paper, we present a system that tracks the camera's field-view in a soccer video in real time. It utilizes a host of content-based visual cues that are obtained by independent threads running in parallel. The result is visualized as an active rectangular bounding box, approximating the camera's field of view, superimposed on a virtual soccer field. Experimental results show that the system can reliably track the camera's field-view as the game progresses.
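The "independent threads running in parallel" design can be sketched with a thread pool that evaluates several visual cues on the same frame and fuses them into a field-view estimate. The cue functions and the averaging fusion below are illustrative placeholders, not the paper's cues.

```python
from concurrent.futures import ThreadPoolExecutor

def cue_field_lines(frame):       # placeholder cue: line-mark layout
    return {"x": 0.30, "y": 0.40, "w": 0.25, "h": 0.20}

def cue_player_positions(frame):  # placeholder cue: player blob spread
    return {"x": 0.34, "y": 0.42, "w": 0.27, "h": 0.22}

def cue_camera_motion(frame):     # placeholder cue: global motion direction
    return {"x": 0.32, "y": 0.38, "w": 0.26, "h": 0.21}

CUES = [cue_field_lines, cue_player_positions, cue_camera_motion]

def estimate_field_view(frame):
    """Run all cues in parallel and fuse them by simple averaging into a
    bounding box in normalized field coordinates."""
    with ThreadPoolExecutor(max_workers=len(CUES)) as pool:
        results = list(pool.map(lambda cue: cue(frame), CUES))
    return {k: sum(r[k] for r in results) / len(results) for k in results[0]}

if __name__ == "__main__":
    print(estimate_field_view(frame=None))
```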


International Conference on Multimedia and Expo | 2005

Automatic mobile sports highlights

Kongwah Wan; Xin Yan; Changsheng Xu

We report on our development of a real-time system that delivers sports video highlights of a live game to mobile videophones over existing GPRS networks. To facilitate real-time analysis, a circular buffer receives the live video data, from which simple audio/visual features are computed to detect highlight-worthiness according to an a priori decision scheme. A separate module runs algorithms to insert content into the highlight for mobile advertising. The system is now under trial over new 3G networks.
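The real-time pipeline around the circular buffer can be sketched with a fixed-length deque of per-second feature frames: when a simple scoring rule over the most recent window fires, the buffered span is flushed as a highlight. The feature names, window length and thresholds are illustrative assumptions.

```python
from collections import deque

BUFFER_SECONDS = 60

class HighlightBuffer:
    """Keep the last BUFFER_SECONDS of per-second audio/visual features and
    emit a highlight span when a crude a-priori rule fires (illustrative)."""

    def __init__(self):
        self.buffer = deque(maxlen=BUFFER_SECONDS)

    def push(self, second_index, audio_energy, motion_activity):
        self.buffer.append(
            {"t": second_index, "audio": audio_energy, "motion": motion_activity})
        return self._maybe_emit()

    def _maybe_emit(self):
        recent = list(self.buffer)[-10:]          # last 10 seconds
        if len(recent) < 10:
            return None
        if (sum(f["audio"] for f in recent) / 10 > 0.8 and     # crowd-noise proxy
                sum(f["motion"] for f in recent) / 10 > 0.6):  # activity proxy
            return recent[0]["t"], recent[-1]["t"]             # highlight span
        return None
```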


Advances in Multimedia | 2004

Automatic sports highlights extraction with content augmentation

Kongwah Wan; Jinjun Wang; Changsheng Xu; Qi Tian

We describe novel methods to automatically augment content into video highlights detected from soccer and tennis video. First, we extract generic and domain-specific features from the video to isolate key audio-visual events that we have empirically found to correlate well with ground-truth highlights. Next, based on a set of heuristics-driven rules that minimize view disruption, spatial regions in the image frames of these highlight segments are segmented for content augmentation. Preliminary trials from subjective viewing indicate a high level of acceptance of the content insertions.
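The idea of choosing an insertion region that minimizes view disruption can be illustrated with a simple block-wise heuristic: prefer regions with low temporal motion and low texture, scored here by frame-difference energy and local variance. The block size and weighting are assumptions for illustration, not the paper's rules.

```python
import numpy as np

def pick_insertion_block(prev_gray, curr_gray, block=64):
    """Return the (row, col) top-left corner of the block with the lowest
    combined motion + texture score (illustrative heuristic)."""
    h, w = curr_gray.shape
    best, best_score = None, np.inf
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            cur = curr_gray[r:r + block, c:c + block].astype(np.float32)
            prv = prev_gray[r:r + block, c:c + block].astype(np.float32)
            motion = np.mean(np.abs(cur - prv))   # temporal change in the block
            texture = np.var(cur)                 # spatial detail in the block
            score = motion + 0.01 * texture
            if score < best_score:
                best, best_score = (r, c), score
    return best

if __name__ == "__main__":
    a = np.random.randint(0, 255, (288, 352), np.uint8)   # toy CIF frame pair
    print(pick_insertion_block(a, a.copy()))
```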


British Machine Vision Conference | 2009

A Latent Model for Visual Disambiguation of Keyword-based Image Search

Kongwah Wan; Ah-Hwee Tan; Joo-Hwee Lim; Liang-Tien Chia; Sujoy Roy

The problem of polysemy in keyword-based image search arises mainly from the inherent ambiguity in user queries. We propose a latent-model-based approach that resolves user search ambiguity by allowing sense-specific diversity in search results. Given a query keyword and the images retrieved by issuing the query to an image search engine, we first learn a latent visual sense model of these polysemous images. Next, we use Wikipedia to disambiguate the word sense of the original query, and issue these Wiki-senses as new queries to retrieve sense-specific images. A sense-specific image classifier is then learnt by combining information from the latent visual sense model, and used to cluster and re-rank the polysemous images from the original query keyword into their specific senses. Results on a ground-truth set of 17K images returned by 10 keyword searches covering 62 word senses provide empirical indication that our method can improve upon existing keyword-based search engines. Our method learns the visual word sense models in a totally unsupervised manner, effectively filters out irrelevant images, and is able to mine the long tail of image search.
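The clustering and re-ranking idea can be roughly sketched with an off-the-shelf latent topic model over bag-of-visual-words histograms, here scikit-learn's LatentDirichletAllocation standing in for the paper's latent visual sense model; the features, model and ranking rule are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def rank_images_by_sense(bow_histograms, n_senses):
    """bow_histograms: (num_images, vocab_size) visual-word counts.
    For each latent sense, return image indices ranked by how strongly
    they express that sense (illustrative stand-in for the paper's model)."""
    lda = LatentDirichletAllocation(n_components=n_senses, random_state=0)
    image_sense = lda.fit_transform(bow_histograms)   # (num_images, n_senses)
    return {s: np.argsort(-image_sense[:, s]).tolist() for s in range(n_senses)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(0, 5, size=(100, 200))       # toy visual-word counts
    print({s: idx[:5] for s, idx in rank_images_by_sense(counts, 3).items()})
```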


International Conference on Multimedia and Expo | 2010

Faceted topic retrieval of news video using joint topic modeling of visual features and speech transcripts

Kongwah Wan; Ah-Hwee Tan; Joo-Hwee Lim; Liang-Tien Chia

Because of the inherent ambiguity in user queries, an important task of modern retrieval systems is faceted topic retrieval (FTR): returning diverse or novel information that elucidates the wide range of topics or facets of the query need. We introduce a generative model for hypothesizing facets in the (news) video domain by combining the complementary information in the visual keyframes and the speech transcripts. We evaluate the efficacy of our multimodal model on the standard TRECVID-2005 video corpus annotated with facets. We find that: (1) jointly modeling the visual and text (speech transcript) information achieves a significant F-score improvement over a text-only system; (2) our model compares favorably with standard diverse-ranking algorithms such as MMR [1]. Our FTR model has been implemented on a news search prototype that is undergoing commercial trial.
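Since the paper compares against MMR, a minimal generic Maximal Marginal Relevance re-ranker over precomputed similarity scores is sketched below; the trade-off weight and the toy similarity values are placeholders, not the paper's evaluation setup.

```python
def mmr_rerank(candidates, query_sim, doc_sim, top_k, lam=0.7):
    """Maximal Marginal Relevance: iteratively pick the item that best trades
    off query relevance against similarity to already selected items.

    candidates: list of document ids
    query_sim:  dict id -> similarity to the query
    doc_sim:    dict (id, id) -> pairwise document similarity
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        def mmr_score(d):
            redundancy = max((doc_sim[(d, s)] for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    docs = ["d1", "d2", "d3"]
    qsim = {"d1": 0.9, "d2": 0.85, "d3": 0.4}
    dsim = {(a, b): (0.95 if {a, b} == {"d1", "d2"} else 0.1)
            for a in docs for b in docs if a != b}
    print(mmr_rerank(docs, qsim, dsim, top_k=2))   # d3 wins over redundant d2
```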


International Conference on Acoustics, Speech, and Signal Processing | 2006

Video Clock Time Recognition Based on Temporal Periodic Pattern Change of the Digit Characters

Yiqun Li; Kongwah Wan; Xin Yan; Xinguo Yu; Changsheng Xu

A novel temporal neighboring pattern similarity (TNPS) measure is proposed to recognize the time shown by a digital clock overlay. TNPS detects the presence of a clock overlay by monitoring the periodic changes of the clock digits, and infers the clock time from their natural transition cycle. Compared to traditional methods such as OCR, this method is faster and more reliable because it converts a pattern recognition problem into a pattern-change detection problem. Experiments show that the recognition results are promising and accurate. One application of this method is detecting the start time of a soccer game for our real-time live soccer video highlights and event alerts.
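The idea of turning clock recognition into change detection can be sketched as follows: monitor a candidate digit region across frames and test whether its pixel pattern changes with a period of one second, which marks it as the seconds digit of a running clock. The change metric and thresholds are illustrative assumptions, not the paper's TNPS measure.

```python
import numpy as np

def is_seconds_digit(region_frames, fps, change_thresh=12.0, tol_frames=2):
    """region_frames: grayscale patches (np.uint8) of the same candidate digit
    region sampled at `fps`. Returns True if the patch changes roughly once
    per second, as the units digit of a running clock would."""
    change_frames = []
    for i in range(1, len(region_frames)):
        diff = np.mean(np.abs(region_frames[i].astype(np.int16) -
                              region_frames[i - 1].astype(np.int16)))
        if diff > change_thresh:
            change_frames.append(i)
    if len(change_frames) < 3:
        return False
    gaps = np.diff(change_frames)
    return bool(np.all(np.abs(gaps - fps) <= tol_frames))

if __name__ == "__main__":
    # Synthetic check: a patch that flips once per second at 25 fps.
    fps = 25
    patches = [np.full((16, 12), 255 if (t // fps) % 2 else 0, np.uint8)
               for t in range(4 * fps)]
    print(is_seconds_digit(patches, fps))
```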


ACM Multimedia | 2013

Metadata enrichment for news video retrieval: a graph-based propagation approach

Kongwah Wan; Wei-Yun Yau; Sujoy Roy

This paper summarizes our contribution to the Technicolor Rich Multimedia Retrieval from Input Videos Grand Challenge. We hold the view that semantic analysis of a given news video is best performed in the text domain. Starting with the noisy text obtained from Automatic Speech Recognition (ASR), we use a graph-based approach to enrich the text by propagating labels from visually similar videos culled from parallel (YouTube) news sources. From the enriched text, we extract salient keywords to form a query to a news video search engine, retrieving a larger corpus of related news video. Compared to a baseline method that uses only the ASR text, a significant improvement in precision is obtained, indicating that retrieval benefits from the ingestion of the external labels. Capitalizing on the enriched metadata, we find that videos become more amenable to Wikipedia-based Explicit Semantic Analysis (ESA), resulting in better support for subtopic news video retrieval. We apply our methods to an in-house live news search portal and report on several best practices.
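The graph-based propagation step can be sketched as iterative label spreading over a visual-similarity graph: each video's keyword scores are repeatedly mixed with those of its neighbours, while videos with trusted labels are kept clamped to their initial values. The similarity matrix, mixing weight and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def propagate_labels(similarity, init_labels, clamped, alpha=0.8, n_iter=20):
    """similarity: (n, n) non-negative visual-similarity matrix between videos.
    init_labels:  (n, k) initial keyword/label scores (rows may be zero).
    clamped:      boolean mask of length n; these rows keep their initial labels.
    Returns the propagated (n, k) label scores."""
    # Row-normalize the similarity matrix into a transition matrix.
    row_sums = similarity.sum(axis=1, keepdims=True)
    transition = similarity / np.maximum(row_sums, 1e-12)
    labels = init_labels.astype(np.float64).copy()
    for _ in range(n_iter):
        labels = alpha * transition @ labels + (1 - alpha) * init_labels
        labels[clamped] = init_labels[clamped]   # keep trusted labels fixed
    return labels

if __name__ == "__main__":
    sim = np.array([[0.0, 1.0, 0.5], [1.0, 0.0, 0.2], [0.5, 0.2, 0.0]])
    init = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])   # two labeled videos
    clamp = np.array([True, False, True])
    print(propagate_labels(sim, init, clamp).round(3))
```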

Collaboration


Dive into Kongwah Wan's collaborations.

Top Co-Authors

Changsheng Xu
Chinese Academy of Sciences

Ah-Hwee Tan
Nanyang Technological University

Liang-Tien Chia
Nanyang Technological University

Jinjun Wang
Xi'an Jiaotong University

Xinguo Yu
Central China Normal University