Shi-Yong Neo
National University of Singapore
Publication
Featured research published by Shi-Yong Neo.
computer vision and pattern recognition | 2008
Yan-Tao Zheng; Ming Zhao; Shi-Yong Neo; Tat-Seng Chua; Qi Tian
We present a higher-level visual representation, the visual synset, for object categorization. The visual synset improves on the traditional bag-of-words representation with better discrimination and invariance power. First, the approach strengthens inter-class discrimination by constructing an intermediate visual descriptor, the delta visual phrase, from frequently co-occurring visual word-sets with similar spatial context. Second, it achieves better intra-class invariance by clustering delta visual phrases into visual synsets based on their probabilistic 'semantics', i.e. their class probability distributions. The resulting visual synset can thus partially bridge the visual differences between images of the same class. Tests on the Caltech-101 and Pascal-VOC 05 datasets demonstrate that the proposed image representation achieves good accuracy.
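The two-stage construction above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names, the greedy grouping rule, and the support/overlap thresholds are all assumptions.

```python
from collections import Counter
from itertools import combinations

def build_phrases(images, min_support=2):
    """Stage 1 (sketch): promote visual-word pairs that co-occur in at
    least min_support images to 'delta visual phrases'. Spatial-context
    filtering from the paper is omitted for brevity."""
    counts = Counter()
    for words in images:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}

def class_distribution(phrase, labeled_images):
    """P(class | phrase): class frequencies among images containing the phrase."""
    hits = Counter()
    for words, label in labeled_images:
        if set(phrase) <= set(words):
            hits[label] += 1
    total = sum(hits.values())
    return {c: n / total for c, n in hits.items()} if total else {}

def group_into_synsets(phrases, labeled_images, threshold=0.5):
    """Stage 2 (sketch): greedily group phrases whose class distributions
    overlap by at least threshold (sum of per-class minima) into a synset."""
    synsets = []  # each entry: (representative distribution, [member phrases])
    for p in phrases:
        dist = class_distribution(p, labeled_images)
        for rep, members in synsets:
            overlap = sum(min(dist.get(c, 0.0), rep.get(c, 0.0)) for c in rep)
            if overlap >= threshold:
                members.append(p)
                break
        else:
            synsets.append((dist, [p]))
    return synsets
```

Phrases that fire on the same classes end up in one synset, so visually different patterns with the same 'semantics' share a representation.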
acm multimedia | 2003
Hui Yang; Lekha Chaisorn; Yunlong Zhao; Shi-Yong Neo; Tat-Seng Chua
When querying a news video archive, users want precise answers in the form of a summary that best answers the query. However, current video retrieval systems, including web search engines, are designed to retrieve documents rather than precise answers. This research explores the use of question answering (QA) techniques to support personalized news video retrieval. Users interact with our system, VideoQA, using short natural-language questions with implicit constraints on the content, context, duration, and genre of the expected videos. VideoQA returns short, precise news video summaries as answers. The main contributions of this research are: (a) the extension of QA technology to support QA on news video; and (b) the use of multi-modal features, including visual, audio, textual, and external resources, to help correct speech recognition errors and to perform precise question answering. The system has been tested on 7 days of news video and found to be effective.
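The multi-modal answer-ranking idea can be sketched as a weighted score fusion over candidate segments. This is a hypothetical illustration, not the VideoQA implementation: the scoring functions, field names, and weights are assumptions.

```python
def text_score(question_terms, transcript):
    """Fraction of transcript words that match question terms (toy textual score)."""
    terms = set(question_terms)
    words = transcript.lower().split()
    return sum(w in terms for w in words) / max(len(words), 1)

def fuse(scores, weights):
    """Linear fusion of per-modality scores."""
    return sum(weights[m] * scores.get(m, 0.0) for m in weights)

def rank_segments(question, segments,
                  weights={"text": 0.6, "visual": 0.3, "audio": 0.1}):
    """Rank candidate news segments by fused multi-modal score."""
    q = question.lower().split()
    ranked = []
    for seg in segments:
        s = {"text": text_score(q, seg["transcript"]),
             "visual": seg.get("visual", 0.0),   # e.g. a detector confidence
             "audio": seg.get("audio", 0.0)}     # e.g. a speech-quality score
        ranked.append((fuse(s, weights), seg["id"]))
    return sorted(ranked, reverse=True)
```

The top-ranked segments would then be trimmed into the short summary returned as the answer.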
international world wide web conferences | 2008
Xiao Wu; Jintao Li; Yongdong Zhang; Sheng Tang; Shi-Yong Neo
In this paper, we highlight the use of multimedia technology to generate intrinsic summaries of tourism-related information. The system uses an automated process to gather, filter and classify information on various tourist spots from the Web. The end result presented to the user is a personalized multimedia summary, generated with respect to the user's query, comprising text, images, video and real-time news, and made retrievable on mobile devices. Preliminary experiments demonstrate the superiority of our presentation scheme over traditional methods.
conference on image and video retrieval | 2008
Yan-Tao Zheng; Shi-Yong Neo; Tat-Seng Chua; Qi Tian
We present a probabilistic ranking-driven classifier for the detection of video semantic concepts, such as airplane, building, etc. Most existing concept detection systems use Support Vector Machines (SVM) to perform the detection and ranking of retrieved video shots. However, the margin maximization principle of SVM performs classification error minimization rather than ranking optimization. To tackle this problem, we exploit a sparse Bayesian kernel model, namely the relevance vector machine (RVM), as the classifier for semantic concept detection. Based on the automatic relevance determination principle, RVM outputs a posterior probabilistic prediction of the semantic concepts. This inference output is optimal for ranking the target video shots, according to the Probability Ranking Principle. The probability output of RVM on individual uni-modal features also facilitates probabilistic fusion of multi-modal evidence to minimize Bayes risk. We demonstrate both theoretically and empirically that RVM outperforms SVM for video semantic concept detection. Tests on the TRECVID 07 dataset show that RVM produces statistically significant improvements in MAP scores over SVM-based methods.
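The fusion step can be sketched as combining per-modality posterior probabilities in log-odds space and ranking shots by the fused posterior. This assumes conditional independence of modalities and is an illustrative stand-in for the paper's Bayes-risk fusion, not its exact formulation.

```python
import math

def logit(p):
    """Log-odds with clamping to avoid infinities at p = 0 or 1."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.log(p / (1 - p))

def fuse_posteriors(per_modality_probs, prior=0.5):
    """Naive-Bayes style fusion of per-modality posteriors P(concept | feature_m)."""
    score = (sum(logit(p) for p in per_modality_probs)
             - (len(per_modality_probs) - 1) * logit(prior))
    return 1 / (1 + math.exp(-score))

def rank_shots(shots):
    """shots: {shot_id: [p_visual, p_text, ...]} -> ids sorted by fused posterior,
    i.e. the ordering the Probability Ranking Principle calls for."""
    return sorted(shots, key=lambda s: fuse_posteriors(shots[s]), reverse=True)
```

With a calibrated probabilistic classifier like RVM, this ranking directly optimizes expected retrieval effectiveness, which is the advantage over a raw SVM margin.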
acm multimedia | 2007
Huanbo Luan; Shi-Yong Neo; Hai-Kiat Goh; Yongdong Zhang; Shouxun Lin; Tat-Seng Chua
Existing video research incorporates relevance feedback based on user-dependent interpretations to improve retrieval results. In this paper, we segregate the process of relevance feedback into two distinct facets: (a) recall-directed feedback; and (b) precision-directed feedback. The recall-directed facet employs general features such as text and high-level features (HLFs) to maximize efficiency and recall during feedback, making it well suited to large corpora. The precision-directed facet, on the other hand, uses many other multimodal features in an active learning environment for improved accuracy. Combined with a performance-based adaptive sampling strategy, this process continuously re-ranks a subset of instances as the user annotates. Experiments on the TRECVID 2006 dataset show that our approach is efficient and effective.
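The precision-directed loop can be sketched as uncertainty sampling with a batch size that adapts to how productive the last annotation round was. The adaptation rule and thresholds below are assumptions for illustration, not the paper's strategy.

```python
def uncertainty(score):
    """Peaks at 1.0 when a classifier score sits on the decision boundary (0.5)."""
    return 1 - abs(score - 0.5) * 2

def next_batch(scores, annotated, batch_size):
    """Pick the most uncertain not-yet-annotated items for the user to label."""
    pool = [(uncertainty(s), i) for i, s in scores.items() if i not in annotated]
    return [i for _, i in sorted(pool, reverse=True)[:batch_size]]

def adapt_batch_size(batch_size, relevant_found, lo=2, hi=16):
    """Performance-based adaptation (sketch): grow the batch when the last
    one was productive, shrink it otherwise."""
    if relevant_found >= batch_size // 2:
        return min(batch_size * 2, hi)
    return max(batch_size // 2, lo)
```

After each batch the classifier would be retrained on the new labels and the remaining subset re-ranked, matching the continuous re-ranking described above.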
conference on image and video retrieval | 2008
Huanbo Luan; Yan-Tao Zheng; Shi-Yong Neo; Yongdong Zhang; Shouxun Lin; Tat-Seng Chua
In this paper, we propose adaptive multiple feedback strategies for interactive video retrieval. We first segregate interactive feedback into three distinct types (recall-driven relevance feedback, precision-driven active learning and locality-driven relevance feedback) so that a generic interaction mechanism with more flexibility can cover different search queries and different video corpora. Our system lets expert searchers flexibly decide which types of feedback to employ in different situations. To cater to the large number of novice (non-expert) users, a built-in adaptive option learns expert user behavior and recommends the next feedback strategy, leading to a more precise and personalized search for novice users. Experimental results on the TRECVID news video corpus demonstrate that our proposed adaptive multiple feedback strategies are effective.
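The recommendation idea can be sketched as a first-order transition model over logged expert sessions: the next strategy suggested to a novice is the one experts most often chose after the same feedback type. The Markov assumption and type names are illustrative, not the paper's model.

```python
from collections import Counter

FEEDBACK_TYPES = ("recall_rf", "precision_al", "locality_rf")  # assumed labels

def learn_transitions(expert_sessions):
    """Count (previous type -> next type) transitions across expert sessions."""
    trans = Counter()
    for session in expert_sessions:
        for prev, nxt in zip(session, session[1:]):
            trans[(prev, nxt)] += 1
    return trans

def recommend(trans, last_feedback):
    """Recommend the most frequent expert follow-up to the last feedback type."""
    candidates = [(n, t) for (p, t), n in trans.items() if p == last_feedback]
    return max(candidates)[1] if candidates else FEEDBACK_TYPES[0]
```

A real system would also condition on the query and the results seen so far; this sketch keeps only the behavioral-learning core.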
conference on image and video retrieval | 2007
Yan-Tao Zheng; Shi-Yong Neo; Tat-Seng Chua; Qi Tian
Near-duplicate keyframes (NDKs) are important visual cues for linking news stories across different TV channels, times, languages, etc. However, the quadratic complexity of NDK detection renders it intractable on a large-scale news video corpus. To address this issue, we propose a temporal, semantic and visual partitioning model that divides the corpus into small overlapping partitions by exploiting domain knowledge and corpus characteristics. This enables us to efficiently detect NDKs in each partition separately and then link them together across partitions. We divide the corpus temporally into sequential partitions and semantically into news story genre groups; within each partition, we visually group potential NDKs using asymmetric hierarchical k-means clustering on our proposed semi-global image features. In each visual group, we detect NDK pairs using our proposed SIFT-based fast keypoint matching scheme based on local color information of keypoints. Finally, the detected NDK groups in each partition are linked up via transitivity propagation of NDKs shared by different partitions. Tests on the TRECVID 06 corpus with 62k keyframes show that our approach yields a multifold increase in speed over the best reported approach and completes NDK detection in a manageable time with satisfactory accuracy.
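The final transitivity-propagation step is naturally a union-find (disjoint-set) merge: NDK pairs found independently inside each overlapping partition are unioned, and keyframes shared by two partitions stitch their groups together. This is a generic sketch of that step, not the paper's code.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def link_partitions(partition_pairs):
    """partition_pairs: one list of detected NDK pairs per partition.
    Returns global near-duplicate groups after transitivity propagation."""
    uf = UnionFind()
    for pairs in partition_pairs:
        for a, b in pairs:
            uf.union(a, b)
    groups = {}
    for x in uf.parent:
        groups.setdefault(uf.find(x), set()).add(x)
    return list(groups.values())
```

Because partitions overlap, a keyframe like "k2" below can appear in two partitions and transitively merge pairs that were never compared directly.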
conference on image and video retrieval | 2008
Shi-Yong Neo; Huanbo Luan; Yan-Tao Zheng; Hai-Kiat Goh; Tat-Seng Chua
This paper describes our system VisionGo, which provides an interactive platform for video retrieval. The system is fitted with an intuitive interface and an automated backend recommender that suggests the optimal feedback technique to users during retrieval.
acm multimedia | 2006
Shi-Yong Neo; Yan-Tao Zheng; Tat-Seng Chua; Qi Tian
Precise automated video search is gaining in importance as the amount of multimedia information increases at exponential rates. One of the drawbacks that makes video retrieval difficult is the lack of available semantics. In this paper, we propose to supplement the semantic knowledge for retrieval by providing useful semantic clusters derived from event entities present in the news video. These entities include keywords derived from automated speech recognition (ASR) and event-related high-level features (HLFs) extracted from the news video at the pseudo-story level. Fuzzy clustering is then carried out to group similar stories together into semantic clusters. The retrieval system uses these clusters to refine the re-ranking in the pseudo-relevance feedback (PRF) step. Initial experiments on the video search task using the TRECVID 2005 dataset show that the proposed approach improves search performance significantly.
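The cluster-refined PRF step can be sketched as follows: stories that share semantic clusters with the top-ranked (pseudo-relevant) results receive a score boost, with fuzzy memberships acting as weights. The boost factor and aggregation rule are assumptions, not the paper's formulation.

```python
def prf_rerank(scores, memberships, top_k=3, boost=0.2):
    """scores: {story: initial retrieval score};
    memberships: {story: {cluster: fuzzy weight in [0, 1]}}.
    Returns story ids re-ranked by cluster-refined PRF."""
    # take the top-k results as pseudo-relevant
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # accumulate fuzzy cluster mass over the pseudo-relevant set
    cluster_mass = {}
    for s in top:
        for c, w in memberships.get(s, {}).items():
            cluster_mass[c] = cluster_mass.get(c, 0.0) + w
    # boost every story by its affinity to the pseudo-relevant clusters
    reranked = {}
    for s, base in scores.items():
        affinity = sum(w * cluster_mass.get(c, 0.0)
                       for c, w in memberships.get(s, {}).items())
        reranked[s] = base + boost * affinity
    return sorted(reranked, key=reranked.get, reverse=True)
```

A lower-ranked story that belongs to the same semantic cluster as the top hits can thus overtake an unrelated story with a slightly higher initial score.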
international convention on rehabilitation engineering & assistive technology | 2007
Shi-Yong Neo; Hai-Kiat Goh; Wendy Yen-Ni Ng; Jun-Da Ong; Wilson Pang
One of the common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices (using character recognition and speech synthesis) have many limitations and poor accuracy due to limited processing power. In this paper, we introduce a robust online multimedia content processing framework that alleviates the limitations of such portable devices. We leverage the high transfer speeds of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers, and subsequently stream the desired output back to the users. The resulting framework enables more complex processing, carried out on the servers, and thus outperforms standard portable devices in accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments using consecutive frames show a 25% improvement in accuracy over traditional OCR on a single image. The application has also been trialed by several visually impaired users, and the feedback obtained is encouraging.
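The consecutive-frame correction idea reduces, in its simplest form, to a per-position majority vote over the OCR readings of the same text from several frames. This sketch assumes the readings are already aligned to equal length; a real system would need string alignment first.

```python
from collections import Counter

def majority_vote(readings):
    """Combine equal-length OCR readings of the same text, character by
    character: each position takes its most frequent character across frames."""
    assert len({len(r) for r in readings}) == 1, "readings must be aligned"
    return "".join(Counter(chars).most_common(1)[0][0]
                   for chars in zip(*readings))
```

A single-frame misread at any position is outvoted as long as the other frames read that character correctly, which is where the accuracy gain over single-image OCR comes from.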