Publication

Featured research published by Ling-Yu Duan.


ACM Multimedia | 2006

Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis

Ling-Yu Duan; Jinqiao Wang; Yan-Tao Zheng; Jesse S. Jin; Hanqing Lu; Changsheng Xu

TV advertising is ubiquitous, perseverant, and economically vital. Millions of people's living and working habits are affected by TV commercials. In this paper, we present a multimodal (visual + audio + text) commercial video digest scheme to segment individual commercials from TV streams and carry out semantic content analysis within each detected commercial segment. Two challenging issues are addressed. First, we propose a multimodal approach to robustly detect the boundaries of individual commercials. Second, we attempt to classify a commercial with respect to the advertised products/services. For the first, the boundary detection of individual commercials is reduced to the problem of binary classification of shot boundaries via mid-level features derived from two concepts: Image Frames Marked with Product Information (FMPI) and the Audio Scene Change Indicator (ASCI). Moreover, accurate individual boundaries enable us to perform commercial identification by clip matching via a spatial-temporal signature. For the second, commercial classification is formulated as a text categorization task by expanding sparse ASR/OCR texts with external knowledge. Our boundary detection achieved a good result of F1 = 93.7% on a dataset comprising 499 individual commercials from the TRECVID05 video corpus, and commercial classification obtained a promising accuracy of 80.9% on 141 distinct commercials. Based on these results, applications such as an intelligent digital TV set-top box can be built to enhance TV viewers' ability to monitor and manage commercials in TV streams.
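
The clip-matching idea behind the spatial-temporal signature can be illustrated with a simple ordinal signature (a hypothetical sketch, not the paper's exact descriptor): each frame is divided into a grid of blocks, the blocks are ranked by mean intensity, and a known commercial is located by sliding its rank sequence over the stream.

```python
import numpy as np

def ordinal_signature(frames, grid=3):
    """Per-frame ordinal signature: rank of each grid block's mean intensity.

    frames: (T, H, W) grayscale clip; returns a (T, grid*grid) rank array.
    Ranks are invariant to global brightness/contrast changes, which is
    what makes signature matching robust across broadcast copies."""
    T, H, W = frames.shape
    bh, bw = H // grid, W // grid
    blocks = frames[:, :bh * grid, :bw * grid].reshape(T, grid, bh, grid, bw)
    means = blocks.mean(axis=(2, 4)).reshape(T, grid * grid)
    return means.argsort(axis=1).argsort(axis=1)   # rank of each block

def match_clip(query_sig, stream_sig):
    """Slide the query signature over the stream; return the best offset."""
    T = len(query_sig)
    dists = [np.abs(stream_sig[t:t + T] - query_sig).mean()
             for t in range(len(stream_sig) - T + 1)]
    return int(np.argmin(dists))
```

A known commercial's signature then matches the stream at its true temporal position even when the broadcast copy is rebrightened or recontrasted.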


ACM Multimedia | 2008

Hierarchical movie affective content analysis based on arousal and valence features

Min Xu; Jesse S. Jin; Suhuai Luo; Ling-Yu Duan

Emotional factors directly reflect audiences' attention, evaluation, and memory. Affective content analysis not only creates an index for users to access the movie segments that interest them, but also provides a feasible entry point for extracting video highlights. Most existing work focuses on emotion type detection, but besides emotion type, emotion intensity is also a significant clue for users seeking content of interest: for some film genres (horror, action, etc.), the segments with high emotion intensity are the most likely video highlights. In this paper, we propose a hierarchical structure for emotion categories and analyze emotion intensity and emotion type hierarchically using arousal- and valence-related features. First, High, Medium, and Low emotion intensity levels are detected by fuzzy c-means clustering on arousal features; fuzzy clustering provides a mathematical model of vagueness that is close to human perception. Valence-related features are then used to detect emotion types (Anger, Sadness, Fear, Happiness, and Neutral). Since video is continuous time-series data and the occurrence of a given emotion is affected by recent emotional history, Hidden Markov Models (HMMs) are used to capture this context. Experimental results show that the movie segments with high emotion intensity cover over 80% of the highlights in horror and action movies, and that the hierarchical method outperforms a one-step method on emotion type detection. The approach also lets users select their preferred affective content by choosing both emotion intensity levels and emotion types.
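
The fuzzy c-means step can be sketched as follows (a minimal NumPy implementation, not the authors' code; the arousal feature extraction is assumed to have happened elsewhere, and the function name is hypothetical):

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns (centers, membership matrix U).

    X: (n, d) arousal feature vectors; c clusters (e.g. Low/Medium/High
    intensity); m > 1 is the fuzzifier controlling membership softness."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1) + 1e-12
        # Standard FCM membership update: u_ik proportional to d_ik^(-1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        W = U ** m                                   # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U
```

Each segment then carries a soft degree of membership in every intensity level, which is the "mathematical model of vagueness" the abstract refers to.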


International Conference on Multimedia and Expo | 2006

A Robust Method for TV Logo Tracking in Video Streams

Jinqiao Wang; Ling-Yu Duan; Zhenglong Li; Jing Liu; Hanqing Lu; Jesse S. Jin

Most broadcast stations rely on TV logos to claim ownership of video content or to visually distinguish the broadcast from interrupting commercial blocks. Detecting and tracking a TV logo is of interest for commercial-skipping applications and logo-based broadcast surveillance (an abnormal signal is accompanied by logo absence). Pixel-wise difference computation within predetermined logo regions cannot handle semi-transparent TV logos well, owing to the blending of the logo with a changing background; edge-based template matching is also weak for semi-transparent logos when edges appear incomplete. In this paper we present a more robust approach to detecting and tracking TV logos in video streams based on multispectral image gradients. Instead of single-frame detection, our approach exploits the temporal correlation of multiple consecutive frames. Since it is difficult to manually delineate logos of irregular shape, an adaptive threshold is applied to the gradient image in subpixel space to extract the logo mask. TV logo tracking is finally carried out by matching the masked region against a known template. An extensive comparison experiment shows that the proposed algorithm outperforms traditional methods such as frame differencing and single-frame edge matching. Our experimental dataset comes from part of the TRECVID2005 news corpus and several Chinese TV channels with challenging TV logos.
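
The core intuition, that a logo is the only image region whose gradients stay stable while the background changes, can be sketched as follows (a simplified grayscale illustration with hypothetical names, not the paper's multispectral, subpixel implementation):

```python
import numpy as np

def logo_mask(frames, quantile=0.95):
    """Estimate a TV logo mask from consecutive grayscale frames (T, H, W).

    Signed gradients of a changing background average toward zero over
    time, while a (semi-transparent) logo contributes a stable gradient,
    so the temporally averaged gradient is large only near logo edges."""
    frames = np.asarray(frames, dtype=float)
    gy = np.diff(frames, axis=1)[:, :, :-1]   # vertical gradient, (T, H-1, W-1)
    gx = np.diff(frames, axis=2)[:, :-1, :]   # horizontal gradient, (T, H-1, W-1)
    mean_grad = np.abs(gy.mean(axis=0)) + np.abs(gx.mean(axis=0))
    # Adaptive threshold: keep only the most persistent gradients.
    return mean_grad > np.quantile(mean_grad, quantile)
```

The extracted mask would then be matched against a known logo template for tracking, as the abstract describes.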


ACM Multimedia | 2009

Consumer video retargeting: context assisted spatial-temporal grid optimization

Liang Shi; Jinqiao Wang; Ling-Yu Duan; Hanqing Lu

Pervasive multimedia devices require accurate video retargeting, especially on connected consumer-electronics platforms. In this paper, we present a context-assisted spatial-temporal grid scheme for consumer video retargeting. First, we parse consumer videos from low-level features to high-level visual concepts, combining visual attention into a more accurate importance description. Then, a semantic importance map representing spatial importance and temporal continuity is built and incorporated with a 3D rectilinear grid scaleplate that maps frames to the target display, preserving the aspect ratio of semantically salient objects as well as perceptual coherency. Extensive evaluations were conducted on two popular video genres, sports and advertisements. Comparison with state-of-the-art approaches on both images and videos demonstrates the advantages of the proposed approach.


ACM Multimedia | 2009

Automatic sports genre categorization and view-type classification over large-scale dataset

Lingfang Li; Ning Zhang; Ling-Yu Duan; Qingming Huang; Jun Du; Ling Guan

This paper presents a framework with two automatic tasks targeting large-scale, low-quality sports video archives collected from online video streams. The framework is based on the bag-of-visual-words model using speeded-up robust features (SURF). The first task is sports genre categorization based on a hierarchical structure. In the second task, which builds on the automatically obtained genre, views are classified using support vector machines (SVMs); the view-classification result can then be used in video parsing and highlight extraction. Compared with state-of-the-art methods, our approach is fully automatic and free of domain knowledge, and thus offers better extensibility. Our dataset consists of 14 sport genres totaling 6850 minutes. Both sports genre categorization and view-type classification achieve accuracy above 80%, which validates the framework's robustness and its potential in web-based applications.
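
The bag-of-visual-words representation at the core of the framework can be sketched as follows (a schematic NumPy version; in the paper the local descriptors are SURF and the vocabulary comes from clustering, here both are placeholder arrays):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary.

    descriptors: (n, d) local features from one frame or clip;
    vocabulary: (k, d) codebook (e.g. cluster centers of SURF features).
    Returns an L1-normalized k-bin histogram usable as an SVM input."""
    d2 = ((descriptors[:, None, :] - vocabulary[None]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                 # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

These fixed-length histograms are what make a two-stage SVM pipeline (genre first, then view type within the genre) straightforward to train.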


ACM Multimedia | 2002

A unified framework for semantic shot classification in sports videos

Ling-Yu Duan; Min Xu; Xiao-Dong Yu; Qi Tian

The extensive amount of multimedia information available necessitates content-based video indexing and retrieval methods. Since humans tend to use high-level semantic concepts when querying and browsing multimedia databases, there is an increasing need for semantic video indexing and analysis. For this purpose, we present a unified framework for semantic shot classification in sports video, a domain widely studied for its tremendous commercial potential. Unlike most existing approaches, which focus on clustering by aggregating shots or key-frames with similar low-level features, the proposed scheme employs supervised learning to perform top-down video shot classification. Moreover, the supervised learning procedure is built on effective mid-level representations instead of exhaustive low-level features. The framework consists of three main steps: 1) identify video shot classes for each sport; 2) develop a common set of motion-, color-, and shot-length-related mid-level representations; and 3) perform supervised learning on the given sports video shots. We observe that for each sport a small number of semantic shot classes, about 5-10, can be predefined that covers 90%-95% of broadcast sports video. We employ nonparametric feature-space analysis to map low-level features to mid-level semantic shot attributes such as dominant object (player) motion, camera motion patterns, and court shape. Based on the fusion of these mid-level shot attributes, we classify video shots into the predefined classes, each of which has a clear semantic meaning. With this framework we have achieved good classification accuracy of 85%-95% on game videos of five typical ball sports (tennis, basketball, volleyball, soccer, and table tennis), with over 5500 shots totaling about 8 hours.
With correctly classified sports video shots, further structural and temporal analysis, such as event detection, highlight extraction, video skimming, and table-of-content generation, is greatly facilitated.


International Conference on Multimedia and Expo | 2007

Robust Commercial Retrieval in Video Streams

Jinqiao Wang; Ling-Yu Duan; Qingshan Liu; Hanqing Lu; Jesse S. Jin

TV commercial video is an informative medium. Fast, robust indexing and retrieval of commercial videos is of interest for commercial monitoring, copyright protection, and commercial management. We propose a coarse-to-fine scheme to robustly retrieve commercial videos. Unlike previous work using clip- or key-frame-based matching, our scheme incorporates knowledge of commercial production to search for candidate commercial positions. Color and ordinal features are then extracted to locate the exact commercial positions using the dynamic time warping distance. Comparison experiments were carried out on TRECVID 2006 news videos and videos from several Chinese channels, and our scheme achieved promising results.
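
The dynamic time warping distance used in the fine localization step can be sketched as a textbook DTW over per-frame feature distances (a generic implementation with hypothetical names, not the paper's exact formulation):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between sequences a and b.

    DTW allows the two clips to be locally stretched or compressed in
    time, making the match robust to frame-rate differences and edit
    jitter that defeat rigid frame-by-frame comparison."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

In practice `a` and `b` would be per-frame color/ordinal feature vectors with a vector distance plugged in for `dist`.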


International Conference on Multimedia and Expo | 2009

Linking video ads with product or service information by web search

Jinqiao Wang; Ling-Yu Duan; Bo Wang; Shi Chen; Yi Ouyang; Jing Liu; Hanqing Lu; Wen Gao

With the proliferation of online media services, video ads are pervasive across platforms involving internet services and interactive TV services. Existing research efforts such as Google AdSense and MSRA VideoSense/ImageSense have been devoted to the less intrusive insertion of relevant textual or video ads in streams or web pages through text/image/video content analysis, whereas the inherent semantics of video ads are much less exploited. In this paper, we propose to link video ads with relevant product/service information on E-commerce websites or portals, towards ad recommendation in a cross-media manner. First, we carry out semantic analysis within ad videos, extracting Frames Marked with Product Images (FMPI). Second, we link ad videos with relevant ads on the Web by using FMPI to search for visually similar product images (e.g., appearance or logo) and to collect their accompanying text (brand name, category, description, or other tags) on popular E-commerce websites or portals such as eBay, Amazon, and Taobao. Visually similar product images are found with Locality-Sensitive Hashing (LSH) in a Naïve Bayes Nearest Neighbor classifier. Finally, we can recommend more relevant products/services for ad videos by ranking the matched product images and categorizing useful tags of the top-ranked ads from the Web. Preliminary experiments demonstrate the idea of linking ad videos with product/service information from the Web.
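
The LSH step that makes searching a large product-image collection feasible can be sketched with random-hyperplane hashing (a generic LSH family for angular similarity, used here as an illustration with hypothetical names, not the paper's exact variant):

```python
import numpy as np

def lsh_buckets(features, n_bits=16, seed=0):
    """Random-hyperplane LSH: hash feature vectors to n_bits-bit codes.

    Vectors with small angular distance tend to land in the same bucket,
    so a query image's features only need comparing against one bucket
    rather than the entire product-image collection."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(features.shape[1], n_bits))
    bits = (features @ planes) > 0            # one sign bit per hyperplane
    return (bits * (1 << np.arange(n_bits))).sum(axis=1)
```

The nearest-neighbor classification would then run only within the query's bucket, trading a small recall loss for a large speedup.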


Pacific Rim Conference on Multimedia | 2002

Foreground Segmentation Using Motion Vectors in Sports Video

Ling-Yu Duan; Xiao-Dong Yu; Min Xu; Qi Tian

In this paper, we present an effective algorithm for foreground object segmentation in sports video. The algorithm consists of three steps: low-level feature extraction, camera motion estimation, and foreground object extraction. We apply a robust M-estimator to the motion vector fields to estimate global camera motion parameters under a four-parameter camera motion model, followed by outlier analysis that uses the robust weights, rather than the residuals, to extract foreground objects. Because foreground objects' motion patterns are independent of the global motion induced by camera operations such as pan, tilt, and zoom, we treat as foreground those macro-blocks that emerge as outliers during the robust regression procedure. Experiments show that the proposed algorithm robustly extracts foreground objects such as tennis players and estimates camera motion parameters. These results can greatly facilitate high-level semantic video indexing such as event detection and sports video structure analysis. Furthermore, operating on compressed-domain features yields great savings in computation.
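
The robust-regression step, fitting a four-parameter global motion model to block motion vectors and reading low-weight blocks as foreground, can be sketched with iteratively reweighted least squares using Huber weights (a stand-in for the paper's M-estimator; the model form and names are assumptions):

```python
import numpy as np

def fit_global_motion(xy, uv, iters=10, k=1.0):
    """Fit u = a*x - b*y + tx, v = b*x + a*y + ty by IRLS (Huber weights).

    xy: (n, 2) macro-block centers; uv: (n, 2) their motion vectors.
    Returns (params, per-block weights); low-weight blocks are outliers,
    i.e. candidate foreground whose motion defies the camera model."""
    x, y = xy[:, 0], xy[:, 1]
    # Stack the u- and v-equations of every block into one system A @ p = b.
    A = np.zeros((2 * len(xy), 4))
    A[0::2] = np.c_[x, -y, np.ones_like(x), np.zeros_like(x)]
    A[1::2] = np.c_[y,  x, np.zeros_like(x), np.ones_like(x)]
    b = uv.reshape(-1)
    w = np.ones(len(b))
    p = np.zeros(4)
    for _ in range(iters):
        sw = np.sqrt(w)
        p, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = np.abs(A @ p - b)
        w = np.where(r <= k, 1.0, k / np.maximum(r, 1e-12))  # Huber weights
    block_w = w.reshape(-1, 2).min(axis=1)    # one weight per macro-block
    return p, block_w
```

Thresholding `block_w` then yields the foreground macro-blocks directly from the weights, mirroring the abstract's use of robust weights instead of raw residuals.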


International Conference on Data Mining | 2009

Semantic Linking between Video Ads and Web Services with Progressive Search

Bo Wang; Jinqiao Wang; Shi Chen; Ling-Yu Duan; Hanqing Lu

With the proliferation of online media services, ad video has become an important way to promote products, services, and ideas. Research efforts have been devoted to contextual advertising, whereas comprehensive recommendation for video ads is less exploited. In this paper, we propose to establish a semantic link between video ads and relevant products/services online in a cross-media manner. First, we extract a representative key frame from the ad video and then conduct a three-step progressive search (i.e., visual search, tag aggregation, and textual re-search) to link video ads with relevant Web services: we search for visually similar product images, rank the contextual textual information by tag aggregation, and refine the results by textual re-search. Finally, we collect relevant products for user recommendation. Experiments on popular E-commerce websites such as eBay and Amazon demonstrate the attractiveness of the semantic linking.

Collaboration


Dive into Ling-Yu Duan's collaborations.

Top Co-Authors

Hanqing Lu (Chinese Academy of Sciences)
Jinqiao Wang (Chinese Academy of Sciences)
Jesse S. Jin (University of Newcastle)
Bo Wang (Chinese Academy of Sciences)
Jing Liu (Chinese Academy of Sciences)
Jun Du (Chinese Academy of Sciences)
Liang Shi (Beijing University of Posts and Telecommunications)
Lingfang Li (Chinese Academy of Sciences)
Qingming Huang (Chinese Academy of Sciences)