Publications


Featured research published by Shiyang Lu.


IEEE Transactions on Circuits and Systems for Video Technology | 2013

Keypoint-Based Keyframe Selection

Genliang Guan; Zhiyong Wang; Shiyang Lu; Jeremiah D. Deng; David Dagan Feng

Keyframe selection has been crucial for effective and efficient video content analysis. While most existing approaches represent individual frames with global features, we, for the first time, propose a keypoint-based framework for the keyframe selection problem so that local features can be employed in selecting keyframes. In general, the selected keyframes should be both representative of the video content and contain minimal redundancy. Therefore, we introduce two criteria, coverage and redundancy, based on keypoint matching in the selection process. Comprehensive experiments demonstrate that our approach outperforms the state of the art.
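The coverage and redundancy criteria can be illustrated with a toy greedy selector. This is a hypothetical sketch, not the paper's algorithm: the set-based "keypoint matching", the greedy size ordering, and both thresholds are illustrative simplifications.

```python
def select_keyframes(frame_keypoints, coverage_target=0.9, redundancy_max=0.5):
    """Greedily pick frames until the selection covers `coverage_target`
    of all keypoints, skipping frames whose keypoints overlap the current
    selection by more than `redundancy_max` (hypothetical thresholds)."""
    all_kps = set().union(*frame_keypoints)
    covered, selected = set(), []
    # Visit frames in decreasing order of how many keypoints they contain.
    for idx in sorted(range(len(frame_keypoints)),
                      key=lambda i: -len(frame_keypoints[i])):
        kps = frame_keypoints[idx]
        if kps and len(kps & covered) / len(kps) > redundancy_max:
            continue  # too redundant w.r.t. frames already chosen
        selected.append(idx)
        covered |= kps
        if len(covered) / len(all_kps) >= coverage_target:
            break
    return sorted(selected)

# Keypoint IDs per frame stand in for matched local features.
frames = [{1, 2, 3, 4}, {3, 4}, {5, 6, 7}, {7, 8}]
print(select_keyframes(frames))
```

Frame 1 is skipped because all of its keypoints are already covered by frame 0; the other frames are kept because each contributes enough new keypoints.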


IEEE Transactions on Multimedia | 2014

A Bag-of-Importance Model With Locality-Constrained Coding Based Feature Learning for Video Summarization

Shiyang Lu; Zhiyong Wang; Tao Mei; Genliang Guan; David Dagan Feng

Video summarization helps users obtain quick comprehension of video content. Recently, some studies have utilized local features to represent each video frame and formulated video summarization as a coverage problem over local features. However, the importance of individual local features has not been exploited. In this paper, we propose a novel Bag-of-Importance (BoI) model for static video summarization by identifying the frames with important local features as keyframes, which is one of the first studies formulating video summarization at the local feature level, instead of at the global feature level. That is, by representing each frame with local features, a video is characterized by a bag of local features weighted with individual importance scores, and the frames with more important local features are more representative, where the representativeness of each frame is the aggregation of the weighted importance of the local features contained in the frame. In addition, we propose to learn a transformation from each raw local feature to a more powerful sparse nonlinear representation for deriving its importance score, rather than directly utilizing hand-crafted visual features as most existing approaches do. Specifically, we first employ locality-constrained linear coding (LCC) to project each local feature into a sparse transformed space. LCC is able to take advantage of the manifold geometric structure of the high dimensional feature space and form the manifold of the low dimensional transformed space with the coordinates of a set of anchor points. Then we calculate the l2 norm of each local feature's coefficients over the anchor points as its importance score. Finally, the distribution of the importance scores of all the local features in a video is obtained as the BoI representation of the video.
We further differentiate the importance of local features with a spatial weighting template by taking the perceptual difference among spatial regions of a frame into account. As a result, our proposed video summarization approach is able to exploit both the inter-frame and intra-frame properties of feature representations and identify keyframes capturing both the dominant content and discriminative details within a video. Experimental results on three video datasets across various genres demonstrate that the proposed approach clearly outperforms several state-of-the-art methods.
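The importance-scoring step can be sketched as follows. This is a simplified stand-in, not the paper's method: it codes each feature over its k nearest anchors with plain unconstrained least squares rather than true locality-constrained linear coding, and the anchors and features are hypothetical.

```python
import numpy as np

def lcc_importance(features, anchors, k=2):
    """Code each local feature over its k nearest anchor points (a
    simplified, unconstrained stand-in for locality-constrained linear
    coding) and take the l2 norm of the code as its importance score."""
    scores = []
    for x in features:
        dist = np.linalg.norm(anchors - x, axis=1)
        idx = np.argsort(dist)[:k]                 # k nearest anchors
        B = anchors[idx].T                         # local basis (dim, k)
        c, *_ = np.linalg.lstsq(B, x, rcond=None)  # code vector
        scores.append(np.linalg.norm(c))           # l2 norm = importance
    return np.array(scores)

anchors = np.eye(2)                      # toy anchor points
feats = np.array([[2.0, 0.0], [0.0, 3.0]])
print(lcc_importance(feats, anchors))
```

Aggregating these per-feature scores per frame, and histogramming them over the whole video, would then yield the frame representativeness and the BoI signature described above.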


ACM Multimedia | 2012

Local visual words coding for low bit rate mobile visual search

Yue Wu; Shiyang Lu; Tao Mei; Jian Zhang; Shipeng Li

Mobile visual search has attracted extensive attention for its huge potential in numerous applications. Research on this topic has focused on two schemes: sending query images, or sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40 KB of data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words, consisting of a vocabulary tree histogram and descriptor orientations, rather than descriptors. This scheme can further reduce the bit rate with little extra computational cost on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A lightweight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded and transmitted to the server. Our new scheme transmits less than 1 KB of data, which reduces the bit rate of the second scheme by a factor of 3, and obtains about 30% improvement in search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 seconds on the client and 240 milliseconds on the server.
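The client-side quantization step can be sketched as follows. This is a hypothetical simplification: a flat codebook stands in for the hierarchical vocabulary tree, the descriptor orientations are omitted, and the codebook and descriptors are toy values.

```python
import numpy as np

def word_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest visual word and count
    occurrences; this sparse histogram (together with the per-descriptor
    orientations, omitted here) is what the client would transmit
    instead of the raw descriptors."""
    hist = np.zeros(len(codebook), dtype=int)
    for d in descriptors:
        hist[int(np.argmin(np.linalg.norm(codebook - d, axis=1)))] += 1
    return hist

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])   # toy 2-word vocabulary
descriptors = np.array([[0.5, 0.2], [9.5, 9.9], [0.1, 0.3]])
print(word_histogram(descriptors, codebook))
```

Because most words receive zero counts for a single query image, the histogram compresses far better than the descriptors it summarizes, which is the intuition behind the sub-kilobyte transmission above.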


Journal of Visual Communication and Image Representation | 2013

Fast human action classification and VOI localization with enhanced sparse coding

Shiyang Lu; Jian Zhang; Zhiyong Wang; David Dagan Feng

Sparse coding, which encodes a natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized in many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity of characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse-coding-based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) of the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, to avoid an exhaustive scan of entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest-neighbor voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and a widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of state-of-the-art methods.
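The Spatial Temporal Pyramid Matching step can be sketched at its finest pyramid level as a grid of per-cell word histograms. This is a minimal hypothetical sketch: the (x, y, t, word) feature tuples, the single pyramid level, and the grid size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stpm_histograms(features, n_words, grid=(2, 2, 2)):
    """Partition the video volume into an (x, y, t) grid and build one
    visual-word histogram per cell.  `features` is a list of
    (x, y, t, word) tuples with coordinates normalized to [0, 1)."""
    gx, gy, gt = grid
    hists = np.zeros((gx, gy, gt, n_words), dtype=int)
    for x, y, t, w in features:
        hists[int(x * gx), int(y * gy), int(t * gt), w] += 1
    return hists

feats = [(0.1, 0.1, 0.1, 0), (0.9, 0.9, 0.9, 1)]
cells = stpm_histograms(feats, n_words=2)
```

Scoring these cell histograms, rather than every possible sub-volume, is what lets the framework propose VOI candidates without an exhaustive scan; the branch-and-bound refinement would then operate on those candidates.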


International Conference on Image Processing | 2010

Adaptive reference frame selection for near-duplicate video shot detection

Shiyang Lu; Zhiyong Wang; Meng Wang; Max Ott; David Dagan Feng

Near-duplicate video shots provide critical visual links between videos, and detecting such shots efficiently and effectively is of paramount importance in many applications, such as detecting copyright infringement. In this paper, we propose an improved near-duplicate video shot detection approach that adaptively selects reference frames for more effective shot representation. The correlation between adjacent frames is measured with Pearson's Correlation Coefficient (PCC) so that a compact yet representative set of reference frames can be selected adaptively according to the content variation within video shots. Interest points are then extracted from the selected frames to represent shot content for similarity matching. Comprehensive experimental results on the TRECVID-2008 corpus demonstrate that our proposed approach outperforms the state-of-the-art method.
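The PCC-based adaptive selection idea can be sketched in a few lines. This is a hypothetical illustration: the frame feature vectors and the threshold are toy values, and the real approach operates on frame content within detected shots.

```python
import numpy as np

def select_reference_frames(frames, threshold=0.9):
    """Start a new reference frame whenever the Pearson correlation
    between the current reference and the incoming frame drops below
    `threshold`, so static shots yield few references and dynamic
    shots yield more."""
    refs = [0]
    for i in range(1, len(frames)):
        pcc = np.corrcoef(frames[refs[-1]], frames[i])[0, 1]
        if pcc < threshold:
            refs.append(i)
    return refs

frames = [np.array([1.0, 2, 3, 4]),    # reference
          np.array([1.0, 2, 3, 4.1]),  # near-identical: skipped
          np.array([4.0, 3, 2, 1])]    # content change: new reference
print(select_reference_frames(frames))
```

Interest points would then be extracted only from the selected frames, which is what keeps the shot representation compact.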


International Conference on Multimedia and Expo | 2013

A bag-of-importance model for video summarization

Shiyang Lu; Zhiyong Wang; Yuan Song; Tao Mei; David Dagan Feng

In this paper, we propose a novel local-feature-based approach, namely the Bag-of-Importance (BoI) model, for static video summarization, whereas most existing approaches characterize each video frame with global features to derive the importance of each frame. Since local features such as interest points are more discriminative in characterizing visual content, we formulate static video summarization as the problem of identifying representative frames which contain more important local features, where the representativeness of each frame is the aggregation of the importance of the local features contained in the frame. In order to derive the importance of each local feature for a given video, we employ sparse coding to project each local feature into a sparse space, calculate the l2 norm of the sparse coefficients of each local feature, and generate the BoI representation from the distribution of importance over all the local features in the video. We further take the perceptual difference among spatial regions of a frame into account: a spatial weighting template is utilized to differentiate the importance of local features for individual frames. With the proposed video summarization scheme, both the inter-frame and intra-frame properties of local features are exploited, which allows the selected frames to capture both the dominant content and discriminative details within a video. Experimental results on a dataset spanning several genres demonstrate that the proposed approach clearly outperforms the state-of-the-art method.


ACM Multimedia | 2012

Browse-to-search

Shiyang Lu; Tao Mei; Jingdong Wang; Jian Zhang; Zhiyong Wang; David Dagan Feng; Jian-Tao Sun; Shipeng Li

This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they usually need to look for related information from other websites, and therefore have to switch between the page being browsed and the sites that provide search results. The proposed application enables users to naturally search for products of interest while they browse a web page, so that even casual purchase intent is easily satisfied. The interactive shopping experience is characterized by: 1) in session---users specify their purchase intent within the browsing session, instead of leaving the current page and navigating to other websites; 2) in context---the browsed web page provides implicit context information which helps infer user purchase preferences; 3) in focus---users easily specify their search interest using gestures on touch devices and do not need to formulate queries in a search box; 4) natural-gesture input and visual search together provide a natural shopping experience. The system is evaluated against a dataset consisting of several million commercial product images.


Signal Processing | 2016

Investigating the impact of frame rate towards robust human action recognition

Fredro Harjanto; Zhiyong Wang; Shiyang Lu; Ah Chung Tsoi; David Dagan Feng

Human action recognition from videos is very important for visual analytics. Due to the increasing abundance of diverse video content in the era of big data, research on human action recognition has recently shifted towards more challenging and realistic settings. Frame rate is one of the key issues in diverse and realistic video settings. While there have been several evaluation studies investigating different aspects of action recognition, such as different visual descriptors, the frame rate issue has seldom been addressed in the literature. Therefore, in this paper, we investigate the impact of frame rate on human action recognition with several state-of-the-art approaches and three benchmark datasets. Our experimental results indicate that those state-of-the-art approaches are not robust to variations in frame rate. As a result, more robust visual features and advanced learning algorithms are required to further improve human action recognition towards more practical deployments. In addition, we investigate key-frame selection techniques for choosing a set of suitable frames from an action sequence for action recognition. Promising results indicate that well-designed key-frame selection methods can produce a set of representative frames and eventually reduce the impact of frame rate on the performance of human action recognition.

Highlights:
- One of the first studies evaluating the impact of frame rate on video-based human action recognition.
- Evaluation with four state-of-the-art methods and three widely used datasets.
- Investigation of a novel key-frame selection method for action recognition.
- A comprehensive study exploring the impact of frame rate on action recognition performance.
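The kind of controlled frame rate variation evaluated here can be simulated by uniform temporal subsampling. A minimal sketch, assuming the study degrades frame rate this way (the function and the fps values are illustrative, not taken from the paper):

```python
def subsample(frames, src_fps, dst_fps):
    """Simulate a lower frame rate by uniform temporal subsampling:
    keep one frame every src_fps / dst_fps positions."""
    step = src_fps / dst_fps
    return [frames[int(i * step)] for i in range(int(len(frames) / step))]

clip = list(range(30))          # one second of video at 30 fps
print(subsample(clip, 30, 10))  # the same second at 10 fps
```

Running the same recognizer on the original and subsampled sequences, and comparing accuracy, is the basic shape of the evaluation described above.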


Image and Vision Computing New Zealand | 2012

Evaluating the impact of frame rate on video based human action recognition

Fredro Harjanto; Zhiyong Wang; Shiyang Lu; David Dagan Feng

Human action recognition in videos is one of the classic problems in the computer vision domain, due to its wide range of applications as well as its challenges. Although most existing approaches perform very well on specific datasets, there is little research on how practical and robust it is to extend those approaches to realistic scenarios where videos are acquired at different frame rates from diverse imaging devices. In this paper, we investigate and evaluate the recognition performance of four state-of-the-art human action recognition approaches across two widely used benchmark datasets with three different frame rate settings. Our comprehensive experiments show that frame rate does affect recognition performance. In particular, the impact of frame rate is not consistent across different scenarios. Therefore, better recognition approaches, including novel visual features and learning algorithms robust to frame rate variation, are needed to further advance human action recognition towards more practical deployment.


International Conference on Computer Vision | 2011

Structure context of local features in realistic human action recognition

Qiuxia Wu; Shiyang Lu; Zhiyong Wang; Feiqi Deng; Wenxiong Kang; David Dagan Feng

Realistic human action recognition has been emerging as a challenging research topic due to the difficulties of representing different human actions in diverse realistic scenes. In the bag-of-features model, human actions are generally represented by the distribution of local features derived from the keypoints of action videos. Various local features have been proposed to characterize those keypoints. However, the important structural information among the keypoints has not been well investigated yet. In this paper, we propose to characterize such structural information with shape context. Thus, each keypoint is characterized by both its local visual attributes and its global structural context contributed by the other keypoints. The bag-of-features model is utilized to represent each human action, and an SVM is employed to perform recognition. Experimental results on the challenging YouTube and HOHA-2 datasets demonstrate that our proposed approach, which accounts for structural information, is more effective in representing realistic human actions. In addition, we investigate the impact of choosing different local features, such as SIFT, HOG, and HOF descriptors, for human action representation. We observe that dense keypoints can better exploit the advantages of our proposed approach.
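The structural context idea can be sketched with the classic log-polar shape context histogram. This is a generic shape context sketch under assumed bin counts and normalization choices, not the paper's exact formulation:

```python
import numpy as np

def shape_context(points, n_r=3, n_theta=4):
    """For each keypoint, build a log-polar histogram of the positions
    of all other keypoints relative to it -- the global structural
    context layered on top of each point's local descriptor."""
    pts = np.asarray(points, float)
    n = len(pts)
    ctx = np.zeros((n, n_r, n_theta))
    for i in range(n):
        rel = np.delete(pts, i, axis=0) - pts[i]      # offsets to others
        r = np.log1p(np.hypot(rel[:, 0], rel[:, 1]))  # log-radial distance
        t = np.arctan2(rel[:, 1], rel[:, 0])          # angle in [-pi, pi]
        r_bin = np.minimum((r / (r.max() + 1e-9) * n_r).astype(int), n_r - 1)
        t_bin = ((t + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
        for rb, tb in zip(r_bin, t_bin):
            ctx[i, rb, tb] += 1
    return ctx

pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
ctx = shape_context(pts)
```

Concatenating each keypoint's local descriptor with its shape context histogram, and feeding the bag-of-features representation to an SVM, is the overall pipeline the abstract describes.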

Collaboration


Dive into Shiyang Lu's collaborations.

Top Co-Authors

Feiqi Deng

South China University of Technology
