Shuqiang Jiang
Chinese Academy of Sciences
Publications
Featured research published by Shuqiang Jiang.
ACM Multimedia | 2010
Shiliang Zhang; Qingming Huang; Gang Hua; Shuqiang Jiang; Wen Gao; Qi Tian
Notwithstanding its great success and wide adoption in the Bag-of-Visual-Words representation, a visual vocabulary created from single image local features is often ineffective, largely for three reasons. First, many detected local features are not stable enough, resulting in many noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among the local features, which has been proven valuable for visual matching. Third, the distance metric commonly used for generating the visual vocabulary does not take the semantic context into consideration, which renders it prone to noise. To address these three issues, we propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call the discriminant group distance. This group distance is then leveraged to induce a visual vocabulary from groups of local features. We name it the contextual visual vocabulary, as it captures both the spatial and semantic contexts. We evaluate the proposed local feature refinement strategy and the contextual visual vocabulary in two large-scale image applications: large-scale near-duplicate image retrieval on a dataset containing 1.5 million images, and image search re-ranking. Our experimental results show that the contextual visual vocabulary achieves significant improvement over the classic visual vocabulary. Moreover, it outperforms the state-of-the-art Bundled Feature in terms of retrieval precision, memory consumption, and efficiency.
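A minimal sketch of the group-level quantization idea described above: local descriptors are bundled with their spatial neighbors, each bundle is summarized by a concatenated group descriptor, and k-means quantizes the groups into visual words. The shapes, the neighbor count, and the plain Euclidean metric are illustrative assumptions; the paper learns a discriminant group distance instead.

```python
# Sketch: build a "contextual" vocabulary over groups of local features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def group_descriptors(descriptors, keypoints, group_size=3):
    """Bundle each local descriptor with its nearest neighbors in image space."""
    nn = NearestNeighbors(n_neighbors=group_size).fit(keypoints)
    _, idx = nn.kneighbors(keypoints)
    # Concatenate the descriptors of each spatial group into one group descriptor.
    return np.stack([descriptors[i].reshape(-1) for i in idx])

def build_vocabulary(group_descs, num_words=1000):
    """Quantize group descriptors into visual words (Euclidean k-means stand-in)."""
    return KMeans(n_clusters=num_words, n_init=4, random_state=0).fit(group_descs)

def quantize(vocabulary, group_descs):
    """Map group descriptors of a new image to visual-word indices."""
    return vocabulary.predict(group_descs)
```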
ACM Multimedia | 2007
Guangyu Zhu; Qingming Huang; Changsheng Xu; Yong Rui; Shuqiang Jiang; Wen Gao; Hongxun Yao
Most existing approaches to event detection in sports video are oriented toward the general audience. The extracted events are presented to the audience without further analysis. However, professionals, such as soccer coaches, are more interested in the tactics used in the events. In this paper, we present a novel approach that extracts tactic information from goal events in broadcast soccer video and presents the goal events in a tactic mode to coaches and sports professionals. We first extract goal events with far-view shots based on the analysis and alignment of web-casting text and broadcast video. For a detected goal event, we employ a multi-object detection and tracking algorithm to obtain the player and ball trajectories in the shot. Compared with existing work, we propose an effective tactic representation, called the aggregate trajectory, which is constructed from multiple trajectories using a novel analysis of the temporal-spatial interaction among the players and the ball. The interactive relationship with play region information and hypothesis testing on the trajectory temporal-spatial distribution are exploited to analyze the tactic patterns in a hierarchical coarse-to-fine framework. The experimental results on data from the FIFA World Cup 2006 are promising and demonstrate that our approach is effective.
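A minimal sketch of the kind of temporal-spatial interaction measure that could relate player and ball trajectories: the weight between two trajectories is the fraction of frames in which they stay within a proximity threshold. The trajectory format and the threshold are illustrative assumptions, not the paper's exact construction.

```python
# Sketch: pairwise interaction weights over tracked trajectories.
import numpy as np

def interaction_weight(traj_a, traj_b, thresh=2.0):
    """traj_a, traj_b: (T, 2) arrays of field coordinates over the same frames."""
    dists = np.linalg.norm(traj_a - traj_b, axis=1)
    return float(np.mean(dists < thresh))

def interaction_graph(trajectories, thresh=2.0):
    """Symmetric weight matrix over all tracked objects (players and the ball)."""
    n = len(trajectories)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w[i, j] = w[j, i] = interaction_weight(trajectories[i], trajectories[j], thresh)
    return w
```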
IEEE Transactions on Multimedia | 2009
Guangyu Zhu; Changsheng Xu; Qingming Huang; Yong Rui; Shuqiang Jiang; Wen Gao; Hongxun Yao
Most existing approaches to sports video analysis have concentrated on semantic event detection. Sports professionals, however, are more interested in tactic analysis to help improve their performance. In this paper, we propose a novel approach that extracts tactic information from attack events in broadcast soccer video and presents the events in a tactic mode to coaches and sports professionals. We extract attack events with far-view shots using the analysis and alignment of web-casting text and broadcast video. For a detected event, two tactic representations, the aggregate trajectory and the play region sequence, are constructed based on multi-object trajectories and field locations in the event shots. Based on the multi-object trajectories tracked in the shot, a weighted graph is constructed via the analysis of the temporal-spatial interaction among the players and the ball. Using the Viterbi algorithm, the aggregate trajectory is computed from the weighted graph. The play region sequence is obtained by identifying the active field locations in the event based on line detection and a competition network. The interactive relationship of the aggregate trajectory with the play region information and hypothesis testing on the trajectory temporal-spatial distribution are employed to discover the tactic patterns in a hierarchical coarse-to-fine framework. Extensive experiments on FIFA World Cup 2006 data show that the proposed approach is highly effective.
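A minimal sketch of a Viterbi pass of the sort the aggregate-trajectory step suggests: the emission score for a player at a frame might reflect closeness to the ball, a fixed penalty discourages switching possession between frames, and the surviving path concatenates player positions into one aggregate trajectory. The scoring and the penalty are illustrative assumptions, not the paper's graph formulation.

```python
# Sketch: Viterbi decoding over per-frame candidate scores.
import numpy as np

def viterbi(emissions, switch_penalty=1.0):
    """emissions: (T, K) scores for K candidate players over T frames.
    Returns the index of the selected player at every frame."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # Staying with the same player costs nothing; switching pays a penalty.
        trans = np.where(np.eye(K, dtype=bool), 0.0, -switch_penalty)
        cand = score[:, None] + trans            # rows: previous state, cols: current state
        back[t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0) + emissions[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```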
ACM Multimedia | 2007
Huiying Liu; Shuqiang Jiang; Qingming Huang; Changsheng Xu; Wen Gao
Visual attention has been an active research topic for many years, and many new applications are emerging, especially for wireless multimedia services. In this paper, a novel region-based visual attention model is proposed to detect the Regions of Interest (ROIs) of images. In the proposed method, density-based image segmentation is first performed so that regions serve as the perceptive units, which makes the model robust to the scale of the ROIs and preserves more perceptual information. To generate the region saliency map for ROI detection, the global effect and the contextual difference are captured by a distance factor and an adjacency factor, respectively. Since different ROIs may have different importance for different purposes, an ROI ranking algorithm is designed for browsing large images on small displays. Experimental results and evaluation show that our method effectively detects ROIs in images and that users are satisfied with the browsing sequence on small displays.
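A minimal sketch in the spirit of the region saliency described above: each region's saliency accumulates its feature contrast to the other regions, modulated by a distance factor (global effect) and an adjacency factor (contextual difference). The feature choice, the Gaussian distance factor, and the adjacency boost are illustrative assumptions.

```python
# Sketch: region-level saliency from pairwise contrast, distance, and adjacency.
import numpy as np

def region_saliency(features, centroids, adjacency, sigma=0.3):
    """features: (R, D) mean region features; centroids: (R, 2) normalized
    region centers; adjacency: (R, R) boolean region-adjacency matrix."""
    R = len(features)
    saliency = np.zeros(R)
    for i in range(R):
        contrast = np.linalg.norm(features - features[i], axis=1)
        spatial = np.linalg.norm(centroids - centroids[i], axis=1)
        distance_factor = np.exp(-spatial ** 2 / (2 * sigma ** 2))   # nearer regions count more
        adjacency_factor = np.where(adjacency[i], 2.0, 1.0)          # neighbors count extra
        saliency[i] = np.sum(contrast * distance_factor * adjacency_factor)
    return saliency / (saliency.max() + 1e-8)
```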
IEEE Transactions on Multimedia | 2010
Shiliang Zhang; Qingming Huang; Shuqiang Jiang; Wen Gao; Qi Tian
Music video (MV) has become a popular pastime because of its conciseness, convenience, and ability to bring both audio and visual experiences to audiences. As the number of MVs increases explosively, it has become an important task to develop new techniques for effective MV analysis, retrieval, and management. By simulating the human affective response mechanism, affective video content analysis extracts the affective information contained in videos, and with this affective information, natural, user-friendly, and effective MV access strategies can be developed. In this paper, a novel integrated system (i.MV) is proposed for personalized MV affective analysis, visualization, and retrieval. In i.MV, we not only perform personalized MV affective analysis, which is a challenging and insufficiently studied problem in the current affective content analysis field, but also propose a novel affective visualization that converts abstract affective states into a form that is intuitive and friendly to users. Based on the affective analysis and visualization, affective-information-based MV retrieval is achieved. Both comprehensive experiments and subjective user studies on a large MV dataset demonstrate that our personalized affective analysis is more effective than previous algorithms. In addition, the affective visualization is shown to be more suitable for affective-information-based MV retrieval than the commonly used affective state representation strategies.
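A minimal sketch of affective-information-based retrieval in a 2D affective space: every MV is reduced to an (arousal, valence) point and a query returns the clips whose affective states lie closest to the query point. The random coordinates below are placeholders; in the paper the affective states come from audio-visual affective analysis.

```python
# Sketch: nearest-neighbor retrieval over arousal-valence points.
import numpy as np

def retrieve(av_points, query_av, top_k=5):
    """av_points: (N, 2) arousal-valence states of the indexed MVs."""
    dists = np.linalg.norm(av_points - np.asarray(query_av), axis=1)
    return np.argsort(dists)[:top_k]

# Example: find MVs near a calm, positive affective state.
catalog = np.random.rand(100, 2) * 2 - 1   # arousal, valence in [-1, 1]
print(retrieve(catalog, query_av=(-0.3, 0.6)))
```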
ACM Multimedia | 2008
Huiying Liu; Shuqiang Jiang; Qingming Huang; Changsheng Xu
This paper presents a generic Virtual Content Insertion (VCI) system based on visual attention analysis. VCI is an emerging application of video analysis and has been used in video augmentation and advertisement insertion. There are three critical issues for a VCI system: when (time), where (place), and how (method) to insert the Virtual Content (VC) into the video. Our system selects the insertion time and place by performing temporal and spatial attention analysis, which predicts the attention change over time and the attended region over space. To enable the inserted VC to be noticed by the audience without interrupting the audience's viewing experience of the original content, the VC should be inserted at a time when the video content attracts more audience attention and at a place that attracts less. Dynamic insertion is performed using Global Motion Estimation (GME) and affine transformation. Our VCI system achieves an optimal balance between the audience's notice of the VC and the disruption of the viewing experience of the original content. Extensive subjective evaluations based on a user study of the VCI results have verified the effectiveness of the system.
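A minimal sketch of the dynamic-insertion step: if per-frame global motion is expressed as a 2x3 affine matrix, the corners of the inserted virtual content can be propagated from the anchor frame so the overlay appears attached to the scene. The affine parameters would normally come from global motion estimation; the matrix below is an illustrative assumption.

```python
# Sketch: propagate virtual-content corners with an affine global-motion model.
import numpy as np

def warp_points(points, affine):
    """points: (N, 2) pixel coordinates; affine: 2x3 matrix [A | t]."""
    homog = np.hstack([points, np.ones((len(points), 1))])   # (N, 3) homogeneous coords
    return homog @ affine.T                                   # (N, 2) warped coordinates

# Corners of the virtual content in the anchor frame (pixel coordinates).
vc_corners = np.array([[100, 80], [220, 80], [220, 160], [100, 160]], float)

# Example frame-to-frame global motion: slight zoom plus translation.
affine = np.array([[1.01, 0.00, 2.5],
                   [0.00, 1.01, -1.0]])
vc_corners_next = warp_points(vc_corners, affine)
```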
Pacific Rim Conference on Multimedia | 2008
Yu Gong; Weiqiang Wang; Shuqiang Jiang; Qingming Huang; Wen Gao
To detect violence in movies, we present a three-stage method integrating visual and auditory cues. In our method, shots with potential violent content are first identified according to universal film-making rules. A modified semi-supervised learning technique based on semi-supervised cross feature learning (SCFL) is exploited, since it can combine different types of features and use unlabeled data to improve classification performance. Then, typical violence-related audio effects are detected for the candidate shots, and the confidence values output by the classifiers of various audio events are transformed into a shot-based violence score. Finally, the probabilistic outputs of the first two stages are integrated in a boosting manner to generate the final inference. The experimental results on four typical action movies preliminarily show the effectiveness of our method.
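A minimal sketch of the final fusion step: the probabilistic output of the visual candidate-shot stage and the audio-derived violence score are combined into one shot-level decision. The simple weighted combination and threshold below are illustrative assumptions and stand in for the paper's boosting-style integration.

```python
# Sketch: fuse per-shot visual and audio violence probabilities.
import numpy as np

def fuse_scores(p_visual, p_audio, w_visual=0.5, threshold=0.5):
    """p_visual, p_audio: (S,) per-shot probabilities from the two stages."""
    combined = w_visual * np.asarray(p_visual) + (1.0 - w_visual) * np.asarray(p_audio)
    return combined >= threshold

print(fuse_scores([0.8, 0.2, 0.6], [0.7, 0.1, 0.3]))
```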
International Conference on Multimedia and Expo | 2008
Shiliang Zhang; Qi Tian; Shuqiang Jiang; Qingming Huang; Wen Gao
Nowadays, MTV has become a popular pastime because of its conciseness, its convenience to play, and its ability to bring both audio and visual experiences to audiences. In this paper, we propose an affective MTV analysis framework that realizes MTV affective state extraction, representation, and clustering. First, affective features are extracted from both audio and visual signals. Then, the affective state of each MTV is modeled with a two-dimensional affective model and visualized in the Arousal-Valence space. Finally, MTVs with similar affective states are clustered into the same categories. The validity of the proposed framework is verified by a subjective user study. Comparisons between our selected features and those used in related work show that our features improve the performance by a significant margin.
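A minimal sketch of the clustering step: each MTV is a point in the 2D Arousal-Valence space, and k-means groups clips with similar affective states into the same category. The random points and the number of clusters are placeholders for the affective states estimated from the audio-visual features.

```python
# Sketch: cluster MTVs by their arousal-valence affective states.
import numpy as np
from sklearn.cluster import KMeans

av_states = np.random.rand(50, 2) * 2 - 1          # one (arousal, valence) point per MTV
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(av_states)
print(labels)                                       # cluster index per MTV
```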
International Conference on Acoustics, Speech, and Signal Processing | 2005
Yang Liu; Shuqiang Jiang; Qixiang Ye; Wen Gao; Qingming Huang
Playfield detection is a key step in sports video content analysis, since many semantic clues can be inferred from it. In this paper, we propose an adaptive GMM-based algorithm for playfield detection. Its advantages are twofold. First, it can update the model parameters with the incremental expectation maximization (IEM) algorithm, which enables the model to adapt to playfield variation over time. Second, online training is performed, which saves the buffer otherwise needed for training samples. The playfield detection results are then applied to recognize the key zone of the current playfield in soccer video, for which a fast algorithm based on the playfield contour and least squares is proposed. Experimental results show that the proposed algorithms are encouraging.
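A minimal sketch of an online, incremental-EM-style update of a field-color GMM: for each incoming pixel the component responsibilities are computed, and the weights, means, and variances are nudged toward the new sample with a small learning rate, so the model can follow playfield changes without buffering training samples. Diagonal covariances and the learning rate are illustrative assumptions, not the paper's exact IEM formulation.

```python
# Sketch: online update of a Gaussian mixture over field colors.
import numpy as np

def gaussian_pdf(x, means, variances):
    """Diagonal-covariance Gaussian density of pixel x under each component."""
    return np.exp(-0.5 * np.sum((x - means) ** 2 / variances, axis=-1)) / \
           np.sqrt((2 * np.pi) ** len(x) * np.prod(variances, axis=-1))

def iem_update(x, weights, means, variances, lr=0.01):
    """x: one RGB pixel (3,); weights: (K,); means, variances: (K, 3)."""
    resp = weights * gaussian_pdf(x, means, variances)
    resp /= resp.sum() + 1e-12                    # component responsibilities
    weights = (1 - lr) * weights + lr * resp
    weights /= weights.sum()
    for k in range(len(weights)):
        rho = lr * resp[k]
        means[k] = (1 - rho) * means[k] + rho * x
        variances[k] = (1 - rho) * variances[k] + rho * (x - means[k]) ** 2
    return weights, means, variances
```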
ACM Multimedia | 2004
Shuqiang Jiang; Qixiang Ye; Wen Gao; Tiejun Huang
With the growing popularity of digitized sports video, automatic analysis is needed to facilitate semantic summarization and retrieval. The playfield plays a fundamental role in the automatic analysis of many sports programs, and many semantic clues can be inferred from the results of playfield segmentation. In this paper, a novel playfield segmentation method based on Gaussian mixture models (GMMs) is proposed. First, training pixels are automatically sampled from the frames. Then, by assuming that field pixels are the dominant components in most of the video frames, we build GMMs of the field pixels and use these models to detect playfield pixels. Finally, a region-growing operation is employed to segment the playfield regions from the background. Experimental results show that the proposed method is robust across various sports videos, even under very poor grass field conditions. Based on the results of playfield segmentation, match situation analysis is investigated, which is also desired by sports professionals and long-time fans. The results are encouraging.
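A minimal sketch of the segmentation idea: pixels sampled from the frames train a GMM under the assumption that field colors dominate, and pixels whose likelihood under the model is high enough are labeled as playfield (region growing would then clean up the mask). The sample sizes, the number of components, and the likelihood threshold are illustrative assumptions.

```python
# Sketch: fit a field-color GMM and threshold per-pixel likelihoods.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_field_model(frames, n_components=3, samples_per_frame=2000):
    """frames: list of (H, W, 3) arrays; returns a GMM of (assumed dominant) field colors."""
    rng = np.random.default_rng(0)
    pixels = []
    for f in frames:
        flat = f.reshape(-1, 3)
        pixels.append(flat[rng.choice(len(flat), samples_per_frame, replace=False)])
    return GaussianMixture(n_components=n_components, random_state=0).fit(np.vstack(pixels))

def field_mask(frame, gmm, log_lik_threshold=-12.0):
    """Label pixels whose color likelihood under the field GMM exceeds the threshold."""
    scores = gmm.score_samples(frame.reshape(-1, 3).astype(float))
    return (scores > log_lik_threshold).reshape(frame.shape[:2])
```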