Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Weigang Zhang is active.

Publication


Featured research published by Weigang Zhang.


International Conference on Multimedia and Expo | 2013

Cross-media topic detection: A multi-modality fusion framework

Yanyan Zhang; Guorong Li; Lingyang Chu; Shuhui Wang; Weigang Zhang; Qingming Huang

Detecting topics from Web data has attracted increasing attention in recent years. Most previous work on topic detection focuses on data from a single medium; however, the rich and complementary information carried by multiple media can effectively enhance topic detection performance. In this paper, we propose a flexible data fusion framework to detect topics that simultaneously exist in different media. The framework is based on a multi-modality graph (MMG), which is obtained by fusing two single-modality graphs together: a text graph and a visual graph. Each node of the MMG represents a multi-modal data item, and the edge weight between two nodes jointly measures their content and upload-time similarities. Since data about the same topic often have similar content and are usually uploaded within a similar period of time, they naturally form a dense (namely, strongly connected) subgraph in the MMG. Such a dense subgraph is robust to noise and can be efficiently detected by pairwise clustering methods. The experimental results on single-medium and cross-media datasets demonstrate the flexibility and effectiveness of our method.
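As a rough illustration of how such an edge weight might combine the two factors, the sketch below multiplies a blended content similarity by an exponential upload-time decay; the function name, the blend parameter alpha, and the decay scale tau are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def mmg_edge_weight(text_sim, visual_sim, t_i, t_j, alpha=0.5, tau=86400.0):
    """Hypothetical MMG edge weight: blend text and visual content
    similarity, then damp it by upload-time distance (tau in seconds)."""
    content_sim = alpha * text_sim + (1 - alpha) * visual_sim
    time_sim = np.exp(-abs(t_i - t_j) / tau)  # posts close in time score higher
    return content_sim * time_sim

# two posts with similar content uploaded one day apart
print(round(mmg_edge_weight(0.8, 0.6, t_i=0.0, t_j=86400.0), 3))
```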


IEEE Transactions on Circuits and Systems for Video Technology | 2016

Effective Multimodality Fusion Framework for Cross-Media Topic Detection

Lingyang Chu; Yanyan Zhang; Guorong Li; Shuhui Wang; Weigang Zhang; Qingming Huang

Due to the prevalence of We-Media, information is quickly published and received in various forms anywhere and anytime through the Internet. The rich cross-media information carried by multimodal data in multiple media has a wide audience, deeply reflects social realities, and brings about a much greater social impact than any single-media information. Therefore, automatically detecting topics from cross media is of great benefit for organizations (e.g., advertising agencies and governments) that care about social opinions. However, cross-media topic detection is challenging in the following aspects: 1) the multimodal data from different media often involve distinct characteristics, and 2) topics are presented in an arbitrary manner among the noisy web data. In this paper, we propose a multimodality fusion framework and a topic recovery (TR) approach to effectively detect topics from cross-media data. The multimodality fusion framework flexibly incorporates the heterogeneous multimodal data into a multimodality graph, which takes full advantage of the rich cross-media information to effectively detect topic candidates (T.C.). The TR approach substantially improves the entirety and purity of detected topics by: 1) merging the T.C. that are highly relevant themes of the same real topic and 2) filtering out the less-relevant noise data in the merged T.C. Extensive experiments on both single-media and cross-media data sets demonstrate the promising flexibility and effectiveness of our method in detecting topics from cross media.
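A minimal sketch of the TR idea: merge candidate topics whose member sets are mutually similar on average, then prune members weakly tied to their merged topic. The thresholds and the mean-similarity criteria are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def recover_topics(candidates, sim, merge_thr=0.6, keep_thr=0.3):
    """Merge candidate topics whose member sets are similar on average,
    then drop members weakly connected to their merged topic.
    `candidates`: lists of item indices; `sim`: item-item similarities."""
    merged = [list(candidates[0])]
    for cand in candidates[1:]:
        for topic in merged:
            if sim[np.ix_(topic, cand)].mean() > merge_thr:  # same real topic?
                topic.extend(cand)
                break
        else:
            merged.append(list(cand))
    # purity pass: keep members well connected to the rest of their topic
    return [[i for i in t if sim[i, t].mean() > keep_thr] for t in merged]

rng = np.random.default_rng(2)
sim = rng.random((8, 8)); sim = (sim + sim.T) / 2
print(recover_topics([[0, 1], [2, 3], [1, 2]], sim))
```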


Visual Communications and Image Processing | 2005

Unsupervised sports video scene clustering and its applications to story units detection

Weigang Zhang; Qixiang Ye; Liyuan Xing; Qingming Huang; Wen Gao

In this paper, we present a new and efficient clustering approach for scene analysis in sports video. The method is generic and does not require any prior domain knowledge. It performs in an unsupervised manner and relies on analyzing the scene likeness of the shots in the video. The two most similar shots are merged into the same scene in each iteration, and this procedure is repeated until the merging stop criterion is satisfied. The stop criterion is based on a J value derived from the Fisher discriminant function; we therefore call this method J-based scene clustering. Using this method, the low-level video content representation (shots) can be clustered into the mid-level representation (scenes), which is useful for high-level sports video content analysis such as play-break parsing, story unit detection, highlight extraction, and summarization. Experimental results obtained from various types of broadcast sports videos demonstrate the efficacy of the proposed approach. Moreover, we also present a simple application of our scene clustering method to story unit detection in periodic sports videos such as archery and diving. The experimental results are encouraging.
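The sketch below mimics the merging loop with a Fisher-style separability ratio (between-cluster over within-cluster scatter) standing in for the paper's J value; the paper's exact J definition and stop rule are not reproduced, so the knee-based stop here is only an illustrative choice.

```python
import numpy as np

def j_value(clusters, X):
    """Fisher-style separability: between-cluster over within-cluster
    scatter. A stand-in for the paper's J value, not its exact form."""
    mu = X.mean(axis=0)
    sb = sum(len(c) * np.sum((X[c].mean(axis=0) - mu) ** 2) for c in clusters)
    sw = sum(np.sum((X[c] - X[c].mean(axis=0)) ** 2) for c in clusters)
    return sb / max(sw, 1e-12)

def scene_cluster(X):
    """Agglomerative shot merging: repeatedly merge the two closest
    clusters, track J at every level, and return the partition just
    before the sharpest drop in J (an illustrative stop rule)."""
    clusters = [[i] for i in range(len(X))]
    levels, js = [[list(c) for c in clusters]], []
    while len(clusters) > 1:
        cents = np.array([X[c].mean(axis=0) for c in clusters])
        d = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        a, b = np.unravel_index(d.argmin(), d.shape)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
        js.append(j_value(clusters, X))
    drops = [js[i] / max(js[i + 1], 1e-12) for i in range(len(js) - 1)]
    return levels[1 + int(np.argmax(drops))]

# two synthetic "scenes" of shot features; expect two clusters back
X = np.vstack([np.random.randn(5, 3), np.random.randn(5, 3) + 8])
print(scene_cluster(X))
```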


IEEE Transactions on Multimedia | 2015

Unsupervised Web Topic Detection Using A Ranked Clustering-like Pattern Across Similarity Cascades

Junbiao Pang; Fei Jia; Chunjie Zhang; Weigang Zhang; Qingming Huang; Baocai Yin

Despite the massive growth of social media on the Internet, organizing, understanding, and monitoring user-generated content (UGC) has become one of the most pressing problems in today's society. Discovering topics on the web from a huge volume of UGC is one of the promising approaches to achieving this goal. Compared with classical topic detection and tracking in news articles, identifying topics on the web is by no means easy due to the noisy, sparse, and less-constrained data on the Internet. In this paper, we investigate methods from the perspective of similarity diffusion and propose a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a similarity graph with a set of thresholds, and maximal cliques are then used to capture topics. Finally, a topic-restricted similarity diffusion process is proposed to efficiently identify real topics from a large number of candidates. Experiments demonstrate that our approach outperforms the state-of-the-art methods on three public data sets.
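A small sketch of the SC construction using networkx: truncate one similarity matrix at several thresholds and collect maximal cliques at each level as topic candidates. The threshold set and minimum clique size are illustrative, and the topic-restricted diffusion stage is omitted.

```python
import itertools
import networkx as nx
import numpy as np

def similarity_cascades(sim, thresholds=(0.5, 0.7, 0.9), min_size=3):
    """Collect topic candidates as maximal cliques from each level of the
    cascade of threshold-truncated similarity graphs."""
    n = len(sim)
    candidates = set()
    for thr in thresholds:
        g = nx.Graph()
        g.add_nodes_from(range(n))
        g.add_edges_from((i, j) for i, j in itertools.combinations(range(n), 2)
                         if sim[i, j] >= thr)
        for clique in nx.find_cliques(g):  # maximal cliques of this level
            if len(clique) >= min_size:
                candidates.add(tuple(sorted(clique)))
    return candidates

rng = np.random.default_rng(1)
sim = rng.random((12, 12)); sim = (sim + sim.T) / 2
print(similarity_cascades(sim))
```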


Visual Communications and Image Processing | 2005

A scheme for racquet sports video analysis with the combination of audio-visual information

Liyuan Xing; Qixiang Ye; Weigang Zhang; Qingming Huang; Hua Yu

As a very important category of sports video, racquet sports video, e.g., table tennis, tennis, and badminton, has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. First, a supervised classification method is employed to detect important audio symbols, including impact (ball hit), audience cheers, and commentator speech; meanwhile, an unsupervised algorithm is proposed to group video shots into various clusters. Second, by taking advantage of the temporal relationship between audio and visual signals, we can assign the scene clusters semantic labels, including rally scenes and break scenes. Third, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two representative types of racquet sports video, table tennis and tennis, demonstrate encouraging results.
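The sketch below shows one way the temporal audio-visual alignment could assign rally/break labels, counting ball-hit events inside each shot's time span; the data layout, symbol names, and threshold are assumptions, not the paper's actual procedure.

```python
def label_scenes(shots, audio_events, min_impacts=2):
    """Label each shot 'rally' if enough ball-hit events fall inside its
    time span, else 'break'. shots: (start, end, cluster_id) tuples;
    audio_events: (time, symbol) tuples."""
    labels = {}
    for start, end, _cluster in shots:
        hits = sum(1 for t, sym in audio_events
                   if sym == "impact" and start <= t <= end)
        labels[(start, end)] = "rally" if hits >= min_impacts else "break"
    return labels

shots = [(0, 10, 0), (10, 18, 1)]
events = [(2, "impact"), (4, "impact"), (6, "cheers"), (15, "speech")]
print(label_scenes(shots, events))  # {(0, 10): 'rally', (10, 18): 'break'}
```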


International Conference on Multimedia and Expo | 2015

Group sensitive Classifier Chains for multi-label classification

Jun Huang; Guorong Li; Shuhui Wang; Weigang Zhang; Qingming Huang

In multi-label classification, labels often have correlations with each other, and exploiting label correlations can improve the performance of classifiers. Current multi-label classification methods mainly consider global label correlations; however, label correlations may differ across data groups. In this paper, we propose a simple and efficient framework for multi-label classification called Group sensitive Classifier Chains. We assume that similar examples not only share the same label correlations but also tend to have similar labels. We augment the original feature space with the label space and cluster the examples into groups, then learn the label dependency graph within each group and build classifier chains on each group-specific label dependency graph. For prediction, the classifier chains built on the group nearest to the test example are used. Comparison results with state-of-the-art approaches demonstrate the competitive performance of our method.
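A compact sketch of the group-sensitive idea using scikit-learn's ClassifierChain: cluster in the augmented [feature | label] space, fit one chain per group, and predict with the nearest group's chain. The group count, base learner, and nearest-group rule on the feature part of the centroids are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.multioutput import ClassifierChain
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                                          # features
Y = (X[:, :3] + rng.normal(scale=0.5, size=(200, 3)) > 0).astype(int)   # 3 labels

# 1) cluster examples in the augmented [feature | label] space
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(np.hstack([X, Y]))
groups = km.labels_

# 2) fit one classifier chain per group (trees tolerate single-class groups)
chains = {g: ClassifierChain(DecisionTreeClassifier(max_depth=3), random_state=0)
             .fit(X[groups == g], Y[groups == g])
          for g in np.unique(groups)}

# 3) at test time labels are unknown, so find the nearest group using only
#    the feature part of each cluster center, then predict with its chain
x_test = rng.normal(size=(1, 10))
nearest = int(np.argmin(np.linalg.norm(km.cluster_centers_[:, :10] - x_test, axis=1)))
print(chains[nearest].predict(x_test))
```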


Multimedia Tools and Applications | 2014

Web video thumbnail recommendation with content-aware analysis and query-sensitive matching

Weigang Zhang; Chunxi Liu; Zhenjun Wang; Guorong Li; Qingming Huang; Wen Gao

In this paper, a unified and adaptive web video thumbnail recommendation framework is proposed, which recommends thumbnails for both video owners and browsers on the basis of image quality assessment, image accessibility analysis, video content representativeness analysis, and query-sensitive matching. First, video shot detection is performed and the video frame with the highest image quality is extracted as the key frame for each shot, based on our proposed image quality assessment method. These key frames serve as the thumbnail candidates for the subsequent processes. In the image quality assessment, the normalized variance autofocusing function is employed to evaluate image blur, which ensures that the selected thumbnail candidates are clear and of high image quality. For accessibility analysis, color moment, visual salience, and texture features are used with a support vector regression model to predict each candidate's accessibility score, which ensures that the recommended thumbnail's regions of interest are large enough to be easily accessible to users. For content representativeness analysis, the mutual reinforcement algorithm is applied to the entire video to obtain each candidate's representativeness score, which ensures that the final thumbnail is representative enough for users to grasp the main video content at a glance. Considering browsers' query intent, a relevance model is designed to recommend more personalized thumbnails for individual browsers. Finally, by flexibly fusing the above analysis results, the final adaptive recommendation is produced. Experimental results and subjective evaluations demonstrate the effectiveness of the proposed approach. Compared with existing web video thumbnail generation methods, the thumbnails for video owners not only reflect the video content better but also make users feel more comfortable, while the thumbnails for video browsers directly reflect their preferences, which greatly enhances the user experience.
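The normalized variance autofocusing function mentioned above is a standard focus measure (intensity variance divided by mean intensity). The sketch below applies it to pick the sharpest frame of a shot; this covers only the quality step of the full pipeline, and the helper names are illustrative.

```python
import numpy as np

def normalized_variance(gray):
    """Normalized-variance focus measure: intensity variance divided by
    the mean intensity; higher means sharper."""
    mu = gray.mean()
    return gray.var() / max(mu, 1e-12)

def pick_key_frame(frames):
    """Return the index of the sharpest frame of a shot."""
    return int(np.argmax([normalized_variance(f) for f in frames]))

rng = np.random.default_rng(3)
sharp = rng.integers(0, 256, (120, 160)).astype(float)               # high contrast
blurry = np.full((120, 160), 128.0) + rng.normal(size=(120, 160))    # flat, low contrast
print(pick_key_frame([blurry, sharp]))  # -> 1
```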


International Conference on Multimedia and Expo | 2006

Extracting Story Units in Sports Video Based on Unsupervised Video Scene Clustering

Chunxi Liu; Qingming Huang; Shuqiang Jiang; Weigang Zhang

Many sports videos, such as archery, diving, and tennis, have repetitive structure patterns that are reliable clues for generating highlights, summarization, and automatic annotation. In this paper, we present a novel approach to analyzing these structure patterns in sports video to extract story units. First, an unsupervised scene clustering method for sports video is adopted to automatically categorize the video shots into several disparate scenes. Then, the clustering results are modeled by a transition matrix. Finally, the key scene shots are detected to analyze the structure patterns and extract the story units. Experimental results on several types of broadcast sports video demonstrate that our approach is effective.
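As an illustration of the transition-matrix step, the sketch below builds a row-normalized matrix from a scene-label sequence; how the paper derives key scene shots from it is not reproduced here.

```python
import numpy as np

def transition_matrix(scene_seq, n_scenes):
    """Row-normalized scene transition counts from a shot-label sequence."""
    m = np.zeros((n_scenes, n_scenes))
    for a, b in zip(scene_seq, scene_seq[1:]):
        m[a, b] += 1
    row = m.sum(axis=1, keepdims=True)
    return np.divide(m, row, out=np.zeros_like(m), where=row > 0)

# a periodic pattern (e.g., aim -> shoot -> replay -> aim ...)
seq = [0, 1, 2, 0, 1, 2, 0, 1, 2]
m = transition_matrix(seq, 3)
print(m)                   # strong 0 -> 1 -> 2 -> 0 cycle
print(int(m[0].argmax()))  # most likely successor of scene 0
```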


Neurocomputing | 2015

Fusing cross-media for topic detection by dense keyword groups

Weigang Zhang; Tianlong Chen; Guorong Li; Junbiao Pang; Qingming Huang; Wen Gao

Events are real-world occurrences that lead to the explosive growth of web multimedia content such as images, videos, and texts. Efficient organization and navigation of multimedia data at the topic level can boost users' understanding and enhance their experience of the events that have happened. Due to its potential application prospects, multimedia topic detection has been an active area of research with notable progress in the last decade. Traditional methods mainly focus on a single medium, so the results only reflect the characteristics of one particular medium and topic browsing is not comprehensive enough. In this paper, we propose a method of utilizing and fusing rich media information from web videos and news reports to extract weighted keyword groups, which are used for cross-media topic detection. First, by utilizing the video-related textual information and the titles of news articles, a maximum local average score is proposed to find coarse weighted dense keyword groups; after that, textual linking and visual linking are applied to refine the keyword groups and update the weights; finally, the documents are re-linked with the refined keyword groups to form an event-related document set. Experiments are conducted on cross-media datasets containing web videos and news reports. The web videos are from Youku, YouTube's equivalent in China, and the news reports are from sina.com, some of which contain topic-related images. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
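The sketch below illustrates one plausible reading of a "maximum local average" criterion: grow a keyword group greedily while the average intra-group co-occurrence weight keeps increasing. The scoring and stopping details are assumptions, not the paper's exact definition.

```python
import numpy as np

def avg_weight(w, group):
    """Average pairwise co-occurrence weight within a keyword group."""
    if len(group) < 2:
        return 0.0
    sub = w[np.ix_(group, group)]
    n = len(group)
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))

def grow_keyword_group(w, seed):
    """Greedily add the keyword that raises the group's average weight
    the most; stop as soon as no addition improves it."""
    group = [seed]
    while True:
        best, best_avg = None, avg_weight(w, group)
        for k in range(len(w)):
            if k not in group and avg_weight(w, group + [k]) > best_avg:
                best, best_avg = k, avg_weight(w, group + [k])
        if best is None:
            return group
        group.append(best)

w = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.7, 0.1],
              [0.8, 0.7, 0.0, 0.2],
              [0.1, 0.1, 0.2, 0.0]])
print(grow_keyword_group(w, seed=0))  # -> [0, 1]
```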


International Conference on Image Processing | 2014

DA-CCD: A novel action representation by Deep Architecture of local depth feature

Yi Liu; Lei Qin; Zhongwei Cheng; Yanhao Zhang; Weigang Zhang; Qingming Huang

With the widespread use of depth sensors, it is crucial to provide an effective and efficient solution for human action analysis applications based on the informative depth data. In this paper, we present a generic framework for modeling human actions with depth data by deep-architecture-enhanced local features. To introduce robust higher-level representations, we augment the adaptive and scalable local depth features in a deep feature learning manner. Specifically, a Deep Architecture of Comparative Coding Descriptor (DA-CCD) is proposed to represent the depth action data. Our approach obtains consistently superior recognition precision in view-specific and view-non-specific scenarios compared with other leading action representations for depth data on the Huawei/3DLife dataset.
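As a loose illustration of comparative coding on depth data, the sketch below codes each neighbor of a patch center as nearer, similar, or farther in depth; this is only a guess at the flavor of CCD, not the actual descriptor or its deep architecture.

```python
import numpy as np

def comparative_code(patch, center, eps=10):
    """Code each neighbor +1 / 0 / -1 for farther / similar / nearer
    depth relative to the center (eps: depth tolerance, e.g. in mm)."""
    diff = patch.astype(int) - int(center)
    return np.sign(np.where(np.abs(diff) <= eps, 0, diff))

patch = np.array([[1200, 1190, 1500],
                  [1210, 1205, 1520],
                  [1195, 1480, 1510]])
print(comparative_code(patch, center=patch[1, 1]))
```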

Collaboration


Dive into Weigang Zhang's collaborations.

Top Co-Authors

Qingming Huang, Chinese Academy of Sciences
Guorong Li, Chinese Academy of Sciences
Shuhui Wang, Chinese Academy of Sciences
Junbiao Pang, Beijing University of Technology
Shuqiang Jiang, Chinese Academy of Sciences
Chunjie Zhang, Chinese Academy of Sciences
Baocai Yin, Dalian University of Technology
Chunxi Liu, Chinese Academy of Sciences