Shengsheng Qian
Chinese Academy of Sciences
Publications
Featured research published by Shengsheng Qian.
IEEE Transactions on Multimedia | 2016
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu; Jie Shao
With the massive growth of social events on the Internet, it has become more and more difficult to find and organize interesting events from massive social media data, a capability that would help users and governments browse, search, and monitor social events. To deal with this problem, we propose a novel multi-modal social event tracking and evolution framework that not only effectively captures the multi-modal topics of social events, but also obtains their evolutionary trends and generates effective event summaries over time. To achieve this goal, we propose a novel multi-modal event topic model (mmETM), which can effectively model social media documents consisting of long text with related images, and learn the correlations between textual and visual modalities to separate visual-representative topics from non-visual-representative topics. To apply mmETM to social event tracking, we adopt an incremental learning strategy, denoted incremental mmETM, which obtains informative textual and visual topics of social events over time to help understand these events and their evolutionary trends. To evaluate the effectiveness of the proposed algorithm, we collect a real-world dataset and conduct various experiments. Both qualitative and quantitative evaluations demonstrate that the proposed mmETM algorithm performs favorably against several state-of-the-art methods.
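The incremental-learning idea above can be pictured with a much simpler stand-in. The sketch below is not the authors' mmETM; it only illustrates carrying a topic model across time slices, using gensim's online LDA update on a toy, text-only corpus with made-up time slices.

```python
# Minimal sketch of incremental topic tracking over time slices, assuming
# gensim is installed. Online LDA stands in for the paper's incremental
# mmETM; the documents and time slices below are toy data.
from gensim import corpora, models

time_slices = [
    [["earthquake", "rescue", "damage"], ["earthquake", "aid", "relief"]],
    [["rebuild", "relief", "donation"], ["rebuild", "aid", "damage"]],
]

dictionary = corpora.Dictionary(doc for sl in time_slices for doc in sl)

# Fit an initial model on the first time slice.
corpus0 = [dictionary.doc2bow(doc) for doc in time_slices[0]]
lda = models.LdaModel(corpus0, id2word=dictionary, num_topics=2, passes=10)

# Incrementally update with each later slice, so topics evolve over time
# instead of being re-learned from scratch.
for sl in time_slices[1:]:
    lda.update([dictionary.doc2bow(doc) for doc in sl])
    for topic_id, words in lda.show_topics(num_topics=2, num_words=3,
                                           formatted=False):
        print(topic_id, [w for w, _ in words])
```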
ACM Transactions on Multimedia Computing, Communications, and Applications | 2015
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu; M. Shamim Hossain
With the rapidly increasing popularity of social media sites (e.g., Flickr, YouTube, and Facebook), it is convenient for users to share their own comments on many social events, which facilitates social event generation, sharing, and propagation and results in a large amount of user-contributed media data (e.g., images, videos, and text) for a wide variety of real-world events of different types and scales. As a consequence, it has become more and more difficult to find the events of interest in massive social media data, an ability that would help users and governments browse, search, and monitor social events. To deal with these issues, we propose a novel boosted multimodal supervised Latent Dirichlet Allocation (BMM-SLDA) model for social event classification, which integrates a supervised topic model, denoted multi-modal supervised Latent Dirichlet Allocation (mm-SLDA), into a boosting framework. The proposed BMM-SLDA has a number of advantages. (1) Its mm-SLDA component can effectively exploit the multi-modality and the multi-class property of social events jointly, and makes use of the supervised category label information to classify multi-class social events directly. (2) It is suitable for large-scale data analysis: a boosting weighted sampling strategy iteratively selects a small subset of the data to efficiently train the corresponding topic models. (3) It effectively exploits the social event structure through a document weight distribution driven by classification error, and iteratively learns new topic models to correct previously misclassified event documents. We evaluate BMM-SLDA on a real-world dataset and report extensive experimental results, which demonstrate that our model outperforms state-of-the-art methods.
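The boosting loop described in advantages (2) and (3) can be sketched generically. Below, a decision tree stands in for the mm-SLDA weak learner; the synthetic dataset, subset size, and number of rounds are illustrative assumptions, and the SAMME-style multiclass weight update is a standard choice rather than necessarily the paper's exact rule.

```python
# Sketch of boosting with weighted sampling: train each weak learner on a
# small subset drawn from the current document weight distribution, then
# up-weight misclassified documents for the next round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for multi-class social event documents.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
n, K, rounds, subset = len(y), 4, 5, 200
rng = np.random.default_rng(0)

w = np.full(n, 1.0 / n)                       # document weight distribution
learners, alphas = [], []
for _ in range(rounds):
    idx = rng.choice(n, size=subset, replace=True, p=w)
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[idx], y[idx])
    pred = clf.predict(X)
    err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
    alpha = np.log((1 - err) / err) + np.log(K - 1)   # SAMME multiclass weight
    w *= np.exp(alpha * (pred != y))          # up-weight misclassified docs
    w /= w.sum()
    learners.append(clf)
    alphas.append(alpha)

# Final prediction: alpha-weighted vote over the weak learners.
votes = np.zeros((n, K))
for clf, a in zip(learners, alphas):
    votes[np.arange(n), clf.predict(X)] += a
print("train accuracy:", (votes.argmax(axis=1) == y).mean())
```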
ACM Multimedia | 2015
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu
Cross-domain data analysis is one of the most important tasks in social multimedia. It has a wide range of real-world applications, including cross-platform event analysis, cross-domain multi-event tracking, and cross-domain video recommendation. It is also very challenging because the data have multi-modal and multi-domain properties, and there are no explicit correlations linking different domains. To deal with these issues, we propose a generic Cross-Domain Collaborative Learning (CDCL) framework based on a non-parametric Bayesian dictionary learning model for cross-domain data analysis. The proposed CDCL model makes use of shared domain priors and modality priors to collaboratively learn the data representations while accounting for the domain discrepancy and the multi-modal property. As a result, CDCL can effectively exploit the virtues of different information sources so that they complement and enhance each other in cross-domain data analysis. To evaluate the proposed model, we apply it to two different applications: cross-platform event recognition and cross-network video recommendation. Extensive experimental evaluations demonstrate the effectiveness of the proposed algorithm for cross-domain data analysis.
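A rough feel for the shared-plus-domain-specific dictionary idea can be had with ordinary (non-Bayesian) dictionary learning. The sketch below uses scikit-learn as a shallow stand-in for CDCL's non-parametric Bayesian model; the two domains, the residual-coding scheme, and all sizes are assumptions for illustration only.

```python
# Toy sketch: a dictionary shared across domains plays the role of the
# shared priors; per-domain dictionaries are fit on what the shared atoms
# fail to explain. Not the paper's Bayesian model.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
domain_a = rng.normal(size=(100, 20))    # e.g., features from one platform
domain_b = rng.normal(size=(100, 20))    # e.g., features from another

# Shared dictionary, learned on data pooled across both domains.
shared = DictionaryLearning(n_components=8, random_state=0).fit(
    np.vstack([domain_a, domain_b]))

def residual(X):
    # What remains of X after coding it against the shared atoms.
    codes = sparse_encode(X, shared.components_)
    return X - codes @ shared.components_

# Domain-specific dictionaries, learned on each domain's residuals.
spec_a = DictionaryLearning(n_components=4, random_state=0).fit(residual(domain_a))
spec_b = DictionaryLearning(n_components=4, random_state=0).fit(residual(domain_b))

# A cross-domain representation: shared codes plus domain-specific codes.
codes_a = np.hstack([sparse_encode(domain_a, shared.components_),
                     sparse_encode(domain_a, spec_a.components_)])
print(codes_a.shape)   # (100, 12)
```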
International Conference on Pattern Recognition | 2014
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu
With the rapidly increasing popularity of social media sites (e.g., Flickr, YouTube, and Facebook), it is convenient for users to share their own comments on many social events, which facilitates social event generation, sharing, and propagation and results in a large amount of user-contributed media data (e.g., images, videos, and text) for a wide variety of real-world events of different types and scales. As a consequence, it has become more and more difficult to find the events of interest in massive social media data, an ability that would help users and governments browse, search, and monitor social events. To deal with these issues, we propose a novel boosted multi-modal supervised Latent Dirichlet Allocation (BMM-SLDA) model for social event classification. BMM-SLDA has a number of advantages. (1) It can effectively exploit the multi-modality and the supervised information of social events jointly. (2) It is suitable for large-scale data analysis: a boosting weighted sampling strategy iteratively selects a small subset of the data to efficiently train the corresponding topic models. (3) It effectively exploits a boosting document weight distribution driven by classification error, and iteratively learns new topic models to correct previously misclassified documents. We evaluate BMM-SLDA on a real-world dataset and report extensive results, which show that our model outperforms state-of-the-art methods.
ACM Multimedia | 2016
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu
In this paper, we propose a novel multi-modal multi-view topic-opinion mining (MMTOM) model for social event analysis across multiple collection sources. Compared with existing topic-opinion mining methods, our proposed model has several advantages: (1) MMTOM can effectively take multi-modal and multi-view properties into account jointly, in a unified and principled way, for social event modeling. (2) The model is general and can be applied to many other multimedia applications, such as opinion mining and sentiment analysis, multi-view association visualization, and topic-opinion mining for movie reviews. (3) MMTOM is able not only to discover multi-modal common topics from all collections and summarize the similarities and differences of these collections along each specific topic, but also to automatically mine multi-view opinions on the learned topics across different collections. (4) The topic-opinion mining results can be effectively applied to applications including multi-modal multi-view topic-opinion retrieval and visualization, where they achieve much better performance than existing methods. To evaluate the proposed model, we collect a real-world dataset for research on multi-modal multi-view social event analysis, and we will release it for academic use. We have conducted extensive experiments, and both qualitative and quantitative evaluation results demonstrate the effectiveness of the proposed MMTOM.
International Conference on Internet Multimedia Computing and Service | 2014
Shengsheng Qian; Tianzhu Zhang; Changsheng Xu
In social media, many websites (e.g., Flickr, YouTube, and Facebook) enable users to share their own interests and opinions on many popular events, and successfully facilitate event generation, sharing, and propagation. As a result, there are substantial amounts of user-contributed media data (e.g., images, videos, and textual content) for a wide variety of real-world events of different types and scales. The aim of this paper is to automatically identify interesting events in massive social media data, an ability that is useful for users and governments to browse, search, and monitor social events. To achieve this goal, we propose a novel multi-modal supervised Latent Dirichlet Allocation (mm-SLDA) model for social event classification. The proposed mm-SLDA has a number of advantages. (1) It can effectively exploit the multi-modality and the multi-class property of social events jointly. (2) It makes use of the supervised social event category label information and is able to classify multi-class social events directly. We evaluate the proposed mm-SLDA on a real-world dataset and report extensive experimental results, which demonstrate that our model outperforms state-of-the-art methods.
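As a decoupled approximation of what mm-SLDA does jointly, one can learn unsupervised topics over concatenated textual and visual bag-of-words counts and then train a supervised classifier on the topic proportions. The sketch below follows that two-stage shortcut with synthetic counts and labels; it is not the paper's joint model.

```python
# Two-stage stand-in for mm-SLDA: topics over a joint text+visual vocabulary,
# then a classifier on the topic proportions. Counts and labels are toy data.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
text_bow = rng.poisson(1.0, size=(300, 50))    # word counts per document
visual_bow = rng.poisson(1.0, size=(300, 30))  # visual-word counts per document
y = rng.integers(0, 3, size=300)               # 3 event classes (toy labels)

# Multi-modality here is simply a shared topic space over both vocabularies.
X = np.hstack([text_bow, visual_bow])
topics = LatentDirichletAllocation(n_components=10,
                                   random_state=0).fit_transform(X)

clf = LogisticRegression(max_iter=1000).fit(topics, y)
print("train accuracy:", clf.score(topics, y))
```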
Multimedia Tools and Applications | 2018
Feng Xue; Jianwei Wang; Shengsheng Qian; Tianzhu Zhang; Xueliang Liu; Changsheng Xu
In this paper, we propose a novel multi-modal max-margin supervised topic model (MMSTM) for social event analysis, which jointly learns the representation together with the classifier in a unified framework. Compared with existing methods, the proposed MMSTM model has several advantages. (1) It uses the classifier as a regularization term of the model to jointly learn the parameters of the generative model and the max-margin classifier, and it uses Gibbs sampling to learn the parameters of the representation model and the max-margin classifier by minimizing the expected loss function. (2) It not only effectively mines the multi-modal property by jointly learning the latent topic relevance among multiple modalities for social event representation, but also exploits the supervised information through a discriminative max-margin classifier to boost event classification performance. (3) To validate the effectiveness of the proposed model, we collect a large-scale real-world dataset for social event analysis; both qualitative and quantitative evaluation results demonstrate the effectiveness of the proposed MMSTM.
Advances in Multimedia | 2013
Long Ying; Tianzhu Zhang; Shengsheng Qian; Changsheng Xu
Tracking multiple objects is critical to automatic video content analysis and virtual reality. The major problem is how to solve the data association problem when ambiguous observations are caused by objects in close proximity or under occlusion. To tackle this problem, we propose a boosted multiple hypotheses tracking (BMHT) algorithm for multi-object tracking. Online boosting is adopted to enhance the discriminative power and enlarge the search space of the generative MHT tracker. To make the tracker more reliable, a multi-cue integration strategy considers different kinds of features within the online boosting framework. In this paper, we integrate both appearance and motion-pattern information; for simplicity, Haar-like features and optical flow are adopted. We test the BMHT tracker on several challenging video sequences that involve heavy occlusion and pose variations. Experimental results show that the proposed BMHT achieves good performance.
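The multi-cue idea can be illustrated with a toy appearance-plus-motion score for a single hypothesized box, assuming OpenCV is available. The sketch below does not reproduce MHT data association or online boosting; the frames, the box, and the fixed fusion weights are all made up.

```python
# Toy two-cue scoring for one candidate box: an appearance cue (histogram
# similarity to a template) fused with a motion cue (dense optical flow
# magnitude inside the box). In the paper the cue weights are learned by
# online boosting; here they are fixed constants for illustration.
import cv2
import numpy as np

prev = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # stand-in frames
curr = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
x, y, w, h = 100, 80, 40, 60          # hypothesized object box in `curr`
template_hist = cv2.calcHist([prev[y:y+h, x:x+w]], [0], None, [32], [0, 256])

# Appearance cue: histogram correlation between the box and the template.
patch_hist = cv2.calcHist([curr[y:y+h, x:x+w]], [0], None, [32], [0, 256])
appearance = cv2.compareHist(template_hist, patch_hist, cv2.HISTCMP_CORREL)

# Motion cue: mean dense optical-flow magnitude inside the box.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
motion = np.linalg.norm(flow[y:y+h, x:x+w], axis=2).mean()

# Fixed-weight fusion of the two cues into one hypothesis score.
score = 0.7 * appearance + 0.3 * np.exp(-motion)
print(score)
```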
ACM Multimedia | 2018
Xiaowen Huang; Shengsheng Qian; Quan Fang; Jitao Sang; Changsheng Xu
Sequential recommendation is an important task for online user-oriented services, such as purchasing products, watching videos, and social media consumption. Recent work typically uses RNN-based methods to derive an overall embedding of the whole behavior sequence, which fails to discriminate the significance of individual user behaviors and thus decreases recommendation performance. Moreover, RNN-based encodings have a fixed size, which makes downstream recommendation applications inefficient and inflexible. The online sequential behaviors of a user are generally heterogeneous, polysemous, and dynamically context-dependent. In this paper, we propose a unified Contextual Self-Attention Network (CSAN) to address these three properties. Heterogeneous user behaviors are considered in our model and projected into a common latent semantic space. The output is then fed into a feature-wise self-attention network to capture the polysemy of user behaviors. In addition, forward and backward position encoding matrices are proposed to model dynamic contextual dependency. Through extensive experiments on two real-world datasets, we demonstrate the superior performance of the proposed model compared with other state-of-the-art algorithms.
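A minimal sketch of the attention-with-position-encodings ingredient, assuming PyTorch: behavior embeddings get learned forward and backward position embeddings added, then pass through single-head scaled dot-product self-attention. The dimensions and the single head are illustrative; this is not the full CSAN architecture.

```python
# Self-attention over a behavior sequence with forward and backward
# position embeddings, loosely following the description above.
import torch
import torch.nn.functional as F

seq_len, d = 8, 32
behaviors = torch.randn(1, seq_len, d)            # embedded user behaviors

# Forward positions count from the start, backward from the end, so each
# step encodes both how old it is and how close it is to "now".
fwd_emb = torch.nn.Embedding(seq_len, d)
bwd_emb = torch.nn.Embedding(seq_len, d)
pos = torch.arange(seq_len)
x = behaviors + fwd_emb(pos) + bwd_emb(seq_len - 1 - pos)

# Single-head scaled dot-product self-attention.
Wq, Wk, Wv = (torch.nn.Linear(d, d) for _ in range(3))
q, k, v = Wq(x), Wk(x), Wv(x)
attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
out = attn @ v                                    # contextualized behaviors
print(out.shape)                                  # torch.Size([1, 8, 32])
```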
ACM Multimedia | 2018
Huaiwen Zhang; Quan Fang; Shengsheng Qian; Changsheng Xu
Taxonomy learning is an important problem that facilitates various applications such as semantic understanding and information retrieval. Previous work on building semantic taxonomies has primarily relied on labor-intensive human contributions or focused on text-based extraction. In this paper, we investigate the problem of automatically learning multimodal taxonomies from multimedia data on the Web. We propose a systematic framework called Variational Deep Graph Embedding and Clustering (VDGEC), consisting of two stages: concept graph construction, and taxonomy induction via variational deep graph embedding and clustering. VDGEC discovers hierarchical concept relationships by exploiting semantic textual-visual correspondences and contextual co-occurrences in an unsupervised manner. The unstructured semantics and noise of multimedia documents are carefully handled by VDGEC for high-quality taxonomy induction. We conduct extensive experiments on real-world datasets, and the results demonstrate the effectiveness of the proposed framework: VDGEC outperforms previous unsupervised approaches by a large margin.
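A shallow stand-in for the embed-then-cluster stage: spectral embedding of a concept co-occurrence graph followed by k-means, using scikit-learn. VDGEC instead uses a variational deep model with joint clustering; the toy adjacency matrix below is invented for illustration.

```python
# Embed a concept co-occurrence graph, then cluster the embeddings to get
# candidate concept groups (e.g., siblings under one taxonomy node).
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

# Symmetric adjacency over 6 concepts, e.g. from textual-visual co-occurrence.
A = np.array([
    [0, 5, 4, 0, 0, 0],
    [5, 0, 3, 0, 0, 0],
    [4, 3, 0, 1, 0, 0],
    [0, 0, 1, 0, 6, 5],
    [0, 0, 0, 6, 0, 4],
    [0, 0, 0, 5, 4, 0],
], dtype=float)

emb = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(A)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)   # two concept groups, candidates for sibling taxonomy nodes
```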