Publication


Featured research published by Yuan Zuo.


Knowledge and Information Systems | 2016

Word network topic model: a simple but general solution for short and imbalanced texts

Yuan Zuo; Jichang Zhao; Ke Xu

Short text has become the prevalent format for information on the Internet, especially with the development of online social media. Although the sophisticated signals delivered by short texts make them a promising source for topic modeling, their extreme sparsity and imbalance bring unprecedented challenges to conventional topic models such as LDA and its variants. Aiming at a simple but general solution for topic modeling in short texts, we present a word co-occurrence network-based model named WNTM that tackles sparsity and imbalance simultaneously. Different from previous approaches, WNTM models the distribution over topics for each word instead of learning topics for each document, which enhances the semantic density of the data space without introducing much additional time or space complexity. Meanwhile, the rich contextual information preserved in the word-word space also guarantees its sensitivity in identifying rare topics with convincing quality. Furthermore, employing the same Gibbs sampling as LDA makes WNTM easy to extend to various application scenarios. Extensive validations on both short and normal texts show that WNTM outperforms baseline methods. We also demonstrate its potential for precisely discovering newly emerging topics or unexpected events on Weibo at very early stages.
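
As an illustration of the word co-occurrence network that WNTM samples over, the sketch below builds word-word edge weights from tokenized short texts with a sliding window. The window size, tokenization, and toy data are assumptions for illustration, not the paper's exact settings.

```python
from collections import defaultdict
from itertools import combinations

def build_word_network(docs, window=3):
    """Build an undirected word co-occurrence network from tokenized docs.

    docs: list of token lists (short texts); window: sliding-window width
    (an assumed default, not necessarily the paper's setting).
    Returns {(word_a, word_b): co-occurrence count} with word_a < word_b.
    """
    edges = defaultdict(int)
    for tokens in docs:
        for start in range(max(len(tokens) - window + 1, 1)):
            span = tokens[start:start + window]
            for a, b in combinations(sorted(set(span)), 2):
                edges[(a, b)] += 1
    return edges

# Toy usage: each word's adjacency list then plays the role of a
# pseudo "document" over which Gibbs sampling assigns topics.
docs = [["apple", "iphone", "release"], ["iphone", "battery", "issue"]]
print(build_word_network(docs))
```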


Knowledge Discovery and Data Mining | 2016

Topic Modeling of Short Texts: A Pseudo-Document View

Yuan Zuo; Junjie Wu; Hui Zhang; Hao Lin; Fei Wang; Ke Xu; Hui Xiong

Recent years have witnessed the unprecedented growth of online social media, which has made short text the prevalent format for information on the Internet. Given its inherent sparsity, however, short text topic modeling remains a critical open challenge in both academia and industry. Considerable research effort has been devoted to building different types of probabilistic topic models for short texts, among which self-aggregation methods that use no auxiliary information have become an emerging solution for providing informative cross-text word co-occurrences. However, models along this line are still rarely seen, and the representative one, the Self-Aggregation Topic Model (SATM), is prone to overfitting and computationally expensive. In light of this, we propose a novel probabilistic model called the Pseudo-document-based Topic Model (PTM) for short text topic modeling. PTM introduces the concept of a pseudo document to implicitly aggregate short texts against data sparsity. By modeling the topic distributions of latent pseudo documents rather than short texts, PTM is expected to gain excellent performance in both accuracy and efficiency. A Sparsity-enhanced PTM (SPTM for short) is also proposed by applying a Spike and Slab prior, with the purpose of eliminating undesired correlations between pseudo documents and latent topics. Extensive experiments on various real-world data sets with state-of-the-art baselines demonstrate the high quality of topics learned by PTM and its robustness with reduced training samples. The results also show that i) SPTM gains a clear edge over PTM when the number of pseudo documents is relatively small, and ii) the constraint that a short text belongs to only one pseudo document is critically important for the success of PTM. Finally, an in-depth semantic analysis reveals how pseudo documents find cross-text word co-occurrences for topic modeling.
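
To illustrate the core constraint that each short text belongs to exactly one latent pseudo document, the sketch below randomly initializes such assignments and aggregates word counts per pseudo document. A real PTM implementation would re-sample these assignments jointly with topics during inference; the function and variable names here are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def init_pseudo_documents(docs, num_pseudo_docs, seed=0):
    """Assign each short text to exactly one pseudo document and aggregate
    its word counts there. This is only the initialization step; PTM would
    update the assignments together with topic variables during inference.
    """
    rng = random.Random(seed)
    assignment = [rng.randrange(num_pseudo_docs) for _ in docs]
    pseudo_docs = defaultdict(Counter)
    for doc_id, tokens in enumerate(docs):
        pseudo_docs[assignment[doc_id]].update(tokens)
    return assignment, pseudo_docs

# Toy usage on three short texts aggregated into two pseudo documents.
docs = [["cheap", "flight", "deal"], ["flight", "delay"], ["hotel", "deal"]]
assignment, pseudo_docs = init_pseudo_documents(docs, num_pseudo_docs=2)
print(assignment, dict(pseudo_docs))
```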


International Conference on Data Mining | 2015

Complementary Aspect-Based Opinion Mining Across Asymmetric Collections

Yuan Zuo; Junjie Wu; Hui Zhang; Deqing Wang; Hao Lin; Fei Wang; Ke Xu

Aspect-based opinion mining finds fine-grained opinions towards an underlying theme, perspective, or viewpoint on a subject such as a product or an event. With the rapid growth of opinionated text on the Web, mining aspect-level opinions has become a promising means for online public opinion analysis. In particular, the boom of various types of online media provides diverse yet complementary information, bringing unprecedented opportunities for public opinion analysis across different populations. Along this line, we propose CAMEL, a novel topic model for complementary aspect-based opinion mining across asymmetric collections. CAMEL gains complementarity by modeling both common and specific aspects across different collections, while keeping all the corresponding opinions for contrastive study. To further boost CAMEL, we propose AME, an automatic labeling scheme for a maximum entropy model, to help discriminate between aspect and opinion words without heavy human labeling. Extensive experiments on synthetic multi-collection data sets demonstrate the superiority of CAMEL over baseline methods in leveraging cross-collection complementarity to find higher-quality aspects, more coherent opinions, and better aspect-opinion relationships. This is particularly true when the collections become seriously imbalanced. Experimental results also show that the AME model outperforms manual labeling in suggesting true opinion words. Finally, a case study on two public events further demonstrates the practical value of CAMEL for real-world public opinion analysis.
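
AME is described as an automatically labeled maximum entropy model that separates aspect words from opinion words. The sketch below uses a logistic-regression (maximum entropy) classifier over toy lexical features as a stand-in; the feature set and the seed labels are assumptions for illustration, not the paper's actual labeling scheme.

```python
# Requires scikit-learn. Maximum entropy here means multinomial logistic
# regression; the features and seed examples below are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def word_features(word, pos_tag):
    # Simple lexical features; a real system would use richer context.
    return {"suffix3": word[-3:], "pos": pos_tag, "is_adj": pos_tag == "ADJ"}

# Hypothetical automatically gathered seeds: nouns as aspect words (0),
# adjectives as opinion words (1).
seeds = [("battery", "NOUN", 0), ("screen", "NOUN", 0),
         ("great", "ADJ", 1), ("terrible", "ADJ", 1)]

vec = DictVectorizer()
X = vec.fit_transform([word_features(w, p) for w, p, _ in seeds])
y = [label for _, _, label in seeds]

maxent = LogisticRegression(max_iter=1000).fit(X, y)
test = vec.transform([word_features("awesome", "ADJ")])
print(maxent.predict(test))  # expected to come out as the opinion label (1)
```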


Computational Science and Engineering | 2013

An Improved Regularized Latent Semantic Indexing with L1/2 Regularization and Non-negative Constraints

Yong Chen; Hui Zhang; Yuan Zuo; Deqing Wang

Topic models have recently become popular in many fields, such as information retrieval and semantic relatedness computation, but their practical application is limited by the scalability of data: they cannot be efficiently executed on large-scale datasets in a parallel way. In this paper, we introduce an improved Regularized Latent Semantic Indexing (RLSI) with L1/2 regularization and non-negative constraints. This method formalizes topic modeling as the problem of minimizing a quadratic loss function regularized by the L1/2 and L2 norms under non-negative constraints. The formulation allows the learning process to be decomposed into a series of mutually independent sub-optimization problems that can be processed in parallel; therefore, it has the ability to handle large-scale data. The non-negative constraints and L1/2 regularization make our model more practical and more conducive to information retrieval and semantic relatedness computation. Extensive experimental results show that our improved model can deal with large-scale text data and remains effective compared with several state-of-the-art topic models.
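
A hedged reconstruction of the optimization problem described in the abstract, with assumed notation: D is the term-document matrix, U the term-topic matrix with columns u_k, V the topic-document matrix, and lambda_1, lambda_2 the regularization weights. The paper's exact formulation may differ in details such as which factor carries which penalty.

```latex
\min_{U \ge 0,\; V \ge 0} \;
  \lVert D - UV \rVert_F^2
  \;+\; \lambda_1 \sum_{k} \lVert u_k \rVert_{1/2}^{1/2}
  \;+\; \lambda_2 \lVert V \rVert_F^2,
\qquad
\lVert u_k \rVert_{1/2}^{1/2} = \sum_{m} \lvert u_{mk} \rvert^{1/2}
```

Because the quadratic loss decouples over the columns of U and V, each column can be updated as an independent sub-problem, which is what permits the parallel, large-scale training the abstract refers to.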


IEEE Transactions on Knowledge and Data Engineering | 2018

Complementary Aspect-Based Opinion Mining

Yuan Zuo; Junjie Wu; Hui Zhang; Deqing Wang; Ke Xu

Aspect-based opinion mining finds fine-grained opinions towards a subject such as a product or an event. With the explosive growth of opinionated text on the Web, mining aspect-level opinions has become a promising means for online public opinion analysis. In particular, the boom of various types of online media provides diverse yet complementary information, bringing unprecedented opportunities for cross-media aspect-opinion mining. Along this line, we propose CAMEL, a novel topic model for complementary aspect-based opinion mining across asymmetric collections. CAMEL gains information complementarity by modeling both common and specific aspects across collections, while keeping all the corresponding opinions for contrastive study. An auto-labeling scheme called AME is also proposed to help discriminate between aspect and opinion words without elaborate human labeling, and it is further enhanced by adding word embedding-based similarity as a new feature. Moreover, CAMEL-DP, a nonparametric alternative to CAMEL, is proposed based on coupled Dirichlet Processes. Extensive experiments on real-world multi-collection review data demonstrate the superiority of our methods over competitive baselines. This is particularly true when the information shared by different collections becomes seriously fragmented. Finally, a case study on the public event “2014 Shanghai Stampede” demonstrates the practical value of CAMEL for real-world applications.
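
The abstract mentions enhancing AME with a word embedding-based similarity feature. The sketch below shows one plausible form of such a feature, the maximum cosine similarity between a candidate word's embedding and a small set of seed opinion-word embeddings; the seed set and this exact formulation are assumptions for illustration, not necessarily the feature used in the paper.

```python
import numpy as np

def embedding_similarity_feature(word_vec, seed_vecs):
    """Max cosine similarity between a candidate word's embedding and seed
    opinion-word embeddings; a plausible embedding-based feature for a
    maxent labeler, not necessarily the paper's exact definition."""
    word_vec = word_vec / (np.linalg.norm(word_vec) + 1e-12)
    seeds = seed_vecs / (np.linalg.norm(seed_vecs, axis=1, keepdims=True) + 1e-12)
    return float(np.max(seeds @ word_vec))

# Toy usage with random vectors standing in for pretrained embeddings.
rng = np.random.default_rng(0)
word_vec = rng.standard_normal(50)          # candidate word embedding
seed_vecs = rng.standard_normal((5, 50))    # embeddings of 5 seed opinion words
print(embedding_similarity_feature(word_vec, seed_vecs))
```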


IEEE International Conference on Knowledge Engineering and Applications (ICKEA) | 2016

A diversifying hidden units method based on NMF for document representation

X. Jiang; He Zhang; Rui Liu; Yuan Zuo

Document modeling with hidden units, known as topics, is very popular. Non-negative matrix factorization (NMF) is one of the most important techniques for document representation; it decomposes a document-term matrix into a document-topic matrix and a topic-term matrix. Since an orthogonality constraint would force each term to occur in only one topic, we abandon this strong constraint. Furthermore, in order to represent documents over a certain number of topics with more semantic information, we add a diversifying regularization and a sparsity constraint to NMF, which yields a clear improvement in text classification and clustering. Finally, we plot the topic similarities and display the top 20 weighted words in each topic to show that the diversifying regularization efficiently reduces overlapping terms.
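
For context, the sketch below runs plain multiplicative-update NMF on a toy document-term matrix, i.e. the baseline factorization that the paper augments with diversifying regularization and a sparsity constraint; those extra terms are deliberately omitted here.

```python
import numpy as np

def nmf(X, n_topics, n_iter=200, eps=1e-10, seed=0):
    """Plain multiplicative-update NMF: X (docs x terms) ~= W (docs x topics)
    @ H (topics x terms). The paper's diversifying and sparsity terms are
    not included; this is only the baseline being modified."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_terms)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update topic-term matrix
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update document-topic matrix
    return W, H

X = np.random.default_rng(1).random((6, 12))   # toy document-term matrix
W, H = nmf(X, n_topics=3)
print(np.linalg.norm(X - W @ H))               # reconstruction error
```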


Knowledge Discovery and Data Mining | 2018

Embedding Temporal Network via Neighborhood Formation

Yuan Zuo; Guannan Liu; Hao Lin; Jia Guo; Xiaoqian Hu; Junjie Wu

Given the rich real-life applications of network mining and the surge of representation learning in recent years, network embedding has become the focal point of increasing research interest in both academia and industry. Nevertheless, the complete temporal formation process of networks, characterized by sequential interactive events between nodes, has seldom been modeled in existing studies, which calls for further research on the so-called temporal network embedding problem. In light of this, we introduce the concept of a neighborhood formation sequence to describe the evolution of a node, where temporal excitation effects exist between neighbors in the sequence, and we propose a Hawkes process based Temporal Network Embedding (HTNE) method. HTNE integrates the Hawkes process into network embedding so as to capture the influence of historical neighbors on current neighbors. In particular, interactions of the low-dimensional vectors are fed into the Hawkes process as the base rate and temporal influence, respectively. In addition, an attention mechanism is integrated into HTNE to better determine the influence of historical neighbors on the current neighbors of a node. Experiments on three large-scale real-life networks demonstrate that the embeddings learned by HTNE outperform state-of-the-art methods in various tasks, including node classification, link prediction, and embedding visualization. In particular, temporal recommendation based on the arrival rate inferred from node embeddings shows the excellent predictive power of the proposed model.
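
To make the Hawkes-process idea concrete, the sketch below computes a conditional intensity for a target neighbor from node embeddings: a base rate from the source-target embedding interaction plus exponentially decayed excitation from historical neighbors, weighted by attention. Using negative squared Euclidean distance and an exponential decay kernel is an assumption consistent with the abstract, not necessarily HTNE's precise parameterization.

```python
import numpy as np

def conditional_intensity(src, tgt, hist, hist_times, t, delta=1.0):
    """Hawkes-style intensity for target neighbor `tgt` of source `src` at
    time `t`, given historical neighbor embeddings `hist` that arrived at
    `hist_times`. Base rate and excitations use negative squared Euclidean
    distances between embeddings (an assumption); historical terms are
    attention-weighted and decayed by exp(-delta * (t - t_h))."""
    base = -np.sum((src - tgt) ** 2)                   # base rate term
    excite = -np.sum((hist - tgt) ** 2, axis=1)        # per-history influence
    attn = np.exp(-np.sum((hist - src) ** 2, axis=1))  # attention scores
    attn /= attn.sum()
    decay = np.exp(-delta * (t - hist_times))
    raw = base + np.sum(attn * excite * decay)
    return np.exp(raw)                                 # map to a positive rate

rng = np.random.default_rng(0)
src, tgt = rng.standard_normal(16), rng.standard_normal(16)
hist = rng.standard_normal((4, 16))
print(conditional_intensity(src, tgt, hist,
                            hist_times=np.array([0.1, 0.5, 0.9, 1.2]), t=2.0))
```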


International Conference on Tools with Artificial Intelligence | 2016

Robust Word-Network Topic Model for Short Texts

Fei Wang; Rui Liu; Yuan Zuo; Hui Zhang; He Zhang; Junjie Wu

With the rapid development of online social media, short text has become the prevalent format for information on the Internet. Due to the severe data sparsity issue, accurately discovering the knowledge behind these short texts remains a critical challenge. Since regular topic models such as Latent Dirichlet Allocation (LDA) cannot perform well on short texts, much effort has been devoted to building different types of probabilistic topic models for short texts. Inducing topics from the dense word-word space instead of the sparse document-word space has become an emerging solution for avoiding the data sparsity issue, and the representative model is the Word Network Topic Model (WNTM). However, the word-word space building procedure of WNTM often introduces much irrelevant information. In light of this, we propose the Robust WNTM (RWNTM), which can filter out unrelated information during sampling. The experimental results demonstrate that our method learns more coherent topics and is more accurate in text classification compared with WNTM and other state-of-the-art methods.
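
RWNTM is described as filtering unrelated information during sampling. As a loose illustration of pruning noisy word-word links, and not the paper's actual in-sampler mechanism, the sketch below drops edges of a co-occurrence network whose pointwise mutual information falls below a threshold.

```python
import math
from collections import defaultdict

def filter_edges_by_pmi(edge_counts, threshold=0.0):
    """Keep only word-word edges whose pointwise mutual information (PMI)
    exceeds `threshold`. Marginals are estimated from the edge counts
    themselves; this pruning is an illustrative stand-in for RWNTM's
    filtering, which the paper performs during sampling."""
    total = sum(edge_counts.values())
    marginal = defaultdict(float)
    for (w1, w2), c in edge_counts.items():
        marginal[w1] += c
        marginal[w2] += c
    kept = {}
    for (w1, w2), c in edge_counts.items():
        pmi = math.log((c / total) /
                       ((marginal[w1] / total) * (marginal[w2] / total)))
        if pmi > threshold:
            kept[(w1, w2)] = c
    return kept

# Toy usage: frequent-but-unspecific pairs fall below the PMI threshold.
edges = {("apple", "iphone"): 8, ("apple", "nice"): 1,
         ("iphone", "the"): 1, ("the", "nice"): 4}
print(filter_edges_by_pmi(edges))
```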


National Conference on Artificial Intelligence | 2017

Collaborative Company Profiling: Insights from an Employee's Perspective

Hao Lin; Hengshu Zhu; Yuan Zuo; Chen Zhu; Junjie Wu; Hui Xiong


Journal of Systems Science and Systems Engineering | 2018

Topic Splitting: A Hierarchical Topic Model Based on Non-Negative Matrix Factorization

Rui Liu; Xingguang Wang; Deqing Wang; Yuan Zuo; He Zhang; Xianzhu Zheng

Collaboration


Dive into Yuan Zuo's collaborations.

Top Co-Authors

Ke Xu

Beihang University
