Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Qiaozhu Mei is active.

Publication


Featured researches published by Qiaozhu Mei.


international world wide web conferences | 2007

Topic sentiment mixture: modeling facets and opinions in weblogs

Qiaozhu Mei; Xu Ling; Matthew Wondra; Hang Su; ChengXiang Zhai

In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously. The proposed Topic-Sentiment Mixture (TSM) model can reveal the latent topical facets in a Weblog collection, the subtopics in the results of an ad hoc query, and their associated sentiments. It could also provide general sentiment models that are applicable to any ad hoc topics. With a specifically designed HMM structure, the sentiment models and topic models estimated with TSM can be utilized to extract topic life cycles and sentiment dynamics. Empirical experiments on different Weblog datasets show that this approach is effective for modeling the topic facets and sentiments and extracting their dynamics from Weblog collections. The TSM model is quite general; it can be applied to any text collections with a mixture of topics and sentiments, thus has many potential applications, such as search result summarization, opinion tracking, and user behavior prediction.


knowledge discovery and data mining | 2005

Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Qiaozhu Mei; ChengXiang Zhai

Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task -- discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving this problem through (1) discovering latent themes from text; (2) constructing an evolution graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods on two different domains (i.e., news articles and literature) shows that the proposed methods can discover interesting evolutionary theme patterns effectively.


international world wide web conferences | 2008

Topic modeling with network regularization

Qiaozhu Mei; Deng Cai; Duo Zhang; ChengXiang Zhai

In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method bridges topic modeling and social network analysis, which leverages the power of both statistical topic models and discrete regularization. The output of this model well summarizes topics in text, maps a topic on the network, and discovers topical communities. With concrete selection of a topic model and a graph-based regularizer, our model can be applied to text mining problems such as author-topic analysis, community discovery, and spatial text mining. Empirical experiments on two different genres of data show that our approach is effective, which improves text-oriented methods as well as network-oriented methods. The proposed model is general; it can be applied to any text collections with a mixture of topics and an associated network structure.


conference on information and knowledge management | 2008

Query suggestion using hitting time

Qiaozhu Mei; Dengyong Zhou; Kenneth Ward Church

Generating alternative queries, also known as query suggestion, has long been proved useful to help a user explore and express his information need. In many scenarios, such suggestions can be generated from a large scale graph of queries and other accessory information, such as the clickthrough. However, how to generate suggestions while ensuring their semantic consistency with the original query remains a challenging problem. In this work, we propose a novel query suggestion algorithm based on ranking queries with the hitting time on a large scale bipartite graph. Without involvement of twisted heuristics or heavy tuning of parameters, this method clearly captures the semantic consistency between the suggested query and the original query. Empirical experiments on a large scale query log of a commercial search engine and a scientific literature collection show that hitting time is effective to generate semantically consistent query suggestions. The proposed algorithm and its variations can successfully boost long tail queries, accommodating personalized query suggestion, as well as finding related authors in research.


international world wide web conferences | 2006

A probabilistic approach to spatiotemporal theme pattern mining on weblogs

Qiaozhu Mei; Chao Liu; Hang Su; ChengXiang Zhai

Mining subtopics from weblogs and analyzing their spatiotemporal patterns have applications in multiple domains. In this paper, we define the novel problem of mining spatiotemporal theme patterns from weblogs and propose a novel probabilistic approach to model the subtopic themes and spatiotemporal theme patterns simultaneously. The proposed model discovers spatiotemporal theme patterns by (1) extracting common themes from weblogs; (2) generating theme life cycles for each given location; and (3) generating theme snapshots for each given time period. Evolution of patterns can be discovered by comparative analysis of theme life cycles and theme snapshots. Experiments on three different data sets show that the proposed approach can discover interesting spatiotemporal theme patterns effectively. The proposed probabilistic model is general and can be used for spatiotemporal text mining on any domain with time and location information.


knowledge discovery and data mining | 2007

Automatic labeling of multinomial topic models

Qiaozhu Mei; Xuehua Shen; ChengXiang Zhai

Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually in a subjective way. In this paper, we propose probabilistic approaches to automatically labeling multinomial topic models in an objective way. We cast this labeling problem as an optimization problem involving minimizing Kullback-Leibler divergence between word distributions and maximizing mutual information between a label and a topic model. Experiments with user study have been done on two text data sets with different genres.The results show that the proposed labeling methods are quite effective to generate labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models such as PLSA, LDA, and their variations.


knowledge discovery and data mining | 2010

PET: a statistical model for popular events tracking in social communities

Cindy Xide Lin; Bo Zhao; Qiaozhu Mei; Jiawei Han

User generated information in online communities has been characterized with the mixture of a text stream and a network structure both changing over time. A good example is a web-blogging community with the daily blog posts and a social network of bloggers. An important task of analyzing an online community is to observe and track the popular events, or topics that evolve over time in the community. Existing approaches usually focus on either the burstiness of topics or the evolution of networks, but ignoring the interplay between textual topics and network structures. In this paper, we formally define the problem of popular event tracking in online communities (PET), focusing on the interplay between texts and networks. We propose a novel statistical method that models the the popularity of events over time, taking into consideration the burstiness of user interest, information diffusion on the network structure, and the evolution of textual topics. Specifically, a Gibbs Random Field is defined to model the influence of historic status and the dependency relationships in the graph; thereafter a topic model generates the words in text content of the event, regularized by the Gibbs Random Field. We prove that two classic models in information diffusion and text burstiness are special cases of our model under certain situations. Empirical experiments with two different communities and datasets (i.e., Twitter and DBLP) show that our approach is effective and outperforms existing approaches.


international world wide web conferences | 2012

We know what @you #tag: does the dual role affect hashtag adoption?

Lei Yang; Tao Sun; Ming Zhang; Qiaozhu Mei

Researchers and social observers have both believed that hashtags, as a new type of organizational objects of information, play a dual role in online microblogging communities (e.g., Twitter). On one hand, a hashtag serves as a bookmark of content, which links tweets with similar topics; on the other hand, a hashtag serves as the symbol of a community membership, which bridges a virtual community of users. Are the real users aware of this dual role of hashtags? Is the dual role affecting their behavior of adopting a hashtag? Is hashtag adoption predictable? We take the initiative to investigate and quantify the effects of the dual role on hashtag adoption. We propose comprehensive measures to quantify the major factors of how a user selects content tags as well as joins communities. Experiments using large scale Twitter datasets prove the effectiveness of the dual role, where both the content measures and the community measures significantly correlate to hashtag adoption on Twitter. With these measures as features, a machine learning model can effectively predict the future adoption of hashtags that a user has never used before.


knowledge discovery and data mining | 2010

DivRank: the interplay of prestige and diversity in information networks

Qiaozhu Mei; Jian Guo; Dragomir R. Radev

Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the data items based on their centrality or prestige in the network. Beyond prestige, diversity has been recognized as a crucial objective in ranking, aiming at providing a non-redundant and high coverage piece of information in the top ranked results. Nevertheless, existing network-based ranking approaches either disregard the concern of diversity, or handle it with non-optimized heuristics, usually based on greedy vertex selection. We propose a novel ranking algorithm, DivRank, based on a reinforced random walk in an information network. This model automatically balances the prestige and the diversity of the top ranked vertices in a principled way. DivRank not only has a clear optimization explanation, but also well connects to classical models in mathematics and network science. We evaluate DivRank using empirical experiments on three different networks as well as a text summarization task. DivRank outperforms existing network-based ranking methods in terms of enhancing diversity in prestige.


knowledge discovery and data mining | 2015

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Jian Tang; Meng Qu; Qiaozhu Mei

Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, comparing to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call the predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low dimensional space through a principled and efficient algorithm. This low dimensional embedding not only preserves the semantic closeness of words and documents, but also has a strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or more effective, much more efficient, and has fewer parameters to tune.

Collaboration


Dive into the Qiaozhu Mei's collaboration.

Top Co-Authors

Avatar

Kai Zheng

University of Michigan

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Cheng Li

University of Michigan

View shared research outputs
Top Co-Authors

Avatar

Hua Xu

University of Texas Health Science Center at Houston

View shared research outputs
Top Co-Authors

Avatar

Yue Wang

University of Michigan

View shared research outputs
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge