Wayne Xin Zhao | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Wayne Xin Zhao is active.

Explore More

Publication

Featured researches published by Wayne Xin Zhao.

european conference on information retrieval | 2011

Comparing twitter and traditional media using topic models

Wayne Xin Zhao; Jing Jiang; Jianshu Weng; Jing He; Ee-Peng Lim; Hongfei Yan; Xiaoming Li

Twitter as a new form of social media can potentially contain much useful information, but content analysis on Twitter has not been well studied. In particular, it is not clear whether as an information source Twitter can be simply regarded as a faster news feed that covers mostly the same information as traditional news media. In This paper we empirically compare the content of Twitter with a traditional news medium, New York Times, using unsupervised topic modeling. We use a Twitter-LDA model to discover topics from a representative sample of the entire Twitter. We then use text mining techniques to compare these Twitter topics with topics from New York Times, taking into consideration topic categories and types. We also study the relation between the proportions of opinionated tweets and retweets and topic categories and types. Our comparisons show interesting and useful findings for downstream IR or DM applications.

conference on information and knowledge management | 2012

Visualizing timelines: evolutionary summarization via iterative reinforcement between text and image streams

Rui Yan; Xiaojun Wan; Mirella Lapata; Wayne Xin Zhao; Pu-Jen Cheng; Xiaoming Li

We present a novel graph-based framework for timeline summarization, the task of creating different summaries for different timestamps but for the same topic. Our work extends timeline summarization to a multimodal setting and creates timelines that are both textual and visual. Our approach exploits the fact that news documents are often accompanied by pictures and the two share some common content. Our model optimizes local summary creation and global timeline generation jointly following an iterative approach based on mutual reinforcement and co-ranking. In our algorithm, individual summaries are generated by taking into account the mutual dependencies between sentences and images, and are iteratively refined by considering how they contribute to the global timeline and its coherence. Experiments on real-world datasets show that the timelines produced by our model outperform several competitive baselines both in terms of ROUGE and when assessed by human evaluators.

ACM Transactions on Information Systems | 2015

A General SIMD-Based Approach to Accelerating Compression Algorithms

Wayne Xin Zhao; Xudong Zhang; Daniel Lemire; Dongdong Shan; Jian-Yun Nie; Hongfei Yan; Ji-Rong Wen

Compression algorithms are important for data-oriented tasks, especially in the era of “Big Data.” Modern processors equipped with powerful SIMD instruction sets provide us with an opportunity for achieving better compression performance. Previous research has shown that SIMD-based optimizations can multiply decoding speeds. Following these pioneering studies, we propose a general approach to accelerate compression algorithms. By instantiating the approach, we have developed several novel integer compression algorithms, called Group-Simple, Group-Scheme, Group-AFOR, and Group-PFD, and implemented their corresponding vectorized versions. We evaluate the proposed algorithms on two public TREC datasets, a Wikipedia dataset, and a Twitter dataset. With competitive compression ratios and encoding speeds, our SIMD-based algorithms outperform state-of-the-art nonvectorized algorithms with respect to decoding speeds.

string processing and information retrieval | 2012

Position-Aligned translation model for citation recommendation

Jing He; Jian-Yun Nie; Yang Lu; Wayne Xin Zhao

The goal of a citation recommendation system is to suggest some references for a snippet in an article or a book, and this is very useful for both authors and the readers. The citation recommendation problem can be cast as an information retrieval problem, in which the query is the snippet from an article, and the relevant documents are the cited articles. In reality, the citation snippet and the cited articles may be described in different terms, and this makes the citation recommendation task difficult. Translation model is very useful in bridging the vocabulary gap between queries and documents in information retrieval. It can be trained on a collection of query and document pairs, which are assumed to be parallel. However, such training data contains much noise: a relevant document usually contains some relevant parts along with irrelevant ones. In particular, the citation snippet may only mention only some parts of the cited articles content. To cope with this problem, in this paper, we propose a method to train translation models on such noisy data, called position-aligned translation model. This model tries to align the query to the most relevant parts of the document, so that the estimated translation probabilities could rely more on them. We test this model in a citation recommendation task for scientific papers. Our experiments show that the proposed method can significantly improve the previous retrieval methods based on translation models.

IEEE Transactions on Knowledge and Data Engineering | 2016

A General Multi-Context Embedding Model for Mining Human Trajectory Data

Ningnan Zhou; Wayne Xin Zhao; Xiao Zhang; Ji-Rong Wen; Shan Wang

The proliferation of location-based social networks, such as Foursquare and Facebook Places, offers a variety of ways to record human mobility, including user generated geo-tagged contents, check-in services, and mobile apps. Although trajectory data is of great value to many applications, it is challenging to analyze and mine trajectory data due to the complex characteristics reflected in human mobility, which is affected by multiple contextual information. In this paper, we propose a Multi-Context Trajectory Embedding Model, called MC-TEM, to explore contexts in a systematic way. MC-TEM is developed in the distributed representation learning framework, and it is flexible to characterize various kinds of useful contexts for different applications. To the best of our knowledge, it is the first time that the distributed representation learning methods apply to trajectory data. We formally incorporate multiple context information of trajectory data into the proposed model, including user-level, trajectory-level, location-level, and temporal contexts. All the context information is represented in the same embedding space. We apply MC-TEM to two challenging tasks, namely location recommendation and social link prediction. We conduct extensive experiments on three real-world datasets. Extensive experiment results have demonstrated the superiority of our MC-TEM model over several state-of-the-art methods.

Knowledge and Information Systems | 2016

Exploring demographic information in social media for product recommendation

Wayne Xin Zhao; Sui Li; Yulan He; Liwei Wang; Ji-Rong Wen; Xiaoming Li

In many e-commerce Web sites, product recommendation is essential to improve user experience and boost sales. Most existing product recommender systems rely on historical transaction records or Web-site-browsing history of consumers in order to accurately predict online users’ preferences for product recommendation. As such, they are constrained by limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it now becomes possible to extract product demographics from online product reviews and social networks built from microblogs. Moreover, users’ public profiles available on social media often reveal their demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. In specific, we frame recommendation as a learning to rank problem which takes as input the features derived from both product and user demographics. An ensemble method based on the gradient-boosting regression trees is extended to make it suitable for our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results. Moreover, we have also conducted a user study to gauge the performance of our proposed recommender system in a real-world deployment. All the results show that our system is more effective in generating recommendation results better matching users’ preferences than the competitive baselines.

conference on information and knowledge management | 2010

Context modeling for ranking and tagging bursty features in text streams

Wayne Xin Zhao; Jing Jiang; Jing He; Dongdong Shan; Hongfei Yan; Xiaoming Li

Bursty features in text streams are very useful in many text mining applications. Most existing studies detect bursty features based purely on term frequency changes without taking into account the semantic contexts of terms, and as a result the detected bursty features may not always be interesting or easy to interpret. In this paper we propose to model the contexts of bursty features using a language modeling approach. We then propose a novel topic diversity-based metric using the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of a stream of news articles, we quantitatively show that the proposed context language models for bursty features can effectively help rank bursty features based on their newsworthiness and to assign meaningful tags to annotate bursty features.

ACM Transactions on Information Systems | 2017

A Neural Network Approach to Jointly Modeling Social Networks and Mobile Trajectories

Cheng Yang; Maosong Sun; Wayne Xin Zhao; Zhiyuan Liu; Edward Y. Chang

Two characteristics of location-based services are mobile trajectories and the ability to facilitate social networking. The recording of trajectory data contributes valuable resources towards understanding users’ geographical movement behaviors. Social networking is possible when users are able to quickly connect to anyone nearby. A social network with location based services is known as location-based social network (LBSN). As shown in Cho et al. [2013], locations that are frequently visited by socially related persons tend to be correlated, which indicates the close association between social connections and trajectory behaviors of users in LBSNs. To better analyze and mine LBSN data, we need to have a comprehensive view of each of these two aspects, i.e., the mobile trajectory data and the social network. Specifically, we present a novel neural network model that can jointly model both social networks and mobile trajectories. Our model consists of two components: the construction of social networks and the generation of mobile trajectories. First we adopt a network embedding method for the construction of social networks: a networking representation can be derived for a user. The key to our model lies in generating mobile trajectories. Second, we consider four factors that influence the generation process of mobile trajectories: user visit preference, influence of friends, short-term sequential contexts, and long-term sequential contexts. To characterize the last two contexts, we employ the RNN and GRU models to capture the sequential relatedness in mobile trajectories at the short or long term levels. Finally, the two components are tied by sharing the user network representations. Experimental results on two important applications demonstrate the effectiveness of our model. In particular, the improvement over baselines is more significant when either network structure or trajectory data is sparse.

conference on information and knowledge management | 2011

Efficient phrase querying with flat position index

Dongdong Shan; Wayne Xin Zhao; Jing He; Rui Yan; Hongfei Yan; Xiaoming Li

A large proportion of search engine queries contain phrases,namely a sequence of adjacent words. In this paper, we propose to use flat position index (a.k.a schema-independent index) for phrase query evaluation. In the flat position index, the entire document collection is viewed as a huge sequence of tokens. Each token is represented by one flat position, which is a unique position offset from the beginning of the collection. Each indexed term is associated with a list of the flat positions about that term in the sequence. To recover DocID from flat positions efficiently, we propose a novel cache sensitive look-up table (CSLT), which is much faster than existing search algorithms. Experiments on TREC GOV2 data collection show that flat position index can reduce the index size and speed up phrase querying substantially, compared with traditional word-level index.

ACM Transactions on Knowledge Discovery From Data | 2016

Mining Product Adopter Information from Online Reviews for Improving Product Recommendation

Wayne Xin Zhao; Jinpeng Wang; Yulan He; Ji-Rong Wen; Edward Y. Chang; Xiaoming Li

We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization for more effective product recommendation. In specific, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graph-based method to iteratively update user- and product-related distributions more reliably in a heterogeneous user--product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JingDong, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.

Explore More