Kai Lei | Researchain

Publication

Featured research published by Kai Lei.

Global Communications Conference | 2013

Understanding Sina Weibo online social network: A community approach

Kai Lei; Kai Zhang; Kuai Xu

Sina Weibo, one of the most popular online social networks in China, has recently become a critical medium for Internet users to disseminate and discuss breaking news, social events and other information. Although online social networks and social media have received significant attention from the research community, few studies have focused on Sina Weibo because of the difficulty of collecting its data. Given the sheer size of the Sina Weibo online social network and the vast number of tweets, retweets and comments, this paper introduces a novel community approach for understanding it. Specifically, we collect all Weibo users registered with Shenzhen as their primary geographic location and build a Shenzhen Weibo community graph based on their following and follower relationships. Our experimental results describe interesting graph characteristics of this community graph, such as its clustering coefficients, and reveal the impact of user popularity on tweet influence. By modeling the interactions between Shenzhen Weibo users and their tweeted messages with bipartite graphs and one-mode projections, we analyze the similarity of retweeting and commenting activities among these users, and discuss what the findings imply about different types of user accounts and the motivations behind their following and retweeting behaviors. To the best of our knowledge, this study is the first effort to introduce a community approach for understanding the community characteristics of Sina Weibo and for characterizing the similarity of retweeting behaviors and following relationships.
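The two graph measures named in this abstract can be illustrated with a small sketch. It assumes the follower graph has already been symmetrised into an undirected adjacency map and that user similarity in the one-mode projection is measured with the Jaccard index; the abstract does not state which similarity measure the authors used, and the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps each node to the set of its neighbours; following/follower
    edges are assumed to have been symmetrised beforehand.
    """
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count the edges that actually exist between pairs of neighbours.
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj.get(u, set()))
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def one_mode_projection(user_to_tweets):
    """Project a user-tweet bipartite graph onto its user side.

    Two users are linked if they retweeted or commented on at least one common
    tweet; the edge weight is the Jaccard similarity of their tweet sets
    (an assumed similarity measure, not necessarily the paper's).
    """
    weights = {}
    for u, v in combinations(sorted(user_to_tweets), 2):
        a, b = user_to_tweets[u], user_to_tweets[v]
        shared = len(a & b)
        if shared:
            weights[(u, v)] = shared / len(a | b)
    return weights
```

For example, if user `c` is connected to `a`, `b` and `d`, and only `a` and `b` are connected to each other, one of the three neighbour pairs is linked, so `local_clustering` returns 1/3 for `c`.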

Conference on Information and Knowledge Management | 2015

ASEM: Mining Aspects and Sentiment of Events from Microblog

Ruhui Wang; Weijing Huang; Tengjiao Wang; Kai Lei

Microblogs contain the most up-to-date and abundant opinion information on current events. Aspect-based opinion mining is a good way to obtain a comprehensive summary of events. However, the most popular aspect-based opinion mining models are designed for products and services, and existing models are not suitable for event mining. In this paper we propose a novel probabilistic generative model (ASEM) that simultaneously discovers aspects and the opinions expressed about them. ASEM incorporates a sequence labeling model (CRF) into a generative topic model. Additionally, we adopt a set of features for separating aspects from sentiments. Moreover, we present a continuous learning model that uses the knowledge learned from one event to learn another, which improves performance. We conduct experiments on five real-world events. The experimental results show that ASEM extracts aspects and sentiments well and outperforms other state-of-the-art models as well as the intuitive two-step method.

Web-Age Information Management | 2014

Sarcasm Detection in Social Media Based on Imbalanced Classification

Peng Liu; Gaoyan Ou; Tengjiao Wang; Dongqing Yang; Kai Lei

Sarcasm is a pervasive linguistic phenomenon in online documents that express subjective and deeply felt opinions. Detecting sarcasm is important for many NLP applications, such as sentiment analysis, opinion mining and advertising. Current studies treat automatic sarcasm detection as a simple text classification problem: they do not use explicit features to detect sarcasm, and they ignore the imbalance between sarcastic and non-sarcastic samples in real applications. In this paper, we first explore the characteristics of both English and Chinese sarcastic sentences and introduce a set of features specifically for detecting sarcasm in social media. Then, we propose a novel multi-strategy ensemble learning approach (MSELA) to handle the imbalance problem. We evaluate our proposed model on English and Chinese data sets. Experimental results show that our ensemble approach outperforms state-of-the-art sarcasm detection approaches and popular imbalanced classification methods.
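One common way to attack the class imbalance described above is an undersampling ensemble: every member sees all minority (sarcastic) samples plus an equally sized random subset of the majority class, and the members vote. The sketch below shows that generic technique, in the spirit of EasyEnsemble; it is not MSELA, whose individual strategies the abstract does not describe. The nearest-centroid base learner, the class names and the assumption that sarcasm features are already numeric vectors are all hypothetical.

```python
import random


def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Tiny hypothetical base learner: predict the label whose centroid is closest."""

    def fit(self, X, y):
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lbl: sq_dist(x, self.centroids[lbl]))


class UndersampleEnsemble:
    """Binary ensemble for imbalanced data: each member is trained on all
    minority samples plus a random, equally sized majority subset; members
    are combined by majority vote (use an odd n_members to avoid ties)."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)

    def fit(self, X, y, minority=1):
        pos = [x for x, t in zip(X, y) if t == minority]
        neg = [(x, t) for x, t in zip(X, y) if t != minority]
        self.members = []
        for _ in range(self.n_members):
            sample = self.rng.sample(neg, min(len(pos), len(neg)))
            Xb = pos + [x for x, _ in sample]
            yb = [minority] * len(pos) + [t for _, t in sample]
            self.members.append(NearestCentroid().fit(Xb, yb))
        return self

    def predict(self, X):
        out = []
        for x in X:
            votes = [m.predict_one(x) for m in self.members]
            out.append(max(set(votes), key=votes.count))
        return out
```

Because every member is trained on a balanced subset, no single member is dominated by the majority class, while the ensemble as a whole still uses most of the majority data.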

International Symposium on Neural Networks | 2013

Massively parallel learning of Bayesian networks with MapReduce for factor relationship analysis

Tengjiao Wang; Dongqing Yang; Kai Lei; Yueqin Liu

The Bayesian network (BN) is one of the most popular models in data mining. Most BN structure learning algorithms are designed for centralized datasets, where all the data are gathered on a single computer node, which often makes them too costly or impractical for learning BN structures from large-scale data. Through a simple interface with two functions, map and reduce, MapReduce facilitates the parallel implementation of many real-world tasks, such as data processing for search engines and machine learning. In this paper, we present a parallel algorithm for BN structure learning from large-scale datasets using a MapReduce cluster. We discuss the benefits of using MapReduce for BN structure learning and demonstrate the performance of this approach by applying it to a real-world task of learning financial factor relationships from the domain of financial analysis.
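Score-based BN structure learning spends most of its time counting how often each combination of parent values and child value occurs, and those counts decompose naturally into map and reduce steps. The sketch below simulates that pattern in a single process; it assumes discrete variables and a list of candidate (child, parents) families, and it is an illustration of the general idea rather than the paper's algorithm. The function names are hypothetical, and the plain log-likelihood shown here would normally be combined with a complexity penalty such as BIC, since it never decreases when parents are added.

```python
from collections import defaultdict
from math import log


def map_record(record, families):
    """Map phase: for one data record (dict of variable -> discrete value),
    emit ((child, parents, parent_values, child_value), 1) per candidate family."""
    for child, parents in families:
        pv = tuple(record[p] for p in parents)
        yield (child, parents, pv, record[child]), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts per key (the shuffle is simulated
    by grouping in a dict)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def family_log_likelihood(counts, child, parents):
    """Maximum-likelihood log-likelihood of one family from the reduced counts:
    sum over (pv, cv) of N(pv, cv) * log(N(pv, cv) / N(pv))."""
    totals = defaultdict(int)
    for (c, ps, pv, _cv), n in counts.items():
        if c == child and ps == parents:
            totals[pv] += n
    return sum(
        n * log(n / totals[pv])
        for (c, ps, pv, _cv), n in counts.items()
        if c == child and ps == parents
    )
```

On a cluster, each mapper would process its own split of the records and the framework would route equal keys to the same reducer; the scoring step then only reads the much smaller table of counts.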

Asia-Pacific Web Conference | 2014

An Adaptive Skew Insensitive Join Algorithm for Large Scale Data Analytics

Wenjing Liao; Tengjiao Wang; Hongyan Li; Dongqing Yang; Zhen Qiu; Kai Lei

With the data explosion of recent years, timely and cost-effective analytics over large-scale data has become a hotspot of data management research. Join is an important operation in database queries. However, data skew occurs naturally in many applications and severely degrades the performance of most join algorithms. To address this problem, this paper introduces an Adaptive Skew Insensitive (ASI) join algorithm to handle severe data skew. Based on our cost analysis, the ASI join algorithm adaptively chooses the best join algorithm for different inputs. Extensive experiments comparing it with several state-of-the-art join methods show that our method significantly improves join efficiency under data skew.
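To see why skew hurts a partitioned join and how a skew-aware plan helps, the sketch below simulates a hash-partitioned join across workers. It is not the ASI algorithm: instead of the paper's cost model, it uses a hypothetical frequency threshold (`skew_fraction`) to flag heavy-hitter keys, spreads their rows from the large side round-robin across workers, and replicates the matching rows of the small side to every worker. All names and parameters are assumptions for illustration.

```python
from collections import Counter, defaultdict


def hash_join(left, right):
    """Local hash join of (key, payload) pairs on the key."""
    table = defaultdict(list)
    for k, v in right:
        table[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in table.get(k, [])]


def skew_aware_join(left, right, n_workers=4, skew_fraction=0.2):
    """Partition `left` (assumed to be the large, skewed side) and `right`
    across workers, then join each partition locally.

    Keys holding more than `skew_fraction` of `left` are treated as heavy
    hitters: their left rows are spread round-robin and the matching right
    rows are replicated to every worker. Other keys use plain hash
    partitioning. Returns (joined rows, per-worker left-side load).
    """
    freq = Counter(k for k, _ in left)
    heavy = {k for k, c in freq.items() if c > skew_fraction * len(left)}
    parts_l = [[] for _ in range(n_workers)]
    parts_r = [[] for _ in range(n_workers)]
    rr = 0
    for k, v in left:
        if k in heavy:
            parts_l[rr % n_workers].append((k, v))
            rr += 1
        else:
            parts_l[hash(k) % n_workers].append((k, v))
    for k, v in right:
        if k in heavy:
            for part in parts_r:  # replicate so every heavy row finds its matches
                part.append((k, v))
        else:
            parts_r[hash(k) % n_workers].append((k, v))
    results = []
    for pl, pr in zip(parts_l, parts_r):
        results.extend(hash_join(pl, pr))
    return results, [len(p) for p in parts_l]
```

With plain hash partitioning, every row of a heavy key lands on one worker, which then dominates the running time; splitting those rows and replicating the small side trades extra network traffic for balanced load, which is the kind of trade-off a cost model has to weigh.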

Web-Age Information Management | 2014

Characterizing Tweeting Behaviors of Sina Weibo Users via Public Data Streaming

Kai Zhang; Qian Yu; Kai Lei; Kuai Xu

Since its initial launch in August 2009, Sina Weibo, a Twitter-like microblogging service, has grown rapidly into a major and influential site for millions of Internet users in China to disseminate news and urgent information, promote new products, and express opinions and comments on controversial issues [4, 6]. However, unlike Twitter, which attracts much attention from the research community due to its popularity in the United States and Europe, few studies have characterized the tweeting behaviors of Sina Weibo users.

International Performance Computing and Communications Conference | 2014

Hot topic analysis and content mining in social media

Qian Yu; Wei Tao Weng; Kai Zhang; Kai Lei; Kuai Xu

Sina Weibo has become an increasingly critical social media platform in China for sharing the latest news, marketing new products, and discussing controversial issues. Its rising importance in society makes it essential to understand the "what", "when", and "who" of the hot topics that are continuously tweeted and searched by millions of active users. In this paper, we develop a systematic approach to characterize the temporal distribution of hot topics searched by Sina Weibo users over a four-month time span, and to uncover correlated hot topics that are not only tweeted by the same users but also appear in a similar set of tweet messages. We analyze real-time Sina Weibo tweet data streams and study the volume correlations and temporal gaps between user searches and tweeting activities on hot topics. In addition, we examine the correlations between hot topic searches on social media and on search engines to understand hot topics and user behaviors across different platforms. Given the challenges of analyzing massive amounts of tweet data, we use the Hadoop MapReduce framework to process millions of tweets from the collected datasets effectively, and quantify the performance benefits of MapReduce for analyzing tweet streams. To the best of our knowledge, this paper is the first effort to characterize the temporal search patterns of hot topics on Sina Weibo and to study their correlations with tweeting data streams as well as search engine statistics.
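The "volume correlations and temporal gaps" mentioned above can be made concrete with a lagged Pearson correlation between two per-interval count series for one topic, one for searches and one for tweets. The sketch below is one plausible way to measure this, not the paper's exact procedure; the series are hypothetical, only non-negative lags (tweeting trailing searching) are considered, and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lag(search, tweets, max_lag):
    """Return (lag, r): the shift of `tweets` relative to `search`, in
    intervals, that maximises their Pearson correlation. A positive lag
    means tweet volume trails search volume by that many intervals."""
    best = (0, pearson(search, tweets))
    for lag in range(1, max_lag + 1):
        r = pearson(search[:-lag], tweets[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

For example, if a topic's tweet counts reproduce its search counts exactly two intervals later, `best_lag` reports a lag of 2 with a correlation of 1. Checking negative lags as well (tweets leading searches) would only require swapping the two arguments.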

Knowledge Discovery and Data Mining | 2016

Dboost: A Fast Algorithm for DBSCAN-based Clustering on High Dimensional Data

Yuxiao Zhang; Xiaorong Wang; Bingyang Li; Tengjiao Wang; Kai Lei

DBSCAN is a classic density-based clustering technique, well known for discovering clusters of arbitrary shape and handling noise. However, its density calculation is very time-consuming on high-dimensional data, which makes it inefficient in many areas, such as multi-document summarization and product recommendation. Therefore, efficiently calculating density on high-dimensional data is a key issue for DBSCAN-based clustering. In this paper, we propose a fast algorithm for DBSCAN-based clustering on high-dimensional data, named Dboost. In our algorithm, a ranked retrieval technique adaption named
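For reference, the sketch below is plain DBSCAN with a brute-force neighbourhood query, which makes visible the density calculation that the abstract identifies as the bottleneck: every `region_query` scans all n points in all d dimensions. It is the classic baseline, not Dboost, whose ranked-retrieval speed-up is not described in the available text; Euclidean distance and the function names are assumptions.

```python
from math import dist  # Python 3.8+


def region_query(points, i, eps):
    """Brute-force eps-neighbourhood of point i (including i itself).
    Costs O(n * d) per call, the step that dominates on high-dimensional data."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_nbrs)
        cluster += 1
    return labels
```

Since each point triggers at most one region query, the whole run costs O(n² · d) distance work, which is why faster density estimation pays off as n and d grow.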

Web-Age Information Management | 2015