Quanzhi Li | Researchain

Publication

Featured researches published by Quanzhi Li.

conference on information and knowledge management | 2016

Reuters Tracer: A Large Scale System of Detecting & Verifying Real-Time News Events from Twitter

Xiaomo Liu; Quanzhi Li; Armineh Nourbakhsh; Rui Fang; Merine Thomas; Kajsa Anderson; Russ Kociuba; Mark Vedder; Steven Pomerville; Ramdev Wudali; Robert Martin; John Duprey; Arun Vachher; William M. Keenan; Sameena Shah

News professionals face the challenge of discovering news from increasingly diverse and unreliable information in the age of social media. More and more news events break on social media first and are picked up by news media subsequently. The recent Brussels attack is one such example. At Reuters, a global news agency, we have observed the need for a more effective tool that helps our journalists quickly discover news on social media, verify it and then inform the public. In this paper, we describe Reuters Tracer, a system for sifting through all the noise to detect news events on Twitter and assess their veracity. We disclose the architecture of our system and discuss the various design strategies that facilitate the implementation of machine learning models for noise filtering and event detection. These techniques have been implemented at large scale and have successfully discovered breaking news faster than traditional journalism.

conference on information and knowledge management | 2016

TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding

Quanzhi Li; Sameena Shah; Xiaomo Liu; Armineh Nourbakhsh; Rui Fang

Classifying tweets into topic categories is important for many applications, since tweets cover a variety of topics and users are only interested in certain topical areas. Many tweet classification approaches fail to achieve high accuracy due to data sparseness. A tweet, as a special type of short text, carries metadata in addition to its text that can be used to enrich its context, such as user name, mentions, hashtags and embedded links. In this demonstration, we present TweetSift, an efficient and effective real-time tweet topic classifier. TweetSift exploits external tweet-specific entity knowledge to provide more topical context for a tweet, and integrates it with topic enhanced word embeddings for topic classification. The demonstration will show how TweetSift works and how it is incorporated into our social media event detection system.

web intelligence | 2016

Tweet Topic Classification Using Distributed Language Representations

Quanzhi Li; Sameena Shah; Xiaomo Liu; Armineh Nourbakhsh; Rui Fang

Many classification tasks on short text, such as tweets, fail to achieve high accuracy due to data sparseness. One approach to solving this problem is to enrich the context of the data by using external data sources, or distributed language representations trained on huge amounts of data. In this paper, we present several tweet topic classification methods that exploit different types of data: tweet text, tweet text plus entity knowledge base, word embeddings derived from tweet text, distributed representations of tweets, and topical word embeddings. The word embedding, topical word embedding and sentence representation models are generated from billions of words from tweets without supervision. To the best of our knowledge, this is the first study to apply distributed language representations to the tweet topic classification task.

international conference on data engineering | 2017

Real-Time Novel Event Detection from Social Media

Quanzhi Li; Armineh Nourbakhsh; Sameena Shah; Xiaomo Liu

In this paper, we present a new approach for detecting novel events from social media, especially Twitter, in real time. An event is usually defined by who, what, where and when, and an event tweet usually contains terms corresponding to these aspects. To exploit this information, we propose a method that incorporates simple semantics by splitting the tweet term space into groups of terms whose meanings are of the same type. These groups are called semantic categories (classes) and each reflects one or more event aspects. The semantic classes include named entity, mention, location, hashtag, verb, noun and embedded link. To group tweets talking about the same event into the same cluster, similarity is measured by calculating class-wise similarities and then aggregating them. Users of a real-time event detection system are usually only interested in novel (new) events, which are happening now or happened a short time ago. To fulfill this requirement, a temporal identification module is used to filter out event clusters that are about old stories. The clustering module also computes a novelty score for each event cluster, which reflects how novel the event is compared to previous events. We evaluated our event detection method using multiple quality metrics and a large-scale event corpus containing millions of tweets. The experimental results show that the proposed online event detection method achieves state-of-the-art performance. Our experiment also shows that the temporal identification module can effectively detect old events.
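The class-wise similarity step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which per-class similarity measure or aggregation rule the authors use, so the cosine measure, the weighted average, the class weights in `CLASS_WEIGHTS`, and the function names below are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter

# Hypothetical weights per semantic class; the abstract does not give any.
CLASS_WEIGHTS = {
    "named_entity": 0.30, "location": 0.20, "hashtag": 0.15,
    "verb": 0.10, "noun": 0.10, "mention": 0.10, "link": 0.05,
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    if not a or not b:
        return 0.0
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def classwise_similarity(x: dict, y: dict, weights: dict = CLASS_WEIGHTS) -> float:
    """Compare two tweets class by class, then aggregate with a weighted average.

    Each tweet is a dict mapping a semantic class name to its list of terms.
    Classes that are empty in both tweets are skipped (an assumption).
    """
    total, weight_sum = 0.0, 0.0
    for cls, weight in weights.items():
        a, b = Counter(x.get(cls, [])), Counter(y.get(cls, []))
        if not a and not b:
            continue
        total += weight * cosine(a, b)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

tweet_a = {"named_entity": ["brussels"], "location": ["brussels"],
           "noun": ["airport", "explosion"], "hashtag": ["#brussels"]}
tweet_b = {"named_entity": ["brussels"], "location": ["brussels"],
           "noun": ["explosion"], "verb": ["reported"]}
score = classwise_similarity(tweet_a, tweet_b)
```

In a clustering setting, a score like this would be compared against a threshold to decide whether a tweet joins an existing event cluster or starts a new one; the threshold itself would need tuning on data.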

conference on computational natural language learning | 2017

Learning Stock Market Sentiment Lexicon and Sentiment-Oriented Word Vector from StockTwits

Quanzhi Li; Sameena Shah

Previous studies have shown that investor sentiment indicators can predict stock market change. A domain-specific sentiment lexicon and sentiment-oriented word embedding model would help sentiment analysis in the financial domain and the stock market. In this paper, we present a new approach to learning a stock market lexicon from StockTwits, a popular financial social network where investors share ideas. It learns word polarity by predicting message sentiment, using a neural network. The sentiment-oriented word embeddings are learned from tens of millions of StockTwits posts, and this is the first study presenting sentiment-oriented word embeddings for the stock market. The experiments on predicting investor sentiment show that our lexicon outperformed other lexicons built by state-of-the-art methods, and the sentiment-oriented word vectors were much better than general word embeddings.

international conference on data mining | 2015

Newsworthy Rumor Events: A Case Study of Twitter

Armineh Nourbakhsh; Xiaomo Liu; Sameena Shah; Rui Fang; Mohammad M. Ghassemi; Quanzhi Li

Rumor events differ in how and where they originate, what topics they address, the emotions they invoke, and how they engage their audience. In this paper, we study various semantic aspects of rumors and analyze the motivational and functional roles they play. Using Twitter as a case study, we develop a framework to characterize rumors. Our characterization covers intrinsic and extrinsic factors at both the tweet and event level, as well as usage analysis. We determine the roles various user types play and analyze rumor propagation from both a retweeting and a burstiness perspective.

conference on information and knowledge management | 2016

Hashtag Recommendation Based on Topic Enhanced Embedding, Tweet Entity Data and Learning to Rank

Quanzhi Li; Sameena Shah; Armineh Nourbakhsh; Xiaomo Liu; Rui Fang

In this paper, we present a new approach to recommending hashtags for tweets. It uses a Learning to Rank algorithm to incorporate features built from topic enhanced word embeddings, tweet entity data, hashtag frequency, hashtag temporal data and tweet URL domain information. The experiments using millions of tweets and hashtags show that the proposed approach outperforms three baseline methods: the LDA topic, the tf.idf based and the general word embedding approaches.

empirical methods in natural language processing | 2016

Witness Identification in Twitter

Rui Fang; Armineh Nourbakhsh; Xiaomo Liu; Sameena Shah; Quanzhi Li

Identifying witness accounts is important for rumor debunking, crisis management, and basically any task that involves on-the-ground eyes. The prevalence of social media has given citizen journalism scale and eyewitnesses prominence. However, the amount of noise on social media also makes it likely that witness accounts get buried too deep in the noise and are never discovered. In this paper, we explore automatic witness identification in Twitter during emergency events. We attempt to create a generalizable system that not only detects witness reports for unseen events, but also works on a true out-of-sample “real time streaming set” that may or may not have witness accounts. We attempt to detect the presence or surge of witness accounts, which is the first step in developing a model for detecting crisis-related events. We collect and annotate witness tweets for different types of events (earthquake, car accident, fire, cyclone, etc.), explore the related features and build a classifier to identify witness tweets in real time. Our system is able to significantly outperform prior methods with an average F-score of 89.7% on previously unseen events.

Archive | 2017

Hashtag Mining: Discovering Relationship Between Health Concepts and Hashtags

Quanzhi Li; Sameena Shah; Rui Fang; Armineh Nourbakhsh; Xiaomo Liu

Social media hashtags are useful in many applications, such as tweet classification, clustering, searching, indexing, and social network analysis. In this chapter, we present a Big Data mining technology for social media, and demonstrate how to use it to address the following three problems: discovering relevant hashtags for health concepts, discovering the meaning of health-related hashtags, and identifying hashtags relevant to each other in the health domain. The proposed approach is based on distributed word representations, which are learned without supervision from billions of tweet words by applying state-of-the-art deep learning technology. The experiment shows that this approach outperformed the baseline approach. To the best of our knowledge, this is the first study to apply distributed language representations to discovering relationships between health concepts and hashtags.

web intelligence | 2016

Tweet Sentiment Analysis by Incorporating Sentiment-Specific Word Embedding and Weighted Text Features

Quanzhi Li; Sameena Shah; Rui Fang; Armineh Nourbakhsh; Xiaomo Liu

Previous studies have used many manually identified features and word embeddings for tweet sentiment classification. In this paper, we propose a new approach, which incorporates sentiment-specific word embeddings (SSWE) and a weighted text feature model (WTFM). WTFM produces features based on text negation, a tf.idf weighting scheme, and a Rocchio text classification method. Compared to other tweet sentiment feature generation approaches, WTFM is simple and easy to build, yet effective. Experiments show that the proposed approach outperforms two state-of-the-art tweet sentiment classification methods, SSWE and the National Research Council Canada (NRC) model.
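The three ingredients named for WTFM (text negation, tf.idf weighting, and Rocchio classification) can be shown together in a small, generic sketch. This is not the authors' feature model: the negation word list, the `_NEG` suffix convention, the smoothed idf formula, and the `RocchioClassifier` class are assumptions chosen for illustration, and the abstract does not say how WTFM features are combined with SSWE.

```python
import math
from collections import Counter, defaultdict

NEGATIONS = {"not", "no", "never", "don't", "isn't", "can't"}  # illustrative list
PUNCT = ".,!?;:"

def tokenize(text):
    """Lowercase and split; tag tokens after a negation word until punctuation."""
    tokens, negated = [], False
    for raw in text.lower().split():
        word = raw.strip(PUNCT)
        if word in NEGATIONS:
            negated = True
            tokens.append(word)
        elif word:
            tokens.append(word + "_NEG" if negated else word)
        if raw[-1] in PUNCT:
            negated = False  # negation scope ends at punctuation
    return tokens

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * \
           math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

class RocchioClassifier:
    """Nearest-centroid (Rocchio) classifier over tf.idf vectors."""

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed idf so that no term gets a zero weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        sums, counts = defaultdict(Counter), Counter(labels)
        for doc, label in zip(docs, labels):
            for term, weight in self._vector(doc).items():
                sums[label][term] += weight
        # Each class centroid is the mean of its documents' tf.idf vectors.
        self.centroids = {lab: {t: w / counts[lab] for t, w in vec.items()}
                          for lab, vec in sums.items()}
        return self

    def _vector(self, tokens):
        """Length-normalized tf.idf vector; unseen terms are ignored."""
        vec = {t: c * self.idf[t] for t, c in Counter(tokens).items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, text):
        vec = self._vector(tokenize(text))
        return max(self.centroids, key=lambda lab: cosine(vec, self.centroids[lab]))

clf = RocchioClassifier().fit(
    ["I love this phone", "great happy day",
     "I do not love this phone", "terrible sad day"],
    ["pos", "pos", "neg", "neg"],
)
```

Tagging negated tokens lets "love" and "love_NEG" become separate features, so the centroid for each class can weight them differently; the toy training set above is far too small for real use and only demonstrates the mechanics.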
