Armineh Nourbakhsh
Thomson Reuters
Publication
Featured research published by Armineh Nourbakhsh.
conference on information and knowledge management | 2016
Xiaomo Liu; Quanzhi Li; Armineh Nourbakhsh; Rui Fang; Merine Thomas; Kajsa Anderson; Russ Kociuba; Mark Vedder; Steven Pomerville; Ramdev Wudali; Robert Martin; John Duprey; Arun Vachher; William M. Keenan; Sameena Shah
News professionals face the challenge of discovering news from increasingly diverse and unreliable information in the age of social media. More and more news events break on social media first and are picked up by news media subsequently. The recent Brussels attack is one such example. At Reuters, a global news agency, we have observed the need for a more effective tool that helps our journalists quickly discover news on social media, verify it, and then inform the public. In this paper, we describe Reuters Tracer, a system for sifting through the noise to detect news events on Twitter and assess their veracity. We disclose the architecture of our system and discuss the design strategies that facilitate the implementation of machine learning models for noise filtering and event detection. These techniques have been implemented at large scale and have successfully discovered breaking news faster than traditional journalism.
conference on information and knowledge management | 2016
Quanzhi Li; Sameena Shah; Xiaomo Liu; Armineh Nourbakhsh; Rui Fang
Classifying tweets into topic categories is necessary and important for many applications, since tweets cover a variety of topics and users are only interested in certain topical areas. Many tweet classification approaches fail to achieve high accuracy due to data sparseness. A tweet, as a special type of short text, has metadata in addition to its text that can be used to enrich its context, such as user name, mention, hashtag and embedded link. In this demonstration, we present TweetSift, an efficient and effective real-time tweet topic classifier. TweetSift exploits external tweet-specific entity knowledge to provide more topical context for a tweet, and integrates it with topic-enhanced word embeddings for topic classification. The demonstration will show how TweetSift works and how it is incorporated into our social media event detection system.
web intelligence | 2016
Quanzhi Li; Sameena Shah; Xiaomo Liu; Armineh Nourbakhsh; Rui Fang
Many classification tasks on short text, such as tweets, fail to achieve high accuracy due to data sparseness. One approach to solving this problem is to enrich the context of the data by using external data sources, or distributed language representations trained on huge amounts of data. In this paper, we present several tweet topic classification methods that exploit different types of data: tweet text, tweet text plus an entity knowledge base, word embeddings derived from tweet text, distributed representations of tweets, and topical word embeddings. The word embedding, topical word embedding and sentence representation models are generated from billions of words from tweets without supervision. To the best of our knowledge, this is the first study to apply distributed language representations to the tweet topic classification task.
international conference on data engineering | 2017
Quanzhi Li; Armineh Nourbakhsh; Sameena Shah; Xiaomo Liu
In this paper, we present a new approach for detecting novel events from social media, especially Twitter, in real time. An event is usually defined by who, what, where and when, and an event tweet usually contains terms corresponding to these aspects. To exploit this information, we propose a method that incorporates simple semantics by splitting the tweet term space into groups of terms that share the same type of meaning. These groups are called semantic categories (classes) and each reflects one or more event aspects. The semantic classes include named entity, mention, location, hashtag, verb, noun and embedded link. To group tweets about the same event into the same cluster, similarity is measured by calculating class-wise similarities and then aggregating them. Users of a real-time event detection system are usually only interested in novel (new) events, which are happening now or happened a short time ago. To fulfill this requirement, a temporal identification module is used to filter out event clusters that are about old stories. The clustering module also computes a novelty score for each event cluster, which reflects how novel the event is compared to previous events. We evaluated our event detection method using multiple quality metrics and a large-scale event corpus containing millions of tweets. The experimental results show that the proposed online event detection method achieves state-of-the-art performance. Our experiment also shows that the temporal identification module can effectively detect old events.
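The class-wise similarity idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or aggregation is used, so the Jaccard measure, the uniform weights, and all function and key names below are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict, Set

# Semantic classes listed in the abstract; the key names are hypothetical.
CLASSES = ("named_entity", "mention", "location", "hashtag", "verb", "noun", "link")

# Hypothetical uniform weights; the abstract does not state how classes are weighted.
WEIGHTS: Dict[str, float] = {cls: 1.0 for cls in CLASSES}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets (assumed measure, not from the paper)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classwise_similarity(
    t1: Dict[str, Set[str]],
    t2: Dict[str, Set[str]],
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted average of per-class similarities between two tweets."""
    total = 0.0
    weight_sum = 0.0
    for cls, w in weights.items():
        s1 = t1.get(cls, set())
        s2 = t2.get(cls, set())
        if not s1 and not s2:
            continue  # a class missing from both tweets carries no evidence
        total += w * jaccard(s1, s2)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


tweet_a = {"named_entity": {"brussels"}, "hashtag": {"attack"}, "verb": {"hit"}}
tweet_b = {"named_entity": {"brussels"}, "hashtag": {"attack", "news"}}
similarity = classwise_similarity(tweet_a, tweet_b)
```

In a clustering setting, a tweet would join the existing cluster with the highest aggregated similarity if that value exceeds a threshold, and start a new cluster otherwise; the threshold would be another tunable assumption.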
international conference on data mining | 2015
Armineh Nourbakhsh; Xiaomo Liu; Sameena Shah; Rui Fang; Mohammad M. Ghassemi; Quanzhi Li
Rumor events differ in how and where they originate, what topics they address, the emotions they invoke, and how they engage their audience. In this paper, we study various semantic aspects of rumors and analyze the motivational and functional roles they play. Using Twitter as a case study, we develop a framework to characterize rumors. Our characterization covers intrinsic and extrinsic factors, tweet and event-level, as well as usage analysis. We determine the roles various user-types play and analyze rumor propagation from both a re-tweeting and burstiness perspective.
conference on information and knowledge management | 2016
Quanzhi Li; Sameena Shah; Armineh Nourbakhsh; Xiaomo Liu; Rui Fang
In this paper, we present a new approach to recommending hashtags for tweets. It uses a learning-to-rank algorithm to incorporate features built from topic-enhanced word embeddings, tweet entity data, hashtag frequency, hashtag temporal data and tweet URL domain information. Experiments using millions of tweets and hashtags show that the proposed approach outperforms three baseline methods: the LDA topic, the tf.idf-based and the general word embedding approaches.
empirical methods in natural language processing | 2016
Rui Fang; Armineh Nourbakhsh; Xiaomo Liu; Sameena Shah; Quanzhi Li
Identifying witness accounts is important for rumor debunking, crisis management, and essentially any task that requires eyes on the ground. The prevalence of social media has given citizen journalism scale and eyewitnesses prominence. However, the amount of noise on social media also makes it likely that witness accounts get buried too deep in the noise and are never discovered. In this paper, we explore automatic witness identification on Twitter during emergency events. We attempt to create a generalizable system that detects witness reports not only for unseen events, but also on a true out-of-sample “real time streaming set” that may or may not contain witness accounts. We attempt to detect the presence or surge of witness accounts, which is the first step in developing a model for detecting crisis-related events. We collect and annotate witness tweets for different types of events (earthquake, car accident, fire, cyclone, etc.), explore the related features, and build a classifier to identify witness tweets in real time. Our system significantly outperforms prior methods, with an average F-score of 89.7% on previously unseen events.
knowledge discovery and data mining | 2018
Fabio Petroni; Natraj Raman; Tim Nugent; Armineh Nourbakhsh; Žarko Panić; Sameena Shah; Jochen L. Leidner
The automatic extraction of breaking news events from natural language text is a valuable capability for decision support systems. Traditional systems tend to focus on extracting events from a single media source and often ignore cross-media references. Here, we describe a large-scale automated system for extracting natural disasters and critical events from both newswire text and social media. We outline a comprehensive architecture that can identify, categorize and summarize seven different event types - namely floods, storms, fires, armed conflict, terrorism, infrastructure breakdown, and labour unavailability. The system comprises fourteen modules and is equipped with a novel coreference mechanism, capable of linking events extracted from the two complementary data sources. Additionally, the system is easily extensible to accommodate new event types. Our experimental evaluation demonstrates the effectiveness of the system.
Archive | 2017
Quanzhi Li; Sameena Shah; Rui Fang; Armineh Nourbakhsh; Xiaomo Liu
Social media hashtags are useful in many applications, such as tweet classification, clustering, searching, indexing, and social network analysis. In this chapter, we present a Big Data mining technology for social media, and demonstrate how to use it to address the following three problems: discovering relevant hashtags for health concepts, discovering the meaning of health-related hashtags, and identifying hashtags relevant to each other in the health domain. The proposed approach is based on distributed word representations, which are learned without supervision from billions of tweet words using state-of-the-art deep learning technology. The experiment shows that this approach outperforms the baseline approach. To the best of our knowledge, this is the first study to apply distributed language representations to discovering relationships between health concepts and hashtags.
web intelligence | 2016
Quanzhi Li; Sameena Shah; Rui Fang; Armineh Nourbakhsh; Xiaomo Liu
Previous studies have used many manually identified features and word embeddings for tweet sentiment classification. In this paper, we propose a new approach, which incorporates sentiment-specific word embeddings (SSWE) and a weighted text feature model (WTFM). WTFM produces features based on text negation, the tf.idf weighting scheme, and a Rocchio text classification method. Compared to other tweet sentiment feature generation approaches, WTFM is simple and easy to build, yet effective. Experiments show that the proposed approach outperforms two state-of-the-art tweet sentiment classification methods: SSWE and the National Research Council Canada (NRC) model.
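For readers unfamiliar with the Rocchio method mentioned in this abstract, the sketch below shows plain Rocchio (nearest-centroid) scoring on raw term counts. It is not the WTFM model: the abstract does not describe how Rocchio scores, negation, and tf.idf weights are combined, so the whitespace tokenizer, the unweighted counts, the toy training data, and every function name here are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (assumed whitespace tokenizer, no tf.idf)."""
    return Counter(text.lower().split())


def centroid(vectors: List[Counter]) -> Dict[str, float]:
    """Mean term-count vector of one class."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_rocchio(examples: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Build one centroid per label from (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append(vectorize(text))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def rocchio_scores(text: str, centroids: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Similarity of a text to each class centroid; these could serve as features."""
    v = vectorize(text)
    return {label: cosine(v, c) for label, c in centroids.items()}


# Toy, made-up training data for illustration only.
examples = [
    ("great happy love", "pos"),
    ("love this great day", "pos"),
    ("awful sad hate", "neg"),
    ("hate this bad day", "neg"),
]
centroids = train_rocchio(examples)
scores = rocchio_scores("love great", centroids)
predicted = max(scores, key=scores.get)
```

The per-class scores, rather than only the predicted label, are what a feature-generation model would most plausibly pass on to a downstream classifier.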