Geli Fei
University of Illinois at Chicago
Publication
Featured research published by Geli Fei.
knowledge discovery and data mining | 2016
Shuai Wang; Zhiyuan Chen; Geli Fei; Bing Liu; Sherry Emery
One of the overarching tasks of document analysis is to find what topics people talk about. One of the main techniques for this purpose is topic modeling. So far many models have been proposed. However, the existing models typically perform full analysis on the whole data to find all topics. This is certainly useful, but in practice we found that the user almost always also wants to perform more detailed analyses on some specific aspects, which we refer to as targets (or targeted aspects). Current full-analysis models are not suitable for such analyses as their generated topics are often too coarse and may not even be on target. For example, given a set of tweets about e-cigarettes, one may want to find out what topics under discussion are specifically related to children. Likewise, given a collection of online reviews about a camera, a consumer or camera manufacturer may be interested in finding out all topics about the camera's screen, the targeted aspect. As we will see in our experiments, current full topic models are ineffective for such targeted analyses. This paper studies this problem and proposes a novel targeted topic model (TTM) to enable focused analyses on any specific aspect of interest. Our experimental results demonstrate the effectiveness of the TTM.
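The TTM itself is a dedicated probabilistic model, but a minimal way to see the targeted setting it addresses is a naive baseline: keep only the documents that mention the target and run an ordinary topic model on that subset. The sketch below assumes gensim and a hypothetical keyword target such as "children"; it illustrates the task, not the paper's TTM.

```python
# Minimal illustration of a *targeted* analysis baseline: keep only documents
# that mention the target keyword, then run ordinary LDA on that subset.
# This is NOT the paper's TTM; it only shows the setting the model addresses.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def targeted_lda(tokenized_docs, target="children", num_topics=5):
    # Keep documents that mention the targeted aspect at least once.
    on_target = [d for d in tokenized_docs if target in d]
    dictionary = Dictionary(on_target)
    corpus = [dictionary.doc2bow(d) for d in on_target]
    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    return lda.print_topics()
```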
knowledge discovery and data mining | 2016
Geli Fei; Shuai Wang; Bing Liu
In classic supervised learning, a learning algorithm takes fixed training data of several classes to build a classifier. In this paper, we propose to study a new problem, i.e., building a learning system that learns cumulatively. As time goes by, the system sees and learns more and more classes of data and becomes more and more knowledgeable. We believe that this is similar to human learning. We humans learn continuously, retaining the learned knowledge, identifying and learning new things, and updating the existing knowledge with new experiences. Over time, we accumulate more and more knowledge. A learning system should be able to do the same. As algorithmic learning matures, it is time to tackle this cumulative machine learning (or simply cumulative learning) problem, which is a kind of lifelong machine learning problem. It presents two major challenges. First, the system must be able to detect data from unseen classes in the test set. Classic supervised learning, however, assumes all classes in testing are known or seen at training time. Second, the system needs to be able to selectively update its models whenever a new class of data arrives, without re-training the whole system using the entire past and present training data. This paper proposes a novel approach and system to tackle these challenges. Experimental results on two datasets, with learning from 2 classes up to 100 classes, show that the proposed approach is highly promising in terms of both classification accuracy and computational efficiency.
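A simplified sketch of the cumulative setting described above, not the paper's algorithm: keep one binary one-vs-rest model per seen class, add a new model when a new class of data arrives (so older models are not retrained), and reject test examples whose best score falls below a threshold as belonging to an unseen class. The threshold value and class bookkeeping here are assumptions for illustration only.

```python
# Simplified sketch of the cumulative setting: one binary classifier per seen
# class, a new classifier added when a new class arrives, and rejection of
# examples whose best score is below a threshold. Illustrates the problem
# setup, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

class CumulativeLearner:
    def __init__(self, reject_threshold=0.5):
        self.models = {}              # class label -> binary classifier
        self.threshold = reject_threshold

    def add_class(self, label, X_pos, X_neg):
        # Train only the classifier for the newly arrived class;
        # previously learned models are left untouched.
        X = np.vstack([X_pos, X_neg])
        y = np.array([1] * len(X_pos) + [0] * len(X_neg))
        self.models[label] = LogisticRegression(max_iter=1000).fit(X, y)

    def predict(self, x):
        scores = {c: m.predict_proba(x.reshape(1, -1))[0, 1]
                  for c, m in self.models.items()}
        best = max(scores, key=scores.get)
        # Below-threshold examples are treated as coming from an unseen class.
        return best if scores[best] >= self.threshold else "unseen"
```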
empirical methods in natural language processing | 2015
Geli Fei; Bing Liu
In a typical social media content analysis task, the user is interested in analyzing posts on a particular topic. Identifying such posts is often formulated as a classification problem. However, this problem is challenging. One key issue is covariate shift. That is, the training data is not fully representative of the test data. We observed that the covariate shift mainly occurs in the negative data because topics discussed in social media are highly diverse and numerous, but the user-labeled negative training data may cover only a small number of topics. This paper proposes a novel technique to solve the problem. The key novelty of the technique is the transformation of document representation from the traditional n-gram feature space to a center-based similarity (CBS) space. In the CBS space, the covariate shift problem is significantly mitigated, which enables us to build much better classifiers. Experimental results show that the proposed approach markedly improves classification.
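The core of the representation change can be sketched as follows, assuming tf-idf vectors and cosine similarity to a single positive-class center; the paper's CBS space uses a richer set of similarity features, so this is only an illustration of moving from n-gram features to similarity-to-center features.

```python
# Sketch of the center-based similarity (CBS) idea: represent each document
# by its similarity to the center of the positive training documents rather
# than by raw n-gram features. Cosine similarity to one tf-idf center is
# assumed here purely for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def to_cbs_features(train_pos_texts, texts):
    vec = TfidfVectorizer()
    vec.fit(train_pos_texts + texts)
    # Center = mean tf-idf vector of the positive training documents.
    center = np.asarray(vec.transform(train_pos_texts).mean(axis=0))
    X = vec.transform(texts)
    # Each document becomes a similarity-to-center feature.
    return cosine_similarity(X, center)
```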
north american chapter of the association for computational linguistics | 2016
Geli Fei; Bing Liu
Existing research on multiclass text classification mostly makes the closed world assumption, which focuses on designing accurate classifiers under the assumption that all test classes are known at training time. A more realistic scenario is to expect unseen classes during testing (open world). In this case, the goal is to design a learning system that classifies documents of the known classes into their respective classes and also to reject documents from unknown classes. This problem is called open (world) classification. This paper approaches the problem by reducing the open space risk while balancing the empirical risk. It proposes to use a new learning strategy, called center-based similarity (CBS) space learning (or CBS learning), to provide a novel solution to the problem. Extensive experiments across two datasets show that CBS learning gives promising results on multiclass open text classification compared to state-of-the-art baselines.
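One way to picture the open-world decision rule is the sketch below: one-vs-rest classifiers are trained on the known classes only, and a test document is rejected as unknown when no classifier accepts it. This is a generic baseline for the open setting, not the paper's method; CBS learning additionally replaces the n-gram representation with center-based similarity features as in the paper above.

```python
# Generic open-world decision rule: train one-vs-rest classifiers on the known
# classes and reject a test document when every decision score is negative,
# i.e. no known class claims it. Assumes more than two known classes.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

def open_world_predict(clf, X_test, classes):
    scores = clf.decision_function(X_test)   # shape: (n_docs, n_known_classes)
    preds = []
    for row in scores:
        if row.max() < 0:                     # no known class accepts the document
            preds.append("rejected_unknown")
        else:
            preds.append(classes[int(np.argmax(row))])
    return preds

# Example usage (training data covers known classes only):
# clf = OneVsRestClassifier(LinearSVC()).fit(X_train, y_train)
# preds = open_world_predict(clf, X_test, clf.classes_)
```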
conference on intelligent text processing and computational linguistics | 2016
Geli Fei; Zhiyuan Brett Chen; Arjun Mukherjee; Bing Liu
Extracting aspects and sentiments is a key problem in sentiment analysis. Existing models rely on joint modeling with supervised aspect and sentiment switching. This paper explores unsupervised models by exploiting a novel angle: the correspondence of sentiments with aspects via topic modeling under two views. The idea is to split documents into two views and model the topic correspondence across the two views. We propose two new models that work on a set of document pairs (documents with two views) to discover their corresponding topics. Experimental results show that the proposed approach significantly outperforms strong baselines.
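The "two views" idea can be illustrated with a simple preprocessing sketch that splits each document into a sentiment view and an aspect view, producing the document pairs such models operate on. The tiny sentiment lexicon below is a placeholder assumption, not the resource or the splitting rule used in the paper.

```python
# Illustrative split of a tokenized document into two views: words found in a
# small opinion lexicon form the sentiment view, all remaining words form the
# aspect view. Lexicon and splitting rule are hypothetical placeholders.
SENTIMENT_WORDS = {"good", "great", "bad", "poor", "excellent", "terrible",
                   "love", "hate", "sharp", "blurry"}

def split_into_views(tokenized_doc):
    sentiment_view = [w for w in tokenized_doc if w in SENTIMENT_WORDS]
    aspect_view = [w for w in tokenized_doc if w not in SENTIMENT_WORDS]
    return aspect_view, sentiment_view

# split_into_views(["the", "screen", "is", "sharp", "and", "great"])
# -> (["the", "screen", "is", "and"], ["sharp", "great"])
```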
Sentiment Analysis in Social Networks | 2017
Geli Fei; Huayi Li; Bing Liu
As social media websites have emerged as popular platforms for sharing and spreading real-time information on the Internet, impostors see huge opportunities in taking advantage of such systems to spread distorted information. Online social networks and review websites have become targets of opinion spamming. More and more traditional review websites allow users to “friend” or “follow” each other so as to enhance the overall user experience. This has brought significant advances in opinion spam detection using users’ social networks or, in a broader sense, heterogeneous networks of various entities. In this chapter, we introduce different techniques, such as belief propagation and collective positive-unlabeled learning, that leverage the intricate relations between different entities in the network. We discuss these methods via the application of spam detection on review-hosting websites and popular social media platforms.
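As a very rough illustration of how scores can be propagated over a review network, the sketch below iteratively passes suspicion scores between reviewers and their reviews. It is far simpler than the belief propagation and collective positive-unlabeled learning methods the chapter covers, and the seed scores and graph structure are hypothetical.

```python
# Toy score propagation over a reviewer-review bipartite graph, in the spirit
# of (but much simpler than) the network-based spam detection methods
# discussed in the chapter.
def propagate_spam_scores(reviewer_reviews, seed_review_scores, n_iters=10):
    """reviewer_reviews: dict reviewer -> list of review ids
    seed_review_scores: dict review id -> initial spam score in [0, 1]"""
    review_scores = dict(seed_review_scores)
    reviewer_scores = {}
    for _ in range(n_iters):
        # A reviewer is as suspicious as the average of their reviews.
        for r, revs in reviewer_reviews.items():
            reviewer_scores[r] = sum(review_scores[v] for v in revs) / len(revs)
        # Each review's score is pulled toward its author's suspicion.
        for r, revs in reviewer_reviews.items():
            for v in revs:
                review_scores[v] = 0.5 * review_scores[v] + 0.5 * reviewer_scores[r]
    return reviewer_scores, review_scores
```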
international conference on weblogs and social media | 2013
Geli Fei; Arjun Mukherjee; Bing Liu; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh
international conference on computational linguistics | 2012
Geli Fei; Bing Liu; Meichun Hsu; Malu Castellanos; Riddhiman Ghosh
international world wide web conferences | 2017
Huayi Li; Geli Fei; Shuai Wang; Bing Liu; Weixiang Shao; Arjun Mukherjee; Jidong Shao
international conference on computational linguistics | 2014
Geli Fei; Zhiyuan Chen; Bing Liu