Yunqing Xia | Researchain

Publication

Featured researches published by Yunqing Xia.

IEEE Intelligent Systems | 2013

New Avenues in Opinion Mining and Sentiment Analysis

Erik Cambria; Björn W. Schuller; Yunqing Xia; Catherine Havasi

The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

ACM Transactions on Management Information Systems | 2011

Text mining and probabilistic language modeling for online review spam detection

Raymond Y. K. Lau; Stephen Shaoyi Liao; Ron Chi-Wai Kwok; Kaiquan Xu; Yunqing Xia; Yuefeng Li

In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.

Cognitive Computation | 2015

Word Polarity Disambiguation Using Bayesian Model and Opinion-Level Features

Yunqing Xia; Erik Cambria; Amir Hussain; Huan Zhao

Contextual polarity ambiguity is an important problem in sentiment analysis. Many opinion keywords carry different polarities in different contexts, posing huge challenges for sentiment analysis research. Previous work on contextual polarity disambiguation makes use of term-level context, such as words and patterns, and resolves the polarity with a range of rule-based, statistics-based or machine learning methods. The major shortcoming of these methods is that term-level features are sometimes ineffective in resolving the polarity. In this work, opinion-level context is explored, in which intra-opinion features and inter-opinion features are finely defined. To enable effective use of opinion-level features, a Bayesian model is adopted to resolve the polarity in a probabilistic manner. Experiments with the Opinmine corpus demonstrate that opinion-level features can make a significant contribution to word polarity disambiguation in four domains.
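
As a rough illustration of resolving polarity "in a probabilistic manner", the sketch below uses a generic naive Bayes classifier with add-one smoothing. It is not the model from the paper: the abstract does not specify the Bayesian model or the opinion-level features, so the feature names (`target=...`, `neighbor=...`) and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count classes and features from (features, polarity) pairs."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        class_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab

def predict(model, feats):
    """Return the polarity with the highest log posterior (add-one smoothing)."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in feats:
            score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical toy data: the ambiguous keyword "high" is positive when the
# opinion target is quality and negative when the target is price.
examples = [
    (["target=quality", "neighbor=pos"], "positive"),
    (["target=quality", "neighbor=pos"], "positive"),
    (["target=price", "neighbor=neg"], "negative"),
    (["target=price", "neighbor=neg"], "negative"),
]
model = train(examples)
label_for_price = predict(model, ["target=price"])
label_for_quality = predict(model, ["target=quality"])
```

Here the same keyword receives a different polarity depending on contextual features, which is the kind of decision the abstract describes; the actual intra-opinion and inter-opinion features are defined in the paper itself.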

Meeting of the Association for Computational Linguistics | 2008

Sentiment vector space model for lyric-based song sentiment classification

Yunqing Xia; Linlin Wang; Kam-Fai Wong; Mingxing Xu

Lyric-based song sentiment classification seeks to assign songs appropriate sentiment labels such as light-hearted and heavy-hearted. Four problems render the vector space model (VSM)-based text classification approach ineffective: 1) Many words within song lyrics actually contribute little to sentiment; 2) Nouns and verbs used to express sentiment are ambiguous; 3) Negations and modifiers around the sentiment keywords make particular contributions to sentiment; 4) Song lyrics are usually very short. To address these problems, the sentiment vector space model (s-VSM) is proposed to represent song lyric documents. Preliminary experiments show that the s-VSM model outperforms the VSM model in the lyric-based song sentiment classification task.

Web Intelligence | 2008

Learning Knowledge from Relevant Webpage for Opinion Analysis

Ruifeng Xu; Kam-Fai Wong; Qin Lu; Yunqing Xia; Wenjie Li

This paper presents an opinion analysis system based on linguistic knowledge acquired from small-scale annotated text and raw topic-relevant Web pages. Based on observation of the annotated opinion corpus, some word-, collocation- and sentence-level linguistic features for opinion analysis are discovered. Supervised and unsupervised learning techniques are developed to learn these features from annotated text and raw relevant Web pages, respectively. These features are then incorporated into a classifier based on a support vector machine (SVM) to identify opinionated sentences and determine their polarities. Evaluations show that the proposed opinion analysis system, namely OA, achieves promising performance, which demonstrates the effectiveness of learning linguistic knowledge from relevant Web pages.

International Conference on Machine Learning and Cybernetics | 2007

The Unified Collocation Framework for Opinion Mining

Yunqing Xia; Ruifeng Xu; Kam-Fai Wong; Fang Zheng

Opinion mining is a complicated text understanding technology involving opinion extraction and sentiment analysis. State-of-the-art techniques adopt an attribute-driven or sentiment-driven approach, leading to low opinion mining coverage. This paper proposes the unified collocation framework (UCF) and describes a novel unified collocation-driven (UCD) opinion mining method. The UCF incorporates attribute-sentiment collocations as well as their syntactical features to achieve reasonable generalization ability. Preliminary experiments show that recall of opinion extraction improves by 0.245 on average without obvious loss in opinion extraction precision or sentiment analysis accuracy.

Cognitive Computation | 2015

Word Embedding Composition for Data Imbalances in Sentiment and Emotion Classification

Ruifeng Xu; Tao Chen; Yunqing Xia; Qin Lu; Bin Liu; Xuan Wang

Text classification often faces the problem of imbalanced training data. This is true in sentiment analysis and particularly prominent in emotion classification where multiple emotion categories are very likely to produce naturally skewed training data. Different sampling methods have been proposed to improve classification performance by reducing the imbalance ratio between training classes. However, data sparseness and the small disjunct problem remain obstacles in generating new samples for minority classes when the data are skewed and limited. Methods to produce meaningful samples for smaller classes rather than simple duplication are essential in overcoming this problem. In this paper, we present an oversampling method based on word embedding compositionality which produces meaningful balanced training data. We first use a large corpus to train a continuous skip-gram model to form a word embedding model maintaining the syntactic and semantic integrity of the word features. Then, a compositional algorithm based on recursive neural tensor networks is used to construct sentence vectors based on the word embedding model. Finally, we use the SMOTE algorithm as an oversampling method to generate samples for the minority classes and produce a fully balanced training set. Evaluation results on two quite different tasks show that the feature composition method and the oversampling method are both important in obtaining improved classification results. Our method effectively addresses the data imbalance issue and consequently achieves improved results for both sentiment and emotion classification.
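
The oversampling step described above can be sketched as follows. This is a minimal, simplified version of SMOTE-style interpolation, not the authors' pipeline: the skip-gram embeddings and the recursive neural tensor network composition are not reproduced, and the toy 2-D "sentence vectors", the function name `smote_like` and its parameters are assumptions for illustration.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic vectors by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical sentence vectors for a minority emotion class.
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_vectors = smote_like(minority_vectors, n_new=5)
```

Because each synthetic vector lies on a line segment between two real minority vectors, the new samples are not simple duplicates, which is the property the abstract emphasizes. In practice a library implementation would normally be used instead of a hand-written one.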

Enterprise Information Systems | 2013

Discovering latent commercial networks from online financial news articles

Yunqing Xia; Weifeng Su; Raymond Y. K. Lau; Yi Liu

Unlike most online social networks where explicit links among individual users are defined, the relations among commercial entities (e.g. firms) may not be explicitly declared in commercial Web sites. One main contribution of this article is the development of a novel computational model for the discovery of the latent relations among commercial entities from online financial news. More specifically, a conditional random field (CRF) model which can exploit both structural and contextual features is applied to commercial entity recognition. In addition, a point-wise mutual information (PMI)-based unsupervised learning method is developed for commercial relation identification. To evaluate the effectiveness of the proposed computational methods, a prototype system called CoNet has been developed. Based on the financial news articles crawled from Google Finance, the CoNet system achieves average F-scores of 0.681 and 0.754 in commercial entity recognition and commercial relation identification, respectively. Our experimental results confirm that the proposed shallow natural language processing methods are effective for the discovery of latent commercial networks from online financial news.
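
For readers unfamiliar with point-wise mutual information, the sketch below shows the standard PMI formula computed from co-occurrence counts. It is only an illustration of the measure, not the CoNet implementation: the function name `pmi` and all counts are hypothetical, and the abstract does not say how the counts are gathered or thresholded.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from counts.

    count_xy: number of text units mentioning both entities
    count_x, count_y: number of text units mentioning each entity
    total: total number of text units
    """
    if count_xy == 0:
        return float("-inf")  # the entities never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: two firms co-occur in 20 of 1000 news sentences,
# and appear in 40 and 50 sentences respectively.
score = pmi(count_xy=20, count_x=40, count_y=50, total=1000)

# Two entities that co-occur exactly as often as chance predicts score about 0.
chance_score = pmi(count_xy=10, count_x=100, count_y=100, total=1000)
```

A clearly positive score suggests that two entities are mentioned together more often than independence would predict, which makes the pair a candidate for a latent relation.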

IEEE Computational Intelligence Magazine | 2016

Computational Intelligence for Big Social Data Analysis [Guest Editorial]

Erik Cambria; Newton Howard; Yunqing Xia; Tat-Seng Chua

The articles in this special section focus on computational intelligence for big social data analytics. In the eras of social connectedness and social colonization, people are becoming increasingly enthusiastic about interacting, sharing, and collaborating through online collaborative media. In recent years, this collective intelligence has spread to many different areas, with particular focus on fields related to everyday life such as commerce, tourism, education, and health, causing the size of the Social Web to expand exponentially. The distillation of knowledge from such a large amount of unstructured information, however, is an extremely difficult task, as the contents of today's Web are perfectly suitable for human consumption, but remain hardly understandable to machines. Big social data analysis grows out of this need and combines multiple disciplines such as social network analysis, multimedia management, social media analytics, trend discovery, and opinion mining.

Meeting of the Association for Computational Linguistics | 2006

A Phonetic-Based Approach to Chinese Chat Text Normalization

Yunqing Xia; Kam-Fai Wong; Wenjie Li

Chatting is a popular communication medium on the Internet via ICQ, chat rooms, etc. Chat language differs from natural language in its anomalous and dynamic nature, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes a static chat language corpus quickly outdated in representing contemporary chat language. To address the dynamic problem, we propose phonetic mapping models to represent mappings between chat terms and standard words via phonetic transcription, i.e. Chinese Pinyin in our case. Unlike character mappings, the phonetic mappings can be constructed from an available standard Chinese corpus. To perform the task of dynamic chat language term normalization, we extend the source channel model by incorporating the phonetic mapping models. Experimental results show that this method is effective and stable in normalizing dynamic chat language terms.