Xiaojun Quan | Researchain

Publication

Featured research published by Xiaojun Quan.

Knowledge and Information Systems | 2010

Short text similarity based on probabilistic topics

Xiaojun Quan; Gang Liu; Zhi Lu; Xingliang Ni; Liu Wenyin

In this paper, we propose a new method for measuring the similarity between two short text snippets by comparing each of them with probabilistic topics. Specifically, our method starts by finding the distinguishing terms between the two snippets and comparing them with a series of probabilistic topics extracted by the Gibbs sampling algorithm. The relationship between the distinguishing terms of the two snippets can be discovered by examining their probabilities under each topic. The similarity between the two snippets is then calculated from their common terms and the relationship between their distinguishing terms. Extensive experiments on paraphrasing and question categorization show that the proposed method calculates the similarity of short text snippets more accurately than other methods, including the pure TF-IDF measure.
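To make the scheme concrete, here is a minimal Python sketch under stated assumptions. It takes the topics as already estimated per-topic word probabilities p(w|z) (for instance from an LDA model fitted with Gibbs sampling, which is not done here), relates two distinguishing terms by the cosine of their topic-probability vectors, and combines common and distinguishing terms into a score normalized to [0, 1]. The relation measure, the `similarity` formula, and the toy topics are illustrative assumptions, not the paper's exact definitions.

```python
import math
from typing import Dict, List

# Assumed input: one dict per topic z mapping word w -> p(w | z),
# e.g. estimated by LDA with Gibbs sampling (not performed here).
Topics = List[Dict[str, float]]


def topic_vector(term: str, topics: Topics) -> List[float]:
    """Probability of `term` under each topic."""
    return [topic.get(term, 0.0) for topic in topics]


def relation(a: str, b: str, topics: Topics) -> float:
    """Hypothetical term relation: cosine of the two topic-probability vectors."""
    va, vb = topic_vector(a, topics), topic_vector(b, topics)
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a term absent from every topic relates to nothing
    return dot / (norm_a * norm_b)


def similarity(s1: str, s2: str, topics: Topics) -> float:
    """Common terms count fully; each distinguishing term counts by its best topical match."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    if not t1 or not t2:
        return 0.0
    common = t1 & t2
    d1, d2 = t1 - common, t2 - common  # distinguishing terms of each snippet
    score = 2.0 * len(common)
    if d1 and d2:
        score += sum(max(relation(a, b, topics) for b in d2) for a in d1)
        score += sum(max(relation(a, b, topics) for a in d1) for b in d2)
    return score / (len(t1) + len(t2))


topics = [{"car": 0.5, "automobile": 0.4, "engine": 0.1},
          {"apple": 0.6, "fruit": 0.4}]
print(similarity("buy a car", "buy an automobile", topics))  # ≈ 0.667 (2/6 without topics)
```

Because "car" and "automobile" load on the same topic, the pair scores well above plain term overlap; with an empty topic list the function degrades to overlap of common terms only.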

Future Generation Computer Systems | 2010

Discovering phishing target based on semantic link network

Liu Wenyin; Ning Fang; Xiaojun Quan; Bite Qiu; Gang Liu

We propose an approach to discovering the phishing target of a suspicious webpage based on constructing and reasoning over the Semantic Link Network (SLN) of that webpage. The SLN is constructed from the given suspicious webpage and its associated webpages. Since reasoning over the SLN can discover implicit relations among webpages, the true association relations between a phishing webpage and its target are acquired via reasoning. By analyzing these relations, the suspicious webpage can then be identified as phishing or not according to predefined rules, and, if it is phishing, its target can be discovered. Our test dataset consists of 1000 phishing pages selected from PhishTank and 1000 legitimate webpages. The experimental results show that the proposed method yields a false negative rate of 16.6% on the phishing pages and a false positive rate of 13.8% on the legitimate pages.
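The two reported figures are standard error rates, each computed within its own class: the false negative rate is the share of phishing pages classified as legitimate, and the false positive rate is the share of legitimate pages flagged as phishing. With 1000 pages in each class, the reported rates correspond to 166 missed phishing pages and 138 wrongly flagged legitimate pages; the short Python check below only makes that arithmetic explicit.

```python
def false_negative_rate(fn: int, tp: int) -> float:
    """Share of phishing pages (positives) that were classified as legitimate."""
    return fn / (fn + tp)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of legitimate pages (negatives) that were flagged as phishing."""
    return fp / (fp + tn)


PHISHING_PAGES = 1000    # test pages selected from PhishTank
LEGITIMATE_PAGES = 1000

# Counts implied by the reported rates of 16.6% and 13.8%.
missed_phishing = round(0.166 * PHISHING_PAGES)        # 166
flagged_legitimate = round(0.138 * LEGITIMATE_PAGES)   # 138

fnr = false_negative_rate(missed_phishing, PHISHING_PAGES - missed_phishing)
fpr = false_positive_rate(flagged_legitimate, LEGITIMATE_PAGES - flagged_legitimate)
print(f"FNR = {fnr:.1%}, FPR = {fpr:.1%}")  # FNR = 16.6%, FPR = 13.8%
```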

Knowledge and Information Systems | 2011

Short text clustering by finding core terms

Xingliang Ni; Xiaojun Quan; Zhi Lu; Liu Wenyin; Bei Hua

A new clustering strategy, TermCut, is presented to cluster short text snippets by finding core terms in the corpus. We model the collection of short text snippets as a graph in which each vertex represents a short text snippet and each weighted edge between two vertices measures the relationship between them. TermCut is then applied recursively to select a core term and bisect the graph such that the snippets in one part of the graph contain the term, whereas those in the other part do not. We apply the proposed method to different types of short text snippets, including questions and search results. Experimental results show that the proposed method outperforms state-of-the-art clustering algorithms on short text snippets.
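The recursive core-term bisection can be sketched in Python as follows. This is a simplified illustration under stated assumptions: edge weights are Jaccard similarities between the snippets' term sets, the core term at each step is the one whose contains/does-not-contain split has the lowest normalized cut, and recursion stops at a hypothetical `min_size` or when no split scores below a hypothetical `max_ncut` threshold. TermCut's actual selection criterion and stopping rule are defined in the paper and may differ.

```python
from typing import List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def term_cut(docs: List[Set[str]], min_size: int = 2, max_ncut: float = 0.5) -> List[List[int]]:
    """Recursively bisect snippets by a core term; returns clusters of snippet indices."""
    n = len(docs)
    # Weighted graph: one vertex per snippet, edge weight = Jaccard similarity (assumption).
    w = [[jaccard(docs[i], docs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]

    def ncut(a: List[int], b: List[int]) -> float:
        """Normalized cut of the partition (a, b) within the subgraph a + b."""
        cut = sum(w[i][j] for i in a for j in b)
        nodes = a + b
        vol_a = sum(w[i][j] for i in a for j in nodes)
        vol_b = sum(w[i][j] for i in b for j in nodes)
        if vol_a == 0.0 or vol_b == 0.0:
            return float("inf")
        return cut / vol_a + cut / vol_b

    def split(idx: List[int]) -> List[List[int]]:
        if len(idx) <= min_size:
            return [idx]
        best: Optional[Tuple[float, List[int], List[int]]] = None
        for term in sorted(set().union(*(docs[i] for i in idx))):
            with_term = [i for i in idx if term in docs[i]]
            without_term = [i for i in idx if term not in docs[i]]
            if not with_term or not without_term:
                continue  # this term cannot bisect the current subgraph
            score = ncut(with_term, without_term)
            if best is None or score < best[0]:
                best = (score, with_term, without_term)
        if best is None or best[0] > max_ncut:
            return [idx]  # no core term gives a clean enough cut
        return split(best[1]) + split(best[2])

    return split(list(range(n)))


snippets = [
    {"python", "list", "sort"}, {"python", "dict", "sort"}, {"python", "list", "append"},
    {"football", "world", "cup"}, {"football", "cup", "final"}, {"football", "league", "final"},
]
print(term_cut(snippets))  # [[3, 4, 5], [0, 1, 2]]
```

Here "football" (or equivalently "python") separates the two groups with a cut of zero, and no further term splits either group cleanly, so recursion stops. Scoring every candidate term against the full weight matrix is quadratic in the number of snippets per term, which is acceptable only for small illustrative inputs.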

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Term Weighting Schemes for Question Categorization

Xiaojun Quan; Liu Wenyin; Bite Qiu

Term weighting has proven to be an effective way to improve the performance of text categorization. Recently, with the development of user-interactive question answering, or community question answering, a need has emerged to accurately categorize questions into predefined categories. However, since a question is usually a short piece of text, can existing term-weighting methods perform as consistently in question categorization as they do in text categorization? The answer is unclear because, to the best of our knowledge, no prior work has addressed this problem despite its significance. In this study, we investigate popular unsupervised and supervised term-weighting methods for question categorization. We also propose three new supervised term-weighting methods: gf*icf, igf*gf*icf, and vrf. These are compared with existing unsupervised and supervised term-weighting methods through a series of experiments on question collections from Yahoo! Answers. The experimental results show that igf*gf*icf achieves the best performance among all term-weighting methods, while gf*icf and vrf are also competitive for question categorization. Among the existing methods, tf*OR performs best. In addition, igf*gf*icf and vrf are also effective for long document categorization.
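The proposed gf*icf, igf*gf*icf, and vrf schemes are not defined in this abstract, so the Python sketch below only illustrates the two kinds of existing schemes they are compared against: an unsupervised weight (tf*idf) and a supervised one (tf*OR, where OR is the odds ratio of a term with respect to a category). The odds-ratio formula with 0.5 smoothing is one common formulation and is assumed here; the exact variant used in the paper may differ. The toy question collection is hypothetical.

```python
import math
from typing import List

Question = List[str]  # a tokenized question


def idf(term: str, questions: List[Question]) -> float:
    """Unsupervised global weight: log(N / document frequency)."""
    df = sum(1 for q in questions if term in q)
    return math.log(len(questions) / df) if df else 0.0


def odds_ratio(term: str, questions: List[Question], labels: List[str], category: str) -> float:
    """Supervised global weight: odds of `term` in `category` vs. other categories (0.5-smoothed)."""
    a = b = c = d = 0
    for q, label in zip(questions, labels):
        in_cat, has_term = label == category, term in q
        if in_cat and has_term:
            a += 1  # positive questions containing the term
        elif has_term:
            b += 1  # negative questions containing the term
        elif in_cat:
            c += 1  # positive questions without the term
        else:
            d += 1  # negative questions without the term
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def tf_weight(term: str, question: Question, global_weight: float) -> float:
    """Local term frequency times a global factor (idf or OR)."""
    return question.count(term) * global_weight


questions = [["how", "to", "cook", "rice"], ["best", "rice", "cooker"],
             ["how", "to", "fix", "car"], ["car", "engine", "noise"]]
labels = ["food", "food", "auto", "auto"]

# idf gives a category-specific term and a generic term the same weight here...
print(idf("rice", questions), idf("how", questions))   # both log(2)
# ...while the odds ratio for the "food" category separates them.
print(odds_ratio("rice", questions, labels, "food"))   # 25.0
print(odds_ratio("how", questions, labels, "food"))    # 1.0
```

The contrast is the usual motivation for supervised weighting: the global factor uses the category labels, so a term concentrated in one category is boosted even when its document frequency matches that of a generic word.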

Neural Networks | 2014

2014 Special Issue: Affective topic model for social emotion detection

Yanghui Rao; Qing Li; Liu Wenyin; Qingyuan Wu; Xiaojun Quan

The rapid development of social media services has been a great boon for the communication of emotions through blogs, microblogs/tweets, instant-messaging tools, news portals, and so forth. This paper is concerned with detecting the emotions evoked in readers by social media. Compared with classical sentiment analysis conducted from the writer's perspective, analysis from the reader's perspective can be more meaningful when applied to social media. We propose an affective topic model intended to bridge the gap between social media materials and readers' emotions by introducing an intermediate layer. The proposed model can be used to classify the social emotions of unlabeled documents and to generate a social emotion lexicon. Extensive evaluations on real-world data validate the effectiveness of the proposed model for both applications.

Future Generation Computer Systems | 2014

Towards building a social emotion detection system for online news

Jingsheng Lei; Yanghui Rao; Qing Li; Xiaojun Quan; Liu Wenyin

Social emotion detection of online users has become an important task for mining public opinion. Social emotion detection aims to predict the emotions that news articles, tweets, and similar content evoke in readers. In this article, we focus on building a social emotion detection system for online news. The system is built from three modules: document selection, part-of-speech (POS) tagging, and social emotion lexicon generation. Extensive empirical studies are conducted on a large-scale, real-world collection of news articles. Experiments show that the document selection algorithm has a positive effect on social emotion detection. The system performs better with a combination of words and POS tags than with a feature set consisting only of words. POS tags are also useful for detecting the emotional ambiguity of words and the context dependence of their sentiment orientations. Furthermore, the proposed lexicon generation method outperforms the baselines in social emotion prediction.
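The idea of a social emotion lexicon built from word/POS features can be illustrated with a deliberately simple Python sketch. It assumes each training article comes with reader emotion vote counts and with (word, POS) tokens already produced by some tagger, builds each lexicon entry by averaging the normalized vote distributions of the articles containing that feature, and predicts the emotion with the highest summed lexicon score. This averaging scheme, the function names, and the toy data are hypothetical; they are not the system's actual document-selection or lexicon-generation algorithms.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

TaggedDoc = List[Tuple[str, str]]  # (word, POS) pairs from any POS tagger


def features(doc: TaggedDoc, use_pos: bool = True) -> List[str]:
    """Word/POS features such as 'win/VB', or bare words when use_pos is False."""
    return [f"{word}/{pos}" if use_pos else word for word, pos in doc]


def build_lexicon(docs: List[TaggedDoc], votes: List[Dict[str, int]],
                  use_pos: bool = True) -> Dict[str, Dict[str, float]]:
    """Entry = mean normalized reader-vote distribution of the documents containing the feature."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for doc, vote in zip(docs, votes):
        total = sum(vote.values())
        if total == 0:
            continue  # no reader feedback for this article
        for feat in set(features(doc, use_pos)):
            counts[feat] += 1
            for emotion, n in vote.items():
                sums[feat][emotion] += n / total
    return {feat: {e: s / counts[feat] for e, s in dist.items()} for feat, dist in sums.items()}


def predict(doc: TaggedDoc, lexicon: Dict[str, Dict[str, float]],
            use_pos: bool = True) -> Optional[str]:
    """Emotion with the largest summed lexicon score, or None if no feature is known."""
    scores: Dict[str, float] = defaultdict(float)
    for feat in features(doc, use_pos):
        for emotion, p in lexicon.get(feat, {}).items():
            scores[emotion] += p
    return max(scores, key=scores.get) if scores else None


train_docs = [[("win", "VB"), ("match", "NN")],
              [("lose", "VB"), ("match", "NN")],
              [("win", "VB"), ("award", "NN")]]
train_votes = [{"joy": 8, "sadness": 2}, {"joy": 1, "sadness": 9}, {"joy": 10}]
lexicon = build_lexicon(train_docs, train_votes)
print(predict([("lose", "VB"), ("match", "NN")], lexicon))  # sadness
```

Keying entries on word/POS rather than the bare word lets the same surface word carry different emotion profiles under different tags; the `use_pos` flag switches back to a words-only feature set for comparison.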

Information Processing and Management | 2012

User interest modeling and its application for question recommendation in user-interactive question answering systems

Xingliang Ni; Yao Lu; Xiaojun Quan; Liu Wenyin; Bei Hua

In this paper, we propose a generative model, the Topic-based User Interest (TUI) model, to capture user interest in User-Interactive Question Answering (UIQA) systems. Specifically, our method models user interest in UIQA systems with a latent topic method, extracting each user's interests by mining the questions they asked, the categories they participated in, and the relevant answer providers. We apply the TUI model to question recommendation, which automatically recommends to a given user the questions he or she might be interested in. A data collection from Yahoo! Answers is used to evaluate the proposed model for question recommendation, and the experimental results show its effectiveness.
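As a rough illustration of how a topic-based interest profile can drive question recommendation, the Python sketch below represents each question by a topic distribution (assumed to come from some latent topic model, which is not fitted here), takes a user's interest to be the average distribution of the questions that user asked, and ranks open questions by their dot product with that profile. This is a simplification for illustration only: the TUI model also uses the categories a user participated in and the relevant answer providers, which are omitted, and the function names and data are hypothetical.

```python
from typing import Dict, List

TopicDist = List[float]  # p(topic | question), assumed precomputed by a topic model


def user_interest(asked: List[TopicDist]) -> TopicDist:
    """Average topic distribution of the questions a user has asked."""
    k = len(asked[0])
    return [sum(q[t] for q in asked) / len(asked) for t in range(k)]


def recommend(interest: TopicDist, candidates: Dict[str, TopicDist], top_k: int = 2) -> List[str]:
    """Rank candidate question IDs by dot product with the interest profile (ties by ID)."""
    def score(qid: str) -> float:
        return sum(i * p for i, p in zip(interest, candidates[qid]))
    return sorted(candidates, key=lambda qid: (-score(qid), qid))[:top_k]


interest = user_interest([[0.9, 0.1], [0.7, 0.3]])   # ≈ [0.8, 0.2]
open_questions = {"q1": [0.1, 0.9], "q2": [0.95, 0.05], "q3": [0.5, 0.5]}
print(recommend(interest, open_questions))  # ['q2', 'q3']
```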

Information Processing and Management | 2011

Automatic categorization of questions for user-interactive question answering

Wanpeng Song; Liu Wenyin; Naijie Gu; Xiaojun Quan; Tianyong Hao

Question categorization, which suggests one of a set of predefined categories for a user's question according to the question's topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. The method includes four steps: feature space construction, topic-wise word identification and weighting, semantic mapping, and similarity calculation. We first construct the feature space from all accumulated questions and calculate the feature vector of each predefined category from the accumulated questions it contains. When a new question is posted, its semantic pattern is used to identify and weight its important words. The question is then semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated from their feature vectors, and the category with the highest similarity is assigned to the question. The experimental results show that the proposed method achieves good categorization precision and outperforms traditional categorization methods on the selected test questions.
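The first and last steps (category feature vectors built from accumulated questions, then assignment to the most similar category) can be sketched in Python with bag-of-words vectors and cosine similarity. The pattern-based word weighting and semantic mapping steps are omitted, and whitespace tokenization, raw term counts, cosine similarity, and the example categories are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def category_vectors(accumulated: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Feature vector of each category = summed term counts of its accumulated questions."""
    return {cat: sum((vectorize(q) for q in qs), Counter()) for cat, qs in accumulated.items()}


def categorize(question: str, cat_vecs: Dict[str, Counter]) -> str:
    """Assign the category whose feature vector is most similar to the new question."""
    q = vectorize(question)
    return max(cat_vecs, key=lambda cat: cosine(q, cat_vecs[cat]))


cat_vecs = category_vectors({
    "Cooking": ["how do I cook rice", "best way to bake bread"],
    "Cars": ["why does my car engine overheat", "how to change car oil"],
})
print(categorize("how long to cook brown rice", cat_vecs))  # Cooking
```

Short questions share few words with any category, which is exactly why the full method enriches the question by semantic mapping before this similarity step.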

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Emotion tagging for comments of online news by meta classification with heterogeneous information sources

Ying Zhang; Yi Fang; Xiaojun Quan; Lin Dai; Luo Si; Xiaojie Yuan

With the rapid growth of online news services, users can actively respond to online news by posting comments. Users often express subjective emotions such as sadness, surprise, and anger in their comments. Such emotions can help reveal the preferences and perspectives of individual users and may therefore help online publishers provide users with more relevant services. This paper tackles the task of predicting emotions for the comments on online news. To the best of our knowledge, this is the first research work to address this task. In particular, this paper proposes a novel meta-classification approach that exploits heterogeneous information sources, such as the content of the comments and the user-generated emotion tags of news articles. Experiments on two datasets from online news services demonstrate the effectiveness of the proposed approach.
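Meta classification (stacking) trains a second-level classifier on the outputs of base classifiers built from different information sources. The Python sketch below assumes two hypothetical base classifiers have already produced emotion probability vectors for each comment, one from the comment text and one from the emotion tags of the parent news article, and uses a nearest-centroid meta classifier over their concatenated outputs. The choice of meta learner, the emotion set, and the numbers are illustrative assumptions, not the paper's configuration.

```python
from typing import Dict, List, Sequence


def meta_features(base_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-source probability vectors into one meta-level feature vector.

    Each base output is assumed to cover the same emotion labels in the same order,
    e.g. (sadness, surprise, anger).
    """
    return [p for output in base_outputs for p in output]


class NearestCentroidMeta:
    """Second-level classifier: predicts the label whose mean meta-feature vector is closest."""

    def fit(self, X: List[List[float]], y: List[str]) -> "NearestCentroidMeta":
        grouped: Dict[str, List[List[float]]] = {}
        for x, label in zip(X, y):
            grouped.setdefault(label, []).append(x)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, x: List[float]) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


# Each training comment: [content-classifier output, news-tag-classifier output], gold emotion.
train = [
    ([[0.7, 0.2, 0.1], [0.6, 0.2, 0.2]], "sadness"),
    ([[0.5, 0.3, 0.2], [0.8, 0.1, 0.1]], "sadness"),
    ([[0.2, 0.2, 0.6], [0.1, 0.2, 0.7]], "anger"),
    ([[0.3, 0.1, 0.6], [0.2, 0.1, 0.7]], "anger"),
]
meta = NearestCentroidMeta().fit([meta_features(o) for o, _ in train], [y for _, y in train])

# The content classifier alone is unsure here; the news-tag source tips the decision.
print(meta.predict(meta_features([[0.4, 0.3, 0.3], [0.7, 0.2, 0.1]])))  # sadness
```

In practice, the base-classifier outputs used to train the meta classifier should come from held-out data (for example, cross-validation folds), so the meta level does not learn from overfit predictions.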

Semantics, Knowledge and Grid | 2009

Latent Link Analysis for Expert Finding in User-Interactive Question Answering Services

Yao Lu; Xiaojun Quan; Xingliang Ni; Wenyin Liu; Yinlong Xu

In this paper, we propose a latent link analysis approach for improving the accuracy of expert finding in User-Interactive Question Answering (UIQA) services. Both direct and latent relationship links are considered in the link analysis of the user relation graph model. In the graph model, direct links are acquired from the question-answer relationships between users, whereas latent links, which reveal latent question-answer relations between users, are obtained indirectly by measuring the similarity between users' asked-question and answered-question profiles. Finally, a propagation-based approach is employed to calculate an expert score that evaluates each user's expertise level. Experimental results show that our approach performs better than a method that does not consider latent relationship links in terms of expert finding and ranking.
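A compact Python sketch of the idea: direct links run from an asker to the user who answered, latent links are added from user u to user v when u's asked-question profile is similar enough to v's answered-question profile, and expert scores are then propagated over the weighted graph. The cosine similarity over bag-of-words profiles, the `threshold` value, and the PageRank-style power iteration with damping 0.85 are assumptions standing in for the paper's propagation-based approach; all names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(qa_pairs: List[Edge], asked: Dict[str, Counter], answered: Dict[str, Counter],
                threshold: float = 0.3) -> Tuple[Set[str], Dict[Edge, float]]:
    """Direct links: asker -> answerer. Latent links: asked(u) similar to answered(v)."""
    weights: Dict[Edge, float] = defaultdict(float)
    for asker, answerer in qa_pairs:
        if asker != answerer:
            weights[(asker, answerer)] += 1.0
    users = set(asked) | set(answered) | {u for pair in qa_pairs for u in pair}
    for u in users:
        for v in users:
            if u == v or (u, v) in weights:
                continue  # latent links only where no direct link exists
            s = cosine(asked.get(u, Counter()), answered.get(v, Counter()))
            if s >= threshold:
                weights[(u, v)] += s
    return users, dict(weights)


def expert_scores(users: Set[str], weights: Dict[Edge, float],
                  damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """PageRank-style propagation: expertise flows from askers to the users they link to."""
    n = len(users)
    out_weight: Dict[str, float] = defaultdict(float)
    for (u, _), w in weights.items():
        out_weight[u] += w
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users without outgoing links spread their score uniformly.
        dangling = sum(score[u] for u in users if out_weight[u] == 0.0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for (u, v), w in weights.items():
            new[v] += damping * score[u] * w / out_weight[u]
        score = new
    return score


asked = {"alice": Counter("python list sort".split()),
         "bob": Counter("python dict".split()),
         "carol": Counter("car engine".split())}
answered = {"dave": Counter("python list dict sort".split()),
            "erin": Counter("car oil".split())}
users, weights = build_graph([("alice", "dave")], asked, answered)
scores = expert_scores(users, weights)
print(sorted(scores, key=scores.get, reverse=True)[:2])  # ['dave', 'erin']
```

Only alice asked dave directly, yet bob's and carol's latent links raise dave and lift erin above users who never answered anything, which is the effect the latent links are meant to capture.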
