Guang Qiu
Zhejiang University
Publications
Featured research published by Guang Qiu.
Computational Linguistics | 2011
Guang Qiu; Bing Liu; Jiajun Bu; Chun Chen
Analysis of opinions, known as opinion mining or sentiment analysis, has recently attracted a great deal of attention owing to its many practical applications and challenging research problems. In this article, we study two important problems, namely opinion lexicon expansion and opinion target extraction. Opinion targets (targets, for short) are entities and their attributes on which opinions have been expressed. We found that several syntactic relations link opinion words and targets. These relations can be identified using a dependency parser and then used to expand the initial opinion lexicon and to extract targets. The proposed method is based on bootstrapping. We call it double propagation because it propagates information between opinion words and targets. A key advantage of the method is that it needs only an initial opinion lexicon to start the bootstrapping process; the method is thus semi-supervised, owing to its use of opinion word seeds. In evaluation, we compare the proposed method with several state-of-the-art methods using a standard product review test collection. The results show that our approach significantly outperforms these existing methods.
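The bootstrapping loop behind double propagation can be illustrated with a toy sketch. This is not the authors' rule set or implementation. It makes several assumptions:

- Sentences arrive already parsed into (head, relation, dependent) triples.
- A hypothetical `pos` map marks adjectives as "JJ" and nouns as "NN".
- Only two simplified relations are modeled. "amod" links an opinion adjective to a target noun. "conj" links two words that play the same role.

```python
# Hypothetical input format: each sentence is a list of dependency
# triples (head, relation, dependent); `pos` maps words to "JJ"/"NN".
# Only two simplified rules are modeled here ("amod" and "conj").

def double_propagation(sentences, pos, seed_opinions):
    """Alternately grow the opinion lexicon and the target set until
    a full pass over the corpus finds nothing new."""
    opinions, targets = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for triples in sentences:
            for head, rel, dep in triples:
                new_ops, new_tgts = set(), set()
                if rel == "amod" and pos.get(head) == "NN" and pos.get(dep) == "JJ":
                    if dep in opinions:      # known opinion word -> new target
                        new_tgts.add(head)
                    if head in targets:      # known target -> new opinion word
                        new_ops.add(dep)
                elif rel == "conj" and pos.get(head) == pos.get(dep):
                    pair = {head, dep}       # coordinated words share a role
                    if pos.get(head) == "JJ" and pair & opinions:
                        new_ops |= pair
                    if pos.get(head) == "NN" and pair & targets:
                        new_tgts |= pair
                if not (new_ops <= opinions and new_tgts <= targets):
                    opinions |= new_ops
                    targets |= new_tgts
                    changed = True
    return opinions, targets


pos = {"screen": "NN", "battery": "NN",
       "great": "JJ", "bright": "JJ", "durable": "JJ"}
sentences = [
    [("screen", "amod", "great")],     # "great screen"
    [("screen", "amod", "bright")],    # "bright screen"
    [("screen", "conj", "battery")],   # "screen and battery"
    [("battery", "amod", "durable")],  # "durable battery"
]
opinions, targets = double_propagation(sentences, pos, {"great"})
# opinions == {"great", "bright", "durable"}, targets == {"screen", "battery"}
```

Starting from the single seed "great", information flows in both directions. The seed yields the target "screen". "screen" yields the opinion word "bright" and, through coordination, the target "battery". "battery" in turn yields "durable".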
IEEE Transactions on Image Processing | 2011
Miao Zheng; Jiajun Bu; Chun Chen; Can Wang; Lijun Zhang; Guang Qiu; Deng Cai
Sparse coding has received increasing interest in recent years. It is an unsupervised learning algorithm that finds a basis set capturing high-level semantics in the data and learns sparse coordinates in terms of that basis set. Originally applied to modeling the human visual cortex, sparse coding has been shown to be useful for many applications. However, most existing approaches to sparse coding fail to consider the geometrical structure of the data space. In many real applications, the data are more likely to reside on a low-dimensional submanifold embedded in the high-dimensional ambient space, and it has been shown that the geometrical information of the data is important for discrimination. In this paper, we propose a graph-based algorithm, called graph regularized sparse coding, to learn sparse representations that explicitly take into account the local manifold structure of the data. By using the graph Laplacian as a smoothing operator, the obtained sparse representations vary smoothly along the geodesics of the data manifold. Extensive experimental results on image classification and clustering demonstrate the effectiveness of the proposed algorithm.
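The way a graph Laplacian enforces smoothness can be checked numerically. The sketch below makes these assumptions:

- The regularizer takes the usual trace form Tr(S L S^T).
- The columns s_i of S are the sparse codes of the samples.
- L = D - W is built from a symmetric affinity matrix W.

The sketch verifies that this trace equals one half of the sum over i, j of W_ij ||s_i - s_j||^2. Minimizing the term therefore pulls the codes of neighboring samples together. The tiny chain graph and codes are made-up data, and the full sparse coding optimization is not shown.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def trace_penalty(codes, L):
    """Tr(S L S^T) = sum_ij L_ij <s_i, s_j>, where codes[i] is s_i."""
    n = len(codes)
    return sum(L[i][j] * dot(codes[i], codes[j]) for i in range(n) for j in range(n))


def pairwise_penalty(codes, W):
    """1/2 * sum_ij W_ij ||s_i - s_j||^2: large when linked samples get different codes."""
    n = len(codes)
    return 0.5 * sum(W[i][j] * sum((a - b) ** 2 for a, b in zip(codes[i], codes[j]))
                     for i in range(n) for j in range(n))


# Toy chain graph 0 - 1 - 2 and 2-dimensional codes (made-up values).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
codes = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
t = trace_penalty(codes, laplacian(W))
p = pairwise_penalty(codes, W)
```

Both quantities evaluate to 5 here. The whole penalty comes from the edge between samples 1 and 2, whose codes differ. Samples 0 and 1 share a code and contribute nothing.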
international world wide web conferences | 2009
Mingcheng Qu; Guang Qiu; Xiaofei He; Cheng Zhang; Hao Wu; Jiajun Bu; Chun Chen
User-Interactive Question Answering (QA) communities such as Yahoo! Answers are growing in popularity. However, as these QA sites always have thousands of new questions posted daily, it is difficult for users to find the questions that are of interest to them. Consequently, this may delay the answering of the new questions. This gives rise to question recommendation techniques that help users locate interesting questions. In this paper, we adopt the Probabilistic Latent Semantic Analysis (PLSA) model for question recommendation and propose a novel metric to evaluate the performance of our approach. The experimental results show our recommendation approach is effective.
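For readers unfamiliar with PLSA, the sketch below fits the standard model P(w|d) = sum over z of P(z|d) P(w|z) by expectation maximization on a toy corpus. It makes several assumptions:

- A user's interests are represented as one more "document" built from the user's question history.
- The `topic_match` score is a simple dot product of topic mixtures.
- All names and data are hypothetical.

This is not the paper's recommendation scheme or its evaluation metric.

```python
import random


def _normalize(weights):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(weights.values())
    if total == 0:  # degenerate case: fall back to uniform
        return {k: 1.0 / len(weights) for k in weights}
    return {k: v / total for k, v in weights.items()}


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts: {doc_id: {word: count}}; returns (P(z|d), P(w|z))."""
    rng = random.Random(seed)
    docs = list(counts)
    vocab = sorted({w for d in docs for w in counts[d]})
    topics = range(n_topics)
    p_z_d = {d: _normalize({z: rng.random() for z in topics}) for d in docs}
    p_w_z = {z: _normalize({w: rng.random() for w in vocab}) for z in topics}

    for _ in range(n_iter):
        acc_z_d = {d: {z: 0.0 for z in topics} for d in docs}
        acc_w_z = {z: {w: 0.0 for w in vocab} for z in topics}
        for d in docs:
            for w, n in counts[d].items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                post = _normalize({z: p_z_d[d][z] * p_w_z[z][w] for z in topics})
                for z in topics:
                    # Accumulate expected counts n(d,w) * P(z|d,w) for the M-step
                    acc_z_d[d][z] += n * post[z]
                    acc_w_z[z][w] += n * post[z]
        # M-step: renormalize expected counts into probabilities
        p_z_d = {d: _normalize(acc_z_d[d]) for d in docs}
        p_w_z = {z: _normalize(acc_w_z[z]) for z in topics}
    return p_z_d, p_w_z


def topic_match(p_user, p_question):
    """Hypothetical interest score: dot product of two topic mixtures."""
    return sum(p_user[z] * p_question[z] for z in p_user)


# Toy corpus: a user's history plus two new questions (made-up data).
corpus = {
    "user_history": {"python": 3, "list": 2},
    "q_new_1": {"python": 2, "dict": 2},
    "q_new_2": {"recipe": 3, "oven": 2},
}
p_z_d, p_w_z = plsa(corpus, n_topics=2)
scores = {q: topic_match(p_z_d["user_history"], p_z_d[q]) for q in ("q_new_1", "q_new_2")}
```

Ranking new questions by `scores` would be one simple way to turn the fitted topic mixtures into recommendations. PLSA can converge to different local optima, so the fixed `seed` only makes a run repeatable. It does not make the result optimal.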
Expert Systems With Applications | 2010
Guang Qiu; Xiaofei He; Feng Zhang; Yuan Shi; Jiajun Bu; Chun Chen
Online advertising has become one of the major revenue sources of today's Internet ecosystem. The main advertising channels used to distribute textual ads are sponsored search and contextual advertising. Here we consider the problem of contextual advertising, i.e., associating ads with a Web page. Most previous work focuses only on the topical relevance of ads and ignores consumer attitudes. In this paper, we propose a novel advertising strategy, called Dissatisfaction-oriented Advertising based on Sentiment Analysis (DASA), to simultaneously improve ad relevance and user experience. Specifically, using syntactic parsing and a sentiment dictionary, we propose a rule-based approach to extract the topic words of opinion sentences with negative sentiment; these words are regarded as the advertising keywords. We also design a prototype system through which product information is submitted for ad selection. Taking consumer attitudes into account, we promote competitors of the products with which consumers are not satisfied. Experimental results on advertising keyword extraction and ad selection demonstrate the effectiveness of the proposed approach.
international acm sigir conference on research and development in information retrieval | 2007
Guang Qiu; Kangmiao Liu; Jiajun Bu; Chun Chen; Zhiming Kang
Query ambiguity prevents existing retrieval systems from returning reasonable results for every query. Since much work has already been done on resolving ambiguity, vague queries could be handled separately by the corresponding approaches if they can be identified in advance. Quantifying the degree of ambiguity (or lack of it) lays the groundwork for such identification. In this poster, we propose such a measure using query topics based on a topic structure selected from the Open Directory Project (ODP) taxonomy. We introduce a clarity score to quantify the lack of ambiguity with respect to data sets constructed from the TREC collections, and rank correlation tests demonstrate a strong positive association between the clarity scores and the retrieval precision of queries.
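One way to picture a topic-based clarity score comes from the original clarity score for queries. That score is a Kullback-Leibler divergence between a query model and a collection model. The sketch below makes these assumptions:

- The same KL form is applied to topic distributions.
- P(topic | query) is measured against a background P(topic).
- The category names and probabilities are invented for illustration.

The paper's exact estimator is not given in the abstract. A no-ties Spearman coefficient stands in for the rank correlation test.

```python
from math import log2


def clarity(p_topic_given_query, p_topic_background):
    """KL divergence (bits) of a query's topic distribution from a background one.
    A query spread evenly over topics scores 0; a focused query scores high."""
    return sum(
        p * log2(p / p_topic_background[t])
        for t, p in p_topic_given_query.items()
        if p > 0  # 0 * log(0) is taken as 0
    )


def spearman_rho(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up topic distributions over four ODP-style top-level categories.
background = {"Arts": 0.25, "Computers": 0.25, "Health": 0.25, "Sports": 0.25}
focused = {"Arts": 0.01, "Computers": 0.97, "Health": 0.01, "Sports": 0.01}
vague = dict(background)
```

Under this reading, a query concentrated on one category gets a high clarity score, and one spread evenly across categories gets 0. Correlating scores with per-query precision then tests whether clearer queries really are retrieved better.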
computational intelligence and security | 2007
Peng Huang; Jiajun Bu; Chun Chen; Guang Qiu
Question classification is one of the most important subtasks in Question Answering systems. Question taxonomies are becoming larger and more fine-grained to support better answer generation. Many approaches to question classification have been proposed and achieve reasonable results. However, all previous approaches use some learning algorithm to learn a classifier from binary feature vectors extracted from a small set of labeled examples. In this paper we propose a feature-weighting model that assigns different weights to features instead of simple binary values. The main characteristic of this model is that it assigns more reasonable weights to features: these weights can be used to differentiate features from each other according to their contribution to question classification. Furthermore, features are weighted using not only a small labeled question collection but also a large unlabeled question collection. Experimental results show that with this new feature-weighting model, the SVM-based classifier outperforms the one without it to some extent.
international world wide web conferences | 2009
Hao Wu; Guang Qiu; Xiaofei He; Yuan Shi; Mingcheng Qu; Jing Shen; Jiajun Bu; Chun Chen
This paper proposes an efficient relevance feedback based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a supervised learning problem and suggest new terms for the seed by leveraging user relevance feedback information. Active learning is employed to select the most informative samples from a set of candidate terms for user labeling. Experiments show our approach improves the relevance of generated terms significantly with little user effort required.
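Active learning selects the candidates whose labels are expected to be most informative. A common criterion for this is uncertainty sampling: ask the user about the terms the current model is least sure of. The sketch below assumes the model outputs a probability of relevance for each candidate term. The abstract does not say which selection criterion the paper uses, and the seed-related terms are invented.

```python
def select_for_labeling(prob_relevant, k):
    """Uncertainty sampling: return the k candidate terms whose predicted
    probability of relevance is closest to 0.5 (ties broken alphabetically)."""
    return sorted(prob_relevant, key=lambda t: (abs(prob_relevant[t] - 0.5), t))[:k]


# Hypothetical model outputs for candidate terms suggested for the seed "laptop".
candidates = {
    "notebook computer": 0.93,  # confidently relevant: little to learn
    "laptop bag": 0.55,         # uncertain: worth asking the user
    "desk lamp": 0.08,          # confidently irrelevant
    "ultrabook": 0.46,          # uncertain: worth asking the user
}
to_label = select_for_labeling(candidates, 2)
```

After the user labels the selected terms, the ranking model would be retrained and the selection repeated. That loop is what keeps the labeling effort small.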
asian conference on intelligent information and database systems | 2009
Feng Zhang; Guang Qiu; Jiajun Bu; Mingcheng Qu; Chun Chen
Online advertising has now become one of the major revenue sources for today's Internet companies. Among the different advertising channels, contextual advertising accounts for a large share. Many studies have addressed keyword extraction for contextual advertising in English; however, little work has been done for Chinese, which differs substantially from English linguistically. In this paper, we focus on the problem of Chinese advertising keyword extraction and propose a novel classification-based approach. We adopt C4.5 as the classifier model and select appropriate features that take Chinese linguistic characteristics into consideration. The experimental results indicate that our approach is promising.
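C4.5 chooses its splits by gain ratio. Gain ratio is information gain divided by the split information of the feature, which penalizes features with many distinct values. The sketch below computes it for one categorical feature. It assumes candidate words are labeled 1 (keyword) or 0 (not). The feature values shown are hypothetical and are not the paper's feature set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for one categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical candidate words: is each one an advertising keyword (1) or not (0)?
labels = [1, 1, 0, 0]
pos_tag = ["noun", "noun", "verb", "verb"]       # separates the classes perfectly
in_title = ["yes", "no", "yes", "no"]            # carries no class information
print(gain_ratio(pos_tag, labels))   # 1.0
print(gain_ratio(in_title, labels))  # 0.0
```

A full C4.5 tree applies this criterion recursively and adds steps that are not shown here. These include handling continuous attributes, handling missing values, and pruning.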
international workshop on data mining and audience intelligence for advertising | 2007
Guang Qiu; Kangmiao Liu; Jiajun Bu; Chun Chen; Zhiming Kang
Previous work on opinion/sentiment mining focuses only on sentiment classification, under the assumption that topics are identified a priori. However, this assumption often fails in reality. In advertising, the topics on which users are commenting are crucial, as corresponding advertisements can be promoted only when advertisers know what users are referring to. In this paper, we propose a rule-based approach to extracting topics from opinion sentences, assuming these sentences have already been identified in the text. We build a sentiment dictionary and define several rules based on the syntactic roles of words using Dependency Grammar, which is considered more suitable for parsing Chinese natural language. The experiments show encouraging results.
signal-image technology and internet-based systems | 2007
Peng Huang; Jiajun Bu; Chun Chen; Kangmiao Liu; Guang Qiu
Automatic image annotation is a promising methodology for image retrieval. However, most current annotation models are not yet sophisticated enough to produce high-quality annotations: given an image, they produce some keywords irrelevant to its content, which is a primary obstacle to high-quality image retrieval. In this paper, we propose an approach that improves automatic image annotation in two directions. The first is to combine the annotation keywords produced by three classic underlying annotation models (the translation model, the continuous-space relevance model, and the multiple Bernoulli relevance model), in the hope of increasing the number of potentially correct keywords. The second is to remove keywords irrelevant to the image semantics based on semantic similarity calculated using WordNet. To verify the proposed hybrid annotation model, we carried out experiments on the widely used Corel image data set, and the results show that the proposed approach improves image annotation to some extent.
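The combine-then-filter idea can be sketched as follows:

1. Pool the keywords from several annotation models.
2. Drop any keyword whose average semantic similarity to the other pooled keywords falls below a threshold.

The sketch assumes this averaging rule and threshold, which are illustrative rather than the paper's exact procedure. A hypothetical lookup table stands in for a WordNet-based similarity measure, and the keyword lists and scores are made up.

```python
def combine_and_filter(annotations, similarity, threshold):
    """annotations: one keyword list per annotation model.
    similarity: function (w1, w2) -> float in [0, 1] (stand-in for WordNet).
    Keeps pooled keywords whose mean similarity to the others is >= threshold."""
    pooled = []
    for keywords in annotations:       # union, preserving first-seen order
        for w in keywords:
            if w not in pooled:
                pooled.append(w)
    kept = []
    for w in pooled:
        others = [o for o in pooled if o != w]
        if not others:                 # a lone keyword has nothing to disagree with
            kept.append(w)
            continue
        mean_sim = sum(similarity(w, o) for o in others) / len(others)
        if mean_sim >= threshold:
            kept.append(w)
    return kept


# Hypothetical symmetric similarity table (made-up scores).
TABLE = {("sky", "cloud"): 0.8, ("sky", "sea"): 0.6, ("cloud", "sea"): 0.5,
         ("sky", "keyboard"): 0.05, ("cloud", "keyboard"): 0.05, ("sea", "keyboard"): 0.1}


def toy_similarity(a, b):
    return TABLE.get((a, b), TABLE.get((b, a), 0.0))


model_outputs = [["sky", "sea"], ["cloud", "sky"], ["keyboard", "sea"]]
result = combine_and_filter(model_outputs, toy_similarity, threshold=0.3)
```

Here "keyboard" is removed because it is semantically unrelated to the rest of the pooled annotation. The other three keywords support one another and are kept.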