Gu Xu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gu Xu is active.

Explore More

Publication

Featured researches published by Gu Xu.

international acm sigir conference on research and development in information retrieval | 2009

Named entity recognition in query

Jiafeng Guo; Gu Xu; Xueqi Cheng; Hang Li

This paper addresses the problem of Named Entity Recognition in Query (NERQ), which involves detection of the named entity in a given query and classification of the named entity into predefined classes. NERQ is potentially useful in many applications in web search. The paper proposes taking a probabilistic approach to the task using query log data and Latent Dirichlet Allocation. We consider contexts of a named entity (i.e., the remainders of the named entity in queries) as words of a document, and classes of the named entity as topics. The topic model is constructed by a novel and general learning method referred to as WS-LDA (Weakly Supervised Latent Dirichlet Allocation), which employs weakly supervised learning (rather than unsupervised learning) using partially labeled seed entities. Experimental results show that the proposed method based on WS-LDA can accurately perform NERQ, and outperform the baseline methods.

international acm sigir conference on research and development in information retrieval | 2008

A unified and discriminative model for query refinement

Jiafeng Guo; Gu Xu; Hang Li; Xueqi Cheng

This paper addresses the issue of query refinement, which involves reformulating ill-formed search queries in order to enhance relevance of search results. Query refinement typically includes a number of tasks such as spelling error correction, word splitting, word merging, phrase segmentation, word stemming, and acronym expansion. In previous research, such tasks were addressed separately or through employing generative models. This paper proposes employing a unified and discriminative model for query refinement. Specifically, it proposes a Conditional Random Field (CRF) model suitable for the problem, referred to as Conditional Random Field for Query Refinement (CRF-QR). Given a sequence of query words, CRF-QR predicts a sequence of refined query words as well as corresponding refinement operations. In that sense, CRF-QR differs greatly from conventional CRF models. Two types of CRF-QR models, namely a basic model and an extended model are introduced. One merit of employing CRF-QR is that different refinement tasks can be performed simultaneously and thus the accuracy of refinement can be enhanced. Furthermore, the advantages of discriminative models over generative models can be fully leveraged. Experimental results demonstrate that CRF-QR can significantly outperform baseline methods. Furthermore, when CRF-QR is used in web search, a significant improvement of relevance can be obtained.

international acm sigir conference on research and development in information retrieval | 2006

Building implicit links from content for forum search

Gu Xu; Wei-Ying Ma

The objective of Web forums is to create a shared space for open communications and discussions of specific topics and issues. The tremendous information behind forum sites is not fully-utilized yet. Most links between forum pages are automatically created, which means the link-based ranking algorithm cannot be applied efficiently. In this paper, we proposed a novel ranking algorithm which tries to introduce the content information into link-based methods as implicit links. The basic idea is derived from the more focused random surfer: the surfer may more likely jump to a page which is similar to what he is reading currently. In this manner, we are allowed to introduce the content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be efficiently computed based on an automatically generated topic hierarchy. Not like the topic-sensitive PageRank, our method only need to compute single PageRank score for each page. Another contribution of this paper is to present a very efficient algorithm for automatically generating topic hierarchy and map each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that content-based link graph is also important compared with the hyper-link graph.

conference on information and knowledge management | 2011

Intent-aware query similarity

Jiafeng Guo; Xueqi Cheng; Gu Xu; Xiaofei Zhu

Query similarity calculation is an important problem and has a wide range of applications in IR, including query recommendation, query expansion, and even advertisement matching. Existing work on query similarity aims to provide a single similarity measure without considering the fact that queries are ambiguous and usually have multiple search intents. In this paper, we argue that query similarity should be defined upon search intents, so-called intent-aware query similarity. By introducing search intents into the calculation of query similarity, we can obtain more accurate and also informative similarity measures on queries and thus help a variety of applications, especially those related to diversification. Specifically, we first identify the potential search intents of queries, and then measure query similarity under different intents using intent-aware representations. A regularized topic model is employed to automatically learn the potential intents of queries by using both the words from search result snippets and the regularization from query co-clicks. Experimental results confirm the effectiveness of intent-aware query similarity on ambiguous queries which can provide significantly better similarity scores over the traditional approaches. We also experimentally verified the utility of intent-aware similarity in the application of query recommendation, which can suggest diverse queries in a structured way to search users.

international acm sigir conference on research and development in information retrieval | 2013

Query expansion using path-constrained random walks

Jianfeng Gao; Gu Xu; Jinxi Xu

This paper exploits Web search logs for query expansion (QE) by presenting a new QE method based on path-constrained random walks (PCRW), where the search logs are represented as a labeled, directed graph, and the probability of picking an expansion term for an input query is computed by a learned combination of constrained random walks on the graph. The method is shown to be generic in that it covers most of the popular QE models as special cases and flexible in that it provides a principled mathematical framework in which a wide variety of information useful for QE can be incorporated in a unified way. Evaluation is performed on the Web document ranking task using a real-world data set. Results show that the PCRW-based method is very effective for the expansion of rare queries, i.e., low-frequency queries that are unseen in search logs, and that it outperforms significantly other state-of-the-art QE meth-ods.

international acm sigir conference on research and development in information retrieval | 2011

Query representation and understanding workshop

W. Bruce Croft; Michael Bendersky; Hang Li; Gu Xu

This report summarizes the events of the SIGIR 2010 workshop on Query Representation and Understanding, which was held on July 23rd, 2010 in Geneva, Switzerland.

IEEE Transactions on Knowledge and Data Engineering | 2014

A Probabilistic Approach to String Transformation

Ziqi Wang; Gu Xu; Hang Li; Ming Zhang

Many problems in natural language processing, data mining, information retrieval, and bioinformatics can be formalized as string transformation, which is a task as follows. Given an input string, the system generates the k most likely output strings corresponding to the input string. This paper proposes a novel and probabilistic approach to string transformation, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for generating the top k candidates, whether there is or is not a predefined dictionary. The log linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation conditioned on an input string. The learning method employs maximum likelihood estimation for parameter estimation. The string generation algorithm based on pruning is guaranteed to generate the optimal top k candidates. The proposed method is applied to correction of spelling errors in queries as well as reformulation of queries in web search. Experimental results on large scale data show that the proposed approach is very accurate and efficient improving upon existing methods in terms of accuracy and efficiency in different settings.

conference on information and knowledge management | 2010