Chenyan Xiong | Researchain

Publication

Featured researches published by Chenyan Xiong.

international conference on the theory of information retrieval | 2015

Query Expansion with Freebase

Chenyan Xiong; Jamie Callan

Large knowledge bases are being developed to describe entities, their attributes, and their relationships to other entities. Prior research mostly focuses on the construction of knowledge bases, while how to use them in information retrieval is still an open problem. This paper presents a simple and effective method of using one such knowledge base, Freebase, to improve query expansion, a classic and widely studied information retrieval task. It investigates two methods of identifying the entities associated with a query, and two methods of using those entities to perform query expansion. A supervised model combines information derived from Freebase descriptions and categories to select terms that are effective for query expansion. Experiments on the ClueWeb09 dataset with TREC Web Track queries demonstrate that these methods are almost 30% more effective than strong, state-of-the-art query expansion algorithms. In addition to improving average performance, some of these methods have better win/loss ratios than baseline algorithms, with 50% fewer queries damaged.

conference on information and knowledge management | 2015

EsdRank: Connecting Query and Documents through External Semi-Structured Data

Chenyan Xiong; Jamie Callan

This paper presents EsdRank, a new technique for improving ranking using external semi-structured data such as controlled vocabularies and knowledge bases. EsdRank treats vocabularies, terms, and entities from external data as objects connecting query and documents. Evidence used to link the query to objects and to rank documents is incorporated as features between query-object and object-document pairs, respectively. A latent listwise learning to rank algorithm, Latent-ListMLE, models the objects as a latent space between query and documents, and learns how to handle all evidence in a unified procedure from document relevance judgments. EsdRank is tested in two scenarios: using a knowledge base for web search, and using a controlled vocabulary for medical search. Experiments on TREC Web Track and OHSUMED data show significant improvements over state-of-the-art baselines.

web search and data mining | 2012

Relational click prediction for sponsored search

Chenyan Xiong; Taifeng Wang; Wenkui Ding; Tie-Yan Liu

This paper is concerned with the prediction of clicking an ad in sponsored search. The accurate prediction of a user's click on an ad plays an important role in sponsored search, because it is widely used in both ranking and pricing of the ads. Previous work on click prediction usually takes a single ad as input, and ignores its relationship to the other ads shown on the same page. This independence assumption, however, might not be valid in real scenarios. In this paper, we first perform an analysis on this issue by looking at the click-through rates (CTR) of the same ad, in the same position and for the same query, but surrounded by different ads. We found that in most cases the CTR varies largely, which suggests that the relationship between ads is an important factor in predicting click probability. Furthermore, our investigation shows that the more similar the surrounding ads are to an ad, the lower the CTR of the ad is. Based on this observation, we design a continuous conditional random fields (CRF) based model for click prediction, which considers both the features of an ad and its similarity to the surrounding ads. We show that the model can be effectively learned using maximum likelihood estimation, and can also be efficiently inferred due to its closed form solution. Our experimental results on the click-through log from a commercial search engine show that the proposed model can predict clicks more accurately than previous independent models. To the best of our knowledge, this is the first work that predicts ad clicks by considering the relationship between ads.

international acm sigir conference on research and development in information retrieval | 2017

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Chenyan Xiong; Zhuyun Dai; Jamie Callan; Zhiyuan Liu; Russell Power

This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over prior feature-based and neural-based state-of-the-art methods, and explain the source of K-NRM's advantage: Its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
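The three components named in the abstract (translation matrix, kernel pooling, ranking layer) can be sketched roughly as follows. This is a minimal forward-pass illustration only, not the authors' implementation: the Gaussian (RBF) kernel form, the log-sum aggregation over query words, the `tanh` scoring layer, and all function names and parameter values (`mus`, `sigma`, `weights`) are assumptions made for the example, and nothing is trained here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def kernel_pooling(query_vecs, doc_vecs, mus, sigma=0.1):
    """Return one soft-match feature per kernel.

    Assumed form: each kernel is a Gaussian centered at mu; for every query
    word it sums the kernel response over all document words (a soft term
    frequency), and the logs of these sums are added across query words.
    """
    features = []
    for mu in mus:
        total = 0.0
        for q in query_vecs:
            soft_tf = sum(
                math.exp(-((cosine(q, d) - mu) ** 2) / (2 * sigma ** 2))
                for d in doc_vecs
            )
            # Clamp to avoid log(0) when no document word falls near mu.
            total += math.log(max(soft_tf, 1e-10))
        features.append(total)
    return features

def ranking_score(features, weights, bias=0.0):
    """Combine kernel features into a single score (assumed linear layer + tanh)."""
    return math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)
```

In this sketch, a kernel centered at `mu = 1.0` responds to near-exact matches, while kernels at lower `mu` values respond to progressively weaker similarities, which is one way to read the abstract's "multi-level soft match features".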

international world wide web conferences | 2017

Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding

Chenyan Xiong; Russell Power; Jamie Callan

This paper introduces Explicit Semantic Ranking (ESR), a new ranking technique that leverages knowledge graph embedding. Analysis of the query log from our academic search engine, SemanticScholar.org, reveals that a major error source is its inability to understand the meaning of research concepts in queries. To address this challenge, ESR represents queries and documents in the entity space and ranks them based on their semantic connections from their knowledge graph embedding. Experiments demonstrate ESR's ability to improve Semantic Scholar's online production system, especially on hard queries where word-based ranking fails.

international acm sigir conference on research and development in information retrieval | 2017

Word-Entity Duet Representations for Document Ranking

Chenyan Xiong; Jamie Callan; Tie-Yan Liu

This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval. In this work, the query and documents are modeled by word-based representations and entity-based representations. Ranking features are generated by the interactions between the two representations, incorporating information from the word space, the entity space, and the cross-space connections through the knowledge graph. To handle the uncertainties from the automatically constructed entity representations, an attention-based ranking model AttR-Duet is developed. With back-propagation from ranking labels, the model learns simultaneously how to demote noisy entities and how to rank documents with the word-entity duet. Evaluation results on TREC Web Track ad-hoc task demonstrate that all of the four-way interactions in the duet are useful, the attention mechanism successfully steers the model away from noisy entities, and together they significantly outperform both word-based and entity-based learning to rank systems.

international acm sigir conference on research and development in information retrieval | 2016

An Empirical Study of Learning to Rank for Entity Search

Jing Chen; Chenyan Xiong; Jamie Callan

This work investigates the effectiveness of learning to rank methods for entity search. Entities are represented by multi-field documents constructed from their RDF triples, and field-based text similarity features are extracted for query-entity pairs. State-of-the-art learning to rank methods learn models for ad-hoc entity search. Our experiments on an entity search test collection based on DBpedia confirm that learning to rank methods are as powerful for ranking entities as for ranking documents, and establish a new state-of-the-art for accuracy on this benchmark dataset.

international conference on the theory of information retrieval | 2016

Bag-of-Entities Representation for Ranking

Chenyan Xiong; Jamie Callan; Tie-Yan Liu

This paper presents a new bag-of-entities representation for document ranking, with the help of modern knowledge bases and automatic entity linking. Our system represents query and documents by bag-of-entities vectors constructed from their entity annotations, and ranks documents by their matches with the query in the entity space. Our experiments with Freebase on TREC Web Track datasets demonstrate that current entity linking systems can provide sufficient coverage of the general domain search task, and that bag-of-entities representations outperform bag-of-words by as much as 18% in standard document ranking tasks.
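A rough sketch of ranking in the entity space is given below. The abstract only says that documents are ranked "by their matches with the query in the entity space"; the two scoring functions here (counting distinct matched entities, and summing matched entity frequencies), the tie-breaking order, and the entity identifiers in the usage example are all assumptions for illustration. Entity annotations are taken as given, standing in for the output of an entity linking system.

```python
from collections import Counter

def bag_of_entities(annotations):
    """Build an entity-frequency vector from a list of linked entity IDs."""
    return Counter(annotations)

def coordinate_match(query_bag, doc_bag):
    """Number of distinct query entities that also appear in the document."""
    return sum(1 for entity in query_bag if doc_bag[entity] > 0)

def entity_frequency(query_bag, doc_bag):
    """Frequency-weighted overlap between query and document entities."""
    return sum(query_bag[entity] * doc_bag[entity] for entity in query_bag)

def rank(query_entities, docs):
    """Rank doc IDs by coordinate match, breaking ties by entity frequency.

    `docs` maps a document ID to its list of entity annotations.
    """
    query_bag = bag_of_entities(query_entities)
    scored = []
    for doc_id, annotations in docs.items():
        doc_bag = bag_of_entities(annotations)
        scored.append((
            coordinate_match(query_bag, doc_bag),
            entity_frequency(query_bag, doc_bag),
            doc_id,
        ))
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [doc_id for _, _, doc_id in scored]
```

For example, with hypothetical entity IDs, `rank(["e:obama", "e:family"], {"d1": ["e:obama"], "d2": ["e:obama", "e:family", "e:obama"]})` places `d2` ahead of `d1` because it matches both query entities.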

international acm sigir conference on research and development in information retrieval | 2017

DBpedia-Entity v2: A Test Collection for Entity Search

Faegheh Hasibi; Fedor Nikolaev; Chenyan Xiong; Krisztian Balog; Svein Erik Bratsberg; Alexander Kotov; Jamie Callan

The DBpedia-entity collection has been used as a standard test collection for entity search in recent years. We develop and release a new version of this test collection, DBpedia-Entity v2, which uses a more recent DBpedia dump and a unified candidate result pool from the same set of retrieval models. Relevance judgments are also collected in a uniform way, using the same group of crowdsourcing workers, following the same assessment guidelines. The result is an up-to-date and consistent test collection. To facilitate further research, we also provide details about the pre-processing and indexing steps, and include baseline results from both classical and recently developed entity search methods.

international acm sigir conference on research and development in information retrieval | 2014

A language modeling approach to entity recognition and disambiguation for search queries

Bhavana Dalvi; Chenyan Xiong; Jamie Callan

The Entity Recognition and Disambiguation (ERD) problem refers to the task of recognizing mentions of entities in a given query string, disambiguating them, and mapping them to entities in a given Knowledge Base (KB). If there are multiple ways to interpret the query, then an ERD system is supposed to group candidate entity annotations into consistent interpretations. In this paper, we propose a four-step solution to this problem. First, we generate candidate entity strings by segmenting queries in different ways. Second, we retrieve candidate entities by searching for these candidate entity strings in Freebase. Third, we rank the candidate entities using language model based query likelihood scores. Finally, we group the entity annotations into interpretations. We also present both quantitative and qualitative evaluation of our methods based on 91 training, 500 validation and 1000 test queries. Our system achieved an F1 score of 0.42 on the set of validation queries, whereas the NULL baseline, which returns no annotations for any query, achieved an F1 score of 0.3. Similarly, on the test queries, our method achieved an F1 score of 0.36 and outperformed the NULL baseline, which achieved an F1 score of 0.2.
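The first of the four steps, generating candidate entity strings by segmenting the query in different ways, might look something like the sketch below. The abstract does not say how segmentations are produced; exhaustively enumerating every split of the query into contiguous spans is an assumption made here for illustration, and the function names are hypothetical. The later steps (Freebase lookup, query likelihood ranking, grouping into interpretations) are not shown.

```python
def segmentations(tokens):
    """Enumerate every way to split a token list into contiguous spans.

    A query with n tokens has 2 ** (n - 1) segmentations, so this
    exhaustive approach only suits short queries.
    """
    if not tokens:
        return [[]]
    results = []
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        for rest in segmentations(tokens[i:]):
            results.append([head] + rest)
    return results

def candidate_strings(query):
    """Collect the distinct spans across all segmentations of a query."""
    spans = set()
    for segmentation in segmentations(query.split()):
        spans.update(segmentation)
    return spans
```

Each distinct span returned by `candidate_strings` would then serve as a lookup string for the second step.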