Ruey-Cheng Chen | Researchain


Publication

Featured research published by Ruey-Cheng Chen.

european conference on information retrieval | 2016

Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval

Liu Yang; Qingyao Ai; Damiano Spina; Ruey-Cheng Chen; Liang Pang; W. Bruce Croft; Jiafeng Guo; Falk Scholer

Retrieving finer-grained text units such as passages or sentences as answers for non-factoid Web queries is becoming increasingly important for applications such as mobile Web search. In this work, we introduce the answer sentence retrieval task for non-factoid Web queries, and investigate how this task can be effectively solved under a learning-to-rank framework. We design two types of features, namely semantic and context features, beyond traditional text matching features. We compare learning-to-rank methods with multiple baseline methods, including query likelihood and the state-of-the-art convolutional neural network based method, using an answer-annotated version of the TREC GOV2 collection. Results show that features used previously to retrieve topical sentences and factoid answer sentences are not sufficient for retrieving answer sentences for non-factoid queries, but with semantic and context features, we can significantly outperform the baseline methods.

exploiting semantic annotations in information retrieval | 2015

Harnessing Semantics for Answer Sentence Retrieval

Ruey-Cheng Chen; Damiano Spina; W. Bruce Croft; Mark Sanderson; Falk Scholer

Finding answer passages from the Web is a challenging task. One major difficulty is to retrieve sentences that may not have many terms in common with the question. In this paper, we experiment with two semantic approaches for finding non-factoid answers using a learning-to-rank retrieval setting. We show that using semantic representations learned from external resources such as Wikipedia or Google News may substantially improve the quality of top-ranked retrieved answers.

international acm sigir conference on research and development in information retrieval | 2017

Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval

Ruey-Cheng Chen; Luke Gallagher; Roi Blanco; J. Shane Culpepper

Complex machine learning models are now an integral part of modern, large-scale retrieval systems. However, collection size growth continues to outpace advances in efficiency improvements in the learning models which achieve the highest effectiveness. In this paper, we re-examine the importance of tightly integrating feature costs into multi-stage learning-to-rank (LTR) IR systems. We present a novel approach to optimizing cascaded ranking models which can directly leverage a variety of different state-of-the-art LTR rankers such as LambdaMART and Gradient Boosted Decision Trees. Using our cascade model, we conclusively show that feature costs and the number of documents being re-ranked in each stage of the cascade can be balanced to maximize both efficiency and effectiveness. Finally, we also demonstrate that our cascade model can easily be deployed on commonly used collections to achieve state-of-the-art effectiveness results while only using a subset of the features required by the full model.
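The cost/effectiveness trade-off described in this abstract can be illustrated with a minimal sketch of a generic multi-stage cascade. This is not the authors' optimization method; the `Stage` layout, the per-document cost, and the `cascade_rank` helper are hypothetical names introduced here purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stage description: (scoring function, cost per scored document,
# number of documents passed on to the next stage).
Stage = Tuple[Callable[[str], float], float, int]


def cascade_rank(docs: Sequence[str], stages: List[Stage]) -> Tuple[List[str], float]:
    """Re-rank docs through successive stages and report the total scoring cost."""
    candidates = list(docs)
    total_cost = 0.0
    for scorer, cost_per_doc, keep in stages:
        # Every surviving candidate is scored, so cost grows with the pool size.
        total_cost += cost_per_doc * len(candidates)
        # Keep only the top `keep` documents for the next (costlier) stage.
        candidates = sorted(candidates, key=scorer, reverse=True)[:keep]
    return candidates, total_cost


if __name__ == "__main__":
    docs = ["a", "bb", "ccc", "dddd"]
    stages = [
        (len, 1.0, 2),                 # cheap stage: score by length, keep 2
        (lambda d: -len(d), 5.0, 1),   # expensive stage: prefer shorter, keep 1
    ]
    print(cascade_rank(docs, stages))
```

Shrinking the `keep` value of an early stage lowers the cost of every later stage, at the risk of discarding documents the expensive scorer would have ranked highly.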

international conference on the theory of information retrieval | 2015

On Divergence Measures and Static Index Pruning

Ruey-Cheng Chen; Chia-Jung Lee; W. Bruce Croft

We study the problem of static index pruning in a well-known divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions, and therefore can be exactly minimized using an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiment on GOV2 data, Rényi divergence of order infinity appears the most effective. This divergence measure significantly outperforms many standard methods and achieves retrieval effectiveness identical to that of the full data using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy that one would usually expect from a full index.
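For readers unfamiliar with the measure named above, the following is a minimal sketch of the standard Rényi divergence between two discrete distributions, including the order-infinity case. It only illustrates the textbook definition; it is not the pruning algorithm from the paper, and the function name `renyi_divergence` is an assumption made here.

```python
import math
from typing import Sequence


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Rényi divergence D_alpha(p || q) in nats, for alpha > 0.

    Assumes p and q are distributions over the same outcomes and that
    q[i] > 0 wherever p[i] > 0.
    """
    # Outcomes with p[i] == 0 contribute nothing for alpha > 0.
    pairs = [(pi, qi) for pi, qi in zip(p, q) if pi > 0]
    if math.isinf(alpha):
        # Order infinity: log of the largest probability ratio.
        return math.log(max(pi / qi for pi, qi in pairs))
    if alpha == 1.0:
        # The limit alpha -> 1 is the Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in pairs)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in pairs)
    return math.log(total) / (alpha - 1.0)


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.25, 0.75]
    print(renyi_divergence(p, q, 2.0), renyi_divergence(p, q, math.inf))
```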

IEEE Transactions on Knowledge and Data Engineering | 2018

Document Summarization for Answering Non-Factoid Queries

Evi Yulianti; Ruey-Cheng Chen; Falk Scholer; W. Bruce Croft; Mark Sanderson

We formulate a document summarization method to extract passage-level answers for non-factoid queries, referred to as answer-biased summaries. We propose to use external information from related Community Question Answering (CQA) content to better identify answer-bearing sentences. Three optimization-based methods are proposed: (i) query-biased, (ii) CQA-answer-biased, and (iii) expanded-query-biased, where expansion terms were derived from related CQA content. A learning-to-rank-based method is also proposed that incorporates a feature extracted from related CQA content. Our results show that even if a CQA answer does not contain a perfect answer to a query, its content can be exploited to improve the extraction of answer-biased summaries from other corpora. The quality of CQA content is found to affect the accuracy of optimization-based summaries, though medium-quality answers enable the system to achieve accuracy comparable (and in some cases superior) to state-of-the-art techniques. The learning-to-rank-based summaries, on the other hand, are not significantly influenced by CQA quality. We provide a recommendation on the best use of our proposed approaches with regard to the availability of different quality levels of related CQA content. As a further investigation, the reliability of our approaches was tested on another publicly available dataset.

international acm sigir conference on research and development in information retrieval | 2017

On the Benefit of Incorporating External Features in a Neural Architecture for Answer Sentence Selection

Ruey-Cheng Chen; Evi Yulianti; Mark Sanderson; W. Bruce Croft

Incorporating conventional, unsupervised features into a neural architecture has the potential to improve modeling effectiveness, but this aspect is often overlooked in the research of deep learning models for information retrieval. We investigate this incorporation in the context of answer sentence selection, and show that combining a set of query matching, readability, and query focus features into a simple convolutional neural network can lead to markedly increased effectiveness. Our results on two standard question-answering datasets show the effectiveness of the combined model.

international acm sigir conference on research and development in information retrieval | 2018

Ranking Documents by Answer-Passage Quality

Evi Yulianti; Ruey-Cheng Chen; Falk Scholer; W. Bruce Croft; Mark Sanderson

Evidence derived from passages that closely represent likely answers to a posed query can be useful input to the ranking process. Based on a novel use of Community Question Answering data, we present an approach for the creation of such passages. A general framework for extracting answer passages and estimating their quality is proposed, and this evidence is integrated into ranking models. Our experiments on two web collections show that such quality estimates from answer passages provide a strong indication of document relevance and compare favorably to previous passage-based methods. Combining such evidence can significantly improve over a set of state-of-the-art ranking models, including Quality-Biased Ranking, External Expansion, and a combination of both. A final ranking model that incorporates all quality estimates achieves further improvements on both collections.

conference on information and knowledge management | 2017

An Empirical Analysis of Pruning Techniques: Performance, Retrievability and Bias

Ruey-Cheng Chen; Leif Azzopardi; Falk Scholer

Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has examined the impact of the index (and how it is optimized) on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the retrieval bias of a system changes as the inverted index is optimized for efficiency through static index pruning. In our analysis, we consider four pruning methods and examine how they affect performance and bias on the TREC GOV2 collection. Our results show that the relationship between these factors is varied and complex, and very much dependent on the pruning algorithm. We find that more pruning results in relatively little change or a slight decrease in bias up to a point, and then a dramatic increase. The increase in bias corresponds to a sharp decrease in early-precision measures such as NDCG@10 and is also indicative of a large decrease in MAP. The findings suggest that the impact of pruning algorithms can be quite varied, but retrieval bias could be used to guide the pruning process. Further work is required to determine precisely which documents are most affected and how this impacts upon performance.
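The abstract does not say how retrieval bias is quantified. A common choice in the retrievability literature is the Gini coefficient over per-document retrievability scores, and the sketch below assumes that choice. The helper names `retrievability` and `gini`, and the cutoff-based counting, are illustrative assumptions rather than the paper's exact setup.

```python
from collections import Counter
from typing import Sequence


def retrievability(rankings: Sequence[Sequence[str]], cutoff: int) -> Counter:
    """Count, per document, how many rankings contain it within the top `cutoff`."""
    counts: Counter = Counter()
    for ranking in rankings:
        counts.update(ranking[:cutoff])
    return counts


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 means equal scores; higher means more concentration."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    collection = ["d1", "d2", "d3", "d4"]
    rankings = [["d1", "d2"], ["d1", "d3"], ["d1", "d2"]]
    scores = retrievability(rankings, cutoff=2)
    # Documents that are never retrieved must still be included with score 0.
    print(gini([scores[d] for d in collection]))
```

Comparing this coefficient before and after pruning an index gives one way to observe whether pruning concentrates retrieval on fewer documents.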

australasian document computing symposium | 2016

Using Semantic and Context Features for Answer Summary Extraction

Evi Yulianti; Ruey-Cheng Chen; Falk Scholer; Mark Sanderson

We investigate the effectiveness of using semantic and context features for extracting document summaries that are designed to contain answers for non-factoid queries. The summarization methods are compared against state-of-the-art factoid question answering and query-biased summarization techniques. The accuracy of the generated answer summaries is evaluated using ROUGE as well as sentence ranking measures, and the relationship between these measures is further analyzed. The results show that semantic and context features give significant improvement over the state-of-the-art techniques.
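As a rough illustration of the ROUGE evaluation mentioned above, the following sketch computes ROUGE-N recall as clipped n-gram overlap against a single reference. It assumes whitespace tokenization and lowercasing and is a simplification, not the official ROUGE toolkit or the configuration used in the paper. The function name `rouge_n_recall` is hypothetical.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams also found in the candidate (clipped counts)."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    summary = "the cat on the mat"
    print(rouge_n_recall(summary, reference, n=1))
    print(rouge_n_recall(summary, reference, n=2))
```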

text retrieval conference | 2015