Chia-Jung Lee

University of Massachusetts Amherst


Publication

Featured research published by Chia-Jung Lee.

asia information retrieval symposium | 2009

Selecting Effective Terms for Query Formulation

Chia-Jung Lee; Yi-Chun Lin; Ruey-Cheng Chen; Pu-Jen Cheng

It is difficult for users to formulate appropriate queries for search. In this paper, we propose an approach to query term selection by measuring the effectiveness of a query term in IR systems based on its linguistic and statistical properties in document collections. Two query formulation algorithms are presented for improving IR performance. Experiments on NTCIR-4 and NTCIR-5 ad-hoc IR tasks demonstrate that the algorithms can significantly improve retrieval performance, by 9.2% on average, compared to the performance of the original queries given in the benchmarks.

conference on information and knowledge management | 2015

An Optimization Framework for Merging Multiple Result Lists

Chia-Jung Lee; Qingyao Ai; W. Bruce Croft; Daniel Sheldon

Developing effective methods for fusing multiple ranked lists of documents is crucial to many applications. Federated web search, for instance, has become a common practice where a query is issued to different verticals and a single ranked list of blended results is created. While federated search is regarded as collection fusion, data fusion techniques aim at improving search coverage and precision by combining multiple search runs on a single document collection. In this paper, we study in depth and extend a neural network-based approach, LambdaMerge, for merging results of ranked lists drawn from one (i.e., data fusion) or more (i.e., collection fusion) verticals. The proposed model considers the impact of the quality of documents, ranked lists and verticals for producing the final merged result in an optimization framework. We further investigate the potential of incorporating deep structures into the model with an aim of determining better combinations of different evidence. In the experiments on collection fusion and data fusion, the proposed approach significantly outperforms several standard baselines and state-of-the-art learning-based approaches.

international acm sigir conference on research and development in information retrieval | 2013

An information-theoretic account of static index pruning

Ruey-Cheng Chen; Chia-Jung Lee

In this paper, we recast static index pruning as a model induction problem under the framework of Kullback's principle of minimum cross-entropy. We show that static index pruning has an approximate analytical solution in the form of a convex integer program. Further analysis of computational feasibility suggests that one of its surrogate models can be solved efficiently. This result has led to the rediscovery of uniform pruning, a simple yet powerful pruning method proposed in 2001 and later easily overlooked by many of us. To empirically verify this result, we conducted experiments under a new design in which the prune ratio is strictly controlled. Our results on standard ad-hoc retrieval benchmarks confirm that uniform pruning is robust to high prune ratios and that its performance is currently state of the art.

international acm sigir conference on research and development in information retrieval | 2010

To translate or not to translate

Chia-Jung Lee; Chin-Hui Chen; Shao-Hang Kao; Pu-Jen Cheng

Query translation is an important task in cross-language information retrieval (CLIR), aiming to translate queries into the languages used in documents. The purpose of this paper is to investigate the necessity of translating query terms, which might differ from one term to another. Some untranslated terms cause an irreparable performance drop while others do not. We propose an approach to estimate the translation probability of a query term, which helps decide whether it should be translated. The approach learns regression and classification models based on a rich set of linguistic and statistical properties of the term. Experiments on NTCIR-4 and NTCIR-5 English-Chinese CLIR tasks demonstrate that the proposed approach can significantly improve CLIR performance. An in-depth analysis is also provided, discussing the impact of untranslated out-of-vocabulary (OOV) query terms and the translation quality of non-OOV query terms on CLIR performance.

international acm sigir conference on research and development in information retrieval | 2014

Characterizing multi-click search behavior and the risks and opportunities of changing results during use

Chia-Jung Lee; Jaime Teevan; Sebastian de la Chica

Although searchers often click on more than one result following a query, little is known about how they interact with search results after their first click. Using large-scale query log analysis, we characterize what people do when they return to a result page after having visited an initial result. We find that the initial click provides insight into the searcher's subsequent behavior, with short initial dwell times suggesting more future interaction and later clicks occurring close in rank to the first. Although users think of a search result list as static, when people return to a result list following a click there is the opportunity for the list to change, potentially providing additional relevant content. Such change, however, can be confusing, leading to increased abandonment and slower subsequent clicks. We explore the risks and opportunities of changing search results during use, observing, for example, that when results change above a user's initial click that user is less likely to find new content, whereas changes below correlate with increased subsequent interaction. Our results can be used to improve people's search experience during the course of a single query by seamlessly providing new, more relevant content as the user interacts with a search result page, helping them find what they are looking for without having to issue a new query.

web search and data mining | 2012

Evaluating search in personal social media collections

Chia-Jung Lee; W. Bruce Croft; Jinyoung Kim

The prevalence of social media applications is generating potentially large personal archives of posts, tweets, and other communications. The existence of these archives creates a need for search tools, which can be seen as an extension of current desktop search services. Little is currently known about the best search techniques for personal archives of social data, because of the difficulty of creating test collections. In this paper, we describe how test collections for personal social data can be created by using games to collect queries. We then compare a range of retrieval models that exploit the semi-structured nature of social data. Our results show that a mixture of language models with field distribution estimation can be effective for this type of data, with certain fields, such as the name of the poster, being particularly important. We also analyze the properties of the queries that were generated by users with two versions of the games.

conference on information and knowledge management | 2012

Information preservation in static index pruning

Ruey-Cheng Chen; Chia-Jung Lee; Chiung-Min Tsai; Jieh Hsiang

We develop a new static index pruning criterion based on the notion of information preservation. This idea is motivated by the fact that model degeneration, as does static index pruning, inevitably reduces the predictive power of the resulting model. We model this loss in predictive power using conditional entropy and show that the decision in static index pruning can therefore be optimized to preserve information as much as possible. We evaluated the proposed approach on three different test corpora, and the result shows that our approach is comparable in retrieval performance to state-of-the-art methods. When efficiency is of concern, our method has some advantages over the reference methods and is therefore suggested in Web retrieval settings.

international conference on the theory of information retrieval | 2015

On Divergence Measures and Static Index Pruning

Ruey-Cheng Chen; Chia-Jung Lee; W. Bruce Croft

We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions, and therefore can be exactly minimized using an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiment on GOV2 data, Rényi divergence of order infinity appears to be the most effective. This divergence measure significantly outperforms many standard methods and achieves the same retrieval effectiveness as the full data using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy that one would usually expect from a full index.
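
For reference, the Rényi divergence mentioned in the abstract has the following standard textbook definition for discrete distributions. The symbols P, Q, p_i, q_i and α are generic notation introduced here for illustration; the abstract does not state which distributions over the index play the roles of P and Q in the paper.

```latex
% Renyi divergence of order \alpha (standard definition, \alpha > 0, \alpha \neq 1)
D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_{i} p_i^{\alpha} \, q_i^{1 - \alpha}

% Limiting case of order infinity
D_{\infty}(P \,\|\, Q) = \log \max_{i} \frac{p_i}{q_i}
```

In the order-infinity case the divergence is governed entirely by the single largest ratio p_i / q_i.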

international acm sigir conference on research and development in information retrieval | 2013

Building a web test collection using social media

Chia-Jung Lee; W. Bruce Croft

Community Question Answering (CQA) platforms contain a large number of questions and associated answers. Answerers sometimes include URLs as part of the answers to provide further information. This paper describes a novel way of building a test collection for web search by exploiting the link information from this type of social media data. We propose to build the test collection by regarding CQA questions as queries and the associated linked web pages as relevant documents. To evaluate this approach, we collect approximately ten thousand CQA queries, whose answers contained links to ClueWeb09 documents after spam filtering. Experimental results using this collection show that the relative effectiveness between different retrieval models on the ClueWeb-CQA query set is consistent with that on the TREC Web Track query sets, confirming the reliability of our test collection. Further analysis shows that the large number of queries generated through this approach compensates for the sparse relevance judgments in determining significant differences.

international acm sigir conference on research and development in information retrieval | 2015

Inter-Category Variation in Location Search

Chia-Jung Lee; Nick Craswell; Vanessa Murdock

When searching for place entities such as businesses or points of interest, the desired place may be close (finding the nearest ATM) or far away (finding a hotel in another city). Understanding the role of distance in predicting user interests can guide the design of location search and recommendation systems. We analyze a large dataset of location searches on GPS-enabled mobile devices with 15 location categories. We model user-location distance based on raw geographic distance (kilometers) and intervening opportunities (nth closest). Both models are helpful in predicting user interests, with the intervening opportunity model performing somewhat better. We find significant inter-category variation. For instance, the closest movie theater is selected in 17.7% of cases, while the closest restaurant in only 2.1% of cases. Overall, we recommend taking category information into account when modeling location preferences of users in search and recommendation systems.
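
The two distance notions in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout of `candidates`, and the use of the haversine formula for geographic distance are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def distance_features(user, candidates, chosen_id):
    """Return (raw distance in km, intervening-opportunity rank) for a chosen place.

    `user` is a (lat, lon) pair; `candidates` maps a hypothetical place id to
    the (lat, lon) of every place in the same category. Rank 1 means the
    chosen place is the closest candidate, rank n the nth closest.
    """
    dists = {
        pid: haversine_km(user[0], user[1], lat, lon)
        for pid, (lat, lon) in candidates.items()
    }
    ordered = sorted(dists, key=dists.get)  # place ids, nearest first
    return dists[chosen_id], ordered.index(chosen_id) + 1
```

For example, `distance_features((0.0, 0.0), {"a": (0.0, 0.01), "b": (0.0, 0.05), "c": (0.0, 0.02)}, "c")` yields a distance of roughly 2.2 km and a rank of 2, since one closer candidate ("a") intervenes.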
