Publication


Featured research published by Giridhar Kumaran.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Reducing long queries using query quality predictors

Giridhar Kumaran; Vitor R. Carvalho

Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all subsets of the original query (sub-queries) based on their predicted quality, and selecting the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked sub-query chosen by the ranker results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well-suited for moderately-performing long queries, and a small set of query quality predictors is well-suited for the task of ranking sub-queries.
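
The exact predictors and learned ranker from the paper are not reproduced here; the sketch below only illustrates the sub-query ranking idea, with two toy quality predictors (sub-query length and average IDF) and a hand-set linear scorer standing in for the trained model.

```python
# Sketch of query reduction via sub-query ranking (toy predictors and weights;
# the paper trains a ranker over query-quality predictors from the literature).
from itertools import combinations

def sub_queries(terms, min_len=1):
    """Enumerate every non-empty subset of the query terms (a sub-query)."""
    for k in range(min_len, len(terms) + 1):
        for combo in combinations(terms, k):
            yield list(combo)

def quality_features(sub_query, idf):
    """Toy query-quality predictors: sub-query length and average IDF."""
    avg_idf = sum(idf.get(t, 1.0) for t in sub_query) / len(sub_query)
    return [len(sub_query), avg_idf]

def score(features, weights):
    """Hand-set linear scorer standing in for the learned ranker."""
    return sum(f * w for f, w in zip(features, weights))

def reduce_query(terms, idf, weights=(0.3, 1.0)):
    """Return the highest-scoring sub-query (which may be the full query)."""
    return max(sub_queries(terms),
               key=lambda sq: score(quality_features(sq, idf), weights))

# A long, description-style query whose low-IDF filler terms get dropped.
idf = {"documents": 1.2, "describing": 0.5, "impact": 2.1,
       "of": 0.1, "oil": 3.0, "spills": 3.5, "wildlife": 2.8}
print(reduce_query(list(idf), idf))  # -> ['impact', 'oil', 'spills', 'wildlife']
```

Enumerating every sub-query is exponential in query length, so a direct enumeration like this is practical only for short queries.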


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Exploring reductions for long web queries

Niranjan Balasubramanian; Giridhar Kumaran; Vitor R. Carvalho

Long queries form a difficult but increasingly important segment for web search engines. Query reduction, a technique for dropping unnecessary query terms from long queries, improves performance of ad-hoc retrieval on TREC collections. Also, it has great potential for improving long web queries (up to 25% improvement in NDCG@5). However, query reduction on the web is hampered by the lack of accurate query performance predictors and the constraints imposed by search engine architectures and ranking algorithms. In this paper, we present query reduction techniques for long web queries that leverage effective and efficient query performance predictors. We propose three learning formulations that combine these predictors to perform automatic query reduction. These formulations enable trading off average improvement against the number of queries impacted, and enable easy integration into the search engine's architecture for rank-time query reduction. Experiments on a large collection of long queries issued to a commercial search engine show that the proposed techniques significantly outperform baselines, with more than 12% improvement in NDCG@5 on the impacted set of queries. Extensions to the formulations, such as result interleaving, further improve results. We find that the proposed techniques deliver consistent retrieval gains where it matters most: poorly performing long web queries.
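
One way to picture the rank-time decision is sketched below: a classifier takes performance-predictor values for the original query and a candidate reduction and decides whether to swap them. The features, toy training rows, and logistic-regression learner are illustrative assumptions; the paper's three formulations and feature set are not reproduced here.

```python
# Illustrative sketch: learn when to replace a long query with its reduction.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [predicted quality of original, predicted quality of reduction,
#            length of original, length of reduction] -- toy values only.
X = np.array([
    [0.20, 0.55, 8, 3],
    [0.60, 0.40, 7, 4],
    [0.15, 0.50, 9, 2],
    [0.70, 0.65, 6, 5],
])
y = np.array([1, 0, 1, 0])  # 1 = issue the reduced query instead of the original

clf = LogisticRegression().fit(X, y)

def choose_query(original, reduced, features):
    """Return whichever query the classifier predicts will perform better."""
    return reduced if clf.predict(np.array([features]))[0] == 1 else original

print(choose_query("how do i fix a leaking kitchen faucet at home",
                   "fix leaking kitchen faucet",
                   [0.25, 0.60, 9, 4]))
```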


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Effective and efficient user interaction for long queries

Giridhar Kumaran; James Allan

Handling long queries can involve either pruning the query to retain only the important terms (reduction), or expanding the query to include related concepts (expansion). While automatic techniques to do so exist, roughly 25% performance improvements in terms of MAP have been realized in past work through interactive variants. We show that selectively reducing or expanding a query leads to an average improvement of 51% in MAP over the baseline for standard TREC test collections. We demonstrate how user interaction can be used to achieve this improvement. Most interaction techniques present users with a fixed number of options for all queries. We achieve improvements by interacting less with the user, i.e., we present techniques to identify the optimal number of options to present to users, resulting in an interface with an average of 70% fewer options to consider. Previous algorithms supporting interactive reduction and expansion are exponential in nature. To extend their utility to operational environments, we present techniques to make the complexity of the algorithms polynomial. We finally present an analysis of long queries that continue to exhibit poor performance in spite of our new techniques.


Empirical Methods in Natural Language Processing | 2005

Using Names and Topics for New Event Detection

Giridhar Kumaran; James Allan

New Event Detection (NED) involves monitoring chronologically-ordered news streams to automatically detect the stories that report on new events. We compare two stories by finding three cosine similarities based on names, topics and the full text. These additional comparisons suggest treating the NED problem as a binary classification problem with the comparison scores serving as features. The classifier models we learned show statistically significant improvement over the baseline vector space model system on all the collections we tested, including the latest TDT5 collection. The presence of automatic speech recognizer (ASR) output of broadcast news in news streams can reduce performance and render our named entity recognition based approaches ineffective. We provide a solution to this problem, achieving statistically significant improvements.
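
As a rough illustration of the comparison step, the sketch below computes the three similarities from bag-of-words representations of a story's names, remaining topic terms, and full text; the paper's actual named-entity extraction and classifier are more involved, and the example stories are invented.

```python
# Sketch: three cosine similarities between two stories, used as features
# for a binary new-event / old-event classifier.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ned_features(story_a, story_b):
    """Similarities over the name, topic-term, and full-text fields."""
    return [cosine(Counter(story_a[f]), Counter(story_b[f]))
            for f in ("names", "topic_terms", "full_text")]

new = {"names": ["katrina", "new", "orleans"],
       "topic_terms": ["hurricane", "flood", "levee"],
       "full_text": ["hurricane", "katrina", "flood", "new", "orleans", "levee"]}
old = {"names": ["katrina"],
       "topic_terms": ["hurricane", "storm", "gulf"],
       "full_text": ["hurricane", "katrina", "storm", "gulf"]}
# The scores are fed to the classifier; comparing against every earlier story
# and finding only low similarities suggests the story reports a new event.
print(ned_features(new, old))
```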


ACM Transactions on the Web | 2010

Mining Historic Query Trails to Label Long and Rare Search Engine Queries

Peter Bailey; Ryen W. White; Han Liu; Giridhar Kumaran

Web search engines can perform poorly for long queries (i.e., those containing four or more terms), in part because of their high level of query specificity. The automatic assignment of labels to long queries can capture aspects of a user’s search intent that may not be apparent from the terms in the query. This affords search result matching or reranking based on queries and labels rather than the query text alone. Query labels can be derived from interaction logs generated from many users’ search result clicks or from query trails comprising the chain of URLs visited following query submission. However, since long queries are typically rare, they are difficult to label in this way because little or no historic log data exists for them. A subset of these queries may be amenable to labeling by detecting similarities between parts of a long and rare query and the queries which appear in logs. In this article, we present the comparison of four similarity algorithms for the automatic assignment of Open Directory Project category labels to long and rare queries, based solely on matching against similar satisfied query trails extracted from log data. Our findings show that although the similarity-matching algorithms we investigated have tradeoffs in terms of coverage and accuracy, one algorithm that bases similarity on a popular search result ranking function (effectively regarding potentially-similar queries as “documents”) outperforms the others. We find that it is possible to correctly predict the top label better than one in five times, even when no past query trail exactly matches the long and rare query. We show that these labels can be used to reorder top-ranked search results leading to a significant improvement in retrieval performance over baselines that do not utilize query labeling, but instead rank results using content-matching or click-through logs. The outcomes of our research have implications for search providers attempting to provide users with highly-relevant search results for long queries.
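
The article identifies the best similarity algorithm only as one based on a popular search result ranking function; the sketch below assumes BM25 for that role. Logged queries with known ODP category labels are treated as tiny documents, scored against the incoming long, rare query, and the best match's label is transferred. The toy log entries are invented.

```python
# Sketch: label a long, rare query by BM25-matching it against logged queries.
import math
from collections import Counter

def bm25(query_terms, doc_terms, df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 score of one logged query (treated as a 'document')."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc_terms) / avg_len))
        score += idf * norm
    return score

def predict_label(long_query, logged):
    """Return the ODP label of the best-matching logged query."""
    df = Counter(t for terms, _ in logged for t in set(terms))
    avg_len = sum(len(terms) for terms, _ in logged) / len(logged)
    best = max(logged, key=lambda entry: bm25(long_query, entry[0],
                                              df, len(logged), avg_len))
    return best[1]

# Toy log of (query terms, ODP category label) pairs.
logged = [(["hotel", "deals", "paris"], "Recreation/Travel"),
          (["python", "list", "comprehension"], "Computers/Programming")]
print(predict_label(["cheap", "hotel", "rooms", "near", "paris", "airport"], logged))
```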


Information Processing and Management | 2008

Adapting information retrieval systems to user queries

Giridhar Kumaran; James Allan

Users enter queries that are short as well as long. The aim of this work is to evaluate techniques that can enable information retrieval (IR) systems to automatically adapt to perform better on such queries. By adaptation we refer to (1) modifications to the queries via user interaction, and (2) detecting that the original query is not a good candidate for modification. We show that the former has the potential to improve mean average precision (MAP) of long and short queries by 40% and 30% respectively, and that simple user interaction can help towards this goal. We observed that after inspecting the options presented to them, users frequently did not select any. We present techniques in this paper to determine beforehand the utility of user interaction to avoid this waste of time and effort. We show that our techniques can provide IR systems with the ability to detect and avoid interaction for unpromising queries without a significant drop in overall performance.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Predicting query performance on the web

Niranjan Balasubramanian; Giridhar Kumaran; Vitor R. Carvalho

Predicting the performance of web queries is useful for several applications such as automatic query reformulation and automatic spell correction. In the web environment, accurate performance prediction is challenging because measures such as clarity that work well on homogeneous TREC-like collections are not as effective and are often expensive to compute. We present Rank-time Performance Prediction (RAPP), an effective and efficient approach for online performance prediction on the web. RAPP uses retrieval scores, and aggregates of the rank-time features used by the document-ranking algorithm, to train regressors for query performance prediction. On a set of over 12,000 queries sampled from the query logs of a major search engine, RAPP achieves a linear correlation of 0.78 with DCG@5, and 0.52 with NDCG@5. Analysis of prediction accuracy shows that hard queries are easier to identify while easy queries are harder to identify.
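
A minimal sketch of the RAPP idea appears below: regress a retrieval-quality target such as DCG@5 on aggregates of the ranker's own top-k scores. The feature aggregates, ridge regressor, and toy numbers are assumptions for illustration; the actual rank-time features and training data belong to the search engine and are not public.

```python
# Sketch: rank-time performance prediction from aggregated retrieval scores.
import numpy as np
from sklearn.linear_model import Ridge

def rapp_features(top_scores):
    """Aggregate the top-k document scores into a fixed-length feature vector."""
    s = np.asarray(top_scores, dtype=float)
    return np.array([s.max(), s.mean(), s.std(), s[0] - s[1:].mean()])

# Toy training data: top-5 ranker scores per query and an observed DCG@5.
queries = [[9.1, 8.7, 8.5, 8.2, 8.0],
           [4.2, 2.1, 1.9, 1.5, 1.1],
           [7.5, 7.4, 7.1, 6.8, 6.5],
           [3.0, 2.8, 2.7, 2.6, 2.4]]
dcg_at_5 = [0.92, 0.35, 0.81, 0.22]

X = np.vstack([rapp_features(q) for q in queries])
model = Ridge(alpha=1.0).fit(X, dcg_at_5)

# Predict the quality of a new query from signals already available at rank time.
print(model.predict(rapp_features([8.8, 8.1, 7.9, 7.6, 7.2]).reshape(1, -1)))
```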


Conference on Information and Knowledge Management | 2007

Selective user interaction

Giridhar Kumaran; James Allan

Query expansion [12] refers to the process of including related terms in the original query to produce expanded queries, while query relaxation [8] refers to the dropping or down-weighting of terms from the original query to produce sub-queries. The automatic versions of both query expansion (AQE) and query relaxation (AQR) are known to fail in a large fraction of queries, and overall (average) improvements in performance can be attributed to high gains on a smaller fraction [7]. The potential to address the mistakes made by automatic techniques by involving the user [6] motivates interactive versions of these techniques (IQE, IQR). Previous research has shown that involving users in selection [4, 5, 10, 1] or rejection of terms or sets of terms [8] suggested by an automatic method has the potential to further improve performance. However, the same problems that plague automatic techniques are prevalent in interactive techniques: i.e. user interaction has the potential to lead to improvements only for a subset of queries. Further, a second problem has generally been ignored: frequently none of the options selected by the automatic procedures and presented to the user are any better than the original query. In this paper we develop and present procedures for determining when to interact with a user to obtain explicit feedback in the IQR and IQE settings. We show that by using these procedures we can avoid interaction for almost 40% of TREC queries without compromising significant improvements over the baseline. We also develop procedures to rank queries by their potential for improvement through user interaction, enabling systems to interact with users working under time and cognitive load constraints.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Simple questions to improve pseudo-relevance feedback results

Giridhar Kumaran; James Allan

We explore interactive methods to further improve the performance of pseudo-relevance feedback. Studies suggest that new methods for tackling difficult queries are required. Our approach is to gather more information about the query from the user by asking her simple questions. The equally simple responses are used to modify the original query. Our experiments using the TREC Robust Track queries show that we can obtain a significant improvement in mean average precision averaging around 5% over pseudo-relevance feedback. This improvement is also spread across more queries compared to ordinary pseudo-relevance feedback, as suggested by geometric mean average precision.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Stemming in the language modeling framework

James Allan; Giridhar Kumaran

Stemming is the process of collapsing words into their morphological root. For example, the terms addicted, addicting, addictions, addictive, and addicts might be conflated to their stem, addict. Over the years, numerous studies [2, 3, 4] have considered stemming as an external process, either to be ignored or used as a pre-processing step. In this study, we try to provide a fresh perspective on stemming. We are motivated by the observation that stemming can be viewed as a form of smoothing, as a way of improving statistical estimates. This suggests that stemming could be directly incorporated into a language model, which is what we achieve in this paper. Detailed discussions are available in [1].
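
A small sketch of the stemming-as-smoothing view appears below: the probability of a query word in a document borrows mass from the word's whole stem class before the usual collection smoothing. The mixing weights, the add-one collection model, and the toy stemmer are illustrative choices, not the estimates used in the paper.

```python
# Sketch: query-likelihood term probability smoothed with the term's stem class.
from collections import Counter

def stem_smoothed_prob(word, doc, collection, stem, alpha=0.7, mu=0.2):
    """P(word | doc): mix the word's own count with its stem-class count,
    then interpolate with an add-one collection model."""
    doc_tf, doc_len = Counter(doc), len(doc)
    col_tf, col_len = Counter(collection), len(collection)
    same_stem = [w for w in doc if stem(w) == stem(word)]      # stem class in doc
    p_word = doc_tf[word] / doc_len
    p_class = len(same_stem) / doc_len                          # stem-class mass
    p_col = (col_tf[word] + 1) / (col_len + len(col_tf))        # collection model
    return (1 - mu) * (alpha * p_word + (1 - alpha) * p_class) + mu * p_col

doc = ["addicts", "addicted", "recover", "slowly"]
collection = doc + ["addiction", "treatment", "centers", "help", "addicts"]
stem = lambda w: "addict" if w.startswith("addict") else w      # toy stemmer
print(stem_smoothed_prob("addictive", doc, collection, stem))   # > 0 despite tf = 0
```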

Collaboration


Dive into Giridhar Kumaran's collaborations.

Top Co-Authors

James Allan

University of Massachusetts Amherst

Andrew McCallum

University of Massachusetts Amherst

Niranjan Balasubramanian

University of Massachusetts Amherst
