Hema Raghavan | Researchain

Publication

Featured research published by Hema Raghavan.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Using part-of-speech patterns to reduce query ambiguity

James Allan; Hema Raghavan

Query ambiguity is a generally recognized problem, particularly in Web environments where queries are commonly only one or two words in length. In this study, we explore one technique that finds commonly occurring patterns of parts of speech near a one-word query and allows them to be transformed into clarification questions. We use a technique derived from statistical language modeling to show that the clarification queries will reduce ambiguity much of the time, and often quite substantially.
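A minimal sketch of the pattern-mining step follows. It assumes the text has already been part-of-speech tagged. The tagger, the one-tag window, and the function name `pos_patterns` are illustrative assumptions, not the authors' implementation. The function counts the tag contexts around each occurrence of a one-word query. The most frequent patterns are the candidates that could then be turned into clarification questions.

```python
from collections import Counter

def pos_patterns(tagged_sentences, query, window=1, top_k=3):
    """Count POS-tag contexts around a one-word query (hypothetical helper).

    tagged_sentences: list of sentences, each a list of (token, tag) pairs.
    Returns the top_k most common patterns, with the query slot written as "<Q>".
    """
    counts = Counter()
    q = query.lower()
    for sent in tagged_sentences:
        for i, (tok, _) in enumerate(sent):
            if tok.lower() != q:
                continue
            # Tags of up to `window` tokens on each side of the query occurrence.
            left = tuple(tag for _, tag in sent[max(0, i - window):i])
            right = tuple(tag for _, tag in sent[i + 1:i + 1 + window])
            counts[left + ("<Q>",) + right] += 1
    return counts.most_common(top_k)

sentences = [
    [("the", "DT"), ("bank", "NN"), ("of", "IN"), ("the", "DT"), ("river", "NN")],
    [("a", "DT"), ("bank", "NN"), ("of", "IN"), ("england", "NNP")],
    [("we", "PRP"), ("bank", "VBP"), ("online", "RB")],
]
print(pos_patterns(sentences, "bank", top_k=1))
```

Here the dominant `DT <Q> IN` context ("the bank of ...") points to a noun reading of "bank". A system could ask the user which "bank of ..." was meant.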

Web Search and Data Mining | 2010

Improving ad relevance in sponsored search

Dustin Hillard; Stefan Schroedl; Eren Manavoglu; Hema Raghavan; Chris Leggetter

We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click propensity from sparse click logs. Our relevance predictions are then applied to multiple sponsored search applications in both offline editorial evaluations and live online user tests. The predicted relevance score is used to improve the quality of the search page in three areas: filtering low quality ads, more accurate ranking for ads, and optimized page placement of ads to reduce prominent placement of low relevance ads. We show significant gains across all three tasks.
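The abstract does not list the baseline's text-overlap features. The sketch below is hypothetical and only shows what such features can look like for a query and an ad's text. The feature names, whitespace tokenization, and the function `overlap_features` are all assumptions.

```python
def overlap_features(query: str, ad_text: str) -> dict:
    """Hypothetical text-overlap features between a query and an ad."""
    q = set(query.lower().split())
    a = set(ad_text.lower().split())
    inter = q & a
    union = q | a
    return {
        # Fraction of query words that appear in the ad text.
        "query_coverage": len(inter) / len(q) if q else 0.0,
        # Jaccard similarity of the two word sets.
        "jaccard": len(inter) / len(union) if union else 0.0,
        # 1.0 if the whole query occurs verbatim in the ad text.
        "exact_phrase": float(query.lower() in ad_text.lower()),
    }

print(overlap_features("cheap flights", "Cheap flights to Paris"))
```

Features of this kind would feed a learned relevance model. The click-based translation-model features the abstract describes are not sketched here.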

North American Chapter of the Association for Computational Linguistics | 2004

Using Soundex codes for indexing names in ASR documents

Hema Raghavan; James Allan

In this paper we highlight the problems that arise from spelling variations of names in text: links between two pieces of text in which the same name is spelled differently may be missed. The problem is particularly pronounced in ASR (automatic speech recognition) text. We propose using approximate string matching techniques to normalize names and so overcome the problem. We show that an improvement could be achieved if names could be tagged with reasonable accuracy in ASR output.
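For reference, here is a minimal sketch of the standard American Soundex code, a common phonetic key for approximate name matching. It is the textbook algorithm, not the paper's code. The helper `build_index` is hypothetical. It shows how indexing names by their code lets variant spellings such as "Smith" and "Smyth" land in the same bucket.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name: str) -> str:
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # keep the first letter as-is
    prev = CODES.get(letters[0], "")     # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h/w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev, so repeats after it count
        if len(result) == 4:
            break
    return (result + "000")[:4]          # pad to letter + 3 digits

def build_index(names):
    """Hypothetical name index keyed by Soundex code."""
    index = defaultdict(set)
    for n in names:
        index[soundex(n)].add(n)
    return index

print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Soundex is coarse because the first letter must match and only four characters are kept. It is one of several approximate matching options rather than a complete solution for ASR spelling errors.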

Conference on Information and Knowledge Management | 2009

A collaborative filtering approach to ad recommendation using the query-ad click graph

Dustin Hillard; Sanjay Kshetramade; Hema Raghavan

Search engine logs contain a large amount of click-through data that can be leveraged as soft indicators of relevance. In this paper we address the sponsored search retrieval problem, which is to find and rank relevant ads for a search query. We propose a new technique to determine the relevance of an ad document for a search query using click-through data. The method builds on a collaborative filtering approach to discover new ads related to a query using a click graph. It is implemented on a graph with several million edges and scales easily to larger sizes. The proposed method is compared to three different baselines that are state-of-the-art for a commercial search engine. Evaluations on editorial data indicate that the model discovers many new ads not retrieved by the baseline methods. The ads from the new approach are, on average, of better quality than those from the baselines.
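The sketch below illustrates generic neighbourhood-based collaborative filtering on a bipartite query-ad click graph. It is an assumption for illustration, not the paper's exact formulation. Queries are compared by the cosine similarity of their click vectors. Ads clicked for similar queries are scored by similarity-weighted clicks, and ads already linked to the target query are excluded so that only new candidates surface. The function names and example data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse click vectors {ad: clicks}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_ads(clicks, query, top_k=3):
    """clicks: {query: {ad: click_count}}; returns new ads ranked for `query`."""
    target = clicks.get(query, {})
    scores = defaultdict(float)
    for other, ads in clicks.items():
        if other == query:
            continue
        sim = cosine(target, ads)
        if sim == 0.0:
            continue
        for ad, count in ads.items():
            if ad not in target:  # only surface ads not already linked to the query
                scores[ad] += sim * count
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_k]

clicks = {
    "cheap flights": {"ad_airfare": 10, "ad_hotels": 2},
    "discount airfare": {"ad_airfare": 8, "ad_travelsite": 5},
    "flight deals": {"ad_airfare": 4, "ad_travelsite": 1, "ad_carrental": 1},
    "pizza delivery": {"ad_pizza": 7},
}
print(recommend_ads(clicks, "cheap flights"))  # ['ad_travelsite', 'ad_carrental']
```

The quadratic loop over all queries is for clarity only. A graph with millions of edges would need to restrict the search to queries that share at least one clicked ad.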

Information Retrieval | 2011

The sum of its parts: reducing sparsity in click estimation with query segments

Dustin Hillard; Eren Manavoglu; Hema Raghavan; Chris Leggetter; Erick Cantu-Paz; Rukmini Iyer

The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.
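A minimal sketch of the back-off idea follows. When a query-ad pair has too little history, its click-through rate is estimated from click history aggregated over the individual query words. The impression threshold, the prior-CTR smoothing, the plain averaging, and all function names are hypothetical choices. The CRF-extracted phrases mentioned in the abstract are not modelled.

```python
from collections import defaultdict

def build_stats(log):
    """log rows: (query, ad, impressions, clicks). Returns pair- and word-level totals."""
    pair_stats = defaultdict(lambda: [0, 0])  # (query, ad) -> [impressions, clicks]
    word_stats = defaultdict(lambda: [0, 0])  # (word, ad)  -> [impressions, clicks]
    for query, ad, imps, clicks in log:
        q = query.lower()
        pair_stats[(q, ad)][0] += imps
        pair_stats[(q, ad)][1] += clicks
        for word in set(q.split()):
            word_stats[(word, ad)][0] += imps
            word_stats[(word, ad)][1] += clicks
    return pair_stats, word_stats

def estimate_ctr(query, ad, pair_stats, word_stats,
                 min_imps=100, prior_ctr=0.02, prior_weight=10.0):
    q = query.lower()
    imps, clicks = pair_stats.get((q, ad), (0, 0))
    if imps >= min_imps:
        return clicks / imps  # enough history: trust the query-ad pair directly
    # Sparse pair: back off to word-level aggregates, each smoothed toward a prior CTR.
    estimates = []
    for word in set(q.split()):
        w_imps, w_clicks = word_stats.get((word, ad), (0, 0))
        estimates.append((w_clicks + prior_ctr * prior_weight) / (w_imps + prior_weight))
    return sum(estimates) / len(estimates) if estimates else prior_ctr

log = [("red running shoes", "ad1", 1000, 50), ("running shoes sale", "ad1", 500, 20)]
pair, words = build_stats(log)
print(estimate_ctr("red shoes", "ad1", pair, words))  # unseen pair, estimated from "red" and "shoes"
```

In the paper, segment-level statistics feed learned click and relevance models rather than a hand-set average. This sketch only shows why partial queries reduce sparsity.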

Conference on Information and Knowledge Management | 2010

Probabilistic first pass retrieval for search advertising: from theory to practice

Hema Raghavan; Rukmini Iyer

Information retrieval in search advertising, as in other ad-hoc retrieval tasks, aims to find the most appropriate ranking of the ad documents of a corpus for a given query. In addition to ranking the ad documents, we also need to filter or threshold irrelevant ads from participating in the auction to be displayed alongside search results. In this work, we describe our experience in implementing a successful ad retrieval system for a commercial search engine based on the Language Modeling (LM) framework for retrieval. The LM demonstrates significant performance improvements over the baseline vector space model (TF-IDF) system that was in production at the time. From a modeling perspective, we propose a novel approach to incorporate query segmentation and phrases in the LM framework, discuss the impact of score normalization on relevance filtering, and present preliminary results of incorporating query expansions using query rewriting techniques. From an implementation perspective, we also discuss the real-time latency constraints of a production search engine and how we overcome them by adapting the WAND algorithm to work with language models. In sum, our LM formulation is considerably better in terms of accuracy metrics such as Precision-Recall (10% improvement in AUC) and nDCG (8% improvement in nDCG@5) on editorial data, and it also demonstrates significant improvements in clicks in live user tests (0.787% improvement in Click Yield, with 8% coverage increase). Finally, we hope that this paper provides the reader with adequate insights into the challenges of building a system that serves millions of users every day.
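The following is a minimal sketch of LM-based retrieval with a relevance threshold. It assumes the common query-likelihood formulation with Dirichlet smoothing, $\log P(q \mid d) = \sum_{w \in q} \log \frac{tf(w,d) + \mu P(w \mid C)}{|d| + \mu}$. The paper's exact smoothing, query segmentation, and WAND adaptation are not shown. The per-term score normalization, the threshold value, and all function names are hypothetical illustrations of relevance filtering.

```python
import math
from collections import Counter

def build_collection_model(docs):
    """Maximum-likelihood collection language model P(w | C) over all ad texts."""
    counts = Counter()
    for text in docs.values():
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def lm_score(query, text, p_coll, mu=2000.0):
    """Dirichlet-smoothed query log-likelihood of one ad document."""
    tf = Counter(text.lower().split())
    doc_len = sum(tf.values())
    score = 0.0
    for w in query.lower().split():
        pc = p_coll.get(w)
        if not pc:
            continue  # term absent from the collection: skipped in this sketch
        score += math.log((tf[w] + mu * pc) / (doc_len + mu))
    return score

def retrieve(query, docs, threshold, mu=2000.0):
    """Rank ads and drop those whose per-term normalized score is below threshold."""
    p_coll = build_collection_model(docs)
    n_terms = max(len(query.split()), 1)
    scored = []
    for ad_id, text in docs.items():
        s = lm_score(query, text, p_coll, mu) / n_terms  # hypothetical normalization
        if s >= threshold:
            scored.append((s, ad_id))
    return [ad_id for _, ad_id in sorted(scored, reverse=True)]

ads = {"a1": "cheap flights to paris", "a2": "paris hotels deals", "a3": "pizza delivery fast"}
print(retrieve("cheap paris flights", ads, threshold=-3.0, mu=1.0))  # ['a1', 'a2']
```

Dividing by query length keeps one threshold roughly comparable across short and long queries. Choosing such a normalization is the kind of decision the abstract flags as affecting relevance filtering.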

Empirical Methods in Natural Language Processing | 2005

Matching Inconsistently Spelled Names in Automatic Speech Recognizer Output for Information Retrieval

Hema Raghavan; James Allan

Many proper names are spelled inconsistently in speech recognizer output, posing a problem for applications where locating mentions of named entities is critical. We model the distortion in the spelling of a name due to the speech recognizer as the effect of a noisy channel. The models follow the framework of the IBM translation models and are trained on parallel text of closed captions and automatic speech recognition output. We also test a method based on string edit distance. The effectiveness of these models is evaluated on a name query retrieval task. Our methods result in a 60% improvement in F1. We also demonstrate why the problem has not been critical in TREC and TDT tasks.
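The abstract mentions a string-edit-distance baseline alongside the noisy-channel models. A minimal sketch of such a baseline uses plain Levenshtein distance. The length-normalized threshold `max_ratio` and the function `names_match` are hypothetical. The trained IBM-style channel model is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def names_match(query_name: str, asr_token: str, max_ratio: float = 0.25) -> bool:
    """Hypothetical matcher: accept if distance / longer length <= max_ratio."""
    q, t = query_name.lower(), asr_token.lower()
    return levenshtein(q, t) / max(len(q), len(t), 1) <= max_ratio

print(names_match("Milosevic", "Milosevich"))  # True
```

A fixed edit cost treats every character substitution as equally likely. A noisy-channel model trained on caption/ASR pairs can instead learn which distortions the recognizer actually makes.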

Journal of Machine Learning Research | 2006