Deepak Agarwal | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Deepak Agarwal is active.

Explore More

Publication

Featured researches published by Deepak Agarwal.

knowledge discovery and data mining | 2009

Regression-based latent factor models

Deepak Agarwal; Bee-Chung Chen

We propose a novel latent factor model to accurately predict response for large scale dyadic data in the presence of features. Our approach is based on a model that predicts response as a multiplicative function of row and column latent factors that are estimated through separate regressions on known row and column features. In fact, our model provides a single unified framework to address both cold and warm start scenarios that are commonplace in practical applications like recommender systems, online advertising, web search, etc. We provide scalable and accurate model fitting methods based on Iterated Conditional Mode and Monte Carlo EM algorithms. We show our model induces a stochastic process on the dyadic space with kernel (covariance) given by a polynomial function of features. Methods that generalize our procedure to estimate factors in an online fashion for dynamic applications are also considered. Our method is illustrated on benchmark datasets and a novel content recommendation application that arises in the context of Yahoo! Front Page. We report significant improvements over several commonly used methods on all datasets.

international world wide web conferences | 2008

Contextual advertising by combining relevance with click feedback

Deepayan Chakrabarti; Deepak Agarwal; Vanja Josifovski

Contextual advertising supports much of the Webs ecosystem today. User experience and revenue (shared by the site publisher and the ad network) depend on the relevance of the displayed ads to the page content. As with other document retrieval systems, relevance is provided by scoring the match between individual ads (documents) and the content of the page where the ads are shown (query). In this paper we show how this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching and is suitable for efficient and scalable implementation over inverted indexes. The model parameter values are learnt from logs containing ad impressions and clicks, with shrinkage estimators being used to combat sparsity. To scale our computations to train on an extremely large training corpus consisting of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop framework [10]. Experimental evaluation is provided showing improved click prediction over a holdout set of impression and click events from a large scale real-world ad placement engine. Our best model achieves a 25% lift in precision relative to a traditional information retrieval model which is based on cosine similarity, for recalling 10% of the clicks in our test data.

international conference on machine learning | 2007

Multi-armed bandit problems with dependent arms

Sandeep Pandey; Deepayan Chakrabarti; Deepak Agarwal

We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find an optimal MDP-based policy for the discounted reward case, and also give an approximation of it with formal error guarantee. We discuss lower bounds on regret in the undiscounted reward scenario, and propose a general two-level bandit policy for it. We propose three different instantiations of our general policy and provide theoretical justifications of how the regret of the instantiated policies depend on the characteristics of the clusters. Finally, we empirically demonstrate the efficacy of our policies on large-scale real-world and synthetic data, and show that they significantly outperform classical policies designed for bandits with independent arms.

international world wide web conferences | 2009

Spatio-temporal models for estimating click-through rate

Deepak Agarwal; Bee-Chung Chen; Pradheep Elango

We propose novel spatio-temporal models to estimate click-through rates in the context of content recommendation. We track article CTR at a fixed location over time through a dynamic Gamma-Poisson model and combine information from correlated locations through dynamic linear regressions, significantly improving on per-location model. Our models adjust for user fatigue through an exponential tilt to the first-view CTR (probability of click on first article exposure) that is based only on user-specific repeat-exposure features. We illustrate our approach on data obtained from a module (Today Module) published regularly on Yahoo! Front Page and demonstrate significant improvement over commonly used baseline methods. Large scale simulation experiments to study the performance of our models under different scenarios provide encouraging results. Throughout, all modeling assumptions are validated via rigorous exploratory data analysis.

knowledge discovery and data mining | 2011

Response prediction using collaborative filtering with hierarchies and side-information

Aditya Krishna Menon; Krishna Prasad Chitrapura; Sachin Garg; Deepak Agarwal; Nagaraj Kota

In online advertising, response prediction is the problem of estimating the probability that an advertisement is clicked when displayed on a content publishers webpage. In this paper, we show how response prediction can be viewed as a problem of matrix completion, and propose to solve it using matrix factorization techniques from collaborative filtering (CF). We point out the two crucial differences between standard CF problems and response prediction, namely the requirement of predicting probabilities rather than scores, and the issue of confidence in matrix entries. We address these issues using a matrix factorization analogue of logistic regression, and by applying a principled confidence-weighting scheme to its objective. We show how this factorization can be seamlessly combined with explicit features or side-information for pages and ads, which let us combine the benefits of both approaches. Finally, we combat the extreme sparsity of response prediction data by incorporating hierarchical information about the pages and ads into our factorization model. Experiments on three very large real-world datasets show that our model outperforms current state-of-the-art methods for response prediction.

conference on recommender systems | 2009

A spatio-temporal approach to collaborative filtering

Zhengdong Lu; Deepak Agarwal; Inderjit S. Dhillon

In this paper, we propose a novel spatio-temporal model for collaborative filtering applications. Our model is based on low-rank matrix factorization that uses a spatio-temporal filtering approach to estimate user and item factors. The spatial component regularizes the factors by exploiting correlation across users and/or items, modeled as a function of some implicit feedback (e.g., who rated what) and/or some side information (e.g., user demographics, browsing history). In particular, we incorporate correlation in factors through a Markov random field prior in a probabilistic framework, whereby the neighborhood weights are functions of user and item covariates. The temporal component ensures that the user/item factors adapt to process changes that occur through time and is implemented in a state space framework with fast estimation through Kalman filtering. Our spatio-temporal filtering (ST-KF hereafter) approach provides a single joint model to simultaneously incorporate both spatial and temporal structure in ratings and therefore provides an accurate method to predict future ratings. To ensure scalability of ST-KF, we employ a mean-field approximation for inference. Incorporating user/item covariates in estimating neighborhood weights also helps in dealing with both cold-start and warm-start problems seamlessly in a single unified modeling framework; covariates predict factors for new users and items through the neighborhood. We illustrate our method on simulated data, benchmark data and data obtained from a relatively new recommender system application arising in the context of Yahoo! Front Page.

knowledge discovery and data mining | 2007

Estimating rates of rare events at multiple resolutions

Deepak Agarwal; Andrei Z. Broder; Deepayan Chakrabarti; Dejan Diklic; Vanja Josifovski; Mayssam Sayyadian

We consider the problem of estimating occurrence rates of rare eventsfor extremely sparse data, using pre-existing hierarchies to perform inference at multiple resolutions. In particular, we focus on the problem of estimating click rates for (webpage, advertisement) pairs (called impressions) where both the pages and the ads are classified into hierarchies that capture broad contextual information at different levels of granularity. Typically the click rates are low and the coverage of the hierarchies is sparse. To overcome these difficulties we devise a sampling method whereby we analyze aspecially chosen sample of pages in the training set, and then estimate click rates using a two-stage model. The first stage imputes the number of (webpage, ad) pairs at all resolutions of the hierarchy to adjust for the sampling bias. The second stage estimates clickrates at all resolutions after incorporating correlations among sibling nodes through a tree-structured Markov model. Both models are scalable and suited to large scale data mining applications. On a real-world dataset consisting of 1/2 billion impressions, we demonstrate that even with 95% negative (non-clicked) events in the training set, our method can effectively discriminate extremely rare events in terms of their click propensity.

knowledge discovery and data mining | 2011

Localized factor models for multi-context recommendation

Deepak Agarwal; Bee-Chung Chen; Bo Long

Combining correlated information from multiple contexts can significantly improve predictive accuracy in recommender problems. Such information from multiple contexts is often available in the form of several incomplete matrices spanning a set of entities like users, items, features, and so on. Existing methods simultaneously factorize these matrices by sharing a single set of factors for entities across all contexts. We show that such a strategy may introduce significant bias in estimates and propose a new model that ameliorates this issue by positing local, context-specific factors for entities. To avoid over-fitting in contexts with sparse data, the local factors are connected through a shared global model. This sharing of parameters allows information to flow across contexts through multivariate regressions among local factors, instead of enforcing exactly the same factors for an entity, everywhere. Model fitting is done in an EM framework, we show that the E-step can be fitted through a fast multi-resolution Kalman filter algorithm that ensures scalability. Experiments on benchmark and real-world Yahoo! datasets clearly illustrate the usefulness of our approach. Our model significantly improves predictive accuracy, especially in cold-start scenarios.

knowledge discovery and data mining | 2007

Efficient and effective explanation of change in hierarchical summaries

Deepak Agarwal; Dhiman Barman; Dimitrios Gunopulos; Neal E. Young; Flip Korn; Divesh Srivastava

Dimension attributes in data warehouses are typically hierarchical (e.g., geographic locations in sales data, URLs in Web traffic logs). OLAP tools are used to summarize the measure attributes (e.g., total sales) along a dimension hierarchy, and to characterize changes (e.g., trends and anomalies) in a hierarchical summary over time. When thenumber of changes identified is large (e.g., total sales in many stores differed from their expected values), a parsimonious explanation of the most significant changes is desirable. In this paper, we propose a natural model of parsimonious explanation, as a composition of node weights along the root-to-leaf paths in a dimension hierarchy, which permits changes to be aggregated with maximal generalization along the dimension hierarchy. We formalize this model of explaining changes in hierarchical summaries and investigate the problem of identifying optimally parsimonious explanations on arbitrary rooted one dimensional tree hierarchies. We show that such explanations can be computed efficiently in time essentially proportional to the number of leaves and the depth of the hierarchy. Further, our method can produce parsimonious explanations from the output of any statistical model that provides predictions and confidence intervals, making it widely applicable. Our experiments use real data sets to demonstrate the utility and robustness of our proposed model for explaining significant changes, as well as its superior parsimony compared to alternatives.

knowledge discovery and data mining | 2011

Click shaping to optimize multiple objectives

Deepak Agarwal; Bee-Chung Chen; Pradheep Elango; Xuanhui Wang

Recommending interesting content to engage users is important for web portals (e.g. AOL, MSN, Yahoo!, and many others). Existing approaches typically recommend articles to optimize for a single objective, i.e., number of clicks. However a click is only the starting point of a users journey and subsequent downstream utilities such as time-spent and revenue are important. In this paper, we call the problem of recommending links to jointly optimize for clicks and post-click downstream utilities click shaping. We propose a multi-objective programming approach in which multiple objectives are modeled in a constrained optimization framework. Such a formulation can naturally incorporate various application-driven requirements. We study several variants that model different requirements as constraints and discuss some of the subtleties involved. We conduct our experiments on a large dataset from a real system by using a newly proposed unbiased evaluation methodology [17]. Through extensive experiments we quantify the tradeoff between different objectives under various constraints. Our experimental results show interesting characteristics of different formulations and our findings may provide valuable guidance to the design of recommendation engines for web portals.

Explore More