Rajeev Rastogi | Researchain

Publication

Featured research published by Rajeev Rastogi.

conference on information and knowledge management | 2012

LogUCB: an explore-exploit algorithm for comments recommendation

Dhruv Mahajan; Rajeev Rastogi; Charu Tiwari; Adway Mitra

The highly dynamic nature of online commenting environments makes accurate ratings prediction for new comments challenging. In such a setting, in addition to exploiting comments with high predicted ratings, it is also critical to explore comments with high uncertainty in the predictions. In this paper, we propose a novel upper confidence bound (UCB) algorithm called LOGUCB that balances exploration with exploitation when the average rating of a comment is modeled using logistic regression on its features. At the core of our LOGUCB algorithm lies a novel variance approximation technique for the Bayesian logistic regression model that is used to compute the UCB value for each comment. In experiments with a real-life comments dataset from Yahoo! News, we show that LOGUCB with bag-of-words and topic features outperforms state-of-the-art explore-exploit algorithms.
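The explore-exploit trade-off described above can be illustrated with a generic upper-confidence-bound rule. This is only a minimal sketch under stated assumptions: the function names, the Gaussian posterior covariance `cov`, and the exploration weight `alpha` are hypothetical, and the uncertainty term is a plain x^T Sigma x variance of the linear score, not the variance approximation proposed in the paper.

```python
import math

def ucb_score(weights, cov, features, alpha=1.0):
    """Predicted rating (logistic mean) plus alpha times an uncertainty bonus.

    Assumption: the posterior over weights is Gaussian with covariance `cov`;
    adding the linear-score standard deviation to the probability is a
    simplification for illustration.
    """
    n = len(features)
    # Linear score w . x
    z = sum(w * x for w, x in zip(weights, features))
    # Variance of the linear score: x^T Sigma x
    var = sum(features[i] * cov[i][j] * features[j]
              for i in range(n) for j in range(n))
    mean = 1.0 / (1.0 + math.exp(-z))
    return mean + alpha * math.sqrt(var)

def pick_comment(weights, cov, candidates, alpha=1.0):
    """Return the index of the candidate with the highest UCB score."""
    return max(range(len(candidates)),
               key=lambda i: ucb_score(weights, cov, candidates[i], alpha))
```

With `alpha=0` the rule reduces to pure exploitation of the predicted rating; larger values favor comments whose predictions are more uncertain.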

conference on information and knowledge management | 2012

Matching product titles using web-based enrichment

Vishrawas Gopalakrishnan; Suresh Iyengar; Amit Madaan; Rajeev Rastogi; Srinivasan H. Sengamedu

Matching product titles from different data feeds that refer to the same underlying product entity is a key problem in online shopping. This matching problem is challenging because titles across the feeds have diverse representations with some missing important keywords like brand and others containing extraneous keywords related to product specifications. In this paper, we propose a novel unsupervised matching algorithm that leverages web search engines to (1) enrich product titles by adding important missing tokens that occur frequently in search results, and (2) compute importance scores for tokens based on their ability to retrieve other (enriched title) tokens in search results. Our matching scheme calculates the Cosine similarity between enriched title pairs with tokens weighted by their importance scores. We propose an optimization that exploits the templatized structure of product titles to reduce the number of search queries. In experiments with real-life shopping datasets, we found that our matching algorithm has superior F1 scores compared to IDF-based cosine similarity.
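The final matching step, cosine similarity over importance-weighted tokens, can be sketched as follows. This is an illustration only: the function name and the `importance` dictionary are hypothetical, and the search-engine-based enrichment and importance scoring from the abstract are assumed to have been done already.

```python
import math

def weighted_cosine(tokens_a, tokens_b, importance):
    """Cosine similarity between two token sets, each token weighted by its
    importance score (tokens missing from `importance` get weight 0)."""
    a = {t: importance.get(t, 0.0) for t in set(tokens_a)}
    b = {t: importance.get(t, 0.0) for t in set(tokens_b)}
    # Only tokens shared by both titles contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, with made-up weights `{"canon": 2, "eos": 2, "camera": 1, "kit": 1}`, the titles `["canon", "eos", "camera"]` and `["canon", "eos", "kit"]` score 8/9, because the two highly weighted shared tokens dominate the differing low-weight ones.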

conference on recommender systems | 2017

Recommending Product Sizes to Customers

Vivek Sembium; Rajeev Rastogi; Atul Saroop; Srujana Merugu

We propose a novel latent factor model for recommending product size fits {Small, Fit, Large} to customers. Latent factors for customers and products in our model correspond to their physical true size, and are learnt from past product purchase and returns data. The outcome for a customer, product pair is predicted based on the difference between customer and product true sizes, and efficient algorithms are proposed for computing customer and product true size values that minimize two loss function variants. In experiments with Amazon shoe datasets, we show that our latent factor models, incorporating personas and leveraging return codes, show a 17-21% AUC improvement compared to baselines. In an online A/B test, our algorithms show an improvement of 0.49% in percentage of Fit transactions over control.
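The prediction rule based on the size difference can be illustrated with a small sketch. The sign convention (product size minus customer size), the threshold values, and the function name are all assumptions made for illustration; the abstract does not specify them, and learning the true sizes from purchase and returns data is not shown.

```python
def predict_fit(customer_size, product_size, b_small=-0.5, b_large=0.5):
    """Map the difference between latent true sizes to a fit outcome.

    Assumed convention: a product much smaller than the customer is "Small",
    much larger is "Large", and anything in between is "Fit".
    """
    diff = product_size - customer_size
    if diff < b_small:
        return "Small"
    if diff > b_large:
        return "Large"
    return "Fit"
```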

international world wide web conferences | 2018

Bayesian Models for Product Size Recommendations

Vivek Sembium; Rajeev Rastogi; Lavanya Sita Tekumalla; Atul Saroop

Lack of calibrated product sizing in popular categories such as apparel and shoes leads to customers purchasing incorrect sizes, which in turn results in high return rates due to fit issues. We address the problem of product size recommendations based on customer purchase and return data. We propose a novel approach based on Bayesian logit and probit regression models with ordinal categories {Small, Fit, Large} to model size fits as a function of the difference between latent sizes of customers and products. We propose posterior computation based on mean-field variational inference, leveraging the Polya-Gamma augmentation for the logit prior, that results in simple updates, enabling our technique to efficiently handle large datasets. Our Bayesian approach effectively deals with issues arising from noise and sparsity in the data, providing robust recommendations. Offline experiments with real-life shoe datasets show that our model outperforms the state-of-the-art in 5 of 6 datasets and leads to an improvement of 17-26% in AUC over baselines when predicting size fit outcomes.

very large data bases | 2016

Machine learning in the real world

Vineet Chaoji; Rajeev Rastogi; Gourav Roy

Machine Learning (ML) has become a mature technology that is being applied to a wide range of business problems such as web search, online advertising, product recommendations, object recognition, and so on. As a result, it has become imperative for researchers and practitioners to have a fundamental understanding of ML concepts and practical knowledge of end-to-end modeling. This tutorial takes a hands-on approach to introducing the audience to machine learning. The first part of the tutorial gives a broad overview and discusses some of the key concepts within machine learning. The second part of the tutorial takes the audience through the end-to-end modeling pipeline for a real-world income prediction problem.

Data Stream Management | 2016

Data Stream Management: A Brave New World

Minos N. Garofalakis; Johannes Gehrke; Rajeev Rastogi

Traditional data-management systems software is built on the concept of persistent data sets that are stored reliably in stable storage and queried/updated several times throughout their lifetime. For several emerging application domains, however, data arrives and needs to be processed on a continuous basis, without the benefit of several passes over a static, persistent data image. Such continuous data streams arise naturally, for instance, in telecom and IP network monitoring. This volume focuses on the theory and practice of data stream management, and the difficult, novel challenges this emerging domain introduces for data-management systems. The collection of chapters (contributed by authorities in the field) offers a comprehensive introduction to both the algorithmic/theoretical foundations of data streams and the streaming systems/applications built in different domains. In the remainder of this introductory chapter, we provide a brief summary of some basic data streaming concepts and models, and discuss the key elements of a generic stream query processing architecture. We then give a short overview of the contents of this volume.

conference on information and knowledge management | 2018

A Scalable Algorithm for Higher-order Features Generation using MinHash

Pooja A; Naveen Nair; Rajeev Rastogi

Linear models have been widely used in the industry for their low computation time, small memory footprint and interpretability. However, linear models are not capable of leveraging non-linear feature interactions in predicting the target. This limits their performance. A classical approach to overcome this limitation is to use combinations of the original features, referred to as higher-order features, to capture non-linearity. The number of higher-order features can be very large. Selecting the informative ones among them that are predictive of the target is essential for scalability. This is computationally expensive, requiring a large memory footprint. In this paper, we propose a novel scalable MinHash based scheme to select informative higher-order features. Unlike typical use of MinHash for near-duplicate entity detection and association-rule mining, we use MinHash signature of features to approximate mutual information between higher-order features and target to enable their selection. By analyzing the running time and memory requirements, we show that our proposal is highly efficient in terms of running time and storage compared to existing alternatives. We demonstrate through experiments on multiple benchmark datasets that our proposed approach is not only scalable, but also able to identify the most important feature interactions resulting in improved model performance.
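To make the MinHash idea concrete, the sketch below builds signatures over the set of row ids in which a binary feature is active, estimates the Jaccard similarity of two features, and converts it to an estimated co-occurrence count of the kind a mutual-information estimate would need. It is a generic illustration, not the scheme from the paper: the function names, the linear hash family, and the choice of `PRIME` are assumptions, and row ids are assumed to be non-negative integers below `PRIME`.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash modulus

def make_hashes(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def signature(row_ids, hashes):
    """MinHash signature of a non-empty set of row ids."""
    return [min((a * r + b) % PRIME for r in row_ids) for a, b in hashes]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two features agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

def estimate_intersection(jaccard, size_a, size_b):
    """Recover |A and B| from J = |A and B| / |A or B| and the set sizes."""
    return jaccard * (size_a + size_b) / (1.0 + jaccard)
```

Because only fixed-length signatures are compared, the co-occurrence of a candidate feature pair can be estimated without materializing the pair's full row set, which is where the memory savings would come from.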

acm/ieee international conference on mobile computing and networking | 2018

MobiCom'18 Panel: Hammer & Nail vis-a-vis AI / ML Applications to Networked Systems

Pravin Bhagwat; Andrea J. Goldsmith; Manish Gupta; Rajeev Rastogi; Gautam Shroff

Artificial Intelligence (AI) and Machine Learning (ML) approaches, well known from IT disciplines, are beginning to excite the networking and networked systems community. Of late, we are seeing huge excitement about applying AI and ML to networked systems. Is this merely hype? Are there use cases and genuine applications that could lead to real deployment and practical solutions? What are the key challenges in applying AI and ML to networked systems? Can researchers and practitioners in communication networks and networked systems tap into machine learning and AI techniques to optimize network architecture, control and management, leading to increased automation in network operations? Can researchers and practitioners in the AI community explore synergy with networking researchers to optimize network architecture and design? The above are some of the questions that would be addressed during the panel discussion. The objective of the panel discussion would be to tap the minds of the global experts in order to understand the merits and limitations and the future landscape in the intersection of networking/networked systems and AI/ML.

conference on information and knowledge management | 2017

Machine Learning @ Amazon

Rajeev Rastogi

In this talk, I will first provide an overview of key problem areas where we are applying Machine Learning (ML) techniques within Amazon such as product demand forecasting, product search, and information extraction from reviews, and associated technical challenges. I will then talk about three specific applications where we use a variety of methods to learn semantically rich representations of data: question answering where we use deep learning techniques, product size recommendations where we use probabilistic models, and fake reviews detection where we use tensor factorization algorithms. I will point out the computing challenges associated with these applications and how parallelism can be exploited to scale to large datasets.

Data Stream Management | 2016

Conclusions and Looking Forward

Minos N. Garofalakis; Johannes Gehrke; Rajeev Rastogi

Today, data streaming is a part of the mainstream and several data streaming products are now publicly available. Data streaming algorithms are powering complex event processing, predictive analytics, and big data applications in the cloud. In this final chapter, we provide an overview of current data streaming products, and applications of data streaming to cloud computing, anomaly detection and predictive modeling. We also identify future research directions for mining and doing predictive analytics on data streams, especially in a distributed environment.
