Rishabh Mehrotra
University College London
Publication
international conference on the theory of information retrieval | 2015
Rishabh Mehrotra; Emine Yilmaz
Given the distinct preferences of different users while using search engines, search personalization has become an important problem in information retrieval. Most approaches to search personalization are based on identifying topics a user may be interested in and personalizing search results based on this information. While information about users' topical interests can be highly valuable in personalizing search results and improving user experience, it ignores the fact that two different users who have similar topical interests may still be interested in achieving very different tasks with respect to this topic (e.g. the type of tasks a broker is likely to perform related to finance is likely to be very different from that of a regular investor). Hence, considering users' topical interests jointly with the type of tasks they are likely to be interested in could result in better personalised search. We present an approach that uses search task information embedded in search logs to represent users by their actions over a task space as well as over their topical-interest space. In particular, we describe a tensor-based approach that represents each user in terms of (i) the user's topical interests and (ii) the user's search task behaviours in a coupled fashion, and use these representations for personalization. Additionally, we integrate users' historic search behavior in a coupled matrix-tensor factorization framework to learn user representations. Through extensive evaluation via query recommendations and user cohort analysis, we demonstrate the value of considering topic-specific task information while developing user models.
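The paper couples a user × topic × task tensor with a matrix of historic search behaviour. As a much-simplified sketch, assuming a plain (uncoupled) CP decomposition fitted by alternating least squares in NumPy, the code below factorizes a hypothetical count tensor X[user, topic, task] so that each row of the user factor A can act as a user representation. The function names, the rank, and the random initialization are illustrative assumptions, not the authors' method.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k holds B[j] * C[k]."""
    J, R = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def cp_als(X, rank, n_iter=100, seed=0):
    """Plain (uncoupled) rank-`rank` CP decomposition of a 3-way tensor via ALS.

    X[u, t, k] is assumed to count queries by user u on topic t for task type k.
    Returns factor matrices A (users), B (topics), C (tasks).
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    # Mode-n unfoldings whose column order matches khatri_rao's row order.
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other two fixed;
        # (B^T B) * (C^T C) is the Gram matrix of khatri_rao(B, C).
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def reconstruct(A, B, C):
    """Rebuild the tensor sum_r A[:, r] o B[:, r] o C[:, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)
```

A call such as `A, B, C = cp_als(X, rank=10)` would give one 10-dimensional vector per user in the rows of `A`. Nonnegativity is not enforced here, and the coupled matrix term (a second least-squares loss sharing the user factor) is deliberately left out.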
international world wide web conferences | 2015
Rishabh Mehrotra; Emine Yilmaz
Current search systems do not provide adequate support for users tackling complex tasks, so the cognitive burden of keeping track of such tasks is placed on the searcher. In contrast to recent approaches to search task extraction, a more naturalistic viewpoint would treat query logs as hierarchies of tasks, with each search task decomposed into more focused sub-tasks. In this work, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks and subtasks. The proposed approach makes use of the multi-relational aspect of query associations, which is important in identifying query-task associations. We describe a greedy agglomerative model selection algorithm based on the Gamma-Poisson conjugate mixture that takes just one pass through the data to learn a fully probabilistic, hierarchical model of trees that is capable of learning trees with arbitrary branching structures, as opposed to the more common binary-structured trees. We evaluate our method on real-world query log data using query term prediction. To the best of our knowledge, this work is the first to consider hierarchies of search tasks and subtasks.
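Gamma-Poisson conjugacy is what makes greedy merge decisions cheap: the marginal likelihood of a group of count vectors has a closed form. The sketch below assumes each query is a term-count vector, that each term's counts are Poisson with a rate drawn from a Gamma(a, b) prior (shape a, rate b), and scores a candidate merge by the log Bayes factor of "one shared rate" versus "separate rates". It only merges flat clusters; it does not build the arbitrary-branching trees described in the paper, it is not one-pass, and the hyperparameters and function names are hypothetical.

```python
from math import lgamma, log

def log_marginal(cluster, a=1.0, b=1.0):
    """log p(cluster) under independent per-term Poisson rates with Gamma(a, b) priors.

    For one term with counts x_1..x_n and s = sum(x):
      log p = a*log(b) - lgamma(a) + lgamma(a + s) - (a + s)*log(b + n) - sum(lgamma(x_i + 1))
    """
    n = len(cluster)
    total = 0.0
    for d in range(len(cluster[0])):
        col = [q[d] for q in cluster]
        s = sum(col)
        total += (a * log(b) - lgamma(a) + lgamma(a + s)
                  - (a + s) * log(b + n) - sum(lgamma(x + 1) for x in col))
    return total

def merge_score(c1, c2, a=1.0, b=1.0):
    """Log Bayes factor for explaining c1 and c2 with one shared set of rates."""
    return log_marginal(c1 + c2, a, b) - log_marginal(c1, a, b) - log_marginal(c2, a, b)

def greedy_agglomerate(items, a=1.0, b=1.0):
    """Repeatedly merge the best-scoring pair; stop when no merge has positive evidence."""
    clusters = [[list(x)] for x in items]
    merges = []  # record of (left, right, score), i.e. the order in which groups formed
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = merge_score(clusters[i], clusters[j], a, b)
                if best is None or s > best[0]:
                    best = (s, i, j)
        score, i, j = best
        if score <= 0:
            break
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges
```

With two obvious groups of term-count vectors, the within-group merges have positive evidence and the cross-group merge does not, so two clusters remain.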
international world wide web conferences | 2017
Rishabh Mehrotra; Ashton Anderson; Fernando Diaz; Amit Sharma; Hanna M. Wallach; Emine Yilmaz
Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naively comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we drew on ideas from the causal inference literature and the multilevel modeling literature. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics.
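The pitfall of naive comparison can be illustrated with a simple stratified comparison. This is not one of the paper's three methods; it is a hedged sketch assuming each log record carries a demographic `group`, a context variable `query_type`, and a behavioural `metric` (all hypothetical field names). If one group issues harder queries, its raw average looks worse even when satisfaction within every query type is identical.

```python
from collections import defaultdict

def group_mean(records, group):
    vals = [r["metric"] for r in records if r["group"] == group]
    return sum(vals) / len(vals)

def naive_gap(records, g1, g2):
    """Difference in raw means; confounded if groups issue different kinds of queries."""
    return group_mean(records, g1) - group_mean(records, g2)

def stratified_gap(records, g1, g2, stratum="query_type"):
    """Compare groups only within strata both groups share, then average the
    per-stratum gaps weighted by how many records each stratum contributes."""
    cells = defaultdict(list)
    for r in records:
        cells[(r[stratum], r["group"])].append(r["metric"])
    num, den = 0.0, 0
    for s in {r[stratum] for r in records}:
        x1, x2 = cells.get((s, g1)), cells.get((s, g2))
        if not x1 or not x2:
            continue  # no overlap: this stratum carries no evidence about the gap
        w = len(x1) + len(x2)
        num += w * (sum(x1) / len(x1) - sum(x2) / len(x2))
        den += w
    return num / den
```

On synthetic data where group A mostly issues easy queries and group B mostly hard ones, `naive_gap` reports a large difference while `stratified_gap` reports none.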
conference on human information interaction and retrieval | 2016
Rishabh Mehrotra; Prasanta Bhattacharya; Emine Yilmaz
Multi-tasking within a single online search session is an increasingly popular phenomenon. In this work, we quantify the multi-tasking behavior of web search users. Using insights from large-scale search logs, we seek to characterize user groups and search sessions with a focus on multi-task sessions. Our findings show that dual-task sessions are more prevalent than single-task sessions in online search, and that over 50% of search sessions have more than 2 tasks. Further, we provide a method to categorize users as focused users, multi-taskers, or supertaskers depending on their level of task multiplicity, and show that the search effort expended by these users varies across the groups. The findings from this analysis provide useful insights about task multiplicity in an online search environment and hold potential value for search engines that wish to personalize and support the search experiences of users based on their task behavior.
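The session statistics and user categories above can be sketched as follows, assuming sessions are already segmented and each query is labelled with a task identifier. The abstract does not give the categorization rule, so the mean-tasks-per-session criterion and the cut-offs `focused_max` and `supertasker_min` are hypothetical.

```python
from statistics import mean

def task_count(session):
    """Number of distinct tasks in a session given as (query, task_id) pairs."""
    return len({task_id for _, task_id in session})

def share_above(sessions, k=2):
    """Fraction of sessions containing more than k tasks."""
    return sum(task_count(s) > k for s in sessions) / len(sessions)

def categorize_user(sessions, focused_max=1.5, supertasker_min=3.0):
    """Label a user by mean task multiplicity; both cut-offs are hypothetical."""
    m = mean(task_count(s) for s in sessions)
    if m <= focused_max:
        return "focused"
    if m >= supertasker_min:
        return "supertasker"
    return "multi-tasker"
```

For example, a user whose sessions each involve one task is "focused", while one who routinely interleaves three or more tasks per session is a "supertasker".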
international acm sigir conference on research and development in information retrieval | 2015
Rishabh Mehrotra; Emine Yilmaz
The performance of Learning to Rank algorithms strongly depends on the number of labelled queries in the training set, while the cost of annotating a large number of queries with relevance judgements is prohibitively high. As a result, constructing such a training dataset involves selecting a set of candidate queries for labelling. In this work, we investigate query selection strategies for learning to rank that actively select unlabelled queries to be labelled so as to minimize the data annotation cost. In particular, we characterize query selection in terms of two aspects, informativeness and representativeness, and propose two novel query selection strategies, (i) Permutation Probability based query selection and (ii) Topic Model based query selection, which capture these two aspects respectively. We further argue that an ideal query selection strategy should take both aspects into account, and as our final contribution we present a submodular objective that couples them while selecting query subsets. We evaluate the proposed strategies on three real-world learning to rank datasets and show that the proposed query selection methods result in significant performance gains compared to existing state-of-the-art approaches.
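One common way to couple informativeness and representativeness in a monotone submodular objective is a weighted sum of a modular informativeness term and a facility-location coverage term, F(S) = λ Σ_{q∈S} info(q) + (1 − λ) Σ_u max_{s∈S} sim(u, s), maximized greedily under a budget of k queries; for monotone submodular objectives the greedy algorithm carries the classic (1 − 1/e) approximation guarantee. The paper's exact objective may differ. In this sketch the nonnegative scores `info`, the nonnegative similarity matrix `sim`, and the weight `lam` are all assumptions.

```python
import numpy as np

def greedy_select(info, sim, k, lam=0.5):
    """Greedily pick k queries maximizing
    lam * sum(info[S]) + (1 - lam) * sum_u max_{s in S} sim[u, s].

    info: (n,) nonnegative informativeness scores.
    sim:  (n, n) nonnegative query-query similarities.
    """
    n = len(info)
    selected = []
    coverage = np.zeros(n)  # best similarity of each query to the current selection
    for _ in range(k):
        best, best_gain = None, -np.inf
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain of the facility-location (representativeness) term.
            cov_gain = np.maximum(coverage, sim[:, c]).sum() - coverage.sum()
            gain = lam * info[c] + (1 - lam) * cov_gain
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return selected
```

With `lam=0` the selection spreads across clusters of similar queries (pure representativeness); with `lam=1` it simply takes the top-k informative queries, which may all come from one cluster.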
international acm sigir conference on research and development in information retrieval | 2016
Rishabh Mehrotra; Prasanta Bhattacharya; Emine Yilmaz
While a major share of prior work has considered search sessions as the focal unit of analysis for seeking behavioral insights, search tasks are emerging as a competing perspective in this space. In the current work, we quantify user search task behavior for both single-task and multi-task search sessions and relate it to tasks and topics. Specifically, we analyze the user-disposition, topic, and user-interest level heterogeneities that are prevalent in search task behavior. Our results show that while search multi-tasking is a common phenomenon among search engine users, the extent and choice of multi-tasking topics vary significantly across users. We find that not only do users have varying propensities to multi-task, they also search for distinct topics across single-task and multi-task sessions. To our knowledge, this is among the first studies to fully characterize online search tasks with a focus on user- and topic-level differences that are observable from search sessions.
international acm sigir conference on research and development in information retrieval | 2017
Rishabh Mehrotra; Emine Yilmaz
A significant number of search queries originate from some real-world information need or task [13]. In order to improve the search experience of end users, it is important to have accurate representations of tasks. As a result, a significant amount of research has been devoted to extracting proper representations of tasks, in order to enable search systems to help users complete their tasks, to provide better query suggestions [9] and recommendations [41], to predict satisfaction [36], and to improve personalization in terms of tasks [24, 38]. Most existing task extraction methodologies represent tasks as flat structures. However, tasks often have multiple subtasks associated with them, and a more naturalistic representation of tasks would be a hierarchy, where each task can be composed of multiple (sub)tasks. To this end, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks and subtasks. We evaluate our method on real-world query log data through both quantitative and crowdsourced experiments, and highlight the importance of considering task/subtask hierarchies.
international world wide web conferences | 2017
Rishabh Mehrotra; Ahmed El Kholy; Imed Zitouni; Milad Shokouhi; Ahmed Hassan
Search sessions have traditionally been considered the focal unit of analysis for seeking behavioral insights from user interactions. While most session identification techniques have focused on the traditional web search setting, in this work we instead consider user interactions with digital assistants (e.g. Cortana, Siri) and aim at identifying session boundary cut-offs. To our knowledge, this is one of the first studies investigating user interactions with a desktop-based digital assistant. Historically, most user session identification strategies based on inactivity thresholds are either inherently arbitrary or set at about 30 minutes. We postulate that such 30-minute thresholds may not be optimal for segregating user interactions with intelligent assistants into sessions. Instead, we model user-activity times as a Gaussian mixture model and look for evidence of a valley to identify optimal inter-activity thresholds for identifying sessions. Our results suggest a smaller threshold (~2 minutes) for session boundary cut-off in digital assistants than the traditionally used 30-minute threshold for web search engines.
north american chapter of the association for computational linguistics | 2016
Rishabh Mehrotra; Prasanta Bhattacharya; Emine Yilmaz
conference on information and knowledge management | 2017
Rishabh Mehrotra; Emine Yilmaz
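For the digital-assistant session study (WWW 2017), the valley-finding idea can be sketched in NumPy. This assumes the inter-activity gaps are modelled on a log scale, that two mixture components (within-session and between-session gaps) suffice, and that the cut-off is the mixture-density minimum between the two component means; these choices and all function names are assumptions rather than the paper's exact procedure. The EM loop is a textbook 1-D Gaussian mixture fit.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=200):
    """Two-component 1-D Gaussian mixture fitted by EM; returns weights, means, variances."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75]).astype(float)  # spread initial means apart
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = w / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def mixture_pdf(t, w, mu, var):
    t = np.asarray(t, dtype=float)[:, None]
    return (w / np.sqrt(2 * np.pi * var) * np.exp(-(t - mu) ** 2 / (2 * var))).sum(axis=1)

def valley_threshold(gaps_seconds, n_grid=1000):
    """Session cut-off (seconds) at the density valley between the two log-gap modes."""
    logx = np.log(gaps_seconds)
    w, mu, var = fit_gmm_1d(logx)
    lo, hi = np.sort(mu)
    grid = np.linspace(lo, hi, n_grid)
    return float(np.exp(grid[np.argmin(mixture_pdf(grid, w, mu, var))]))
```

Gaps between consecutive interactions shorter than the returned threshold would then be treated as within-session, and longer gaps as session boundaries.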