Georges Dupret
Yahoo!
Publication
Featured research published by Georges Dupret.
international acm sigir conference on research and development in information retrieval | 2008
Georges Dupret; Benjamin Piwowarski
Search engine click logs provide an invaluable source of relevance information, but this information is biased because we do not know which documents from the result list the users actually saw before and after they clicked. If we did, we could estimate document relevance by simple counting. In this paper, we propose a set of assumptions on user browsing behavior that allows the estimation of the probability that a document is seen, thereby providing an unbiased estimate of document relevance. To train, test and compare our model to the best alternatives described in the literature, we gather a large set of real data and carry out an extensive cross-validation experiment. Our solution significantly outperforms all previous models. As a side effect, we gain insight into the browsing behavior of users and can compare it to the conclusions of an eye-tracking experiment by Joachims et al. [12]. In particular, our findings confirm that a user almost always sees the document directly after a clicked document. They also explain why documents situated just after a very relevant document are clicked more often.
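To illustrate the bias described in the abstract above, here is a minimal Python sketch. It is not the paper's model: it assumes a hypothetical per-rank examination probability `p_seen` that is simply given, whereas the paper estimates the probability that a document is seen from its browsing assumptions. The sketch compares simple click counting with counting clicks per expected view. All function names and numbers are illustrative.

```python
from collections import defaultdict

def naive_ctr(impressions):
    """Clicks divided by impressions, ignoring whether the user saw the document."""
    clicks = defaultdict(int)
    shown = defaultdict(int)
    for doc, rank, clicked in impressions:
        shown[doc] += 1
        clicks[doc] += int(clicked)
    return {doc: clicks[doc] / shown[doc] for doc in shown}

def examination_corrected(impressions, p_seen):
    """Clicks divided by the expected number of times the document was seen.

    p_seen maps a rank to an assumed probability that a user examines
    that rank (a simplification made for this sketch).
    """
    clicks = defaultdict(float)
    expected_views = defaultdict(float)
    for doc, rank, clicked in impressions:
        expected_views[doc] += p_seen[rank]
        clicks[doc] += int(clicked)
    return {
        doc: min(1.0, clicks[doc] / expected_views[doc])
        for doc in expected_views
        if expected_views[doc] > 0
    }

# Toy log of (document, rank, clicked) tuples.
impressions = [
    ("a", 1, True), ("a", 1, False),
    ("b", 3, True), ("b", 3, False), ("b", 3, False), ("b", 3, False),
]
p_seen = {1: 1.0, 3: 0.25}  # assumed examination probabilities

naive = naive_ctr(impressions)
corrected = examination_corrected(impressions, p_seen)
```

In this toy log, simple counting ranks "a" above "b", while the corrected estimate reverses the order because "b" was rarely seen at rank 3.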
international conference on user modeling adaptation and personalization | 2012
Janette Lehmann; Mounia Lalmas; Elad Yom-Tov; Georges Dupret
Our research goal is to provide a better understanding of how users engage with online services, and how to measure this engagement. We should not speak of one main approach to measuring user engagement (e.g. through one fixed set of metrics), because engagement depends on the online services at hand. Instead, we should speak of models of user engagement. As a first step, we analysed a number of online services and show that it is possible to derive simple yet effective models of user engagement, for example, accounting for user types and temporal aspects. This paper provides initial insights into engagement patterns, allowing for a better understanding of the important characteristics of how users repeatedly interact with a service or group of services.
latin american web congress | 2005
Ricardo A. Baeza-Yates; Carlos A. Hurtado; Marcelo Mendoza; Georges Dupret
Web usage mining is a major research area in Web mining focused on learning about Web users and their interactions with Web sites. The main challenges in Web usage mining are the efficient application of data mining techniques to Web data and the discovery of nontrivial user behaviour patterns. In this paper we focus on search engines, analyzing query log data and presenting several models of how users search and how they use search engine results.
ACM Transactions on Information Systems | 2007
Benjamin Piwowarski; Patrick Gallinari; Georges Dupret
Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML or Web IR in which retrievable information units are document elements and/or sets of related documents. Part of the problem stems from the classical hypotheses on the user models: They do not take into account the structural or logical context of document elements or the possibility of navigation between units. This article proposes an explicit and formal user model that encompasses a large variety of user behaviors. Based on this model, we extend the probabilistic precision-recall metric to deal with the new IR paradigms.
web search and data mining | 2009
Benjamin Piwowarski; Georges Dupret; Rosie Jones
Mining user web search activity potentially has a broad range of applications including web result pre-fetching, automatic search query reformulation, click spam detection, estimation of document relevance and prediction of user satisfaction. This analysis is difficult because the data recorded by search engines while users interact with them, although abundant, is very noisy. In this work, we explore the utility of mining the search behavior of users, represented by observed variables including the time the user spends on the page and whether the user reformulated his or her query. As a case study, we examine the contribution this data makes to predicting the relevance of a document in the absence of document content models. To this end, we first propose a method for grouping the interactions of a particular user according to the different tasks he or she undertakes. With each task corresponding to a distinct information need, we then propose a Bayesian Network to holistically model these interactions. The aim is to identify distinct patterns of search behavior. Finally, we join these patterns to a list of custom features and use gradient boosted decision trees to predict the relevance of a set of query-document pairs for which we have relevance assessments. The experimental results confirm the potential of our model, with significant improvements in precision for predicting the relevance of documents based on a model of the user's search and click behavior, over a baseline model using only click and query features, with no Bayesian Network input.
international acm sigir conference on research and development in information retrieval | 2006
Benjamin Piwowarski; Georges Dupret
Standard Information Retrieval (IR) metrics assume a simple model where documents are understood as independent units. Such an assumption is not suited to new paradigms like XML or Web IR, where retrievable information consists of parts of documents or sets of related documents. Moreover, classical hypotheses assume that the user ignores the structural or logical context of document elements and hence the possibility of navigation between units. EPRUM is a generalisation of Precision-Recall (PR) that aims at allowing the user to navigate or browse in the corpus structure. Like the Cumulated Gain metrics, it is able to handle continuous valued relevance. We apply and compare EPRUM in the context of XML Retrieval, a very active field for evaluation metrics. We also explain how EPRUM can be used in other IR paradigms.
conference on information and knowledge management | 2013
Janette Lehmann; Mounia Lalmas; Georges Dupret; Ricardo A. Baeza-Yates
Users often access and re-access more than one site during an online session, effectively engaging in multitasking. In this paper, we study the effect of online multitasking on two widely used engagement metrics designed to capture users' browsing behavior with a site. Our study is based on browsing data of 2.5M users across 760 sites encompassing diverse types of services such as social media, news and mail. To account for multitasking, we need to redefine how user sessions are represented and adapt the metrics under study. We introduce a new representation of user sessions: tree-streams, as opposed to the commonly used click-streams, present a more accurate picture of the browsing behavior of a user that includes how users switch between sites (e.g., hyperlinking, teleporting, backpaging). We then discuss a number of insights on multitasking patterns, and show how these help to better understand how users engage with sites. Finally, we define metrics that characterize multitasking during online sessions and show how they provide additional insights to standard engagement metrics.
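As a rough illustration of the difference between a click-stream and a tree-stream, the following Python sketch builds a forest from a chronological list of page views. It is an assumption-laden toy, not the paper's construction: each view is a hypothetical `(url, referrer)` pair, a view is attached to the most recent earlier view of its referrer, and a view without a known referrer starts a new tree (treated here as teleporting).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    url: str
    children: List["Node"] = field(default_factory=list)

def build_tree_stream(views: List[Tuple[str, Optional[str]]]) -> List[Node]:
    """Turn a chronological click-stream into a list of trees.

    views: (url, referrer_url or None) pairs in visit order (hypothetical format).
    """
    roots: List[Node] = []
    latest: Dict[str, Node] = {}  # url -> most recent node created for that url
    for url, referrer in views:
        node = Node(url)
        parent = latest.get(referrer) if referrer is not None else None
        if parent is None:
            roots.append(node)            # no known referrer: start a new tree
        else:
            parent.children.append(node)  # followed a link from the parent page
        latest[url] = node
    return roots

# Toy session: two links followed from the same page, then a jump to another site.
views = [
    ("news/home", None),
    ("news/a", "news/home"),
    ("news/b", "news/home"),  # user went back to news/home and followed another link
    ("mail/inbox", None),
]
roots = build_tree_stream(views)
```

A flat click-stream would record these four views as one sequence; the tree form keeps the fact that both articles were opened from the same page and that the mail visit was a separate entry point.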
ifip world computer congress wcc | 2006
Georges Dupret; Marcelo Mendoza
We present a method that helps a user refine a query by suggesting a list of similar queries. The proposed method is based on click-through data, from which sets of similar queries can be identified. The scientific literature shows that similar queries are useful for identifying the different information needs behind a query. Unlike most previous work, this paper focuses on the discovery of better queries rather than merely related queries. We show with experiments over real data that the identification of better queries is useful for query disambiguation and query specialization.
string processing and information retrieval | 2006
Georges Dupret; Benjamin Piwowarski; Carlos A. Hurtado; Marcelo Mendoza
Query logs record past query sessions across a time span. A statistical model is proposed to explain the log generation process. Within a search engine's list of results, the model explains the document selection (a user's click) by taking into account both a document's position and its popularity. We show that it is possible to quantify the influence of position and consequently to estimate "un-biased" document popularities. Among other applications, this allows re-ordering the result list to match user preferences more closely and using the logs as feedback to improve search engines.
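One simple way to separate position from popularity, sketched below in Python, is to assume that the click probability factorizes as a per-document popularity times a per-position bias, and to fit both by alternating ratio updates. This factorized form and the fitting procedure are assumptions made for illustration; the abstract does not specify the paper's statistical model or estimator. The function name and the toy counts are hypothetical.

```python
def fit_position_popularity(counts, n_iter=50):
    """Fit P(click | doc, pos) ~ popularity[doc] * bias[pos].

    counts: dict mapping (doc, pos) -> (impressions, clicks).
    The bias of the top position is fixed to 1 to remove the scale ambiguity.
    """
    docs = {d for d, _ in counts}
    positions = {p for _, p in counts}
    pop = {d: 0.5 for d in docs}
    bias = {p: 1.0 for p in positions}
    top_pos = min(positions)
    for _ in range(n_iter):
        # Update each document's popularity given the current position biases.
        for d in docs:
            num = sum(c for (dd, _), (_, c) in counts.items() if dd == d)
            den = sum(n * bias[p] for (dd, p), (n, _) in counts.items() if dd == d)
            pop[d] = num / den if den > 0 else 0.0
        # Update each position's bias given the current popularities.
        for p in positions:
            num = sum(c for (_, pp), (_, c) in counts.items() if pp == p)
            den = sum(n * pop[d] for (d, pp), (n, _) in counts.items() if pp == p)
            bias[p] = num / den if den > 0 else 0.0
        # Renormalize so that the top position has bias 1.
        top = bias[top_pos]
        if top > 0:
            bias = {p: b / top for p, b in bias.items()}
            pop = {d: a * top for d, a in pop.items()}
    return pop, bias

# Toy counts generated from popularity a=0.8, b=0.4 and bias 1.0, 0.5.
counts = {
    ("a", 1): (100, 80), ("a", 2): (100, 40),
    ("b", 1): (100, 40), ("b", 2): (100, 20),
}
pop, bias = fit_position_popularity(counts)
```

Sorting documents by the fitted `pop` values rather than by raw click counts gives an ordering that discounts the advantage of having been shown near the top.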
international conference on big data | 2013
Elad Yom-Tov; Mounia Lalmas; Ricardo A. Baeza-Yates; Georges Dupret; Janette Lehmann; Pinar Donmez
Many large online providers offer a variety of content sites (e.g. news, sport, e-commerce). These providers endeavor to keep users accessing and interacting with their sites, that is, to engage users so that they spend time on the sites and return to them regularly. They do so by serving users the most relevant content in an attractive and enticing manner. Due to their highly varied content, each site is usually studied and optimized separately. However, these online providers aim to engage users not only with individual sites, but across all sites in their network. In these cases, site engagement should be examined not only within individual sites, but also across the entire content provider network. This paper investigates intersite engagement, that is, site engagement within a network of sites, by defining a global measure of engagement that captures the effect sites have on the engagement on other sites. As an application, we look at the effect of web page layout and structure, which we refer to as web page stylistics, on intersite engagement on Yahoo! properties. Through the analysis of 50 popular Yahoo! sites and a sample of 265,000 users and 19.4M online sessions, we demonstrate that the stylistic components of a web page on a site can be used to predict intersite engagement across the Yahoo! network of sites. Intersite engagement is a new big data problem, as overall it implies analyzing dozens of sites visited by hundreds of millions of people generating billions of sessions.