Damien Lefortier | Researchain

Publication

Featured research published by Damien Lefortier.

conference on information and knowledge management | 2014

Multileaved Comparisons for Fast Online Evaluation

Anne Schuth; Floor Sietsma; Shimon Whiteson; Damien Lefortier; Maarten de Rijke

Evaluation methods for information retrieval systems come in three types: offline evaluation, using static data sets annotated for relevance by human judges; user studies, usually conducted in a lab-based setting; and online evaluation, using implicit signals such as clicks from actual users. For the latter, preferences between rankers are typically inferred from implicit signals via interleaved comparison methods, which combine a pair of rankings and display the result to the user. We propose a new approach to online evaluation called multileaved comparisons that is useful in the prevalent case where designers are interested in the relative performance of more than two rankers. Rather than combining only a pair of rankings, multileaved comparisons combine an arbitrary number of rankings. The resulting user clicks then give feedback about how all these rankings compare to each other. We propose two specific multileaved comparison methods. The first, called team draft multileave, is an extension of team draft interleave. The second, called optimized multileave, is an extension of optimized interleave and is designed to handle cases where a large number of rankers must be multileaved. We present experimental results that demonstrate that both team draft multileave and optimized multileave can accurately determine all pairwise preferences among a set of rankers using far less data than the interleaving methods that they extend.
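The team draft multileave idea described in this abstract can be illustrated with a minimal Python sketch. It assumes the usual team-draft scheme (in each round the rankers are visited in random order and each contributes its best document not yet shown) and simple per-team click counting. The function names `team_draft_multileave` and `credit_clicks` are hypothetical and not taken from the paper.

```python
import random


def team_draft_multileave(rankings, length, rng=None):
    """Combine several rankings into one list, remembering which ranker
    ("team") contributed each document. Sketch only, not the authors' code."""
    rng = rng or random.Random()
    combined, teams, seen = [], [], set()
    while len(combined) < length:
        order = list(range(len(rankings)))
        rng.shuffle(order)  # random ranker order in every round
        added = False
        for r in order:
            if len(combined) >= length:
                break
            # Best document of ranker r that is not yet in the combined list.
            doc = next((d for d in rankings[r] if d not in seen), None)
            if doc is None:
                continue
            combined.append(doc)
            teams.append(r)
            seen.add(doc)
            added = True
        if not added:  # every ranker is exhausted
            break
    return combined, teams


def credit_clicks(teams, clicked_positions, n_rankers):
    """Credit each click to the ranker whose team owns the clicked position."""
    credit = [0] * n_rankers
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    return credit
```

Comparing the credit of any two rankers over many impressions then gives an estimate of their pairwise preference, which is how a single multileaved list can inform all pairs at once.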

conference on information and knowledge management | 2014

Online Exploration for Detecting Shifts in Fresh Intent

Damien Lefortier; Pavel Serdyukov; Maarten de Rijke

In web search, recency ranking refers to the task of ranking documents while taking freshness into account as one of the criteria of relevance. There are two approaches to recency ranking. One extends existing learning-to-rank algorithms to optimize for both freshness and relevance. The other relies on an aggregated search strategy: a dedicated fresh vertical is used, and fresh results from this vertical are subsequently integrated into the search engine result page. In this paper, we adopt the second strategy. In particular, we focus on the fresh vertical prediction task for repeating queries and identify the following novel algorithmic problem: how to quickly correct mistakes made by a state-of-the-art fresh intent detector that erroneously detected or missed an upward shift in fresh intent for a particular repeating query (i.e., a change in the degree to which the query has a fresh intent). We propose a method for solving this problem. We use online exploration at the very start of what we believe to be a detected intent shift. Based on this exploratory phase, we correct the mistakes made by the state-of-the-art fresh intent detector for queries whose fresh intent has shifted. Using query logs of Yandex, we demonstrate that our methods significantly improve the speed and quality of the detection of fresh intent shifts.

conference on information and knowledge management | 2013

Timely crawling of high-quality ephemeral new content

Damien Lefortier; Liudmila Ostroumova; Egor Samosvat; Pavel Serdyukov

In this paper, we study the problem of timely finding and crawling of ephemeral new pages, i.e., pages for which user traffic grows quickly right after they appear but lasts only for several days (e.g., news, blog and forum posts). Traditional crawling policies do not give any particular priority to such pages and may thus crawl them too slowly, or even crawl content that is already obsolete. We thus propose a new metric, designed for this task, which takes into account the decrease of user interest in ephemeral pages over time. We show that most ephemeral new pages can be found at a relatively small set of content sources and suggest a method for finding such a set. Our idea is to periodically recrawl content sources and crawl newly created pages linked from them, focusing on high-quality (in terms of user interest) content. One of the main difficulties here is to divide resources between these two activities efficiently. We find the adaptive balance between crawls and recrawls by maximizing the proposed metric. Further, we incorporate search engine click logs to give our crawler insight into current user demand. Finally, the effectiveness of our approach is demonstrated experimentally on real-world data.
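To make the notion of a metric that accounts for decaying user interest concrete, here is a minimal Python sketch. The abstract does not give the metric's form, so the exponential decay, the function name `remaining_interest`, and its parameters are all assumptions made purely for illustration.

```python
import math


def remaining_interest(total_clicks, decay_rate, delay):
    """Clicks a page can still receive if it is crawled `delay` time units
    after it appears, ASSUMING interest decays exponentially (hypothetical)."""
    return total_clicks * math.exp(-decay_rate * delay)
```

Under this assumption, a crawler that maximizes the summed value over crawled pages is pushed toward fetching ephemeral pages early, since their value shrinks fastest with delay.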

workshop on algorithms and models for the web graph | 2013

Evolution of the Media Web

Damien Lefortier; Liudmila Ostroumova; Egor Samosvat

We present a detailed study of the part of the Web related to media content, i.e., the Media Web. Using publicly available data, we analyze the evolution of incoming and outgoing links to and from media pages. Based on our observations, we propose a new class of models for the appearance of new media content on the Web, in which different attractiveness functions of nodes are possible, including ones taken from well-known preferential attachment and fitness models. We analyze these models theoretically and empirically and show which ones realistically predict both the incoming degree distribution and the so-called recency property of the Media Web, something that existing models did not do well. Finally, we compare these models by estimating the likelihood of the real-world link graph from our data set given each model, and find that the models we introduce are significantly more likely than previously proposed ones. One of the most surprising results is that, in the Media Web, the probability that a post is cited is most likely determined by its quality rather than by its current popularity.
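The idea of plugging different attractiveness functions into one link-formation model can be sketched in Python as follows. This is an illustrative sketch only: the node fields (`in_degree`, `quality`, `created`), the exponential recency factor, and all function names are assumptions, not the paper's definitions.

```python
import math
import random


def preferential_attachment(node):
    # Attractiveness grows with current popularity (in-degree); +1 avoids zero weight.
    return node["in_degree"] + 1


def fitness(node):
    # Attractiveness depends only on an intrinsic quality of the node.
    return node["quality"]


def recency_weighted_quality(node, now, tau=1.0):
    # Hypothetical variant: quality discounted exponentially by the node's age.
    age = now - node["created"]
    return node["quality"] * math.exp(-age / tau)


def pick_link_target(nodes, attractiveness, rng):
    """Choose the target of a new link with probability proportional to
    the given attractiveness function (weights must not all be zero)."""
    weights = [attractiveness(n) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

Swapping the `attractiveness` argument is what lets one compare popularity-driven and quality-driven explanations of the same link graph.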

european conference on information retrieval | 2015

Adaptive Caching of Fresh Web Search Results

Liudmila Ostroumova Prokhorenkova; Yury Ustinovskiy; Egor Samosvat; Damien Lefortier; Pavel Serdyukov

In this paper, we study the problem of caching search results that degrade rapidly. We suggest a new caching algorithm, which is based on query frequencies and the predicted staleness of cached results. We also introduce a new performance metric for caching algorithms called staleness degree, which measures the level of degradation of a cached result. In the case of frequently changing search results, this metric is more sensitive to those changes than the previously used stale traffic ratio.

european conference on information retrieval | 2014

Blending Vertical and Web Results

Damien Lefortier; Pavel Serdyukov; Fedor Romanenko; Maarten de Rijke

Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of Yandex's two video content sources, i.e., the specialized vertical and a subset of the general web index, thus providing insights into their complementary character. By investigating how to blend results from these sources, we contrast traditional federated search and fusion-based approaches with newly proposed approaches that significantly outperform the baseline methods.

international world wide web conferences | 2015