Pavel Serdyukov | Researchain

Publication

Featured research published by Pavel Serdyukov.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Placing Flickr photos on a map

Pavel Serdyukov; Vanessa Murdock; Roelof van Zwol

In this paper we investigate generic methods for placing photos uploaded to Flickr on the World map. As primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken. Central to our approach is a language model based entirely on the annotations provided by users. We define extensions to improve over the language model using tag-based smoothing and cell-based smoothing, and leveraging spatial ambiguity. Further, we demonstrate how to incorporate GeoNames (http://www.geonames.org, visited May 2009), a large external database of locations. For varying levels of granularity, we are able to place images on a map with at least twice the precision of the state of the art reported in the literature.
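
As a rough illustration of the annotation-based language model idea, the following Python sketch assigns a photo to the grid cell whose tag distribution makes the photo's tags most probable. The cell identifiers, the toy training data, and the additive (Laplace) smoothing are assumptions made for illustration only; the tag-based and cell-based smoothing and the GeoNames extension described in the abstract are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_cell_models(photos):
    """photos: iterable of (cell_id, [tags]). Returns per-cell tag counts and the vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for cell, tags in photos:
        counts[cell].update(tags)
        vocab.update(tags)
    return counts, vocab


def predict_cell(tags, counts, vocab, alpha=1.0):
    """Return the cell that maximises the additively smoothed log-likelihood of the tags."""
    best_cell, best_score = None, -math.inf
    vocab_size = len(vocab)
    for cell, cell_counts in counts.items():
        total = sum(cell_counts.values())
        score = sum(
            math.log((cell_counts[t] + alpha) / (total + alpha * vocab_size))
            for t in tags
        )
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell


# Hypothetical training data: each photo is a (cell, tags) pair.
training = [
    ("paris", ["eiffel", "tower", "france"]),
    ("paris", ["louvre", "france"]),
    ("london", ["bigben", "thames", "uk"]),
]
counts, vocab = train_cell_models(training)
```

Calling `predict_cell(["thames"], counts, vocab)` picks the cell whose training photos carried that tag. A realistic system would replace the named cells with latitude/longitude grid cells at the chosen granularity.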

Foundations and Trends in Information Retrieval | 2012

Expertise Retrieval

Krisztian Balog; Yi Fang; Maarten de Rijke; Pavel Serdyukov; Luo Si

People have looked for experts since before the advent of computers. With advances in information retrieval technology and the large-scale availability of digital traces of knowledge-related activities, computer systems that can fully automate the process of locating expertise have become a reality. The past decade has witnessed tremendous interest, and a wealth of results, in expertise retrieval as an emerging subdiscipline in information retrieval. This survey highlights advances in models and algorithms relevant to this field. We draw connections among methods proposed in the literature and summarize them in five groups of basic approaches. These serve as the building blocks for more advanced models that arise when we consider a range of content-based factors that may impact the strength of association between a topic and a person. We also discuss practical aspects of building an expert search system and present applications of the technology in other domains, such as blog distillation and entity retrieval. The limitations of current approaches are also pointed out. We end our survey with a set of conjectures on what the future may hold for expertise retrieval research.

International Conference on Multimedia Retrieval | 2011

Automatic tagging and geotagging in video collections and communities

Martha Larson; Mohammad Soleymani; Pavel Serdyukov; Stevan Rudinac; Christian Wartena; Vanessa Murdock; Gerald Friedland; Roeland Ordelman; Gareth J. F. Jones

Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.

Conference on Information and Knowledge Management | 2008

Modeling multi-step relevance propagation for expert finding

Pavel Serdyukov; Henning Rode; Djoerd Hiemstra

An expert finding system allows a user to type a simple text query and retrieve names and contact information of individuals that possess the expertise expressed in the query. This paper proposes a novel approach to expert finding in large enterprises or intranets by modeling candidate experts (persons), web documents and various relations among them with so-called expertise graphs. In contrast to state-of-the-art approaches that estimate personal expertise through one-step propagation of relevance probability from documents to the related candidates, our methods are based on the principle of multi-step relevance propagation in topic-specific expertise graphs. We model the process of expert finding by probabilistic random walks of three kinds: finite, infinite and absorbing. Experiments on TREC Enterprise Track data originating from two large organizations show that our methods using multi-step relevance propagation improve over the baseline one-step propagation based method in almost all cases.
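
One way to picture the difference between one-step and multi-step propagation is a finite random walk on a small document-candidate graph. The sketch below is a toy under stated assumptions, not the authors' model: the graph, the initial relevance scores, and the uniform transition probabilities are all hypothetical, and the infinite and absorbing walks mentioned in the abstract are not covered.

```python
def step(dist, edges):
    """Move probability mass one hop along uniformly weighted edges."""
    out = {}
    for node, mass in dist.items():
        neighbours = edges[node]
        for n in neighbours:
            out[n] = out.get(n, 0.0) + mass / len(neighbours)
    return out


def finite_walk(doc_relevance, doc_to_cand, cand_to_doc, steps):
    """Start in documents (weighted by relevance), take an odd number of hops, end in candidates."""
    if steps % 2 == 0:
        raise ValueError("steps must be odd so the walk ends on candidates")
    total = sum(doc_relevance.values())
    dist = {d: r / total for d, r in doc_relevance.items()}
    for i in range(steps):
        # Even-numbered hops go document -> candidate, odd-numbered hops go back.
        dist = step(dist, doc_to_cand if i % 2 == 0 else cand_to_doc)
    return dist


# Hypothetical expertise graph: which candidates are mentioned in which documents.
doc_to_cand = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}
cand_to_doc = {"alice": ["d1", "d2"], "bob": ["d2", "d3"]}
relevance = {"d1": 0.6, "d2": 0.3, "d3": 0.1}

one_step = finite_walk(relevance, doc_to_cand, cand_to_doc, steps=1)
three_step = finite_walk(relevance, doc_to_cand, cand_to_doc, steps=3)
```

With `steps=1` the result corresponds to the one-step baseline idea; with `steps=3` some mass flows back through shared documents, so a candidate connected to well-connected documents can gain score relative to the one-step estimate.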

European Conference on Information Retrieval | 2008

Modeling documents as mixtures of persons for expert finding

Pavel Serdyukov; Djoerd Hiemstra

In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm based on the assumption that terms in documents are produced by the people who are mentioned in them. We represent documents retrieved for a query as mixtures of candidate experts' language models. Two methods for extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted with the TREC Enterprise collection demonstrate the superiority of our approach over the best of the existing solutions.

Conference on Information and Knowledge Management | 2012

Prediction of retweet cascade size over time

Andrey Kupavskii; Liudmila Ostroumova; Alexey V. Umnov; Svyatoslav Usachev; Pavel Serdyukov; Gleb Gusev; Andrey Kustarev

Retweet cascades play an essential role in information diffusion in Twitter. Popular tweets reflect the current trends in Twitter, while Twitter itself is one of the most important online media. Thus, understanding the reasons why a tweet becomes popular is of great interest for sociologists, marketers and social media researchers. What is even more important is the possibility to make a prognosis of a tweet's future popularity. Besides the scientific significance of such a possibility, this sort of prediction has many practical applications, such as breaking news detection and viral marketing. In this paper we try to forecast how many retweets a given tweet will gain during a fixed time period. We train an algorithm that predicts the number of retweets during time T since the initial moment. In addition to a standard set of features we utilize several new ones. One of the most important features is the flow of the cascade. Another one is PageRank on the retweet graph, which can be considered a measure of the influence of users.
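
To make the PageRank feature concrete, here is a minimal power-iteration sketch in Python. The edge direction (from the retweeting user to the retweeted author), the damping factor, the iteration count, and the toy user names are assumptions for illustration; the abstract does not specify how the retweet graph was built or how the scores were used as features.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Mass held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {
            n: (1 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for n in nodes:
            for m in out[n]:
                new[m] += damping * rank[n] / len(out[n])
        rank = new
    return rank


# Hypothetical retweet graph: (retweeter, original author).
retweets = [("bob", "alice"), ("carol", "alice"), ("dave", "bob")]
ranks = pagerank(retweets)
```

In this toy graph the user who is retweeted most, directly and indirectly, ends up with the highest score, which matches the reading of PageRank as a measure of user influence.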

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Click model-based information retrieval metrics

Aleksandr Chuklin; Pavel Serdyukov; Maarten de Rijke

In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions. One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
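
The idea of turning a click model into an offline metric can be sketched as an expected-utility score: weight the relevance of each ranked document by the probability that the model predicts a click on it. The Python sketch below assumes a simple position-based click model with made-up examination and attractiveness probabilities; it is one plausible reading of the approach, not the paper's exact definition, and DCG is included only as the traditional point of comparison.

```python
import math


def dcg(relevances):
    """Standard DCG with gain 2^rel - 1 and a log2 rank discount."""
    return sum((2 ** r - 1) / math.log2(k + 2) for k, r in enumerate(relevances))


def click_model_utility(relevances, exam_prob, attract_prob, max_rel):
    """Expected utility: sum over ranks of P(click at rank) * normalised relevance."""
    score = 0.0
    for k, r in enumerate(relevances):
        # Position-based assumption: a click needs examination and attraction.
        p_click = exam_prob[k] * attract_prob[r]
        score += p_click * (r / max_rel)
    return score


exam_prob = [1.0, 0.6, 0.4]               # hypothetical P(examined | rank)
attract_prob = {0: 0.1, 1: 0.5, 2: 0.9}   # hypothetical P(click | examined, grade)

good = click_model_utility([2, 1, 0], exam_prob, attract_prob, max_rel=2)
bad = click_model_utility([0, 1, 2], exam_prob, attract_prob, max_rel=2)
```

Both the model-based score and DCG prefer the ranking that places the most relevant document first. In practice the examination and attractiveness parameters would be estimated from click logs rather than set by hand.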

Information Interaction in Context | 2010

An analysis of queries intended to search information for children

Sergio Duarte Torres; Djoerd Hiemstra; Pavel Serdyukov

Query logs contain valuable information about the behavior, interests, and preferences of the users. The analysis of this information can give insight into their interaction and search behavior. In this paper, we analyze queries and groups of queries intended to find information that is suitable for children by using a large-scale query log. The aim of the analysis is twofold: (i) to identify differences in the query space, content space, user sessions, and user click behavior; (ii) to enhance the query log by including annotations of queries, sessions and actions. The paper presents plans to use this resource for further research on information retrieval for children. We found statistically significant differences between the set of general purpose queries and the set of children queries. We show that many of these differences are consistent with small-scale research studies in which children were observed while using web search engines.

International World Wide Web Conferences | 2016

A Neural Click Model for Web Search

Alexey Borisov; Ilya Markov; Maarten de Rijke; Pavel Serdyukov

Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about a set of events given some information about other events. But the structure of the dependencies between the events has to be set manually. Different click models use different hand-crafted sets of dependencies. We propose an alternative based on the idea of distributed representations: to represent the user's information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. User behavior is then modeled as a sequence of vector states associated with a query session: the vector state is initialized with a query, and then iteratively updated based on information about interactions with the search engine results. This approach allows us to directly understand user browsing behavior from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models. We illustrate our approach using a set of neural click models. Our experimental results show that the neural click model that uses the same training data as traditional PGM-based click models has better performance on the click prediction task (i.e., predicting user clicks on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns similar concepts to those used in traditional click models, and that it also learns other concepts that cannot be designed manually.

International World Wide Web Conferences | 2015

Future User Engagement Prediction and Its Application to Improve the Sensitivity of Online Experiments

Alexey Drutsa; Gleb Gusev; Pavel Serdyukov

Modern Internet companies improve their services by means of data-driven decisions that are based on online controlled experiments (also known as A/B tests). Running more online controlled experiments and obtaining statistically significant results faster are emerging needs for these companies. The main way to achieve these goals is to improve the sensitivity of A/B experiments. We propose a novel approach to improve the sensitivity of user engagement metrics (that are widely used in A/B tests) by utilizing prediction of the future behavior of an individual user. This problem of predicting the exact value of a user engagement metric is also novel and is studied in our work. We demonstrate the effectiveness of our sensitivity improvement approach on several real online experiments run at Yandex. In particular, we show how it can be used to detect the treatment effect of an A/B test faster with the same level of statistical significance.
