Publications

Featured research published by Ilya Markov.


International World Wide Web Conference (WWW) | 2016

A Neural Click Model for Web Search

Alexey Borisov; Ilya Markov; Maarten de Rijke; Pavel Serdyukov

Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented as a sequence of observable and hidden events. The PGM framework provides a mathematically solid way to reason about a set of events given some information about other events, but the structure of the dependencies between the events has to be set manually, and different click models use different hand-crafted sets of dependencies. We propose an alternative based on the idea of distributed representations: to represent the user's information need and the information available to the user with a vector state. The components of the vector state are learned to represent concepts that are useful for modeling user behavior. User behavior is then modeled as a sequence of vector states associated with a query session: the vector state is initialized with a query and then iteratively updated based on information about interactions with the search engine results. This approach allows us to model user browsing behavior directly from click-through data, i.e., without the need for a predefined set of rules as is customary for PGM-based click models. We illustrate our approach using a set of neural click models. Our experimental results show that a neural click model trained on the same data as traditional PGM-based click models outperforms them on the click prediction task (i.e., predicting user clicks on search engine results) and the relevance prediction task (i.e., ranking documents by their relevance to a query). An analysis of the best performing neural click model shows that it learns concepts similar to those used in traditional click models, and that it also learns other concepts that cannot be designed manually.
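To make the vector-state idea concrete, here is a minimal sketch, not the authors' architecture: the state dimensionality, weight matrices, and embeddings below are all invented, and real models learn the weights from click logs rather than drawing them at random.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # dimensionality of the vector state (hypothetical)

# Randomly initialised parameters stand in for weights a real model would learn.
W_update = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
w_click = rng.normal(scale=0.1, size=DIM)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_clicks(query_vec, doc_vecs):
    """Return one click probability per result, updating the state after each."""
    state = np.tanh(query_vec)  # vector state initialised from the query
    probs = []
    for doc in doc_vecs:
        probs.append(sigmoid(w_click @ state))  # predict click from current state
        inp = np.concatenate([state, doc])
        state = np.tanh(W_update @ inp)         # update state with the interaction
    return probs

query = rng.normal(size=DIM)
serp = [rng.normal(size=DIM) for _ in range(3)]
probs = predict_clicks(query, serp)
```

The key point mirrored from the abstract: no hand-crafted dependency structure is specified; whatever the state components come to represent is determined by training.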


ACM Transactions on Information Systems | 2014

Theoretical, Qualitative, and Quantitative Analyses of Small-Document Approaches to Resource Selection

Ilya Markov; Fabio Crestani

In a distributed retrieval setup, resource selection is the problem of identifying and ranking relevant sources of information for a given user's query. For better usage of existing resource-selection techniques, it is desirable to know what the fundamental differences between them are and in what settings one is superior to others. However, little is still understood about the actual behavior of resource-selection methods. In this work, we focus on small-document approaches to resource selection that rank and select sources based on the ranking of their documents. We pose a number of research questions and approach them by three types of analyses. First, we present existing small-document techniques in a unified framework and analyze them theoretically. Second, we propose using a qualitative analysis to study the behavior of different small-document approaches. Third, we present a novel experimental methodology to evaluate small-document techniques and to validate the results of the qualitative analysis. This way, we answer the posed research questions and provide insights about small-document methods in general and about each technique in particular.
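For concreteness, one well-known small-document approach scores a resource by how its sampled documents rank in a centralized index of all samples (a ReDDE-style estimate; the resources, sizes, and scores below are invented for illustration):

```python
# Hypothetical centralized-sample ranking: (resource_id, retrieval_score) pairs.
sample_ranking = [
    ("A", 0.9), ("B", 0.8), ("A", 0.7), ("C", 0.6), ("B", 0.5), ("C", 0.4),
]
coll_sizes = {"A": 10_000, "B": 50_000, "C": 2_000}  # documents per resource
sample_sizes = {"A": 300, "B": 300, "C": 300}        # sampled documents per resource

def small_doc_scores(ranking, coll_sizes, sample_sizes, k=4):
    """ReDDE-style scoring: every sampled document in the top k 'votes' for its
    resource, scaled by how many unsampled documents it represents."""
    top_k = sorted(ranking, key=lambda d: d[1], reverse=True)[:k]
    scores = {r: 0.0 for r in coll_sizes}
    for resource, _ in top_k:
        scores[resource] += coll_sizes[resource] / sample_sizes[resource]
    return scores

scores = small_doc_scores(sample_ranking, coll_sizes, sample_sizes, k=4)
```

The resource ranking thus falls out of the document ranking, which is exactly the family of methods the paper analyzes; other small-document techniques differ mainly in how the per-document votes are weighted.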


European Conference on Information Retrieval (ECIR) | 2013

Reducing the uncertainty in resource selection

Ilya Markov; Leif Azzopardi; Fabio Crestani

The distributed retrieval process is plagued by uncertainty. Sampling, selection, merging and ranking are all based on very limited information compared to centralized retrieval. In this paper, we focus our attention on reducing the uncertainty within the resource selection phase by obtaining a number of estimates, rather than relying upon only one point estimate. We propose three methods for reducing uncertainty, which are compared against state-of-the-art baselines across three distributed retrieval testbeds. Our results show that the proposed methods significantly improve over the baselines, reduce uncertainty, and improve the robustness of resource selection.
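As a generic illustration of the idea of many estimates instead of one point estimate (this is not the paper's method; the bootstrap scheme and the pessimistic mean-minus-spread adjustment below are invented), one can resample a resource's sampled scores, measure the spread of the resulting estimates, and trust wide-spread resources less:

```python
import random
import statistics

def tempered_score(sample_scores, n_boot=200, seed=0):
    """Bootstrap many estimates of a resource's mean score and penalise
    high-variance (uncertain) resources. Purely illustrative."""
    rng = random.Random(seed)
    estimates = [
        statistics.mean(rng.choices(sample_scores, k=len(sample_scores)))
        for _ in range(n_boot)
    ]
    mean = statistics.mean(estimates)
    spread = statistics.stdev(estimates)
    return mean - spread  # pessimistic: downweight uncertain resources

stable = [0.50, 0.52, 0.48, 0.51, 0.49]  # consistent sampled scores
noisy = [0.10, 0.90, 0.05, 0.95, 0.50]   # same mean, much higher uncertainty
```

Both resources have the same point estimate (about 0.5), but the noisy one ends up ranked lower once the spread of its estimates is taken into account.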


European Conference on Information Retrieval (ECIR) | 2013

Distributed information retrieval and applications

Fabio Crestani; Ilya Markov

Distributed Information Retrieval (DIR) is a generic area of research that brings together techniques, such as resource selection and results aggregation, dealing with data that, for organizational or technical reasons, cannot be managed centrally. Existing and potential applications of DIR methods vary from blog retrieval to aggregated search and from multimedia and multilingual retrieval to distributed Web search. In this tutorial we briefly discuss the main DIR phases: resource description, resource selection, results merging, and results presentation. The main focus is on applications of DIR techniques: blog, expert and desktop search, aggregated search and personal meta-search, multimedia and multilingual retrieval. We also discuss a number of potential applications of DIR techniques, such as distributed Web search, enterprise search and aggregated mobile search.


Cross-Language Evaluation Forum (CLEF) | 2015

A Comparative Study of Click Models for Web Search

Artem Grotov; Aleksandr Chuklin; Ilya Markov; Luka Stout; Finde Xumara; Maarten de Rijke

Click models have become an essential tool for understanding user behavior on a search engine result page, running simulated experiments and predicting relevance. Dozens of click models have been proposed, all aiming to tackle problems stemming from the complexity of user behavior or of contemporary result pages. Many models have been evaluated using proprietary data, hence the results are hard to reproduce. The choice of baseline models is not always motivated and the fairness of such comparisons may be questioned. In this study, we perform a detailed analysis of all major click models for web search ranging from very simplistic to very complex. We employ a publicly available dataset, open-source software and a range of evaluation techniques, which makes our results both representative and reproducible. We also analyze the query space to show what type of queries each model can handle best.


International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) | 2012

Unsupervised linear score normalization revisited

Ilya Markov; Avi Arampatzis; Fabio Crestani

We take a fresh look at score normalization for merging result lists, isolating the problem from other components. We focus on three of the simplest, practical, and widely used linear methods which do not require any training data, i.e. MinMax, Sum, and Z-Score. We provide theoretical arguments on why and when the methods work, and evaluate them experimentally. We find that MinMax is the most robust under many circumstances, and that Sum is - in contrast to previous literature - the worst. Based on the insights gained, we propose another three simple methods which work as well as or better than the baselines.
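The three baseline normalizations are simple enough to state in a few lines. A sketch (the Sum variant here shifts scores by the minimum before dividing by the total, which is one common formulation; the paper should be consulted for the exact definitions it evaluates):

```python
def min_max(scores):
    """Map the minimum score to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def sum_norm(scores):
    """Shift the minimum to zero, then divide by the sum (scores sum to 1)."""
    lo = min(scores)
    shifted = [s - lo for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]

def z_score(scores):
    """Standardize scores to zero mean and unit (population) variance."""
    mu = sum(scores) / len(scores)
    sigma = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5
    return [(s - mu) / sigma for s in scores]
```

All three are unsupervised: they use only the statistics of a single result list, which is what makes them practical for merging lists from uncooperative sources.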


European Conference on Information Retrieval (ECIR) | 2013

On CORI results merging

Ilya Markov; Avi Arampatzis; Fabio Crestani

Score normalization and results merging are important components of many IR applications. Recently, MinMax, an unsupervised linear score normalization method, was shown to perform quite well across various distributed retrieval testbeds, although it is based on strong assumptions. The CORI results merging method relaxes these assumptions to some extent and significantly improves the performance of MinMax. We parameterize CORI and evaluate its performance across a range of parameter settings. Experimental results on three distributed retrieval testbeds show that CORI significantly outperforms state-of-the-art results merging and score normalization methods when its parameter goes to infinity.
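For reference, the classic CORI merging heuristic combines a MinMax-normalized document score D with a normalized collection score C as (D + λ·D·C)/(1 + λ), with λ fixed at 0.4; the paper treats λ as a free parameter. A sketch (variable names are mine):

```python
def cori_merge(d, c, lam=0.4):
    """CORI results merging heuristic.

    d   -- MinMax-normalized document score from a single collection
    c   -- normalized score of that collection (resource selection score)
    lam -- mixing parameter; classic CORI fixes lam = 0.4
    """
    return (d + lam * d * c) / (1.0 + lam)
```

Note that as λ grows the formula tends to d·c, i.e. document scores scaled purely by their collection's score, which is consistent with the abstract's observation that the best results are obtained when the parameter goes to infinity.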


International Conference on Image and Signal Processing (ICISP) | 2008

Image Retrieval: Color and Texture Combining Based on Query-Image

Ilya Markov; Natalia Vassilieva

A common approach to measuring similarity between images is to process different image features independently. Color and texture are the features most commonly used for searching in natural images. In [10], a technique was proposed that combines color and texture features based on a particular query image in order to improve retrieval efficiency: a weighted linear combination of color and texture metrics, considered as a mixed metric. In this paper, mixed metrics with different weights are compared to pure color and texture metrics and to the widely used CombMNZ data fusion algorithm. Experiments show that the proposed metrics outperform CombMNZ in some cases and produce close results in others.
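The two approaches being compared can be sketched in a few lines (the per-feature distances and score lists below are invented; in the paper the mixing weight depends on the query image):

```python
from collections import defaultdict

def mixed_distance(color_dist, texture_dist, w):
    """Weighted linear combination of per-feature distances (the mixed metric).
    In the paper the weight w is chosen based on the query image."""
    return w * color_dist + (1.0 - w) * texture_dist

def comb_mnz(score_lists):
    """CombMNZ fusion: a document's summed score multiplied by the number of
    result lists it appears in."""
    total = defaultdict(float)
    hits = defaultdict(int)
    for scores in score_lists:
        for doc, s in scores.items():
            total[doc] += s
            hits[doc] += 1
    return {doc: total[doc] * hits[doc] for doc in total}
```

The contrast is that CombMNZ fuses ranked lists after per-feature retrieval, while the mixed metric fuses at the distance level, before any ranking takes place.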


International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) | 2011

Modeling document scores for distributed information retrieval

Ilya Markov

Distributed Information Retrieval (DIR), also known as Federated Search, integrates multiple searchable collections and provides direct access to them through a unified interface [3]. This is done by a centralized broker that receives user queries, forwards them to appropriate collections, and returns merged results to users. In practice, most federated resources do not cooperate with the broker and provide neither their content nor the statistics used for retrieval. This is known as uncooperative DIR. In this case the broker creates a resource representation by sending sample queries to a collection and analyzing the retrieved documents, a process called query-based sampling. The key issue here is the following: 1.1 How many documents have to be retrieved from a resource in order to obtain a representative sample? Although there have been a number of attempts to address this issue, it is still not solved satisfactorily.

For a given user query, resources are ranked according to their similarity to the query or based on the number of relevant documents they contain. Since resource representations are usually incomplete, the similarity or the number of relevant documents cannot be calculated precisely. Resource selection algorithms proposed in the literature estimate these quantities from incomplete samples. However, these estimates are subject to error. In practice, inaccurate estimates with high error should be trusted less than more accurate estimates with low error. Unfortunately, none of the existing algorithms makes it possible to calculate these estimation errors. Therefore the following questions arise: 2.1 How to estimate resource scores so that the estimation errors can be calculated? 2.2 How to use these errors in order to improve resource selection performance?

Existing results merging algorithms estimate normalized document scores based on the scores of documents that appear both in a sample and in a result list. A problem similar to the resource selection one arises: the normalized document scores are only estimates and are subject to error. Inaccurate estimates should be trusted less than more accurate ones, yet none of the existing algorithms provides a way to calculate these errors. Thus the two questions to be addressed in the results merging phase mirror the resource selection ones: 3.1 How to estimate normalized document scores so that the estimation errors can be calculated? 3.2 How to use these errors in order to improve results merging performance?

In this work we address the above issues by applying score distribution models (SDM) to different phases of DIR [2]. In particular, we discuss an SDM-based resource selection technique that allows the calculation of resource score estimation errors and can be extended to calculate the number of documents to be sampled from each resource for a given query. We have performed initial experiments comparing the SDM-based resource selection technique to state-of-the-art algorithms, and we are currently experimenting with an SDM-based results merging method. We also plan to apply existing score normalization techniques from meta-search to the DIR results merging problem [1]. However, the SDM-based results merging approaches require relevance scores to be returned together with the retrieved documents, and it is not yet clear how to relax this strong assumption, which does not always hold in practice.


Conference on Information and Knowledge Management (CIKM) | 2014

Vertical-Aware Click Model-Based Effectiveness Metrics

Ilya Markov; Eugene Kharitonov; Vadim Nikulin; Pavel Serdyukov; Maarten de Rijke; Fabio Crestani

Today's web search systems present users with heterogeneous information coming from sources of different types, also known as verticals. Evaluating such systems is an important but complex task, which is still far from being solved. In this paper we examine the hypothesis that the use of models that capture user search behavior on heterogeneous result pages helps to improve the quality of offline metrics. We propose two vertical-aware metrics based on user click models for federated search and evaluate them using query logs of the Yandex search engine. We show that, depending on the type of vertical, the proposed metrics have higher correlation with online user behavior than other state-of-the-art techniques.

Collaboration

Top co-authors of Ilya Markov:

Artem Grotov (University of Amsterdam)
Avi Arampatzis (Democritus University of Thrace)
Chang Li (University of Amsterdam)
Finde Xumara (University of Amsterdam)