
Publication


Featured research published by Virgiliu Pavlu.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

A statistical method for system evaluation using incomplete judgments

Javed A. Aslam; Virgiliu Pavlu; Emine Yilmaz

We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves. Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance depends on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low-variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived from a set of runs can be used to efficiently and effectively evaluate those runs. We further show that these sampling pools generalize well to unseen runs. Our experiments indicate that highly accurate estimates of standard performance measures can be obtained using as few relevance judgments as 4% of the typical TREC-style judgment pool.
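
A minimal sketch of the sampling idea, assuming the simplest setting: each document is judged independently with a known inclusion probability, and sampled judgments are reweighted by inverse probability (Horvitz-Thompson) so the estimate is unbiased. This is a generic illustration of the statistical design, not the paper's exact estimator or its derived sampling distribution; all names here are hypothetical.

```python
import random

def sample_judgments(ranked_docs, assess, inclusion_prob):
    """Judge each document independently with its known inclusion
    probability; assess(doc) -> 0/1 stands in for a human assessor."""
    return {d: assess(d) for d in ranked_docs
            if random.random() < inclusion_prob[d]}

def ht_precision_at_k(ranked_docs, judged, inclusion_prob, k):
    """Unbiased Horvitz-Thompson estimate of precision@k: each judged
    relevant document is up-weighted by 1 / Pr(it was sampled)."""
    total = 0.0
    for doc in ranked_docs[:k]:
        if doc in judged:
            total += judged[doc] / inclusion_prob[doc]
    return total / k
```

The variance of such an estimate depends entirely on how the inclusion probabilities are chosen, which is precisely the sampling-distribution question the paper addresses.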


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Evaluation over thousands of queries

Ben Carterette; Virgiliu Pavlu; Evangelos Kanoulas; Javed A. Aslam; James Allan

Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.
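
The queries-versus-judgments tradeoff can be illustrated with a toy simulation; every parameter below (effect size, noise levels, the 1/sqrt noise model) is invented for illustration and is not the track's data. With a fixed total judging budget, we check how often the truly better of two systems wins the comparison when the budget is spread over many shallowly judged queries versus few deeply judged ones.

```python
import random
import statistics

def correct_ordering_rate(n_queries, judgments_per_query,
                          true_gap=0.05, query_sd=0.15, trials=2000):
    """Fraction of trials in which the truly better system scores higher.
    Per-query measurement noise is assumed to shrink as 1/sqrt(judgments);
    per-query topic variability stays fixed."""
    meas_sd = 0.5 / judgments_per_query ** 0.5
    wins = 0
    for _ in range(trials):
        diffs = [true_gap + random.gauss(0, query_sd) + random.gauss(0, meas_sd)
                 for _ in range(n_queries)]
        wins += statistics.mean(diffs) > 0
    return wins / trials

budget = 5000  # total relevance judgments available
for q in (50, 250, 1000):
    print(q, "queries:", correct_ordering_rate(q, budget // q))
```

Under these assumptions, topic variability dominates once judgments are even moderately deep, so spreading the budget over more queries is more reliable, consistent with the track's "up to a point" finding.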


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

The maximum entropy method for analyzing retrieval measures

Javed A. Aslam; Emine Yilmaz; Virgiliu Pavlu

We present a model, based on the maximum entropy method, for analyzing various measures of retrieval performance such as average precision, R-precision, and precision-at-cutoffs. Our methodology treats the value of such a measure as a constraint on the distribution of relevant documents in an unknown list, and the maximum entropy distribution is determined subject to these constraints. For good measures of overall performance (such as average precision), the resulting maximum entropy distributions are highly correlated with actual distributions of relevant documents in lists, as demonstrated through TREC data; for poor measures of overall performance, the correlation is weaker. As such, the maximum entropy method can be used to quantify the overall quality of a retrieval measure. Furthermore, for good measures of overall performance (such as average precision), we show that the corresponding maximum entropy distributions can be used to accurately infer precision-recall curves and the values of other performance measures, and the quality of these inferences far exceeds that predicted by simple retrieval measure correlation, again as demonstrated through TREC data.
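
For precision-at-cutoff constraints specifically, the maximum entropy solution has a simple closed form. Treating each rank as having an unknown probability of holding a relevant document, the constraints fix only the expected number of relevant documents in each segment between consecutive cutoffs; since entropy is strictly concave, it is maximized by spreading each segment's mass evenly. A sketch under that reading (constraints from measures such as average precision instead require numerical optimization):

```python
def maxent_relevance_profile(cutoff_precisions):
    """Max-entropy probability of relevance at each rank, given
    precision-at-cutoff constraints as sorted (k, prec@k) pairs:
    each segment's expected relevant mass is spread evenly."""
    probs, prev_k, prev_mass = [], 0, 0.0
    for k, prec in cutoff_precisions:
        mass = prec * k - prev_mass        # expected relevant docs in (prev_k, k]
        probs += [mass / (k - prev_k)] * (k - prev_k)
        prev_k, prev_mass = k, prec * k
    return probs

# e.g. prec@5 = 0.8, prec@10 = 0.6, prec@20 = 0.4
# -> p = 0.8 for ranks 1-5, 0.4 for 6-10, 0.2 for 11-20
print(maxent_relevance_profile([(5, 0.8), (10, 0.6), (20, 0.4)]))
```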


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Document selection methodologies for efficient and effective learning-to-rank

Javed A. Aslam; Evangelos Kanoulas; Virgiliu Pavlu; Stefan Savev; Emine Yilmaz

Learning-to-rank has attracted great attention in the IR community. Much research effort has been devoted to query-document feature extraction and to the development of sophisticated learning-to-rank algorithms. However, relatively little research has examined how to select documents for learning-to-rank data sets, or the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms. In this paper, we employ a number of document selection methodologies widely used in the context of evaluation: depth-k pooling, sampling (infAP, statAP), active learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g., sampling and active learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learning-to-rank, and we compare them with the document selection methodology used to create the LETOR datasets. Further, the utilized methodologies all differ in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
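
Of the selection strategies named above, depth-k pooling is the simplest to state: the selected set is the union of each run's top k documents. A minimal sketch:

```python
def depth_k_pool(runs, k):
    """Union of the top-k documents across runs; in this paper's
    setting the pool supplies training documents for learning-to-rank
    rather than only an assessment pool."""
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:k])
    return pool

# usage with three tiny hypothetical runs
pool = depth_k_pool([["d3", "d1", "d7"], ["d1", "d9", "d2"], ["d7", "d4"]], k=2)
```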


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

A geometric interpretation of R-precision and its correlation with average precision

Javed A. Aslam; Emine Yilmaz; Virgiliu Pavlu

We consider two of the most commonly cited measures of retrieval performance: average precision and R-precision. It is well known that average precision and R-precision are highly correlated and similarly robust measures of performance, though the reasons for this are not entirely clear. In this paper, we give a geometric argument which shows that under a very reasonable set of assumptions, average precision and R-precision both approximate the area under the precision-recall curve, thus explaining their high correlation. We further demonstrate through the use of TREC data that the similarity or difference between average precision and R-precision is largely governed by the adherence to, or violation of, these reasonable assumptions.
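
The geometric claim is easy to probe numerically on a single ranked list: average precision, R-precision, and a trapezoidal estimate of the area under the precision-recall curve tend to land close together. A toy check (the relevance list below is invented, and the leading (0, 1) point is a common plotting convention, not part of the paper's argument):

```python
def pr_summaries(rels, R):
    """Average precision, R-precision, and trapezoidal area under the
    precision-recall curve for one binary-relevance ranking.
    rels: 0/1 relevance down the ranking; R: total relevant documents."""
    hits, ap, rprec = 0, 0.0, 0.0
    points = [(0.0, 1.0)]              # conventional (recall, precision) start
    for i, r in enumerate(rels, start=1):
        hits += r
        if r:
            ap += hits / i / R
            points.append((hits / R, hits / i))
        if i == R:
            rprec = hits / i
    area = sum(0.5 * (p0 + p1) * (r1 - r0)
               for (r0, p0), (r1, p1) in zip(points, points[1:]))
    return ap, rprec, area

print(pr_summaries([1, 0, 1, 1, 0, 0, 1, 0], R=4))  # ~0.75, 0.75, ~0.80
```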


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Score distribution models: assumptions, intuition, and robustness to score manipulation

Evangelos Kanoulas; Keshi Dai; Virgiliu Pavlu; Javed A. Aslam

Inferring the score distribution of relevant and non-relevant documents is an essential task for many IR applications (e.g., information filtering, recall-oriented IR, meta-search, distributed IR). Modeling score distributions in an accurate manner is the basis of any such inference, and numerous score distribution models have been proposed in the literature, most on the basis of empirical evidence and goodness-of-fit. In this work, we model score distributions in a rather different, systematic manner. We start with a basic assumption on the distribution of terms in a document. Following the transformations applied to term frequencies by two basic ranking functions, BM25 and Language Models, we derive the distribution of the produced scores for all documents. We then focus on the relevant documents, detaching our analysis from particular ranking functions. Instead, we consider a model for precision-recall curves and, given this model, present a general mathematical framework which, given any score distribution for all retrieved documents, produces an analytical formula for the score distribution of relevant documents that is consistent with precision-recall curves following the aforementioned model. In particular, assuming a Gamma distribution for all retrieved documents, we show that the derived distribution for the relevant documents resembles a Gaussian distribution with a heavy right tail.
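
Whatever forms the two densities take, the downstream machinery is the same: Bayes' rule turns a raw retrieval score into a probability of relevance, which is what filtering and fusion applications threshold on. The sketch below uses two Gaussian components purely for simplicity; the paper itself derives a Gamma for all retrieved documents and a heavy-right-tailed, Gaussian-like form for the relevant ones.

```python
import math

def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_relevant(score, lam, rel, nonrel):
    """P(relevant | score) under a two-component mixture; lam is the
    prior fraction of relevant documents, rel/nonrel are (mu, sigma)."""
    a = lam * norm_pdf(score, *rel)
    b = (1 - lam) * norm_pdf(score, *nonrel)
    return a / (a + b)

# hypothetical parameters: relevant scores sit higher and are more spread out
print(prob_relevant(2.0, lam=0.1, rel=(2.5, 1.0), nonrel=(0.5, 0.7)))
```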


European Conference on Information Retrieval | 2009

If I Had a Million Queries

Ben Carterette; Virgiliu Pavlu; Evangelos Kanoulas; Javed A. Aslam; James Allan

As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Measure-based metasearch

Javed A. Aslam; Virgiliu Pavlu; Emine Yilmaz

We propose a simple method for converting many standard measures of retrieval performance into metasearch algorithms. Our focus is both on the analysis of retrieval measures themselves and on the development of new metasearch algorithms. Given the conversion method proposed, our experimental results using TREC data indicate that system-oriented measures of overall retrieval performance (such as average precision) yield good metasearch algorithms whose performance equals or exceeds that of benchmark techniques such as CombMNZ and Condorcet.
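
The abstract does not spell out the conversion itself, but CombMNZ, one of the named benchmarks, is compact enough to sketch for context: min-max normalize each system's scores, sum them per document, and multiply by the number of systems that retrieved the document.

```python
def comb_mnz(score_lists):
    """CombMNZ fusion: sum of min-max normalized scores per document,
    multiplied by the number of systems retrieving it.
    score_lists: one dict per system, mapping doc_id -> raw score."""
    fused = {}
    for scores in score_lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0              # guard against constant scores
        for doc, s in scores.items():
            total, count = fused.get(doc, (0.0, 0))
            fused[doc] = (total + (s - lo) / span, count + 1)
    return sorted(fused, key=lambda d: fused[d][0] * fused[d][1], reverse=True)

# usage: comb_mnz([{"d1": 3.2, "d2": 1.1}, {"d1": 0.9, "d3": 0.4}])
```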


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

A unified model for metasearch and the efficient evaluation of retrieval systems via the hedge algorithm

Javed A. Aslam; Virgiliu Pavlu; Robert Savell

We present a unified framework for simultaneously solving both the pooling problem (the construction of efficient document pools for the evaluation of retrieval systems) and metasearch (the fusion of ranked lists returned by retrieval systems in order to increase performance). The implementation is based on the Hedge algorithm for online learning, which has the advantage of converging to bounded error rates approaching the performance of the best linear combination of the underlying systems. The choice of a loss function closely related to the average precision measure of system performance ensures that the judged document set performs well, both in constructing a metasearch list and as a pool for the accurate evaluation of retrieval systems. Our experimental results on TREC data demonstrate excellent performance in all respects: evaluation of systems, retrieval of relevant documents, and generation of metasearch lists.
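
A heavily simplified sketch of the Hedge loop described above: systems vote for the next document to judge in proportion to their current weights, and weights are updated multiplicatively according to how well each system anticipated the judgment. The rank-discounted vote and the 0/1 loss below are invented stand-ins; the paper's loss is tied to average precision.

```python
def hedge_pooling(runs, judge, rounds, beta=0.9, depth=100):
    """runs: ranked doc-id lists, one per system.
    judge(doc) -> 0/1 relevance (a human assessor in practice).
    Returns the judged pool and the final system weights."""
    weights = [1.0] * len(runs)
    judged = {}
    for _ in range(rounds):
        # weighted, rank-discounted vote over still-unjudged documents
        votes = {}
        for w, run in zip(weights, runs):
            for rank, doc in enumerate(run):
                if doc not in judged:
                    votes[doc] = votes.get(doc, 0.0) + w / (rank + 1)
        if not votes:
            break
        pick = max(votes, key=votes.get)
        rel = judged[pick] = judge(pick)
        # toy loss: a system is wrong if it ranked a nonrelevant document
        # in its top `depth`, or left a relevant one out of it
        for i, run in enumerate(runs):
            in_top = pick in run[:depth]
            weights[i] *= beta ** float((rel == 1) != in_top)
    return judged, weights
```

At any point the same weights define a metasearch list (rank the remaining documents by the weighted vote), which is why one mechanism can serve both pooling and fusion.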


European Conference on Information Retrieval | 2012

Extended expectation maximization for inferring score distributions

Keshi Dai; Virgiliu Pavlu; Evangelos Kanoulas; Javed A. Aslam

Inferring the distributions of relevant and nonrelevant documents over a ranked list of scored documents returned by a retrieval system has a broad range of applications, including information filtering, recall-oriented retrieval, metasearch, and distributed IR. Typically, the distribution of documents over scores is modeled by a mixture of two distributions, one for the relevant and one for the nonrelevant documents, and expectation maximization (EM) is run to estimate the mixture parameters. A large volume of work has focused on selecting the appropriate form of the two distributions in the mixture. In this work we take the form of the distributions as given and focus on the inference algorithm. We extend the EM algorithm (a) by simultaneously considering the ranked lists of documents returned by multiple retrieval systems, and (b) by encoding in the algorithm the constraint that the same document retrieved by multiple systems should have the same, global, probability of relevance. We test the new inference algorithm using TREC data and demonstrate that it outperforms the regular EM algorithm: it is better calibrated in inferring the probability of document relevance, and it is more effective when applied to the task of metasearch.
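
For reference, the baseline being extended is plain EM on a two-component score mixture from a single ranked list. The exponential (nonrelevant) plus Gaussian (relevant) pairing below is a common choice in the score-distribution literature, used here only for concreteness; the paper's actual contribution, coupling multiple systems' lists through a shared per-document probability of relevance, is deliberately omitted from this sketch.

```python
import math

def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_score_mixture(scores, iters=100):
    """Plain EM for a mixture of exponential (nonrelevant) and Gaussian
    (relevant) score densities; scores are assumed nonnegative."""
    n = len(scores)
    lam, mu, sigma = 0.3, max(scores), 1.0       # crude initialization
    rate = n / sum(scores)
    for _ in range(iters):
        # E-step: responsibility of the relevant component for each score
        resp = []
        for s in scores:
            a = lam * norm_pdf(s, mu, sigma)
            b = (1 - lam) * rate * math.exp(-rate * s)
            resp.append(a / ((a + b) or 1e-300))
        # M-step: weighted re-estimates of the mixture parameters
        n_rel = sum(resp)
        lam = n_rel / n
        mu = sum(r * s for r, s in zip(resp, scores)) / n_rel
        sigma = max(1e-6, math.sqrt(
            sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / n_rel))
        rate = (n - n_rel) / sum((1 - r) * s for r, s in zip(resp, scores))
    return lam, mu, sigma, rate
```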

Collaboration


Dive into Virgiliu Pavlu's collaborations.

Top Co-Authors

Emine Yilmaz

University College London

Keshi Dai

Northeastern University

Stefan Savev

Northeastern University

James Allan

University of Massachusetts Amherst
