
Publication


Featured research published by Colin Wilkie.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Relating retrievability, performance and length

Colin Wilkie; Leif Azzopardi

Retrievability provides a different way to evaluate an Information Retrieval (IR) system as it focuses on how easily documents can be found. It is intrinsically related to retrieval performance because a document needs to be retrieved before it can be judged relevant. In this paper, we undertake an empirical investigation into the relationship between the retrievability of documents, the retrieval bias imposed by a retrieval system, and the retrieval performance, across different amounts of document length normalization. To this end, two standard IR models are used on three TREC test collections to show that there is a useful and practical link between retrievability and performance. Our findings show that minimizing the bias across the document collection leads to good performance (though not the best performance possible). We also show that past a certain amount of document length normalization the retrieval bias increases, and the retrieval performance significantly and rapidly decreases. These findings suggest that the relationship between retrievability and effectiveness may offer a way to automatically tune systems.
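
For reference, the retrievability measure used throughout this line of work is the cumulative formulation due to Azzopardi and Vinay: a document accrues score each time it appears at or above a rank cutoff c over a large set of queries. A minimal Python sketch; the query set, ranking function, and cutoff value are placeholders rather than the paper's exact experimental setup:

    from collections import defaultdict

    def retrievability(queries, rank_docs, c=100):
        """Cumulative retrievability: r(d) counts how often document d
        appears at or above rank cutoff c over the issued query set."""
        r = defaultdict(int)
        for q in queries:
            for rank, doc_id in enumerate(rank_docs(q), start=1):
                if rank > c:
                    break
                r[doc_id] += 1  # query opportunity weight assumed uniform
        return r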


European Conference on Information Retrieval | 2014

Efficiently Estimating Retrievability Bias

Colin Wilkie; Leif Azzopardi

Retrievability is the measure of how easily a document can be retrieved using a particular retrieval system. The extent to which a retrieval system favours certain documents over others, as expressed by their retrievability scores, determines the level of bias the system imposes on a collection. Recently it has been shown that it is possible to tune a retrieval system by minimising the retrievability bias. However, performing such a retrievability analysis often requires posing millions upon millions of queries. In this paper, we examine how many queries are needed to obtain a reliable and useful approximation of the retrievability bias imposed by the system, and an estimate of the individual retrievability of documents in the collection. We find that a reliable estimate of retrievability bias can be obtained, in some cases, with 90% fewer queries than are typically used, while document retrievability can be estimated with up to 60% fewer queries.
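
Since bias is conventionally summarised with the Gini coefficient over the retrievability scores, "fewer queries" in practice means stopping once the Gini estimate stabilises as query batches are added. A rough sketch of that convergence check; the batch size and stability threshold are illustrative assumptions, not values from the paper:

    import random

    def gini(values):
        """Gini coefficient of non-negative retrievability scores."""
        xs = sorted(values)
        n, total = len(xs), sum(xs)
        if n == 0 or total == 0:
            return 0.0
        cum = sum(i * x for i, x in enumerate(xs, start=1))
        return 2.0 * cum / (n * total) - (n + 1.0) / n

    def stable_gini(query_pool, rank_docs, c=100, batch=10000, tol=0.005):
        """Issue queries in batches; stop when the Gini estimate moves < tol."""
        random.shuffle(query_pool)  # query_pool: a list of query strings
        r, prev = {}, None
        for start in range(0, len(query_pool), batch):
            for q in query_pool[start:start + batch]:
                for rank, doc in enumerate(rank_docs(q), start=1):
                    if rank > c:
                        break
                    r[doc] = r.get(doc, 0) + 1
            g = gini(r.values())
            if prev is not None and abs(g - prev) < tol:
                return g, start + batch  # reliable estimate reached early
            prev = g
        return prev, len(query_pool)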


Conference on Information and Knowledge Management | 2014

A Retrievability Analysis: Exploring the Relationship Between Retrieval Bias and Retrieval Performance

Colin Wilkie; Leif Azzopardi

Retrievability provides an alternative way to assess an Information Retrieval (IR) system by measuring how easily documents can be retrieved. Retrievability can also be used to determine the level of retrieval bias a system exerts upon a collection of documents. It has been hypothesised that reducing the retrieval bias will lead to improved performance. To date, it has been shown that this hypothesis does not appear to hold on standard retrieval performance measures (MAP and P@10) when exploring the parameter space of a given retrieval model. However, the evidence is limited and confined to only a few models, collections and measures. In this paper, we perform a comprehensive empirical evaluation analysing the relationship between retrieval bias and retrieval performance using several well-known retrieval models, five large TREC test collections and ten performance measures (including the recently proposed PRES, Time Biased Gain (TBG) and U-Measure). For traditional relevance-based measures (MAP, P@10, MRR, Recall, etc.) the correlation between retrieval bias and performance is moderate. However, for TBG and U-Measure, we find that there are strong and significant negative correlations between retrieval bias and performance (i.e. as bias drops, performance increases). These findings suggest that for these more sophisticated, user-oriented measures the retrievability bias hypothesis tends to hold. The implication is that for these measures, systems can be tuned using retrieval bias, without recourse to relevance judgements.
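
The correlation analysis reported here boils down to pairing, for each parameter setting of a model, the Gini value with the performance score at the same setting, and correlating the two series. A hedged sketch; the numbers below are hypothetical stand-ins, not results from the paper:

    from scipy.stats import spearmanr

    # One entry per parameter setting of a retrieval model (hypothetical values):
    gini_per_setting = [0.61, 0.58, 0.55, 0.57, 0.63]  # retrievability bias
    tbg_per_setting = [2.10, 2.25, 2.41, 2.33, 1.98]   # Time Biased Gain

    rho, p = spearmanr(gini_per_setting, tbg_per_setting)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
    # A strong negative rho would support the retrievability bias
    # hypothesis for user-oriented measures such as TBG and U-Measure.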


European Conference on Information Retrieval | 2013

An initial investigation on the relationship between usage and findability

Colin Wilkie; Leif Azzopardi

Ensuring that information within a website is findable is particularly important, because visitors who cannot find what they are looking for are likely to leave the site, or become very frustrated and switch to a competing site. While findability has been touted as important in web design, we wonder to what degree measures of findability are correlated with usage. To this end, we have conducted a preliminary study on three sub-domains across a number of measures of findability.


European Conference on Information Retrieval | 2015

Retrievability and Retrieval Bias: A Comparison of Inequality Measures

Colin Wilkie; Leif Azzopardi

The disposition of a retrieval system to favour certain documents over others can be quantified using retrievability. Typically, the Gini Coefficient has been used to quantify, with a single value, the level of bias a system imposes across the collection. However, numerous inequality measures have been proposed that may provide different insights into retrievability bias. In this paper, we examine eight inequality measures and observe how the estimate of bias changes for three standard retrieval models across their respective parameter spaces. We find that most of the measures agree with each other, and that the parameter settings that minimise the inequality according to each measure are similar. This suggests that the standard inequality measure, the Gini Coefficient, provides similar information regarding the bias. However, we find that the Palma index and the 20:20 Ratio show the greatest differences and may be useful for providing a different perspective when ranking systems according to bias.
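
For concreteness, the two measures singled out here compare the shares of total retrievability held by the extremes of the distribution. A sketch of both under their standard economic definitions; the abstract does not give the paper's exact implementation, so this is an illustration:

    def palma(values):
        """Palma index: retrievability share of the top 10% of documents
        divided by the share of the bottom 40%."""
        xs = sorted(values)
        n = len(xs)
        top10 = sum(xs[int(0.9 * n):])
        bottom40 = sum(xs[:int(0.4 * n)])
        return top10 / bottom40 if bottom40 else float('inf')

    def ratio_20_20(values):
        """20:20 Ratio: share of the top 20% divided by the bottom 20%."""
        xs = sorted(values)
        n = len(xs)
        top20 = sum(xs[int(0.8 * n):])
        bottom20 = sum(xs[:int(0.2 * n)])
        return top20 / bottom20 if bottom20 else float('inf')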


Conference on Information and Knowledge Management | 2015

Query Length, Retrievability Bias and Performance

Colin Wilkie; Leif Azzopardi

Past work has shown that longer queries tend to lead to better retrieval performance. However, this comes at the cost of increased user effort and additional system processing. In this paper, we examine whether there are benefits of longer queries beyond performance. We posit that increasing the query length will also lead to a reduction in the retrievability bias. Additionally, we speculate that to minimise retrievability bias as queries become longer, more length normalisation must be applied to account for the increase in the length of documents retrieved. To this end, we perform a retrievability analysis on two TREC collections using three standard retrieval models and various lengths of queries (one to five terms). From this investigation we find that increasing the length of queries reduces the overall retrievability bias, but at a decreasing rate. Moreover, once the query length exceeds three terms the bias can begin to increase (and the performance can start to drop). We also observe that more document length normalisation is typically required as query length increases, in order to minimise bias. Finally, we show that there is a strong correlation between performance and retrieval bias. This work raises some interesting questions regarding query length and its effect on performance and bias. Further work will be directed towards examining longer and more verbose queries, including those generated via query expansion methods, to obtain a more comprehensive understanding of the relationship between query length, performance and retrievability bias.
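
Queries of a controlled length, as used here, are typically simulated by sampling term n-grams from the collection itself. A sketch of that generation step; the tokenised collection and the frequency-weighted sampling are assumptions about the setup, not the paper's documented procedure:

    import random
    from collections import Counter

    def generate_queries(docs_tokens, length, n_queries=100000):
        """Sample n-gram queries of a fixed length (e.g. one to five terms)
        from the collection, weighted by frequency so common phrasings
        are issued more often."""
        ngrams = Counter()
        for tokens in docs_tokens:
            for i in range(len(tokens) - length + 1):
                ngrams[tuple(tokens[i:i + length])] += 1
        population, weights = zip(*ngrams.items())
        picks = random.choices(population, weights, k=n_queries)
        return [' '.join(q) for q in picks]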


International Conference on the Theory of Information Retrieval | 2016

A Topical Approach to Retrievability Bias Estimation

Colin Wilkie; Leif Azzopardi

Retrievability is an independent evaluation measure that offers insights into an aspect of retrieval systems that performance and efficiency measures do not. Retrievability is often used to calculate the retrievability bias, an indication of how accessible a system makes all the documents in a collection. Generally, computing the retrievability bias of a system requires issuing a colossal number of queries to obtain an accurate estimate of the bias. However, it is often not the accuracy of the estimate that matters, but the relationship between the estimate of bias and performance when tuning a system's parameters. As such, reaching a stable estimate of bias for the system is more important than getting very accurate retrievability scores for individual documents. This work explores the idea of using topical subsets of the collection for query generation and bias estimation, forming a local estimate of bias that correlates with the global estimate of retrievability bias. By using topical subsets, it would be possible to reduce the volume of queries required to reach an accurate estimate of retrievability bias, reducing the time and resources required to perform a retrievability analysis. Findings suggest that this is a viable approach to estimating retrievability bias and that the number of queries required can be reduced to less than a quarter of what was previously thought necessary.
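
The local estimate described can be sketched as: take a topical subset of documents, generate queries from that subset alone, and compute the Gini over the subset members' retrievability scores. The scoring restriction below is an illustrative assumption, not the paper's exact protocol; gini() is as defined in the earlier sketch:

    def topical_bias_estimate(subset_docs, make_queries, rank_docs, c=100):
        """Local retrievability bias from a topical subset of the collection."""
        r = {d: 0 for d in subset_docs}
        for q in make_queries(subset_docs):  # queries drawn from the subset only
            for rank, doc in enumerate(rank_docs(q), start=1):
                if rank > c:
                    break
                if doc in r:                 # score only subset members
                    r[doc] += 1
        return gini(r.values())              # gini() as defined above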


European Conference on Information Retrieval | 2014

Page Retrievability Calculator

Leif Azzopardi; Rosanne English; Colin Wilkie; D.J. Maxwell

Knowing how easily pages within a website can be retrieved using the site's search functionality provides crucial information to the site designer. If the system is not retrieving particular pages, then the system or the information may need to be changed to ensure that visitors to the site have the best chance of finding the relevant information. In this demo paper, we present a Page Retrievability Calculator, which estimates the retrievability of a page for a given search engine. To estimate the retrievability, instead of posing all possible queries, we focus on issuing only those likely to retrieve the page and use them to obtain an accurate approximation. We can also rank the queries associated with the page to show the site designer which queries are most likely to retrieve the page, and at what rank. With this application we can now explore how the site or its content might be changed to improve retrievability.
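
The calculator's core step, scoring a single page using only queries likely to retrieve it, can be sketched as below. Extracting candidate queries as term pairs from the page's own text is a naive stand-in for whatever query generation the demo actually uses:

    from itertools import combinations

    def page_retrievability(page_id, page_terms, rank_docs, c=100):
        """Approximate one page's retrievability from page-derived queries,
        returning the score and the best-ranking queries for the page."""
        candidates = sorted(set(page_terms))[:30]
        queries = [' '.join(pair) for pair in combinations(candidates, 2)]
        hits = []
        for q in queries:
            for rank, doc in enumerate(rank_docs(q), start=1):
                if rank > c:
                    break
                if doc == page_id:
                    hits.append((q, rank))  # a query that finds the page, and where
                    break
        return len(hits), sorted(hits, key=lambda h: h[1])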


International Conference on the Theory of Information Retrieval | 2017

An Initial Investigation of Query Expansion Bias

Colin Wilkie; Leif Azzopardi

In this work, the relationship between performance and retrievability bias is explored when various query expansion methods are employed to aid retrieval. Several parameters are altered, independently, to identify those that have an impact on bias. Parameters altered include: Rocchio's beta, length normalisation parameters, the number of terms added, and the number of documents those terms are extracted from. A strong correlation between performance and retrievability bias is identified, suggesting that query expansion increases performance by making the system more biased.
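
Rocchio's beta, listed among the altered parameters, weights the feedback-document contribution in the classic Rocchio expansion, q' = alpha * q + (beta / |R|) * sum of feedback document vectors. A minimal positive-feedback sketch; representing vectors as plain term-weight dicts is an assumption made for illustration:

    def rocchio_expand(query_vec, fb_docs, alpha=1.0, beta=0.75, n_terms=10):
        """Rocchio pseudo-relevance feedback with positive feedback only."""
        expanded = {t: alpha * w for t, w in query_vec.items()}
        for doc_vec in fb_docs:
            for t, w in doc_vec.items():
                expanded[t] = expanded.get(t, 0.0) + beta * w / len(fb_docs)
        # keep the original terms plus the top-n new expansion terms
        new_terms = sorted((t for t in expanded if t not in query_vec),
                           key=expanded.get, reverse=True)[:n_terms]
        return {t: expanded[t] for t in list(query_vec) + new_terms}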


Conference on Information and Knowledge Management | 2017

Algorithmic Bias: Do Good Systems Make Relevant Documents More Retrievable?

Colin Wilkie; Leif Azzopardi

Algorithmic bias presents a difficult challenge within Information Retrieval. It has long been known that certain algorithms favour particular documents due to attributes of those documents that are not directly related to relevance. The evaluation of bias has recently been made possible through the use of retrievability, a quantifiable measure of bias. While evaluating bias is relatively novel, the evaluation of performance has been common since the dawn of the Cranfield approach and TREC. To evaluate performance, a pool of documents to be judged by human assessors is created from the collection. This pooling approach has faced accusations of bias because state-of-the-art algorithms were used to create it, and so the biases associated with those algorithms may be carried into the pool. The introduction of retrievability has provided a mechanism to evaluate the bias of these pools. This work evaluates the varying degrees of bias present in the groups of relevant and non-relevant documents for topics. The differentiating power of a system is also evaluated by examining the documents from the pool that are retrieved for each topic. The analysis finds that the systems that perform better tend to have a higher chance of retrieving a relevant rather than a non-relevant document for a topic prior to retrieval, indicating that retrieval systems which perform better at TREC are already predisposed to agree with the judgements regardless of the query posed.
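
The per-topic analysis described amounts to contrasting the retrievability mass a system assigns to judged-relevant versus judged-non-relevant pool documents, before any topic query is issued. A minimal sketch; the qrels layout and the retrievability score dictionary are assumptions:

    def relevant_vs_nonrelevant(r_scores, qrels):
        """Per-topic ratio of mean retrievability of relevant pool documents
        to that of non-relevant ones. r_scores: doc -> retrievability;
        qrels: topic -> {doc: judgement}."""
        ratios = {}
        for topic, judgements in qrels.items():
            rel = [r_scores.get(d, 0) for d, j in judgements.items() if j > 0]
            non = [r_scores.get(d, 0) for d, j in judgements.items() if j == 0]
            if rel and non and sum(non) > 0:
                ratios[topic] = (sum(rel) / len(rel)) / (sum(non) / len(non))
        return ratios  # > 1: relevant documents are more retrievable a priori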

Collaboration


Dive into Colin Wilkie's collaborations.

Top Co-Authors


Leif Azzopardi

University of Strathclyde
