Craig Macdonald | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Craig Macdonald is active.

Explore More

Publication

Featured researches published by Craig Macdonald.

international world wide web conferences | 2010

Exploiting query reformulations for web search result diversification

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

When a Web users underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated to an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.

european conference on information retrieval | 2005

Terrier information retrieval platform

Iadh Ounis; Gianni Amati; Vassilis Plachouras; Ben He; Craig Macdonald; Douglas Johnson

Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier also offers a range of document weighting and query expansion models, based on the Divergence From Randomness framework. It has been successfully used for ad-hoc retrieval, cross-language retrieval, Web IR and intranet search, in a centralised or distributed setting.

international acm sigir conference on research and development in information retrieval | 2011

Intent-aware search result diversification

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches.

conference on information and knowledge management | 2008

An effective statistical approach to blog post opinion retrieval

Ben He; Craig Macdonald; Jiyin He; Iadh Ounis

Finding opinionated blog posts is still an open problem in information retrieval, as exemplified by the recent TREC blog tracks. Most of the current solutions involve the use of external resources and manual efforts in identifying subjective features. In this paper, we propose a novel and effective dictionary-based statistical approach, which automatically derives evidence for subjectivity from the blog collection itself, without requiring any manual effort. Our experiments show that the proposed approach is capable of achieving remarkable and statistically significant improvements over robust baselines, including the best TREC baseline run. In addition, with relatively little computational costs, our proposed approach provides an effective performance in retrieving opinionated blog posts, which is as good as a computationally expensive approach using Natural Language Processing techniques.

european conference on information retrieval | 2010

Explicit search result diversification through sub-queries

Rodrygo L. T. Santos; Jie Peng; Craig Macdonald; Iadh Ounis

Queries submitted to a retrieval system are often ambiguous. In such a situation, a sensible strategy is to diversify the ranking of results to be retrieved, in the hope that users will find at least one of these results to be relevant to their information need. In this paper, we introduce xQuAD, a novel framework for search result diversification that builds such a diversified ranking by explicitly accounting for the relationship between documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries. We evaluate the effectiveness of xQuAD using a standard TREC collection. The results show that our framework markedly outperforms state-of-the-art diversification approaches under a simulated best-case scenario. Moreover, we show that its effectiveness can be further improved by estimating the relative importance of each identified sub-query. Finally, we show that our framework can still outperform the simulated best-case scenario of the state-of-the-art diversification approaches using sub-queries automatically derived from the baseline document ranking itself.

conference on information and knowledge management | 2010

Selectively diversifying web search results

Rodrygo L. T. Santos; Craig Macdonald; Iadh Ounis

Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could benefit from different diversification strategies. A more lenient or more aggressive diversification strategy is typically encoded by existing approaches as a trade-off between promoting relevance or diversity in the search results. In this paper, we propose to learn such a trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query - given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform a uniform diversification for both classical and state-of-the-art diversification approaches.

Knowledge and Information Systems | 2008

Voting techniques for expert search

Craig Macdonald; Iadh Ounis

In an expert search task, the users’ need is to identify people who have relevant expertise to a topic of interest. An expert search system predicts and ranks the expertise of a set of candidate persons with respect to the users’ query. In this paper, we propose a novel approach for predicting and ranking candidate expertise with respect to a query, called the Voting Model for Expert Search. In the Voting Model, we see the problem of ranking experts as a voting problem. We model the voting problem using 12 various voting techniques, which are inspired from the data fusion field. We investigate the effectiveness of the Voting Model and the associated voting techniques across a range of document weighting models, in the context of the TREC 2005 and TREC 2006 Enterprise tracks. The evaluation results show that the voting paradigm is very effective, without using any query or collection-specific heuristics. Moreover, we show that improving the quality of the underlying document representation can significantly improve the retrieval performance of the voting techniques on an expert search task. In particular, we demonstrate that applying field-based weighting models improves the ranking of candidates. Finally, we demonstrate that the relative performance of the voting techniques for the proposed approach is stable on a given task regardless of the used weighting models, suggesting that some of the proposed voting techniques will always perform better than other voting techniques.

Information Retrieval | 2013

The whens and hows of learning to rank for web search

Craig Macdonald; Rodrygo L. T. Santos; Iadh Ounis

Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample such as when to stop ranking—i.e. its minimum effective size—remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate choice of how to calculate the loss function—i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated—are as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set is dependent on the type of information need of the queries, the document representation used during sampling and the test evaluation measure. As the sample size is varied, the selected features markedly change—for instance, we find that the link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.

conference on information and knowledge management | 2007

Expertise drift and query expansion in expert search

Craig Macdonald; Iadh Ounis

Pseudo-relevance feedback, or query expansion, has been shown to improve retrieval performance in the adhoc retrieval task. In such a scenario, a few top-ranked documents are assumed to be relevant, and these are then used to expand and refine the initial user query, such that it retrieves a higher quality ranking of documents. However, there has been little work in applying query expansion in the expert search task. In this setting, query expansion is applied by assuming a few top-ranked candidates have relevant expertise, and using these to expand the query. Nevertheless, retrieval is not improved as expected using such an approach. We show that the success of the application of query expansion is hindered by the presence of topic drift within the profiles of experts that the system considers. In this work, we demonstrate how topic drift occurs in the expert profiles, and moreover, we propose three measures to predict the amount of drift occurring in an experts profile. Finally, we suggest and evaluate ways of enhancing query expansion in expert search using our new insights. Our results show that, once topic drift has been anticipated, query expansion can be successfully applied in a general manner in the expert search task.

international acm sigir conference on research and development in information retrieval | 2012

On building a reusable Twitter corpus

Richard McCreadie; Ian Soboroff; Jimmy J. Lin; Craig Macdonald; Iadh Ounis; Dean McCullough

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus - referred to as Tweets2011 - was distributed via lists of tweet identifiers and specialist tweet crawling software. Furthermore, we analyse whether this distribution approach remains robust over time, as tweets in the corpus are removed either by users or Twitter itself. Tweets2011 was successfully used by 58 participating groups for the TREC 2011 Microblog track, while our results attest to the robustness of the crawling methodology over time.

Explore More