Richard McCreadie | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Richard McCreadie is active.

Explore More

Publication

Featured researches published by Richard McCreadie.

international acm sigir conference on research and development in information retrieval | 2012

On building a reusable Twitter corpus

Richard McCreadie; Ian Soboroff; Jimmy J. Lin; Craig Macdonald; Iadh Ounis; Dean McCullough

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus - referred to as Tweets2011 - was distributed via lists of tweet identifiers and specialist tweet crawling software. Furthermore, we analyse whether this distribution approach remains robust over time, as tweets in the corpus are removed either by users or Twitter itself. Tweets2011 was successfully used by 58 participating groups for the TREC 2011 Microblog track, while our results attest to the robustness of the crawling methodology over time.

Information Processing and Management | 2012

MapReduce indexing strategies: Studying scalability and efficiency

Richard McCreadie; Craig Macdonald; Iadh Ounis

In Information Retrieval (IR), the efficient indexing of terabyte-scale and larger corpora is still a difficult problem. MapReduce has been proposed as a framework for distributing data-intensive operations across multiple processing machines. In this work, we provide a detailed analysis of four MapReduce indexing strategies of varying complexity. Moreover, we evaluate these indexing strategies by implementing them in an existing IR framework, and performing experiments using the Hadoop MapReduce implementation, in combination with several large standard TREC test corpora. In particular, we examine the efficiency of the indexing strategies, and for the most efficient strategy, we examine how it scales with respect to corpus size, and processing power. Our results attest to both the importance of minimising data transfer between machines for IO intensive tasks like indexing, and the suitability of the per-posting list MapReduce indexing strategy, in particular for indexing at a terabyte-scale. Hence, we conclude that MapReduce is a suitable framework for the deployment of large-scale indexing.

international conference on big data | 2013

Scalable distributed event detection for Twitter

Richard McCreadie; Craig Macdonald; Iadh Ounis; Miles Osborne; Sasa Petrovic

Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with a very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and can scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent to the full Twitter Firehose stream, while maintaining event detection accuracy and outperforming an alternative distributed event detection system.

meeting of the association for computational linguistics | 2014

Real-Time Detection, Tracking, and Monitoring of Automatically Discovered Events in Social Media

Miles Osborne; Sean Moran; Richard McCreadie; Alexander von Lünen; Martin D. Sykora; Elizabeth Cano; Neil Ireson; Craig Macdonald; Iadh Ounis; Yulan He; Thomas W. Jackson; Fabio Ciravegna; Ann O'Brien

We introduce ReDites, a system for realtime event detection, tracking, monitoring and visualisation. It is designed to assist Information Analysts in understanding and exploring complex events as they unfold in the world. Events are automatically detected from the Twitter stream. Then those that are categorised as being security-relevant are tracked, geolocated, summarised and visualised for the end-user. Furthermore, the system tracks changes in emotions over events, signalling possible flashpoints or abatement. We demonstrate the capabilities of ReDites using an extended use case from the September 2013 Westgate shooting incident. Through an evaluation of system latencies, we also show that enriched events are made available for users to explore within seconds of that event occurring.

international acm sigir conference on research and development in information retrieval | 2009

On single-pass indexing with MapReduce

Richard McCreadie; Craig Macdonald; Iadh Ounis

Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm in terms of the MapReduce programming model. We then experiment with this adaptation, in the context of the Hadoop MapReduce implementation. In particular, we explore the scale of improvements that can be achieved when using firstly more processing hardware and secondly larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling corpus size or number of processing machines. This suggests that the proposed indexing implementation is viable to support upcoming large-scale corpora.

international acm sigir conference on research and development in information retrieval | 2012

Exploiting term dependence while handling negation in medical search

Nut Limsopatham; Craig Macdonald; Richard McCreadie; Iadh Ounis

In medical records, negative qualifiers, e.g. no or without, are commonly used by health practitioners to identify the absence of a medical condition. Without considering whether the term occurs in a negative or positive context, the sole presence of a query term in a medical record is insufficient to imply that the record is relevant to the query. In this paper, we show how to effectively handle such negation within a medical records information retrieval system. In particular, we propose a term representation that tackles negated language in medical records, which is further extended by considering the dependence of negated query terms. We evaluate our negation handling technique within the search task provided by the TREC Medical Records 2011 track. Our results, which show a significant improvement upon a system that does not consider negated context within records, attest the importance of handling negation.

Information Retrieval | 2013

Identifying top news using crowdsourcing

Richard McCreadie; Craig Macdonald; Iadh Ounis

The influential Text REtrieval Conference (TREC) retrieval conference has always relied upon specialist assessors or occasionally participating groups to create relevance judgements for the tracks that it runs. Recently however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories as well as traditional relevance assessments for blog posts. We conclude that crowdsourcing not only appears to be a feasible, but also cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

cross language evaluation forum | 2014

Comparing Algorithms for Microblog Summarisation

Stuart Mackie; Richard McCreadie; Craig Macdonald; Iadh Ounis

Event detection and tracking using social media and user-generated content has received a lot of attention from the research community in recent years, since such sources can purportedly provide up-to-date information about events as they evolve, e.g. earthquakes. Concisely reporting (summarising) events for users/emergency services using information obtained from social media sources like Twitter is not a solved problem. Current systems either directly apply, or build upon, classical summarisation approaches previously shown to be effective within the newswire domain. However, to-date, research into how well these approaches generalise from the newswire to the microblog domain is limited. Hence, in this paper, we compare the performance of eleven summarisation approaches using four microblog summarisation datasets, with the aim of determining which are the most effective and therefore should be used as baselines in future research. Our results indicate that the SumBasic algorithm and Centroid-based summarisation with redundancy reduction are the most effective approaches, across the four datasets and five automatic summarisation evaluation measures tested.

international acm sigir conference on research and development in information retrieval | 2016

EAIMS: Emergency Analysis Identification and Management System

Richard McCreadie; Craig Macdonald; Iadh Ounis

Social media has great potential as a means to enable civil protection and law enforcement agencies to more effectively tackle disasters and emergencies. However, there is currently a lack of tools that enable civil protection agencies to easily make use of social media. The Emergency Analysis Identification and Management System (EAIMS) is a prototype service that provides real-time detection of emergency events, related information finding and credibility analysis tools for use over social media during emergencies. This system exploits machine learning over data gathered from past emergencies and disasters to build effective models for identifying new events as they occur, tracking developments within those events and analyzing those developments for the purposes of enhancing the decision making processes of emergency response agencies.

international acm sigir conference on research and development in information retrieval | 2015

Comparing Approaches for Query Autocompletion

Giovanni Di Santo; Richard McCreadie; Craig Macdonald; Iadh Ounis

Within a search engine, query auto-completion aims to predict the final query the user wants to enter as they type, with the aim of reducing query entry time and potentially preparing the search results in advance of query submission. There are a large number of approaches to automatically rank candidate queries for the purposes of auto-completion. However, no study exists that compares these approaches on a single dataset. Hence, in this paper, we present a comparison study between current approaches to rank candidate query completions for the user query as it is typed. Using a query-log and document corpus from a commercial medical search engine, we study the performance of 11 candidate query ranking approaches from the literature and analyze where they are effective. We show that the most effective approaches to query auto-completion are largely dependent on the number of characters that the user has typed so far, with the most effective approach differing for short and long prefixes. Moreover, we show that if personalized information is available about the searcher, this additional information can be used to more effectively rank query candidate completions, regardless of the prefix length.

Explore More