Fergus Toolan | Researchain

Publication

Featured research published by Fergus Toolan.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

ProbFuse: a probabilistic approach to data fusion

David Lillis; Fergus Toolan; Rem W. Collier; John Dunnion

Data fusion is the combination of the results of independent searches on a document collection into one single output result set. It has been shown in the past that this can greatly improve retrieval effectiveness over that of the individual results. This paper presents probFuse, a probabilistic approach to data fusion. ProbFuse assumes that the performance of the individual input systems on a number of training queries is indicative of their future performance. The fused result set is based on probabilities of relevance calculated during this training process. Retrieval experiments using data from the TREC ad hoc collection demonstrate that probFuse achieves results superior to those of the popular CombMNZ fusion algorithm.
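For readers unfamiliar with the CombMNZ baseline named in this abstract, the following is a minimal sketch of it: each document's normalised scores are summed across the input systems and multiplied by the number of systems that returned it. The function names and the choice of min-max normalisation are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalise(scores):
    """Scale one system's scores to [0, 1]; a constant score list maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_sets):
    """Fuse result sets with CombMNZ.

    result_sets: list of dicts mapping document id -> raw score (assumed format).
    Returns (doc, fused_score) pairs sorted by descending fused score.
    """
    total = defaultdict(float)  # sum of normalised scores per document
    hits = defaultdict(int)     # number of systems that returned the document
    for scores in result_sets:
        if not scores:
            continue  # an empty result set contributes nothing
        for doc, s in min_max_normalise(scores).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Ties are broken by document id so the output is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

A document returned by several systems is rewarded twice: once through the summed scores and once through the multiplier.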

2010 eCrime Researchers Summit | 2010

Feature selection for Spam and Phishing detection

Fergus Toolan; Joe Carthy

Unsolicited Bulk Email (UBE) has become a large problem in recent years. The number of mass mailers in existence is increasing dramatically. Automatically detecting UBE has become a vital area of current research. Many email clients (such as Outlook and Thunderbird) already have junk filters built in. Mass mailers are continually evolving and overcoming some of the junk filters. This means that the need for research in the area is ongoing. Many existing techniques seem to randomly choose the features that will be used for classification. This paper aims to address this issue by investigating the utility of over 40 features that have been used in recent literature. Information gain for these features is calculated over Ham, Spam and Phishing corpora.
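As an illustration of the measure used here, the sketch below computes the information gain of a single discrete feature with respect to class labels such as ham, spam and phishing. It is the textbook definition (class entropy minus the entropy remaining once the feature value is known), not the authors' code; the function names and the list-based input format are assumptions, and the inputs are assumed to be non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Information gain of a discrete feature with respect to the labels.

    feature_values[i] is the feature's value for email i and labels[i]
    is that email's class (hypothetical input format).
    """
    n = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    # Expected entropy of the class once the feature value is known.
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional
```

A feature that takes the same value for every email scores zero, which is the sense in which information gain can separate useful features from arbitrary ones.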

2009 eCrime Researchers Summit | 2009

Phishing detection using classifier ensembles

Fergus Toolan; Joe Carthy

This paper introduces an approach to classifying emails into Phishing / non-Phishing categories using the C5.0 algorithm, which achieves very high precision, and an ensemble of other classifiers that achieve high recall. The representation of instances used in this paper is very small, consisting of only five features. Results of an evaluation of this system, using over 8,000 emails, approximately half of which were phishing emails and the remainder legitimate, are presented. These results show the benefits of using this recall boosting technique over that of any individual classifier or collection of classifiers.

Web Information Systems Engineering | 2002

Mining web logs for personalized site maps

Fergus Toolan; Nicholas Kushmerick

Navigating through a large Web site can be a frustrating exercise. Many sites employ Site Maps to help visitors understand the overall structure of the site. However, by their very nature, unpersonalized Site Maps show most visitors large amounts of irrelevant content. We propose techniques based on Web usage mining to deliver Personalized Site Maps that are specialized to the interests of each individual visitor. The key challenge is to resolve the tension between simplicity (showing just relevant content), and comprehensibility (showing sufficient context so that the visitors can understand how the content is related to the overall structure of the site). We develop two baseline algorithms (one that displays just shortest paths, and one that mines the server log for popular paths), and compare them to a novel approach that mines the server log for popular path fragments that can be dynamically assembled to reconstruct popular paths. Our experiments with two large Web sites confirm that the mined path fragments provide much better coverage of visitors' sessions than the baseline approach of mining entire paths.
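The idea of mining a server log for popular path fragments can be sketched roughly as follows. This is an illustrative simplification, not the algorithm from the paper: it assumes sessions have already been extracted from the log as ordered lists of page URLs, treats a fragment as a fixed-length contiguous run of pages, and uses a hypothetical `min_support` threshold to decide what counts as popular.

```python
from collections import Counter


def popular_fragments(sessions, length=2, min_support=2):
    """Find contiguous page sequences that occur in many sessions.

    sessions: list of sessions, each an ordered list of page URLs.
    A fragment is counted at most once per session, so its count is the
    number of sessions that contain it.
    """
    support = Counter()
    for pages in sessions:
        # Set comprehension: repeats within one session count only once.
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        support.update(seen)
    return {frag: n for frag, n in support.items() if n >= min_support}
```

For example, with the sessions `["/", "/a", "/b"]`, `["/", "/a", "/c"]` and `["/x", "/a", "/b"]`, no complete path occurs twice, yet the fragments `("/", "/a")` and `("/a", "/b")` each appear in two sessions.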

Information Security Technical Report | 2011

The threats of social networking: Old wine in new bottles?

George R. S. Weir; Fergus Toolan; Duncan N. Smeed

Despite the many potential benefits to its users, social networking appears to provide a rich setting for criminal activities and other misdeeds. In this paper we consider whether the risks of social networking are unique and novel to this context. Having considered the nature and range of applications to which social networks may be applied, we conclude that there are no exploits or fundamental threats inherent to the social networking setting. Rather, the risks and associated threats treat this communicative and social context as an enabler for existing, long established and well-recognised exploits and activities.

Artificial Intelligence Review | 2006

Probability-based fusion of information retrieval result sets

David Lillis; Fergus Toolan; Angel Mur; Liu Peng; Rem W. Collier; John Dunnion

Information Retrieval (IR) forms the basis of many information management tasks. Information management itself has become an extremely important area as the amount of electronically available information increases dramatically. There are numerous methods of performing the IR task both by utilising different techniques and through using different representations of the information available to us. It has been shown that some algorithms outperform others on certain tasks. Combining the results produced by different algorithms has resulted in superior retrieval performance and this has become an important research area. This paper introduces a probability-based fusion technique probFuse that shows initial promise in addressing this question. It also compares probFuse with the common CombMNZ data fusion technique.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Estimating probabilities for effective data fusion

David Lillis; Lusheng Zhang; Fergus Toolan; Rem W. Collier; David Leonard; John Dunnion

Data Fusion is the combination of a number of independent search results, relating to the same document collection, into a single result to be presented to the user. A number of probabilistic data fusion models have been shown to be effective in empirical studies. These typically attempt to estimate the probability that particular documents will be relevant, based on training data. However, little attempt has been made to gauge how the accuracy of these estimations affects fusion performance. The focus of this paper is twofold: firstly, that accurate estimation of the probability of relevance results in effective data fusion; and secondly, that an effective approximation of this probability can be made based on less training data than has previously been employed. This is based on the observation that the distribution of relevant documents follows a similar pattern in most high-quality result sets. Curve fitting suggests that this can be modelled by a simple function that is less complex than other models that have been proposed. The use of existing IR evaluation metrics is proposed as a substitution for probability calculations. Mean Average Precision is used to demonstrate the effectiveness of this approach, with evaluation results demonstrating competitive performance when compared with related algorithms with more onerous requirements for training data.
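Mean Average Precision, the metric mentioned in this abstract, can be computed as in the sketch below. This is the standard definition only; how the paper uses the metric inside its fusion model is not reproduced here. The function names and the input format (a ranked list of document ids plus a set of relevant ids per query) are assumptions.

```python
def average_precision(ranked_docs, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over queries.

    runs: list of (ranked_docs, relevant_set) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(average_precision(docs, rel) for docs, rel in runs) / len(runs)
```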

European Conference on Information Retrieval | 2008

Extending probabilistic data fusion using sliding windows

David Lillis; Fergus Toolan; Rem W. Collier; John Dunnion

Recent developments in the field of data fusion have seen a focus on techniques that use training queries to estimate the probability that various documents are relevant to a given query and use that information to assign scores to those documents on which they are subsequently ranked. This paper introduces SlideFuse, which builds on these techniques, introducing a sliding window in order to compensate for situations where little relevance information is available to aid in the estimation of probabilities. SlideFuse is shown to perform favourably in comparison with CombMNZ, ProbFuse and SegFuse. CombMNZ is the standard baseline technique against which data fusion algorithms are compared whereas ProbFuse and SegFuse represent the state-of-the-art for probabilistic data fusion methods.
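The sliding-window idea can be illustrated with a short sketch. It assumes that the probability of relevance is first estimated per rank position from training queries, and that each position's estimate is then averaged with those of its neighbours, so that positions with little training evidence borrow from nearby ones. The function names, the boolean input format and the plain unweighted average are assumptions for illustration and may differ from the formulation in the paper.

```python
def position_probabilities(training_runs, depth):
    """Estimate P(relevant | rank position) for one input system.

    training_runs: one list of booleans per training query, where run[i]
    says whether the document at rank i was judged relevant.
    """
    probs = []
    for pos in range(depth):
        judged = [run[pos] for run in training_runs if pos < len(run)]
        probs.append(sum(judged) / len(judged) if judged else 0.0)
    return probs


def sliding_window_smooth(probs, half_width):
    """Average each position's probability with its neighbours.

    The window covers up to half_width positions on either side and is
    truncated at the start and end of the list.
    """
    smoothed = []
    for pos in range(len(probs)):
        window = probs[max(0, pos - half_width):pos + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

With only two training queries, the raw estimates might be `[1.0, 0.0, 0.5, 0.5]`. Smoothing with `half_width=1` replaces the isolated zero at the second position with the average of its neighbourhood.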

Lecture Notes in Computer Science | 2005

A self-configuring agent-based document indexing system

Liu Peng; Rem W. Collier; Angel Mur; David Lillis; Fergus Toolan; John Dunnion

This paper describes an extensible and scalable approach to indexing documents that is utilized within the Highly Organised Team of Agents for Information Retrieval (HOTAIR) architecture.

Artificial Intelligence Review | 2006

Probabilistic data fusion on a large document collection

David Lillis; Fergus Toolan; Rem W. Collier; John Dunnion

Data fusion is the process of combining the output of a number of Information Retrieval (IR) algorithms into a single result set, to achieve greater retrieval performance. ProbFuse is a data fusion algorithm that uses the history of the underlying IR algorithms to estimate the probability that subsequent result sets include relevant documents in particular positions. It has been shown to out-perform CombMNZ, the standard data fusion algorithm against which to compare performance, in a number of previous experiments. This paper builds upon this previous work and applies probFuse to the much larger Web Track document collection from the 2004 Text REtrieval Conference. The performance of probFuse is compared against that of CombMNZ using a number of evaluation measures and is shown to achieve substantial performance improvements.
