Publications

Featured research published by Brian Mac Namee.


The Florida AI Research Society | 2010

Handling Concept Drift in a Text Data Stream Constrained by High Labelling Cost

Patrick Lindstrom; Sarah Jane Delany; Brian Mac Namee

In many real-world classification problems the concept being modelled is not static but rather changes over time - a situation known as concept drift. Most techniques for handling concept drift rely on the true classifications of test instances being available shortly after classification so that classifiers can be retrained to handle the drift. However, in applications where labelling instances with their true class has a high cost this is not reasonable. In this paper we present an approach for keeping a classifier up-to-date in a concept drift domain which is constrained by a high cost of labelling. We use an active learning type approach to select those examples for labelling that are most useful in handling changes in concept. We show how this approach can adequately handle concept drift in a text filtering scenario requiring just 15% of the documents to be manually categorised and labelled.
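The abstract does not give the selection rule in detail; a minimal uncertainty-sampling sketch in the same spirit (assuming the classifier exposes class probabilities, and a hypothetical `budget` parameter for the labelling quota) might look like:

```python
import numpy as np

def select_for_labelling(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` instances the classifier is least sure about.

    probs: (n_samples, n_classes) predicted class probabilities.
    Returns indices of the least-confident instances, which are the
    most useful ones to have manually labelled when concept may drift.
    """
    confidence = probs.max(axis=1)          # top-class probability per instance
    return np.argsort(confidence)[:budget]  # lowest confidence first

# Example: three documents scored by a binary text classifier.
probs = np.array([[0.95, 0.05],   # confident -> skip
                  [0.55, 0.45],   # uncertain -> worth labelling
                  [0.80, 0.20]])
picked = select_for_labelling(probs, budget=1)
```

Only the selected documents are sent for manual labelling, which is how the labelling budget (15% of documents in the paper's text-filtering experiments) is kept low.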


Evolving Systems | 2013

Drift detection using uncertainty distribution divergence

Patrick Lindstrom; Brian Mac Namee; Sarah Jane Delany

Data generated from naturally occurring processes tends to be non-stationary; examples include seasonal and gradual changes in climate data and sudden changes in financial data. In machine learning the degradation in classifier performance due to such changes in the data is known as concept drift and there are many approaches to detecting and handling it. Most approaches to detecting concept drift, however, make the assumption that true classes for test examples will be available at no cost shortly after classification and base the detection of concept drift on measures relying on these labels. The high labelling cost in many domains provides a strong motivation to reduce the number of labelled instances required to detect and handle concept drift. Triggered detection approaches that do not require labelled instances to detect concept drift show great promise for achieving this. In this paper we present Confidence Distribution Batch Detection, an approach that provides a signal correlated to changes in concept without using labelled data. This signal combined with a trigger and a rebuild policy can maintain classifier accuracy which, in most cases, matches the accuracy achieved using classification error based detection techniques but using only a limited amount of labelled data.
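The core idea, comparing the distribution of classifier confidence scores on a reference batch against a new batch, can be sketched as follows. The divergence measure and threshold here (binned KL divergence, a hypothetical cutoff of 0.5) are illustrative assumptions, not the exact CDBD formulation:

```python
import numpy as np

def confidence_divergence(ref_conf, new_conf, bins=10, eps=1e-9):
    """KL divergence between binned classifier-confidence histograms.

    ref_conf / new_conf: 1-D arrays of confidence scores (e.g. top-class
    probabilities) for a reference batch and a new batch. A large
    divergence suggests the concept may have drifted -- no true labels
    are needed to compute it.
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(ref_conf, bins=edges)
    q, _ = np.histogram(new_conf, bins=edges)
    p = p / p.sum() + eps   # eps avoids log(0) in empty bins
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

def drift_trigger(ref_conf, new_conf, threshold=0.5):
    """Fire the rebuild policy (label fresh data, retrain) when the
    divergence signal exceeds a threshold."""
    return confidence_divergence(ref_conf, new_conf) > threshold

rng = np.random.default_rng(0)
stable = rng.uniform(0.7, 1.0, 500)   # confident, like the reference batch
drifted = rng.uniform(0.0, 0.6, 500)  # confidence collapses after drift
```

Labelled data is then only needed for the occasional rebuild, not for the detection signal itself.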


AICS'09: Proceedings of the 20th Irish Conference on Artificial Intelligence and Cognitive Science | 2009

Learning without default: a study of one-class classification and the low-default portfolio problem

Kenneth Kennedy; Brian Mac Namee; Sarah Jane Delany

This paper asks at what level of class imbalance one-class classifiers outperform two-class classifiers in credit scoring, where severe class imbalance, referred to as the low-default portfolio problem, is a serious issue. The question is answered by comparing the performance of a variety of recognised one-class and two-class classifiers on a selection of credit scoring datasets as the class imbalance is manipulated. We also include random oversampling, as this is one of the most common approaches to addressing class imbalance. Based on our study we conclude that the performance of the two-class classifiers deteriorates in proportion to the level of class imbalance. The two-class classifiers outperform one-class classifiers at imbalance levels down as far as 15% (i.e. a minority-to-majority ratio of 15:85). The one-class classifiers, whose performance remains unvaried throughout, are preferred when the minority class constitutes approximately 2% or less of the data. Between an imbalance of 2% and 15% the results are less conclusive. These results show that one-class classifiers could potentially be used as a solution to the low-default portfolio problem experienced in the credit scoring domain.
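The contrast between the two families can be sketched with scikit-learn. The data, the ~2% imbalance ratio, and the `nu` setting below are illustrative assumptions; the paper's actual classifiers and datasets are not reproduced here:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Hypothetical "credit scoring" features: many non-defaulters (majority),
# very few defaulters -- roughly the <=2% regime where the paper finds
# one-class classifiers preferable.
good = rng.normal(loc=0.0, scale=1.0, size=(980, 2))
bad = rng.normal(loc=3.0, scale=1.0, size=(20, 2))

# A one-class classifier models the majority class only; defaulters are
# whatever falls outside the learned region. No minority examples are
# needed for training, which is the appeal under extreme imbalance.
occ = OneClassSVM(nu=0.05, gamma="scale").fit(good)
bad_flagged = (occ.predict(bad) == -1).mean()    # defaulters caught
good_flagged = (occ.predict(good) == -1).mean()  # false alarms on majority
```

A two-class classifier trained on the same data would have only 20 minority examples to learn from, which is the failure mode the paper measures as imbalance grows.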


Expert Systems With Applications | 2014

Dynamic estimation of worker reliability in crowdsourcing for regression tasks: Making it work

Alexey Tarasov; Sarah Jane Delany; Brian Mac Namee

One of the biggest challenges in crowdsourcing is detecting noisy and incompetent workers. A possible way of handling this problem is to dynamically estimate the reliability of workers as they do work and accept only those workers who are deemed to be reliable to date. Although many approaches to dynamic estimation of rater reliability exist, they are often only appropriate for very specific categories of tasks, for example, only for binary classification. They also can make unrealistic assumptions such as requiring access to a large number of gold standard answers or relying on the constant availability of any rater. In this paper, we propose a novel approach to the dynamic estimation of rater reliability in regression (DER³) using multi-armed bandits. This approach is specifically suited for real-life crowdsourcing scenarios, where the task at hand is labelling or rating corpora to be used in supervised machine learning, and the annotations are continuous ratings, although it can be easily generalised to multi-class or binary classification tasks. We demonstrate that DER³ provides high-accuracy results and at the same time keeps the cost of the rating process low. Although our main motivating example is the recognition of emotion in speech, our approach shows similar results in other application areas.
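Treating each rater as a bandit arm can be sketched with a simple epsilon-greedy policy. This is a hypothetical illustration, not DER³'s actual algorithm: the reward design (negative absolute error against a reference answer) and the epsilon-greedy strategy are assumptions made for the sketch:

```python
import random

class RaterBandit:
    """Epsilon-greedy multi-armed bandit over crowd raters: each rater is
    an arm, and reward is high when the rater's rating is close to a
    reference answer. An illustrative sketch only."""

    def __init__(self, n_raters, epsilon=0.1):
        self.epsilon = epsilon
        self.totals = [0.0] * n_raters   # summed rewards per rater
        self.counts = [0] * n_raters     # times each rater was asked

    def pick(self):
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried:
            return untried[0]                          # ask everyone once
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        means = [t / c for t, c in zip(self.totals, self.counts)]
        return means.index(max(means))                 # exploit best so far

    def update(self, rater, rating, reference):
        self.totals[rater] += -abs(rating - reference)  # closer = better
        self.counts[rater] += 1

# Simulation: rater 1 is far more reliable; the bandit should learn to
# route most rating requests to that rater, keeping labelling cost low.
random.seed(0)
noise = [2.0, 0.1, 1.0]
bandit = RaterBandit(n_raters=3)
for _ in range(300):
    truth = random.uniform(0, 10)       # e.g. an emotion-intensity score
    r = bandit.pick()
    bandit.update(r, truth + random.gauss(0, noise[r]), truth)
best = max(range(3), key=lambda i: bandit.counts[i])
```

The exploration term keeps re-checking raters whose reliability may change over time, which matches the "dynamic estimation" framing of the paper.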


The Florida AI Research Society | 2010

Off to a Good Start: Using Clustering to Select the Initial Training Set in Active Learning

Rong Hu; Brian Mac Namee; Sarah Jane Delany

Active learning (AL) is used in textual classification to alleviate the cost of labelling documents for training. An important issue in AL is the selection of a representative sample of documents to label for the initial training set that seeds the process, and clustering techniques have been successfully used in this regard. However, the clustering techniques used are non-deterministic, which causes inconsistent behaviour in the AL process. In this paper we first illustrate the problems associated with using non-deterministic clustering for initial training set selection in AL. We then examine the performance of three deterministic clustering techniques for this task and show that performance comparable to the non-deterministic approaches can be achieved without variations in behaviour.
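The general recipe, cluster the unlabelled pool deterministically and seed AL with one representative per cluster, can be sketched as below. Agglomerative clustering is used here because it is deterministic; the paper's three specific techniques are not reproduced:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def initial_training_set(X, k):
    """Deterministic seed selection for active learning: cluster the
    unlabelled pool with (deterministic) agglomerative clustering and
    return the index of the document nearest each cluster mean.
    Rerunning on the same data always yields the same selection."""
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    picks = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        centre = X[members].mean(axis=0)
        dists = np.linalg.norm(X[members] - centre, axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return sorted(picks)

# Toy pool: two tight groups of "documents" in feature space.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
seed = initial_training_set(X, k=2)
```

Because nothing in the pipeline depends on a random initialisation (unlike k-means), the AL run that starts from this seed is fully reproducible.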


Knowledge-Based Systems | 2012

Profiling instances in noise reduction

Sarah Jane Delany; Nicola Segata; Brian Mac Namee

The dependency on the quality of the training data has led to significant work in noise reduction for instance-based learning algorithms. This paper presents an empirical evaluation of current noise reduction techniques, not just from the perspective of their comparative performance, but from the perspective of investigating the types of instances that they focus on for removal. A novel instance profiling technique known as RDCL profiling allows the structure of a training set to be analysed at the instance level, categorising each instance based on a model of its local competence properties. This profiling approach offers the opportunity to investigate the types of instances removed by the noise reduction techniques that are currently in use in instance-based learning. The paper also considers the effect of removing instances with specific profiles from a dataset and shows that a very simple approach of removing instances that are misclassified by the training set and cause other instances in the dataset to be misclassified is an effective noise reduction technique.
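The simple rule reported as effective, dropping instances that are both misclassified by the rest of the training set and cause other instances to be misclassified, can be sketched with a 1-NN leave-one-out pass. RDCL profiling itself is richer than this; the sketch only illustrates that one rule:

```python
import numpy as np

def simple_noise_reduction(X, y):
    """Drop training instances that are (a) misclassified by the rest of
    the training set and (b) cause other instances to be misclassified,
    using 1-NN with leave-one-out. An illustrative sketch."""
    n = len(y)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # leave-one-out
    nn = d.argmin(axis=1)                       # each instance's nearest neighbour
    misclassified = y[nn] != y                  # (a) its neighbour disagrees
    causes = np.zeros(n, dtype=bool)
    for i in range(n):
        if y[nn[i]] != y[i]:                    # i is misclassified...
            causes[nn[i]] = True                # (b) ...blame its neighbour
    keep = ~(misclassified & causes)
    return X[keep], y[keep]

# A point labelled 1 but sitting inside class 0 is removed; the clean
# instances around it are kept.
X = np.array([[-0.05], [0.0], [0.25], [0.10], [5.0], [5.1]])
y = np.array([0, 0, 0, 1, 1, 1])   # index 3 is the mislabelled instance
Xc, yc = simple_noise_reduction(X, y)
```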


International Conference on Case-Based Reasoning | 2010

EGAL: exploration guided active learning for TCBR

Rong Hu; Sarah Jane Delany; Brian Mac Namee

The task of building labelled case bases can be approached using active learning (AL), a process which facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is the development of a selection strategy to choose the most informative examples to manually label. Typical selection strategies use exploitation techniques which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches tend to balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show how its performance is comparable to the more computationally expensive exploitation-based approaches and that it offers the opportunity to be classifier independent.
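A density-and-diversity selection strategy in this spirit can be sketched as below. The exact EGAL scoring and its neighbourhood thresholds differ; the weighting scheme (`alpha`) here is an illustrative assumption:

```python
import numpy as np

def egal_select(X, labelled, budget, alpha=0.5):
    """Exploration-only selection sketch: prefer examples in dense
    regions (close to many other cases) that are also diverse (far
    from anything already labelled). Classifier-independent -- only
    nearest-neighbour distances are used, as in EGAL's design."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    density = 1.0 / (1.0 + d.mean(axis=1))      # close to many cases = dense
    div = d[:, labelled].min(axis=1)            # distance to nearest labelled case
    diversity = div / (div.max() + 1e-12)
    score = alpha * density + (1 - alpha) * diversity
    score[labelled] = -np.inf                   # never re-select labelled cases
    return np.argsort(score)[::-1][:budget].tolist()

# Two clusters of cases; one case in the left cluster is labelled, so the
# most informative next pick lies in the unexplored right cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
picked = egal_select(X, labelled=[0], budget=1)
```

Because no classifier is consulted, selection costs only distance computations, which is the efficiency argument made in the paper.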


Computer Games | 2010

Motion in augmented reality games: an engine for creating plausible physical interactions in augmented reality games

Brian Mac Namee; David Beaney; Qingqing Dong

The next generation of Augmented Reality (AR) games will require real and virtual objects to coexist in motion in immersive game environments. This will require the illusion that real and virtual objects interact physically together in a plausible way. The Motion in Augmented Reality Games (MARG) engine described in this paper has been developed to allow these kinds of game environments. The paper describes the design and implementation of the MARG engine and presents two proof-of-concept AR games that have been developed using it. Evaluations of these games have been performed and are presented to show that the MARG engine takes an important step in developing the next generation of motion-rich AR games.


International Conference on Data Mining | 2011

Drift Detection Using Uncertainty Distribution Divergence

Patrick Lindstrom; Brian Mac Namee; Sarah Jane Delany

Concept drift is believed to be prevalent in most data gathered from naturally occurring processes and thus warrants research by the machine learning community. There is a myriad of approaches to handling concept drift, with varying degrees of success. However, most approaches make the key assumption that labelled data will be available at no labelling cost shortly after classification, an assumption which is often violated. The high labelling cost in many domains provides a strong motivation to reduce the number of labelled instances required to handle concept drift. Explicit detection approaches that do not require labelled instances to detect concept drift show great promise for achieving this. Our approach, Confidence Distribution Batch Detection (CDBD), provides a signal correlated to changes in concept without using labelled data. We also show how this signal combined with a trigger and a rebuild policy can maintain classifier accuracy while using a limited amount of labelled data.


International Conference on Case-Based Reasoning | 2010

CBTV: visualising case bases for similarity measure design and selection

Brian Mac Namee; Sarah Jane Delany

In CBR the design and selection of similarity measures is paramount. Selection can benefit from the use of exploratory visualisation-based techniques in parallel with techniques such as cross-validation accuracy comparison. In this paper we present the Case Base Topology Viewer (CBTV) which allows the application of different similarity measures to a case base to be visualised so that system designers can explore the case base and the associated decision boundary space. We show, using a range of datasets and similarity measure types, how the idiosyncrasies of particular similarity measures can be illustrated and compared in CBTV allowing CBR system designers to make more informed choices.
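The core step, turning a chosen similarity measure into a 2-D layout of the case base that a designer can inspect, can be approximated with multidimensional scaling. CBTV uses its own layout algorithm; MDS is a stand-in here, and the Euclidean-based similarity is a hypothetical example measure:

```python
import numpy as np
from sklearn.manifold import MDS

def project_case_base(X, similarity):
    """Lay out a case base in 2-D under a chosen similarity measure so
    its neighbourhood structure, and hence the measure's idiosyncrasies,
    can be inspected visually. Swapping `similarity` changes the layout,
    which is what a designer compares."""
    n = len(X)
    sim = np.array([[similarity(X[i], X[j]) for j in range(n)]
                    for i in range(n)])
    dist = 1.0 - sim / sim.max()   # convert similarity to dissimilarity
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(dist)

# A hypothetical similarity measure over a toy case base of two groups.
euclid_sim = lambda a, b: 1.0 / (1.0 + np.linalg.norm(a - b))
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1]])
coords = project_case_base(X, euclid_sim)
```

Plotting `coords` coloured by class would show whether the measure keeps same-class cases together, the kind of visual check CBTV supports alongside cross-validation accuracy.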

Collaboration

Top co-authors of Brian Mac Namee, all at the Dublin Institute of Technology:

- John D. Kelleher
- Sarah Jane Delany
- Rong Hu
- Mark Dunne
- Patrick Lindstrom
- Colm Sloan
- Niels Schütte
- Alexey Tarasov
- Kenneth Kennedy
- David Beaney