Colin Bellinger | Researchain

Publication

Featured research published by Colin Bellinger.

international conference on machine learning and applications | 2012

One-Class versus Binary Classification: Which and When?

Colin Bellinger; Shiven Sharma; Nathalie Japkowicz

Binary classifiers have typically been the norm for building classification models in the Machine Learning community. However, an alternative to binary classification is one-class classification, which aims to build models using only a single class of data. This is particularly useful when there is an over-abundance of data of a particular class. In such imbalanced cases, binary classifiers may not perform very well, and one-class classifiers then become the viable option. In this paper, we investigate the performance of binary and one-class classifiers as the level of imbalance, and thus the uncertainty about the second class, increases. Our objective is to gain insight into which classification paradigm becomes more suitable as imbalance and uncertainty increase. To this end, we conduct experiments on various datasets, both artificial and from the UCI repository, and monitor the performance of the binary and one-class classifiers as the size of the second class gradually decreases, thus increasing the level of imbalance. The results show that as the level of imbalance increases, the performance of binary classifiers decreases, whereas one-class classifiers stay relatively stable.

computational intelligence and security | 2012

Anomaly detection in gamma ray spectra: A machine learning perspective

Shiven Sharma; Colin Bellinger; Nathalie Japkowicz; Rodney Berg; R. Kurt Ungar

With Canadian security and the safety of the general public in mind, physicists at Health Canada (HC) have begun to develop techniques to identify persons concealing radioactive material that may represent a threat to attendees at public gatherings, such as political proceedings and sporting events. To this end, Health Canada has initiated field trials that include the deployment of gamma-ray spectrometers. In particular, a series of these detectors, which take measurements every minute and produce a 1,024-channel gamma-ray spectrum, were deployed during the Vancouver 2010 Olympics. Simple computerized statistics and human expertise were used as the primary line of defence. More specifically, if a measured spectrum deviated significantly from the background, an internal alarm was sounded and an HC physicist undertook further analysis into the nature of the alarming spectrum. This strategy, however, led to a significant number of costly and time-consuming false positives. This research applies sophisticated machine learning algorithms to reduce the number of false positives to an acceptable level, the results of which are detailed in this paper. In addition, we emphasize the primary findings of our work and highlight avenues available to further improve upon our current results.

canadian conference on artificial intelligence | 2012

Clustering based one-class classification for compliance verification of the comprehensive nuclear-test-ban treaty

Shiven Sharma; Colin Bellinger; Nathalie Japkowicz

Monitoring the levels of radioxenon isotopes in the atmosphere has been proposed as a means of verifying the Comprehensive Nuclear-Test-Ban Treaty (CTBT). This translates into a classification problem, whereby the measured concentrations either belong to an explosion class or a background class. Instances drawn from the explosion class are extremely rare, if not non-existent. Therefore, the resulting dataset is extremely imbalanced, and inherently suited for one-class classification. Further exacerbating the problem is the fact that the background distribution can be extremely complex, and thus, modelling it using one-class learning is difficult. In order to improve upon the previous classification results, we investigate the augmentation of one-class learning methods with clustering. The purpose of clustering is to convert a complex distribution into simpler distributions, the clusters, over which more effective models can be built. The resulting model, built from one-class learners trained over the clusters, performs more effectively than a model that is built over the original distribution. This thesis is empirically tested on three different data domains; in particular, a number of artificial datasets, datasets from the UCI repository, and data modelled after the extremely challenging CTBT. The results lend credence to the claim that performance improves when clustering is used with one-class classification on complex distributions.
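
A minimal sketch of the idea described above, not the authors' implementation. The abstract names neither the clustering algorithm nor the one-class learners, so k-means and a bounding-ball model (centre plus maximum training distance) are stand-ins assumed here, and the toy data and all function names are hypothetical. It shows how a single model fitted over a two-cluster background accepts a point in the empty region between the clusters, while per-cluster models reject it.

```python
import math

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, centers, iterations=10):
    """Plain k-means with caller-supplied initial centres (assumed stand-in)."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old centre if a cluster happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def fit_ball(points):
    """Stand-in one-class learner: centre and radius covering the training data."""
    center = mean(points)
    radius = max(math.dist(p, center) for p in points)
    return center, radius

def is_background(x, balls):
    """A point is accepted as background if any ball contains it."""
    return any(math.dist(x, center) <= radius for center, radius in balls)

# Hypothetical background data made of two well-separated clusters.
background = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
probe = (5.5, 5.5)  # lies between the clusters, far from any training point

global_model = [fit_ball(background)]
clusters = kmeans(background, [background[0], background[-1]])
clustered_model = [fit_ball(g) for g in clusters]

print(is_background(probe, global_model))     # single model over everything
print(is_background(probe, clustered_model))  # one model per cluster
```

On this toy data the single ball spans both clusters and the gap between them, whereas the per-cluster balls cover only the regions where background was actually observed.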

international conference on machine learning and applications | 2015

Synthetic Oversampling for Advanced Radioactive Threat Detection

Colin Bellinger; Nathalie Japkowicz; Chris Drummond

Gamma-ray spectral classification requires the automatic identification of a large background class and a small minority class composed of instances that may pose a risk to humans and the environment. Accurate classification of such instances is required in a variety of domains, spanning event and port security to national monitoring for failures at industrial nuclear facilities. This work proposes a novel form of synthetic oversampling based on an artificial neural network architecture and empirically demonstrates that it is superior to the state-of-the-art in synthetic oversampling on the target domain. In particular, we utilize gamma-ray spectral data collected for security purposes at the Vancouver 2010 Winter Olympics and on a node of Health Canada's national monitoring networks.

international conference on machine learning and applications | 2015

Active Learning for One-Class Classification

Vincent Barnabé-Lortie; Colin Bellinger; Nathalie Japkowicz

Active learning is a common solution for reducing labeling costs and maximizing the impact of human labeling efforts in binary and multi-class classification settings. However, when we are faced with extreme levels of class imbalance, a situation in which it is not safe to assume that we have a representative sample of the minority class, it has been shown effective to replace the binary classifiers with one-class classifiers. In such a setting, traditional active learning methods, and many previously proposed in the literature for one-class classifiers, prove to be inappropriate, as they rely on assumptions about the data that no longer stand. In this paper, we propose a novel approach to active learning designed for one-class classification. The proposed method does not rely on many of the inappropriate assumptions of its predecessors and leads to more robust classification performance. The gist of this method consists of labeling, in priority, the instances considered to fit the learned class the least by previous iterations of a one-class classification model. We provide empirical evidence for the merits of the proposed method compared to the available alternatives, and discuss how the method may have an impact in an applied setting.
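
The query rule described above (label first the instances that fit the learned class the least) can be sketched as follows. This is an illustration under assumptions, not the paper's method: the one-class model is replaced by a simple distance-to-centroid score, and the function names and toy data are hypothetical.

```python
import math

def centroid(points):
    """Mean of the labelled target-class instances (stand-in one-class model)."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def select_query(unlabeled, labeled_target):
    """Return the index of the unlabeled instance that fits the learned class
    the least, here the one farthest from the target-class centroid."""
    center = centroid(labeled_target)
    return max(range(len(unlabeled)), key=lambda i: math.dist(unlabeled[i], center))

# Hypothetical data: a few labelled target instances and an unlabeled pool.
labeled_target = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1)]
unlabeled = [(0.1, 0.1), (5.0, 5.0), (1.0, 1.0)]

query_index = select_query(unlabeled, labeled_target)
print(unlabeled[query_index])  # the instance to hand to the human annotator
```

In a full active-learning loop the selected instance would be labelled, moved out of the pool, and the model refitted before the next query; that loop is omitted here.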

computational intelligence and security | 2011

Motivating the inclusion of meteorological indicators in the CTBT feature-space

Colin Bellinger; Nathalie Japkowicz

Verification of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), as a Pattern Recognition (PR) problem, has been proposed based on four radioxenon features. It has been noted, however, that in many cases this limited feature set is insufficient to distinguish radioxenon levels effected by an explosion from those that are solely products of industrial activities. As a means of improving the detectability of low-yield clandestine nuclear explosions, this paper motivates the inclusion of meteorological indicators in the CTBT feature-space, and promotes further research into which meteorological indicators are most informative and how they may be acquired. In doing so, we present classification results from four simulated scenarios. These results demonstrate that the inclusion of a simple wind direction feature can significantly increase the prospect of classifying challenging detonation events, and suggest the predictive power of meteorological features in general.

BMC Public Health | 2017

A systematic review of data mining and machine learning for air pollution epidemiology

Colin Bellinger; Mohomed Shazan Mohomed Jabbar; Osmar R. Zaïane; Alvaro Osornio-Vargas

Background: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.

Methods: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.

Results: Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology.

Conclusions: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better-quality data. This suggests that many more fruitful applications can be expected in the future.

european conference on machine learning | 2016

Beyond the Boundaries of SMOTE

Colin Bellinger; Chris Drummond; Nathalie Japkowicz

Problems of class imbalance appear in diverse domains, ranging from gene function annotation to spectra and medical classification. On such problems, the classifier becomes biased in favour of the majority class. This leads to inaccuracy on the important minority classes, such as specific diseases and gene functions. Synthetic oversampling mitigates this by balancing the training set, whilst avoiding the pitfalls of random under- and oversampling. The existing methods are primarily based on the SMOTE algorithm, which employs a bias of randomly generating points between nearest neighbours. The relationship between the generative bias and the latent distribution has a significant impact on the performance of the induced classifier. Our research into gamma-ray spectra classification has shown that the generative bias applied by SMOTE is inappropriate for domains that conform to the manifold property, such as spectra, text, image and climate change classification. To this end, we propose a framework for manifold-based synthetic oversampling, and demonstrate its superiority in terms of robustness to the manifold with respect to the AUC on three spectra classification tasks and 16 UCI datasets.
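
For reference, the generative bias attributed to SMOTE above (random points between nearest neighbours) can be sketched in a few lines. This illustrates SMOTE-style interpolation only, not the manifold-based framework the paper proposes; the function name, parameters and toy data are hypothetical.

```python
import random

def smote_sample(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic points, each placed at a random position on
    the line segment between a minority point and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
print(new_points)
```

Every synthetic point lies on a straight segment between two existing minority points. When the minority class actually lies on a curved manifold, such segments can cut through regions the class never occupies, which is the mismatch the abstract refers to.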

international conference on machine learning and applications | 2015

Multi-label Classification of Anemia Patients

Colin Bellinger; Ali Amid; Nathalie Japkowicz; Herna Viktor

This work examines the application of machine learning to an important area of medicine which aims to diagnose paediatric patients with β-thalassemia minor, iron deficiency anemia or the co-occurrence of these ailments. Iron deficiency anemia is a major cause of microcytic anemia, and its diagnosis is considered an important task in global health. Whilst existing methods, based on linear equations, are proficient at distinguishing between the two classes of anemia, they fail to identify the co-occurrence of these conditions. Machine learning algorithms, however, can induce non-linear decision boundaries that enable accurate classification within complex domains. Through a multi-label classification technique, known as problem transformation, we convert the learning task to one that is appropriate for machine learning and examine the effectiveness of machine learning algorithms on this domain. Our results show that machine learning classifiers produce good overall accuracy and, unlike the existing methods, are able to identify instances of the co-occurrence class.
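
The abstract does not say which problem transformation was used. As one common example, the label powerset transformation maps each distinct label set to a single class, so that the co-occurrence becomes a class of its own. The sketch below assumes that transformation; the label names and the function are hypothetical.

```python
def label_powerset(label_sets):
    """Map each set of labels to one class name so that an ordinary
    single-label classifier can be trained on the result."""
    return ["+".join(sorted(labels)) if labels else "none" for labels in label_sets]

# Hypothetical multi-label annotations for four patients.
patients = [
    {"iron_deficiency"},
    {"thalassemia_minor"},
    {"iron_deficiency", "thalassemia_minor"},  # co-occurrence
    set(),                                     # neither condition
]

classes = label_powerset(patients)
print(classes)
```

With the co-occurrence encoded as its own class, any multi-class learner can in principle predict it directly.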

computational intelligence and security | 2014

Smoothing gamma ray spectra to improve outlier detection

Vincent Barnabé-Lortie; Colin Bellinger; Nathalie Japkowicz

Rapid detection of radioisotopes in gamma-ray data can, in some situations, be an important security concern. The task of designing an automated system for this purpose is complex due to, amongst other factors, the noisy nature of the data. The method described herein consists of preprocessing the data by applying a smoothing method tailored to gamma-ray spectra, with the aim of decreasing their variance. Given that the number of counts at a given energy level in a spectrum should follow a Poisson distribution, smoothing may allow us to estimate the true photon arrival rate. Our experiments suggest that the added data preprocessing step can have a large impact on the performance of anomaly detection algorithms on this particular domain.
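
To illustrate why smoothing can bring Poisson-distributed counts closer to the underlying arrival rate, here is a small sketch. The abstract does not describe the tailored smoothing method, so a plain moving average is used as a stand-in, and the flat true rate, channel count, window size and function names are all assumptions made for the example.

```python
import math
import random

def poisson(rng, rate):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

def moving_average(counts, half_width=2):
    """Average each channel with its neighbours; windows shrink at the edges."""
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half_width)
        hi = min(len(counts), i + half_width + 1)
        window = counts[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

def mean_squared_error(estimates, true_rate):
    return sum((e - true_rate) ** 2 for e in estimates) / len(estimates)

rng = random.Random(0)
true_rate = 20.0  # assumed constant arrival rate across all channels
raw = [poisson(rng, true_rate) for _ in range(200)]
smoothed = moving_average(raw)

raw_error = mean_squared_error(raw, true_rate)
smoothed_error = mean_squared_error(smoothed, true_rate)
print(raw_error, smoothed_error)
```

With a flat rate, averaging neighbouring channels reduces the variance of each estimate without adding bias. On a real spectrum with narrow peaks, a plain moving average would also blur those peaks, which may be one reason a method tailored to gamma-ray spectra is preferable.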
