Graham J. Williams

Commonwealth Scientific and Industrial Research Organisation

Publication

Featured research published by Graham J. Williams.

Data Mining and Knowledge Discovery | 2004

On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms

Kenji Yamanishi; Jun'ichi Takeuchi; Graham J. Williams; Peter Milne

Outlier detection is a fundamental issue in data mining, specifically in fraud detection, network intrusion detection, network monitoring, etc. SmartSifter is an outlier detection engine addressing this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SmartSifter and empirically demonstrates its effectiveness. SmartSifter detects outliers in an on-line process through the on-line unsupervised learning of a probabilistic model (using a finite mixture model) of the information source. Each time a datum is input, SmartSifter employs an on-line discounting learning algorithm to learn the probabilistic model. A score is given to the datum based on the learned model, with a high score indicating a high possibility of being a statistical outlier. The novel features of SmartSifter are: (1) it is adaptive to non-stationary sources of data; (2) a score has a clear statistical/information-theoretic meaning; (3) it is computationally inexpensive; and (4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, with low computational costs. Further experimental application has identified a number of meaningful rare cases in actual health insurance pathology data from Australia's Health Insurance Commission.
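
The score-then-update loop described in this abstract can be illustrated with a deliberately reduced sketch. It is not SmartSifter itself: a single one-dimensional Gaussian stands in for the finite mixture, the score is the negative log-likelihood of each datum under the model learned so far (an assumed choice of scoring function for illustration), and the class name `DiscountingGaussian` and the discount rate of 0.05 are hypothetical.

```python
import math


class DiscountingGaussian:
    """Illustrative 1-D Gaussian with exponentially discounted estimates.

    Assumption: one Gaussian replaces the finite mixture from the abstract.
    """

    def __init__(self, discount=0.05, mean=0.0, var=1.0):
        self.r = discount                      # weight given to the newest datum
        self.mean = mean
        self.second_moment = var + mean * mean

    @property
    def var(self):
        # Floor the variance so the log-density stays finite.
        return max(self.second_moment - self.mean ** 2, 1e-6)

    def score(self, x):
        # Negative log-likelihood under the current model: high means unusual.
        v = self.var
        return 0.5 * math.log(2 * math.pi * v) + (x - self.mean) ** 2 / (2 * v)

    def update(self, x):
        # Discounting: older data fade geometrically, so the model can
        # follow a non-stationary source.
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.second_moment = (1 - self.r) * self.second_moment + self.r * x * x

    def score_and_update(self, x):
        s = self.score(x)   # score first, using only past data
        self.update(x)      # then absorb the datum
        return s


# Made-up stream with one obvious anomaly at position 5.
stream = [0.1, -0.2, 0.05, 0.3, -0.1, 8.0, 0.0, 0.2]
model = DiscountingGaussian(discount=0.05)
scores = [model.score_and_update(x) for x in stream]
most_suspicious = scores.index(max(scores))
```

Scoring before updating keeps a datum from influencing its own score, and each datum costs a constant amount of work, which fits the on-line setting the abstract describes.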

Data Warehousing and Knowledge Discovery | 2002

Outlier Detection Using Replicator Neural Networks

Simon Hawkins; Hongxing He; Graham J. Williams; Rohan A. Baxter

We consider the problem of finding outliers in large multivariate databases. Outlier detection can be applied during the data cleansing process of data mining to identify problems with the data itself, and to fraud detection where groups of outliers are often of particular interest. We use replicator neural networks (RNNs) to provide a measure of the outlyingness of data records. The performance of the RNNs is assessed using a ranked score measure. The effectiveness of the RNNs for outlier detection is demonstrated on two publicly available databases.

Knowledge Discovery and Data Mining | 2000

On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms

Kenji Yamanishi; Jun'ichi Takeuchi; Graham J. Williams; Peter Milne

International Conference on Data Mining | 2002

A comparative study of RNN for outlier detection in data mining

Graham J. Williams; Rohan A. Baxter; Hongxing He; Simon Hawkins; Lifang Gu

We have proposed replicator neural networks (RNNs) for outlier detection. We compare RNN for outlier detection with three other methods using both publicly available statistical datasets (generally small) and data mining datasets (generally much larger and generally real data). The smaller datasets provide insights into the relative strengths and weaknesses of RNNs. The larger datasets in particular test scalability and practicality of application.

Knowledge Discovery and Data Mining | 2005

Mining risk patterns in medical data

Jiuyong Li; Ada Wai-Chee Fu; Hongxing He; Jie Chen; Huidong Jin; Damien McAullay; Graham J. Williams; Ross Sparks; Chris Kelman

In this paper, we discuss a problem of finding risk patterns in medical data. We define risk patterns by a statistical metric, relative risk, which has been widely used in epidemiological research. We characterise the problem of mining risk patterns as an optimal rule discovery problem. We study an anti-monotone property for mining optimal risk pattern sets and present an algorithm to make use of the property in risk pattern discovery. The method has been applied to a real world data set to find patterns associated with an allergic event for ACE inhibitors. The algorithm has generated some useful results for medical researchers.
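
As a reminder of the metric this abstract refers to, relative risk compares the event rate in the cohort matching a pattern with the event rate among everyone else. The sketch below only computes that ratio for one given pattern; it does not implement the paper's optimal risk pattern mining or its anti-monotone pruning. The function names, the record layout, and the toy data are all hypothetical.

```python
import math


def matches(attributes, pattern):
    """True if the record has every attribute value the pattern requires."""
    return all(attributes.get(key) == value for key, value in pattern.items())


def relative_risk(records, pattern):
    """Relative risk of the event for records matching `pattern`.

    `records` is a list of (attributes, event) pairs, where `attributes`
    is a dict and `event` is a bool. This layout is an assumption made
    for the example, not the paper's data format.
    """
    in_cohort = [event for attrs, event in records if matches(attrs, pattern)]
    out_cohort = [event for attrs, event in records if not matches(attrs, pattern)]
    if not in_cohort or not out_cohort:
        raise ValueError("both the cohort and its complement must be non-empty")
    risk_in = sum(in_cohort) / len(in_cohort)
    risk_out = sum(out_cohort) / len(out_cohort)
    if risk_out == 0:
        # No events outside the cohort: the ratio is unbounded or undefined.
        return math.inf if risk_in > 0 else math.nan
    return risk_in / risk_out


# Made-up records: 2 of 4 patients aged 70+ have the event, 1 of 4 otherwise.
records = [
    ({"age": "70+", "sex": "F"}, True),
    ({"age": "70+", "sex": "M"}, True),
    ({"age": "70+", "sex": "F"}, False),
    ({"age": "70+", "sex": "M"}, False),
    ({"age": "<70", "sex": "F"}, True),
    ({"age": "<70", "sex": "M"}, False),
    ({"age": "<70", "sex": "F"}, False),
    ({"age": "<70", "sex": "M"}, False),
]
rr = relative_risk(records, {"age": "70+"})  # 0.5 / 0.25 = 2.0
```

A value above 1 indicates that the event is more frequent inside the cohort than outside it; here the toy cohort carries twice the risk.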

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2004

Temporal Sequence Associations for Rare Events

Jie Chen; Hongxing He; Graham J. Williams; Huidong Jin

In many real world applications, systematic analysis of rare events, such as credit card frauds and adverse drug reactions, is very important. Their low occurrence rate in large databases often makes it difficult to identify the risk factors from straightforward application of associations and sequential pattern discovery. In this paper we introduce a heuristic to guide the search for interesting patterns associated with rare events from large temporal event sequences. Our approach combines association and sequential pattern discovery with a measure of risk borrowed from epidemiology to assess the interestingness of the discovered patterns. In the experiments, we successfully identify a known drug and several new drug combinations with high risk of adverse reactions. The approach is also applicable to other applications where rare events are of primary interest.

Australian Joint Conference on Artificial Intelligence | 1997

Mining the knowledge mine

Graham J. Williams; Zhexue Huang

As databases grow in size and complexity, the task of adding value to the wealth of data becomes difficult. Data mining has emerged as the technology to add value to enormous databases by finding new and important snippets (or nuggets) of knowledge. With large training sets, however, extremely large collections of nuggets are being extracted, leading to much “fool's gold” amongst which to fossick for the real gold. Attention is now being directed towards the problem of how to better focus on the most precious nuggets. This paper presents the hot spots methodology, adopting a multi-strategy and interactive approach to help focus on the important nuggets. The methodology first performs data mining and then explores the resulting models to find the important nuggets contained therein. This approach is demonstrated in insurance and fraud applications.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2001

Feature Selection for Temporal Health Records

Rohan A. Baxter; Graham J. Williams; Hongxing He

In this paper we consider three alternative feature vector representations of patient health records. The longitudinal (temporal), irregular character of patient episode history, an integral part of a health record, provides some challenges in applying data mining techniques. The present application involves episode history of monitoring services for elderly patients with diabetes. The application task was to examine patterns of monitoring services for patients. This was approached by clustering patients into groups receiving similar patterns of care and visualising the features devised to highlight interesting patterns of care.

Australasian Joint Conference on Artificial Intelligence | 2003

Association Rule Discovery with Unbalanced Class Distributions

Lifang Gu; Jiuyong Li; Hongxing He; Graham J. Williams; Simon Hawkins; Chris Kelman

There are many methods for finding association rules in very large data. However, it is well known that most general association rule discovery methods find too many rules, many of which are uninteresting. Furthermore, the performance of many such algorithms deteriorates when the minimum support is low. They fail to find many interesting rules even when support is low, particularly in the case of significantly unbalanced classes. In this paper we present an algorithm which finds association rules based on a set of new interestingness criteria. The algorithm is applied to a real-world health data set and successfully identifies groups of patients with high risk of adverse reaction to certain drugs. A statistically guided method of selecting appropriate features has also been developed. Initial results have shown that the proposed algorithm can find interesting patterns from data sets with unbalanced class distributions without performance loss.

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems | 2005

Representing association classification rules mined from health data

Jie Chen; Hongxing He; Jiuyong Li; Huidong Jin; Damien McAullay; Graham J. Williams; Ross Sparks; Chris Kelman

An association classification algorithm has been developed to explore adverse drug reactions in a large medical transaction dataset with unbalanced classes. Rules discovered can be used to alert medical practitioners to potential adverse effects when prescribing drugs to certain categories of patients. We assess the rules using survival charts and propose two kinds of probability trees to present them. Both of them represent the risk of a given adverse drug reaction for certain categories of patients in terms of risk ratios, which are familiar to medical practitioners. The first approach shows risk ratios when all rule conditions apply. The second presents the risk associated with a single risk factor, with other parts of the rule identifying the cohort of the patient subpopulation. Thus, the probability trees can present clearly the risk of specific adverse drug reactions to prescribers.