Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where David W. Patterson is active.

Publication


Featured research published by David W. Patterson.


Information Fusion | 2003

Ensemble feature selection with the simple Bayesian classification

Alexey Tsymbal; Seppo Puuronen; David W. Patterson

A popular method for creating an accurate classifier from a set of training data is to build several classifiers and then combine their predictions. Ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method: the ensemble consists of classifiers constructed on randomly selected feature subsets, that is, in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random subspaces. The EFS_SBC algorithm includes a hill-climbing-based refinement cycle, which tries to improve the accuracy and diversity of the base classifiers built on random feature subsets. We conduct a number of experiments on a collection of 21 real-world and synthetic data sets, comparing the EFS_SBC ensembles with the single simple Bayesian classifier and with boosted simple Bayes. In many cases the EFS_SBC ensembles have higher accuracy than both the single simple Bayesian classifier and the boosted Bayesian ensemble. We find that ensembles produced with a focus on diversity have lower generalization error, and that the importance of diversity in building the ensembles differs across data sets. We propose several methods for the integration of simple Bayesian classifiers in the ensembles. In a number of cases the techniques for dynamic integration of classifiers achieve significantly better classification accuracy than their simple static analogues. We suggest that this is because dynamic integration makes better use of the ensemble coverage than static integration.
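The random subspace construction at the core of this approach can be sketched in a few lines. Below is a minimal illustration using scikit-learn's GaussianNB on a stock dataset; the paper's hill-climbing refinement cycle and dynamic integration strategies are omitted, and all names and parameters are illustrative choices, not the authors' implementation.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Build each base classifier on a randomly chosen feature subspace.
n_members, subspace = 15, X.shape[1] // 2
ensemble = []
for _ in range(n_members):
    feats = rng.choice(X.shape[1], size=subspace, replace=False)
    ensemble.append((feats, GaussianNB().fit(X_tr[:, feats], y_tr)))

# Static integration: average the members' class-probability estimates.
proba = np.mean([clf.predict_proba(X_te[:, f]) for f, clf in ensemble], axis=0)
print(f"ensemble accuracy: {np.mean(proba.argmax(axis=1) == y_te):.3f}")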


Journal of Property Valuation and Investment | 1998

Neural networks: the prediction of residential values

Stanley McGreal; Alastair Adair; Dylan McBurney; David W. Patterson

The potential application of data mining techniques to the extraction of information from property data sets is discussed. Particular attention is focused on neural networks in the valuation of residential property, with an evaluation of their predictive ability. Model testing reveals a wide variation in the range of outputs, with the best results obtained for stratified market subsets using postal code as a locational delimiter. The paper questions whether predicted outcomes are within the range of valuation acceptability and examines issues relating to potential biasing and the repeatability of results.
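As a rough illustration of the modelling setup (not the authors' network or data), the sketch below fits a small multilayer perceptron to the public California housing data as a stand-in for a residential property data set. Note that the paper's best results came from fitting models to postcode-stratified market subsets rather than one pooled model as here.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale the inputs, then fit a small feed-forward network to predict value.
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 16),
                                   max_iter=500, random_state=0))
model.fit(X_tr, y_tr)
print(f"R^2 on held-out properties: {model.score(X_te, y_te):.3f}")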


Multiple Classifier Systems | 2004

Dynamic Integration of Regression Models

Niall Rooney; David W. Patterson; Sarab S. Anand; Alexey Tsymbal

In this paper we adapt the recently proposed Dynamic Integration ensemble techniques to regression problems and compare their performance to the base models and to the popular ensemble technique of Stacked Regression. We show that the Dynamic Integration techniques are as effective for regression as Stacked Regression when the base models are simple. In addition, we demonstrate an extension to both Stacked Regression and Dynamic Integration that reduces the size of the ensemble set, and assess its effectiveness.
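Stacked Regression, the baseline the paper compares against, can be sketched directly with scikit-learn: out-of-fold predictions from simple base models are combined by a linear meta-learner. The Dynamic Integration variants, which weight or select base models per query point, are not shown; the base models and dataset below are illustrative choices.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Two simple base regressors; a ridge meta-learner combines their
# out-of-fold predictions (scikit-learn handles the folding internally).
stack = StackingRegressor(
    estimators=[("knn", KNeighborsRegressor()),
                ("tree", DecisionTreeRegressor(max_depth=4, random_state=0))],
    final_estimator=Ridge())
print(f"stacked R^2: {cross_val_score(stack, X, y, cv=5).mean():.3f}")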


Computer-Based Medical Systems | 2005

Case-based tissue classification for monitoring leg ulcer healing

Mykola Galushka; Huiru Zheng; David W. Patterson; L. Bradley

The ability to automatically monitor the wound healing process would reduce the workload of professionals, provide standardization, reduce costs, and improve the quality of care for patients. Here we propose an automatic monitoring system for leg ulcers based on case-based reasoning. In this work we focus on the first stage of the monitoring process, tissue classification, and examine a number of different feature extraction techniques based on texture and on Red, Green, and Blue histograms. Results clearly show a case-based approach to be ideal for this type of task.
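The paper does not spell out its texture descriptors, but one standard choice, shown here purely as an assumed example, is grey-level co-occurrence (GLCM) statistics, which would give each tissue patch a compact feature vector to store in the case base.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Random noise stands in for a grey-level wound-image region.
patch = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)

# Co-occurrence of grey levels at distance 1, horizontally and vertically.
glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

# Summarise the matrix as a small texture feature vector for the case base.
features = [graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")]
print(features)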


International Conference on Smart Homes and Health Telematics | 2006

Temporal data mining for smart homes

Mykola Galushka; David W. Patterson; Niall Rooney

Temporal data mining is a relatively new area of research in computer science. It provides a large variety of methods and techniques for handling and analyzing the temporal data generated by smart-home environments. Temporal data mining generally fits a two-level architecture: a transformation technique reduces data dimensionality at the first level, and indexing techniques provide efficient access to the data at the second level. This infrastructure provides the basis for high-level data mining operations such as clustering, classification, rule discovery and prediction, which in turn can form the basis of smart-home applications capable of addressing a range of situations occurring within this environment. This paper outlines the main temporal data mining techniques available and gives examples of where they can be applied within a smart-home environment.
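The two-level architecture can be made concrete with a small sketch: level one reduces each temporal stream's dimensionality, here with Piecewise Aggregate Approximation (one common transform among the several the paper covers), and level two indexes the reduced vectors for fast similarity lookup. The data and parameters are invented stand-ins.

import numpy as np
from sklearn.neighbors import BallTree

def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length window."""
    return series.reshape(segments, -1).mean(axis=1)

rng = np.random.default_rng(0)
streams = rng.normal(size=(500, 240))              # e.g. daily sensor readings

reduced = np.array([paa(s, 12) for s in streams])  # level 1: 240 -> 12 dims
index = BallTree(reduced)                          # level 2: spatial index

dist, idx = index.query(reduced[:1], k=5)          # find similar past days
print(idx)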


International Conference of the IEEE Engineering in Medicine and Biology Society | 2004

New protocol for leg ulcer tissue classification from colour images

Huiru Zheng; L. Bradley; David W. Patterson; Mykola Galushka; J. Winder

Measurement of wound healing status is very important for monitoring progress in individual patients. Tissue classification is a vital step in the development of an automatic measurement system for wound healing assessment. We present a new tissue classification protocol using the RGB (Red, Green and Blue) histogram distributions of pixel values from wound color images. These three histogram distributions (the extracted features) were used as three two-dimensional (2D) input signals for classification. The protocol was evaluated using a KNN classifier, and results show that it provides a competent, practical method for the classification of wound tissues.
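A simplified version of the classification step might look as follows. For brevity, this sketch concatenates the three channel histograms into one feature vector rather than treating them as three separate 2D signals as the protocol does; the images and the three-way tissue label set are random, assumed stand-ins.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def rgb_histograms(image, bins=16):
    """Concatenate one normalised histogram per colour channel."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(hists) / image[..., 0].size

rng = np.random.default_rng(0)
images = rng.integers(0, 256, (120, 32, 32, 3))  # stand-ins for wound patches
labels = rng.integers(0, 3, 120)                 # assumed 3-class tissue labels

X = np.array([rgb_histograms(img) for img in images])
print(cross_val_score(KNeighborsClassifier(n_neighbors=5), X, labels, cv=5))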


European Conference on Information Retrieval | 2004

Contextual Document Clustering

Vladimir Dobrynin; David W. Patterson; Niall Rooney

In this paper we present a novel algorithm for document clustering. The approach is based on distributional clustering, where subject-related words that have a narrow context are identified to form meta-tags for that subject. These contextual words form the basis for creating thematic clusters of documents. In a similar fashion to other research on document clustering, we analyze the quality of this approach with respect to document categorization problems and show it to outperform the information-theoretic method of sequential information bottleneck.
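A loose sketch of the distributional idea follows: estimate each word's context distribution p(w | z) from the documents containing z, keep the words whose contexts have the lowest entropy (the "narrow" contexts), and assign each document to the closest such context by KL divergence. The toy corpus, smoothing, and cutoff are invented; the paper's probability estimates and thresholds differ.

import numpy as np
from scipy.special import rel_entr
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "a cat and a dog", "stocks fell sharply",
        "the market and stocks rallied", "dogs chase cats"]
vec = CountVectorizer()
counts = vec.fit_transform(docs).toarray().astype(float)
words = vec.get_feature_names_out()

doc_p = counts / counts.sum(axis=1, keepdims=True)   # p(w | document)
ctx = counts.T @ counts                              # word co-occurrence
ctx_p = (ctx + 1e-9) / (ctx + 1e-9).sum(axis=1, keepdims=True)  # p(w | z)

# Narrow contexts = lowest-entropy context distributions (toy cutoff of 3).
entropy = -(ctx_p * np.log(ctx_p)).sum(axis=1)
narrow = entropy.argsort()[:3]

# Each document joins the narrow-context word with smallest KL divergence.
assign = [words[min(narrow, key=lambda z: rel_entr(doc_p[d], ctx_p[z]).sum())]
          for d in range(len(docs))]
print(assign)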


Knowledge-Based Systems | 2008

SOPHIA-TCBR: A knowledge discovery framework for textual case-based reasoning

David W. Patterson; Niall Rooney; Mykola Galushka; Vladimir Dobrynin; Elena Smirnova

In this paper, we present a novel textual case-based reasoning system called SOPHIA-TCBR which provides a means of clustering semantically related textual cases, where individual clusters are formed through the discovery of narrow themes which then act as attractors for related cases. During this process, SOPHIA-TCBR automatically discovers appropriate case and similarity knowledge. It is then able to organize the cases within each cluster into a minimum spanning tree based on their semantic similarity. SOPHIA's capability as a case-based text classifier is benchmarked against the well-known and widely utilised k-Means approach. Results show that SOPHIA either equals or outperforms k-Means on two different case-bases, and as such is an attractive approach for case-based classification. We demonstrate the quality of the knowledge discovery process by showing the high level of topic similarity between adjacent cases within the minimum spanning tree. We show that the formation of the minimum spanning tree makes it possible to identify a kernel region within each cluster which has a higher level of similarity between cases than the cluster in its entirety, and that this corresponds directly to a higher level of topic homogeneity. We demonstrate that topic homogeneity increases as the average semantic similarity between cases in the kernel increases. Finally, having empirically demonstrated the quality of the knowledge discovery process in SOPHIA, we show how it can be competently applied to case-based retrieval.
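The minimum-spanning-tree organization can be sketched with off-the-shelf tools. Here TF-IDF cosine distance stands in for the case and similarity knowledge that SOPHIA-TCBR discovers automatically, and the four toy cases are invented.

from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

cases = ["ulcer healing slowed", "wound healing improved",
         "rent levels for retail units", "retail rents rose"]

# Pairwise semantic distances, then an MST so that adjacent cases are close.
X = TfidfVectorizer().fit_transform(cases)
mst = minimum_spanning_tree(cosine_distances(X))

rows, cols = mst.nonzero()
print(list(zip(rows, cols)))  # tree edges linking each case to a near neighbour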


Information Processing and Management | 2006

A scaleable document clustering approach for large document corpora

Niall Rooney; David W. Patterson; Mykola Galushka; Vladimir Dobrynin

In this paper, the scalability and quality of the contextual document clustering (CDC) approach are demonstrated for large data sets using the whole Reuters Corpus Volume 1 (RCV1) collection. CDC is a form of distributional clustering which automatically discovers contexts of narrow scope within a document corpus. These contexts act as attractors for clustering documents that are semantically related to each other. Once clustered, the documents are organized into a minimum spanning tree so that the topical similarity of adjacent documents within this structure can be assessed. The pre-defined categories from three different document category sets are used to assess the quality of CDC in terms of its ability to group and structure semantically related documents given the contexts. Quality is evaluated on two factors: the category overlap between adjacent documents within a cluster, and how well a representative document categorizes all the other documents within a cluster. As the RCV1 collection was collated in a time-ordered fashion, it was possible to assess the stability of clusters formed from documents within one time interval when presented with new unseen documents at subsequent time intervals. We demonstrate that CDC is a powerful and scalable technique with the ability to create stable clusters of high quality. Additionally, to our knowledge this is the first time that a collection as large as RCV1 has been analyzed in its entirety using a static clustering approach.
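The first quality factor can be illustrated with a toy computation: average the category overlap across documents joined by an edge of a cluster's spanning tree. Jaccard overlap is used here as an assumed measure, and the edges and category sets are invented stand-ins.

# Toy spanning-tree edges within one cluster and each document's categories.
edges = [(0, 1), (1, 2), (2, 3)]
categories = [{"markets", "equities"}, {"markets"},
              {"markets", "bonds"}, {"sport"}]

def overlap(a, b):
    """Jaccard overlap between two category label sets."""
    return len(a & b) / len(a | b)

score = sum(overlap(categories[i], categories[j]) for i, j in edges) / len(edges)
print(f"mean adjacent-document category overlap: {score:.2f}")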


Expert Systems with Applications | 1997

A case-based reasoning approach to the selection of comparable evidence for retail rent determination

Brenna O'Roarty; David W. Patterson; Stanley McGreal; Alastair Adair

Case-based reasoning is an artificial intelligence technique which utilizes past experience to solve current problems, and in this respect it mirrors the process involved in real estate appraisal. This paper investigates its application as a computer-assisted valuation tool in the specific domain of retail rent determination. As property appraisal is goal-orientated, it is essential that the most appropriate examples of previous rent determinations are selected. In exploring a case-based reasoning approach to the retail real estate domain, five models are built, namely: pure inductive, inductive (Q-model), inductive (prototype), inductive (Q-model and prototype), and nearest neighbour.
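A nearest-neighbour model of this kind can be sketched as weighted distance retrieval over a small case base. The attributes, weights, and figures below are invented for illustration and do not come from the paper.

import numpy as np

# Case base: [floor area m2, frontage m, pedestrian flow index], agreed rent.
cases = np.array([[120, 6.0, 0.8], [95, 5.5, 0.6], [200, 9.0, 0.9]])
rents = np.array([54_000, 41_000, 88_000])
weights = np.array([0.5, 0.2, 0.3])     # assumed relative attribute importance

subject = np.array([110, 6.2, 0.75])    # the property awaiting determination
scale = cases.max(axis=0) - cases.min(axis=0)

# Weighted Euclidean distance on range-normalised attributes.
dist = np.sqrt((weights * ((cases - subject) / scale) ** 2).sum(axis=1))
order = dist.argsort()[:2]              # the two most comparable lettings
print(order, rents[order].mean())       # naive rent estimate from comparables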

Collaboration


Dive into David W. Patterson's collaborations.

Top Co-Authors

Vladimir Dobrynin
Saint Petersburg State University

Seppo Puuronen
University of Jyväskylä

Mykola Pechenizkiy
Eindhoven University of Technology

Liming Chen
De Montfort University

Elena Smirnova
Saint Petersburg State University