Stuart J. Inglis | Researchain

Publication

Featured research published by Stuart J. Inglis.

Machine Learning | 1998

Using Model Trees for Classification

Eibe Frank; Yong Wang; Stuart J. Inglis; Geoffrey Holmes; Ian H. Witten

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the model tree inducer M5′, based on Quinlan's M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.
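
One way to picture the transformation described above: build one 0/1 indicator target per class, fit a regressor to each, and predict the class whose regressor outputs the largest value. As an assumption for brevity, a single-feature least-squares line stands in for the M5′ model tree; the function names are hypothetical, and this is a sketch rather than the authors' implementation.

```python
from typing import Dict, List, Tuple

Line = Tuple[float, float]  # (slope, intercept) of a fitted regression line


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares on one feature (stand-in for a model tree)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_classifier(xs: List[float], labels: List[str]) -> Dict[str, Line]:
    """Turn classification into one regression problem per class."""
    models = {}
    for cls in sorted(set(labels)):
        # Indicator target: 1.0 for members of this class, 0.0 otherwise.
        targets = [1.0 if lab == cls else 0.0 for lab in labels]
        models[cls] = fit_line(xs, targets)
    return models


def predict(models: Dict[str, Line], x: float) -> str:
    """Pick the class whose approximated membership function is largest."""
    return max(models, key=lambda cls: models[cls][0] * x + models[cls][1])
```

For example, after `models = fit_classifier([0, 1, 2, 3, 10, 11, 12, 13], ["a"] * 4 + ["b"] * 4)`, the call `predict(models, 1.5)` returns `"a"` and `predict(models, 12.0)` returns `"b"`. Swapping `fit_line` for a real model tree learner would leave the surrounding transformation unchanged.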

Data Compression Conference | 1994

Compression-based template matching

Stuart J. Inglis; Ian H. Witten

Textual image compression is a method of both lossy and lossless image compression that is particularly effective for images containing repeated sub-images, notably pages of text. This paper addresses the problem of pattern comparison using an information-based, or compression-based, approach. Following Mohiuddin et al. (1984), the authors use the amount of uncertainty, or entropy, between marks as the criterion for the matching process. The entropy model they use is the context-based compression model proposed by Langdon and Rissanen (1981) and further developed by Moffat (1991). There are two principal issues to investigate when studying template matching methods: their susceptibility to different kinds of noise, and how they respond to errors in the initial registration. Because the comparison operation is computationally intensive, many schemes have been devised to pre-filter, or screen, the marks in advance to identify those that will certainly fail the match. The authors present a novel screening method that uses a quad-tree decomposition and finds local centroids at each tree level.
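
The screening idea in the last sentence can be illustrated with a rough sketch. Each mark is summarised by the centroids of its black pixels in the nodes of a quad tree, and a candidate pair is rejected before the expensive entropy-based comparison if corresponding centroids differ by more than a tolerance. The fixed midpoint split, the equal-size requirement, and the tolerance `tol` are assumptions for illustration, not details taken from the paper.

```python
from math import hypot
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0 (white) / 1 (black) pixels
Centroid = Optional[Tuple[float, float]]


def centroid(bm: Bitmap, r0: int, r1: int, c0: int, c1: int) -> Centroid:
    """Centroid of the black pixels in the region, or None if it is empty."""
    total = sum_r = sum_c = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            if bm[r][c]:
                total += 1
                sum_r += r
                sum_c += c
    return None if total == 0 else (sum_r / total, sum_c / total)


def quadtree_signature(bm: Bitmap, depth: int) -> List[Centroid]:
    """Local centroids of every quad-tree node, in depth-first order."""
    signature: List[Centroid] = []

    def visit(r0: int, r1: int, c0: int, c1: int, level: int) -> None:
        signature.append(centroid(bm, r0, r1, c0, c1))
        if level == depth or r1 - r0 < 2 or c1 - c0 < 2:
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for box in ((r0, rm, c0, cm), (r0, rm, cm, c1), (rm, r1, c0, cm), (rm, r1, cm, c1)):
            visit(*box, level + 1)

    visit(0, len(bm), 0, len(bm[0]), 0)
    return signature


def passes_screen(a: Bitmap, b: Bitmap, depth: int = 2, tol: float = 1.5) -> bool:
    """Cheap pre-filter: False means the pair can skip the full comparison."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return False  # assumption: only equal-size marks are compared
    for ca, cb in zip(quadtree_signature(a, depth), quadtree_signature(b, depth)):
        if (ca is None) != (cb is None):
            return False  # one node is empty where the other has ink
        if ca is not None and hypot(ca[0] - cb[0], ca[1] - cb[1]) > tol:
            return False
    return True
```

A mark always passes against itself, while a 4×4 vertical bar in column 1 is rejected against a bar in column 3, because their root centroids are already 2 pixels apart.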

Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007) | 2007

Jumble Java Byte Code to Measure the Effectiveness of Unit Tests

Sean Alistair Irvine; Tin Pavlinic; Leonard E. Trigg; John G. Cleary; Stuart J. Inglis; Mark Utting

Jumble is a byte-code-level mutation testing tool for Java that interoperates with JUnit. It has been designed to operate in an industrial setting with large projects. Heuristics have been included to speed up the checking of mutations; for example, Jumble notes which test fails for each mutation and runs that test first in subsequent mutation checks. Significant effort has gone into ensuring that it can test code that uses custom class loading and reflection, which requires careful attention to class-path handling and coexistence with foreign class loaders. Jumble is currently used on a continuous basis within an agile programming environment with approximately 370,000 lines of Java code under source control. This environment checks out project code every fifteen minutes and runs an incremental set of unit tests and mutation tests for modified classes. Jumble is being made available as open source.
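
The test-ordering heuristic can be sketched under simplifying assumptions. Mutants are plain Python callables rather than mutated byte code. Each test is a hypothetical predicate that returns True when it passes. The test that killed the most recent mutant moves to the front of the queue, a move-to-front reading of the heuristic. Jumble itself works on Java classes with JUnit, and none of the names below come from its code.

```python
from typing import Callable, List, Sequence, Tuple

Impl = Callable[[int, int], int]
Test = Callable[[Impl], bool]  # returns True if the test passes for impl


def run_mutation_tests(mutants: Sequence[Impl], tests: Sequence[Test]) -> Tuple[int, int, int]:
    """Return (killed, survived, test_runs) for the given mutants."""
    order: List[int] = list(range(len(tests)))
    killed = survived = runs = 0
    for mutant in mutants:
        for idx in list(order):  # iterate over a copy; order may be reshuffled
            runs += 1
            if not tests[idx](mutant):
                killed += 1
                # Heuristic: try the test that just killed a mutant first next time.
                order.remove(idx)
                order.insert(0, idx)
                break
        else:
            survived += 1  # no test failed: the mutation went undetected
    return killed, survived, runs
```

With tests `f(0, 0) == 0`, `f(2, 3) == 5`, and `f(1, 1) == 2` for an addition function, and mutants `a - b`, `a * b`, and `b + a`, the sketch reports two killed mutants, one survivor (`b + a` is equivalent to the original), and six test runs. A fixed test order would need seven runs for the same result.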

Proceedings of the IEEE | 1994

Textual image compression: two-stage lossy/lossless encoding of textual images

Ian H. Witten; Tim Bell; Hugh Emberson; Stuart J. Inglis; Alistair Moffat

A two-stage method for compressing bilevel images is described that is particularly effective for images containing repeated subimages, notably text. In the first stage, connected groups of pixels, corresponding approximately to individual characters, are extracted from the image. These are matched against an adaptively constructed library of patterns seen so far, and the resulting sequence of symbol identification numbers is coded and transmitted. From this information, along with the library itself and the offset from one mark to the next, an approximate image can be reconstructed. The result is a lossy method of compression that outperforms other schemes. The second stage employs the reconstructed image as an aid for encoding the original image using a statistical context-based compression technique. This yields a total bandwidth for exact transmission that is appreciably lower than that required by other lossless binary image compression methods. Taken together, the lossy and lossless methods provide an effective two-stage progressive transmission capability for textual images that has applications in legal, medical, and historical contexts, and in archiving in general.
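
A toy version of the first stage might look like this. Marks are assumed to be already extracted and flattened to equal-length strings of `0`/`1` pixels. A Hamming-distance threshold (`max_mismatch`, a hypothetical parameter) stands in for the paper's pattern matcher. Each mark is either mapped to its closest library pattern or added to the library as a new pattern. The emitted identifiers, together with the library, give the lossy approximation.

```python
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> Optional[int]:
    """Pixel mismatches between two flattened marks; None if sizes differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def encode_marks(marks: List[str], max_mismatch: int) -> Tuple[List[int], List[str]]:
    """Map each mark to a library pattern ID, growing the library adaptively."""
    library: List[str] = []
    ids: List[int] = []
    for mark in marks:
        best: Optional[Tuple[int, int]] = None  # (pattern index, distance)
        for i, proto in enumerate(library):
            d = hamming(proto, mark)
            if d is not None and d <= max_mismatch and (best is None or d < best[1]):
                best = (i, d)
        if best is None:
            library.append(mark)  # unseen shape: becomes a new library pattern
            ids.append(len(library) - 1)
        else:
            ids.append(best[0])
    return ids, library


def reconstruct(ids: List[int], library: List[str]) -> List[str]:
    """Lossy reconstruction: every mark is replaced by its library pattern."""
    return [library[i] for i in ids]
```

For `encode_marks(["1100", "1101", "0011", "1100"], 1)`, the second mark is close enough to reuse pattern 0, so the result is IDs `[0, 0, 1, 0]` with a two-pattern library. That single-pixel difference is exactly the residue the second, lossless stage would have to encode.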

IEEE Computer | 1994

Displaying 3D images: algorithms for single-image random-dot stereograms

Harold W. Thimbleby; Stuart J. Inglis; Ian H. Witten

A new, simple, and symmetric algorithm can be implemented that results in higher levels of detail in solid objects than previously possible with autostereograms. In a stereoscope, an optical instrument similar to binoculars, each eye views a different picture and thereby receives the specific image that would have arisen naturally. An early suggestion for a color stereo computer display involved a rotating filter wheel held in front of the eyes. In contrast, this article describes a method for viewing on paper or on an ordinary computer screen without special equipment, although it is limited to the display of 3D monochromatic objects. (The image can be colored, say, for artistic reasons, but the method we describe does not allow colors to be allocated in a way that corresponds to an arbitrary coloring of the solid object depicted.) The image can easily be constructed by computer from any 3D scene or solid object description.
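
The core of the symmetric approach can be sketched for a single scan line. This is a simplified Python version under stated assumptions. Depths run from 0 (far plane) to 1 (near plane). The eye separation `eye_sep` in pixels and the depth-of-field fraction `mu` are illustrative defaults. The hidden-surface test of the full algorithm is omitted, so every depth sample imposes its constraint. Each constraint says that two pixels, one seen by each eye and placed symmetrically about the current position, must share a color. The `same` array links constrained pixels to a partner on their right, and colors are then assigned from right to left.

```python
import random
from typing import List, Optional


def separation(z: float, eye_sep: int, mu: float) -> int:
    """Stereo separation in pixels for depth z in [0, 1] (0 = far plane)."""
    return int((1 - mu * z) * eye_sep / (2 - mu * z) + 0.5)


def stereogram_row(depths: List[float], eye_sep: int = 8, mu: float = 1 / 3,
                   rng: Optional[random.Random] = None) -> List[int]:
    """One scan line of a random-dot autostereogram (no hidden-surface removal)."""
    rng = rng or random.Random()
    width = len(depths)
    same = list(range(width))  # same[x] == x means x is unconstrained so far
    for x, z in enumerate(depths):
        s = separation(z, eye_sep, mu)
        left = x - s // 2  # centred on x: the symmetric placement
        right = left + s
        if 0 <= left and right < width:
            # Walk the increasing chain from `left`, splicing `right` in order.
            k = same[left]
            while k != left and k != right:
                if k < right:
                    left = k
                else:
                    same[left] = right
                    left, right = right, k
                k = same[left]
            same[left] = right
    pixels = [0] * width
    for x in range(width - 1, -1, -1):
        # Free pixels get a random dot; linked ones copy their right-hand partner.
        pixels[x] = rng.randint(0, 1) if same[x] == x else pixels[same[x]]
    return pixels
```

On a flat far plane with `eye_sep=8`, the separation is 4 pixels, so the row simply repeats with period 4. A raised region shortens the separation locally, which the viewer perceives as depth.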

Data Compression Conference | 1998

Correcting English text using PPM models

W. J. Teahan; Stuart J. Inglis; John G. Cleary; Geoffrey Holmes

An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. For optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be misrecognized, while for spelling correction, two characters may be transposed, or a character may be inadvertently inserted or omitted. This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and is shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 96.3% to 96.9%, a decrease of about 14 errors per page.
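
The underlying idea can be sketched with a much simpler model than PPM: choose the candidate text that the model can encode in the fewest bits. As a plainly swapped-in stand-in, this version uses a fixed order-2 character context model with add-one (Laplace) smoothing instead of PPM's escape mechanism and blending of context orders. The class and function names are hypothetical, and the step that generates candidate corrections is left out.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable


class CharContextModel:
    """Order-k character model with add-one smoothing (not PPM)."""

    def __init__(self, text: str, order: int = 2) -> None:
        self.order = order
        self.alphabet = set(text)
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)
        padded = " " * order + text  # pad so the first characters have a context
        for i in range(order, len(padded)):
            self.counts[padded[i - order:i]][padded[i]] += 1

    def bits(self, s: str) -> float:
        """Code length of s in bits, -sum(log2 P(char | preceding k chars))."""
        padded = " " * self.order + s
        alphabet_size = len(self.alphabet | set(s))
        total = 0.0
        for i in range(self.order, len(padded)):
            ctx = self.counts.get(padded[i - self.order:i], Counter())
            p = (ctx[padded[i]] + 1) / (sum(ctx.values()) + alphabet_size)
            total -= math.log2(p)
        return total


def correct(model: CharContextModel, candidates: Iterable[str]) -> str:
    """Pick the candidate with the shortest code length under the model."""
    return min(candidates, key=model.bits)
```

Trained on `"the cat sat on the mat the cat ate"`, the model prefers `"the"` over the OCR-style misreading `"tha"`. In the training text, every occurrence of the context `th` is followed by `e`.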

bioRxiv | 2015

Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines

John G. Cleary; Ross Braithwaite; Kurt Gaastra; Brian Hilbush; Stuart J. Inglis; Sean Alistair Irvine; Alan Timothy Jon Jackson; Richard Littin; Mehul Rathod; David Ware; Justin M. Zook; Len Trigg; Francisco M. De La Vega

Summary: To evaluate and compare the performance of variant calling methods and their confidence scores, a test call set must be compared against a “gold standard”. Unfortunately, these comparisons are not straightforward with Variant Call Format (VCF) files, the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, which leads to misleading results. A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are efficient metrics for evaluating a test call set against a gold standard. However, for VCF data this also requires special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that deals with complex call representation discrepancies and, through a dynamic programming method, minimizes false positives and negatives globally across the entire call sets for accurate performance evaluation of VCFs.

Availability: RTG Tools is implemented as a multithreaded Java application, and its source code is available under the BSD license at https://github.com/RealTimeGenomics/rtg-tools

Supplementary information: Supplementary data are available at Bioinformatics online.
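
To see why representation matters, this sketch applies each call set to the reference and compares the resulting sequences. An MNP and the equivalent pair of SNVs, or a deletion placed at two different positions in a homopolymer run, produce identical haplotypes even though their VCF records differ. This covers only the replay step, for a single haplotype with non-overlapping calls and 0-based positions (assumptions for brevity). It does not reproduce the dynamic-programming search over call subsets described in the summary, and the function names are hypothetical rather than part of RTG Tools.

```python
from typing import List, Tuple

Call = Tuple[int, str, str]  # (0-based position, REF allele, ALT allele)


def apply_variants(reference: str, calls: List[Call]) -> str:
    """Replay non-overlapping calls onto the reference to build one haplotype."""
    out: List[str] = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(calls):
        if pos < cursor:
            raise ValueError("overlapping calls are not handled in this sketch")
        if reference[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError("REF allele does not match the reference")
        out.append(reference[cursor:pos])  # unchanged bases before the call
        out.append(alt_allele)
        cursor = pos + len(ref_allele)
    out.append(reference[cursor:])
    return "".join(out)


def same_haplotype(reference: str, calls_a: List[Call], calls_b: List[Call]) -> bool:
    """True if two call sets describe the same sequence despite different records."""
    return apply_variants(reference, calls_a) == apply_variants(reference, calls_b)
```

For instance, on reference `ACGTACGT` the MNP `(1, "CG", "TT")` and the SNV pair `(1, "C", "T")`, `(2, "G", "T")` both yield `ATTTACGT`. A naive record-by-record comparison would count these as one false positive and two false negatives.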

Data Compression Conference | 1996

Bi-level document image compression using layout information

Stuart J. Inglis; Ian H. Witten

Most bi-level images stored on computers today comprise scanned text, and are stored using generic bi-level image technology based either on classical run-length coding, such as the CCITT Group 4 method, or on modern schemes such as JBIG that predict pixels from their local image context. However, image compression methods that are tailored specifically for images known to contain printed text can provide noticeably superior performance because they effectively enlarge the context to the character level, at least for those predictions for which such a context is relevant. To deal effectively with general documents that contain text and pictures, it is necessary to detect layout and structural information from the image, and employ different compression techniques for different parts of the image. The authors extend previous work in document image compression in two ways. First, they include automatic discrimination between text and non-text zones in an image. Second, they test the system on a large real-world image corpus.
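
For contrast with the text-specific methods, the classical run-length step behind CCITT-style coding can be sketched in a few lines. Each scan line is described by alternating run lengths, starting with a white run that may have length zero. This sketch assumes 0 = white and 1 = black. It omits the Huffman code tables and the two-dimensional (reference-line) coding that Group 4 actually uses.

```python
from typing import List


def run_lengths(row: List[int]) -> List[int]:
    """Alternating white/black run lengths for one scan line, white first."""
    runs: List[int] = []
    current, length = 0, 0  # every line starts in a (possibly empty) white run
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs


def expand(runs: List[int]) -> List[int]:
    """Inverse of run_lengths: rebuild the scan line from its runs."""
    row: List[int] = []
    colour = 0
    for n in runs:
        row.extend([colour] * n)
        colour ^= 1  # runs alternate between white (0) and black (1)
    return row
```

A line that begins with black, such as `[1, 1, 0, 0, 0, 1]`, encodes as `[0, 2, 3, 1]`: the leading zero-length white run keeps the colour alternation unambiguous for the decoder.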

Archive | 1998