Sean Alistair Irvine
University of Waikato
Publication
Featured research published by Sean Alistair Irvine.
IEEE Transactions on Communications | 1997
Colin Boyd; John G. Cleary; Sean Alistair Irvine; Ingrid Rinsma-Melchert; Ian H. Witten
Arithmetic coding for data compression has gained widespread acceptance as the right method for optimum compression when used with a suitable source model. A technique to implement error detection as part of the arithmetic coding process is described. Heuristic arguments are given to show that a small amount of extra redundancy can be very effective in detecting errors very quickly, and practical tests confirm this prediction.
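One common way to build error detection into an arithmetic coder is to reserve a "forbidden" slice of every coding interval that the encoder never uses. The decoder then flags an error as soon as its value falls into that slice. The sketch below assumes this forbidden-region approach with exact `Fraction` arithmetic, a binary source with P(0) = `p0`, and a forbidden fraction `eps`. The helper names `_split`, `encode` and `decode` are hypothetical, and the code is an illustration rather than the paper's implementation.

```python
from fractions import Fraction

def _split(low, width, p0, eps):
    """Boundaries inside [low, low + width): symbol 0 gets [low, b0), symbol 1 gets
    [b0, b1), and the top fraction eps of the interval, [b1, low + width), is forbidden."""
    usable = width * (1 - eps)
    return low + usable * p0, low + usable

def encode(symbols, p0, eps):
    """Encode a list of 0/1 symbols; return the shortest bit string inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        b0, b1 = _split(low, width, p0, eps)
        if s == 0:
            width = b0 - low
        else:
            low, width = b0, b1 - b0
    k = 0
    while True:
        m = -(-low.numerator * 2**k // low.denominator)  # ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            return format(m, f"0{k}b") if k else ""
        k += 1

def decode(bits, n, p0, eps):
    """Decode n symbols; raise ValueError as soon as the value lands in a forbidden region."""
    c = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for i in range(n):
        b0, b1 = _split(low, width, p0, eps)
        if c < b0:
            out.append(0)
            width = b0 - low
        elif c < b1:
            out.append(1)
            low, width = b0, b1 - b0
        else:
            raise ValueError(f"error detected while decoding symbol {i}")
    return out
```

In this sketch, each encoded symbol costs an extra log2(1 / (1 - eps)) bits. That cost is the "small amount of extra redundancy". A larger `eps` makes a corrupted value more likely to hit a forbidden slice within the first few symbols after the error.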
Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007) | 2007
Sean Alistair Irvine; Tin Pavlinic; Leonard E. Trigg; John G. Cleary; Stuart J. Inglis; Mark Utting
Jumble is a bytecode-level mutation testing tool for Java that interoperates with JUnit. It has been designed to operate in an industrial setting with large projects. Heuristics have been included to speed up the checking of mutations, for example, noting which test fails for each mutation and running that test first in subsequent mutation checks. Significant effort has been put into ensuring that it can test code that uses custom class loading and reflection. This requires careful attention to class path handling and coexistence with foreign class loaders. Jumble is currently used on a continuous basis within an agile programming environment with approximately 370,000 lines of Java code under source control. This environment checks out project code every fifteen minutes and runs an incremental set of unit tests and mutation tests for modified classes. Jumble is being made available as open source.
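The core mutation-testing loop and the "run the last failing test first" heuristic can be sketched in a few lines. Jumble works on Java bytecode, but this sketch swaps in Python's `ast` module to mutate source-level operators. It also simplifies the heuristic: the test that killed the previous mutant runs first, rather than the test recorded for each mutation point across runs. The operator tables and the functions `mutation_points`, `make_mutant` and `run_mutation_tests` are hypothetical names for illustration only.

```python
import ast
import copy

# Operator swaps used as mutation operators (a tiny subset chosen for illustration).
BINOP_SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add}
CMP_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}

def mutation_points(tree):
    """Indices (in ast.walk order) of nodes whose operator can be swapped."""
    points = []
    for i, node in enumerate(ast.walk(tree)):
        if isinstance(node, ast.BinOp) and type(node.op) in BINOP_SWAPS:
            points.append(i)
        elif isinstance(node, ast.Compare) and type(node.ops[0]) in CMP_SWAPS:
            points.append(i)
    return points

def make_mutant(tree, index):
    """Copy the tree and swap the operator of the node at the given walk index."""
    mutant = copy.deepcopy(tree)
    node = list(ast.walk(mutant))[index]
    if isinstance(node, ast.BinOp):
        node.op = BINOP_SWAPS[type(node.op)]()
    else:
        node.ops[0] = CMP_SWAPS[type(node.ops[0])]()
    return ast.fix_missing_locations(mutant)

def run_mutation_tests(source, func_name, tests):
    """Return (killed, survived). Each test takes the function and returns True on pass."""
    tree = ast.parse(source)
    killed = survived = 0
    last_killer = None
    for index in mutation_points(tree):
        namespace = {}
        exec(compile(make_mutant(tree, index), "<mutant>", "exec"), namespace)
        func = namespace[func_name]
        # Heuristic: run the test that killed the previous mutant first.
        for test in sorted(tests, key=lambda t: t is not last_killer):
            try:
                passed = test(func)
            except Exception:
                passed = False
            if not passed:
                killed += 1
                last_killer = test
                break
        else:
            survived += 1
    return killed, survived
```

For example, take a `clamp_add(a, b, limit)` that returns `min(a + b, limit)` via an `if s > limit` check, tested with `f(1, 2, 10) == 3` and `f(5, 5, 10) == 10`. The `+` to `-` mutant is killed, but the `>` to `>=` mutant survives, which points to a missing boundary test.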
Computers & Security | 1995
John G. Cleary; Sean Alistair Irvine; Ingrid Rinsma-Melchert
Arithmetic coding is a technique that converts a given probability distribution into an optimal code and is commonly used in compression schemes. The use of arithmetic coding as an encryption scheme is considered, starting with the simple case of a single binary probability distribution with a fixed (but unknown) probability. We show that for a chosen-plaintext attack, w + 2 symbols are sufficient to uniquely determine a w-bit probability. For many known plaintexts, w + m + O(log m) symbols are sufficient, where m is the length of an initial sequence containing just one of the two possible symbols. It is noted that many extensions to this basic scheme are vulnerable to the same attack, provided the arithmetic coder can be repeatedly reset to its initial state. If it cannot be reset, their vulnerability remains an open question.
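The following is a deliberately simplified chosen-plaintext illustration of why a fixed secret probability leaks through an arithmetic coder. It is not the w + 2 symbol attack from the paper. The sketch makes several assumptions: symbol 0 has probability p and takes the lower part of each interval, the coder emits the shortest binary fraction in the final interval, and the attacker chooses the plaintext "1" followed by n zeros. The final interval is then [p, p + (1 - p)p^n). Once (1 - p)p^n < 2^-w, the top w bits of the codeword equal the w-bit probability p. The names `encode_binary` and `recover_probability` are hypothetical.

```python
from fractions import Fraction

def encode_binary(symbols, p):
    """Exact arithmetic coder: symbol 0 takes [low, low + p*width), symbol 1 the rest.
    Returns the shortest binary fraction m / 2**k in the final interval as (m, k)."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        if s == 0:
            width *= p
        else:
            low += width * p
            width *= 1 - p
    k = 0
    while True:
        m = -(-low.numerator * 2**k // low.denominator)  # ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            return m, k
        k += 1

def recover_probability(m, k, w):
    """Attacker's step: keep the top w bits of the codeword m / 2**k."""
    return Fraction((m * 2**w) // 2**k, 2**w)

# Chosen plaintext "1" + "0" * 8 against a secret 3-bit probability.
secret = Fraction(5, 8)
print(recover_probability(*encode_binary([1] + [0] * 8, secret), 3))
```

With w = 3, eight trailing zeros are enough for every p in {1/8, ..., 7/8}, because (1 - p)p^8 < 1/8 holds for all of them. The paper's result shows that far shorter plaintexts suffice.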
bioRxiv | 2015
John G. Cleary; Ross Braithwaite; Kurt Gaastra; Brian Hilbush; Stuart J. Inglis; Sean Alistair Irvine; Alan Timothy Jon Jackson; Richard Littin; Mehul Rathod; David Ware; Justin M. Zook; Len Trigg; Francisco M. De La Vega
Summary: To evaluate and compare the performance of variant calling methods and their confidence scores, comparisons between a test call set and a “gold standard” need to be carried out. Unfortunately, these comparisons are not straightforward with Variant Call Format (VCF) files, the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, producing misleading results. A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are effective metrics for evaluating a test call set against a gold standard. However, for VCF data this also requires special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that handles complex call representation discrepancies and uses a dynamic programming method to minimize false positives and negatives globally across the entire call sets, enabling accurate performance evaluation of VCFs. Availability: RTG Tools is implemented as a multithreaded Java application, and source code is available under a BSD license at: https://github.com/RealTimeGenomics/rtg-tools Supplementary information: Supplementary data are available at Bioinformatics online.
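Representation discrepancies disappear when both call sets are replayed onto the reference and the resulting haplotype sequences are compared, instead of comparing VCF records position by position. The minimal sketch below assumes a single haplotype and 0-based positions (VCF itself is 1-based). In place of the paper's dynamic program, it uses an exponential brute-force search over subsets, which is only workable for a handful of variants. `apply_variants` and `best_match` are hypothetical names.

```python
from itertools import combinations

def apply_variants(ref, variants):
    """Replay (pos, ref_allele, alt_allele) variants onto ref using 0-based positions.
    Returns None if variants overlap or a ref allele does not match the reference."""
    out, cursor = [], 0
    for pos, ref_allele, alt in sorted(variants):
        if pos < cursor or ref[pos:pos + len(ref_allele)] != ref_allele:
            return None
        out.append(ref[cursor:pos])
        out.append(alt)
        cursor = pos + len(ref_allele)
    out.append(ref[cursor:])
    return "".join(out)

def _subsets_largest_first(items):
    for k in range(len(items), -1, -1):
        yield from combinations(items, k)

def best_match(ref, calls, truth):
    """Brute-force stand-in for the dynamic program: choose the subsets of calls and
    truth containing the most variants whose replayed haplotypes are identical."""
    best = ((), ())
    for c in _subsets_largest_first(calls):
        hap = apply_variants(ref, c)
        if hap is None:
            continue
        for t in _subsets_largest_first(truth):
            if len(c) + len(t) <= len(best[0]) + len(best[1]):
                continue
            if apply_variants(ref, t) == hap:
                best = (c, t)
    return best
```

In the reference `GATATC`, the deletion written as `(0, "GAT", "G")` and the one written as `(1, "ATA", "A")` look different as records, but both yield `GATC`, so `best_match` pairs them as a true positive. Any call left outside `best[0]` counts as a false positive, and any truth variant left outside `best[1]` counts as a false negative.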
bioRxiv | 2018
Justin M. Zook; Jennifer H. McDaniel; Hemang Parikh; Sean Alistair Irvine; Len Trigg; Rebecca M. Truty; Cory Y. McLean; Francisco M. De La Vega; Chunlin Xiao; Stephen T. Sherry; Marc L. Salit
Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.
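Stratifying performance metrics comes down to grouping comparison outcomes by a label such as variant type or genome context, then computing precision, recall and F1 per group. The sketch below assumes the comparison has already labelled each variant as TP, FP or FN. It uses a single TP count for both precision and recall, whereas real benchmarking tools distinguish truth-side and query-side TPs. `stratified_metrics` is a hypothetical name, not part of the GA4GH tools.

```python
from collections import Counter, defaultdict

def stratified_metrics(records):
    """records: iterable of (stratum, outcome) pairs, outcome one of "TP", "FP", "FN".
    Returns {stratum: (precision, recall, f1)}; NaN where a ratio is undefined."""
    counts = defaultdict(Counter)
    for stratum, outcome in records:
        counts[stratum][outcome] += 1
    metrics = {}
    for stratum, c in counts.items():
        tp, fp, fn = c["TP"], c["FP"], c["FN"]
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        # F1 written as 2TP / (2TP + FP + FN), equivalent to the harmonic mean.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
        metrics[stratum] = (precision, recall, f1)
    return metrics
```

Keeping raw counts per stratum, rather than averaging per-variant scores, means strata with few variants, such as indels in low-complexity regions, stay visibly noisy instead of being smoothed away.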
Cancer Research | 2016
Francisco M. De La Vega; Ryan T. Koehler; Yannick Pouliot; Yosr Bouhlal; Austin P. So; Federico Goodsaid; Sean Alistair Irvine; Len Trigg; Lincoln Nadauld
Cancer tumor profiling by targeted resequencing of actionable cancer genes is rapidly becoming the standard approach for selecting targeted therapies and clinical trials in refractory cancer patients. In this clinical scenario, a tumor sample is obtained from an FFPE block and sequenced by targeted next-generation sequencing (NGS) to uncover actionable somatic mutations in relevant cancer genes. Challenges in analyzing tumor-derived NGS data include distinguishing between somatic and germline variants in the absence of normal tissue data, recognizing pathogenic germline variants, and identifying sequencing errors (which occur at a rate of about 0.5%). Additional challenges arise in other clinical applications of NGS, such as sequencing cell-free tumor DNA (cf-DNA) from plasma samples to monitor disease response or recurrence. Here we present a principled approach to identify both single-nucleotide and small insertion/deletion somatic mutations and germline variants from NGS data of tumor tissue. The approach leverages the allelic fraction patterns in tumors and prior information from external databases through a Bayesian network algorithm. It scores each putative mutation or variant by its probability of belonging to each variant class, versus classification as a sequencing error. The method enables joint calling of related samples from the same patient, such as cases where both a cf-DNA sample and a primary tumor sample are profiled, improving sensitivity and specificity. We validated our method by analyzing data obtained with the TOMA OS-Seq targeted sequencing RUO assay for 98 cancer genes from a mixture of well-known genomes, patient case triads (where normal, tumor, and cf-DNA samples are available), and a retrospective analysis of data from tumor patients who underwent clinical tumor profiling for therapy selection.
Citation Format: Francisco M. De La Vega, Ryan T. Koehler, Yannick Pouliot, Yosr Bouhlal, Austin So, Federico Goodsaid, Sean Irvine, Len Trigg, Lincoln Nadauld. Joint somatic mutation and germline variant identification and scoring from tumor molecular profiling and ct-DNA monitoring of cancer patients by high-throughput sequencing. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 2712.
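Scoring a site by its probability of belonging to each variant class can be illustrated with a toy model in which each class predicts a characteristic alt-allele fraction. The sketch replaces the authors' Bayesian network with a much simpler naive classifier that uses a binomial likelihood of the alt-read count at a given depth. The class allele fractions and priors are hypothetical values. The only number taken from the abstract is the error class's 0.005, which matches the quoted ~0.5% sequencing error rate. `classify` and `log_binomial` are hypothetical names.

```python
import math

# Hypothetical expected alt-allele fraction per class (assumptions, not the paper's model).
CLASS_AF = {
    "germline_het": 0.5,
    "germline_hom": 0.99,
    "somatic": 0.15,   # assumed low-purity / subclonal somatic fraction
    "error": 0.005,    # ~0.5% sequencing error rate quoted in the abstract
}
# Hypothetical prior probability of each class.
PRIORS = {"germline_het": 0.3, "germline_hom": 0.2, "somatic": 0.1, "error": 0.4}

def log_binomial(k, n, p):
    """Log probability of k alt reads out of n when each read is alt with probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

def classify(alt_reads, depth, class_af=CLASS_AF, priors=PRIORS):
    """Posterior probability of each class given the alt-read count and read depth."""
    logs = {c: math.log(priors[c]) + log_binomial(alt_reads, depth, af)
            for c, af in class_af.items()}
    top = max(logs.values())  # log-sum-exp normalisation avoids underflow
    total = sum(math.exp(v - top) for v in logs.values())
    return {c: math.exp(v - top) / total for c, v in logs.items()}
```

At depth 100, this toy model assigns 50 alt reads to `germline_het`, 15 to `somatic` and 1 to `error`. A real caller would also model tumor purity, copy number and external database priors, which is where a richer Bayesian network earns its keep.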
Archive | 1995
Sean Alistair Irvine; John G. Cleary; Ingrid Rinsma-Melchert
Archive | 2011
Stuart J. Inglis; Leonard E. Trigg; Richard Littin; David Ware; Sean Alistair Irvine; John G. Cleary; Graham Charles Gaylard; Mehul Rathod
Archive | 2011
Stuart J. Inglis; Leonard E. Trigg; Alan Timothy Jon Jackson; Sean Alistair Irvine
Archive | 2014
John G. Cleary; Stuart J. Inglis; Sean Alistair Irvine