Arthur W. Pightling | Researchain

Publication

Featured research published by Arthur W. Pightling.

PLOS ONE | 2014

Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

Arthur W. Pightling; Nicholas Petronella; Franco Pagotto

The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results.

BMC Microbiology | 2015

The Listeria monocytogenes Core-Genome Sequence Typer (LmCGST): a bioinformatic pipeline for molecular characterization with next-generation sequence data

Arthur W. Pightling; Nicholas Petronella; Franco Pagotto

Background: Next-generation sequencing provides a powerful means of molecular characterization. However, methods such as single-nucleotide polymorphism detection or whole-chromosome sequence analysis are computationally expensive, prone to errors, and are still less accessible than traditional typing methods. Here, we present the Listeria monocytogenes core-genome sequence typing method for molecular characterization. This method uses a high-confidence core (HCC) genome, calculated to ensure accurate identification of orthologs. We also developed an evolutionarily relevant nomenclature based upon phylogenetic analysis of HCC genomes. Finally, we created a pipeline (LmCGST; https://sourceforge.net/projects/lmcgst/files/) that takes in raw next-generation sequencing reads, calculates a subject HCC profile, compares it to an expandable database, assigns a sequence type, and performs a phylogenetic analysis.

Results: We analyzed 29 high-quality, closed Listeria monocytogenes chromosome sequences and identified loci that are reliable targets for automated molecular characterization methods. We identified 1013 open-reading frames that comprise our high-confidence core (HCC) genome. We then populated a database with HCC profiles from 114 taxa. We sequenced 84 randomly selected isolates from the Listeriosis Reference Service for Canada’s collection and analysed them with the LmCGST pipeline. In addition, we generated pulsed-field gel electrophoresis, ribotyping, and in silico multi-locus sequence typing (MLST) data for the 84 isolates and compared the results to those obtained using the CGST method. We found that all of the methods yielded results that are generally congruent. However, due to the increased numbers of categories, the CGST method provides much greater discriminatory power than the other methods tested here.

Conclusions: We show that the CGST method provides increased discriminatory power relative to typing methods such as pulsed-field gel electrophoresis, ribotyping, and multi-locus sequence typing while it addresses several shortcomings of other methods of molecular characterization with next-generation sequence data. It uses discrete, well-defined groupings (types) of organisms that are phylogenetically relevant and easily interpreted. In addition, the CGST scheme can be expanded to include additional loci and HCC profiles in the future. In total, the CGST method provides an approach to the molecular characterization of Listeria monocytogenes with next-generation sequence data that is highly reproducible, easily standardized, portable, and accessible.

Genome Announcements | 2014

Draft Genome Sequence of Listeria monocytogenes Strain LI0521 (syn. HPB7171), Isolated in 1983 during an Outbreak in Massachusetts Caused by Contaminated Cheese

Arthur W. Pightling; Min Lin; Franco Pagotto

Listeria monocytogenes, a pathogenic food-borne bacterium, is the causative agent of both sporadic and outbreak cases of human listeriosis. Here, we present the genome sequence of L. monocytogenes reference strain LI0521, isolated during an outbreak involving contaminated cheese, which has been used as the model during several proteomic studies.

Genome Announcements | 2014

Draft Genome Sequence of Cronobacter sakazakii Clonal Complex 45 Strain HPB5174, Isolated from a Powdered Infant Formula Facility in Ireland

Arthur W. Pightling; Franco Pagotto

Cronobacter sakazakii is a food-borne pathogenic bacterium that may cause severe illness in neonates and the elderly. We present the genome sequence of a rare strain (ST40, CC45), commonly found in multiple food processing facilities and in powdered infant formula and implicated in only a single clinical case.

Genome Announcements | 2014

Draft Genome Sequences of Two Clostridium botulinum Group II (Nonproteolytic) Type B Strains (DB-2 and KAPB-3)

Nicholas Petronella; Robyn Kenwell; Franco Pagotto; Arthur W. Pightling

Clostridium botulinum is important for food safety and studies of neurotoxins associated with human botulism. We present the draft genome sequences of two strains belonging to group II type B: one collected from Pacific Ocean sediments (DB-2) and another obtained during a botulism outbreak (KAPB-3).

Genome Announcements | 2015

Genome Sequence of Listeria monocytogenes Strain HPB5415, Collected during a 2008 Listeriosis Outbreak in Canada

Arthur W. Pightling; Franco Pagotto

Listeria monocytogenes strain HPB5415, isolated from deli meat, was found in 2008 to have the same pulsed-field gel electrophoresis patterns as a clinical strain (08-5923). However, whether nucleotide differences (single nucleotide polymorphisms [SNPs]) exist between their genomes was not determined. We sequenced the L. monocytogenes strain HPB5415 genome and identified 52 SNPs relative to strain 08-5923.

PLOS ONE | 2016

Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

James B. Pettengill; Arthur W. Pightling; Joseph D. Baugher; Hugh Rand; Errol Strain

The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use, one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates), there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
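
As a rough illustration of the k-mer profile approach described above, the sketch below computes a plain Jaccard distance between the k-mer sets of two sequences. It is a minimal sketch, not the study's implementation: the function names `kmer_set` and `jaccard_distance` and the toy sequences are assumptions made for illustration, and real tools such as Mash work on sketches of much longer k-mers.

```python
def kmer_set(seq, k):
    """Return the set of all k-length substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=3):
    """Jaccard distance between the k-mer sets of two sequences.

    0.0 means identical k-mer content; 1.0 means no shared k-mers.
    (Hypothetical helper for illustration only.)
    """
    kmers_a = kmer_set(seq_a, k)
    kmers_b = kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        # Both sequences are shorter than k; treat them as indistinguishable.
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


# Toy example: the sets share 2 of 4 distinct 2-mers.
print(jaccard_distance("ACGT", "ACGA", k=2))  # 0.5
```

Because a k-mer that is missing from one sample counts toward the distance, gaps in an assembly or contaminating reads shift the result, which is the weakness the abstract attributes to k-mer-based measures.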

Genome Announcements | 2016

Genome Sequence of the Listeria monocytogenes Food Isolate HPB913, Collected in Canada in 1993

Arthur W. Pightling; Hugh Rand; Errol Strain; Franco Pagotto

Listeria monocytogenes is a pathogenic bacterium of importance to public health and food safety agencies. We present the genome sequence of the serotype 1/2a L. monocytogenes food isolate HPB913, which was collected in Canada in 1993 as part of an investigation into a sporadic case of foodborne illness.

Genome Announcements | 2016

Genome Sequence of Listeria monocytogenes Strain HPB2088 (Serotype 1/2a), an Environmental Isolate Collected in Canada in 1994

Arthur W. Pightling; Hugh Rand; Errol Strain; Franco Pagotto

Listeria monocytogenes is a foodborne pathogen that causes severe illness. Thus, ongoing efforts at real-time whole-genome sequencing are of utmost importance. However, it is also important that retrospective analyses that place these data into context be performed. Here, we present the genome sequence of strain HPB2088, which was collected in 1994.

Frontiers in Microbiology | 2018

Interpreting whole-genome sequence analyses of foodborne bacteria for regulatory applications and outbreak investigations

Arthur W. Pightling; James B. Pettengill; Yan Luo; Joseph D. Baugher; Hugh Rand; Errol Strain

Whole-genome sequence (WGS) analysis has revolutionized the food safety industry by enabling high-resolution typing of foodborne bacteria. Higher resolving power allows investigators to identify origins of contamination during illness outbreaks and regulatory activities quickly and accurately. Government agencies and industry stakeholders worldwide are now analyzing WGS data routinely. Although researchers have published many studies that assess the efficacy of WGS data analysis for source attribution, guidance for interpreting WGS analyses is lacking. Here, we provide the framework for interpreting WGS analyses used by the Food and Drug Administration’s Center for Food Safety and Applied Nutrition (CFSAN). We based this framework on the experiences of CFSAN investigators, collaborations and interactions with government and industry partners, and evaluation of the published literature. A fundamental question for investigators is whether two or more bacteria arose from the same source of contamination. Analysts often count the numbers of nucleotide differences [single-nucleotide polymorphisms (SNPs)] between two or more genome sequences to measure genetic distances. However, using SNP thresholds alone to assess whether bacteria originated from the same source can be misleading. Bacteria that are isolated from food, environmental, or clinical samples are representatives of bacterial populations. These populations are subject to evolutionary forces that can change genome sequences. Therefore, interpreting WGS analyses of foodborne bacteria requires a more sophisticated approach. Here, we present a framework for interpreting WGS analyses that combines SNP counts with phylogenetic tree topologies and bootstrap support. We also clarify the roles of WGS, epidemiological, traceback, and other evidence in forming the conclusions of investigations. Finally, we present examples that illustrate the application of this framework to real-world situations.
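
The SNP counts mentioned above can be illustrated with a small sketch that counts differing sites between two already-aligned sequences. It assumes equal-length aligned input and skips gaps and ambiguous bases; the function name `snp_distance` and the `ignore` parameter are hypothetical, and this is not the pipeline used by CFSAN.

```python
def snp_distance(aligned_a, aligned_b, ignore="-N"):
    """Count sites where two aligned sequences differ.

    Sites where either sequence has a gap or an ambiguous base
    (any character in `ignore`) are skipped rather than counted.
    (Hypothetical helper for illustration only.)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for base_a, base_b in zip(aligned_a, aligned_b)
        if base_a != base_b and base_a not in ignore and base_b not in ignore
    )


# Toy example: one real difference (last site); the N site is skipped.
print(snp_distance("ACNT", "ACGA"))  # 1
```

As the abstract cautions, a count like this is only one line of evidence and is meant to be read alongside phylogenetic tree topology and bootstrap support.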
