Publication


Featured research published by Paul Bodily.


Genome Medicine | 2013

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Jason O'Rawe; Tao Jiang; Guangqing Sun; Yiyang Wu; Wei Min Wang; Jingchu Hu; Paul Bodily; Lifeng Tian; Hakon Hakonarson; W. Evan Johnson; Zhi Wei; Kai Wang; Gholson J. Lyon

Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.

Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.

Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.

Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
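The concordance figures in the abstract above can be illustrated with a small sketch. It assumes concordance means the share of all distinct calls that every pipeline reports (intersection over union); the abstract does not spell out the exact definition, so this is an assumption. The pipeline names and variants below are made up for illustration.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of all distinct calls that every pipeline reports."""
    call_sets = list(calls.values())
    union = set().union(*call_sets)
    if not union:
        return 0.0
    shared = set.intersection(*call_sets)
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], name: str) -> float:
    """Fraction of all distinct calls reported only by pipeline `name`."""
    union = set().union(*calls.values())
    others = set().union(*(s for n, s in calls.items() if n != name))
    return len(calls[name] - others) / len(union) if union else 0.0


# Hypothetical call sets, not data from the study.
toy_calls = {
    "pipeline_a": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    "pipeline_b": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "pipeline_c": {("chr1", 100, "A", "G"), ("chr3", 75, "T", "C")},
}

print(concordance(toy_calls))                    # shared by all / all distinct calls
print(unique_fraction(toy_calls, "pipeline_a"))  # calls only pipeline_a makes
```

On this toy input one of four distinct variants is shared by all three pipelines, so the concordance is 0.25. Comparing indels this way would first require normalizing their coordinates, as the abstract notes.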


PLOS ONE | 2014

De novo genome assembly of the fungal plant pathogen Pyrenophora semeniperda.

Marcus Makina Soliai; Susan E. Meyer; David E. Elzinga; Russell A. Hermansen; Paul Bodily; Aaron Hart; Craig E. Coleman

Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen that has a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a biocontrol agent due to its effectiveness at killing seeds within the seed bank; however, few genetic resources exist for the fungus. Here, the genome of a P. semeniperda isolate, assembled from 454 pyrosequencing reads, is presented. The total assembly is 32.5 Mb and includes 11,453 gene models encoding putative proteins larger than 24 amino acids. The models represent a variety of putative genes that are involved in pathogenic pathways typically found in necrotrophic fungi. In addition, extensive rearrangements, including inter- and intrachromosomal rearrangements, were found when the P. semeniperda genome was compared to P. tritici-repentis, a related fungal species.


Molecular Ecology | 2017

Opsins have evolved under the permanent heterozygote model: insights from phylotranscriptomics of Odonata

Anton Suvorov; Nicholas O. Jensen; Camilla R. Sharkey; M. Stanley Fujimoto; Paul Bodily; Haley M. Cahill Wightman; T. Heath Ogden; Mark J. Clement; Seth M. Bybee

Gene duplication plays a central role in adaptation to novel environments by providing new genetic material for functional divergence and evolution of biological complexity. Several evolutionary models have been proposed for gene duplication to explain how new gene copies are preserved by natural selection, but these models have rarely been tested using empirical data. Opsin proteins, when combined with a chromophore, form a photopigment that is responsible for the absorption of light, the first step in the phototransduction cascade. Adaptive gene duplications have occurred many times within the animal opsins’ gene family, leading to novel wavelength sensitivities. Consequently, opsins are an attractive choice for the study of gene duplication evolutionary models. Odonata (dragonflies and damselflies) have the largest opsin repertoire of any insect currently known. Additionally, there is tremendous variation in opsin copy number between species, particularly in the long‐wavelength‐sensitive (LWS) class. Using comprehensive phylotranscriptomic and statistical approaches, we tested various evolutionary models of gene duplication. Our results suggest that both the blue‐sensitive (BS) and LWS opsin classes were subjected to strong positive selection that greatly weakens after multiple duplication events, a pattern that is consistent with the permanent heterozygote model. Due to the immense interspecific variation and duplicability potential of opsin genes among odonates, they represent a unique model system to test hypotheses regarding opsin gene duplication and diversification at the molecular level.


BMC Bioinformatics | 2016

A novel approach for multi-SNP GWAS and its application in Alzheimer's disease

Paul Bodily; M. Stanley Fujimoto; Justin T. Page; Mark J. Clement; Mark T. W. Ebbert; Perry G. Ridge

Background: Genome-wide association studies (GWAS) have effectively identified genetic factors for many diseases. Many diseases, including Alzheimer’s disease (AD), have epistatic causes, requiring more sophisticated analyses to identify groups of variants which together affect phenotype.

Results: Based on the GWAS statistical model, we developed a multi-SNP GWAS analysis to identify pairs of variants whose common occurrence signaled the Alzheimer’s disease phenotype.

Conclusions: Despite not having sufficient data to demonstrate significance, our preliminary experimentation identified a high correlation between GRIA3 and HLA-DRB5 (an AD gene). GRIA3 has not been previously reported in association with AD, but is known to play a role in learning and memory.
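The idea of scoring pairs of variants by their joint occurrence can be sketched as follows. The abstract does not give the test statistic, so this sketch assumes a plain Pearson chi-square on a 2x2 table (carries a variant allele at both SNPs or not, case or control). The function names, SNP labels, and genotypes are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def pair_scores(genotypes: Dict[str, List[int]],
                is_case: List[bool]) -> List[Tuple[float, str, str]]:
    """Score every SNP pair by how strongly joint carriage tracks case status."""
    scores = []
    for snp1, snp2 in combinations(sorted(genotypes), 2):
        a = b = c = d = 0
        for g1, g2, case in zip(genotypes[snp1], genotypes[snp2], is_case):
            both = g1 > 0 and g2 > 0  # variant allele present at both SNPs
            if both and case:
                a += 1
            elif both:
                b += 1
            elif case:
                c += 1
            else:
                d += 1
        scores.append((chi_square_2x2(a, b, c, d), snp1, snp2))
    return sorted(scores, reverse=True)


# Hypothetical genotypes (count of variant alleles) for eight individuals.
toy_is_case = [True, True, True, True, False, False, False, False]
toy_genotypes = {
    "snp_a": [1, 1, 2, 1, 1, 0, 1, 0],
    "snp_b": [1, 2, 1, 1, 0, 1, 0, 1],
    "snp_c": [0, 1, 0, 1, 0, 1, 0, 1],
}

for score, s1, s2 in pair_scores(toy_genotypes, toy_is_case):
    print(f"{s1} + {s2}: chi-square = {score:.3f}")
```

Because the number of pairs grows quadratically with the number of SNPs, any real analysis of this kind needs a correspondingly strict multiple-testing correction, which may be one reason the abstract reports insufficient data to demonstrate significance.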


Bioinformatics | 2015

ScaffoldScaffolder: solving contig orientation via bidirected to directed graph reduction

Paul Bodily; M. Stanley Fujimoto; Quinn Snell; Dan Ventura; Mark J. Clement

Motivation: The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy approach with several other heuristic solutions.

Results: Our results suggest that our greedy heuristic algorithm not only works well but also outperforms the other algorithms due to the nature of scaffold graphs. Our results also demonstrate a novel method for identifying inverted repeats and inversion variants, both of which contradict the basic single-orientation assumption. Such inversions have previously been noted as being difficult to detect and are directly involved in the genetic mechanisms of several diseases.

Availability and implementation: http://bioresearch.byu.edu/scaffoldscaffolder.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
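A greedy approach to contig orientation can be sketched as follows. This is a generic illustration on a simplified signed graph, where each link says two contigs share an orientation or have opposite ones. It is not the ScaffoldScaffolder algorithm and does not use the paper's bidirected formulation. The edge format and the function name `greedy_orient` are assumptions. Links are taken in order of decreasing weight, and a link is discarded when it contradicts orientations already fixed by heavier links.

```python
from typing import Dict, List, Tuple

# (contig_u, contig_v, same_strand, weight): same_strand is True when the
# linking evidence implies both contigs share an orientation.
Edge = Tuple[str, str, bool, float]


def greedy_orient(edges: List[Edge]) -> Tuple[Dict[str, int], List[Edge]]:
    """Return (orientation per contig as 0/1, edges discarded as inconsistent)."""
    parent: Dict[str, str] = {}
    flip: Dict[str, int] = {}  # parity of a node relative to its parent

    def find(x: str) -> Tuple[str, int]:
        """Return (root of x, parity of x relative to that root)."""
        parent.setdefault(x, x)
        flip.setdefault(x, 0)
        parity = 0
        while parent[x] != x:
            parity ^= flip[x]
            x = parent[x]
        return x, parity

    discarded: List[Edge] = []
    for edge in sorted(edges, key=lambda e: e[3], reverse=True):
        u, v, same, _ = edge
        (ru, pu), (rv, pv) = find(u), find(v)
        want = 0 if same else 1  # required parity between u and v
        if ru != rv:
            # Merge the two components so that this edge is satisfied.
            parent[ru] = rv
            flip[ru] = pu ^ pv ^ want
        elif (pu ^ pv) != want:
            # Contradicts heavier edges that were already accepted.
            discarded.append(edge)

    orientation = {node: find(node)[1] for node in parent}
    return orientation, discarded


# Hypothetical scaffold links: the weakest one conflicts with the other two.
toy_edges: List[Edge] = [
    ("c1", "c2", True, 10.0),
    ("c2", "c3", False, 8.0),
    ("c1", "c3", True, 1.0),
]

orientation, discarded = greedy_orient(toy_edges)
print(orientation)
print(discarded)
```

In this simplified view, discarded links are candidates for the kind of orientation conflicts the abstract associates with inverted repeats and inversion variants. Whether a particular conflict reflects real biology or noisy evidence would need separate validation.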


BMC Bioinformatics | 2014

Effects of error-correction of heterozygous next-generation sequencing data

M. Stanley Fujimoto; Paul Bodily; Nozomu Okuda; Mark J. Clement; Quinn Snell

Background: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two different error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes.

Results: Quake and ECHO perform well and were able to correct many errors found within the data. However, errors that occur at heterozygous positions had unique trends. Errors at these positions were sometimes corrected incorrectly, introducing errors into the dataset with the possibility of creating a chimeric read. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO resulted in more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. Using real E. coli sequencing data and their assemblies after error correction, the assembly statistics improved. It was also found that segregating reads by haplotype can improve the quality of an assembly.

Conclusions: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype-specific analysis, new tools that are designed to be haplotype-aware are necessary that do not have the weaknesses of Quake and ECHO.
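One plausible reason heterozygous positions are hard for error correctors can be shown with a toy k-mer count. This sketch assumes a corrector that treats low-count k-mers as untrusted. It is a simplification for illustration and is not the actual algorithm of Quake or ECHO. The haplotype sequences, the value of k, and the cutoff are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_kmers(counts: Counter, cutoff: int) -> List[str]:
    """k-mers seen fewer than `cutoff` times, i.e. suspected errors."""
    return sorted(kmer for kmer, n in counts.items() if n < cutoff)


# Two hypothetical haplotypes that differ at a single base (A vs T).
hap1 = "ACGTACGGTC"
hap2 = "ACGTTCGGTC"
# Error-free reads: each haplotype sampled three times.
toy_reads = [hap1] * 3 + [hap2] * 3

counts = kmer_counts(toy_reads, k=4)
print(counts["ACGT"])  # k-mer shared by both haplotypes
print(counts["GTAC"])  # k-mer spanning the heterozygous base on hap1
print(untrusted_kmers(counts, cutoff=4))
```

Even though these reads contain no sequencing errors, every k-mer spanning the heterozygous base appears at half the depth of the shared k-mers and falls below the cutoff. A corrector tuned for homozygous data could therefore rewrite one allele to match the other.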


International Conference on Bioinformatics | 2014

Haplotype-centered mapping for improved alignments and genetic association studies

Paul Bodily; Mark J. Clement; Quinn Snell; M. Stanley Fujimoto; Perry G. Ridge

Next-generation sequencing experiments have been used to identify genotypes that are associated with many medical conditions. An important part of next-generation read processing is the mapping of short reads to a reference genome. Although many algorithms have been created to perform this mapping, there are many reads that cannot be mapped because they are sequenced from low-complexity regions of the genome (repeat regions) or from regions that are divergent from the reference genome. This research shows that when reads are first assembled into longer contigs that are then mapped to the reference genome, mapping efficiency and accuracy increase. When two contigs map to the same location, the contigs can provide haplotype information that can be used to perform association studies based on phased SNPs on a haplotype.


International Conference on Bioinformatics | 2013

Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly

Paul Bodily; Mark J. Clement; Quinn Snell; Jared C. Price; Stanley Fujimoto; Nozomu Okuda

In the context of genome assembly, the contig orientation problem is described as the problem of removing sufficient edges from the scaffold graph so that the remaining subgraph assigns a consistent orientation to all sequence nodes in the graph. This problem can also be phrased as a weighted MAX-CUT problem. The performance of MAX-CUT heuristics in this application is untested. We present a greedy heuristic solution to the contig orientation problem and compare its performance to a weighted MAX-CUT semi-definite programming heuristic solution on several graphs. We note that the contig orientation problem can be used to identify inverted repeats and inverted haplotypes, as these represent sequences whose orientation appears ambiguous in the conventional genome assembly framework.
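For readers unfamiliar with weighted MAX-CUT, the objective can be sketched with a simple one-flip local search. This is neither the semi-definite programming heuristic nor the greedy heuristic compared in the paper. It also assumes, for illustration only, that the two sides of the cut stand for the two possible orientations. The node names and weights are hypothetical.

```python
from typing import Dict, List, Tuple

WeightedEdge = Tuple[str, str, float]


def cut_weight(edges: List[WeightedEdge], side: Dict[str, int]) -> float:
    """Total weight of edges whose endpoints fall on different sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def local_search_max_cut(edges: List[WeightedEdge]) -> Dict[str, int]:
    """Flip single nodes while any flip increases the cut weight."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    side = {n: 0 for n in nodes}
    improved = True
    while improved:
        improved = False
        for n in nodes:
            # Gain of flipping n: uncut incident edges become cut,
            # and cut incident edges become uncut.
            gain = 0.0
            for u, v, w in edges:
                if n in (u, v) and u != v:
                    gain += w if side[u] == side[v] else -w
            if gain > 0:
                side[n] ^= 1
                improved = True
    return side


# Hypothetical triangle: at most two of its three edges can be cut.
toy_graph: List[WeightedEdge] = [("a", "b", 5.0), ("b", "c", 4.0), ("a", "c", 1.0)]

sides = local_search_max_cut(toy_graph)
print(sides, cut_weight(toy_graph, sides))
```

Local search of this kind stops at a local optimum and offers no general optimality guarantee, which is one reason stronger heuristics such as semi-definite programming relaxations are studied for MAX-CUT.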


International Conference on Bioinformatics | 2014

A structured approach to ensemble learning for Alzheimer's disease prediction

Matthew Seeley; Mark J. Clement; Christophe G. Giraud-Carrier; Quinn Snell; Paul Bodily; Stanley Fujimoto

This research employs an exhaustive search of different attribute selection algorithms in order to provide a more structured approach to learning design for prediction of Alzheimer's clinical dementia rating (CDR).


International Parallel and Distributed Processing Symposium | 2012

Parallel Pair-HMM SNP Detection

Nathan L. Clement; Brent A. Shepherd; Paul Bodily; Sukhbat Tumur-Ochir; Younghoon Gim; Quinn Snell; Mark J. Clement; W. Evan Johnson

Motivation: Due to the massive amounts of data generated from each instrument run, next generation sequencing technologies have presented researchers with unique analytical challenges which require innovative, computationally efficient statistical solutions. Here we present a parallel implementation of a probabilistic Pair-Hidden Markov Model for base calling and SNP detection in next generation sequencing data. Our approach incorporates multiple sources of error into the base calling procedure, which leads to more accurate results. In addition, our approach applies a likelihood ratio test that provides researchers with straightforward SNP calling cutoffs based on a p-value cutoff or a false discovery control.

Results: We have developed GNUMAP-SNP, which is a highly accurate approach for the identification of SNPs in next generation sequencing data. By utilizing a novel probabilistic Pair-Hidden Markov Model, GNUMAP-SNP effectively accounts for uncertainty in the read calls as well as read mapping in an unbiased fashion. Our results show that GNUMAP-SNP has both high sensitivity and high specificity throughout the genome, which is especially true in repeat regions or in areas with low read coverage. In addition, we propose a statistical framework that accounts for the background noise using straightforward statistical cutoffs that filter out false-positive results. The parallel implementation of SNP calling achieves near linear speedup on distributed memory or shared memory platforms.

Availability: GNUMAP-SNP is available as a module in the GNUMAP probabilistic read mapping software. GNUMAP is freely available for download at: http://dna.cs.byu.edu/gnumap/.
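The idea of calling SNPs with a likelihood ratio test and a p-value cutoff can be sketched as follows. This is not GNUMAP-SNP's Pair-HMM model. It assumes a much simpler binomial model in which non-reference reads at a site arise from sequencing error alone under the null hypothesis. The error rate, the cutoff, and the chi-square approximation with one degree of freedom are all assumptions made for illustration.

```python
import math


def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood, omitting the constant log C(n, k)."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
    return k * math.log(p) + (n - k) * math.log(1 - p)


def lrt_p_value(alt_reads: int, depth: int, error_rate: float = 0.01) -> float:
    """P-value for H0: non-reference reads arise only from sequencing error."""
    if depth == 0:
        return 1.0
    # One-sided alternative: only an excess of non-reference reads counts.
    p_hat = max(alt_reads / depth, error_rate)
    stat = 2 * (binom_loglik(alt_reads, depth, p_hat)
                - binom_loglik(alt_reads, depth, error_rate))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2))


def call_snp(alt_reads: int, depth: int, alpha: float = 1e-3) -> bool:
    """Call a SNP when the p-value falls below the (assumed) cutoff alpha."""
    return lrt_p_value(alt_reads, depth) < alpha


for alt, depth in [(0, 30), (1, 30), (15, 30)]:
    print(alt, depth, lrt_p_value(alt, depth), call_snp(alt, depth))
```

A single non-reference read out of 30 is consistent with sequencing error under this model, while 15 out of 30 is not. With p-values computed per site, a false discovery rate procedure could be applied across sites in place of a fixed cutoff, in line with the false discovery control the abstract mentions.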

Collaboration


Dive into Paul Bodily's collaborations.

Top Co-Authors

Quinn Snell

Brigham Young University

Dan Ventura

Brigham Young University

Cole A. Lyman

Brigham Young University

Gholson J. Lyon

Cold Spring Harbor Laboratory

Lifeng Tian

Children's Hospital of Philadelphia

Zhi Wei

New Jersey Institute of Technology
