Michael Stromberg | Researchain

Publication

Featured research published by Michael Stromberg.

Nature | 2011

Mapping copy number variation by population-scale genome sequencing

Ryan E. Mills; Klaudia Walter; Chip Stewart; Robert E. Handsaker; Ken Chen; Can Alkan; Alexej Abyzov; Seungtai Yoon; Kai Ye; R. Keira Cheetham; Asif T. Chinwalla; Donald F. Conrad; Yutao Fu; Fabian Grubert; Iman Hajirasouliha; Fereydoun Hormozdiari; Lilia M. Iakoucheva; Zamin Iqbal; Shuli Kang; Jeffrey M. Kidd; Miriam K. Konkel; Joshua M. Korn; Ekta Khurana; Deniz Kural; Hugo Y. K. Lam; Jing Leng; Ruiqiang Li; Yingrui Li; Chang-Yun Lin; Ruibang Luo

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

Nature Methods | 2008

Whole-genome sequencing and variant discovery in C. elegans

LaDeana W. Hillier; Gabor T. Marth; Aaron R. Quinlan; David J. Dooling; Ginger Fewell; Derek Barnett; Paul Fox; Jarret Glasscock; Matthew Hickenbotham; Weichun Huang; Vincent Magrini; Ryan Richt; Sacha Sander; Donald A Stewart; Michael Stromberg; Eric F. Tsung; Todd Wylie; Tim Schedl; Richard Wilson; Elaine R. Mardis

Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Owing to the short-read-length sequences produced, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.

Genome Research | 2008

Rapid whole-genome mutational profiling using next-generation sequencing technologies

Douglas R. Smith; Aaron R. Quinlan; Heather E. Peckham; Kathryn Makowsky; Wei Tao; Betty Woolf; Lei Shen; William F. Donahue; Nadeem Tusneem; Michael Stromberg; Donald A Stewart; Lu Zhang; Swati Ranade; Jason Warner; Clarence Lee; Brittney E. Coleman; Zheng Zhang; Stephen F. McLaughlin; Joel A. Malek; Jon M. Sorenson; Alan Blanchard; Jarrod Chapman; David Hillman; Feng Chen; Daniel S. Rokhsar; Kevin McKernan; Thomas W. Jeffries; Gabor T. Marth; Paul M. Richardson

Forward genetic mutational studies, adaptive evolution, and phenotypic screening are powerful tools for creating new variant organisms with desirable traits. However, mutations generated in the process cannot be easily identified with traditional genetic tools. We show that new high-throughput, massively parallel sequencing technologies can completely and accurately characterize a mutant genome relative to a previously sequenced parental (reference) strain. We studied a mutant strain of Pichia stipitis, a yeast capable of converting xylose to ethanol. This unusually efficient mutant strain was developed through repeated rounds of chemical mutagenesis, strain selection, transformation, and genetic manipulation over a period of seven years. We resequenced this strain on three different sequencing platforms. Surprisingly, we found fewer than a dozen mutations in open reading frames. All three sequencing technologies were able to identify each single nucleotide mutation given at least 10-15-fold nominal sequence coverage. Our results show that detecting mutations in evolved and engineered organisms is rapid and cost-effective at the whole-genome level using new sequencing technologies. Identification of specific mutations in strains with altered phenotypes will add insight into specific gene functions and guide further metabolic engineering efforts.

Bioinformatics | 2011

BamTools: a C++ API and toolkit for analyzing and managing BAM files

Derek Barnett; Erik K. Garrison; Aaron R. Quinlan; Michael Stromberg; Gabor T. Marth

MOTIVATION Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. RESULTS We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. AVAILABILITY BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.

Nature Methods | 2008

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Aaron R. Quinlan; Donald A Stewart; Michael Stromberg; Gabor T. Marth

Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.

PLOS Genetics | 2011

A Comprehensive Map of Mobile Element Insertion Polymorphisms in Humans

Chip Stewart; Deniz Kural; Michael Stromberg; Jerilyn A. Walker; Miriam K. Konkel; Adrian M. Stütz; Alexander E. Urban; Fabian Grubert; Hugo Y. K. Lam; Wan Ping Lee; Michele A. Busby; Amit Indap; Erik Garrison; Chad D. Huff; Jinchuan Xing; Michael Snyder; Lynn B. Jorde; Mark A. Batzer; Jan O. Korbel; Gabor T. Marth

As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.

PLOS ONE | 2014

MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping

Wan Ping Lee; Michael Stromberg; Alistair Ward; Chip Stewart; Erik Garrison; Gabor T. Marth

MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
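The Smith-Waterman step named in the abstract above can be illustrated with a minimal sketch of textbook local alignment. This is not MOSAIK's implementation: the hash clustering stage is omitted, and the function name `smith_waterman` and the scoring values (match, mismatch, and linear gap penalties) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Textbook Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not MOSAIK's parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best local alignment score ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # A short read placed inside a longer reference fragment.
    print(smith_waterman("ACGT", "TTTACGTTTT"))
```

In a hash-based mapper, a dynamic-programming step like this would typically be run only on the candidate reference regions that the seeding stage selects, rather than against the whole genome.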

Bioinformatics | 2013

Isaac: Ultra-fast whole genome secondary analysis on Illumina sequencing platforms

Come Raczy; Roman Petrovski; Christopher T. Saunders; Ilya Chorny; Semyon Kruglyak; Elliott H. Margulies; Han-Yu Chuang; Morten Källberg; Swathi A. Kumar; Arnold Liao; Kristina M. Little; Michael Stromberg; Stephen Tanner

SUMMARY An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio conflict rates and sensitivity. We further show that Isaac is effective in the detection of disease-causing variants and can be run easily and economically on commodity hardware. AVAILABILITY Isaac has an open source license and can be obtained at https://github.com/sequencing.

BMC Genomics | 2011

Expression divergence measured by transcriptome sequencing of four yeast species

Michele A. Busby; Jesse M. Gray; Allen M. Costa; Chip Stewart; Michael Stromberg; Derek Barnett; Jeffrey H. Chuang; Michael Springer; Gabor T. Marth

BACKGROUND The evolution of gene expression is a challenging problem in evolutionary biology, for which accurate, well-calibrated measurements and methods are crucial. RESULTS We quantified gene expression with whole-transcriptome sequencing in four diploid, prototrophic strains of Saccharomyces species grown under the same condition to investigate the evolution of gene expression. We found that variation in expression is gene-dependent, with large variations in each gene's expression between replicates of the same species. This confounds the identification of genes differentially expressed across species. To address this, we developed a statistical approach to establish significance bounds for inter-species differential expression in RNA-Seq data based on the variance measured across biological replicates. This metric estimates the combined effects of technical and environmental variance, as well as Poisson sampling noise, by isolating each component. Despite a paucity of large expression changes, we found a strong correlation between the variance of gene expression change and species divergence (R² = 0.90). CONCLUSION We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects. GEO Accession Number: GSE32679

International Conference on Bioinformatics | 2017

Nirvana: Clinical Grade Variant Annotator

Michael Stromberg; Rajat Roy; Julien Lajugie; Yu Jiang; Haochen Li; Elliott H. Margulies

Sequencing an individual genome typically produces approximately three million variants compared to the human reference genome. The consequence for each variant depends on the location and nature of the variant and is a key question for genetic analysts performing clinical diagnosis. Variant annotation describes how a variant affects the sample's genome. These annotations include the functional consequence on the different transcripts for a gene or in proximal regulatory regions. Annotation also includes additional data on what is known about a given variant that can help in understanding its relevance to a given line of investigation. Often this data is provided by different sources and contains allele frequencies for different populations, clinical implications, relevance to cancer types, additional studies, etc. Ultimately this information helps clinicians interpret variants when providing a diagnosis. The three most widely used open source annotation tools are VEP, SnpEff and AnnoVar. VEP is widely considered to be the most accurate of the three, but is also slower than both SnpEff and AnnoVar. When annotating the variants from a 30x genome (NA12878), VEP finished in 18 hours, whereas SnpEff 4.3g and AnnoVar finished in 15 min and 67 min respectively using one core. We present Nirvana, an open source clinical variant annotator that is both accurate (over 99.9% concordance with VEP) and fast (takes 7 min to annotate NA12878). Nirvana is used in all of Illumina's relevant analysis pipelines and is tested rigorously to ensure adherence to clinical standards.
