Noah Spies | Researchain

Publication

Featured research published by Noah Spies.

Scientific Data | 2016

Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Justin M. Zook; David N. Catoe; Jennifer H. McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E. Mason; Noah Alexander; Elizabeth Henaff; Alexa B. R. McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric E. Schadt; Robert Sebra; Ali Bashir; Rebecca Truty; Christopher C. Chang; Natali Gulbahce

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

Trends in Genetics | 2015

Concepts in solid tumor evolution.

Arend Sidow; Noah Spies

Evolutionary mechanisms in cancer progression give tumors their individuality. Cancer evolution is different from organismal evolution, however, and we discuss where concepts from evolutionary genetics are useful or limited in facilitating an understanding of cancer. Based on these concepts we construct and apply the simplest plausible model of tumor growth and progression. Simulations using this simple model illustrate the importance of stochastic events early in tumorigenesis, highlight the dominance of exponential growth over linear growth and differentiation, and explain the clonal substructure of tumors.
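The abstract does not spell out the model, so the following is only an illustrative sketch of one very simple growth process in the same spirit: a discrete-generation branching process in which every cell either divides or dies. The function name `simulate_lineage` and the parameters `p_divide` and `max_cells` are assumptions made for this example and are not taken from the paper.

```python
import random


def simulate_lineage(p_divide, generations, seed=None, start_cells=1, max_cells=100_000):
    """Simulate a simple branching process (illustrative, not the paper's model).

    In each generation every cell independently divides into two with
    probability p_divide and otherwise dies. Returns the population size
    per generation, starting with the initial size.
    """
    rng = random.Random(seed)
    cells = start_cells
    history = [cells]
    for _ in range(generations):
        # Stop once the lineage is extinct or has grown past the cap.
        if cells == 0 or cells >= max_cells:
            break
        dividing = sum(1 for _ in range(cells) if rng.random() < p_divide)
        cells = 2 * dividing
        history.append(cells)
    return history


if __name__ == "__main__":
    # Repeated runs with the same division probability diverge early:
    # some lineages die out after a few generations, others grow exponentially.
    for seed in range(5):
        print(seed, simulate_lineage(0.6, 12, seed=seed))
```

In this toy process a lineage founded by a single cell goes extinct with probability (1 - p_divide) / p_divide when p_divide exceeds 0.5, so chance events in the first few generations largely decide whether a lineage survives at all; lineages that survive then grow roughly exponentially.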

Bioinformatics | 2015

svviz: a read viewer for validating structural variants

Noah Spies; Justin M. Zook; Marc L. Salit; Arend Sidow

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching input bam(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele, compared with the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manual refinement of breakpoints. svviz supports data from most modern sequencing platforms. Availability and implementation: svviz is implemented in Python and freely available from http://svviz.github.io/.
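The realign-and-compare step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not svviz's actual code: the pure-Python Smith-Waterman scorer, the scoring parameters, the `min_margin` threshold, and the function names are all hypothetical choices made for the example.

```python
def smith_waterman_score(read, target, match=2, mismatch=-3, gap=-4):
    """Return the best local alignment score of read against target.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix. Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == target[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def classify_read(read, ref_allele, alt_allele, min_margin=4):
    """Assign a read to 'ref', 'alt', or 'ambiguous' by comparing scores.

    min_margin is a hypothetical threshold: the read is assigned to an
    allele only if it scores at least this much better there.
    """
    ref_score = smith_waterman_score(read, ref_allele)
    alt_score = smith_waterman_score(read, alt_allele)
    if alt_score - ref_score >= min_margin:
        return "alt"
    if ref_score - alt_score >= min_margin:
        return "ref"
    return "ambiguous"


# Toy deletion: the variant allele lacks the middle segment.
left = "ACGTTGCAAGCTTAGGCT"
deleted = "TTTTTCCCCCGGGGGAAAAA"
right = "CATGGATCCGTACGATCA"
ref_allele = left + deleted + right
alt_allele = left + right

if __name__ == "__main__":
    spanning_read = left[-10:] + right[:10]   # crosses the deletion breakpoint
    reference_read = ref_allele[10:30]        # runs into the deleted segment
    flanking_read = left[:16]                 # lies entirely in shared sequence
    for read in (spanning_read, reference_read, flanking_read):
        print(read, classify_read(read, ref_allele, alt_allele))
```

Reads that lie entirely in sequence shared by both alleles score the same against each and are reported as ambiguous, which is why only reads near the breakpoints carry evidence for or against the variant.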

Genome Medicine | 2015

Cell-lineage heterogeneity and driver mutation recurrence in pre-invasive breast neoplasia

Ziming Weng; Noah Spies; Shirley Zhu; Daniel E. Newburger; Dorna Kashef-Haghighi; Serafim Batzoglou; Arend Sidow; Robert B. West

Background: All cells in an individual are related to one another by a bifurcating lineage tree, in which each node is an ancestral cell that divided into two, each branch connects two nodes, and the root is the zygote. When a somatic mutation occurs in an ancestral cell, all its descendants carry the mutation, which can then serve as a lineage marker for the phylogenetic reconstruction of tumor progression. Using this concept, we investigate cell lineage relationships and genetic heterogeneity of pre-invasive neoplasias compared to invasive carcinomas. Methods: We deeply sequenced over a thousand phylogenetically informative somatic variants in 66 morphologically independent samples from six patients that represent a spectrum of normal, early neoplasia, carcinoma in situ, and invasive carcinoma. For each patient, we obtained a highly resolved lineage tree that establishes the phylogenetic relationships among the pre-invasive lesions and with the invasive carcinoma. Results: The trees reveal lineage heterogeneity of pre-invasive lesions, both within the same lesion, and between histologically similar ones. On the basis of the lineage trees, we identified a large number of independent recurrences of PIK3CA H1047 mutations in separate lesions in four of the six patients, often separate from the diagnostic carcinoma. Conclusions: Our analyses demonstrate that multi-sample phylogenetic inference provides insights on the origin of driver mutations, lineage heterogeneity of neoplastic proliferations, and the relationship of genomically aberrant neoplasias with the primary tumors. PIK3CA driver mutations may be comparatively benign inducers of cellular proliferation.

eLife | 2015

Constraint and divergence of global gene expression in the mammalian embryo

Noah Spies; Cheryl L. Smith; Jesse M. Rodriguez; Julie C. Baker; Serafim Batzoglou; Arend Sidow

The effects of genetic variation on gene regulation in the developing mammalian embryo remain largely unexplored. To globally quantify these effects, we crossed two divergent mouse strains and asked how genotype of the mother or of the embryo drives gene expression phenotype genomewide. Embryonic expression of 331 genes depends on the genotype of the mother. Embryonic genotype controls allele-specific expression of 1594 genes and a highly overlapping set of cis-expression quantitative trait loci (eQTL). A marked paucity of trans-eQTL suggests that the widespread expression differences do not propagate through the embryonic gene regulatory network. The cis-eQTL genes exhibit lower-than-average evolutionary conservation and are depleted for developmental regulators, consistent with purifying selection acting on expression phenotype of pattern formation genes. The widespread effect of maternal and embryonic genotype in conjunction with the purifying selection we uncovered suggests that embryogenesis is an important and understudied reservoir of phenotypic variation. DOI: http://dx.doi.org/10.7554/eLife.05538.001

bioRxiv | 2018

genomeview - an extensible python-based genomics visualization engine

Noah Spies; Justin M. Zook; Arend Sidow; Marc L. Salit

Visual inspection and analysis is integral to quality control, hypothesis generation, methods development and validation of genomic data. The richness and complexity of genomic data necessitates customized visualizations highlighting specific features of interest while hiding the often vast tide of irrelevant attributes. However, the majority of genome visualization occurs either in general-purpose tools such as IGV (Robinson et al, 2011) or the UCSC Genome Browser (Kent et al, 2002), which offer many options to adjust visualization parameters but very little in the way of extensibility, or in narrowly focused tools aiming to solve a single visualization problem. Here, we present genomeview, a Python-based visualization engine which is easy to extend and simple to integrate into existing analysis pipelines.

bioRxiv | 2018

Comprehensive, Integrated, and Phased Whole-Genome Analysis of the Primary ENCODE Cell Line K562

Bo Zhou; Steve S. Ho; Xiaowei Zhu; Xianglong Zhang; Noah Spies; Seunggyu Byeon; Joseph G. Arthur; Reenal Pattni; Noa Ben-Efraim; Michael S. Haney; Rajini R Haraksingh; Giltae Song; Dimitri Perrin; Wing Hung Wong; Alexej Abyzov; Alexander E. Urban

K562 is one of the most widely used cell lines in biomedical research. It is one of three tier-one cell lines of ENCODE, and one of the cell lines most commonly used for large-scale CRISPR/Cas9 gene-editing screens. Although the functional genomic and epigenomic characteristics of K562 are extensively studied, its genome sequence has never been comprehensively analyzed, and higher-order structural features of its genome beyond its karyotype were only cursorily known. The high degree of aneuploidy in K562 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from K562 requires an understanding of the cell line’s genome sequence and genome structure. We performed deep short-insert whole-genome sequencing, mate-pair sequencing, linked-read sequencing, karyotyping, and array CGH and used a combination of novel and established computational methods to identify and catalog a wide spectrum of genome sequence variants and genome structural features in K562: copy numbers (CN) by chromosome segments, SNVs and Indels (allele frequency-corrected by CN), phased haplotype blocks (N50 = 2.72 Mb), structural variants (SVs) including complex genomic rearrangements, and novel mobile element insertions. A large fraction of SVs was also phased, sequence assembled, and experimentally validated. Many chromosomes show striking loss of heterozygosity. To demonstrate the utility of this knowledge, we re-analyzed K562 RNA-Seq and whole-genome bisulfite sequencing data to detect and phase allele-specific expression and DNA methylation patterns, respectively. Furthermore, we used the haplotype information to produce a phased CRISPR targeting map, i.e. a catalog of loci where CRISPR guide RNAs will bind in an allele-specific manner. Finally, we show examples where deeper insights into genomic regulatory complexity could be gained by taking knowledge of genomic structural contexts into account. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 and as the basis of advanced analyses of the rich amounts of the functional genomics data produced by ENCODE for K562. It is also an example for advanced, integrated whole-genome sequence and structure analysis, beyond standard short-read/short-insert whole-genome sequencing, of human genomes in general and in particular of cancer genomes with large numbers of complex sequence alterations.

bioRxiv | 2018

Haplotype-resolved and integrated genome analysis of ENCODE cell line HepG2

Bo Zhou; Steve S. Ho; Stephanie U. Greer; Noah Spies; John M. Bell; Xianglong Zhang; Xiaowei Zhu; Joseph G. Arthur; Seunggyu Byeon; Reenal Pattni; Ishan Saha; Giltae Song; Hanlee P. Ji; Dimitri Perrin; Wing Hung Wong; Alexej Abyzov; Alexander E. Urban

HepG2 is one of the most widely used human cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher-order structural features of its genome beyond its karyotype were only cursorily known. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line’s genome sequence and genome structure. We performed deep whole-genome sequencing, mate-pair sequencing and linked-read sequencing to identify a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments, SNVs and Indels (both corrected for copy-number), phased haplotype blocks, structural variants (SVs) including complex genomic rearrangements, and novel mobile element insertions. A large number of SVs were phased, sequence assembled and experimentally validated. Several chromosomes show striking loss of heterozygosity. We re-analyzed HepG2 RNA-Seq and whole-genome bisulfite sequencing data for allele-specific expression and phased DNA methylation. We show examples where deeper insights into genomic regulatory complexity could be gained by taking knowledge of genomic structural contexts into account. Furthermore, we used the haplotype information to produce an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize HepG2.

BMC Genomics | 2016

svclassify: a method to establish benchmark structural variant calls

Hemang Parikh; Marghoob Mohiyuddin; Hugo Y. K. Lam; Hariharan K. Iyer; Desu Chen; Mark Pratt; Gabor Bartha; Noah Spies; Wolfgang Losert; Justin M. Zook; Marc L. Salit

Nature Biotechnology | 2017

Genome-wide reconstruction of complex structural variants using read clouds

Noah Spies; Ziming Weng; Alex Bishara; Jennifer H. McDaniel; David N. Catoe; Justin M. Zook; Marc L. Salit; Robert B. West; Serafim Batzoglou; Arend Sidow
