Joshua Z. Levin | Researchain

Publication

Featured research published by Joshua Z. Levin.

Nature Biotechnology | 2011

Full-length transcriptome assembly from RNA-Seq data without a reference genome

Manfred Grabherr; Brian J. Haas; Moran Yassour; Joshua Z. Levin; Dawn Anne Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica Di Palma; Bruce Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev

Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
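
To make the de Bruijn graph idea in this abstract concrete, here is a toy Python sketch, assuming error-free reads and a single small k. It is not Trinity's implementation, which builds and analyzes many graphs and resolves branches into isoforms; the function names `build_de_bruijn` and `walk_unambiguous` are hypothetical. Nodes are (k-1)-mers, every k-mer in a read contributes an edge, and a contig is extended for as long as the path stays unbranched.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    # .get() avoids inserting empty entries into the defaultdict.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)
print(walk_unambiguous(graph, "ATG"))
```

With k = 4, the overlapping reads `ATGGCGT` and `GGCGTGC` collapse into the single contig `ATGGCGTGC`. A real assembler must also cope with sequencing errors, branches caused by alternative splicing, and reverse complements, all of which this sketch ignores.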

Nature Biotechnology | 2010

Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs

Mitchell Guttman; Manuel Garber; Joshua Z. Levin; Julie Donaghey; James Robinson; Xian Adiconis; Lin Fan; Magdalena J. Koziol; Andreas Gnirke; Chad Nusbaum; John L. Rinn; Eric S. Lander; Aviv Regev

Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

Nature | 2013

Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

Alex K. Shalek; Rahul Satija; Xian Adiconis; Rona S. Gertner; Jellert T. Gaublomme; Raktima Raychowdhury; Schraga Schwartz; Nir Yosef; Christine M. Malboeuf; Diana Lu; John J. Trombetta; Dave Gennert; Andreas Gnirke; Alon Goren; Nir Hacohen; Joshua Z. Levin; Hongkun Park; Aviv Regev

Recent molecular studies have shown that, even when derived from a seemingly homogeneous population, individual cells can exhibit substantial differences in gene expression, protein levels and phenotypic output, with important functional consequences. Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs or proteins simultaneously, because genomic profiling methods could not be applied to single cells until very recently. Here we use single-cell RNA sequencing to investigate heterogeneity in the response of mouse bone-marrow-derived dendritic cells (BMDCs) to lipopolysaccharide. We find extensive, and previously unobserved, bimodal variation in messenger RNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit, involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.

Genome Research | 2012

Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis

Andrea Pauli; Eivind Valen; Michael F. Lin; Manuel Garber; Nadine L. Vastenhouw; Joshua Z. Levin; Lin Fan; Albin Sandelin; John L. Rinn; Aviv Regev; Alexander F. Schier

Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.

Nature | 2013

The African coelacanth genome provides insights into tetrapod evolution

Chris T. Amemiya; Jessica Alföldi; Alison P. Lee; Shaohua Fan; Hervé Philippe; Iain MacCallum; Ingo Braasch; Tereza Manousaki; Igor Schneider; Nicolas Rohner; Chris Organ; Domitille Chalopin; Jeramiah J. Smith; Mark Robinson; Rosemary A. Dorrington; Marco Gerdol; Bronwen Aken; Maria Assunta Biscotti; Marco Barucca; Denis Baurain; Aaron M. Berlin; Francesco Buonocore; Thorsten Burmester; Michael S. Campbell; Adriana Canapa; John P. Cannon; Alan Christoffels; Gianluca De Moro; Adrienne L. Edkins; Lin Fan

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.

Science | 2011

Comparative Functional Genomics of the Fission Yeasts

Nicholas Rhind; Zehua Chen; Moran Yassour; Dawn Anne Thompson; Brian J. Haas; Naomi Habib; Ilan Wapinski; Sushmita Roy; Michael F. Lin; David I. Heiman; Sarah K. Young; Kanji Furuya; Yabin Guo; Alison L. Pidoux; Huei Mei Chen; Barbara Robbertse; Jonathan M. Goldberg; Keita Aoki; Elizabeth H. Bayne; Aaron M. Berlin; Christopher A. Desjardins; Edward Dobbs; Livio Dukaj; Lin Fan; Michael Fitzgerald; Courtney French; Sharvari Gujja; Klavs Wörgler Hansen; Daniel Keifenheim; Joshua Z. Levin

A combined analysis of genome sequence, structure, and expression gives insights into fission yeast biology. The fission yeast clade—comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus—occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.

Bioinformatics | 2012

RNA-SeQC

David S. DeLuca; Joshua Z. Levin; Andrey Sivachenko; Timothy Fennell; Marc-Danie Nazaire; Chris Williams; Michael Reich; Wendy Winckler; Gad Getz

Summary: RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program that provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3′/5′ bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis.

Availability and implementation: See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command-line tool.

Supplementary information: Supplementary data are available at Bioinformatics online.
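
As an illustration of three of the metrics listed above (alignment rate, duplication rate, GC content), the following Python sketch computes them from a hypothetical in-memory `Alignment` record. It is not RNA-SeQC's code or its exact metric definitions: the record type, the `qc_summary` function, and the rule that mapped reads sharing chromosome, start and strand count as duplicates are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    # Hypothetical minimal record; real pipelines read alignments from BAM files.
    chrom: str
    start: int
    strand: str
    sequence: str
    mapped: bool


def qc_summary(alignments):
    """Toy QC metrics: alignment rate, position-based duplication rate, GC fraction."""
    total = len(alignments)
    mapped = [a for a in alignments if a.mapped]

    # Assumption: a mapped read whose (chrom, start, strand) was already seen is a duplicate.
    seen = set()
    duplicates = 0
    for a in mapped:
        key = (a.chrom, a.start, a.strand)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # GC fraction over all bases of mapped reads.
    bases = "".join(a.sequence for a in mapped).upper()
    gc = (bases.count("G") + bases.count("C")) / len(bases) if bases else 0.0

    return {
        "alignment_rate": len(mapped) / total if total else 0.0,
        "duplication_rate": duplicates / len(mapped) if mapped else 0.0,
        "gc_fraction": gc,
    }
```

For three mapped reads out of four, two of which start at the same position on the same strand, this reports an alignment rate of 0.75 and a duplication rate of 1/3. Production tools typically rely on duplicate flags set by a separate marking step and also account for mate pairs, which this sketch does not.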

PLOS Pathogens | 2012

Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection

Matthew R. Henn; Christian L. Boutwell; Patrick Charlebois; Niall J. Lennon; Karen A. Power; Alexander R. Macalalad; Aaron M. Berlin; Christine M. Malboeuf; Elizabeth Ryan; Sante Gnerre; Michael C. Zody; Rachel L. Erlich; Lisa Green; Andrew Berical; Yaoyu Wang; Monica Casali; Hendrik Streeck; Allyson K. Bloom; Tim Dudek; Damien C. Tully; Ruchi M. Newman; Karen L. Axten; Adrianne D. Gladden; Laura Battis; Michael Kemper; Qiandong Zeng; Terrance Shea; Sharvari Gujja; Carmen Zedlack; Olivier Gasser

Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.

Genome Research | 2010

Integrative analysis of the melanoma transcriptome

Michael F. Berger; Joshua Z. Levin; Krishna Vijayendran; Andrey Sivachenko; Xian Adiconis; Jared Maguire; Laura A. Johnson; James Robinson; Roeland Verhaak; Carrie Sougnez; Robert C. Onofrio; Liuda Ziaugra; Kristian Cibulskis; Elisabeth Laine; Jordi Barretina; Wendy Winckler; David E. Fisher; Gad Getz; Matthew Meyerson; David B. Jaffe; Stacey B. Gabriel; Eric S. Lander; Reinhard Dummer; Andreas Gnirke; Chad Nusbaum; Levi A. Garraway

Global studies of transcript structure and abundance in cancer cells enable the systematic discovery of aberrations that contribute to carcinogenesis, including gene fusions, alternative splice isoforms, and somatic mutations. We developed a systematic approach to characterize the spectrum of cancer-associated mRNA alterations through integration of transcriptomic and structural genomic data, and we applied this approach to generate new insights into melanoma biology. Using paired-end massively parallel sequencing of cDNA (RNA-seq) together with analyses of high-resolution chromosomal copy number data, we identified 11 novel melanoma gene fusions produced by underlying genomic rearrangements, as well as 12 novel readthrough transcripts. We mapped these chimeric transcripts to base-pair resolution and traced them to their genomic origins using matched chromosomal copy number information. We also used these data to discover and validate base-pair mutations that accumulated in these melanomas, revealing a surprisingly high rate of somatic mutation and lending support to the notion that point mutations constitute the major driver of melanoma progression. Taken together, these results may indicate new avenues for target discovery in melanoma, while also providing a template for large-scale transcriptome studies across many tumor types.

Nature Chemical Biology | 2013

Peptidomic discovery of short open reading frame–encoded peptides in human cells

Sarah A. Slavoff; Andrew J. Mitchell; Adam G. Schwaid; Moran N. Cabili; Jiao Ma; Joshua Z. Levin; Amir D Karger; Bogdan Budnik; John L. Rinn; Alan Saghatelian

The amount of the transcriptome that is translated into polypeptides is of fundamental importance. We developed a peptidomic strategy to detect short ORF (sORF)-encoded polypeptides (SEPs) in human cells. We identified 90 SEPs, 86 of which are novel, the largest number of human SEPs ever reported. SEP abundances range from 10 to 1,000 molecules per cell, a range identical to that of known proteins. SEPs arise from sORFs in non-coding RNAs as well as multi-cistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that non-canonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8/1866) of long intergenic non-coding RNAs (lincRNAs). Together, these results provide the strongest evidence to date that the human proteome is more complex than previously appreciated.

Explore More