Publication


Featured research published by Paul Zumbo.


Genome Biology | 2013

Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

Franck Rapaport; Raya Khanin; Yupu Liang; Mono Pirun; Azra Krek; Paul Zumbo; Christopher E. Mason; Nicholas D. Socci; Doron Betel

A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples significantly improves detection power over increased sequencing depth.


Nature Genetics | 2013

Relapse-specific mutations in NT5C2 in childhood acute lymphoblastic leukemia

Julia Meyer; Jinhua Wang; Laura E. Hogan; Jun Yang; Smita Dandekar; Jay Patel; Zuojian Tang; Paul Zumbo; Sheng-Bo Li; Jiri Zavadil; Ross L. Levine; Timothy Cardozo; Stephen P. Hunger; Elizabeth A. Raetz; William E. Evans; Christopher E. Mason; William L. Carroll

Relapsed childhood acute lymphoblastic leukemia (ALL) carries a poor prognosis, despite intensive retreatment, owing to intrinsic drug resistance. The biological pathways that mediate resistance are unknown. Here, we report the transcriptome profiles of matched diagnosis and relapse bone marrow specimens from ten individuals with pediatric B-lymphoblastic leukemia using RNA sequencing. Transcriptome sequencing identified 20 newly acquired, novel nonsynonymous mutations not present at initial diagnosis, with 2 individuals harboring relapse-specific mutations in the same gene, NT5C2, encoding a 5′-nucleotidase. Full-exon sequencing of NT5C2 was completed in 61 further relapse specimens, identifying additional mutations in 5 cases. Enzymatic analysis of mutant proteins showed that base substitutions conferred increased enzymatic activity and resistance to treatment with nucleoside analog therapies. Clinically, all individuals who harbored NT5C2 mutations relapsed early, within 36 months of initial diagnosis (P = 0.03). These results suggest that mutations in NT5C2 are associated with the outgrowth of drug-resistant clones in ALL.


Nature Biotechnology | 2014

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

Sheng Li; Scott Tighe; Charles M. Nicolet; Deborah S. Grove; Shawn Levy; William G. Farmerie; Agnes Viale; Chris L. Wright; Peter A. Schweitzer; Yuan Gao; Dewey Kim; Joe Boland; Belynda Hicks; Ryan Kim; Sagar Chhangawala; Nadereh Jafari; Nalini Raghavachari; Jorge Gandara; Natàlia Garcia-Reyero; Cynthia Hendrickson; David Roberson; Jeffrey Rosenfeld; Todd Smith; Jason G. Underwood; May Wang; Paul Zumbo; Don Baldwin; George Grills; Christopher E. Mason

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A–selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intra-platform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
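The concordance figures quoted above are Spearman rank correlations. As a rough illustration of how such a value could be computed for two expression vectors, here is a minimal sketch in plain Python. The function names (`average_ranks`, `pearson`, `spearman`) and the toy inputs are assumptions made for this example and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for the same four genes on two platforms.
platform_a = [5.0, 120.0, 33.0, 980.0]
platform_b = [7.5, 150.0, 29.0, 1100.0]
print(spearman(platform_a, platform_b))
```

Because only the ranks matter, the two platforms can report expression on different scales and still show perfect concordance, as in the toy data here.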


Cell Systems | 2015

Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics

Ebrahim Afshinnekoo; Cem Meydan; Shanin Chowdhury; Dyala Jaroudi; Collin Boyer; Nick Bernstein; Julia M. Maritz; Darryl Reeves; Jorge Gandara; Sagar Chhangawala; Sofia Ahsanuddin; Amber Simmons; Timothy Nessel; Bharathi Sundaresh; Elizabeth Pereira; Ellen Jorgensen; Sergios-Orestis Kolokotronis; Nell Kirchberger; Isaac Garcia; David Gandara; Sean Dhanraj; Tanzina Nawrin; Yogesh Saletore; Noah Alexander; Priyanka Vijay; Elizabeth M. Hénaff; Paul Zumbo; Michael Walsh; Gregory D. O'Mullan; Scott Tighe

SUMMARY The panoply of microorganisms and other species present in our environment influence human health and disease, especially in cities, but have not been profiled with metagenomics at a city-wide scale. We sequenced DNA from surfaces across the entire New York City (NYC) subway system, the Gowanus Canal, and public parks. Nearly half of the DNA (48%) does not match any known organism; identified organisms spanned 1,688 bacterial, viral, archaeal, and eukaryotic taxa, which were enriched for harmless genera associated with skin (e.g., Acinetobacter). Predicted ancestry of human DNA left on subway surfaces can recapitulate U.S. Census demographic data, and bacterial signatures can reveal a station’s history, such as marine-associated bacteria in a hurricane-flooded station. Some evidence of pathogens was found (Bacillus anthracis), but a lack of reported cases in NYC suggests that the pathogens represent a normal, urban microbiome. This baseline metagenomic map of NYC could help long-term disease surveillance, bioterrorism threat mitigation, and health management in the built environment of cities.


Nature Biotechnology | 2014

Detecting and correcting systematic variation in large-scale RNA sequencing data

Sheng Li; Paweł P. Łabaj; Paul Zumbo; Peter Sykacek; Wei Shi; Leming Shi; John H. Phan; Po-Yen Wu; May Wang; Charles Wang; Danielle Thierry-Mieg; Jean Thierry-Mieg; David P. Kreil; Christopher E. Mason

High-throughput RNA sequencing (RNA-seq) enables comprehensive scans of entire transcriptomes, but best practices for analyzing RNA-seq data have not been fully defined, particularly for data collected with multiple sequencing platforms or at multiple sites. Here we used standardized RNA samples with built-in controls to examine sources of error in large-scale RNA-seq studies and their impact on the detection of differentially expressed genes (DEGs). Analysis of variations in guanine-cytosine content, gene coverage, sequencing error rate and insert size allowed identification of decreased reproducibility across sites. Moreover, commonly used methods for normalization (cqn, EDASeq, RUV2, sva, PEER) varied in their ability to remove these systematic biases, depending on sample complexity and initial data quality. Normalization methods that combine data from genes across sites are strongly recommended to identify and remove site-specific effects and can substantially improve RNA-seq studies.
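One of the quality metrics named above is guanine-cytosine (GC) content. The following is a minimal sketch of how mean GC content per sequencing site might be tabulated before comparing sites. The helper names (`gc_fraction`, `mean_gc_by_site`), the site labels and the toy reads are hypothetical, and this is not the procedure used in the paper.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C; 0.0 if none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_gc_by_site(reads_by_site):
    """Average per-read GC fraction for each site that has at least one read."""
    return {
        site: sum(gc_fraction(read) for read in reads) / len(reads)
        for site, reads in reads_by_site.items()
        if reads
    }


# Hypothetical reads from two sites, for illustration only.
example = {
    "site_1": ["GGCC", "ATAT"],
    "site_2": ["ATGC"],
}
print(mean_gc_by_site(example))
```

A large gap between sites in a summary like this would be one hint that a site-specific bias is present and that normalization across sites deserves attention.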


BMC Bioinformatics | 2013

An optimized algorithm for detecting and annotating regional differential methylation

Sheng Li; Francine E. Garrett-Bakelman; Altuna Akalin; Paul Zumbo; Ross L. Levine; Bik To; Ian D. Lewis; Anna L. Brown; Richard J. D'Andrea; Ari Melnick; Christopher E. Mason

Background: DNA methylation profiling reveals important differentially methylated regions (DMRs) of the genome that are altered during development or that are perturbed by disease. To date, few programs exist for regional analysis of enriched or whole-genome bisulfite conversion sequencing data, even though such data are increasingly common. Here, we describe an open-source, optimized method for determining empirically based DMRs (eDMR) from high-throughput sequence data that is applicable to enriched whole-genome methylation profiling datasets, as well as other globally enriched epigenetic modification data. Results: Here we show that our bimodal distribution model and weighted cost function for optimized regional methylation analysis provides accurate boundaries of regions harboring significant epigenetic modifications. Our algorithm takes the spatial distribution of CpGs into account for the enrichment assay, allowing for optimization of the definition of empirical regions for differential methylation. Combined with the dependent adjustment for regional p-value combination and DMR annotation, we provide a method that may be applied to a variety of datasets for rapid DMR analysis. Our method classifies both the directionality of DMRs and their genome-wide distribution, and we have observed that it shows clinical relevance through correct stratification of two Acute Myeloid Leukemia (AML) tumor sub-types. Conclusions: Our weighted optimization algorithm eDMR for calling DMRs extends an established DMR R pipeline (methylKit) and provides a needed resource in epigenomics. Our method enables an accurate and scalable way of finding DMRs in high-throughput methylation sequencing experiments. eDMR is available for download at http://code.google.com/p/edmr/.
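The abstract above describes using the spatial distribution of CpGs to define candidate regions. As a simplified sketch of that general idea only, the code below groups sorted CpG positions into regions whenever neighbouring CpGs lie within a fixed gap. The function name `group_cpgs`, the fixed `max_gap` cutoff and the `min_cpgs` filter are assumptions for illustration; eDMR itself derives its cutoff empirically from a bimodal distance model and a weighted cost function, neither of which is reproduced here.

```python
def group_cpgs(positions, max_gap, min_cpgs=3):
    """Group CpG coordinates into (start, end, n_cpgs) regions.

    Neighbouring CpGs stay in the same region while the distance between
    them is at most max_gap. Regions with fewer than min_cpgs are dropped.
    """
    regions = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current region.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_cpgs:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Close the final region after the loop ends.
    if len(current) >= min_cpgs:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Hypothetical CpG coordinates on one chromosome.
cpgs = [100, 150, 180, 1000, 1020, 5000]
print(group_cpgs(cpgs, max_gap=100, min_cpgs=2))
```

In a fuller pipeline, each resulting region would then be tested for differential methylation and annotated; those steps are outside the scope of this sketch.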


Toxicological Sciences | 2015

Mining the Archives: A Cross-Platform Analysis of Gene Expression Profiles in Archival Formalin-Fixed Paraffin-Embedded Tissues

A. Francina Webster; Paul Zumbo; Jennifer Fostel; Jorge Gandara; Susan D. Hester; Leslie Recio; Andrew Williams; Charles E. Wood; Carole L. Yauk; Christopher E. Mason

Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for transcriptomic research. However, use of FFPE samples in genomic studies has been limited by technical challenges resulting from nucleic acid degradation. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues preserved in formalin for different amounts of time using 2 DNA microarray protocols and 2 whole-transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other methods by having the highest correlations of differentially expressed genes (DEGs), and best overlap of pathways, between FRO and FFPE groups. The effect of sample time in formalin (18 h or 3 weeks) on gene expression profiles indicated that test article treatment, not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18 h and 3 week FFPE samples compared with FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of time in paraffin on genomic profiles. Ribo-depletion RNA-seq analysis of 8-, 19-, and 26-year-old control blocks resulted in comparable quality metrics, including expected distributions of mapped reads to exonic, untranslated region, intronic, and ribosomal fractions of the transcriptome. Overall, our results indicate that FFPE samples are appropriate for use in genomic studies in which frozen samples are not available, and that ribo-depletion RNA-seq is the preferred method for this type of analysis in archival and long-aged FFPE samples.


Advances in Experimental Medicine and Biology | 2010

Standardizing the Next Generation of Bioinformatics Software Development with BioHDF (HDF5)

Christopher E. Mason; Paul Zumbo; Stephan J. Sanders; Mike Folk; Dana Robinson; Ruth Aydt; Martin Gollery; Mark Welsh; N. Eric Olson; Todd Smith

Next Generation Sequencing technologies are limited by the lack of standard bioinformatics infrastructures that can reduce data storage, increase data processing performance, and integrate diverse information. HDF technologies address these requirements and have a long history of use in data-intensive science communities. They include general data file formats, libraries, and tools for working with the data. Compared to emerging standards, such as the SAM/BAM formats, HDF5-based systems demonstrate significantly better scalability, can support multiple indexes, store multiple data types, and are self-describing. For these reasons, HDF5 and its BioHDF extension are well suited for implementing data models to support the next generation of bioinformatics applications.


Cell Systems | 2015

Modern Methods for Delineating Metagenomic Complexity

Ebrahim Afshinnekoo; Cem Meydan; Shanin Chowdhury; Dyala Jaroudi; Collin Boyer; Nick Bernstein; Julia M. Maritz; Darryl Reeves; Jorge Gandara; Sagar Chhangawala; Sofia Ahsanuddin; Amber Simmons; Timothy Nessel; Bharathi Sundaresh; Elizabeth Pereira; Ellen Jorgensen; Sergios-Orestis Kolokotronis; Nell Kirchberger; Isaac Garcia; David Gandara; Sean Dhanraj; Tanzina Nawrin; Yogesh Saletore; Noah Alexander; Priyanka Vijay; Elizabeth M. Hénaff; Paul Zumbo; Michael Walsh; Gregory D. O’Mullan; Scott Tighe

We appreciate the comments of Ackelsberg et al. (Ackelsberg et al., 2015, Cell Syst. 1: 4–5) and have decided to revise the paper (Afshinnekoo et al., 2015, Cell Syst. 1: 72–87) as follows:

Figure 3B has been corrected to show the general coverage of the Yersinia pestis pMT1 plasmid, but not the murine toxin gene (yMT). The initial claim of “…consistent 20× coverage across the murine toxin gene…” was erroneously based on looking at annotations from related plasmids and comparing different reference sequences. In actuality, no reads mapped to the yMT gene.

The result of low coverage to the Bacillus anthracis plasmids (pXO1 and pXO2) and no evidence of the plcR SNP, an often defining feature of anthrax, is now reported in the Results section.

The language in the Summary, Results, and Discussion has been revised, and speculative text about pathogenic organisms has been deleted. We now state that although all our metagenomic analysis tools identified reads with similarity to B. anthracis and Y. pestis sequences, there is minimal coverage to the backbone genome of these organisms, and there is no strong evidence to suggest these organisms are in fact present and no evidence of pathogenicity.

Furthermore, with regard to the concerns about the culture methods, we have posted subsequent details on the study website (http://www.pathomap.org/2015/04/13/culture-methods/) and below.

A second culture experiment was performed to address the question of antibiotic resistance (Afshinnekoo et al., 2015, Figure 4A). Bacteria were cultured in LB agar and then spread onto LB plates; after lawn growth, single colonies were picked and then plated onto antibiotic plates (kanamycin at 50 µg/ml, chloramphenicol at 35 µg/ml, and ampicillin at 100 µg/ml) and growth was assessed. Plates were incubated at 37°C. As a control, air samples were taken and cultured at every location. In all cases, these did not yield growth. The non-selective plate done last when replica plating also serves as a control. There was no quantitative confirmation of bacterial versus non-bacterial organisms, although there was no observable fungal growth in the samples. Further experiments are being done to dive deeper into the question of viability of microorganisms on the subway system as well as the presence of antibiotic-resistant bacteria.

The field of metagenomics is relatively new but has great potential to serve an incredibly important role both in our understanding of the world around us, with key applications in the built environment, and in the clinical realm. Nevertheless, there are still major hurdles and challenges that the field faces in order to realize this potential. We welcome and appreciate the discussion (http://microbe.net/2015/02/17/the-long-road-from-data-to-wisdom-and-from-dna-to-pathogen/) prompted by our study, and we anticipate that this large dataset will enable further experimentation, additional testing of taxonomic tools, and hopefully help in developing methodologies for metagenomic analysis.


mBio | 2016

Features of Circulating Parainfluenza Virus Required for Growth in Human Airway

Laura M. Palermo; Manik Uppal; Lucy Skrabanek; Paul Zumbo; Soren Germer; B. K. Rima; Devra Huey; Stefan Niewiesk; Matteo Porotto; Anne Moscona

ABSTRACT Respiratory paramyxoviruses, including the highly prevalent human parainfluenza viruses, cause the majority of childhood croup, bronchiolitis, and pneumonia, yet there are currently no vaccines or effective treatments. Paramyxovirus research has relied on the study of laboratory-adapted strains of virus in immortalized cultured cell lines. We show that findings made in such systems about the receptor interaction and viral fusion requirements for entry and fitness (mediated by the receptor binding protein and the fusion protein) can be drastically different from the requirements for infection in vivo. Here we carried out whole-genome sequencing and genomic analysis of circulating human parainfluenza virus field strains to define functional and structural properties of proteins of circulating strains and to identify the genetic basis for properties that confer fitness in the field. The analysis of clinical strains suggests that the receptor binding-fusion molecule pairs of circulating viruses maintain a balance of properties that result in an inverse correlation between fusion in cultured cells and growth in vivo. Future analysis of entry mechanisms and inhibitory strategies for paramyxoviruses will benefit from considering the properties of viruses that are fit to infect humans, since a focus on viruses that have adapted to laboratory work provides a distinctly different picture of the requirements for the entry step of infection.

IMPORTANCE Mechanistic information about viral infection, information that impacts antiviral and vaccine development, is generally derived from viral strains grown under laboratory conditions in immortalized cells. This study uses whole-genome sequencing of clinical strains of human parainfluenza virus 3, a globally important respiratory paramyxovirus, in cell systems that mimic the natural human host and in animal models. By examining the differences between clinical isolates and laboratory-adapted strains, the sequence differences are correlated to mechanistic differences in viral entry. For this ubiquitous and pathogenic respiratory virus to infect the human lung, modulation of the processes of receptor engagement and fusion activation occurs in a manner quite different from that carried out by the entry glycoprotein-expressing pair of laboratory strains. These marked contrasts in the viral properties necessary for infection in cultured immortalized cells and in natural host tissues and animals will influence future basic and clinical studies.

Collaboration


Explore Paul Zumbo's collaborations.

Top Co-Authors

Ross L. Levine

Memorial Sloan Kettering Cancer Center
