Publication


Featured research published by Héctor Corrada Bravo.


Nature Reviews Genetics | 2010

Tackling the widespread and critical impact of batch effects in high-throughput data

Jeffrey T. Leek; Robert B. Scharpf; Héctor Corrada Bravo; David Simcha; Benjamin Langmead; W. Evan Johnson; Donald Geman; Keith A. Baggerly; Rafael A. Irizarry

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.
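
As a minimal illustration of the confounding described in this abstract, the Python sketch below uses made-up numbers (it is not data or code from the paper). It builds measurements with no true difference between cases and controls, adds a hypothetical per-batch technical offset, and compares a design where batch is fully correlated with the outcome against a balanced design. The names `BASELINE`, `BATCH_SHIFT`, `observed`, and `group_difference` are assumptions introduced for this example.

```python
from statistics import mean

# Hypothetical toy values: the same baseline signal is used for both groups,
# so there is no true difference between cases and controls.
BASELINE = [10.0, 10.5, 9.5, 10.0]
BATCH_SHIFT = {"batch1": 0.0, "batch2": 2.0}  # assumed technical offset per batch


def observed(batches):
    """Return baseline values plus the shift of the batch each sample was run in."""
    return [value + BATCH_SHIFT[batch] for value, batch in zip(BASELINE, batches)]


def group_difference(case_batches, control_batches):
    """Difference in mean observed value between cases and controls."""
    return mean(observed(case_batches)) - mean(observed(control_batches))


# Confounded design: every case is run in batch2, every control in batch1.
confounded = group_difference(["batch2"] * 4, ["batch1"] * 4)

# Balanced design: each group is split evenly across the two batches.
balanced = group_difference(
    ["batch1", "batch2", "batch1", "batch2"],
    ["batch1", "batch2", "batch1", "batch2"],
)

print(f"confounded: {confounded:.1f}, balanced: {balanced:.1f}")
```

In the confounded design the whole batch offset shows up as an apparent case/control difference, while in the balanced design it cancels. Real batch effects are rarely a constant shift, so this only sketches the mechanism.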


Nature Methods | 2015

Orchestrating high-throughput genomic analysis with Bioconductor

Wolfgang Huber; Vincent J. Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton Carvalho; Héctor Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D. Hansen; Rafael A. Irizarry; Michael S. Lawrence; Michael I. Love; James W. MacDonald; Valerie Obenchain; Andrzej K. Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K. Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Nature Methods | 2013

Differential abundance analysis for microbial marker-gene surveys

Joseph N. Paulson; O. Colin Stine; Héctor Corrada Bravo; Mihai Pop

We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.


Genome Biology | 2014

Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition

Mihai Pop; Alan W. Walker; Joseph N. Paulson; Brianna Lindsay; Martin Antonio; M. Anowar Hossain; Joseph Oundo; Boubou Tamboura; Volker Mai; Irina Astrovskaya; Héctor Corrada Bravo; Richard Rance; Mark D. Stares; Myron M. Levine; Sandra Panchalingam; Karen Kotloff; Usman N. Ikumapayi; Chinelo Ebruke; Mitchell Adeyemi; Dilruba Ahmed; Firoz Ahmed; Meer T. Alam; Ruhul Amin; Sabbir Siddiqui; John B. Ochieng; Emmanuel Ouma; Jane Juma; Euince Mailu; Richard Omore; J. Glenn Morris

Background: Diarrheal diseases continue to contribute significantly to morbidity and mortality in infants and young children in developing countries. There is an urgent need to better understand the contributions of novel, potentially uncultured, diarrheal pathogens to severe diarrheal disease, as well as distortions in normal gut microbiota composition that might facilitate severe disease. Results: We use high-throughput 16S rRNA gene sequencing to compare fecal microbiota composition in children under five years of age who have been diagnosed with moderate to severe diarrhea (MSD) with the microbiota from diarrhea-free controls. Our study includes 992 children from four low-income countries in West and East Africa, and Southeast Asia. Known pathogens, as well as bacteria currently not considered as important diarrhea-causing pathogens, are positively associated with MSD, and these include Escherichia/Shigella, and Granulicatella species, and Streptococcus mitis/pneumoniae groups. In both cases and controls, there tend to be distinct negative correlations between facultative anaerobic lineages and obligate anaerobic lineages. Overall genus-level microbiota composition exhibits a shift in controls from low to high levels of Prevotella and in MSD cases from high to low levels of Escherichia/Shigella in younger versus older children; however, there was significant variation among many genera by both site and age. Conclusions: Our findings expand the current understanding of microbiota-associated diarrhea pathogenicity in young children from developing countries. Our findings are necessarily based on correlative analyses and must be further validated through epidemiological and molecular techniques.


Journal of Virology | 2011

Influence of host gene transcription level and orientation on HIV-1 latency in a primary cell model

Liang Shan; Hung-Chih Yang; S. Alireza Rabi; Héctor Corrada Bravo; Neeta S. Shroff; Rafael A. Irizarry; Hao Zhang; Joseph B. Margolick; Janet D. Siliciano; Robert F. Siliciano

Human immunodeficiency virus type 1 (HIV-1) establishes a latent reservoir in resting memory CD4+ T cells. This latent reservoir is a major barrier to the eradication of HIV-1 in infected individuals and is not affected by highly active antiretroviral therapy (HAART). Reactivation of latent HIV-1 is a possible strategy for elimination of this reservoir. The mechanisms with which latency is maintained are unclear. In the analysis of the regulation of HIV-1 gene expression, it is important to consider the nature of HIV-1 integration sites. In this study, we analyzed the integration and transcription of latent HIV-1 in a primary CD4+ T cell model of latency. The majority of integration sites in latently infected cells were in introns of transcription units. Serial analysis of gene expression (SAGE) demonstrated that more than 90% of those host genes harboring a latent integrated provirus were transcriptionally active, mostly at high levels. For latently infected cells, we observed a modest preference for integration in the same transcriptional orientation as the host gene (63.8% versus 36.2%). In contrast, this orientation preference was not observed in acutely infected or persistently infected cells. These results suggest that transcriptional interference may be one of the important factors in the establishment and maintenance of HIV-1 latency. Our findings suggest that disrupting the negative control of HIV-1 transcription by upstream host promoters could facilitate the reactivation of latent HIV-1 in some resting CD4+ T cells.


Biometrics | 2010

Model-Based Quality Assessment and Base-Calling for Second-Generation Sequencing Data

Héctor Corrada Bravo; Rafael A. Irizarry

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance.


Genome Medicine | 2010

Overcoming bias and systematic errors in next generation sequencing data

Margaret A. Taub; Héctor Corrada Bravo; Rafael A. Irizarry

Considerable time and effort have been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.


Genome Medicine | 2014

Large hypomethylated blocks as a universal defining epigenetic alteration in human solid tumors

Winston Timp; Héctor Corrada Bravo; Oliver G. McDonald; Michael Goggins; Chris Umbricht; Martha A. Zeiger; Andrew P. Feinberg; Rafael A. Irizarry

Background: One of the most provocative recent observations in cancer epigenetics is the discovery of large hypomethylated blocks, including single copy genes, in colorectal cancer, that correspond in location to heterochromatic LOCKs (large organized chromatin lysine-modifications) and LADs (lamin-associated domains). Methods: Here we performed a comprehensive genome-scale analysis of 10 breast, 28 colon, nine lung, 38 thyroid, 18 pancreas cancers, and five pancreas neuroendocrine tumors as well as matched normal tissue from most of these cases, as well as 51 premalignant lesions. We used a new statistical approach that allows the identification of large hypomethylated blocks on the Illumina HumanMethylation450 BeadChip platform. Results: We find that hypomethylated blocks are a universal feature of common solid human cancer, and that they occur at the earliest stage of premalignant tumors and progress through clinical stages of thyroid and colon cancer development. We also find that the disrupted CpG islands widely reported previously, including hypermethylated island bodies and hypomethylated shores, are enriched in hypomethylated blocks, with flattening of the methylation signal within and flanking the islands. Finally, we found that genes showing higher between-individual gene expression variability are enriched within these hypomethylated blocks. Conclusion: Thus hypomethylated blocks appear to be a universal defining epigenetic alteration in human cancer, at least for common solid tumors.


PLOS Pathogens | 2016

Transcriptome Remodeling in Trypanosoma cruzi and Human Cells during Intracellular Infection

Yuan Li; Sheena Shah-Simpson; Kwame Okrah; A. Trey Belew; Jungmin Choi; Kacey L. Caradonna; Prasad K. Padmanabhan; David M. Ndegwa; M. Ramzi Temanni; Héctor Corrada Bravo; Najib M. El-Sayed; Barbara A. Burleigh

Intracellular colonization and persistent infection by the kinetoplastid protozoan parasite, Trypanosoma cruzi, underlie the pathogenesis of human Chagas disease. To obtain global insights into the T. cruzi infective process, transcriptome dynamics were simultaneously captured in the parasite and host cells in an infection time course of human fibroblasts. Extensive remodeling of the T. cruzi transcriptome was observed during the early establishment of intracellular infection, coincident with a major developmental transition in the parasite. Contrasting this early response, few additional changes in steady state mRNA levels were detected once mature T. cruzi amastigotes were formed. Our findings suggest that transcriptome remodeling is required to establish a modified template to guide developmental transitions in the parasite, whereas homeostatic functions are regulated independently of transcriptomic changes, similar to that reported in related trypanosomatids. Despite complex mechanisms for regulation of phenotypic expression in T. cruzi, transcriptomic signatures derived from distinct developmental stages mirror known or projected characteristics of T. cruzi biology. Focusing on energy metabolism, we were able to validate predictions forecast in the mRNA expression profiles. We demonstrate measurable differences in the bioenergetic properties of the different mammalian-infective stages of T. cruzi and present additional findings that underscore the importance of mitochondrial electron transport in T. cruzi amastigote growth and survival. Consequences of T. cruzi colonization for the host include dynamic expression of immune response genes and cell cycle regulators with upregulation of host cholesterol and lipid synthesis pathways, which may serve to fuel intracellular T. cruzi growth. Thus, in addition to the biological inferences gained from gene ontology and functional enrichment analysis of differentially expressed genes in parasite and host, our comprehensive, high resolution transcriptomic dataset provides a substantially more detailed interpretation of T. cruzi infection biology and offers a basis for future drug and vaccine discovery efforts.


Genome Biology | 2011

Metastats: an improved statistical method for analysis of metagenomic data

Joseph N. Paulson; Mihai Pop; Héctor Corrada Bravo

Metagenomic studies were originally focused on exploratory/validation projects but are rapidly being applied in a clinical setting. In this setting, researchers are interested in finding characteristics of the microbiome that correlate with the clinical status of the corresponding sample. Comparatively few computational/statistical tools have been developed that can assist in this process. Rather, most developments in the metagenomics community have focused on methods that compare samples as a whole. Specifically, the focus has been on developing robust methods for determining the level of similarity or difference between samples, rather than on identifying the specific characteristics that distinguish different samples from each other. Metastats [1] was the first statistical method developed specifically to address the questions asked in clinical studies. Metastats allows a comparison of metagenomic samples (represented as counts of individual features such as organisms, genes and functional groups) from two treatment populations (for example, healthy versus disease) and identifies those features that statistically distinguish the two populations. Here, we present major improvements to the Metastats software and the underlying statistical methods. First, we describe new approaches for data normalization that allow a more accurate assessment of differential abundance by reducing the covariance between individual features implicitly introduced by the traditionally used ratio-based normalization. These normalization techniques are also of interest for time-series analyses or in the estimation of microbial networks. A second extension of Metastats is a mixed-model zero-inflated Gaussian distribution that allows Metastats to account for a common characteristic of metagenomic data: the presence of many features with zero counts owing to undersampling of the community. The number of 'missing features' (zero counts) correlates with the amount of sequencing performed, thereby biasing abundance measurements and the differential abundance statistics derived from them. Using simulated and real data, we show that these methods significantly improve the accuracy of Metastats. We also describe the addition of several new statistical tests to our code (including presence/absence and the corresponding odds ratio, and penetrance calculations) that improve the usability of our software in clinical practice.
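
Two points in this abstract can be made concrete with a small Python sketch. It is a toy illustration with assumed numbers, not the Metastats implementation: `total_sum_normalize` and `expected_zero_features` are hypothetical helper names, the sample counts and community abundances are invented, and modeling reads as independent draws from fixed abundances is a simplifying assumption.

```python
def total_sum_normalize(counts):
    """Ratio-based normalization: divide each feature count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_zero_features(abundances, depth):
    """Expected number of features with zero reads when `depth` reads are drawn
    independently, each landing on feature i with probability abundances[i]."""
    return sum((1.0 - p) ** depth for p in abundances)


# Part 1: only feature 0 changes in absolute count between the two samples.
sample_a = [100, 100, 100, 100]
sample_b = [400, 100, 100, 100]
prop_a = total_sum_normalize(sample_a)
prop_b = total_sum_normalize(sample_b)
# Feature 1 has the same count in both samples, yet its proportion
# falls from 0.25 to about 0.143 because the total changed.
print(f"feature 1 proportion: {prop_a[1]:.3f} -> {prop_b[1]:.3f}")

# Part 2: a hypothetical community with one dominant and 99 rare features.
abundances = [0.901] + [0.001] * 99
shallow = expected_zero_features(abundances, 100)    # 100 reads
deep = expected_zero_features(abundances, 10000)     # 10,000 reads
print(f"expected zero-count features: shallow={shallow:.1f}, deep={deep:.3f}")
```

The first part shows how dividing by the sample total couples features that did not change in absolute terms; the second shows why the number of zero counts depends on sequencing depth. Neither part reproduces the normalization or the zero-inflated Gaussian model described in the paper.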

Collaboration


Dive into Héctor Corrada Bravo's collaborations.

Top Co-Authors

Grace Wahba

University of Wisconsin-Madison

Kevin H. Eng

Roswell Park Cancer Institute

Nathan D. Olson

National Institute of Standards and Technology
