Publication


Featured research published by Rafael A. Irizarry.


Genome Biology | 2004

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman; Vincent J. Carey; Douglas M. Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano M. Iacus; Rafael A. Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony Rossini; Gunther Sawitzki; Colin A. Smith; Gordon K. Smyth; Luke Tierney; Jean Yee Hwa Yang; Jianhua Zhang

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.


Bioinformatics | 2003

A comparison of normalization methods for high density oligonucleotide array data based on variance and bias

Benjamin M. Bolstad; Rafael A. Irizarry; Magnus Åstrand; Terence P. Speed

MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package affy, which is a part of the Bioconductor project (http://www.bioconductor.org). SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
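The abstract above does not name its complete data methods. As an illustration only, the following Python sketch shows quantile normalization, a well-known complete data method associated with this paper: every array is forced to share the same intensity distribution, built from the mean of the sorted intensities across all arrays. The function name `quantile_normalize`, the list-of-lists input, and the tie handling (ties broken by original position) are assumptions made for this sketch. It is not the affy package's implementation.

```python
def quantile_normalize(arrays):
    """Quantile-normalize probe intensities across arrays.

    `arrays` is a list of equal-length lists, one list of probe
    intensities per array (hypothetical layout for this sketch).
    Returns new lists in the original probe order.
    """
    if not arrays:
        return []
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array independently, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity; ties keep original order
        # (an assumption of this sketch, not a statement about affy).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each value with the cross-array mean for its rank.
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy "arrays" with three probes each.
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # prints [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, each array contains the same set of values, so between-array differences in the overall intensity distribution are removed while the within-array ranking of probes is preserved.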


Bioinformatics | 2004

affy---analysis of Affymetrix GeneChip data at the probe level

Laurent Gautier; Leslie Cope; Benjamin M. Bolstad; Rafael A. Irizarry

MOTIVATION: The processing of the Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed and some of these new methods are widely used. RESULTS: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package is currently in its second release. It provides the user with extreme flexibility when carrying out an analysis and makes it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform.


Journal of the American Statistical Association | 2004

A Model Based Background Adjustment for Oligonucleotide Expression Arrays

Zhijin Wu; Rafael A. Irizarry; Robert Gentleman; Francisco Martinez-Murillo; Forrest Spencer

High-density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further preprocessing and data reduction occurs after the image-processing step. Statistical procedures developed by academic groups have been successful in improving the default algorithms provided by the Affymetrix system. In this article we present a solution to one of the preprocessing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications. These arrays use short oligonucleotides to probe for genes in an RNA sample. Typically, each gene is represented by 11–20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (i.e., specific hybridization). However, hybridization by other sequences (i.e., nonspecific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization. We have found that the default ad hoc adjustment, provided as part of the Affymetrix system, can be improved through the use of estimators derived from a statistical model that uses probe sequence information. A final step in preprocessing is to summarize the probe-level data for each gene to define a measure of expression that represents the amount of the corresponding mRNA species. In this article we illustrate the practical consequences of not adjusting appropriately for the presence of nonspecific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor Project (http://www.bioconductor.org).


Nature | 2010

Epigenetic memory in induced pluripotent stem cells

Kitai Kim; Akiko Doi; Bo Wen; Kitwa Ng; Rui Zhao; Patrick Cahan; J. Kim; Martin J. Aryee; Hongkai Ji; Lauren I. R. Ehrlich; Akiko Yabuuchi; Ayumu Takeuchi; K. C. Cunniff; Huo Hongguang; Shannon McKinney-Freeman; Olaia Naveiras; Tae-Min Yoon; Rafael A. Irizarry; Namyoung Jung; Jun Seita; Jacob Hanna; Peter Murakami; Rudolf Jaenisch; Ralph Weissleder; Stuart H. Orkin; Irving L. Weissman; Andrew P. Feinberg; George Q. Daley

Somatic cell nuclear transfer and transcription-factor-based reprogramming revert adult cells to an embryonic state, and yield pluripotent stem cells that can generate all tissues. Through different mechanisms and kinetics, these two reprogramming methods reset genomic methylation, an epigenetic modification of DNA that influences gene expression, leading us to hypothesize that the resulting pluripotent stem cells might have different properties. Here we observe that low-passage induced pluripotent stem cells (iPSCs) derived by factor-based reprogramming of adult murine tissues harbour residual DNA methylation signatures characteristic of their somatic tissue of origin, which favours their differentiation along lineages related to the donor cell, while restricting alternative cell fates. Such an ‘epigenetic memory’ of the donor tissue could be reset by differentiation and serial reprogramming, or by treatment of iPSCs with chromatin-modifying drugs. In contrast, the differentiation and methylation of nuclear-transfer-derived pluripotent stem cells were more similar to classical embryonic stem cells than were iPSCs. Our data indicate that nuclear transfer is more effective at establishing the ground state of pluripotency than factor-based reprogramming, which can leave an epigenetic memory of the tissue of origin that may influence efforts at directed differentiation for applications in disease modelling or treatment.


Nature Genetics | 2009

The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores

Rafael A. Irizarry; Christine Ladd-Acosta; Bo Wen; Zhijin Wu; Carolina Montano; Patrick Onyango; Hengmi Cui; Kevin Gabo; Michael Rongione; Maree J. Webster; Hong-Fei Ji; James B. Potash; Sarven Sabunciyan; Andrew P. Feinberg

Alterations in DNA methylation (DNAm) in cancer have been known for 25 years, including hypomethylation of oncogenes and hypermethylation of tumor suppressor genes [1]. However, most studies of cancer methylation have assumed that functionally important DNAm will occur in promoters, and that most DNAm changes in cancer occur in CpG islands [2,3]. Here we show that most methylation alterations in colon cancer occur not in promoters, and also not in CpG islands but in sequences up to 2 kb distant which we term “CpG island shores.” CpG island shore methylation was strongly related to gene expression, and it was highly conserved in mouse, discriminating tissue types regardless of species of origin. There was a surprising overlap (45–65%) of the location of colon cancer-related methylation changes with those that distinguished normal tissues, with hypermethylation enriched closer to the associated CpG islands, and hypomethylation enriched further from the associated CpG island and resembling non-colon normal tissues. Thus, methylation changes in cancer are at sites that vary normally in tissue differentiation, and they are consistent with the epigenetic progenitor model of cancer [4], that epigenetic alterations affecting tissue-specific differentiation are the predominant mechanism by which epigenetic changes cause cancer.


Nature Genetics | 2009

Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts

Akiko Doi; In-Hyun Park; Bo Wen; Peter Murakami; Martin J. Aryee; Rafael A. Irizarry; Brian Herb; Christine Ladd-Acosta; Junsung Rho; Sabine Loewer; Justine D. Miller; Thorsten M. Schlaeger; George Q. Daley; Andrew P. Feinberg

Induced pluripotent stem (iPS) cells are derived by epigenetic reprogramming, but their DNA methylation patterns have not yet been analyzed on a genome-wide scale. Here, we find substantial hypermethylation and hypomethylation of cytosine-phosphate-guanine (CpG) island shores in nine human iPS cell lines as compared to their parental fibroblasts. The differentially methylated regions (DMRs) in the reprogrammed cells (denoted R-DMRs) were significantly enriched in tissue-specific (T-DMRs; 2.6-fold, P < 10⁻⁴) and cancer-specific DMRs (C-DMRs; 3.6-fold, P < 10⁻⁴). Notably, even though the iPS cells are derived from fibroblasts, their R-DMRs can distinguish between normal brain, liver and spleen cells and between colon cancer and normal colon cells. Thus, many DMRs are broadly involved in tissue differentiation, epigenetic reprogramming and cancer. We observed colocalization of hypomethylated R-DMRs with hypermethylated C-DMRs and bivalent chromatin marks, and colocalization of hypermethylated R-DMRs with hypomethylated C-DMRs and the absence of bivalent marks, suggesting two mechanisms for epigenetic reprogramming in iPS cells and cancer.


Nature Methods | 2005

Multiple-laboratory comparison of microarray platforms

Rafael A. Irizarry; Daniel S. Warren; Forrest Spencer; Irene F. Kim; Shyam Biswal; Bryan Frank; Edward Gabrielson; Joe G. N. Garcia; Joel Geoghegan; Gregory G. Germino; Constance A. Griffin; Sara Hilmer; Eric P. Hoffman; Anne E. Jedlicka; Ernest S. Kawasaki; Francisco Martinez-Murillo; Laura A. Morsberger; Hannah Lee; David Petersen; John Quackenbush; Alan F. Scott; Michael Wilson; Yanqin Yang; Shui Qing Ye; Wayne Yu

Microarray technology is a powerful tool for measuring RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of cross-platform meta-analysis studies rapidly increase, appropriate platform assessments become more important. Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we noticed that none of the previously published papers consider differences between labs. For this study, a consortium of ten laboratories from the Washington, DC–Baltimore, USA, area was formed to compare data obtained from three widely used platforms using identical RNA samples. We used appropriate statistical analysis to demonstrate that there are relatively large differences in data obtained in labs using the same platform, but that the results from the best-performing labs agree rather well.


Nature Reviews Genetics | 2010

Tackling the widespread and critical impact of batch effects in high-throughput data

Jeffrey T. Leek; Robert B. Scharpf; Héctor Corrada Bravo; David Simcha; Benjamin Langmead; W. Evan Johnson; Donald Geman; Keith A. Baggerly; Rafael A. Irizarry

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.


Nature Methods | 2015

Orchestrating high-throughput genomic analysis with Bioconductor

Wolfgang Huber; Vincent J. Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton Carvalho; Héctor Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D. Hansen; Rafael A. Irizarry; Michael S. Lawrence; Michael I. Love; James W. MacDonald; Valerie Obenchain; Andrzej K. Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K. Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

Collaboration


Dive into Rafael A. Irizarry's collaborations.

Top Co-Authors

Benilton Carvalho

State University of Campinas

Forrest Spencer

Johns Hopkins University School of Medicine

Matthew N. McCall

University of Rochester Medical Center