Publications


Featured research published by Sandrine Dudoit.


Genome Biology | 2004

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman; Vincent J. Carey; Douglas M. Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano M. Iacus; Rafael A. Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony Rossini; Gunther Sawitzki; Colin A. Smith; Gordon K. Smyth; Luke Tierney; Jean Yee Hwa Yang; Jianhua Zhang

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.


Journal of the American Statistical Association | 2002

Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data

Sandrine Dudoit; Jane Fridlyand; Terence P. Speed

A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered. The discrimination methods are applied to datasets from three recently published cancer gene expression studies.


BMC Bioinformatics | 2010

Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

James H. Bullard; Elizabeth Purdom; Kasper D. Hansen; Sandrine Dudoit

Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
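The contrast drawn in the abstract above, between scaling by total lane counts and a quantile-based alternative, can be illustrated with a minimal sketch. It assumes the quantile is the 75th percentile of the nonzero counts in each lane; the paper's exact procedure may differ. The function names and the toy counts are hypothetical and are not taken from the paper or from any package.

```python
from statistics import quantiles

def total_count_factors(lanes):
    """Scaling factor per lane: total lane count relative to the mean total."""
    totals = [sum(lane) for lane in lanes]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def upper_quartile_factors(lanes):
    """Scaling factor per lane: upper quartile of nonzero counts relative to
    the mean upper quartile (an assumed, simplified variant)."""
    uqs = []
    for lane in lanes:
        nonzero = sorted(c for c in lane if c > 0)
        # quantiles(..., n=4) returns the three quartile cut points.
        uqs.append(quantiles(nonzero, n=4, method="inclusive")[2])
    mean_uq = sum(uqs) / len(uqs)
    return [u / mean_uq for u in uqs]

# Toy data: two lanes over the same five genes. The lanes agree except for
# one very highly expressed gene in the second lane.
lanes = [
    [10, 20, 30, 40, 50],
    [10, 20, 30, 40, 5000],
]

print(total_count_factors(lanes))     # dominated by the single large count
print(upper_quartile_factors(lanes))  # unaffected by the single large count
```

In this toy case the total-count factors differ sharply between the lanes even though four of the five genes have identical counts, while the upper-quartile factors are equal. This is only meant to show why a few highly expressed genes can distort total-count scaling.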


Proceedings of the National Academy of Sciences of the United States of America | 2002

Diversity, topographic differentiation, and positional memory in human fibroblasts.

Howard Y. Chang; Jen-Tsan Chi; Sandrine Dudoit; Chanda Bondre; Matt van de Rijn; David Botstein; Patrick O. Brown

A fundamental feature of the architecture and functional design of vertebrate animals is a stroma, composed of extracellular matrix and mesenchymal cells, which provides a structural scaffold and conduit for blood and lymphatic vessels, nerves, and leukocytes. Reciprocal interactions between mesenchymal and epithelial cells are known to play a critical role in orchestrating the development and morphogenesis of tissues and organs, but the roles played by specific stromal cells in controlling the design and function of tissues remain poorly understood. The principal cells of stromal tissue are called fibroblasts, a catch-all designation that belies their diversity. We characterized genome-wide patterns of gene expression in cultured fetal and adult human fibroblasts derived from skin at different anatomical sites. Fibroblasts from each site displayed distinct and characteristic transcriptional patterns, suggesting that fibroblasts at different locations in the body should be considered distinct differentiated cell types. Notable groups of differentially expressed genes included some implicated in extracellular matrix synthesis, lipid metabolism, and cell signaling pathways that control proliferation, cell migration, and fate determination. Several genes implicated in genetic diseases were found to be expressed in fibroblasts in an anatomic pattern that paralleled the phenotypic defects. Finally, adult fibroblasts maintained key features of HOX gene expression patterns established during embryogenesis, suggesting that HOX genes may direct topographic differentiation and underlie the detailed positional memory in fibroblasts.


Genome Biology | 2002

A prediction-based resampling method for estimating the number of clusters in a dataset

Sandrine Dudoit; Jane Fridlyand

Background: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters, and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems. Results: We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study. Conclusions: Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.


Journal of Computational and Graphical Statistics | 2002

Comparison of Methods for Image Analysis on cDNA Microarray Data

Yee Hwa Yang; Michael J. Buckley; Sandrine Dudoit; Terence P. Speed

Microarrays are part of a new class of biotechnologies which allow the monitoring of expression levels for thousands of genes simultaneously. Image analysis is an important aspect of microarray experiments, one that can have a potentially large impact on subsequent analyses such as clustering or the identification of differentially expressed genes. This article reviews a number of existing image analysis approaches for cDNA microarray experiments and proposes new addressing, segmentation, and background correction methods for extracting information from microarray scanned images. The segmentation component uses a seeded region growing algorithm which makes provision for spots of different shapes and sizes. The background estimation approach is based on an image analysis technique known as morphological opening. These new image analysis procedures are implemented in a software package named Spot, built on the R environment for statistical computing. The statistical properties of the different segmentation and background adjustment methods are examined using microarray data from a study of lipid metabolism in mice. It is shown that in some cases background adjustment can substantially reduce the precision—that is, increase the variability—of low-intensity spot values. In contrast, the choice of segmentation procedure has a smaller impact. The comparison further suggests that seeded region growing segmentation with morphological background correction provides precise and accurate estimates of foreground and background intensities.


Test | 2003

Resampling-based Multiple Testing for Microarray Data Analysis

Yongchao Ge; Sandrine Dudoit; Terence P. Speed

The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. Westfall and Young (1993) propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control family-wise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
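As a small illustration of what an adjusted p-value is, the sketch below computes the standard step-up adjustment associated with the false discovery rate of Benjamini and Hochberg (1995), one of the criteria named in the abstract above. It is not the resampling-based minP algorithm that the article introduces, and it uses no permutation. The function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the hypotheses, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

A hypothesis is rejected at false discovery rate level q when its adjusted p-value is at most q, so the adjusted values can be compared directly with the chosen level.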


Nucleic Acids Research | 2010

Biases in Illumina transcriptome sequencing caused by random hexamer priming

Kasper D. Hansen; Steven E. Brenner; Sandrine Dudoit

Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.


Bioinformatics | 2003

Bagging to improve the accuracy of a clustering procedure

Sandrine Dudoit; Jane Fridlyand

MOTIVATION: The microarray technology is increasingly being applied in biological and medical research to address a wide range of problems such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples. RESULTS: Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets and the resulting multiple partitions are combined by voting or the creation of a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performances of the new and existing methods were compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate and often substantially more accurate than a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the set of cluster votes, which can be used to assess the confidence of cluster assignments for individual observations. SUPPLEMENTARY INFORMATION: For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.
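The voting variant described in the abstract above can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' implementation: the data are one-dimensional, the base procedure is a hypothetical two-cluster k-means (`two_means`), and bootstrap labels are aligned to a reference partition by a simple flip, which only works for two clusters. All names and data are made up for the example.

```python
import random

def nearest(x, centers):
    """Index (0 or 1) of the center closest to x."""
    return 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1

def two_means(points, iters=20):
    """Toy 1-D k-means with two clusters, started at the min and the max."""
    centers = [min(points), max(points)]
    for _ in range(iters):
        groups = [[], []]
        for x in points:
            groups[nearest(x, centers)].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def bagged_cluster(points, n_boot=50, seed=0):
    """Cluster bootstrap samples and combine the partitions by voting."""
    rng = random.Random(seed)
    ref_centers = two_means(points)
    ref = [nearest(x, ref_centers) for x in points]
    votes = [[0, 0] for _ in points]
    for _ in range(n_boot):
        idx = [rng.randrange(len(points)) for _ in points]
        centers = two_means([points[i] for i in idx])
        boot = {i: nearest(points[i], centers) for i in set(idx)}
        # Align labels with the reference partition (flip if that agrees better).
        agree = sum(boot[i] == ref[i] for i in boot)
        flip = agree < len(boot) / 2
        for i, lab in boot.items():
            votes[i][1 - lab if flip else lab] += 1
    labels = [0 if v[0] >= v[1] else 1 for v in votes]
    confidence = [max(v) / sum(v) if sum(v) else 0.0 for v in votes]
    return labels, confidence

points = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
labels, confidence = bagged_cluster(points)
print(labels)
print(confidence)
```

The per-observation vote proportions play the role of the cluster votes mentioned in the abstract: a value near 1 suggests a stable assignment, while a value near 0.5 suggests an observation that moves between clusters across bootstrap samples.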


Proceedings of the National Academy of Sciences of the United States of America | 2002

Stereotyped and specific gene expression programs in human innate immune responses to bacteria.

Jennifer C. Boldrick; Ash A. Alizadeh; Maximilian Diehn; Sandrine Dudoit; Chih Long Liu; Christopher E. Belcher; David Botstein; Louis M. Staudt; Patrick O. Brown; David A. Relman

The innate immune response is crucial for defense against microbial pathogens. To investigate the molecular choreography of this response, we carried out a systematic examination of the gene expression program in human peripheral blood mononuclear cells responding to bacteria and bacterial products. We found a remarkably stereotyped program of gene expression induced by bacterial lipopolysaccharide and diverse killed bacteria. An intricately choreographed expression program devoted to communication between cells was a prominent feature of the response. Other features suggested a molecular program for commitment of antigen-presenting cells to antigens captured in the context of bacterial infection. Despite the striking similarities, there were qualitative and quantitative differences in the responses to different bacteria. Modulation of this host-response program by bacterial virulence mechanisms was an important source of variation in the response to different bacteria.

Collaboration


Top Co-Authors

Terence P. Speed

Walter and Eliza Hall Institute of Medical Research


Alain Barrier

University of California


Sunduz Keles

University of Wisconsin-Madison