Terence P. Speed

Walter and Eliza Hall Institute of Medical Research

Publication

Featured research published by Terence P. Speed.

Bioinformatics | 2003

A comparison of normalization methods for high density oligonucleotide array data based on variance and bias

Benjamin M. Bolstad; Rafael A. Irizarry; Magnus Åstrand; Terence P. Speed

MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project (http://www.bioconductor.org). SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
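
The abstract does not name the complete data methods. As an illustration only, the sketch below assumes the simplest one is quantile normalization, which forces every array to share the same empirical distribution of probe intensities. The function name and the toy data are hypothetical and are not taken from the Affy package.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array).

    Illustrative sketch: ties are broken by position, whereas production
    implementations usually average tied ranks.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]

    normalized = []
    for a in arrays:
        # order[k] is the index of the k-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_chips = quantile_normalize(chips)
```

After normalization both toy arrays contain the same set of values; only the assignment of values to probes differs, following each array's original ranking.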

Cancer Cell | 2010

Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1

Roel G.W. Verhaak; Katherine A. Hoadley; Elizabeth Purdom; Victoria Wang; Yuan Qi; Matthew D. Wilkerson; C. Ryan Miller; Li Ding; Todd R. Golub; Jill P. Mesirov; Gabriele Alexe; Michael S. Lawrence; Michael O'Kelly; Pablo Tamayo; Barbara A. Weir; Stacey Gabriel; Wendy Winckler; Supriya Gupta; Lakshmi Jakkula; Heidi S. Feiler; J. Graeme Hodgson; C. David James; Jann N. Sarkaria; Cameron Brennan; Ari Kahn; Paul T. Spellman; Richard Wilson; Terence P. Speed; Joe W. Gray; Matthew Meyerson

The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.

Journal of the American Statistical Association | 2002

Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data

Sandrine Dudoit; Jane Fridlyand; Terence P. Speed

A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered. The discrimination methods are applied to datasets from three recently published cancer gene expression studies.
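
Of the methods listed, the nearest-neighbor classifier is the easiest to sketch. The example below is a minimal illustration, assuming Euclidean distance and a majority vote among the k closest training samples; the paper's own distance measure and tuning are not stated in the abstract, and the toy expression profiles are invented.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a class label by majority vote among the k nearest samples."""
    if k < 1 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training sample's distance to the query with its label.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for two tumor classes.
profiles = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
predicted = knn_predict(profiles, labels, (0.5, 0.5), k=3)
```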

Journal of Computational and Graphical Statistics | 2002

Comparison of Methods for Image Analysis on cDNA Microarray Data

Yee Hwa Yang; Michael J. Buckley; Sandrine Dudoit; Terence P. Speed

Microarrays are part of a new class of biotechnologies which allow the monitoring of expression levels for thousands of genes simultaneously. Image analysis is an important aspect of microarray experiments, one that can have a potentially large impact on subsequent analyses such as clustering or the identification of differentially expressed genes. This article reviews a number of existing image analysis approaches for cDNA microarray experiments and proposes new addressing, segmentation, and background correction methods for extracting information from microarray scanned images. The segmentation component uses a seeded region growing algorithm which makes provision for spots of different shapes and sizes. The background estimation approach is based on an image analysis technique known as morphological opening. These new image analysis procedures are implemented in a software package named Spot, built on the R environment for statistical computing. The statistical properties of the different segmentation and background adjustment methods are examined using microarray data from a study of lipid metabolism in mice. It is shown that in some cases background adjustment can substantially reduce the precision—that is, increase the variability—of low-intensity spot values. In contrast, the choice of segmentation procedure has a smaller impact. The comparison further suggests that seeded region growing segmentation with morphological background correction provides precise and accurate estimates of foreground and background intensities.
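
Morphological opening is an erosion (local minimum) followed by a dilation (local maximum). With a window larger than a spot, the opened image keeps the slowly varying background and removes the bright spot. The sketch below assumes a flat square window clipped at the image border; it is not the Spot package's implementation, and the window size is a placeholder.

```python
def _window_filter(img, half, op):
    """Apply op (min or max) over a (2*half+1)-square window, clipped at edges."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            row.append(op(window))
        out.append(row)
    return out


def morphological_opening(img, half=1):
    """Erosion followed by dilation; a simple background estimate."""
    eroded = _window_filter(img, half, min)
    return _window_filter(eroded, half, max)


# Toy 5x5 image: uniform background of 10 with one bright pixel as a "spot".
image = [[10] * 5 for _ in range(5)]
image[2][2] = 100
background = morphological_opening(image, half=1)
```

Subtracting `background` from `image` would leave only the spot signal in this toy case.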

Test | 2003

Resampling-based Multiple Testing for Microarray Data Analysis

Yongchao Ge; Sandrine Dudoit; Terence P. Speed

The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. Westfall and Young (1993) propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control family-wise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
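
To make the idea of an adjusted p-value concrete, the sketch below computes Benjamini and Hochberg (1995) step-up adjusted p-values from a list of raw p-values. It is the plain, non-resampling version of criterion (b) and is not the minP algorithm introduced in the article; the function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```

A hypothesis is rejected at false discovery rate level q when its adjusted p-value is at most q.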

Nucleic Acids Research | 2012

Summarizing and correcting the GC content bias in high-throughput sequencing

Yuval Benjamini; Terence P. Speed

GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples; and there is no consensus as to the best methods to remove it in a single sample. We analyze regularities in the GC bias patterns, and find a compact description for this unimodal curve family. It is the GC content of the full DNA fragment, not only the sequenced read, that most influences fragment count. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are underrepresented in the sequencing results. This empirical evidence strengthens the hypothesis that PCR is the most important cause of the GC bias. We propose a model that produces predictions at the base pair level, allowing strand-specific GC-effect correction regardless of the downstream smoothing or binning. These GC modeling considerations can inform other high-throughput sequencing analyses such as ChIP-seq and RNA-seq.
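
A first look at GC bias usually starts by binning fragments by GC fraction and comparing mean counts across bins. The sketch below shows only that descriptive step, under the assumption that full fragment sequences and their counts are already available; it is not the base-pair-level model proposed in the paper, and all names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def mean_count_by_gc(fragments, counts, n_bins=10):
    """Mean fragment count per GC-fraction bin (None for empty bins)."""
    sums = [0.0] * n_bins
    sizes = [0] * n_bins
    for seq, count in zip(fragments, counts):
        # A GC fraction of exactly 1.0 is placed in the last bin.
        b = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        sums[b] += count
        sizes[b] += 1
    return [s / n if n else None for s, n in zip(sums, sizes)]


curve = mean_count_by_gc(["AAAA", "ATGC", "GGCC"], [2, 10, 3], n_bins=2)
```

On real data, a unimodal curve of this kind (low counts at both GC extremes) is the pattern the abstract describes.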

Bioinformatics | 2006

A genotype calling algorithm for Affymetrix SNP arrays

Nusrat Rabbee; Terence P. Speed

MOTIVATION: A classification algorithm, based on a multi-chip, multi-SNP approach, is proposed for Affymetrix SNP arrays. Current procedures for calling genotypes on SNP arrays process all the features associated with one chip and one SNP at a time. Using a large training sample where the genotype labels are known, we develop a supervised learning algorithm to obtain more accurate classification results on new data. The method we propose, RLMM, is based on a robustly fitted, linear model and uses the Mahalanobis distance for classification. The chip-to-chip non-biological variance is reduced through normalization. This model-based algorithm captures the similarities across genotype groups and probes, as well as across thousands of SNPs for accurate classification. In this paper, we apply RLMM to Affymetrix 100 K SNP array data, present classification results and compare them with genotype calls obtained from the Affymetrix procedure DM, as well as with the publicly available genotype calls from the HapMap project.
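
The classification step can be illustrated with a minimal sketch: assign a sample to the genotype group whose center is closest in Mahalanobis distance. The two-dimensional allele summaries, group centers and covariances below are invented for illustration; RLMM's robust model fitting and normalization are not reproduced here.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a group center."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def call_genotype(x, groups):
    """Return the label of the group with the smallest distance to x."""
    return min(groups, key=lambda g: mahalanobis_sq(x, *groups[g]))


# Hypothetical (allele A, allele B) signal centers and covariances.
spherical = ((0.1, 0.0), (0.0, 0.1))
groups = {
    "AA": ((2.0, 0.0), spherical),
    "AB": ((1.0, 1.0), spherical),
    "BB": ((0.0, 2.0), spherical),
}
genotype = call_genotype((1.9, 0.2), groups)
```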

Bioinformatics | 2002

An HMM model for coiled-coil domains and a comparison with PSSM-based predictions

Mauro Delorenzi; Terence P. Speed

MOTIVATION: Large-scale sequence data require methods for the automated annotation of protein domains. Many of the predictive methods are based either on a Position Specific Scoring Matrix (PSSM) of fixed length or on a window-less Hidden Markov Model (HMM). The performance of the two approaches is tested for Coiled-Coil Domains (CCDs). The prediction of CCDs is used frequently, and its optimization seems worthwhile. RESULTS: We have conceived MARCOIL, an HMM for the recognition of proteins with a CCD on a genomic scale. A cross-validated study suggests that MARCOIL improves predictions compared to the traditional PSSM algorithm, especially for some protein families and for short CCDs. The study was designed to reveal differences inherent in the two methods. Potential confounding factors such as differences in the dimension of parameter space and in the parameter values were avoided by using the same amino acid propensities and by keeping the transition probabilities of the HMM constant during cross-validation. AVAILABILITY: The prediction program and the databases are available at http://www.wehi.edu.au/bioweb/Mauro/Marcoil
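
As a generic illustration of how an HMM labels a sequence without a fixed window, the sketch below runs the Viterbi algorithm on a toy two-state model (background versus coiled-coil). The states, the two-letter alphabet and all probabilities are invented; MARCOIL's actual architecture and parameters are not described in the abstract and are not reproduced here.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state path for obs, computed in log space."""
    if not obs:
        return []
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: "cc" favors the symbol L, "bg" favors G.
states = ["bg", "cc"]
start_p = {"bg": 0.5, "cc": 0.5}
trans_p = {"bg": {"bg": 0.9, "cc": 0.1}, "cc": {"bg": 0.1, "cc": 0.9}}
emit_p = {"bg": {"L": 0.1, "G": 0.9}, "cc": {"L": 0.9, "G": 0.1}}
path = viterbi("GGGLLLGGG", states, start_p, trans_p, emit_p)
```

Because the state path is decoded over the whole sequence, the predicted domain boundaries are not tied to a fixed window length.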

Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 2002

A model selection approach for the identification of quantitative trait loci in experimental crosses

Karl W. Broman; Terence P. Speed

We consider the problem of identifying the genetic loci (called quantitative trait loci (QTLs)) contributing to variation in a quantitative trait, with data on an experimental cross. A large number of different statistical approaches to this problem have been described; most make use of multiple tests of hypotheses, and many consider models allowing only a single QTL. We feel that the problem is best viewed as one of model selection. We discuss the use of model selection ideas to identify QTLs in experimental crosses. We focus on a back-cross experiment, with strictly additive QTLs, and concentrate on identifying QTLs, considering the estimation of their effects and precise locations of secondary importance. We present the results of a simulation study to compare the performances of the more prominent methods. Copyright 2002 Royal Statistical Society.

Nature Biotechnology | 2014

Normalization of RNA-seq data using factor analysis of control genes or samples

Davide Risso; John Ngai; Terence P. Speed; Sandrine Dudoit

Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
