Matthew N. McCall
University of Rochester Medical Center
Publications
Featured research published by Matthew N. McCall.
Biostatistics | 2010
Matthew N. McCall; Benjamin M. Bolstad; Rafael A. Irizarry
Robust multiarray analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and Nimblegen gene expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last two steps require multiple arrays to be analyzed simultaneously. The ability to borrow information across samples gives RMA several advantages. For example, the summarization step fits a parametric model that accounts for probe effects, assumed to be fixed across arrays, and improves outlier detection. Residuals obtained from the fitted model permit the creation of useful quality metrics. However, the dependence on multiple arrays has two drawbacks: (1) RMA cannot be used in clinical settings where samples must be processed individually or in small batches, and (2) data sets preprocessed separately are not comparable. We propose a preprocessing algorithm, frozen RMA (fRMA), which allows one to analyze microarrays individually or in small batches and then combine the data for analysis. This is accomplished by utilizing information from the large, publicly available microarray databases. In particular, estimates of probe-specific effects and variances are precomputed and frozen. Then, with new data sets, these are used in concert with information from the new arrays to normalize and summarize the data. We find that fRMA is comparable to RMA when the data are analyzed as a single batch and outperforms RMA when analyzing multiple batches. The methods described here are implemented in the R package fRMA and are currently available for download from the software section of http://rafalab.jhsph.edu.
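The frozen-parameter idea can be sketched in a few lines of Python. This is a minimal illustration, not the fRMA package's code: it assumes background-corrected log2 probe intensities, a frozen reference distribution with one value per probe, and frozen per-probe effects and variances. The function names are hypothetical, and the inverse-variance weighted mean is a simplified stand-in for whatever robust summarization the package actually performs.

```python
def frozen_quantile_normalize(values, reference):
    """Quantile-normalize ONE array against a frozen reference distribution.

    values    -- log2 intensities for every probe on the new array
    reference -- precomputed reference values, one per probe (order irrelevant)
    """
    if len(values) != len(reference):
        raise ValueError("reference must supply one value per probe")
    ref_sorted = sorted(reference)
    # Rank the new array's probes and give the k-th smallest probe the k-th
    # smallest reference value; no other array is needed.
    order = sorted(range(len(values)), key=lambda i: values[i])
    normalized = [0.0] * len(values)
    for rank, probe in enumerate(order):
        normalized[probe] = ref_sorted[rank]
    return normalized


def frozen_summarize(normalized, probe_sets, probe_effects, probe_variances):
    """Summarize probes to one value per probe set using frozen parameters.

    Each probe's frozen effect is subtracted, and the adjusted values are
    combined with an inverse-variance weighted mean (a simplification).
    """
    summary = {}
    for name, probes in probe_sets.items():
        weights = [1.0 / probe_variances[p] for p in probes]
        adjusted = [normalized[p] - probe_effects[p] for p in probes]
        summary[name] = sum(w * a for w, a in zip(weights, adjusted)) / sum(weights)
    return summary


# Toy example with made-up numbers: four probes, two probe sets.
values = [5.0, 9.0, 7.0, 6.0]
reference = [4.0, 6.0, 8.0, 10.0]
normalized = frozen_quantile_normalize(values, reference)
summary = frozen_summarize(
    normalized,
    probe_sets={"setA": [0, 1], "setB": [2, 3]},
    probe_effects=[0.5, -0.5, 1.0, 0.0],
    probe_variances=[1.0, 1.0, 0.5, 2.0],
)
print(normalized)  # [4.0, 10.0, 8.0, 6.0]
print(summary)
```

Because the reference distribution and probe parameters are fixed in advance, an array processed today and one processed next year are mapped onto the same scale, which is what makes separately preprocessed batches comparable.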
Nucleic Acids Research | 2011
Matthew N. McCall; Karan Uppal; Harris A. Jaffee; Michael J. Zilliox; Rafael A. Irizarry
Various databases have harnessed the wealth of publicly available microarray data to address biological questions ranging from across-tissue differential expression to homologous gene expression. Despite their practical value, these databases rely on relative measures of expression and are unable to address the most fundamental question—which genes are expressed in a given cell type. The Gene Expression Barcode is the first database to provide reliable absolute measures of expression for most annotated genes for 131 human and 89 mouse tissue types, including diseased tissue. This is made possible by a novel algorithm that leverages information from the GEO and ArrayExpress public repositories to build statistical models that permit converting data from a single microarray into expressed/unexpressed calls for each gene. For selected platforms, users may upload data and obtain results in a matter of seconds. The raw data, curated annotation, and code used to create our resource are also available at http://rafalab.jhsph.edu/barcode.
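A single-array expressed/unexpressed call of the kind described above can be sketched as a z-score test against a precomputed, per-gene distribution for the unexpressed state. The dictionary layout, the normal-distribution assumption and the threshold below are illustrative assumptions, not the Barcode implementation.

```python
def barcode_calls(expression, unexpressed_mean, unexpressed_sd, z_threshold):
    """Return (z_scores, calls) for a single array.

    expression       -- {gene: log2 expression on the new array}
    unexpressed_mean -- {gene: frozen mean of the gene's unexpressed distribution}
    unexpressed_sd   -- {gene: frozen SD of the gene's unexpressed distribution}
    z_threshold      -- hypothetical cutoff; a gene is called expressed (1) when
                        its z-score exceeds it, otherwise unexpressed (0)
    """
    z_scores, calls = {}, {}
    for gene, value in expression.items():
        z = (value - unexpressed_mean[gene]) / unexpressed_sd[gene]
        z_scores[gene] = z
        calls[gene] = 1 if z > z_threshold else 0
    return z_scores, calls


z, calls = barcode_calls(
    expression={"geneA": 10.0, "geneB": 4.5},
    unexpressed_mean={"geneA": 4.0, "geneB": 4.0},
    unexpressed_sd={"geneA": 1.0, "geneB": 0.5},
    z_threshold=5.0,
)
print(calls)  # {'geneA': 1, 'geneB': 0}
```

The per-gene parameters are what would be learned from the public repositories; once frozen, scoring a new array needs only that array.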
PLOS ONE | 2014
Baqer A. Haider; Alexander S. Baras; Matthew N. McCall; Joshua A. Hertel; Toby C. Cornish; Marc K. Halushka
Background: MicroRNAs (miRNAs) are small (∼22-nt), stable RNAs that critically modulate post-transcriptional gene regulation. MicroRNAs can be found in the blood as components of serum, plasma and peripheral blood mononuclear cells (PBMCs). Many microRNAs have been reported to be specific biomarkers in a variety of non-neoplastic diseases. To date, no one has globally evaluated these proposed clinical biomarkers for general quality or disease specificity. We hypothesized that the cellular source of circulating microRNAs should correlate with cells involved in specific non-neoplastic disease processes. Appropriate cell expression data would then inform the assessment of the quality and usefulness of each microRNA as a biomarker for specific diseases. We further hypothesized that a useful clinical microRNA biomarker would be specific to a single disease. Methods and Findings: We identified 416 microRNA biomarkers, of which 192 were unique, in 104 publications covering 57 diseases. One hundred and thirty-nine microRNAs (33%) represented biologically plausible biomarkers, corresponding to non-ubiquitous microRNAs expressed in disease-appropriate cell types. However, at a global level, many of these microRNAs were reported as “specific” biomarkers for two or more unrelated diseases, with 6 microRNAs (miR-21, miR-16, miR-146a, miR-155, miR-126 and miR-223) being reported as biomarkers for 9 or more distinct diseases. Other biomarkers corresponded to common patterns of cellular injury, such as the liver-specific microRNA miR-122, which was elevated in a disparate set of diseases that injure the liver primarily or secondarily, including hepatitis B, hepatitis C, sepsis, and myocardial infarction. Conclusions: Only a subset of reported blood-based microRNA biomarkers have specificity for a particular disease. The remainder of the reported non-neoplastic biomarkers are either biologically implausible, non-specific, or uninterpretable due to limitations of our current understanding of microRNA expression.
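The specificity check described above amounts to counting how many distinct diseases each microRNA has been reported for. A minimal sketch follows; the (miRNA, disease) record format, the placeholder labels and the cutoff are hypothetical and are not data from the study.

```python
from collections import defaultdict


def diseases_per_mirna(reports):
    """Count distinct diseases per microRNA from (mirna, disease) records.

    Repeated reports of the same pair (e.g. from several publications)
    are counted once.
    """
    diseases = defaultdict(set)
    for mirna, disease in reports:
        diseases[mirna].add(disease)
    return {mirna: len(ds) for mirna, ds in diseases.items()}


def nonspecific(reports, max_diseases=1):
    """List microRNAs reported as biomarkers for more than max_diseases diseases."""
    counts = diseases_per_mirna(reports)
    return sorted(m for m, n in counts.items() if n > max_diseases)


# Hypothetical placeholder records.
reports = [
    ("miR-X", "disease_1"),
    ("miR-X", "disease_2"),
    ("miR-X", "disease_1"),  # repeated report, same disease
    ("miR-Y", "disease_3"),
]
print(diseases_per_mirna(reports))  # {'miR-X': 2, 'miR-Y': 1}
print(nonspecific(reports))         # ['miR-X']
```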
Nucleic Acids Research | 2014
Oliver A. Kent; Matthew N. McCall; Toby C. Cornish; Marc K. Halushka
miR-143 and miR-145 are co-expressed microRNAs (miRNAs) that have been extensively studied as potential tumor suppressors. These miRNAs are highly expressed in the colon and are consistently reported as being downregulated in colorectal and other cancers. Through regulation of multiple targets, they elicit potent effects on cancer cell growth and tumorigenesis. Importantly, a recent discovery demonstrates that miR-143 and miR-145 are not expressed in colonic epithelial cells; rather, these two miRNAs are highly expressed in mesenchymal cells such as fibroblasts and smooth muscle cells. The expression patterns of miR-143, miR-145 and other miRNAs were initially determined from tissue-level data without considering that each tissue is made up of multiple cell types, each with its own miRNA expression pattern. Herein, we discuss the early reports identifying dysregulated miR-143 and miR-145 expression in colorectal cancer and how the failure to consider the cellular composition of normal tissue led to the misconception that these miRNAs are downregulated in cancer. We evaluate mechanistic data from miR-143/145 studies in the context of their cell type-restricted expression pattern and assess whether these miRNAs can be considered tumor suppressors. Further, we examine other examples in which miRNAs were investigated in inappropriate cell types and modulated pathways in a non-biological fashion. Our review highlights the importance of determining the cellular expression pattern of each miRNA, so that downstream studies are conducted in the appropriate cell type.
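The composition argument can be made concrete with a mixing calculation: tissue-level expression is a proportion-weighted sum of cell-type expression. The cell-type proportions and expression values below are hypothetical, chosen only to show that a shift toward epithelial cells lowers the tissue-level signal of a miRNA expressed only in mesenchymal cells, even though its expression per cell is unchanged.

```python
def tissue_expression(proportions, per_cell):
    """Proportion-weighted sum of per-cell-type expression (arbitrary units)."""
    return sum(proportions[cell] * per_cell[cell] for cell in proportions)


# Hypothetical: the miRNA is absent from epithelium and present in mesenchyme,
# and the per-cell values are identical in normal and tumor tissue.
per_cell = {"epithelial": 0.0, "mesenchymal": 100.0}
normal = {"epithelial": 0.7, "mesenchymal": 0.3}
tumor = {"epithelial": 0.95, "mesenchymal": 0.05}

normal_level = tissue_expression(normal, per_cell)
tumor_level = tissue_expression(tumor, per_cell)
fold_change = tumor_level / normal_level
# About 30 vs 5 units: an apparent ~6-fold "downregulation" that reflects
# fewer mesenchymal cells in the sample, not less miRNA per cell.
print(round(normal_level, 6), round(tumor_level, 6), round(fold_change, 4))
```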
BMC Medical Genomics | 2011
Matthew N. McCall; Oliver A. Kent; Jianshi Yu; Karen Fox-Talbot; Ari Zaiman; Marc K. Halushka
Background: MicroRNAs are ~22-nt long regulatory RNAs that serve as critical modulators of post-transcriptional gene regulation. The diversity of miRNAs in endothelial cells (ECs), and the relationship of this diversity to epithelial and hematologic cells, is unknown. We investigated the baseline miRNA signature of human ECs cultured from the aorta (HAEC), coronary artery (HCEC), umbilical vein (HUVEC), pulmonary artery (HPAEC), pulmonary microvasculature (HPMVEC), dermal microvasculature (HDMVEC), and brain microvasculature (HBMVEC) to understand the diversity of miRNA expression in ECs. Results: We identified 166 expressed miRNAs, of which 3 (miR-99b, miR-20b and let-7b) differed significantly between EC types and predicted EC clustering. We confirmed the significance of these miRNAs by RT-PCR analysis and, in a second data set, by Sylamer analysis. We found wide diversity of miRNAs between endothelial, epithelial and hematologic cells, with 99 miRNAs shared across cell types and 31 miRNAs unique to ECs. We show that polycistronic miRNA chromosomal clusters have common expression levels within a given cell type. Conclusions: EC miRNA expression levels are generally consistent across EC types. Three microRNAs were variable within the dataset, indicating potential regulatory changes that could contribute to phenotypic differences between ECs. miRNA expression differentiates endothelial, epithelial and hematologic cells. These data establish a valuable resource characterizing the diverse miRNA signature of ECs.
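The abstract does not state which test identified the miRNAs that differ between EC types. One common choice is a one-way ANOVA per miRNA across cell types; the sketch below computes the F statistic in plain Python and ranks miRNAs by it. The data layout and toy values are hypothetical, and a real analysis would also convert F to a p-value and correct for multiple testing.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of replicate values."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_f(expression):
    """expression: {mirna: {cell_type: [log2 replicates]}} -> [(mirna, F)], largest F first."""
    scores = {m: one_way_f(list(by_type.values())) for m, by_type in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy values: miR-A separates the two EC types, miR-B does not.
toy = {
    "miR-A": {"HAEC": [1.0, 2.0, 3.0], "HUVEC": [4.0, 5.0, 6.0]},
    "miR-B": {"HAEC": [1.0, 3.0, 5.0], "HUVEC": [2.0, 3.0, 4.0]},
}
print(rank_by_f(toy))  # [('miR-A', 13.5), ('miR-B', 0.0)]
```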
BMC Bioinformatics | 2011
Matthew N. McCall; Peter Murakami; Margus Lukk; Wolfgang Huber; Rafael A. Irizarry
Background: Microarray technology has become a widely used tool in the biological sciences. Over the past decade, the number of users has grown exponentially, and with the number of applications and secondary data analyses rapidly increasing, we expect this rate to continue. Various initiatives such as the External RNA Control Consortium (ERCC) and the MicroArray Quality Control (MAQC) project have explored ways to provide standards for the technology. For microarrays to become generally accepted as a reliable technology, statistical methods for assessing quality will be an indispensable component; however, there remains a lack of consensus in both defining and measuring microarray quality. Results: We begin by providing a precise definition of microarray quality and reviewing existing Affymetrix GeneChip quality metrics in light of this definition. We show that the best-performing metrics require multiple arrays to be assessed simultaneously. While such multi-array quality metrics are adequate for bench science, single-array quality metrics will be indispensable as microarrays begin to be used in clinical settings. To this end, we define a single-array version of one of the best multi-array quality metrics and show that this metric performs as well as the best multi-array metrics. We then use this new quality metric to assess the quality of microarray data available via the Gene Expression Omnibus (GEO), using more than 22,000 Affymetrix HGU133a and HGU133plus2 arrays from 809 studies. Conclusions: We find that approximately 10 percent of these publicly available arrays are of poor quality. Moreover, the quality of microarray measurements varies greatly from hybridization to hybridization, study to study, and lab to lab, with some experiments producing unusable data. Many of the concepts described here are applicable to other high-throughput technologies.
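The abstract does not spell out the single-array metric, so the following only sketches the general idea under stated assumptions. A multi-array metric such as NUSE scales each gene's standard error by a value computed from the other arrays in the batch; replacing that with a frozen, precomputed reference lets each array be scored on its own. The per-gene standard errors, the reference values, the median summary and the cutoff are all hypothetical.

```python
from statistics import median


def single_array_score(se, frozen_se):
    """Median ratio of an array's per-gene standard errors to frozen reference SEs.

    Values near 1 mean the array is about as precise as the reference;
    larger values suggest noisier data.
    """
    ratios = [se[gene] / frozen_se[gene] for gene in se if gene in frozen_se]
    return median(ratios)


def flag_poor_quality(arrays, frozen_se, cutoff):
    """Names of arrays whose score exceeds a (hypothetical) cutoff."""
    return sorted(name for name, se in arrays.items()
                  if single_array_score(se, frozen_se) > cutoff)


frozen_se = {"g1": 0.1, "g2": 0.2, "g3": 0.4}
arrays = {
    "array1": {"g1": 0.1, "g2": 0.2, "g3": 0.4},  # matches the reference
    "array2": {"g1": 0.2, "g2": 0.5, "g3": 0.8},  # roughly twice as noisy
}
print(flag_poor_quality(arrays, frozen_se, cutoff=1.25))  # ['array2']
```

Using the median rather than the mean keeps a handful of badly fitted genes from dominating an array's score.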
Nucleic Acids Research | 2014
Matthew N. McCall; Harris A. Jaffee; Susan Zelisko; Neeraj Sinha; Guido Hooiveld; Rafael A. Irizarry; Michael J. Zilliox
The Gene Expression Barcode project, http://barcode.luhs.org, seeks to determine the genes expressed for every tissue and cell type in humans and mice. Understanding the absolute expression of genes across tissues and cell types has applications in basic cell biology, hypothesis generation for gene function and clinical predictions using gene expression signatures. In its current version, this project uses the abundant publicly available microarray data sets combined with a suite of single-array preprocessing, quality control and analysis methods. In this article, we present the improvements that have been made since the previous version of the Gene Expression Barcode in 2011. These include a variety of new data mining tools and summaries, estimated transcriptomes and curated annotations.
Bioinformatics | 2014
Matthew N. McCall; Helene McMurray; Hartmut Land; Anthony Almudevar
Motivation: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research in qPCR laboratory protocols, normalization and statistical analysis, little attention has been given to qPCR non-detects, that is, reactions failing to produce a minimum amount of signal. Results: We show that the common methods of handling qPCR non-detects lead to biased inference. Furthermore, we show that non-detects are not missing completely at random and are likely missing not at random. We propose a model of the missing data mechanism and develop a method to directly model non-detects as missing data. Finally, we show that our approach results in a sizeable reduction in bias when estimating both absolute and differential gene expression. Availability and implementation: The proposed algorithm is implemented in the R package nondetects. This package also contains the raw data for the three example datasets used in this manuscript. The package is freely available at http://mnmccall.com/software and as part of the Bioconductor project.
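The bias from common ways of handling non-detects, and the benefit of modeling them as missing data, can be illustrated with a simplified simulation. This sketch is not the nondetects package's model: it assumes normally distributed Ct values and a fixed detection limit (every reaction above the cutoff becomes a non-detect), which is plain censoring, whereas the paper proposes a probabilistic missing-not-at-random mechanism. The mean, SD, cutoff and the substitution value of 40 are illustrative choices.

```python
import math
import random


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def censored_normal_em(observed, n_censored, cutoff, iters=500, tol=1e-10):
    """EM estimate of (mean, sd) when n_censored values are only known to exceed cutoff."""
    n = len(observed) + n_censored
    sum_obs = sum(observed)
    sum_obs_sq = sum(x * x for x in observed)
    mu = sum_obs / len(observed)
    sd = math.sqrt(sum((x - mu) ** 2 for x in observed) / len(observed))
    for _ in range(iters):
        # E-step: first and second moments of a normal truncated to (cutoff, inf).
        alpha = (cutoff - mu) / sd
        lam = norm_pdf(alpha) / norm_sf(alpha)  # inverse Mills ratio
        e_x = mu + sd * lam
        e_x2 = mu * mu + sd * sd + sd * lam * (cutoff + mu)
        # M-step: normal MLE with the censored values' expected moments plugged in.
        new_mu = (sum_obs + n_censored * e_x) / n
        new_sd = math.sqrt((sum_obs_sq + n_censored * e_x2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sd - sd) < tol
        mu, sd = new_mu, new_sd
        if converged:
            break
    return mu, sd


random.seed(1)
true_mu, true_sd, cutoff = 32.0, 2.0, 35.0
ct = [random.gauss(true_mu, true_sd) for _ in range(5000)]
observed = [x for x in ct if x <= cutoff]
n_nd = len(ct) - len(observed)

mean_dropped = sum(observed) / len(observed)              # ignore non-detects
mean_sub40 = (sum(observed) + 40.0 * n_nd) / len(ct)      # set non-detects to 40
mu_em, sd_em = censored_normal_em(observed, n_nd, cutoff)  # model them as censored
print(f"drop: {mean_dropped:.2f}  sub40: {mean_sub40:.2f}  EM: {mu_em:.2f} (true {true_mu})")
```

Dropping non-detects biases the mean Ct downward (expression looks higher than it is), substituting 40 biases it upward, and treating the non-detects as missing values recovers a mean close to the simulated one.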
American Journal of Transplantation | 2012
Christopher T. Barry; M. D'Souza; Matthew N. McCall; Saman Safadjou; Charlotte K. Ryan; Randeep Kashyap; C.E. Marroquin; Mark S. Orloff; Anthony Almudevar; T. E. Godfrey
Donor livers are precious resources, and it is therefore ethically imperative that we employ optimally sensitive and specific transplant selection criteria. The current selection criteria for liver transplant candidates with hepatocellular carcinoma (HCC), the Milan criteria, are primarily based on radiographic characteristics of the tumor. Although the Milan criteria result in reasonably high survival and low recurrence rates, they do not assess an individual patient's tumor biology and recurrence risk. Consequently, it is difficult to predict the risk of recurrent disease for an individual patient. To address this, we employed microarray profiling of microRNA (miRNA) expression from formalin-fixed, paraffin-embedded tissues to define a biomarker that distinguishes between patients with and without HCC recurrence after liver transplant. In our cohort of 64 patients, this biomarker outperforms the Milan criteria in that it identifies patients outside of Milan who did not have recurrent disease and patients within Milan who had recurrence. We also describe a method to account for multifocal tumors in biomarker signature discovery.
Nucleic Acids Research | 2008
Matthew N. McCall; Rafael A. Irizarry
As the number of users of microarray technology continues to grow, so does the importance of platform assessments and comparisons. Spike-in experiments have been successfully used for internal technology assessments by microarray manufacturers and for comparisons of competing data analysis approaches. The microarray literature is saturated with statistical assessments based on spike-in experiment data. Unfortunately, the statistical assessments vary widely and are applicable only in specific cases. This has introduced confusion into the debate over best practices, that is, over which platforms, protocols and data analysis tools are best. Furthermore, cross-platform comparisons have proven difficult because reported concentrations are not comparable. In this article, we introduce two new spike-in experiments, present a novel statistical solution that enables cross-platform comparisons, and propose a comprehensive procedure for assessments based on spike-in experiments. The ideas are implemented in a user-friendly Bioconductor package: spkTools. We demonstrate the utility of our tools by presenting the first spike-in-based comparison of the three major platforms: Affymetrix, Agilent and Illumina.
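A basic spike-in summary regresses observed log2 signal on nominal log2 concentration: a slope near 1 means concentration changes are fully reflected in the measurements, while a smaller slope indicates that fold changes are compressed. The sketch below fits that least-squares line; it is a generic illustration, not the spkTools procedure, and the toy data are made up.

```python
import math


def signal_slope(concentrations, signals):
    """Least-squares slope and intercept of log2(signal) on log2(concentration)."""
    x = [math.log2(c) for c in concentrations]
    y = [math.log2(s) for s in signals]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data: each doubling of concentration raises the signal by 2**0.8,
# i.e. this hypothetical platform compresses true fold changes.
conc = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
signal = [100.0 * c ** 0.8 for c in conc]
slope, intercept = signal_slope(conc, signal)
print(round(slope, 3))  # 0.8
```

Because nominal concentrations are not comparable across platforms, as noted above, a slope estimated this way is only directly comparable within a platform; cross-platform comparison needs the additional calibration the article proposes.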