Keith A. Baggerly

University of Texas MD Anderson Cancer Center

Publication

Featured research published by Keith A. Baggerly.

Nature Reviews Genetics | 2010

Tackling the widespread and critical impact of batch effects in high-throughput data

Jeffrey T. Leek; Robert B. Scharpf; Héctor Corrada Bravo; David Simcha; Benjamin Langmead; W. Evan Johnson; Donald Geman; Keith A. Baggerly; Rafael A. Irizarry

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.

Bioinformatics | 2004

Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments

Keith A. Baggerly; Jeffrey S. Morris; Kevin R. Coombes

MOTIVATION There has been much interest in using patterns derived from surface-enhanced laser desorption and ionization (SELDI) protein mass spectra from serum to differentiate samples from patients both with and without disease. Such patterns have been used without identification of the underlying proteins responsible. However, there are questions as to the stability of this procedure over multiple experiments. RESULTS We compared SELDI proteomic spectra from serum from three experiments by the same group on separating ovarian cancer from normal tissue. These spectra are available on the web at http://clinicalproteomics.steem.com. In general, the results were not reproducible across experiments. Baseline correction prevents reproduction of the results for two of the experiments. In one experiment, there is evidence of a major shift in protocol mid-experiment which could bias the results. In another, structure in the noise regions of the spectra allows us to distinguish normal from cancer, suggesting that the normals and cancers were processed differently. Sets of features found to discriminate well in one experiment do not generalize to other experiments. Finally, the mass calibration in all three experiments appears suspect. Taken together, these and other concerns suggest that much of the structure uncovered in these experiments could be due to artifacts of sample processing, not to the underlying biology of cancer. We provide some guidelines for design and analysis in experiments like these to ensure more reproducible, biologically meaningful results. AVAILABILITY The MATLAB and Perl code used in our analyses is available at http://bioinformatics.mdanderson.org

Nature Medicine | 2002

Steps toward mapping the human vasculature by phage display.

Wadih Arap; Mikhail G. Kolonin; Martin Trepel; Johanna Lahdenranta; Marina Cardó-Vila; Ricardo J. Giordano; Paul J. Mintz; Peter Ardelt; Virginia J. Yao; Claudia I. Vidal; Limor Chen; Anne L. Flamm; Heli Valtanen; Lisa Weavind; Marshall E. Hicks; Raphael E. Pollock; Gregory H. Botz; Corazon D. Bucana; Erkki Koivunen; Dolores J. Cahill; Patricia Troncoso; Keith A. Baggerly; Rebecca D. Pentz; Kim Anh Do; Christopher J. Logothetis; Renata Pasqualini

The molecular diversity of receptors in human blood vessels remains largely unexplored. We developed a selection method in which peptides that home to specific vascular beds are identified after administration of a peptide library. Here we report the first in vivo screening of a peptide library in a patient. We surveyed 47,160 motifs that localized to different organs. This large-scale screening indicates that the tissue distribution of circulating peptides is nonrandom. High-throughput analysis of the motifs revealed similarities to ligands for differentially expressed cell-surface proteins, and a candidate ligand–receptor pair was validated. These data represent a step toward the construction of a molecular map of human vasculature and may have broad implications for the development of targeted therapies.

Cancer Cell | 2014

Identification of Distinct Basal and Luminal Subtypes of Muscle-Invasive Bladder Cancer with Different Sensitivities to Frontline Chemotherapy

Woonyoung Choi; Sima Porten; Seungchan Kim; Daniel Levi Willis; Elizabeth R. Plimack; Jean H. Hoffman-Censits; Beat Roth; Tiewei Cheng; Mai Tran; I-Ling Lee; Jonathan J. Melquist; Jolanta Bondaruk; Tadeusz Majewski; Shizhen Zhang; Shanna Pretzsch; Keith A. Baggerly; Arlene O. Siefker-Radtke; Bogdan Czerniak; Colin P. Dinney; David J. McConkey

Muscle-invasive bladder cancers (MIBCs) are biologically heterogeneous and have widely variable clinical outcomes and responses to conventional chemotherapy. We discovered three molecular subtypes of MIBC that resembled established molecular subtypes of breast cancer. Basal MIBCs shared biomarkers with basal breast cancers and were characterized by p63 activation, squamous differentiation, and more aggressive disease at presentation. Luminal MIBCs contained features of active PPARγ and estrogen receptor transcription and were enriched with activating FGFR3 mutations and potential FGFR inhibitor sensitivity. p53-like MIBCs were consistently resistant to neoadjuvant methotrexate, vinblastine, doxorubicin and cisplatin chemotherapy, and all chemoresistant tumors adopted a p53-like phenotype after therapy. Our observations have important implications for prognostication, the future clinical development of targeted agents, and disease management with conventional chemotherapy.

Clinical Cancer Research | 2004

Selection of Potential Markers for Epithelial Ovarian Cancer with Gene Expression Arrays and Recursive Descent Partition Analysis

Karen H. Lu; Andrea P. Patterson; Lin Wang; Rebecca T. Marquez; Edward N. Atkinson; Keith A. Baggerly; Lance R. Ramoth; Daniel G. Rosen; Jinsong Liu; Ingegerd Hellström; David I. Smith; Lynn C. Hartmann; David A. Fishman; Andrew Berchuck; Rosemarie Schmandt; Regina S. Whitaker; David M. Gershenson; Gordon B. Mills; Robert C. Bast

Purpose: Advanced-stage epithelial ovarian cancer has a poor prognosis with long-term survival in less than 30% of patients. When the disease is detected in stage I, more than 90% of patients can be cured by conventional therapy. Screening for early-stage disease with individual serum tumor markers, such as CA125, is limited by the fact that no single marker is up-regulated and shed in adequate amounts by all ovarian cancers. Consequently, use of multiple markers in combination might detect a larger fraction of early-stage ovarian cancers. Experimental Design: To identify potential candidates for novel markers, we have used Affymetrix human genome arrays (U95 series) to analyze differences in gene expression of 41,441 known genes and expressed sequence tags between five pools of normal ovarian surface epithelial cells (OSE) and 42 epithelial ovarian cancers of different stages, grades, and histotypes. Recursive descent partition analysis (RDPA) was performed with 102 probe sets representing 86 genes that were up-regulated at least 3-fold in epithelial ovarian cancers when compared with normal OSE. In addition, a panel of 11 genes known to encode potential tumor markers [mucin 1, transmembrane (MUC1), mucin 16 (CA125), mesothelin, WAP four-disulfide core domain 2 (HE4), kallikrein 6, kallikrein 10, matrix metalloproteinase 2, prostasin, osteopontin, tetranectin, and inhibin] were similarly analyzed. Results: The 3-fold up-regulated genes were examined and four genes [Notch homologue 3 (NOTCH3), E2F transcription factor 3 (E2F3), GTPase activating protein (RACGAP1), and hematological and neurological expressed 1 (HN1)] distinguished all tumor samples from normal OSE. The 3-fold up-regulated genes were analyzed using RDPA, and the combination of elevated claudin 3 (CLDN3) and elevated vascular endothelial growth factor (VEGF) distinguished the cancers from normal OSE. 
The 11 known markers were analyzed using RDPA, and a combination of HE4, CA125, and MUC1 expression could distinguish tumor from normal specimens. Expression at the mRNA level in the candidate markers was examined via semiquantitative reverse transcription-PCR and was found to correlate well with the array data. Immunohistochemistry was performed to identify expression of the genes at the protein level in 158 ovarian cancers of different histotypes. A combination of CLDN3, CA125, and MUC1 stained 157 (99.4%) of 158 cancers, and all of the tumors were detected with a combination of CLDN3, CA125, MUC1, and VEGF. Conclusions: Our data are consistent with the possibility that a limited number of markers in combination might identify >99% of epithelial ovarian cancers despite the heterogeneity of the disease.

Bioinformatics | 2005

Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum

Jeffrey S. Morris; Kevin R. Coombes; John M. Koomen; Keith A. Baggerly; Ryuji Kobayashi

MOTIVATION Mass spectrometry yields complex functional data for which the features of scientific interest are peaks. A common two-step approach to analyzing these data involves first extracting and quantifying the peaks, then analyzing the resulting matrix of peak quantifications. Feature extraction and quantification involves a number of interrelated steps. It is important to perform these steps well, since subsequent analyses condition on these determinations. Also, it is difficult to compare the performance of competing methods for analyzing mass spectrometry data since the true expression levels of the proteins in the population are generally not known. RESULTS In this paper, we introduce a new method for feature extraction in mass spectrometry data that uses translation-invariant wavelet transforms and performs peak detection using the mean spectrum. We examine the method's performance through examples and simulation, and demonstrate the advantages of using the mean spectrum to detect peaks. We also describe a new physics-based computer model of mass spectrometry and demonstrate how one may design simulation studies based on this tool to systematically compare competing methods. AVAILABILITY MATLAB scripts to implement the methods described in this paper and R code for the virtual mass spectrometer are available at http://bioinformatics.mdanderson.org/software.html SUPPLEMENTARY INFORMATION http://bioinformatics.mdanderson.org/supplements.html.
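
The mean-spectrum idea in this abstract can be sketched roughly as follows, assuming toy intensity vectors that share one m/z grid. The function names, the toy data, and the height threshold are illustrative assumptions, and a simple 1-2-1 weighted smoother stands in for the translation-invariant wavelet denoising the paper describes. This is not the authors' MATLAB code.

```python
def mean_spectrum(spectra):
    """Average intensity at each grid position across equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def smooth(values):
    """1-2-1 weighted smoother (an assumed stand-in for wavelet denoising)."""
    out = []
    last = len(values) - 1
    for i, v in enumerate(values):
        total, weight = 2.0 * v, 2.0
        if i > 0:
            total += values[i - 1]
            weight += 1.0
        if i < last:
            total += values[i + 1]
            weight += 1.0
        out.append(total / weight)
    return out


def find_peaks(values, min_height):
    """Indices of strict interior local maxima at or above min_height."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1]
        and values[i] > values[i + 1]
        and values[i] >= min_height
    ]


def quantify(spectrum, peak_indices, half_window=1):
    """Maximum intensity of one spectrum in a small window around each peak."""
    return [
        max(spectrum[max(0, p - half_window): p + half_window + 1])
        for p in peak_indices
    ]


# Toy spectra (made-up numbers, not real mass spectrometry data).
spectra = [
    [0, 1, 0, 5, 0, 1, 0, 0, 3, 0],
    [0, 0, 1, 4, 1, 0, 0, 1, 4, 0],
    [1, 0, 0, 6, 0, 0, 1, 0, 2, 1],
]

# Detect peaks once, on the smoothed mean spectrum.
peaks = find_peaks(smooth(mean_spectrum(spectra)), min_height=1.0)

# Quantify every spectrum at the same shared peak positions.
matrix = [quantify(s, peaks) for s in spectra]
```

Because peaks are located once on the averaged spectrum, every spectrum is quantified at the same positions, so the resulting peak matrix has no missing entries.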

Clinical Cancer Research | 2013

Differential Response to Neoadjuvant Chemotherapy Among 7 Triple-Negative Breast Cancer Molecular Subtypes

Hiroko Masuda; Keith A. Baggerly; Ying Wang; Ya Zhang; Ana M. Gonzalez-Angulo; Funda Meric-Bernstam; Vicente Valero; Brian D. Lehmann; Jennifer A. Pietenpol; Gabriel N. Hortobagyi; W. Fraser Symmans; Naoto Ueno

Purpose: The clinical relevancy of the 7-subtype classification of triple-negative breast cancer (TNBC) reported by Lehmann and colleagues is unknown. We investigated the clinical relevancy of TNBC heterogeneity by determining pathologic complete response (pCR) rates after neoadjuvant chemotherapy, based on TNBC subtypes. Experimental Design: We revalidated the Lehmann and colleagues experiments using Affymetrix CEL files from public datasets. We applied these methods to 146 patients with TNBC with gene expression microarrays obtained from June 2000 to March 2010 at our institution. Of those, 130 had received standard neoadjuvant chemotherapy and had evaluable pathologic response data. We classified the TNBC samples by subtype and then correlated subtype and pCR status using Fisher exact test and a logistic regression model. We also assessed survival and compared the subtypes with PAM50 intrinsic subtypes and residual cancer burden (RCB) index. Results: TNBC subtype and pCR status were significantly associated (P = 0.04379). The basal-like 1 (BL1) subtype had the highest pCR rate (52%); basal-like 2 (BL2) and luminal androgen receptor had the lowest (0% and 10%, respectively). TNBC subtype was an independent predictor of pCR status (P = 0.022) by a likelihood ratio test. The subtypes better predicted pCR status than did the PAM50 intrinsic subtypes (basal-like vs. non basal-like). Conclusions: Classifying TNBC by 7 subtypes predicts high versus low pCR rate. We confirm the clinical relevancy of the 7 subtypes of TNBC. We need to prospectively validate whether the pCR rate differences translate into long-term outcome differences. The 7-subtype classification may spur innovative personalized medicine strategies for patients with TNBC. Clin Cancer Res; 19(19); 5533–40. ©2013 AACR.
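
As a rough illustration of the kind of Fisher exact test mentioned in this abstract, the sketch below computes a two-sided p-value for a single 2x2 table (for example, one subtype versus all others, against pCR versus no pCR). The study itself tested a full subtype-by-response table, so this is a simplified case. The function name and the counts are made up for illustration and are not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are two groups, columns are response yes/no.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these toy counts the tables at least as extreme as the observed one have total probability 34/70, so the p-value is about 0.486.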

Clinical Cancer Research | 2005

Patterns of Gene Expression in Different Histotypes of Epithelial Ovarian Cancer Correlate with Those in Normal Fallopian Tube, Endometrium, and Colon

Rebecca T. Marquez; Keith A. Baggerly; Andrea P. Patterson; Jinsong Liu; Russell Broaddus; Michael Frumovitz; Edward N. Atkinson; David I. Smith; Lynn C. Hartmann; David A. Fishman; Andrew Berchuck; Regina S. Whitaker; David M. Gershenson; Gordon B. Mills; Robert C. Bast; Karen H. Lu

Purpose: Epithelial ovarian cancers are thought to arise from flattened epithelial cells that cover the ovarian surface or that line inclusion cysts. During malignant transformation, different histotypes arise that resemble epithelial cells from normal fallopian tube, endometrium, and intestine. This study compares gene expression in serous, endometrioid, clear cell, and mucinous ovarian cancers with that in the normal tissues that they resemble. Experimental Design: Expression of 63,000 probe sets was measured in 50 ovarian cancers, in 5 pools of normal ovarian epithelial brushings, and in mucosal scrapings from 4 normal fallopian tube, 5 endometrium, and 4 colon specimens. Using rank-sum analysis, genes whose expressions best differentiated the ovarian cancer histotypes and normal ovarian epithelium were used to determine whether a correlation based on gene expression existed between ovarian cancer histotypes and the normal tissues they resemble. Results: When compared with normal ovarian epithelial brushings, alterations in serous tumors correlated with those in normal fallopian tube (P = 0.0042) but not in other normal tissues. Similarly, mucinous cancers correlated with those in normal colonic mucosa (P = 0.0003), and both endometrioid and clear cell histotypes correlated with changes in normal endometrium (P = 0.0172 and 0.0002, respectively). Mucinous cancers displayed the greatest number of alterations in gene expression when compared with normal ovarian epithelial cells. Conclusion: Studies at a molecular level show distinct expression profiles of different histologies of ovarian cancer and support the long-held belief that histotypes of ovarian cancers come to resemble normal fallopian tube, endometrial, and colonic epithelium. Several potential molecular markers for mucinous ovarian cancers have been identified.

The Annals of Applied Statistics | 2009

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

Keith A. Baggerly; Kevin R. Coombes

High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in “forensic bioinformatics” where aspects of raw data and reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report, we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We then discuss steps we are taking to avoid such errors in our own investigations.
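
The row-offset errors mentioned in this abstract can be illustrated with a small sketch: given a list of probe names, the row indices a method should have selected, and the names actually reported, try a few candidate offsets and see which one reproduces the report. The probe names, indices, and helper functions are hypothetical and only illustrate the idea; they are not the procedure used in the paper.

```python
def select_by_index(names, indices, offset=0):
    """Look up names at the given row indices, shifted by offset."""
    return [names[i + offset] for i in indices]


def infer_offsets(names, indices, reported, candidates=(-1, 0, 1)):
    """Return every candidate offset that reproduces the reported names."""
    matches = []
    for k in candidates:
        in_range = all(0 <= i + k < len(names) for i in indices)
        if in_range and select_by_index(names, indices, k) == reported:
            matches.append(k)
    return matches


# Hypothetical probe names and selected rows.
names = ["probe_A", "probe_B", "probe_C", "probe_D", "probe_E", "probe_F"]
indices = [1, 3, 4]

intended = select_by_index(names, indices)       # what should be reported
reported = ["probe_C", "probe_E", "probe_F"]     # what was reported

# A result of [1] means the reported list is the intended one shifted by one row.
recovered = infer_offsets(names, indices, reported)
```

A mismatch like this can arise, for instance, when one tool counts a header line as a data row and another does not.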

Clinical Chemistry | 2003

Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization

Kevin R. Coombes; Herbert A. Fritsche; Charlotte H. Clarke; Jeng Neng Chen; Keith A. Baggerly; Jeffrey S. Morris; Lian Chun Xiao; Mien Chie Hung; Henry M. Kuerer

BACKGROUND Recently, researchers have been using mass spectroscopy to study cancer. For use of proteomics spectra in a clinical setting, stringent quality-control procedures will be needed. METHODS We pooled samples of nipple aspirate fluid from healthy breasts and breasts with cancer to prepare a control sample. Aliquots of the control sample were used on two spots on each of three IMAC ProteinChip arrays (Ciphergen Biosystems, Inc.) on 4 successive days to generate 24 SELDI spectra. In 36 subsequent experiments, the control sample was applied to two spots of each ProteinChip array, and the resulting spectra were analyzed to determine how closely they agreed with the original 24 spectra. RESULTS We describe novel algorithms that (a) locate peaks in unprocessed proteomics spectra and (b) iteratively combine peak detection with baseline correction. These algorithms detected approximately 200 peaks per spectrum, 68 of which are detected in all 24 original spectra. The peaks were highly correlated across samples. Moreover, we could explain 80% of the variance, using only six principal components. Using a criterion that rejects a chip if the Mahalanobis distance from both control spectra to the center of the six-dimensional principal component space exceeds the 95% confidence limit threshold, we rejected 5 of the 36 chips. CONCLUSIONS Mahalanobis distance in principal component space provides a method for assessing the reproducibility of proteomics spectra that is robust, effective, easily computed, and statistically sound.
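
The quality-control criterion in this abstract can be sketched as below, under several assumptions: simulated 4-dimensional toy "spectra" instead of real peak data, two principal components instead of six, power iteration as the eigen-solver, and a caller-supplied rejection threshold rather than the paper's 95% limit. All names and numbers are illustrative and are not the authors' implementation.

```python
import math
import random


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix (n - 1 denominator)."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d, n = len(means), len(rows)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_components(cov, k, iters=200, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Remove the component just found before searching for the next one.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def pc_mahalanobis(x, means, pairs):
    """Mahalanobis distance from the reference center, within the retained PCs."""
    centered = [a - m for a, m in zip(x, means)]
    return math.sqrt(
        sum(sum(c * vi for c, vi in zip(centered, v)) ** 2 / lam for lam, v in pairs)
    )


def reject_chip(dist_a, dist_b, threshold):
    """Reject only if BOTH control spectra on the chip exceed the threshold."""
    return dist_a > threshold and dist_b > threshold


# Simulated reference set: 24 toy spectra with two dominant sources of variation.
rng = random.Random(0)
base = [10.0, 5.0, 8.0, 3.0]
d1 = [0.5, 0.5, 0.5, 0.5]
d2 = [0.5, -0.5, 0.5, -0.5]


def simulate():
    t, s = rng.gauss(0, 2.0), rng.gauss(0, 0.5)
    return [b + t * u + s * v + rng.gauss(0, 0.05) for b, u, v in zip(base, d1, d2)]


reference = [simulate() for _ in range(24)]
means = column_means(reference)
pairs = top_components(covariance(reference, means), k=2)

ref_distances = [pc_mahalanobis(x, means, pairs) for x in reference]
outlier = [b + 20.0 * u for b, u in zip(base, d1)]
outlier_distance = pc_mahalanobis(outlier, means, pairs)
```

Note that a distance computed only in the retained components cannot see deviations orthogonal to them, so the number of components kept is a real design choice.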
