Heidi Zhang
Fred Hutchinson Cancer Research Center
Publication
Featured research published by Heidi Zhang.
Molecular & Cellular Proteomics | 2006
Amol Prakash; Parag Mallick; Jeffrey R. Whiteaker; Heidi Zhang; Amanda G. Paulovich; Mark R. Flory; Hookeun Lee; Ruedi Aebersold; Benno Schwikowski
Mass spectrometry-based proteomic experiments, in combination with liquid chromatography-based separation, can be used to compare complex biological samples across multiple conditions. These comparisons are usually performed on the level of protein lists generated from individual experiments. Unfortunately given the current technologies, these lists typically cover only a small fraction of the total protein content, making global comparisons extremely limited. Recently approaches have been suggested that are built on the comparison of computationally built feature lists instead of protein identifications. Although these approaches promise to capture a bigger spectrum of the proteins present in a complex mixture, their success is strongly dependent on the correctness of the identified features and the aligned retention times of these features across multiple experiments. In this experimental-computational study, we went one step further and performed the comparisons directly on the signal level. First signal maps were constructed that associate the experimental signals across multiple experiments. Then a feature detection algorithm used this integrated information to identify those features that are discriminating or common across multiple experiments. At the core of our approach is a score function that faithfully recognizes mass spectra from similar peptide mixtures and an algorithm that produces an optimal alignment (time warping) of the liquid chromatography experiments on the basis of raw MS signal, making minimal assumptions on the underlying data. We provide experimental evidence that suggests uniqueness and correctness of the resulting signal maps even on low accuracy mass spectrometers. These maps can be used for a variety of proteomic analyses. Here we illustrate the use of signal maps for the discovery of diagnostic biomarkers. An implementation of our algorithm is available on our Web server.
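The time-warping idea in this abstract can be illustrated with a generic dynamic time warping sketch. This is not the authors' score function or algorithm: the cosine distance between binned spectra, the function names `cosine_distance` and `dtw_align`, and the representation of a run as a list of equal-length intensity vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two binned intensity vectors.

    Assumed stand-in for a spectrum similarity score; the published
    score function is not described in the abstract.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty spectrum is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def dtw_align(run_a, run_b, dist=cosine_distance):
    """Align two runs (lists of spectra) by classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    matching spectrum i of run_a to spectrum j of run_b.
    """
    n, m = len(run_a), len(run_b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning the first i and first j spectra
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(run_a[i - 1], run_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match both
                                 cost[i - 1][j],      # stretch run_b
                                 cost[i][j - 1])      # stretch run_a
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

Calling `dtw_align(run_a, run_b)` on two short runs returns the accumulated distance and the index pairs that map one retention-time axis onto the other. The quadratic table is fine for a toy example; real LC-MS runs would likely need a banded or otherwise constrained variant.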
Pacific Symposium on Biocomputing | 2005
Pei Wang; Hua Tang; Heidi Zhang; Jeffrey R. Whiteaker; Amanda G. Paulovich; Martin W. McIntosh
We propose a two-step normalization procedure for high-throughput mass spectrometry (MS) data, which is a necessary step in biomarker clustering or classification. First, a global normalization step is used to remove sources of systematic variation between MS profiles due to, for instance, varying amounts of sample degradation over time. A probability model is then used to investigate the intensity-dependent missing events and provides possible substitutions for the missing values. We illustrate the performance of the method with an LC-MS data set of synthetic protein mixtures.
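As a rough illustration of the first step only, a global normalization could look like the sketch below. The abstract does not specify the estimator, so median centering on the log2 scale is an assumption, as are the function name `global_normalize` and the use of `None` for missing intensities. The second step, the probability model for intensity-dependent missing values, is not sketched here.

```python
import math
from statistics import median


def global_normalize(profiles):
    """Median-center each MS profile on the log2 scale.

    profiles: list of profiles, each a list of positive intensities,
    with None marking a missing value (assumed representation).
    Each profile is shifted so that its median log-intensity equals
    the median of all profile medians. Missing values stay None.
    """
    log_profiles = [
        [math.log2(v) if v is not None else None for v in profile]
        for profile in profiles
    ]
    # Per-profile median, computed over observed values only.
    medians = [
        median(v for v in profile if v is not None)
        for profile in log_profiles
    ]
    target = median(medians)
    return [
        [v - m + target if v is not None else None for v in profile]
        for profile, m in zip(log_profiles, medians)
    ]
```

A constant shift per profile removes only a global scale difference between runs, which is the simplest form of systematic variation; it leaves intensity-dependent effects untouched.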
Molecular & Cellular Proteomics | 2007
Amol Prakash; Brian D. Piening; Jeff Whiteaker; Heidi Zhang; Scott A. Shaffer; Daniel B. Martin; Laura Hohmann; Kelly Cooke; James M. Olson; Stacey Hansen; Mark R. Flory; Hookeun Lee; Julian D. Watts; David R. Goodlett; Ruedi Aebersold; Amanda G. Paulovich; Benno Schwikowski
Mass spectrometry-based proteomics holds great promise as a discovery tool for biomarker candidates in the early detection of diseases. Recently much emphasis has been placed upon producing highly reliable data for quantitative profiling for which highly reproducible methodologies are indispensable. The main problems that affect experimental reproducibility stem from variations introduced by sample collection, preparation, and storage protocols and LC-MS settings and conditions. On the basis of a formally precise and quantitative definition of similarity between LC-MS experiments, we have developed Chaorder, a fully automatic software tool that can assess experimental reproducibility of sets of large scale LC-MS experiments. By visualizing the similarity relationships within a set of experiments, this tool can form the basis of systematic quality control and thus help assess the comparability of mass spectrometry data over time, across different laboratories, and between instruments. Applying Chaorder to data from multiple laboratories and a range of instruments, experimental protocols, and sample complexities revealed biases introduced by the sample processing steps, experimental protocols, and instrument choices. Moreover we show that reducing bias by correcting for just a few steps, for example randomizing the run order, does not provide much gain in statistical power for biomarker discovery.
Data Compression Conference | 2006
Agnieszka C. Miguel; John F. Keane; Jeffrey R. Whiteaker; Heidi Zhang; Amanda G. Paulovich
The unrelenting growth of mass spectrometry (MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. The data for this study was derived from peptides of hand-mixed protein samples passed through a high performance liquid chromatography system (HPLC) and an electrospray ionization time-of-flight (ESI-TOF) mass spectrometer. Several lossless data compression methods were applied and yielded up to a 25:1 compression ratio relative to the original files containing base64 encoding of the data.
Computer-Based Medical Systems | 2006
Agnieszka C. Miguel; John F. Keane; Jeffrey R. Whiteaker; Heidi Zhang; Amanda G. Paulovich
The unrelenting growth of liquid chromatography-mass spectrometry (LC-MS) based proteomic data to gigabytes per sample and terabytes per experiment motivates this investigation into compression methods suited to MS signal sources. Compression is needed to facilitate storage, searching, archiving, retrieval, and communication of proteomic MS data. We demonstrate compression techniques that reduce the average file size by a factor of 25 without any loss of accuracy. We have designed two main methods to code the MS data. The first method predicts the mass-to-charge ratio based on the intensity values and encodes the residual with bzip2. The second algorithm maps the original intensity values onto a universal grid and either directly encodes them with bzip2 or applies an arithmetic coder to the results of run-length coding. The latter method achieves the highest compression ratios.
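The second method described above (grid mapping followed by run-length coding and an entropy coder) can be sketched roughly as follows. This is a simplified illustration, not the authors' codec: the grid parameters, the integer intensities, the 8-byte run layout, and all function names are assumptions, and bzip2 is applied to the run-length output here in place of the arithmetic coder mentioned in the abstract.

```python
import bz2
import struct


def to_grid(peaks, mz_min, step, n_bins):
    """Place (m/z, intensity) peaks onto a fixed, evenly spaced m/z grid.

    Intensities are assumed to be non-negative integers; peaks that
    fall outside the grid are dropped in this sketch.
    """
    grid = [0] * n_bins
    for mz, intensity in peaks:
        idx = int(round((mz - mz_min) / step))
        if 0 <= idx < n_bins:
            grid[idx] += intensity
    return grid


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, c) for v, c in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, c in runs:
        out.extend([v] * c)
    return out


def compress_grid(grid):
    """Run-length code the grid, then compress the runs with bzip2."""
    payload = b"".join(struct.pack("<II", v, c)
                       for v, c in run_length_encode(grid))
    return bz2.compress(payload)


def decompress_grid(blob):
    """Invert compress_grid exactly."""
    payload = bz2.decompress(blob)
    runs = [struct.unpack_from("<II", payload, offset)
            for offset in range(0, len(payload), 8)]
    return run_length_decode(runs)
```

The round trip `decompress_grid(compress_grid(grid))` reproduces the gridded intensities exactly. Run-length coding pays off because most grid positions in a sparse spectrum hold zero, so long zero runs collapse to a single pair before the entropy coder sees them. Any rounding of m/z values happens in `to_grid`, before compression.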
Journal of Proteome Research | 2006
Adam Rauch; Matthew Bellew; Jimmy K. Eng; Matthew Fitzgibbon; Ted Holzman; Peter Hussey; Mark Igra; Brendan MacLean; Chen Wei Lin; Andrea Detter; Ruihua Fang; Vitor M. Faça; Phil Gafken; Heidi Zhang; Jeffrey R. Whiteaker; David J. States; Sam Hanash; Amanda Paulovich; Martin W. McIntosh
Journal of Proteome Research | 2007
Jeffrey R. Whiteaker; Heidi Zhang; Lei Zhao; Pei Wang; Karen S. Kelly-Spratt; Richard G. Ivey; Brian D. Piening; Li Chia Feng; Erik Kasarda; Kay E. Gurley; Jimmy K. Eng; Lewis A. Chodosh; Christopher J. Kemp; Martin W. McIntosh; Amanda G. Paulovich
Journal of Proteome Research | 2007
Jeffrey R. Whiteaker; Heidi Zhang; Jimmy K. Eng; Ruihua Fang; Brian D. Piening; Li Chia Feng; Travis D. Lorentzen; Regine M. Schoenherr; John F. Keane; Ted Holzman; Matthew Fitzgibbon; Chenwei Lin; Hui Zhang; Kelly Cooke; Tao Liu; David G. Camp; Leigh Anderson; Julian D. Watts; Richard D. Smith; Martin W. McIntosh; Amanda G. Paulovich
Journal of Proteome Research | 2006
Brian D. Piening; Pei Wang; Chaitanya S. Bangur; Jeffrey R. Whiteaker; Heidi Zhang; Li Chia Feng; John F. Keane; Jimmy K. Eng; Hua Tang; Amol Prakash; Martin W. McIntosh; Amanda G. Paulovich