Rachel L. Goldfeder
Stanford University
Publication
Featured research published by Rachel L. Goldfeder.
JAMA | 2014
Frederick E. Dewey; Megan E. Grove; Cuiping Pan; Benjamin A. Goldstein; Jonathan A. Bernstein; Hassan Chaib; Jason D. Merker; Rachel L. Goldfeder; Gregory M. Enns; Sean P. David; Neda Pakdaman; Kelly E. Ormond; Colleen Caleshu; Kerry Kingham; Teri E. Klein; Michelle Whirl-Carrillo; Kenneth Sakamoto; Matthew T. Wheeler; Atul J. Butte; James M. Ford; Linda M. Boxer; John P. A. Ioannidis; Alan C. Yeung; Russ B. Altman; Themistocles L. Assimes; Michael Snyder; Euan A. Ashley; Thomas Quertermous
IMPORTANCE Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. OBJECTIVES To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. DESIGN, SETTING, AND PARTICIPANTS An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. MAIN OUTCOMES AND MEASURES Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. RESULTS Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. 
Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). CONCLUSIONS AND RELEVANCE In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine.
JAMA Cardiology | 2017
Michael V. McConnell; Anna Shcherbina; Aleksandra Pavlovic; Julian R. Homburger; Rachel L. Goldfeder; Daryl Waggott; Mildred K. Cho; Mary Rosenberger; William L. Haskell; Jonathan Myers; Mary Ann Champagne; Emmanuel Mignot; M Landray; Lionel Tarassenko; Robert A. Harrington; Alan C. Yeung; Euan A. Ashley
Importance Studies have established the importance of physical activity and fitness, yet limited data exist on the associations between objective, real-world physical activity patterns, fitness, sleep, and cardiovascular health. Objectives To assess the feasibility of obtaining measures of physical activity, fitness, and sleep from smartphones and to gain insights into activity patterns associated with life satisfaction and self-reported disease. Design, Setting, and Participants The MyHeart Counts smartphone app was made available in March 2015, and prospective participants downloaded the free app between March and October 2015. In this smartphone-based study of cardiovascular health, participants recorded physical activity, filled out health questionnaires, and completed a 6-minute walk test. The app was available to download within the United States. Main Outcomes and Measures The feasibility of consent and data collection entirely on a smartphone, the use of machine learning to cluster participants, and the associations between activity patterns, life satisfaction, and self-reported disease. Results From the launch to the time of the data freeze for this study (March to October 2015), the number of individuals (self-selected) who consented to participate was 48 968, representing all 50 states and the District of Columbia. Their median age was 36 years (interquartile range, 27-50 years), and 82.2% (30 338 male, 6556 female, 10 other, and 3115 unknown) were male. In total, 40 017 (81.7% of those who consented) uploaded data. Among those who consented, 20 345 individuals (41.5%) completed 4 of the 7 days of motion data collection, and 4552 individuals (9.3%) completed all 7 days. Among those who consented, 40 017 (81.7%) filled out some portion of the questionnaires, and 4990 (10.2%) completed the 6-minute walk test, made available only at the end of 7 days. 
The Heart Age Questionnaire, also available after 7 days, required entering lipid values and age 40 to 79 years (among 17 245 individuals, 43.1% of participants). Consequently, 1334 (2.7%) of those who consented completed all fields needed to compute heart age and a 10-year risk score. Physical activity was detected for a mean (SD) of 14.5% (8.0%) of individuals’ total recorded time. Physical activity patterns were identified by cluster analysis. A pattern of lower overall activity but more frequent transitions between active and inactive states was associated with equivalent self-reported cardiovascular disease as a pattern of higher overall activity with fewer transitions. Individuals’ perception of their activity and risk bore little relation to sensor-estimated activity or calculated cardiovascular risk. Conclusions and Relevance A smartphone-based study of cardiovascular health is feasible, and improvements in participant diversity and engagement will maximize yield from consented participants. Large-scale, real-world assessment of physical activity, fitness, and sleep using mobile devices may be a useful addition to future population health studies.
Science Translational Medicine | 2016
Russ B. Altman; Snehit Prabhu; Arend Sidow; Justin M. Zook; Rachel L. Goldfeder; David Litwack; Euan A. Ashley; George Asimenos; Carlos Bustamante; Katherine Donigan; Kathleen M. Giacomini; Elaine Johansen; Natalia Khuri; Eunice Lee; Xueying Sharon Liang; Marc L. Salit; Omar Serang; Zivana Tezak; Dennis P. Wall; Elizabeth Mansfield; Taha Kass-Hout
Progress in nine research areas will help generate the knowledge required to advance next-generation sequencing diagnostics to the clinic. Next-generation sequencing technologies are fueling a wave of new diagnostic tests. Progress on a key set of nine research challenge areas will help generate the knowledge required to advance these diagnostics effectively to the clinic.
Briefings in Bioinformatics | 2016
Idoia Ochoa; Mikel Hernaez; Rachel L. Goldfeder; Tsachy Weissman; Euan A. Ashley
Recent advancements in sequencing technology have led to a drastic reduction in genome sequencing costs. This development has generated an unprecedented amount of data that must be stored, processed, and communicated. To facilitate this effort, compression of genomic files has been proposed. Specifically, lossy compression of quality scores is emerging as a natural candidate for reducing the growing costs of storage. A main goal of performing DNA sequencing in population studies and clinical settings is to identify genetic variation. Though the field agrees that smaller files are advantageous, the cost of lossy compression, in terms of variant discovery, is unclear. Bioinformatic algorithms to identify SNPs and INDELs use base quality score information; here, we evaluate the effect of lossy compression of quality scores on SNP and INDEL detection. Specifically, we investigate how the output of the variant caller when using the original data differs from that obtained when quality scores are replaced by those generated by a lossy compressor. Using gold standard genomic datasets and simulated data, we are able to analyze how accurate the output of the variant calling is, both for the original data and that previously lossily compressed. We show that lossy compression can significantly alleviate the storage while maintaining variant calling performance comparable to that with the original data. Further, in some cases lossy compression can lead to variant calling performance that is superior to that using the original file. We envisage our findings and framework serving as a benchmark in future development and analyses of lossy genomic data compressors.
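As a rough illustration of what lossy compression of quality scores can look like, the sketch below bins Phred scores into fixed-width intervals and compares the empirical entropy before and after. This is a generic uniform quantizer chosen for illustration only; it is not one of the compressors evaluated in the abstract above, and the function names, the bin width of 10, and the Phred+33 encoding are assumptions.

```python
import math
from collections import Counter

MAX_PHRED = 93  # highest score representable in printable Phred+33 ASCII


def quantize_quality(qual: str, bin_width: int = 10, offset: int = 33) -> str:
    """Replace each Phred score with the midpoint of its fixed-width bin.

    Hypothetical uniform quantizer: fewer distinct symbols means the
    string compresses better, at the cost of losing fine-grained scores.
    """
    out = []
    for ch in qual:
        q = ord(ch) - offset                   # decode Phred+33 character
        bin_start = (q // bin_width) * bin_width
        rep = min(bin_start + bin_width // 2, MAX_PHRED)
        out.append(chr(rep + offset))          # re-encode representative score
    return "".join(out)


def entropy_bits(s: str) -> float:
    """Empirical per-symbol entropy in bits, a proxy for compressed size."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


original = "IIIIHHGF@5+#"
lossy = quantize_quality(original)
print(entropy_bits(original), entropy_bits(lossy))
```

The question the abstract raises is what such a substitution does downstream: a variant caller would be run once on the original scores and once on the quantized ones, and the two call sets compared against a gold standard.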
American Journal of Epidemiology | 2017
Rachel L. Goldfeder; Dennis P. Wall; Muin J. Khoury; John P. A. Ioannidis; Euan A. Ashley
Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
Circulation: Cardiovascular Genetics | 2017
Andrew R. Harper; Victoria N. Parikh; Rachel L. Goldfeder; Colleen Caleshu; Euan A. Ashley
Contemporary DNA sequencing approaches are increasingly used as diagnostic tools within clinical medicine, driven by rapid reductions in cost and improvements in speed. In 2014, Illumina, Inc. launched a system that could sequence an entire human genome for under $1000,1 with sequencing and analysis achievable in under 2 days.2,3 For inherited disease, next-generation sequencing (NGS) technologies have been applied in 3 broad categories: (1) gene panels, where a collection of predefined genes for a given condition, or a group of closely related conditions, are sequenced; (2) whole-exome sequencing (WES), where the majority of the protein-coding portions of the genome (≈2% of the genome) is sequenced; and (3) whole-genome sequencing (WGS), where the majority of the genome is sequenced, including nonprotein-coding DNA. In the management of inherited cardiovascular disease, there has been increasing use of genetic testing as major healthcare systems establish centers of excellence. In most cases, these tests now feature NGS approaches. However, as we transition from traditional to NGS approaches, it is important that noninferiority with traditional practices is firmly established. Furthermore, our ability to correctly interpret the clinical impact of variants derived from these sequencing efforts remains suboptimal. The recent analysis and public release of sequence data from tens of thousands of individuals established that rare variation is common in humans, meaning that only a small proportion of rare variants will actually be causal of rare genetic disease. Indeed, even when rare variants emerge that could potentially explain a given presentation, the classification of such variants is often discordant between laboratories. Here, we will explore the challenge and opportunity of NGS for inherited cardiovascular disease. First, we describe recent advances in sequencing and interpretation facilitated by large-scale population genomics studies. Next, we outline current approaches to clinical genetic testing and describe areas of …
Data Compression Conference | 2016
Idoia Ochoa; Mikel Hernaez; Rachel L. Goldfeder; Tsachy Weissman; Euan A. Ashley
bioRxiv | 2016
Rachel L. Goldfeder; Euan A. Ashley
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in the sequencing cost. Much of the raw data are comprised of nucleotides and the corresponding quality scores that indicate their reliability. The latter are more difficult to compress and are themselves noisy. Lossless and lossy compression of the quality scores has recently been proposed to alleviate the storage costs, but reducing the noise in the quality scores has remained largely unexplored. This raw data is processed in order to identify variants; these genetic variants are used in important applications, such as medical decision making. Thus improving the performance of the variant calling by reducing the noise contained in the quality scores is important. We propose a denoising scheme that reduces the noise of the quality scores and we demonstrate improved inference with this denoised data. Specifically, we show that replacing the quality scores with those generated by the proposed denoiser results in more accurate variant calling in general. Moreover, a consequence of the denoising is that the entropy of the produced quality scores is smaller, and thus significant compression can be achieved with respect to lossless compression of the original quality scores. We expect our results to provide a baseline for future research in denoising of quality scores. The code used in this work as well as a Supplement with all the results are available at http://web.stanford.edu/~iochoa/DCCdenoiser_CodeAndSupplement.zip.
Genome Medicine | 2016
Rachel L. Goldfeder; James R. Priest; Justin M. Zook; Megan E. Grove; Daryl Waggott; Matthew T. Wheeler; Marc L. Salit; Euan A. Ashley
A requisite precondition for the application of next-generation sequencing to clinical medicine is the ability to confidently call genotype at each coding/splicing position of every gene of interest. Current gold standard technologies, such as Sanger sequencing and microarrays, allow confident identification of the genomic origin of the DNA of interest. A commonly used minimum standard for the adoption of new technology in medicine is non-inferiority. We developed a metric to quantify the extent to which current sequencing technologies reach this clinical grade reporting standard. This metric, the rationale for which we present here, is defined as the absolute number of base pairs per gene not callable with confidence, as specified by the presence of 20 high quality (Q20) bases from uniquely mapped (mapq>0) reads per locus. To illustrate the utility of this metric, we apply it across data from several commercially available clinical sequencing products. We present specific examples of coverage for genes known to be important for clinical medicine. We derive data from a variety of platforms including whole genome sequencing (Illumina HiSeq and X chemistry) and exome capture (including medically optimized capture from Agilent, Baylor Clinical Lab, and Personalis). We observe that compared to whole genomes (with ~30x average coverage), augmented exomes perform far better for known disease causing genes, but less well for other genes and in untranslated regions. Increasing whole genome coverage improves this discrepancy with an average coverage of ~45x representing the crossover point where performance equals that of exome capture for disease causing genes. A combination of some genome-wide coverage and augmented exon coverage may offer the most cost effective solution for clinical grade genome sequencing today.
In summary, this coverage metric provides transparency regarding the current state of next-generation sequencing for clinical medicine and will inform genotype interpretation, technology improvement, and sequencing platform choices for physicians and laboratories. We provide an application on precision.fda.gov (Coverage of Key Genes app) to calculate this metric.
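The metric defined in the abstract above (the number of base pairs per gene lacking 20 Q20 bases from reads with mapq>0) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Coverage of Key Genes app: the in-memory pileup representation, the type alias, and the function names are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: one (base_quality, mapping_quality) pair
# for every read base that overlaps a given locus.
Observation = Tuple[int, int]


def qualifying_depth(observations: List[Observation],
                     min_base_q: int = 20,
                     min_mapq: int = 1) -> int:
    """Count bases that are at least Q20 and come from reads with mapq > 0."""
    return sum(1 for base_q, mapq in observations
               if base_q >= min_base_q and mapq >= min_mapq)


def uncallable_bases(gene_pileup: List[List[Observation]],
                     min_depth: int = 20) -> int:
    """Number of loci in a gene with fewer than `min_depth` qualifying bases."""
    return sum(1 for locus in gene_pileup
               if qualifying_depth(locus) < min_depth)


# Toy gene with three loci (values invented for illustration):
well_covered = [(30, 60)] * 25                       # 25 qualifying bases
multi_mapped = [(30, 60)] * 15 + [(30, 0)] * 10      # 10 reads have mapq 0
low_quality = [(30, 60)] * 19 + [(10, 60)] * 5       # 5 bases below Q20

toy_gene = [well_covered, multi_mapped, low_quality]
print(uncallable_bases(toy_gene))
```

Raw depth is 24 or 25 at every locus in the toy gene, yet two of the three loci fail the threshold once base quality and mapping quality are applied, which is the distinction the metric is meant to expose.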
Data Compression Conference | 2016
Claudio Alberti; Noah M. Daniels; Mikel Hernaez; Jan Voges; Rachel L. Goldfeder; Ana A. Hernandez-Lopez; Marco Mattavelli; Bonnie Berger