Madeleine Ball
Harvard University
Publication
Featured research published by Madeleine Ball.
Nature Biotechnology | 2009
Madeleine Ball; Jin Billy Li; Yuan Gao; Je-Hyuk Lee; Emily LeProust; In-Hyun Park; Bin Xie; George Q. Daley; George M. Church
Studies of epigenetic modifications would benefit from improved methods for high-throughput methylation profiling. We introduce two complementary approaches that use next-generation sequencing technology to detect cytosine methylation. In the first method, we designed ∼10,000 bisulfite padlock probes to profile ∼7,000 CpG locations distributed over the ENCODE pilot project regions and applied them to human B-lymphocytes, fibroblasts and induced pluripotent stem cells. This unbiased choice of targets takes advantage of existing expression and chromatin immunoprecipitation data and enabled us to observe a pattern of low promoter methylation and high gene-body methylation in highly expressed genes. The second method, methyl-sensitive cut counting, generated nontargeted genome-scale data for ∼1.4 million HpaII sites in the DNA of B-lymphocytes and confirmed that gene-body methylation in highly expressed genes is a consistent phenomenon throughout the human genome. Our observations highlight the usefulness of techniques that are not inherently or intentionally biased towards particular subsets like CpG islands or promoter regions.
Nature | 2012
Brock A. Peters; Bahram Ghaffarzadeh Kermani; Andrew Sparks; Oleg Alferov; Peter Hong; Andrei Alexeev; Yuan Jiang; Fredrik Dahl; Y. Tom Tang; Juergen Haas; Kimberly Robasky; Alexander Wait Zaranek; Je-Hyuk Lee; Madeleine Ball; Joseph E. Peterson; Helena Perazich; George Yeung; Jia Liu; Linsu Chen; Michael Kennemer; Kaliprasad Pothuraju; Karel Konvicka; Mike Tsoupko-Sitnikov; Krishna Pant; Jessica Ebert; Geoffrey B. Nilsen; Jonathan Baccash; Aaron L. Halpern; George M. Church; Radoje Drmanac
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10–20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Madeleine Ball; Joseph V. Thakuria; Alexander Wait Zaranek; Tom Clegg; Abraham M. Rosenbaum; Xiaodi Wu; Misha Angrist; Jong Bhak; Jason Bobe; Matthew J. Callow; Carlos Cano; Michael F. Chou; Wendy K. Chung; Shawn M. Douglas; Preston W. Estep; Athurva Gore; Peter J. Hulick; Alberto Labarga; Je-Hyuk Lee; Jeantine E. Lunshof; Byung Chul Kim; Jong-Il Kim; Zhe Li; Michael F. Murray; Geoffrey B. Nilsen; Brock A. Peters; Anugraha M. Raman; Hugh Y. Rienhoff; Kimberly Robasky; Matthew T. Wheeler
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved “open consent” process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
PLOS Genetics | 2011
Frederick E. Dewey; Rong Chen; Sergio Cordero; Kelly E. Ormond; Colleen Caleshu; Konrad J. Karczewski; Michelle Whirl-Carrillo; Matthew T. Wheeler; Joel T. Dudley; Jake K. Byrnes; Omar E. Cornejo; Joshua W. Knowles; Mark Woon; Li Gong; Caroline F. Thorn; Joan M. Hebert; Emidio Capriotti; Sean P. David; Aleksandra Pavlovic; Anne West; Joseph V. Thakuria; Madeleine Ball; Alexander Wait Zaranek; Heidi L. Rehm; George M. Church; John West; Carlos Bustamante; Michael Snyder; Russ B. Altman; Teri E. Klein
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
Scientific Data | 2016
Justin M. Zook; David N. Catoe; Jennifer H. McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E. Mason; Noah Alexander; Elizabeth Henaff; Alexa B. R. McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric E. Schadt; Robert Sebra; Ali Bashir; Rebecca Truty; Christopher C. Chang; Natali Gulbahce
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Genome Medicine | 2014
Madeleine Ball; Jason Bobe; Michael F. Chou; Tom Clegg; Preston W. Estep; Jeantine E. Lunshof; Ward Vandewege; Alexander Wait Zaranek; George M. Church
Background: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. Discussion: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project. Summary: We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.
Genome Medicine | 2013
Jeantine E. Lunshof; Madeleine Ball
DNA is an identifier. We are not defined by our genome, but our DNA is ours and we can be identified through it. Despite the comments made at the time, it was neither wicked nor tacky when Craig Venter, shortly after the first human genome sequence was published in 2001, publicly revealed that he was one donor of the samples used in Celera’s genome sequencing project [1]. Venter later explained that by identifying himself as a donor he had intended to demystify the human genome and to reduce public fears about the potential misuse of genetic information [2].
GigaScience | 2016
Qing Mao; Serban Ciotlos; Rebecca Yu Zhang; Madeleine Ball; Robert Chin; Paolo Carnevali; Nina Barua; Staci Nguyen; Misha R. Agarwal; Tom Clegg; Abram Connelly; Ward Vandewege; Alexander Wait Zaranek; Preston W. Estep; George M. Church; Radoje Drmanac; Brock A. Peters
Background: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. Findings: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. Conclusions: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
Scientific Reports | 2017
Yingleong Chan; Michael Tung; Alexander S. Garruss; Sarah W. Zaranek; Ying Kai Chan; Jeantine E. Lunshof; Alexander Wait Zaranek; Madeleine Ball; Michael F. Chou; Elaine T. Lim; George M. Church
The Personal Genome Project (PGP) is an effort to enroll many participants to create an open-access repository of genome, health and trait data for research. However, PGP participants are not enrolled for studying any specific traits and participants choose the phenotypes to disclose. To measure the extent and willingness and to encourage and guide participants to contribute phenotypes, we developed an algorithm to score and rank the phenotypes and participants of the PGP. The scoring algorithm calculates the participation index (P-index) for every participant, where 0 indicates no reported phenotypes and 100 indicates complete phenotype reporting. We calculated the P-index for all 5,015 participants in the PGP, and the values ranged from 0 to 96.7. We found that participants mainly have either high scores (P-index > 90, 29.5%) or low scores (P-index < 10, 57.8%). While there are significantly more male than female participants (1,793 versus 1,271), females tend to have on average higher P-indexes (P = 0.015). We also reported the P-indexes of participants based on demographics; states like Missouri and Massachusetts have better P-indexes than states like Utah and Minnesota. The P-index can therefore be used as an unbiased way to measure and rank participants’ phenotypic contributions towards the PGP.
Human Mutation | 2017
Binghuang Cai; Biao Li; Nikki Kiga; Janita Thusberg; Timothy Bergquist; Yun-Ching Chen; Noushin Niknafs; Hannah Carter; Collin Tokheim; Violeta Beleva-Guthrie; Christopher Douville; Rohit Bhattacharya; Hui Ting Grace Yeo; Jean Fan; Sohini Sengupta; Dewey Kim; Melissa S. Cline; Tychele N. Turner; Mark Diekhans; Jan Zaucha; Lipika R. Pal; Chen Cao; Chen-Hsin Yu; Yizhou Yin; Marco Carraro; Manuel Giollo; Carlo Ferrari; Emanuela Leonardi; Jason Bobe; Madeleine Ball
The advent of next‐generation sequencing has dramatically decreased the cost for whole‐genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community’s ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.