Anthony A. Philippakis
Broad Institute
Publications

Featured research published by Anthony A. Philippakis.
Nature Genetics | 2011
Mark A. DePristo; Eric Banks; Ryan Poplin; Kiran Garimella; Jared Maguire; Christopher Hartl; Anthony A. Philippakis; Guillermo Del Angel; Manuel A. Rivas; Matt Hanna; Aaron McKenna; Timothy Fennell; Andrew Kernytsky; Andrey Sivachenko; Kristian Cibulskis; Stacey B. Gabriel; David Altshuler; Mark J. Daly
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
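The base quality score recalibration step (iii) rests on a simple idea: at sites believed not to vary, any mismatch against the reference can be treated as a sequencing error, so the error rate actually observed among bases reported at a given quality can be converted back to the Phred scale, Q = -10 log10(error rate). The Python sketch below shows only that core calculation, grouping by reported quality alone. The function name `empirical_quality`, the add-one smoothing and the input format are assumptions for illustration, not GATK's implementation, which also conditions on other covariates such as machine cycle and sequence context.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """Map each reported Phred quality to an empirical Phred quality.

    observations: iterable of (reported_q, is_mismatch) pairs, assumed to come
    from positions that are NOT true variant sites (e.g. outside a known-SNP
    mask), so every mismatch is counted as a sequencing error.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        mismatches[reported_q] += int(is_mismatch)

    table = {}
    for q, n in totals.items():
        # Assumed add-one smoothing so a bin with zero mismatches stays finite.
        error_rate = (mismatches[q] + 1) / (n + 2)
        table[q] = -10.0 * math.log10(error_rate)
    return table


# Hypothetical counts: 98 bases reported at Q30 with one mismatch,
# and 8 bases reported at Q20 with none.
obs = [(30, False)] * 97 + [(30, True)] + [(20, False)] * 8
print({q: round(v, 2) for q, v in empirical_quality(obs).items()})
# {30: 16.99, 20: 10.0}
```

In this toy input the Q30 bin has a smoothed error rate of 2/100, so it is recalibrated to about Q17, which would indicate that the reported scores for that bin were overconfident.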
Cell | 2008
Michael F. Berger; Gwenael Badis; Andrew R. Gehrke; Shaheynoor Talukder; Anthony A. Philippakis; Lourdes Peña-Castillo; Trevis M. Alleyne; Sanie Mnaimneh; Olga Botvinnik; Esther T. Chan; Faiqua Khalid; Wen Zhang; Daniel E. Newburger; Savina A. Jaeger; Quaid Morris; Martha L. Bulyk; Timothy R. Hughes
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.
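The phrase "all possible 8-base sequences" has a concrete size. There are 4^8 = 65,536 8-mers, but a site and its reverse complement are the same stretch of double-stranded DNA, so they collapse to (4^8 + 4^4)/2 = 32,896 distinct sites; the 4^4 = 256 reverse-complement palindromes pair only with themselves. The short Python sketch below checks this count by brute force. `canonical_kmers` and `reverse_complement` are hypothetical helpers written for this illustration, not part of any published analysis code.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Collapse every k-mer with its reverse complement, keeping the
    lexicographically smaller string as the representative."""
    kmers = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


print(4 ** 8, len(canonical_kmers(8)))  # 65536 32896
```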
Human Mutation | 2015
Anthony A. Philippakis; Danielle R. Azzariti; Sergi Beltran; Anthony J. Brookes; Catherine A. Brownstein; Michael Brudno; Han G. Brunner; Orion J. Buske; Knox Carey; Cassie Doll; Sergiu Dumitriu; Stephanie O.M. Dyke; Johan T. den Dunnen; Helen V. Firth; Richard A. Gibbs; Marta Girdea; Michael Gonzalez; Melissa Haendel; Ada Hamosh; Ingrid A. Holm; Lijia Huang; Ben Hutton; Joel B. Krier; Andriy Misyura; Christopher J. Mungall; Justin Paschall; Benedict Paten; Peter N. Robinson; François Schiettecatte; Nara Sobreira
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease‐specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
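The federated idea can be illustrated without any networking: one query is sent to several independent databases, each returns its own candidate matches, and the results are pooled. In the Python sketch below each node, which in the MME would be reached through the shared API, is simulated by an in-memory list. The `Case` record, the `federated_match` function, the rule "at least one shared candidate gene, then rank by phenotype overlap" and all gene and phenotype labels are hypothetical; the sketch does not reproduce the MME request schema or any node's actual matching algorithm.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical minimal case record: candidate genes plus phenotype terms.

    Real MME requests carry structured JSON (contact details, genomic
    features, ontology-coded phenotypes); none of that is modelled here.
    """
    case_id: str
    genes: set
    phenotypes: set


def jaccard(a, b):
    """Shared terms divided by all terms; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def federated_match(query, nodes, min_score=0.1):
    """Send one query to every node and pool the hits.

    nodes maps a node name to that node's list of cases. A case is a hit only
    if it shares at least one candidate gene with the query; hits are then
    ranked by phenotype overlap, highest first.
    """
    hits = []
    for node_name, cases in nodes.items():
        for case in cases:
            if case.case_id == query.case_id or not (query.genes & case.genes):
                continue
            score = jaccard(query.phenotypes, case.phenotypes)
            if score >= min_score:
                hits.append((node_name, case.case_id, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


query = Case("q1", {"GENE_X"}, {"pheno_A", "pheno_B"})
nodes = {
    "node_1": [Case("n1-a", {"GENE_X"}, {"pheno_A"}),
               Case("n1-b", {"GENE_Y"}, {"pheno_A", "pheno_B"})],
    "node_2": [Case("n2-a", {"GENE_X", "GENE_Z"},
                    {"pheno_A", "pheno_B", "pheno_C"})],
}
print(federated_match(query, nodes))
# [('node_2', 'n2-a', 0.667), ('node_1', 'n1-a', 0.5)]
```

Case `n1-b` shares both phenotypes but no candidate gene, so this particular rule never reports it; a real service could weigh genotype and phenotype evidence quite differently.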
Nature Methods | 2008
Jason Warner; Anthony A. Philippakis; Savina A. Jaeger; Fangxue Sherry He; Jolinta Lin; Martha L. Bulyk
We developed an algorithm, Lever, that systematically maps metazoan DNA regulatory motifs or motif combinations to sets of genes. Lever assesses whether the motifs are enriched in cis-regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding the genes. Lever analysis allows unbiased inference of functional annotations to regulatory motifs and candidate CRMs. We used human myogenic differentiation as a model system to statistically assess greater than 25,000 pairings of gene sets and motifs or motif combinations. We assigned functional annotations to candidate regulatory motifs predicted previously and identified gene sets that are likely to be co-regulated via shared regulatory motifs. Lever allows moving beyond the identification of putative regulatory motifs in mammalian genomes, toward understanding their biological roles. This approach is general and can be applied readily to any cell type, gene expression pattern or organism of interest.
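Deciding whether motif-containing CRMs are enriched around a gene set requires a rank statistic. The sketch below uses the area under the ROC curve in its Mann-Whitney form: the probability that a randomly chosen gene from the set of interest has a higher CRM score than a randomly chosen background gene. This abstract does not state Lever's exact statistic, so the choice of AUC, the function name `roc_auc` and the example scores are assumptions.

```python
def roc_auc(target_scores, background_scores):
    """ROC AUC in Mann-Whitney form.

    Returns the probability that a randomly chosen target gene has a higher
    CRM score than a randomly chosen background gene, counting ties as 1/2.
    """
    if not target_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


# Hypothetical best-CRM scores for one motif near each gene.
gene_set = [4.2, 3.8, 3.1, 2.0]
background = [2.5, 1.9, 1.2, 0.8, 3.0, 0.4]
print(roc_auc(gene_set, background))  # 22 of 24 pairs favour the gene set
```

A value near 0.5 means the CRM scores do not separate the gene set from background, while values approaching 1 suggest enrichment. With tens of thousands of gene-set/motif pairings, such scores would also need a multiple-testing correction, and a rank-sum formulation would scale better than this pairwise loop.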
Nature Genetics | 2011
Jessica Shea; Vineeta Agarwala; Anthony A. Philippakis; Jared Maguire; Eric Banks; Mark DePristo; Brian Thomson; Candace Guiducci; Robert C. Onofrio; Sekar Kathiresan; Stacey Gabriel; Noël P. Burtt; Mark J. Daly; Leif Groop; David Altshuler
Noncoding variants at human chromosome 9p21 near CDKN2A and CDKN2B are associated with type 2 diabetes, myocardial infarction, aneurysm, vertical cup disc ratio and at least five cancers. Here we compare approaches to more comprehensively assess genetic variation in the region. We carried out targeted sequencing at high coverage in 47 individuals and compared the results to pilot data from the 1000 Genomes Project. We imputed variants into type 2 diabetes and myocardial infarction cohorts directly from targeted sequencing, from a genotyped reference panel derived from sequencing and from 1000 Genomes Project low-coverage data. Polymorphisms with frequency >5% were captured well by all strategies. Imputation of intermediate-frequency polymorphisms required a higher density of tag SNPs in disease samples than is available on first-generation genome-wide association study (GWAS) arrays. Our association analyses identified more comprehensive sets of variants showing equivalent statistical association with type 2 diabetes or myocardial infarction, but did not identify stronger associations than the original GWAS signals.
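Imputation works when the variant to be imputed is correlated, in linkage disequilibrium, with SNPs that were actually genotyped, and that correlation is commonly summarized as r^2. The Python sketch below computes r^2 from unphased genotype dosages and picks the best tag for a target variant from a small panel; `genotype_r2`, `best_tag` and the toy genotypes are illustrative assumptions, not the study's pipeline.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation of allele dosages (0, 1 or 2 copies).

    A common stand-in for LD r^2 when haplotype phase is unknown.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length genotype vectors, n >= 2")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic SNP cannot tag anything
    return cov * cov / (var1 * var2)


def best_tag(target, panel):
    """Return (name, r2) of the panel SNP that best tags the target variant."""
    return max(((name, genotype_r2(target, g)) for name, g in panel.items()),
               key=lambda item: item[1])


# Hypothetical genotypes for six individuals.
target = [0, 1, 2, 1, 0, 0]
panel = {"snp_a": [0, 1, 2, 1, 0, 0], "snp_b": [0, 0, 1, 1, 0, 1]}
print(best_tag(target, panel))  # ('snp_a', 1.0)
```

One plausible reason intermediate-frequency variants need denser tag SNPs is that r^2 can only reach 1 when the two SNPs have matching allele frequencies, so a variant is well tagged only if the panel happens to contain a correlated SNP of similar frequency.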
PLOS Computational Biology | 2006
Anthony A. Philippakis; Brian W. Busser; Stephen S. Gisselbrecht; Fangxue Sherry He; Beatriz Estrada; Alan M. Michelson; Martha L. Bulyk
While combinatorial models of transcriptional regulation can be inferred for metazoan systems from a priori biological knowledge, validation requires extensive and time-consuming experimental work. Thus, there is a need for computational methods that can evaluate hypothesized cis regulatory codes before the difficult task of experimental verification is undertaken. We have developed a novel computational framework (termed “CodeFinder”) that integrates transcription factor binding site and gene expression information to evaluate whether a hypothesized transcriptional regulatory model (TRM; i.e., a set of co-regulating transcription factors) is likely to target a given set of co-expressed genes. Our basic approach is to simultaneously predict cis regulatory modules (CRMs) associated with a given gene set and quantify the enrichment for combinatorial subsets of transcription factor binding site motifs comprising the hypothesized TRM within these predicted CRMs. As a model system, we have examined a TRM experimentally demonstrated to drive the expression of two genes in a sub-population of cells in the developing Drosophila mesoderm, the somatic muscle founder cells. This TRM was previously hypothesized to be a general mode of regulation for genes expressed in this cell population. In contrast, the present analyses suggest that a modified form of this cis regulatory code applies to only a subset of founder cell genes, those whose gene expression responds to specific genetic perturbations in a similar manner to the gene on which the original model was based. We have confirmed this hypothesis by experimentally discovering six (out of 12 tested) new CRMs driving expression in the embryonic mesoderm, four of which drive expression in founder cells.
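The combinatorial part of this approach can be illustrated by asking, for one noncoding sequence, which subsets of a hypothesized TRM's motifs co-occur within a window of fixed length. The Python sketch below does exactly that with exact consensus-string matches on one strand only. CodeFinder itself predicts CRMs with binding-site models and quantifies enrichment across gene sets, so the motif strings, the window length and the helper names here are all hypothetical simplifications.

```python
from itertools import combinations


def motif_positions(seq, motif):
    """Start positions of exact matches of motif in seq (overlaps allowed)."""
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


def subsets_in_window(seq, motifs, window, min_size=2):
    """Return every subset (size >= min_size) of motifs whose members all
    occur, fully contained, inside at least one window of the given length."""
    hits = {m: motif_positions(seq, m) for m in motifs}
    found = []
    for size in range(min_size, len(motifs) + 1):
        for subset in combinations(motifs, size):
            for start in range(max(1, len(seq) - window + 1)):
                end = start + window
                if all(any(start <= p and p + len(m) <= end for p in hits[m])
                       for m in subset):
                    found.append(subset)
                    break
    return found


# Hypothetical TRM of three motifs and a toy 53-base sequence.
motifs = ("CAGCTG", "TAAT", "GATA")
seq = "TTTCAGCTGTTTAATTT" + "G" * 30 + "GATATT"
print(subsets_in_window(seq, motifs, window=20))  # [('CAGCTG', 'TAAT')]
```

Repeating this for sequences near co-expressed genes and near background genes, and comparing how often each subset occurs, would give the kind of per-subset enrichment the abstract describes.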
American Journal of Human Genetics | 2017
Kym M. Boycott; Ana Rath; Jessica X. Chong; Taila Hartley; Fowzan S. Alkuraya; Gareth Baynam; Anthony J. Brookes; Michael Brudno; Angel Carracedo; Johan T. den Dunnen; Stephanie O.M. Dyke; Xavier Estivill; Jack Goldblatt; Catherine Gonthier; Stephen C. Groft; Ivo Gut; Ada Hamosh; Philip Hieter; Sophie Höhn; Petra Kaufmann; Bartha Maria Knoppers; Jeffrey P. Krischer; Milan Macek; Gert Matthijs; Annie Olry; Samantha Parker; Justin Paschall; Anthony A. Philippakis; Heidi L. Rehm; Peter N. Robinson
Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their “diagnostic odyssey,” improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases. Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. Here, we review the current and future bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.
Research in Computational Molecular Biology | 2007
Anthony A. Philippakis; Aaron M. Qureshi; Michael F. Berger; Martha L. Bulyk
Our group has recently developed a compact, universal protein binding microarray (PBM) that can be used to determine the binding preferences of transcription factors (TFs) [1]. This design represents all possible sequence variants of a given length k (i.e., all k-mers) on a single array, allowing a complete characterization of the binding specificities of a given TF. Here, we present the mathematical foundations of this design based on de Bruijn sequences generated by linear feedback shift registers. We show that these sequences represent the maximum number of variants for any given set of array dimensions (i.e., number of spots and spot lengths), while also exhibiting desirable pseudorandomness properties. Moreover, de Bruijn sequences can be selected that represent gapped sequence patterns, further increasing the coverage of the array. This design yields a powerful experimental platform that allows the binding preferences of TFs to be determined with unprecedented resolution.
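The LFSR route to a de Bruijn sequence can be made concrete in a binary toy version. A maximal-length LFSR cycles through every nonzero k-bit state once, so its output contains every k-bit window except all zeros; inserting one extra 0 into its single run of k - 1 zeros completes a de Bruijn sequence, in which every k-mer appears exactly once as a cyclic window. The array design described here works over the four-letter DNA alphabet, so this Python sketch is only a binary analogue; the feedback polynomial x^4 + x^3 + 1 and all function names are choices made for illustration.

```python
def m_sequence(k, taps):
    """One period (2**k - 1 bits) of a binary Fibonacci LFSR.

    Recurrence: s[n + k] = XOR of s[n + t] for t in taps. With taps taken
    from a primitive polynomial the period is maximal, and every nonzero
    k-bit window occurs exactly once per period.
    """
    state = [0] * (k - 1) + [1]  # any nonzero seed works
    out = []
    for _ in range(2 ** k - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def de_bruijn_from_lfsr(k, taps):
    """Turn an m-sequence into a binary de Bruijn sequence of length 2**k.

    The only k-window an m-sequence misses is all zeros; lengthening its
    unique run of k - 1 zeros by one more zero supplies it.
    """
    seq = m_sequence(k, taps)
    n = len(seq)
    for i in range(n):
        if all(seq[(i + j) % n] == 0 for j in range(k - 1)):
            return [0] + seq[i:] + seq[:i]
    raise ValueError("no run of k - 1 zeros; taps are probably not primitive")


def covers_all_kmers(seq, k, alphabet_size):
    """True if every possible k-mer appears exactly once as a cyclic window."""
    n = len(seq)
    windows = {tuple(seq[(i + j) % n] for j in range(k)) for i in range(n)}
    return n == alphabet_size ** k and len(windows) == n


# Assumed example: x^4 + x^3 + 1, a primitive polynomial over GF(2),
# i.e. s[n + 4] = s[n + 3] XOR s[n].
db = de_bruijn_from_lfsr(4, taps=(0, 3))
print("".join(map(str, db)), covers_all_kmers(db, 4, 2))
# 0000111101011001 True
```

In the underlying 15-bit m-sequence, ones outnumber zeros by exactly one (8 versus 7), an example of the balance property that makes LFSR output look pseudorandom.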
Development | 2010
Jonathan Enriquez; Hadi Boukhatmi; Laurence Dubois; Anthony A. Philippakis; Martha L. Bulyk; Alan M. Michelson; Michèle Crozatier; Alain Vincent
Hox transcription factors control many aspects of animal morphogenetic diversity. The segmental pattern of Drosophila larval muscles shows stereotyped variations along the anteroposterior body axis. Each muscle is seeded by a founder cell and the properties specific to each muscle reflect the expression by each founder cell of a specific combination of ‘identity’ transcription factors. Founder cells originate from asymmetric division of progenitor cells specified at fixed positions. Using the dorsal DA3 muscle lineage as a paradigm, we show here that Hox proteins play a decisive role in establishing the pattern of Drosophila muscles by controlling the expression of identity transcription factors, such as Nautilus and Collier (Col), at the progenitor stage. High-resolution analysis, using newly designed intron-containing reporter genes to detect primary transcripts, shows that the progenitor stage is the key step at which segment-specific information carried by Hox proteins is superimposed on intrasegmental positional information. Differential control of col transcription by the Antennapedia and Ultrabithorax/Abdominal-A paralogs is mediated by separate cis-regulatory modules (CRMs). Hox proteins also control the segment-specific number of myoblasts allocated to the DA3 muscle. We conclude that Hox proteins both regulate and contribute to the combinatorial code of transcription factors that specify muscle identity and act at several steps during the muscle-specification process to generate muscle diversity.
Journal of Computational Biology | 2008
Anthony A. Philippakis; Aaron M. Qureshi; Michael F. Berger; Martha L. Bulyk
Our group has recently developed a compact, universal protein binding microarray (PBM) that can be used to determine the binding preferences of transcription factors (TFs). This design represents all possible sequence variants of a given length k (i.e., all k-mers) on a single array, allowing a complete characterization of the binding specificities of a given TF. Here, we present the mathematical foundations of this design based on de Bruijn sequences generated by linear feedback shift registers. We show that these sequences represent the maximum number of variants for any given set of array dimensions (i.e., number of spots and spot lengths), while also exhibiting desirable pseudo-randomness properties. Moreover, de Bruijn sequences can be selected that represent gapped sequence patterns, further increasing the coverage of the array. This design yields a powerful experimental platform that allows the binding preferences of TFs to be determined with unprecedented resolution.
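The relationship between k-mer coverage and array dimensions can be sketched with simple arithmetic. A linear sequence containing every k-mer needs at least 4^k + k - 1 bases, and if it is cut into spots of length L that overlap by k - 1 bases, each spot contributes L - k + 1 new k-mers, so about 4^k / (L - k + 1) spots are needed. The Python sketch below builds a DNA de Bruijn sequence with the Fredricksen-Kessler-Maiorana (Lyndon word) algorithm, which is swapped in here for the LFSR construction the abstract describes, and then tiles it into overlapping probes. `tile_probes`, the toy k = 3 and the 10-base spot length are illustrative assumptions, not the published array layout.

```python
import math


def de_bruijn(alphabet, n):
    """Cyclic de Bruijn sequence over `alphabet` for word length n.

    Fredricksen-Kessler-Maiorana construction: concatenate, in lexicographic
    order, the Lyndon words whose length divides n.
    """
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def tile_probes(linear_seq, spot_len, k):
    """Cut a linear sequence into probes that overlap by k - 1 bases,
    so every k-mer of the sequence lies wholly inside at least one probe."""
    step = spot_len - (k - 1)
    return [linear_seq[s:s + spot_len]
            for s in range(0, len(linear_seq) - (k - 1), step)]


k = 3
cyclic = de_bruijn("ACGT", k)      # 4**3 = 64 bases
linear = cyclic + cyclic[:k - 1]   # unwrap so all 64 3-mers appear linearly
probes = tile_probes(linear, spot_len=10, k=k)
covered = {p[i:i + k] for p in probes for i in range(len(p) - k + 1)}
print(len(cyclic), len(probes), len(covered))  # 64 8 64
print(math.ceil(4 ** k / (10 - (k - 1))))      # 8
```

The measured probe count matches the ceil(4^k / (L - k + 1)) estimate, which is the sense in which a de Bruijn sequence packs the most distinct k-mers into a fixed number of spots of a fixed length.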
