Fabio C. P. Navarro
Yale University
Publication
Featured research published by Fabio C. P. Navarro.
Nature Neuroscience | 2015
Schahram Akbarian; Chunyu Liu; James A. Knowles; Flora M. Vaccarino; Peggy J. Farnham; Gregory E. Crawford; Andrew E. Jaffe; Dalila Pinto; Stella Dracheva; Daniel H. Geschwind; Jonathan Mill; Angus C. Nairn; Alexej Abyzov; Sirisha Pochareddy; Shyam Prabhakar; Sherman M. Weissman; Patrick F. Sullivan; Matthew W. State; Zhiping Weng; Mette A. Peters; Kevin P. White; Mark Gerstein; Anahita Amiri; Chris Armoskus; Allison E. Ashley-Koch; Taejeong Bae; Andrea Beckel-Mitchener; Benjamin P. Berman; Gerhard A. Coetzee; Gianfilippo Coppola
Recent research on disparate psychiatric disorders has implicated rare variants in genes involved in global gene regulation and chromatin modification, as well as many common variants located primarily in regulatory regions of the genome. Understanding precisely how these variants contribute to disease will require a deeper appreciation for the mechanisms of gene regulation in the developing and adult human brain. The PsychENCODE project aims to produce a public resource of multidimensional genomic data using tissue- and cell type–specific samples from approximately 1,000 phenotypically well-characterized, high-quality healthy and disease-affected human post-mortem brains, as well as functionally characterize disease-associated regulatory elements and variants in model systems. We are beginning with a focus on autism spectrum disorder, bipolar disorder and schizophrenia, and expect that this knowledge will apply to a wide variety of psychiatric disorders. This paper outlines the motivation and design of PsychENCODE.
Nature Communications | 2016
Jane E. Freedman; Mark Gerstein; Eric Mick; Joel Rozowsky; Daniel Levy; Robert R. Kitchen; Saumya Das; Ravi V. Shah; Kirsty Danielson; Lea M. Beaulieu; Fabio C. P. Navarro; Yaoyu Wang; Timur R. Galeev; Alex Holman; Raymond Y. Kwong; Venkatesh L. Murthy; Selim E. Tanriverdi; Milka Koupenova; Ekaterina Mikhalev
There is growing appreciation for the importance of non-protein-coding genes in development and disease. Although much is known about microRNAs, limitations in bioinformatic analyses of RNA sequencing have precluded broad assessment of other forms of small-RNAs in humans. By analysing sequencing data from plasma-derived RNA from 40 individuals, here we identified over a thousand human extracellular RNAs including microRNAs, piwi-interacting RNA (piRNA), and small nucleolar RNAs. Using a targeted quantitative PCR with reverse transcription approach in an additional 2,763 individuals, we characterized almost 500 of the most abundant extracellular transcripts including microRNAs, piRNAs and small nucleolar RNAs. The presence in plasma of many non-microRNA small-RNAs was confirmed in an independent cohort. We present comprehensive data to demonstrate the broad and consistent detection of diverse classes of circulating non-cellular small-RNAs from a large population.
Genome Research | 2018
David Thybert; Maša Roller; Fabio C. P. Navarro; Ian T Fiddes; Ian Streeter; Christine Feig; David Martín-Gálvez; Mikhail Kolmogorov; Václav Janoušek; Wasiu Akanni; Bronwen Aken; Sarah Aldridge; Varshith Chakrapani; William Chow; Laura Clarke; Carla Cummins; Anthony G. Doran; Matthew Dunn; Leo Goodstadt; Kerstin Howe; Matthew Howell; Ambre Aurore Josselin; Robert C. Karn; Jingtao Lilue; Fergal Martin; Matthieu Muffato; Stefanie Nachtweide; Michael A. Quail; Cristina Sisu; Mario Stanke
Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.
bioRxiv | 2018
Jingtao Lilue; Anthony G. Doran; Ian T Fiddes; Monica Abrudan; Joel Armstrong; Ruth Bennett; William Chow; Joanna Collins; Anne Czechanski; Petr Danecek; Mark Diekhans; Dirk-Dominik Dolle; Matthew Dunn; Richard Durbin; Dent Earl; Anne C. Ferguson-Smith; Paul Flicek; Jonathan Flint; Adam Frankish; Beiyuan Fu; Mark Gerstein; James Gilbert; Leo Goodstadt; Jennifer Harrow; Kerstin Howe; Mikhail Kolmogorov; Stefanie Koenig; Chris Lelliott; Jane Loveland; Richard Mott
The most commonly employed mammalian model organism is the laboratory mouse. A wide variety of genetically diverse inbred mouse strains, representing distinct physiological states, disease susceptibilities, and biological mechanisms, have been developed over the last century. We report full-length draft de novo genome assemblies for 16 of the most widely used inbred strains and reveal for the first time extensive strain-specific haplotype variation. We identify and characterise 2,567 regions on the current Genome Reference Consortium mouse reference genome exhibiting the greatest sequence diversity between strains. These regions are enriched for genes involved in defence and immunity, and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. Several immune-related loci, some in previously identified QTLs for disease response, have novel haplotypes not present in the reference that may explain the phenotype. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures, and 62 new coding loci were added to the reference genome annotation. Notably, this high-quality collection of genomes revealed a previously unannotated gene (Efcab3-like) encoding 5,874 amino acids, one of the largest known in the rodent lineage. Interestingly, Efcab3-like−/− mice exhibit severe size anomalies in four regions of the brain, suggesting a role for Efcab3-like in regulating brain development.
bioRxiv | 2017
Bernardo Rodriguez-Martin; Eva G. Alvarez; Adrian Baez-Ortega; Jonas Demeulemeester; Young Seok Ju; Jorge Zamora; Harald Detering; Yilong Li; Gianmarco Contino; Stefan Dentro; Alicia L. Bruzos; Ana Dueso-Barroso; Daniel Ardeljan; Marta Tojo; Nicola D. Roberts; Miguel Blanco; Paul A.W. Edwards; Joachim Weischenfeldt; Martin Santamarina; Montserrat Puiggròs; Zechen Chong; Ken Chen; Eunjung Lee; Jeremiah Wala; Keiran Raine; Adam Butler; Sebastian M. Waszak; Fabio C. P. Navarro; Steven E. Schumacher; Jean Monlong
About half of all cancers have somatic integrations of retrotransposons. To characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 37 histological cancer subtypes. We identified 19,166 somatically acquired retrotransposition events, affecting 35% of samples, and spanning a range of event types. L1 insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, sometimes removing tumour suppressor genes, as well as inducing complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications in the development of human tumours.
Nature Communications | 2016
Jane E. Freedman; Mark Gerstein; Eric Mick; Joel Rozowsky; Daniel Levy; Robert R. Kitchen; Saumya Das; Ravi V. Shah; Kirsty Danielson; Lea M. Beaulieu; Fabio C. P. Navarro; Yaoyu Wang; Timur R. Galeev; Alex Holman; Raymond Y. Kwong; Venkatesh L. Murthy; Selim E. Tanriverdi; Milka Koupenova; Ekaterina Mikhalev
Nature Communications 7, Article number: 11106 (2016); doi: 10.1038/ncomms11106; Published: April 26, 2016; Updated: June 3, 2016. The original version of this Article contained an error in the spelling of the author Milka Koupenova, which was incorrectly given as Milka Koupenova-Zamor. This has now been corrected in both the PDF and HTML versions of the Article.
bioRxiv | 2018
Gamze Gursoy; Arif Harmanci; Molly Green; Fabio C. P. Navarro; Mark Gerstein
Functional genomics experiments on human subjects present a privacy conundrum. On one hand, many of the conclusions we infer from these experiments are not tied to the identity of individuals but represent universal statements about biology and disease. On the other hand, by virtue of the experimental procedure, the sequencing reads are tagged with small bits of patients' variant information, which presents privacy challenges in terms of data sharing. There is great desire to share data as broadly as possible. Therefore, measuring the amount of variant information leaked in a variety of experiments, particularly in relation to the amount of sequencing, is a key first step in reducing information leakage and determining an appropriate set point for sharing with minimal leakage. To this end, we derived information-theoretic measures for the private information leaked in experiments and developed various file formats to reduce this during sharing. We show that high-depth experiments such as Hi-C provide accurate genotyping that can lead to large privacy leaks. Counterintuitively, low-depth experiments such as ChIP and single-cell RNA sequencing, although not useful for genotyping, can create strong quasi-identifiers for re-identification through linking attacks. We show that partial and incomplete genotypes from many of these experiments can further be combined to construct an individual's complete variant set and identify phenotypes. We provide a proof-of-concept analytic framework, in which the amount of leaked information can be estimated from the depth and breadth of the coverage as well as sequencing biases of a given functional genomics experiment. Finally, as a practical instantiation of our framework, we propose file formats that maximize the potential sharing of data while protecting individuals' sensitive information. Depending on the desired sharing set point, our proposed format can achieve differential trade-offs in the privacy-utility balance.
At the highest level of privacy, we mask all the variants leaked from reads, but still can create useable signal profiles that give complete recovery of the original gene expression levels.
The generation of functional genomics datasets is surging, as they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intention of functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to share raw reads for better analyses and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, thus enabling principled privacy-utility trade-offs. It works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA-sequencing. The procedure depends on quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
Nucleic Acids Research | 2018
Adam Frankish; Mark Diekhans; Anne-Maud Ferreira; Rory Johnson; Irwin Jungreis; Jane Loveland; Jonathan M Mudge; Cristina Sisu; James C. Wright; Joel Armstrong; If Barnes; Andrew E Berry; Alexandra Bignell; Silvia Carbonell Sala; Jacqueline Chrast; Fiona Cunningham; Tomás Di Domenico; Sarah Donaldson; Ian T Fiddes; Carlos García Girón; Jose Gonzalez; Tiago Grego; Matthew Hardy; Thibaut Hourlier; Toby Hunt; Osagie G Izuogu; Julien Lagarde; Fergal J Martin; Laura Martínez; Shamika Mohanan
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Nature Genetics | 2018
Jingtao Lilue; Anthony G. Doran; Ian T Fiddes; Monica Abrudan; Joel Armstrong; Ruth Bennett; William Chow; Joanna Collins; Stephan C. Collins; Anne Czechanski; Petr Danecek; Mark Diekhans; Dirk-Dominik Dolle; Matthew Dunn; Richard Durbin; Dent Earl; Anne C. Ferguson-Smith; Paul Flicek; Jonathan Flint; Adam Frankish; Beiyuan Fu; Mark Gerstein; James Gilbert; Leo Goodstadt; Jennifer Harrow; Kerstin Howe; Ximena Ibarra-Soria; Mikhail Kolmogorov; Chris Lelliott; Darren W. Logan
We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.
Sequence assemblies for the genomes of 16 widely used inbred laboratory mouse strains highlight considerable strain-specific haplotype variation and allow for the identification of regions with the greatest sequence diversity between strains.
Genome Biology | 2018
Timothy Becker; Wan-Ping Lee; Joseph Leone; Qihui Zhu; Chengsheng Zhang; Silvia Liu; Jack Sargent; Kritika Shanker; Adam Mil-homens; Eliza Cerveira; Mallory Ryan; Jane Cha; Fabio C. P. Navarro; Timur R. Galeev; Mark Gerstein; Ryan E. Mills; Dong-Guk Shin; Charles Lee; Ankit Malhotra
Comprehensive and accurate identification of structural variations (SVs) from next generation sequencing data remains a major challenge. We develop FusorSV, which uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. It includes a fusion model built using analysis of 27 deep-coverage human genomes from the 1000 Genomes Project. We identify 843 novel SV calls that were not reported by the 1000 Genomes Project for these 27 samples. Experimental validation of a subset of these calls yields a validation rate of 86.7%. FusorSV is available at https://github.com/TheJacksonLaboratory/SVE.