Josep F. Abril
University of Barcelona
Publication
Featured research published by Josep F. Abril.
PLOS Biology | 2007
Samuel Levy; Granger Sutton; Pauline C. Ng; Lars Feuk; Aaron L. Halpern; Brian Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F. Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R. MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B. Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul Kravitz; Dana Busam; Karen Beeson; Tina McIntosh; Karin A. Remington; Josep F. Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin Frazier; Stephen W. Scherer; Robert L. Strausberg
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
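As a quick consistency check on the figures in this abstract, the sketch below sums the variant classes that are given explicit counts and computes the non-SNP share of events. It assumes that the 22% figure is taken over exactly these counted classes; segmental duplications and copy number variation regions have no counts in the abstract and are left out. The variable names are illustrative only.

```python
# Variant counts as quoted in the abstract above.
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

# Total number of counted variant events.
total_events = sum(variant_counts.values())

# Everything that is not a SNP.
non_snp_events = total_events - variant_counts["SNPs"]

# Share of non-SNP events among all counted events.
non_snp_share = non_snp_events / total_events

print(f"total counted events: {total_events:,}")
print(f"non-SNP events:       {non_snp_events:,}")
print(f"non-SNP share:        {non_snp_share:.1%}")
```

Under that assumption the counted classes add up to just over 4.1 million events, and the non-SNP share rounds to the 22% quoted in the abstract.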
Nature Methods | 2013
Tamara Steijger; Josep F. Abril; Pär G. Engström; Felix Kokocinski; Tim Hubbard; Roderic Guigó; Jennifer Harrow; Paul Bertone
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
Nature | 2002
Gernot Glöckner; Ludwig Eichinger; Karol Szafranski; Justin A. Pachebat; Alan T. Bankier; Paul H. Dear; Rüdiger Lehmann; Cornelia Baumgart; Genís Parra; Josep F. Abril; Roderic Guigó; Kai Kumpf; Budi Tunggal; Edward C. Cox; Michael A. Quail; Matthias Platzer; André Rosenthal; Angelika A. Noegel; Bart Barrell; Marie-Adèle Rajandream; Jeffrey G. Williams; Robert R. Kay; Adam Kuspa; Richard A. Gibbs; Richard Sucgang; Donna Muzny; Brian Desany; Kathy Zeng; Baoli Zhu; Pieter J. de Jong
The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.
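The genome-wide gene estimate in this abstract is a simple extrapolation, which can be sketched as follows. The sketch assumes, as the abstract does, that gene density is uniform across chromosomes and that chromosome 2 is about 25% of the genome; only the protein-coding gene count is used, and the variable names are illustrative.

```python
# Figures quoted in the abstract above.
chr2_protein_coding_genes = 2_799
chr2_fraction_of_genome = 0.25  # "about 25% of the genome"

# If the other chromosomes have a similar gene density, the whole-genome
# gene count scales inversely with the fraction covered by chromosome 2.
estimated_genome_genes = chr2_protein_coding_genes / chr2_fraction_of_genome

print(f"estimated genes in the genome: {estimated_genome_genes:,.0f}")
```

The result is of the same order as the "around 11,000 genes" stated in the abstract; the exact value depends on how precisely the 25% figure is taken.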
Proceedings of the National Academy of Sciences of the United States of America | 2003
Roderic Guigó; Emmanouil T. Dermitzakis; Pankaj K. Agarwal; Chris P. Ponting; Genís Parra; Alexandre Reymond; Josep F. Abril; Evan Keibler; Robert Lyle; Catherine Ucla; Michael R. Brent
A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.
The International Journal of Developmental Biology | 2009
Emili Saló; Josep F. Abril; Teresa Adell; Francesc Cebrià; Kay Eckelt; Enrique Fernández-Taboada; Mette Handberg-Thorsager; Marta Iglesias; M. Dolores Molina; Gustavo Rodríguez-Esteban
Planarians can undergo dramatic changes in body size and regenerate their entire body plan from small pieces after cutting. This remarkable morphological plasticity has made them an excellent model in which to analyze phenomena such as morphogenesis, restoration of pattern and polarity, control of tissue proportions and tissue homeostasis. They have a unique population of pluripotent stem cells in the adult that can give rise to all differentiated cell types, including the germ cells. These cellular characteristics provide an excellent opportunity to study the mechanisms involved in the maintenance and differentiation of cell populations in intact and regenerating animals. Until recently, the planarian model system lacked opportunities for genetic analysis; however, this handicap was overcome in the last decade through the development of new molecular methods which have been successfully applied to planarians. These techniques have allowed analysis of the temporal and spatial expression of genes, as well as interference with gene function, generating the first phenotypes by loss or gain of function. Finally, the sequencing of the planarian genome has provided the essential tools for an in-depth analysis of the genomic regulation of this model system. In this review, we provide an overview of planarians as a model system for research into development and regeneration and describe new lines of investigation in this area.
Bioinformatics | 2000
Josep F. Abril; Roderic Guigó
gff2ps is a program for visualizing annotations of genomic sequences. The program takes the annotated features on a genomic sequence in GFF format as input, and produces a visual output in PostScript. While it can be used in a very simple way, it also allows for a great degree of customization through a number of options and/or customization files.
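To illustrate the kind of input described here, the following minimal sketch parses one GFF record into named fields. It assumes the general tab-separated GFF layout (sequence name, source, feature, start, end, score, strand, frame, optional group); it is not taken from gff2ps itself, and the `GffFeature` type, the `parse_gff_line` helper and the example record are hypothetical.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    """One annotated feature from a GFF record (illustrative type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    frame: str
    group: str


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF line; return None for blank lines and comments."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    # The ninth (group/attributes) column is optional.
    group = fields[8] if len(fields) > 8 else ""
    return GffFeature(
        seqname, source, feature,
        int(start), int(end),
        None if score == "." else float(score),  # "." marks a missing score
        strand, frame, group,
    )


# Made-up example record, for illustration only.
example = "chr2\tgeneid\texon\t1200\t1350\t0.87\t+\t0\tgene_1"
record = parse_gff_line(example)
print(record.feature, record.start, record.end, record.strand)
```

A plotting tool would typically read every such record for a sequence and map the start and end coordinates onto the page; that rendering step is outside the scope of this sketch.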
BMC Genomics | 2010
Josep F. Abril; Francesc Cebrià; Gustavo Rodríguez-Esteban; Thomas Horn; Susanna Fraguas; Beatriz Calvo; Kerstin Bartscherer; Emili Saló
Background: Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. Results: In order to complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped univocally to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions: We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. mediterranea. Analysis of the annotated data promises to contribute to identification of gene families poorly characterized at a functional level. The Smed454 transcriptome data will assist in the molecular characterization of S. mediterranea as a model organism, which will be useful to a broad scientific community.
BMC Genomics | 2011
Enrique Fernández-Taboada; Gustavo Rodríguez-Esteban; Emili Saló; Josep F. Abril
Background: In recent years, planaria have emerged as an important model system for research into stem cells and regeneration. Attention is focused on their unique stem cells, the neoblasts, which can differentiate into any cell type present in the adult organism. Sequencing of the Schmidtea mediterranea genome and some expressed sequence tag projects have generated extensive data on the genetic profile of these cells. However, little information is available on their protein dynamics. Results: We developed a proteomic strategy to identify neoblast-specific proteins. Here we describe the method and discuss the results in comparison to the genomic high-throughput analyses carried out in planaria and to proteomic studies using other stem cell systems. We also show functional data for some of the candidate genes selected in our proteomic approach. Conclusions: We have developed an accurate and reliable mass-spectra-based proteomics approach to complement previous genomic studies and to further achieve a more accurate understanding and description of the molecular and cellular processes related to the neoblasts.
Methods in Molecular Biology | 2009
Enrique Blanco; Josep F. Abril
The sequences of many eukaryotic genomes are now available from a personal computer to any researcher in the worldwide scientific community. However, the sequences are worthless without adequate annotation of their biologically meaningful elements. The annotation of genes, in particular, is a challenging task that cannot be tackled without the aid of specific bioinformatics tools. In this chapter we present a simple protocol, based mainly on the program GeneID combined with other computational tools, to annotate in the recently assembled genome of D. yakuba the location of a gene previously annotated in D. melanogaster.
Journal of Environmental Management | 2015
Marta Rusiñol; Xavier Fernandez-Cassi; N. Timoneda; Anna Carratalà; Josep F. Abril; Carolina Silvera; Maria José Figueras; Emiliano Gelati; Xavier Rodó; David Kay; Peter Wyn-Jones; Sílvia Bofill-Mas; Rosina Girones
Conventional wastewater treatment does not completely remove and/or inactivate viruses; consequently, viruses excreted by the population can be detected in the environment. This study was undertaken to investigate the distribution and seasonality of human viruses and faecal indicator bacteria (FIB) in a river catchment located in a typical Mediterranean climate region and to discuss future trends in relation to climate change. Sample matrices included river water, untreated and treated wastewater from a wastewater treatment plant within the catchment area, and seawater from potentially impacted bathing water. Five viruses were analysed in the study. Human adenovirus (HAdV) and JC polyomavirus (JCPyV) were analysed as indicators of human faecal contamination of human pathogens; both were reported in urban wastewater (mean values of 10^6 and 10^5 GC/L, respectively), river water (10^3 and 10^2 GC/L) and seawater (10^2 and 10^1 GC/L). Human Merkel Cell polyomavirus (MCPyV), which is associated with Merkel Cell carcinoma, was detected in 75% of the raw wastewater samples (31/37) and quantified by a newly developed quantitative polymerase chain reaction (qPCR) assay with mean concentrations of 10^4 GC/L. This virus is related to skin cancer in susceptible individuals and was found in 29% and 18% of river water and seawater samples, respectively. Seasonality was only observed for norovirus genogroup II (NoV GGII), which was more abundant in cold months with levels up to 10^4 GC/L in river water. Human hepatitis E virus (HEV) was detected in 13.5% of the wastewater samples when analysed by nested PCR (nPCR). Secondary biological treatment (i.e., activated sludge) and tertiary sewage disinfection including chlorination, flocculation and UV radiation removed between 2.22 and 4.52 log10 of the viral concentrations. Climate projections for the Mediterranean climate areas and the selected river catchment estimate general warming and changes in precipitation distribution. Persistent decreases in precipitation during summer can lead to a higher presence of human viruses because river and sea water present the highest viral concentrations during warmer months. In a global context, wastewater management will be the key to preventing environmental dispersion of human faecal pathogens in future climate change scenarios.
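The log10 removal figures quoted in this abstract can be read as fold reductions in concentration. The sketch below uses the conventional definition of a log reduction, log10(influent / effluent), and converts the two reported values into percentages removed. The helper names are hypothetical, and the influent and effluent concentrations in the last lines are invented for illustration rather than taken from the study.

```python
import math


def log10_reduction(influent_gc_per_l: float, effluent_gc_per_l: float) -> float:
    """Log10 reduction between two concentrations (genome copies per litre)."""
    return math.log10(influent_gc_per_l / effluent_gc_per_l)


def percent_removed(log_reduction: float) -> float:
    """Percentage of the initial concentration removed for a given log10 reduction."""
    return (1 - 10 ** (-log_reduction)) * 100


# The range reported in the abstract above.
for lrv in (2.22, 4.52):
    print(f"{lrv} log10 reduction -> {percent_removed(lrv):.3f}% removed")

# Hypothetical concentrations, for illustration only.
hypothetical_influent = 1e6  # GC/L
hypothetical_effluent = 1e3  # GC/L
print(f"{log10_reduction(hypothetical_influent, hypothetical_effluent):.2f} log10")
```

Read this way, even the lower end of the reported range corresponds to removing more than 99% of the viral load, while a measurable fraction still reaches the receiving waters.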