Genís Parra
Max Planck Society
Publication
Featured research published by Genís Parra.
Nature | 2004
Olivier Jaillon; Jean-Marc Aury; Frédéric Brunet; Jean-Louis Petit; Nicole Stange-Thomann; Evan Mauceli; Laurence Bouneau; Cécile Fischer; Catherine Ozouf-Costaz; Alain Bernot; Sophie Nicaud; David B. Jaffe; Sheila Fisher; Georges Lutfalla; Carole Dossat; Béatrice Segurens; Corinne Dasilva; Marcel Salanoubat; Michael Levy; Nathalie Boudet; Sergi Castellano; Véronique Anthouard; Claire Jubin; Vanina Castelli; Michael Katinka; Benoit Vacherie; Christian Biémont; Zineb Skalli; Laurence Cattolico; Julie Poulain
Tetraodon nigroviridis is a freshwater puffer fish with the smallest known vertebrate genome. Here, we report a draft genome sequence with long-range linkage and substantial anchoring to the 21 Tetraodon chromosomes. Genome analysis provides a greatly improved fish gene catalogue, including identifying key genes previously thought to be absent in fish. Comparison with other vertebrates and a urochordate indicates that fish proteins have diverged markedly faster than their mammalian homologues. Comparison with the human genome suggests ∼900 previously unannotated human genes. Analysis of the Tetraodon and human genomes shows that whole-genome duplication occurred in the teleost fish lineage, subsequent to its divergence from mammals. The analysis also makes it possible to infer the basic structure of the ancestral bony vertebrate genome, which was composed of 12 chromosomes, and to reconstruct much of the evolutionary history of ancient and recent chromosome rearrangements leading to the modern human karyotype.
Bioinformatics | 2007
Genís Parra; Keith Bradnam; Ian Korf
MOTIVATION: The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes. RESULTS: In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence. CEGMA includes the use of profile-hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages. AVAILABILITY: Software and data sets are available online at http://korflab.ucdavis.edu/Datasets.
Nucleic Acids Research | 2009
Genís Parra; Keith Bradnam; Zemin Ning; Thomas M. Keane; Ian Korf
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
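The two kinds of assembly metric mentioned in this abstract can be illustrated with a short sketch. It is not the authors' CEGMA code: the function names `n50` and `ceg_fraction`, the toy contig lengths, and the gene identifiers are all hypothetical and chosen only for illustration. N50 is computed with the usual definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size), and the CEG metric is taken simply as the share of a core gene set that was found in the assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def ceg_fraction(mapped_genes, core_genes):
    """Return the fraction of core genes found in an assembly.

    Both arguments are iterables of gene identifiers (hypothetical here).
    """
    core = set(core_genes)
    if not core:
        raise ValueError("core gene set is empty")
    return len(core & set(mapped_genes)) / len(core)


# Toy data, assumed for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_core = ["geneA", "geneB", "geneC", "geneD"]
toy_mapped = ["geneA", "geneC", "geneD", "geneX"]

print(n50(toy_contigs))
print(ceg_fraction(toy_mapped, toy_core))
```

The two numbers answer different questions: N50 describes how contiguous the assembly is, while the mapped fraction describes how much of the expected gene content is recoverable from it, which is why the abstract presents them as complementary.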
FEBS Letters | 2005
Nuria Lopez-Bigas; Benjamin Audit; Christos A. Ouzounis; Genís Parra; Roderic Guigó
Disease‐causing point mutations are assumed to act predominantly through subsequent individual changes in the amino acid sequence that impair the normal function of proteins. However, point mutations can have a more dramatic effect by altering the splicing pattern of the gene. Here, we describe an approach to estimate the overall importance of splicing mutations. This approach takes into account the complete set of genes known to be involved in disease and suggests that, contrary to current assumptions, many mutations causing disease may actually be affecting the splicing pattern of the genes.
Nature | 2002
Gernot Glöckner; Ludwig Eichinger; Karol Szafranski; Justin A. Pachebat; Alan T. Bankier; Paul H. Dear; Rüdiger Lehmann; Cornelia Baumgart; Genís Parra; Josep F. Abril; Roderic Guigó; Kai Kumpf; Budi Tunggal; Edward C. Cox; Michael A. Quail; Matthias Platzer; André Rosenthal; Angelika A. Noegel; Bart Barrell; Marie-Adèle Rajandream; Jeffrey G. Williams; Robert R. Kay; Adam Kuspa; Richard A. Gibbs; Richard Sucgang; Donna Muzny; Brian Desany; Kathy Zeng; Baoli Zhu; Pieter J. de Jong
The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.
The Plant Cell | 2008
Alan B. Rose; Tali Elfersi; Genís Parra; Ian Korf
Introns that elevate mRNA accumulation have been found in a wide range of eukaryotes. However, not all introns affect gene expression, and direct testing is currently the only way to identify stimulatory introns. Our genome-wide analysis in Arabidopsis thaliana revealed that promoter-proximal introns as a group are compositionally distinct from distal introns and that the degree to which an individual intron matches the promoter-proximal intron profile is a strong predictor of its ability to increase expression. We found that the sequences responsible for elevating expression are dispersed throughout an enhancing intron, as is a candidate motif that is overrepresented in first introns and whose occurrence in tested introns is proportional to its effect on expression. The signals responsible for intron-mediated enhancement are apparently conserved between Arabidopsis and rice (Oryza sativa) despite the large evolutionary distance separating these plants.
Proceedings of the National Academy of Sciences of the United States of America | 2014
Sergi Castellano; Genís Parra; Federico Sánchez-Quinto; Fernando Racimo; Martin Kuhlwilm; Martin Kircher; Susanna Sawyer; Qiaomei Fu; Anja Heinze; Birgit Nickel; Jesse Dabney; Michael Siebauer; Louise White; Hernán A. Burbano; Gabriel Renaud; Udo Stenzel; Carles Lalueza-Fox; Marco de la Rasilla; Antonio Rosas; Pavao Rudan; Dejana Brajković; Željko Kućan; Ivan Gušić; Michael V. Shunkov; Anatoli P. Derevianko; Bence Viola; Matthias Meyer; Janet Kelso; Aida M. Andrés; Svante Pääbo
Significance: We use a hybridization approach to enrich the DNA from the protein-coding fraction of the genomes of two Neandertal individuals from Spain and Croatia. By analyzing these two exomes together with the genome sequence of a Neandertal from Siberia we show that the genetic diversity of Neandertals was lower than that of present-day humans and that the pattern of coding variation suggests that Neandertal populations were small and isolated from one another. We also show that genes involved in skeletal morphology have changed more than expected on the Neandertal evolutionary lineage whereas genes involved in pigmentation and behavior have changed more on the modern human lineage.
We present the DNA sequence of 17,367 protein-coding genes in two Neandertals from Spain and Croatia and analyze them together with the genome sequence recently determined from a Neandertal from southern Siberia. Comparisons with present-day humans from Africa, Europe, and Asia reveal that genetic diversity among Neandertals was remarkably low, and that they carried a higher proportion of amino acid-changing (nonsynonymous) alleles inferred to alter protein structure or function than present-day humans. Thus, Neandertals across Eurasia had a smaller long-term effective population than present-day humans. We also identify amino acid substitutions in Neandertals and present-day humans that may underlie phenotypic differences between the two groups. We find that genes involved in skeletal morphology have changed more in the lineage leading to Neandertals than in the ancestral lineage common to archaic and modern humans, whereas genes involved in behavior and pigmentation have changed more on the modern human lineage.
Current protocols in human genetics | 2003
Enrique Blanco; Genís Parra; Roderic Guigó
This unit describes the usage of geneid, a very efficient gene‐finding program. It allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Parameter configurations exist for a number of eukaryotic species. The program produces output in a variety of standard formats, so the results can be processed by a variety of software tools, including visualization programs. The geneid software is in the public domain, and it is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene‐finding tools.
Proceedings of the National Academy of Sciences of the United States of America | 2003
Roderic Guigó; Emmanouil T. Dermitzakis; Pankaj K. Agarwal; Chris P. Ponting; Genís Parra; Alexandre Reymond; Josep F. Abril; Evan Keibler; Robert Lyle; Catherine Ucla; Michael R. Brent
A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.
Nucleic Acids Research | 2011
Genís Parra; Keith Bradnam; Alan B. Rose; Ian Korf
Introns in a wide range of organisms including plants, animals and fungi are able to increase the expression of the gene that they are contained in. This process of intron-mediated enhancement (IME) is most thoroughly studied in Arabidopsis thaliana, where it has been shown that enhancing introns are typically located near the promoter and are compositionally distinct from downstream introns. In this study, we perform a comprehensive comparative analysis of several sequenced plant genomes. We find that enhancing sequences are conserved in the multi-cellular plants but are either absent or unrecognizable in algae. IME signals are preferentially located towards the 5′-end of first introns but also appear to be enriched in 5′-UTRs and coding regions near the transcription start site. Enhancing introns are found most prominently in genes that are highly expressed in a wide range of tissues. Through site-directed mutagenesis in A. thaliana, we show that IME signals can be inserted or removed from introns to increase or decrease gene expression. Although we do not yet know the specific mechanism of IME, the predicted signals appear to be both functional and highly conserved.