Pere Puigbò
National Institutes of Health
Publication
Featured research published by Pere Puigbò.
Nucleic Acids Research | 2007
Pere Puigbò; Eduard Guzmán; Antoni Romeu; Santiago Garcia-Vallvé
OPTIMIZER is an on-line application that optimizes the codon usage of a gene to increase its expression level. Three methods of optimization are available: the ‘one amino acid–one codon’ method, a guided random method based on a Monte Carlo algorithm, and a new method designed to maximize the optimization with the fewest changes in the query sequence. One of the main features of OPTIMIZER is that it makes it possible to optimize a DNA sequence using pre-computed codon usage tables from a predicted group of highly expressed genes from more than 150 prokaryotic species under strong translational selection. These groups of highly expressed genes have been predicted using a new iterative algorithm. In addition, users can use, as a reference set, a pre-computed table containing the mean codon usage of ribosomal protein genes and, as a novelty, the tRNA gene-copy numbers. OPTIMIZER is accessible free of charge at http://genomes.urv.es/OPTIMIZER.
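The ‘one amino acid–one codon’ idea mentioned above can be illustrated with a minimal Python sketch. It is not OPTIMIZER's actual code: the function name `one_codon_per_amino_acid` and the `usage` dictionary (codon counts from some reference set) are hypothetical, and the standard genetic code is assumed. Each codon is replaced by the synonymous codon with the highest count in the reference table; stop codons and unrecognized triplets are left untouched.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def one_codon_per_amino_acid(seq, usage):
    """Replace every codon by the most used synonymous codon in `usage`.

    `usage` is a hypothetical reference table mapping codon -> count.
    Codons missing from `usage` are treated as having a count of zero.
    """
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")

    # Pick one preferred codon per amino acid (first one wins on ties).
    preferred = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in preferred or usage.get(codon, 0) > usage.get(preferred[aa], 0):
            preferred[aa] = codon

    optimized = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon)
        # Keep stop codons and ambiguous triplets exactly as they are.
        optimized.append(codon if aa in (None, "*") else preferred[aa])
    return "".join(optimized)
```

The encoded protein is unchanged by construction, because every substitution stays within one synonymous codon family. The Monte Carlo and minimal-change methods described in the abstract are not covered by this sketch.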
Biology Direct | 2008
Pere Puigbò; Ignacio G. Bravo; Santiago Garcia-Vallvé
Background: The Codon Adaptation Index (CAI) was first developed to measure the synonymous codon usage bias for a DNA or RNA sequence. The CAI quantifies the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set. Results: We describe here CAIcal, a web-server available at http://genomes.urv.es/CAIcal that includes a complete set of utilities related to the CAI. The server provides useful features, such as the calculation and graphical representation of the CAI along either an individual sequence or a protein multiple sequence alignment translated to DNA. The automated calculation of CAI and its expected value is also included as one of the CAIcal tools. The software can also be downloaded free of charge as a standalone application for local use. Conclusion: The CAIcal server provides a complete set of tools to assess codon usage adaptation and to help in genome annotation. Reviewers: This article was reviewed by Purificación López-García, Dan Graur, Rob Knight and Shamil Sunyaev.
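As an illustration of what the CAI measures, here is a minimal Python sketch following the usual definition: the geometric mean of the relative adaptiveness of the codons in a gene, where a codon's relative adaptiveness is its count in the reference set divided by the count of the most used synonymous codon. This is not the CAIcal implementation. The function names are hypothetical, the standard genetic code is assumed, and the 0.5 pseudocount for codons absent from the reference set is an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split a sequence into complete codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Compute w = count / max synonymous count from a reference gene set."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for aa in set(CODON_TABLE.values()):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Stop codons and single-codon amino acids (Met, Trp) carry no signal.
        if aa == "*" or len(family) == 1:
            continue
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # amino acid never seen in the reference set
        for c in family:
            # Assumption: unseen codons get a pseudocount of 0.5.
            weights[c] = (counts[c] or 0.5) / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of all scorable codons in `seq`."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene that uses only the preferred codon of each family scores 1.0, and values closer to 0 indicate codon usage that departs from the reference set.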
PLOS ONE | 2012
Natalya Yutin; Pere Puigbò; Eugene V. Koonin; Yuri I. Wolf
Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. As a result of this analysis, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distributions of ribosomal protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped to phylogenetic trees reconstructed from concatenated alignments of r-proteins to reveal the history of likely multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
Environmental Microbiology | 2012
Michael Y. Galperin; Sergei L. Mekhedov; Pere Puigbò; Sergey Smirnov; Yuri I. Wolf; Daniel J. Rigden
Three classes of low-G+C Gram-positive bacteria (Firmicutes), Bacilli, Clostridia and Negativicutes, include numerous members that are capable of producing heat-resistant endospores. Spore-forming firmicutes include many environmentally important organisms, such as insect pathogens and cellulose-degrading industrial strains, as well as human pathogens responsible for such diseases as anthrax, botulism, gas gangrene and tetanus. In the best-studied model organism Bacillus subtilis, sporulation involves over 500 genes, many of which are conserved among other bacilli and clostridia. This work aimed to define the genomic requirements for sporulation through an analysis of the presence of sporulation genes in various firmicutes, including those with smaller genomes than B. subtilis. Cultivable spore-formers were found to have genomes larger than 2300 kb and encompass over 2150 protein-coding genes of which 60 are orthologues of genes that are apparently essential for sporulation in B. subtilis. Clostridial spore-formers lack, among others, spoIIB, sda, spoVID and safA genes and have non-orthologous displacements of spoIIQ and spoIVFA, suggesting substantial differences between bacilli and clostridia in the engulfment and spore coat formation steps. Many B. subtilis sporulation genes, particularly those encoding small acid-soluble spore proteins and spore coat proteins, were found only in the family Bacillaceae, or even in a subset of Bacillus spp. Phylogenetic profiles of sporulation genes, compiled in this work, confirm the presence of a common sporulation gene core, but also illuminate the diversity of the sporulation processes within various lineages. These profiles should help further experimental studies of uncharacterized widespread sporulation genes, which would ultimately allow delineation of the minimal set(s) of sporulation-specific genes in Bacilli and Clostridia.
Genome Biology and Evolution | 2010
Pere Puigbò; Yuri I. Wolf; Eugene V. Koonin
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a “net of life” as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the “Forest of Life” (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
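The quartet-based analysis mentioned above rests on a simple building block: for any four species, a resolved tree supports exactly one of three possible groupings. The Python sketch below shows that building block on toy data. It is not the authors' pipeline; trees are represented as nested tuples of leaf names, and the function names `clades`, `quartet_topology` and `quartet_support` are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return the leaf set of every internal node of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def quartet_topology(tree, quartet):
    """Return the pairing of `quartet` supported by `tree`.

    The result is a frozenset of two frozensets, e.g. {{A, B}, {C, D}},
    or None if the tree lacks a quartet member or does not resolve it.
    """
    q = frozenset(quartet)
    all_clades = clades(tree)
    if len(q) != 4 or not all_clades or not q <= all_clades[-1]:
        return None
    for clade in all_clades:
        inside = clade & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_support(forest, quartet):
    """Count how many trees in `forest` support each pairing of `quartet`."""
    support = Counter()
    for tree in forest:
        topology = quartet_topology(tree, quartet)
        if topology is not None:
            support[topology] += 1
    return support
```

If one pairing dominates the counts across gene trees, that quartet carries a tree-like signal, whereas counts spread evenly across the three pairings point to conflicting histories. How such counts are aggregated into the measure reported in the abstract is not shown here.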
BMC Bioinformatics | 2008
Pere Puigbò; Ignacio G. Bravo; Santiago Garcia-Vallvé
Background: The Codon Adaptation Index (CAI) is a measure of the synonymous codon usage bias for a DNA or RNA sequence. It quantifies the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set. Extreme values in the nucleotide or in the amino acid composition have a large impact on differential preference for synonymous codons. It is therefore essential to define the limits for the expected value of CAI on the basis of sequence composition in order to properly interpret the CAI and provide statistical support to CAI analyses. Though several freely available programs calculate the CAI for a given DNA sequence, none of them corrects for compositional biases or provides confidence intervals for CAI values. Results: The E-CAI server, available at http://genomes.urv.es/CAIcal/E-CAI, is a web-application that calculates an expected value of CAI for a set of query sequences by generating random sequences with G+C and amino acid content similar to those of the input. An executable file, a tutorial, a Frequently Asked Questions (FAQ) section and several examples are also available. To exemplify the use of the E-CAI server, we have analysed the codon adaptation of human mitochondrial genes that encode a subunit of the mitochondrial respiratory chain (excluding those genes that lack a prokaryotic orthologue) and are encoded in the nuclear genome. It is assumed that these genes were transferred from the proto-mitochondrial to the nuclear genome and that their codon usage was then ameliorated. Conclusion: The E-CAI server provides a direct threshold value for discerning whether the differences in CAI are statistically significant or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences.
Nucleic Acids Research | 2007
Pere Puigbò; Antoni Romeu; Santiago Garcia-Vallvé
The highly expressed genes database (HEG-DB) is a genomic database that includes the prediction of which genes are highly expressed in prokaryotic complete genomes under strong translational selection. The current version of the database contains general features for almost 200 genomes under translational selection, including the correspondence analysis of the relative synonymous codon usage for all genes, and the analysis of their highly expressed genes. For each genome, the database contains functional and positional information about the predicted group of highly expressed genes. This information can also be accessed using a search engine. Among other statistical parameters, the database also provides the Codon Adaptation Index (CAI) for all of the genes using the codon usage of the highly expressed genes as a reference set. The ‘Pathway Tools Omics Viewer’ from the BioCyc database enables the metabolic capabilities of each genome to be explored, particularly those related to the group of highly expressed genes. The HEG-DB is freely available at http://genomes.urv.cat/HEG-DB.
Applied and Environmental Microbiology | 2011
José M. González; Jarone Pinhassi; Beatriz Fernández-Gómez; Montserrat Coll-Lladó; Mónica González-Velázquez; Pere Puigbò; Sebastian Jaenicke; Laura Gómez-Consarnau; Antoni Fernández-Guerra; Alexander Goesmann; Carlos Pedrós-Alió
Proteorhodopsin phototrophy is expected to have considerable impact on the ecology and biogeochemical roles of marine bacteria. However, the genetic features contributing to the success of proteorhodopsin-containing bacteria remain largely unknown. We investigated the genome of Dokdonia sp. strain MED134 (Bacteroidetes) for features potentially explaining its ability to grow better in light than darkness. MED134 has a relatively high number of peptidases, suggesting that amino acids are the main carbon and nitrogen sources. In addition, MED134 shares with other environmental genomes a reduction in gene copies at the expense of important ones, like membrane transporters, which might be compensated by the presence of the proteorhodopsin gene. The genome analyses suggest Dokdonia sp. MED134 is able to respond to light at least partly due to the presence of a strong flavobacterial consensus promoter sequence for the proteorhodopsin gene. Moreover, Dokdonia sp. MED134 has a complete set of anaplerotic enzymes likely to play a role in the adaptation of the carbon anabolism to the different sources of energy it can use, including light or various organic matter compounds. In addition to promoting growth, proteorhodopsin phototrophy could provide energy for the degradation of complex or recalcitrant organic matter, survival during periods of low nutrients, or uptake of amino acids and peptides at low concentrations. Our analysis suggests that the ability to harness light potentially makes MED134 less dependent on the amount and quality of organic matter or other nutrients. The genomic features reported here may well be among the keys to a successful photoheterotrophic lifestyle.
BMC Biology | 2013
Pere Puigbò; Yuri I. Wolf; Eugene V. Koonin
thicket of the phylogenetic forest’, published in 2009 in Journal of Biology [1] (see also the accompanying comment [2]), we presented evidence that the traditional Tree of Life (TOL) can and should be replaced with a statistical central trend in the genome-wide compendium of phylogenetic trees; this trend reflects the coherence between the evolutionary histories of different genes and was later denoted the Statistical Tree Of Life (STOL) [3]. Since Darwin’s day, the TOL has been the dominant icon of evolutionary biology [4,5], the basis of taxonomy and an essential framework for evolutionary reconstructions. In the late 1970s, ribosomal (r)RNA was introduced as a universal phylogenetic marker, primarily through the work of Carl Woese and colleagues [6,7], and the rRNA tree, complemented with trees for other universal genes such as the large RNA polymerase subunits, became the standard model for TOL study. Technical difficulties notwithstanding, progress in genome sequencing combined with advances in phylogenetic analysis seemed to put a well-resolved TOL within reach [8,9]. However, as soon as a reasonable number of complete genome sequences of bacteria and archaea became available, phylogenomics (genome-wide phylogenetic analysis of individual gene trees) hopelessly marred this neat picture by showing that the trees of different genes generally had different topologies. The topological inconsistencies between gene trees were far too extensive to be dismissed as phylogenetic artifacts, leading to the realization that no single gene tree, including those for universal genes such as rRNA, could represent the evolution of genomes in its entirety. Hence the concepts of horizontal genomics or a ‘net of life’ were brought about to replace the simple notion of the TOL [10,11].
In the extreme, several influential studies proposed to dispense with ‘tree thinking’ altogether as an artificial construct having little to do with actual evolution, at least as far as bacteria and archaea are concerned [12-15]. The concept of ‘horizontal genomics’ involves an internal contradiction because the notion of horizontal gene transfer (HGT) inherently implies the existence of a standard of vertical, tree-like evolution, and most of the existing methods for HGT detection are based on the comparison of gene trees to a standard ‘species tree’, in practice often the rRNA tree [16,17]. If the vertical standard does not exist, the concept of HGT becomes effectively meaningless, so all we can talk about is a network of life, with nodes corresponding to genomes and edges reflecting gene exchange [18]. The stakes here are high because replacement of the TOL with a network graph would change our entire perception of the process of evolution and invalidate all evolutionary reconstruction based on a species tree. However, the tree representation is by no means superfluous to the description of evolution because the very process of the replication of genetic information implies a bifurcating graph, in other words, a tree [19]. Thus, the key question is [1,20]: in the genome-wide compendium of phylogenetic trees, which we denoted the Forest Of Life (FOL), can we detect any order, any preferred tree topology (branching order) that would reflect a consensus of the topologies of other trees? We set out to address the above question as objectively as possible, first of all dispensing with any pre-selected standard of tree-like evolution. The analyzed FOL consisted of 6,901 maximum likelihood phylogenetic trees that were built for clusters of orthologous genes from a representative set of 100 diverse bacterial and archaeal genomes [1].
The complete matrix of topological distances between these trees was analyzed using the Inconsistency Score, a measure that we defined specifically for this purpose and that reflects the average topological (in)consistency of a given tree with the rest of the trees in the FOL (for the details of the methods employed in this analysis, see [21]). Although the FOL includes very few trees with exactly identical topologies, we found that the topologies of the trees were far more congruent than expected by chance. The 102 Nearly Universal Trees Seeing the Tree of Life behind the phylogenetic forest
Genome Biology and Evolution | 2016
Jaime Iranzo; Pere Puigbò; Alexander E. Lobkovsky; Yuri I. Wolf; Eugene V. Koonin
Almost all cellular life forms are hosts to diverse genetic parasites with various levels of autonomy including plasmids, transposons and viruses. Theoretical modeling of the evolution of primordial replicators indicates that parasites (cheaters) necessarily evolve in such systems and can be kept at bay primarily via compartmentalization. Given the (near) ubiquity, abundance and diversity of genetic parasites, the question becomes pertinent: are such parasites intrinsic to life? At least in prokaryotes, the persistence of parasites is linked to the rate of horizontal gene transfer (HGT). We mathematically derive the threshold value of the minimal transfer rate required for selfish element persistence, depending on the element duplication and loss rates as well as the cost to the host. Estimation of the characteristic gene duplication, loss and transfer rates for transposons, plasmids and virus-related elements in multiple groups of diverse bacteria and archaea indicates that most of these rates are compatible with the long term persistence of parasites. Notably, a small but non-zero rate of HGT is also required for the persistence of non-parasitic genes. We hypothesize that cells cannot tune their horizontal transfer rates to be below the threshold required for parasite persistence without experiencing highly detrimental side-effects. As a lower boundary to the minimum DNA transfer rate that a cell can withstand, we consider the process of genome degradation and mutational meltdown of populations through Muller’s ratchet. A numerical assessment of this hypothesis suggests that microbial populations cannot purge parasites while escaping Muller’s ratchet. Thus, genetic parasites appear to be virtually inevitable in cellular organisms.