Jaime Huerta-Cepas
Pompeu Fabra University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jaime Huerta-Cepas.
Nucleic Acids Research | 2015
Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi Tsafou; Michael Kuhn; Peer Bork; Lars Juhl Jensen; Christian von Mering
The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein–protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein–protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.
Nucleic Acids Research | 2016
Jaime Huerta-Cepas; Damian Szklarczyk; Kristoffer Forslund; Helen Cook; Davide Heller; Mathias C. Walter; Thomas Rattei; Daniel R. Mende; Shinichi Sunagawa; Michael Kuhn; Lars Juhl Jensen; Christian von Mering; Peer Bork
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.
BMC Bioinformatics | 2010
Jaime Huerta-Cepas; Joaquín Dopazo; Toni Gabaldón
BackgroundMany bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale.ResultsHere we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETEs programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations.ConclusionsETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
Nucleic Acids Research | 2014
Sean Powell; Kristoffer Forslund; Damian Szklarczyk; Kalliopi Trachana; Alexander Roth; Jaime Huerta-Cepas; Toni Gabaldón; Thomas Rattei; Christopher J. Creevey; Michael Kuhn; Lars Juhl Jensen; Christian von Mering; Peer Bork
With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Molecular Biology and Evolution | 2016
Jaime Huerta-Cepas; François Serra; Peer Bork
The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org
Science | 2016
Simone S. Li; Ana Zhu; Vladimir Benes; Paul Igor Costea; Rajna Hercog; Falk Hildebrand; Jaime Huerta-Cepas; Max Nieuwdorp; Jarkko Salojärvi; Anita Yvonne Voigt; Georg Zeller; Shinichi Sunagawa; Willem M. de Vos; Peer Bork
Persistence of fecal transplants Fecal microbiota transplantation is a successful way of treating the distressing symptoms of irritable bowel disease or Clostridium difficile infection. The procedure is done by administering a concentrate of colonic bacteria from a healthy donor. Li et al. used metagenomic data to look at single-nucleotide variants after transplants in humans. Donor and recipient strains coexisted for at least 3 months. Some donor strains replaced related strains of the same species, but totally novel species from a donor were unlikely to thrive in a recipient. Rational design of personalized fecal transplant “cocktails” will therefore rely on resolution beyond the species level. Science, this issue p. 586 The mystery of the success of clinical microbial transplant therapy is beginning to be decoded. Fecal microbiota transplantation (FMT) has shown efficacy in treating recurrent Clostridium difficile infection and is increasingly being applied to other gastrointestinal disorders, yet the fate of native and introduced microbial strains remains largely unknown. To quantify the extent of donor microbiota colonization, we monitored strain populations in fecal samples from a recent FMT study on metabolic syndrome patients using single-nucleotide variants in metagenomes. We found extensive coexistence of donor and recipient strains, persisting 3 months after treatment. Colonization success was greater for conspecific strains than for new species, the latter falling within fluctuation levels observed in healthy individuals over a similar time frame. Furthermore, same-donor recipients displayed varying degrees of microbiota transfer, indicating individual patterns of microbiome resistance and donor-recipient compatibilities.
Genome Biology | 2007
Jaime Huerta-Cepas; Hernán Dopazo; Joaquín Dopazo; Toni Gabaldón
Background:Phylogenomics analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use in large genomes still constitutes a technical challenge. The use of phylomes also requires the development of new methods that help us to interpret them.Results:We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing and maximum likelihood and Bayesian methods. Although differences with alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with laurasatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplications before vertebrate radiation. Using a novel algorithm that is independent from a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes.Conclusion:Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages. The pipeline implemented here can be easily adapted for use in other organisms.
Nucleic Acids Research | 2011
Jaime Huerta-Cepas; Salvador Capella-Gutiérrez; Leszek P. Pryszcz; Ivan Denisov; Diego Kormes; Marina Marcet-Houben; Toni Gabaldón
The growing availability of complete genomic sequences from diverse species has brought about the need to scale up phylogenomic analyses, including the reconstruction of large collections of phylogenetic trees. Here, we present the third version of PhylomeDB (http://phylomeDB.org), a public database for genome-wide collections of gene phylogenies (phylomes). Currently, PhylomeDB is the largest phylogenetic repository and hosts 17 phylomes, comprising 416 093 trees and 165 840 alignments. It is also a major source for phylogeny-based orthology and paralogy predictions, covering about 5 million proteins in 717 fully-sequenced genomes. For each protein-coding gene in a seed genome, the database provides original and processed alignments, phylogenetic trees derived from various methods and phylogeny-based predictions of orthology and paralogy relationships. The new version of phylomeDB has been extended with novel data access and visualization features, including the possibility of programmatic access. Available seed species include model organisms such as human, yeast, Escherichia coli or Arabidopsis thaliana, but also alternative model species such as the human pathogen Candida albicans, or the pea aphid Acyrtosiphon pisum. Finally, PhylomeDB is currently being used by several genome sequencing projects that couple the genome annotation process with the reconstruction of the corresponding phylome, a strategy that provides relevant evolutionary insights.
BMC Bioinformatics | 2007
Fatima Al-Shahrour; Leonardo Arbiza; Hernán Dopazo; Jaime Huerta-Cepas; Pablo Minguez; David Montaner; Joaquín Dopazo
With the popularisation of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired in systems biology criteria have started to be developed. Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of the data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. Methods for gene set enrichment which, in addition, are independent from the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described and other similar ones can be found at: http://www.babelomics.org .
Nucleic Acids Research | 2014
Jaime Huerta-Cepas; Salvador Capella-Gutiérrez; Leszek P. Pryszcz; Marina Marcet-Houben; Toni Gabaldón
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.