Publication


Featured research published by Michael L. Tress.


Genome Research | 2012

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow; Adam Frankish; José Manuel Rodríguez González; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen Aken; Daniel Barrell; Amonida Zadissa; Stephen M. J. Searle; I. Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles A. Steward; Rachel A. Harte; Mike Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael L. Tress

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Human Molecular Genetics | 2014

Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes

Iakes Ezkurdia; David Juan; Jose Manuel Rodriguez; Adam Frankish; Mark Diekhans; Jennifer Harrow; Jesús Vázquez; Alfonso Valencia; Michael L. Tress

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.


Proceedings of the National Academy of Sciences of the United States of America | 2007

The implications of alternative splicing in the ENCODE protein complement

Michael L. Tress; Pier Luigi Martelli; Adam Frankish; Gabrielle A. Reeves; Jan Jaap Wesselink; Corin Yeats; Páll Ísólfur Ólason; Mario Albrecht; Hedi Hegyi; Alejandro Giorgetti; Domenico Raimondo; Julien Lagarde; Roman A. Laskowski; Gonzalo López; Michael I. Sadowski; James D. Watson; Piero Fariselli; Ivan Rossi; Alinda Nagy; Wang Kai; Zenia M Størling; Massimiliano Orsini; Yassen Assenov; Hagen Blankenburg; Carola Huthmacher; Fidel Ramírez; Andreas Schlicker; P. D. Jones; Samuel Kerrien; Sandra Orchard

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.


PLOS ONE | 2012

Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells

Sarah Djebali; Julien Lagarde; Philipp Kapranov; Vincent Lacroix; Christelle Borel; Jonathan M. Mudge; Cédric Howald; Sylvain Foissac; Catherine Ucla; Jacqueline Chrast; Paolo Ribeca; David Martin; Ryan R. Murray; Xinping Yang; Lila Ghamsari; Chenwei Lin; Ian Bell; Erica Dumais; Jorg Drenkow; Michael L. Tress; Josep Lluís Gelpí; Modesto Orozco; Alfonso Valencia; Nynke L. van Berkum; Bryan R. Lajoie; Marc Vidal; John A. Stamatoyannopoulos; Philippe Batut; Alexander Dobin; Jennifer Harrow

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.


Briefings in Bioinformatics | 2008

Progress and challenges in predicting protein–protein interaction sites

Iakes Ezkurdia; Lisa Bartoli; Piero Fariselli; Rita Casadio; Alfonso Valencia; Michael L. Tress

The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.


Proteins | 2005

Assessment of predictions submitted for the CASP6 comparative modeling category

Michael L. Tress; Iakes Ezkurdia; Osvaldo Graña; Gonzalo López; Alfonso Valencia

Here we present a full overview of the Critical Assessment of Protein Structure Prediction (CASP6) comparative modeling category. Prediction accuracy for the 43 comparative modeling targets was assessed through detailed numerical comparisons between predicted and experimental structures. Assessments using standard measures for model backbone quality and structural alignment accuracy highlighted a small number of groups with stand out predictions and these findings were backed up by statistical comparisons. We were able to carry out evaluations of side‐chain contacts predictions and side‐chain rotamer accuracy, for which one group turned out to have statistically better predictions. We also assessed the prediction quality of structurally divergent regions and biologically important sites. Interestingly we were able to show that predictors were not predicting these important functional regions with any greater accuracy than the rest of the structure. In addition we investigated the ability of predictors to build models that improve on the structural template and reached some tentative conclusions from comparisons with the previous CASP experiment. Proteins 2005;Suppl 7:27–45.


Proteins | 1999

Successful recognition of protein folds using threading methods biased by sequence similarity and predicted secondary structure

David Jones; Michael L. Tress; Kevin Bryson; Caroline Hadley

Analysis of our fold recognition results in the 3rd Critical Assessment in Structure Prediction (CASP3) experiment, using the programs THREADER 2 and GenTHREADER, shows an encouraging level of overall success. Of the 23 submitted predictions, 20 targets showed no clear sequence similarity to proteins of known 3D structure. These 20 targets can be divided into 22 domains, of which, 20 domains either entirely match a previously known fold, or partially match a substantial region of a known fold. Of these 20 domains, we correctly assigned the folds in 10 cases. Proteins Suppl 1999:3:104–111.


Proteins | 2005

CASP6 Assessment of Contact Prediction

Osvaldo Graña; David Baker; Robert M. MacCallum; Jens Meiler; Marco Punta; Burkhard Rost; Michael L. Tress; Alfonso Valencia

Here we present the evaluation results of the Critical Assessment of Protein Structure Prediction (CASP6) contact prediction category. Contact prediction was assessed with standard measures well known in the field and the performance of specialist groups was evaluated alongside groups that submitted models with 3D coordinates. The evaluation was mainly focused on long range contact predictions for the set of new fold targets, although we analyzed predictions for all targets. Three groups with similar levels of accuracy and coverage performed a little better than the others. Comparisons of the predictions of the three best methods with those of CASP5/CAFASP3 suggested some improvement, although there were not enough targets in the comparisons to make this statistically significant. Proteins 2005;Suppl 7:214–224.


Genome Research | 2012

Chimeras taking shape: Potential functions of proteins encoded by chimeric RNA transcripts

Milana Frenkel-Morgenstern; Vincent Lacroix; Iakes Ezkurdia; Yishai Levin; Alexandra Gabashvili; Jaime Prilusky; Angela del Pozo; Michael L. Tress; Rory Johnson; Roderic Guigó; Alfonso Valencia

Chimeric RNAs comprise exons from two or more different genes and have the potential to encode novel proteins that alter cellular phenotypes. To date, numerous putative chimeric transcripts have been identified among the ESTs isolated from several organisms and using high throughput RNA sequencing. The few corresponding protein products that have been characterized mostly result from chromosomal translocations and are associated with cancer. Here, we systematically establish that some of the putative chimeric transcripts are genuinely expressed in human cells. Using high throughput RNA sequencing, mass spectrometry experimental data, and functional annotation, we studied 7424 putative human chimeric RNAs. We confirmed the expression of 175 chimeric RNAs in 16 human tissues, with an abundance varying from 0.06 to 17 RPKM (Reads Per Kilobase per Million mapped reads). We show that these chimeric RNAs are significantly more tissue-specific than non-chimeric transcripts. Moreover, we present evidence that chimeras tend to incorporate highly expressed genes. Despite the low expression level of most chimeric RNAs, we show that 12 novel chimeras are translated into proteins detectable in multiple shotgun mass spectrometry experiments. Furthermore, we confirm the expression of three novel chimeric proteins using targeted mass spectrometry. Finally, based on our functional annotation of exon organization and preserved domains, we discuss the potential features of chimeric proteins with illustrative examples and suggest that chimeras significantly exploit signal peptides and transmembrane domains, which can alter the cellular localization of cognate proteins. Taken together, these findings establish that some chimeric RNAs are translated into potentially functional proteins in humans.


Nucleic Acids Research | 2007

firestar—prediction of functionally important residues using structural templates and alignment reliability

Gonzalo López; Alfonso Valencia; Michael L. Tress

Here we present firestar, an expert system for predicting ligand-binding residues in protein structures. The server provides a method for extrapolating from the large inventory of functionally important residues organized in the FireDB database and adds information about the local conservation of potential-binding residues. The interface allows users to make queries by protein sequence or structure. The user can access pairwise and multiple alignments with structures that have relevant functionally important binding sites. The results are presented in a series of easy to read displays that allow users to compare binding residue conservation across homologous proteins. The binding site residues can also be viewed with molecular visualization tools. One feature of firestar is that it can be used to evaluate the biological relevance of small molecule ligands present in PDB structures. With the server it is easy to discern whether small molecule binding is conserved in homologous structures. We found this facility particularly useful during the recent assessment of CASP7 function prediction. Availability: http://firedb.bioinfo.cnio.es/Php/FireStar.php.

Collaboration


Dive into Michael L. Tress's collaborations.

Top Co-Authors

Alfonso Valencia

European Bioinformatics Institute

Iakes Ezkurdia

Centro Nacional de Investigaciones Cardiovasculares

Gonzalo López

Spanish National Research Council

Adam Frankish

Wellcome Trust Sanger Institute

Jesús Vázquez

Centro Nacional de Investigaciones Cardiovasculares

Jose Manuel Rodriguez

Centro Nacional de Investigaciones Cardiovasculares

Jennifer Harrow

Wellcome Trust Sanger Institute

Federico Abascal

Spanish National Research Council

David Juan

Spanish National Research Council

Mark Diekhans

University of California
