Bruno Contreras-Moreira
Spanish National Research Council
Publication
Featured research published by Bruno Contreras-Moreira.
Nucleic Acids Research | 2007
Socorro Gama-Castro; Verónica Jiménez-Jacinto; Martín Peralta-Gil; Alberto Santos-Zavaleta; Mónica I Peñaloza-Spínola; Bruno Contreras-Moreira; Juan Segura-Salazar; Luis Muñiz-Rascado; Irma Martínez-Flores; Heladia Salgado; César Bonavides-Martínez; Cei Abreu-Goodger; Carlos Rodríguez-Penagos; Juan Miranda-Ríos; Enrique Merino; Araceli M. Huerta; Luis G. Treviño-Quintanilla; Julio Collado-Vides
RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference database offering curated knowledge of the transcriptional regulatory network of Escherichia coli K12, currently the best-known electronically encoded database of the genetic regulatory network of any free-living organism. This paper summarizes the improvements, new biology and new features available in version 6.0. Curation of original literature is, from now on, up to date for every new release. All objects are supported by their corresponding evidence, now classified as strong or weak. Transcription factors are classified by the origin of their effectors and by gene ontology class. We now have computational predictions for σ54 and five different promoter types of the σ70 family, as well as their corresponding −10 and −35 boxes. In addition to those curated from the literature, we added about 300 experimentally mapped promoters coming from our own high-throughput mapping efforts. RegulonDB v.6.0 now expands beyond transcription initiation, including RNA regulatory elements, specifically riboswitches, attenuators and small RNAs, with their known associated targets. The data can be accessed through overviews of correlations about gene regulation. The original literature associated with RegulonDB, together with more than 4000 curation notes, can now be searched with the Textpresso text mining engine.
Applied and Environmental Microbiology | 2013
Bruno Contreras-Moreira; Pablo Vinuesa
GET_HOMOLOGUES is an open-source software package that builds on popular orthology-calling approaches making highly customizable and detailed pangenome analyses of microorganisms accessible to nonbioinformaticians. It can cluster homologous gene families using the bidirectional best-hit, COGtriangles, or OrthoMCL clustering algorithms. Clustering stringency can be adjusted by scanning the domain composition of proteins using the HMMER3 package, by imposing desired pairwise alignment coverage cutoffs, or by selecting only syntenic genes. The resulting homologous gene families can be made even more robust by computing consensus clusters from those generated by any combination of the clustering algorithms and filtering criteria. Auxiliary scripts make the construction, interrogation, and graphical display of core genome and pangenome sets easy to perform. Exponential and binomial mixture models can be fitted to the data to estimate theoretical core genome and pangenome sizes, and high-quality graphics can be generated. Furthermore, pangenome trees can be easily computed and basic comparative genomics performed to identify lineage-specific genes or gene family expansions. The software is designed to take advantage of modern multiprocessor personal computers as well as computer clusters to parallelize time-consuming tasks. To demonstrate some of these capabilities, we survey a set of 50 Streptococcus genomes annotated in the Orthologous Matrix (OMA) browser as a benchmark case. The package can be downloaded at http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.
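The bidirectional best-hit criterion named in this abstract can be illustrated with a minimal sketch. This is not the GET_HOMOLOGUES implementation: the function name, the `(query, subject, score)` tuple format, and the use of a raw score to pick the best hit are all assumptions made for illustration.

```python
def bidirectional_best_hits(hits_ab, hits_ba):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit.

    hits_ab / hits_ba: iterables of (query, subject, score) tuples, for
    example parsed from a sequence-similarity search of genome A against
    genome B and vice versa (hypothetical input format).
    """
    def best(hits):
        # Keep only the highest-scoring subject for each query.
        top = {}
        for query, subject, score in hits:
            if query not in top or score > top[query][1]:
                top[query] = (subject, score)
        return {query: subject for query, (subject, _) in top.items()}

    best_ab = best(hits_ab)
    best_ba = best(hits_ba)
    # A pair is kept only if the best-hit relation holds in both directions.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy example: a2's best hit is b2, but b2's best hit is a1, so only
# (a1, b1) is reciprocal.
hits_ab = [("a1", "b1", 200), ("a1", "b2", 50), ("a2", "b2", 90)]
hits_ba = [("b1", "a1", 190), ("b2", "a1", 60), ("b2", "a2", 40)]
pairs = bidirectional_best_hits(hits_ab, hits_ba)
```

Requiring reciprocity is what makes the criterion conservative: a one-directional best hit, such as a2 to b2 in the toy data, is discarded.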
PLOS ONE | 2009
Alfredo Mendoza-Vargas; Leticia Olvera; Maricela Olvera; Ricardo Grande; Leticia Vega-Alvarado; Blanca Taboada; Verónica Jiménez-Jacinto; Heladia Salgado; Katy Juárez; Bruno Contreras-Moreira; Araceli M. Huerta; Julio Collado-Vides
Despite almost 40 years of molecular genetics research in Escherichia coli, a major fraction of its Transcription Start Sites (TSSs) is still unknown, which limits our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) is aimed at integrating the genetic regulatory network of E. coli K12 and has, until now, been an entirely bioinformatic project. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5′ RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. From this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca. 1500 TSSs mapped belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and the type of σ factors that control the expression of about 80% of these genes. As expected, the housekeeping σ70 was the most common type of promoter, followed by σ38. The majority of the putative TSSs were located between 20 and 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions. The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks, and provides valuable new information that will facilitate understanding, from a global perspective, of the complex and intricate regulatory network that operates in E. coli.
Bioinformatics | 2002
Bruno Contreras-Moreira; Paul A. Bates
To optimize the search for structural templates in protein comparative modelling, the query sequence is split into domains. The initial list of templates for each domain, extracted from PFAM plus PDB and SCOP, is then ranked according to sequence identity (%ID), coverage and resolution. If %ID is less than 30, secondary structure matching is used to filter out false templates. Availability: http://www.bmm.icnet.uk/~3djigsaw/dom_fish
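The ranking and filtering step described in this abstract might look roughly like the sketch below. Only the three ranking criteria and the 30% identity threshold come from the abstract; the `Template` fields, the `min_ss_match` cutoff, the precedence of the sort keys, and the toy PDB identifiers are assumptions for illustration, not the published method.

```python
from dataclasses import dataclass


@dataclass
class Template:
    pdb_id: str
    identity: float    # % sequence identity to the query domain
    coverage: float    # fraction of the query domain aligned (0 to 1)
    resolution: float  # in angstroms; lower is better
    ss_match: float    # fraction of agreeing secondary-structure states (hypothetical score)


def rank_templates(templates, id_cutoff=30.0, min_ss_match=0.6):
    """Filter low-identity templates by secondary structure, then rank the rest."""
    # Below the identity cutoff, a template survives only if its
    # secondary structure agrees well enough with the query's.
    kept = [
        t for t in templates
        if t.identity >= id_cutoff or t.ss_match >= min_ss_match
    ]
    # Assumed precedence: identity first, then coverage, then resolution.
    return sorted(kept, key=lambda t: (-t.identity, -t.coverage, t.resolution))


candidates = [
    Template("1abc", 45.0, 0.90, 2.5, 0.80),
    Template("2xyz", 45.0, 0.90, 1.8, 0.70),
    Template("3low", 22.0, 0.95, 2.0, 0.30),  # low %ID and poor SS match: dropped
    Template("4ok", 25.0, 0.80, 2.2, 0.75),   # low %ID but good SS match: kept
]
ranked_ids = [t.pdb_id for t in rank_templates(candidates)]
```

With equal identity and coverage, the better-resolved structure ranks first, which is why 2xyz precedes 1abc in the toy data.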
Nature Genetics | 2003
Grant C. Sellar; Karen P. Watt; Genevieve J. Rabiasz; Euan A. Stronach; Li Li; Eric P. Miller; Charles Massie; Jayne Miller; Bruno Contreras-Moreira; Diane Scott; Iain Brown; Alastair Williams; Paul A. Bates; John F. Smyth; Hani Gabra
Epithelial ovarian cancer (EOC), the leading cause of death from gynecological malignancy, is a poorly understood disease. The typically advanced presentation of EOC with loco-regional dissemination in the peritoneal cavity and the rare incidence of visceral metastases are hallmarks of the disease. These features relate to the biology of the disease, which is a principal determinant of outcome. EOC arises as a result of genetic alterations sustained by the ovarian surface epithelium (OSE; ref. 3). The causes of these changes are unknown but are manifest by activation of oncogenes and inactivation of tumor-suppressor genes (TSGs). Our analysis of loss of heterozygosity at 11q25 identified OPCML (also called OBCAM), a member of the IgLON family of immunoglobulin (Ig) domain–containing glycosylphosphatidylinositol (GPI)-anchored cell adhesion molecules, as a candidate TSG in EOC. OPCML is frequently somatically inactivated in EOC by allele loss and by CpG island methylation. OPCML has functional characteristics consistent with TSG properties both in vitro and in vivo. A somatic missense mutation from an individual with EOC shows clear evidence of loss of function. These findings suggest that OPCML is an excellent candidate for the 11q25 ovarian cancer TSG. This is the first description to our knowledge of the involvement of the IgLON family in cancer.
Applied and Environmental Microbiology | 2008
Pablo Vinuesa; Keilor Rojas-Jiménez; Bruno Contreras-Moreira; Suresh K. Mahna; Braj Nandan Prasad; Hla Moe; Suresh B. Selvaraju; Heidemarie Thierfelder; Dietrich Werner
A highly supported maximum-likelihood species phylogeny for the genus Bradyrhizobium was inferred from a supermatrix obtained from the concatenation of partial atpD, recA, glnII, and rpoB sequences corresponding to 33 reference strains and 76 bradyrhizobia isolated from the nodules of Glycine max (soybean) trap plants inoculated with soil samples from Myanmar, India, Nepal, and Vietnam. The power of the multigene approach using multiple strains per species was evaluated in terms of overall tree resolution and phylogenetic congruence, representing a practical and portable option for bacterial molecular systematics. Potential pitfalls of the approach are highlighted. Seventy-five of the isolates could be classified as B. japonicum type Ia (USDA110/USDA122-like), B. liaoningense, B. yuanmingense, or B. elkanii, whereas one represented a novel Bradyrhizobium lineage. Most Nepalese B. japonicum Ia isolates belong to a highly epidemic clone closely related to strain USDA110. Significant phylogenetic evidence against the monophyly of the B. japonicum I and Ia lineages was found. Analysis of their DNA polymorphisms revealed high population distances, significant genetic differentiation, and contrasting population genetic structures, suggesting that the strains in the Ia lineage are misclassified as B. japonicum. The DNA polymorphism patterns of all species conformed to the expectations of the neutral mutation and population equilibrium models and, excluding the B. japonicum Ia lineage, were consistent with intermediate recombination levels. All species displayed epidemic clones and had broad geographic and environmental distribution ranges, as revealed by mapping climate types and geographic origins of the isolates on the species tree.
Nucleic Acids Research | 2010
Bruno Contreras-Moreira
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein–DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
Journal of Molecular Biology | 2003
Bruno Contreras-Moreira; Paul W. Fitzjohn; Paul A. Bates
Comparative modelling of proteins is a predictive technique to build an atomic model for a given amino acid sequence, on the basis of the structures of other proteins (templates) that have been determined experimentally. Critical problems arise in this procedure: selecting the correct templates, aligning the query sequence with them and building the non-conserved surface loops. In this work, we apply a genetic algorithm, with crossover and mutation, as a new tool to overcome the first two. In silico protein recombination proves to be an effective way to exploit the variability of templates and sequence alignments to produce populations of optimized models by artificial selection. Despite some limitations, the procedure is shown to be robust to alignment errors, while simplifying the task of selecting templates, making it a good candidate for automatic building of reliable protein models.
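The selection, crossover and mutation loop mentioned in this abstract can be sketched generically. The example below is a toy on bit strings, not the protein-model representation or scoring used in the paper; the function name, the one-point crossover, the survivor fraction, and the fitness function are all assumptions for illustration.

```python
import random


def evolve(population, fitness, n_generations=20, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over equal-length bit lists (length >= 2, size >= 2)."""
    rng = random.Random(seed)  # seeded for reproducibility
    pop = [list(ind) for ind in population]
    for _ in range(n_generations):
        # Artificial selection: the fitter half survives unchanged.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: max(2, len(pop) // 2)]
        children = []
        while len(survivors) + len(children) < len(pop):
            mum, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(mum))  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


# Hypothetical fitness: the number of 1s in the individual.
init_rng = random.Random(1)
start = [[init_rng.randint(0, 1) for _ in range(10)] for _ in range(8)]
best = evolve(start, fitness=sum)
```

Because the survivors are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.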
BMC Bioinformatics | 2008
Vladimir Espinosa Angarica; Abel González Pérez; Ana Tereza Ribeiro de Vasconcelos; Julio Collado-Vides; Bruno Contreras-Moreira
Background: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion: Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Journal of Molecular Biology | 2008
Irma Lozada-Chávez; Vladimir Espinosa Angarica; Julio Collado-Vides; Bruno Contreras-Moreira
Understanding the mechanisms by which transcriptional regulatory networks (TRNs) change through evolution is a fundamental problem. Here, we analyze this question using data from Escherichia coli and Bacillus subtilis, and find that paralogy relationships are insufficient to explain the global or local role observed for transcription factors (TFs) within regulatory networks. Our results provide a picture in which DNA-binding specificity, a molecular property that can be measured in different ways, is a predictor of the role of transcription factors. In particular, we observe that global regulators consistently display low levels of binding specificity, while displaying comparatively higher expression values in microarray experiments. In addition, we find a strong negative correlation between binding specificity and the number of co-regulators that help coordinate genetic expression on a genomic scale. A close look at several orthologous TFs, including FNR, a regulator found to be global in E. coli and local in B. subtilis, confirms the diagnostic value of specificity in order to understand their regulatory function, and highlights the importance of evaluating the metabolic and ecological relevance of effectors as another variable in the evolutionary equation of regulatory networks. Finally, a general model is presented that integrates some evolutionary forces and molecular properties, aiming to explain how regulons grow and shrink, as bacteria tune their regulation to increase adaptation.