M. Gonzalo Claros
University of Málaga
Publications
Featured research published by M. Gonzalo Claros.
BMC Bioinformatics | 2010
Juan Falgueras; Antonio J. Lara; Noe Fernandez-Pozo; Francisco R. Cantón; Guillermo Pérez-Trabado; M. Gonzalo Claros
Background: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms.
Results: SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.
Conclusions: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
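As an illustration of what removing low-quality sequence can mean in practice, the following minimal sketch trims low-quality bases from both ends of a read. It is not SeqTrim's algorithm: the function name `trim_low_quality_ends`, the per-base threshold `min_q`, and the example read are all assumptions made for this sketch.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Drop bases from both ends of a read while their quality is below min_q.

    Hypothetical helper for illustration only; it is not taken from SeqTrim.
    `quals` is a list of per-base quality scores aligned with `seq`.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    start, end = 0, len(seq)
    # Advance the left boundary past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Pull the right boundary back past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Made-up read: two poor bases at each end, good bases in the middle.
read = "NNACGTACGTNN"
scores = [5, 8, 30, 32, 35, 31, 30, 33, 34, 30, 9, 4]
trimmed_seq, trimmed_scores = trim_low_quality_ends(read, scores)
print(trimmed_seq)  # ACGTACGT
```

A production pre-processor would typically combine several such steps (vector, adaptor and contaminant removal among them), which this sketch does not attempt.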
PLOS ONE | 2014
Luis G. Pérez-Rivas; José M. Jerez; Rosario Carmona; Vanessa de Luque; Luis Vicioso; M. Gonzalo Claros; Enrique Viguera; Bella Pajares; Alfonso Sánchez; Nuria Ribelles; Emilio Alba; José Lozano
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients who either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors into two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.
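For readers unfamiliar with the AUC figure quoted above, the sketch below computes an area under the ROC curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The scores are invented for illustration and are not data from the study; the function name `auc_from_scores` is likewise an assumption, and the abstract does not state how the authors computed their value.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    Hypothetical helper; scores could be, for example, a risk score
    derived from a signature, with higher meaning "more likely to relapse".
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented risk scores for two small groups of patients.
early_relapse = [0.9, 0.8, 0.7, 0.4]
no_relapse = [0.6, 0.3, 0.2, 0.1]
auc = auc_from_scores(early_relapse, no_relapse)
print(auc)  # 0.9375
```

An AUC of 1.0 would mean every relapsing case outranks every non-relapsing one, and 0.5 corresponds to random ranking.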
Euphytica | 2000
M. Gonzalo Claros; Remedios Crespillo; María L. Aguilar; Francisco M. Cánovas
Málaga is a province of Spain where olive-trees are cultivated in a large range of environments, climates and soils. We have developed a reliable and reproducible method to detect RAPD and AP-PCR polymorphisms, using DNA from olive-tree (Olea europaea L.) leaves. Fifty-six olive-tree cultivars sampled from their natural orchards throughout Málaga province, including oil and table olive cultivars, were screened and grouped into 22 varieties. A total of 62 informative polymorphic loci that provide 601 conspicuous bands were enough to differentiate the varieties. Clustering analyses using three different pairwise distances, as well as phylogenetic analyses, led to the same result: olive-trees in Málaga can be divided into three main groups. Group I (90% certainty) contains wild type and two introduced varieties, group II (83% certainty) covers some native olive-trees, and group III (58% certainty) is a heterogeneous cluster that includes varieties originating and cultivated in a number of Andalusian locations. Geographic location appears to be the main factor behind this classification, whereas morphological traits are needed to explain the subclustering within group III. These results are consistent with the hypothesis of autochthonic origin of most olive-tree cultivars, and have been used to support a Label of Origin for the olive oil produced by the varieties included in group II.
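To make the idea of a pairwise distance between band profiles concrete, here is a minimal sketch using the Jaccard distance on presence/absence vectors. The abstract does not name the three distances that were used, so Jaccard is an assumption chosen for illustration, and the band profiles and variety names below are invented.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two band presence/absence profiles (1 = band present)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same loci")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles with no bands at all are treated as identical.
    return 0.0 if either == 0 else 1.0 - shared / either


def pairwise_distances(profiles):
    """Return {(name1, name2): distance} for every unordered pair of profiles."""
    names = sorted(profiles)
    return {
        (n1, n2): jaccard_distance(profiles[n1], profiles[n2])
        for i, n1 in enumerate(names)
        for n2 in names[i + 1:]
    }


# Invented profiles over five loci for three hypothetical varieties.
profiles = {
    "variety_A": [1, 1, 0, 1, 0],
    "variety_B": [1, 0, 0, 1, 1],
    "variety_C": [1, 1, 0, 1, 0],
}
distances = pairwise_distances(profiles)
print(distances[("variety_A", "variety_B")])  # 0.5
print(distances[("variety_A", "variety_C")])  # 0.0
```

A distance matrix of this kind is the usual input to hierarchical clustering, which would then group the varieties.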
BMC Genomics | 2014
Hicham Benzekri; Paula Armesto; Xavier Cousin; Mireia Rovira; Diego Crespo; Manuel Alejandro Merlo; David Mazurais; Rocío Bautista; Darío Guerrero-Fernández; Noe Fernandez-Pozo; Marian Ponce; Carlos Infante; José Zambonino; Sabine Nidelet; Marta Gut; Laureana Rebordinos; Josep V. Planas; Marie-Laure Bégout; M. Gonzalo Claros; Manuel Manchado
Background: Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionarily important flatfish species both in fisheries and aquaculture. Although some genomic resources and tools were recently described in these species, further sequencing efforts are required to establish a complete transcriptome, and to identify new molecular markers. Moreover, the comparative analysis of transcriptomes will be useful to understand flatfish evolution.
Results: A comprehensive characterization of the transcriptome for each species was carried out using a large set of Illumina data (more than 1,800 million reads for each sole species) and 454 reads (more than 5 million reads only in S. senegalensis), providing coverages ranging from 1,384x to 2,543x. After a de novo assembly, 45,063 and 38,402 different transcripts were obtained, comprising 18,738 and 22,683 full-length cDNAs in S. senegalensis and S. solea, respectively. A reference transcriptome with the longest unique transcripts and putative non-redundant new transcripts was established for each species. A subset of 11,953 reference transcripts was qualified as highly reliable orthologs (>97% identity) between both species. A small subset of putative species-specific, lineage-specific and flatfish-specific transcripts was also identified. Furthermore, transcriptome data permitted the identification of single nucleotide polymorphisms and simple-sequence repeats confirmed by FISH to be used in further genetic and expression studies. Moreover, evidence for the retention of crystallins crybb1, crybb1-like and crybb3 in the two species of soles is also presented. Transcriptome information was applied to the design of a microarray tool in S. senegalensis that was successfully tested and validated by qPCR. Finally, transcriptomic data were hosted and structured at SoleaDB.
Conclusions: Transcriptomes and molecular markers identified in this study represent a valuable source for future genomic studies in these economically important species. Orthology analysis provided new clues regarding sole genome evolution indicating a divergent evolution of crystallins in flatfish. The design of a microarray and establishment of a reference transcriptome will be useful for large-scale gene expression studies. Moreover, the integration of transcriptomic data in the SoleaDB will facilitate the management of genomic information in these important species.
BMC Genomics | 2011
Noe Fernandez-Pozo; Javier Canales; Darío Guerrero-Fernández; David P. Villalobos; Sara M. Díaz-Moreno; Rocío Bautista; Arantxa Flores-Monterroso; M. Ángeles Guevara; Pedro Perdiguero; Carmen Collada; M. Teresa Cervera; Álvaro Soto; Ricardo J. Ordás; Francisco R. Cantón; Concepción Ávila; Francisco M. Cánovas; M. Gonzalo Claros
Background: Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases.
Description: EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. Its contents can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.
Conclusions: The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome.
Euphytica | 2003
Rocío Bautista; Remedios Crespillo; Francisco M. Cánovas; M. Gonzalo Claros
There is an urgent need for the development of early identification techniques in olive-trees due to the economic importance of cultivar identification in periods of expansion like now. We have been able to identify 22 olive-tree cultivars using only 10 different, specific, repeatable markers. These markers were designed by the cloning of significant RAPD bands obtained in PCR performed on bulked DNA to retain the genetic variability of each cultivar. Clones were partially or totally sequenced and new primers derived from these sequences were used to obtain Sequence Characterised Amplified Region (SCAR) fragments. We have demonstrated that the use of the 10 SCAR markers is enough to provide a simple, cheap, and reliable procedure to identify 22 geographically related olive-tree cultivars.
DNA Research | 2014
Antonio Muñoz-Mérida; Enrique Viguera; M. Gonzalo Claros; Oswaldo Trelles
Automatic sequence annotation is an essential component of modern ‘omics’ studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes.
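The sensitivity and specificity values quoted above follow the standard definitions, which can be sketched as below. The confusion-matrix counts are invented for illustration and are not results from the Sma3s evaluation; the function name is an assumption.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true annotations recovered.
    specificity = TN / (TN + FP): fraction of wrong annotations correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Invented counts, chosen so that both measures come out at 85%.
sens, spec = sensitivity_specificity(tp=85, fp=30, tn=170, fn=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.85 specificity=0.85
```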
Frontiers in Plant Science | 2015
Rosario Carmona; Adoración Zafra; Pedro Seoane; Antonio Jesús Castro; Darío Guerrero-Fernández; Trinidad Castillo-Castillo; Ana Medina-García; Francisco M. Cánovas; José F. Aldana-Montes; Ismael Navas-Delgado; Juan de Dios Alché; M. Gonzalo Claros
Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. The result is a reproductive transcriptome comprising 72,846 contigs with an average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.
BMC Bioinformatics | 2009
Victoria Martín-Requena; Antonio Muñoz-Mérida; M. Gonzalo Claros; Oswaldo Trelles
Background: Nowadays, microarray gene expression analysis is a widely used technology that scientists handle but whose final interpretation usually requires the participation of a specialist. This is because it requires some background in statistics, which most users lack or understand only vaguely. Moreover, programming skills could also be essential to analyse these data. An interactive, easy-to-use application therefore seems necessary to help researchers extract full information from data and analyse them in a simple, powerful and confident way.
Results: PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two unique implementations of procedures (double scan and Supervised Lowess) and a complete set of graphical representations (MA plot, RG plot, QQ plot, PP plot, PN plot), and can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of differentially expressed genes obtained from the p-values of PreP+07 and of the Bioconductor Limma package were statistically identical when the data set was only normalized; however, a slight variability was observed when the data were both normalized and scaled.
Conclusion: PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been proven, so a laboratory researcher can carry out statistical pre-processing of their microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills. All of this supports scientists who have been using PreP releases since the first version in 2003.
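The MA plot listed among the graphical representations is built from two quantities per spot: the log ratio M and the mean log intensity A of the red and green channels. The sketch below shows those standard definitions; it is not PreP+07 code, and the function name and example intensities are assumptions.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one two-colour microarray spot.

    M = log2(R / G) is the log ratio; A = 0.5 * log2(R * G) is the mean log intensity.
    Both intensities must be positive (background-corrected values may need filtering first).
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


# Invented intensities: the red channel is four times brighter than the green one.
m, a = ma_values(red=1024.0, green=256.0)
print(m, a)  # 2.0 9.0
```

Plotting M against A for every spot reveals intensity-dependent dye bias, which Lowess-type normalization is designed to remove.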
Annals of Forest Science | 2007
Rocío Bautista; David P. Villalobos; Sara M. Díaz-Moreno; Francisco R. Cantón; Francisco M. Cánovas; M. Gonzalo Claros
Conifers are of great economic and ecological importance, but little is known concerning their genomic organization. This study describes an attempt to obtain high-quality, high-molecular-weight DNA from Pinus pinaster cotyledons and to construct a pine BAC library. The preparation incorporates modifications such as low centrifugation speeds, increase of EDTA concentration for plug maintenance, use of DNase inhibitors to reduce DNA degradation, use of polyvinylpyrrolidone and ascorbate to avoid secondary metabolites, and a brief electrophoresis of the plugs prior to their use. A total of 72 192 clones with an average insert size of 107 kb, which represents an equivalent of 11X pine haploid genomes, were obtained. The proportions of clones lacking inserts or containing chloroplast DNA are both approximately 1.6%. The library was screened with cDNA probes for seven genes, and two clones containing Fd-GOGAT sequences were found, one of them seemingly functional. Ongoing projects aimed at constructing a pine bacterial artificial chromosome library may benefit from the methods described here.