Pedro Seoane
University of Málaga
Publication
Featured research published by Pedro Seoane.
Biology | 2012
Manuel G. Claros; Rocío Bautista; Darío Guerrero-Fernández; Hicham Benzerki; Pedro Seoane; Noe Fernandez-Pozo
In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.
Frontiers in Plant Science | 2015
Rosario Carmona; Adoración Zafra; Pedro Seoane; Antonio Jesús Castro; Darío Guerrero-Fernández; Trinidad Castillo-Castillo; Ana Medina-García; Francisco M. Cánovas; José F. Aldana-Montes; Ismael Navas-Delgado; Juan de Dios Alché; M. Gonzalo Claros
Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we present an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues (http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads and 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. The result is a reproductive transcriptome comprising 72,846 contigs with an average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.
PLOS ONE | 2015
Sara Ocaña; Pedro Seoane; Rocío Bautista; Carmen Palomino; Gonzalo M. Claros; Ana Maria Torres; Eva Madrid
Faba bean is an important food crop worldwide. However, progress in faba bean genomics lags far behind that of model systems due to limited availability of genetic and genomic information. Using the Illumina platform, the faba bean transcriptome from leaves of two lines (29H and Vf136) subjected to Ascochyta fabae infection has been characterized. De novo transcriptome assembly provided a total of 39,185 different transcripts that were functionally annotated, and among these, 13,266 were assigned to gene ontology against Arabidopsis. Quality of the assembly was validated by RT-qPCR amplification of selected differentially expressed transcripts. Comparison of faba bean transcripts with those of better-characterized plant genomes such as Arabidopsis thaliana, Medicago truncatula and Cicer arietinum revealed a sequence similarity of 68.3%, 72.8% and 81.27%, respectively. Moreover, 39,060 single nucleotide polymorphisms (SNPs) and 3,669 InDels were identified for genotyping applications. Mapping of the sequence reads generated onto the assembled transcripts showed that 393 and 457 transcripts were overexpressed in the resistant (29H) and susceptible genotype (Vf136), respectively. Transcripts involved in plant-pathogen interactions such as leucine-rich repeat (LRR) proteins or plant growth regulators involved in plant adaptation to abiotic and biotic stresses were found to be differentially expressed in the resistant line. The results reported here represent the most comprehensive transcript database developed so far in faba bean, providing valuable information that could be used to gain insight into the pathways involved in the resistance mechanism against A. fabae and to identify potential resistance genes to be further used in marker-assisted selection.
Biomedical Engineering Online | 2017
Rosario Carmona; Macarena Arroyo; María José Jiménez-Quesada; Pedro Seoane; Adoración Zafra; Rafael Larrosa; Juan de Dios Alché; M. Gonzalo Claros
Background: Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions are available; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. Results: An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in the literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. Conclusion: Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs.
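As an illustration only, the selection criterion described in this abstract (high, invariant expression judged by reads per mapped million and the coefficient of variation) might be sketched as follows. This is not the authors' workflow: the function names, the thresholds `min_mean_rpm` and `max_cv`, and the use of the summed counts as the number of mapped reads are all assumptions made for the example.

```python
from statistics import mean, pstdev


def rpm(counts, mapped_reads):
    """Normalize raw read counts to reads per mapped million (RPM)."""
    return {gene: c * 1_000_000 / mapped_reads for gene, c in counts.items()}


def candidate_reference_genes(samples, min_mean_rpm=100.0, max_cv=0.2):
    """Return (gene, mean RPM, CV) tuples for stable, well-expressed genes.

    samples: list of dicts mapping gene -> mapped read count, one per
    condition. Thresholds are hypothetical defaults, not published values.
    """
    # Assumption: the library size is the sum of the counts in each sample.
    normalized = [rpm(s, sum(s.values())) for s in samples]
    # Only genes observed in every sample can be compared across conditions.
    shared_genes = set.intersection(*(set(s) for s in samples))
    candidates = []
    for gene in shared_genes:
        values = [n[gene] for n in normalized]
        m = mean(values)
        if m < min_mean_rpm:
            continue  # too weakly expressed to be a useful reference
        cv = pstdev(values) / m  # coefficient of variation across samples
        if cv <= max_cv:
            candidates.append((gene, m, cv))
    # Most invariant genes first.
    return sorted(candidates, key=lambda item: item[2])
```

Calling `candidate_reference_genes` with one count dictionary per condition returns the candidates ordered from the lowest to the highest coefficient of variation; loosening `max_cv` admits more variable genes.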
Briefings in Bioinformatics | 2018
Elena Rojano; Pedro Seoane; Juan A. G. Ranea; James R. Perkins
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin–chromatin and chromatin–protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they impact the underlying mechanisms.
International Conference on Bioinformatics and Biomedical Engineering | 2016
Rosario Carmona; Pedro Seoane; Adoración Zafra; María José Jiménez-Quesada; Juan de Dios Alché; M. Gonzalo Claros
Expression analyses such as quantitative and/or real-time PCR require the use of reference genes for normalization in order to obtain reliable assessments. The expression levels of these reference genes must remain constant in all experimental conditions and/or tissues under study. Traditionally, housekeeping genes have been used for this purpose, but most of them have been reported to vary their expression levels under some experimental conditions. Consequently, the selection of the best reference genes should be tested and validated in every experimental scenario. Microarray data are not always available for the search of appropriate reference genes, but NGS experiments are increasingly common. For this reason, an automatic workflow based on mapped NGS reads is presented with the aim of obtaining putative reference genes for a given species in the experimental conditions of interest. The calculation of the coefficient of variation (CV) and a simple, normalized expression value such as RPKM per transcript allows for filtering and selecting those transcripts expressed homogeneously and consistently in all analyzed conditions. This workflow has been tested with Roche/454 reads obtained from olive (Olea europaea L.) pollen and pistil at different developmental stages, as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana. Some of the putative candidate reference genes have been experimentally validated.
International Conference on Bioinformatics and Biomedical Engineering | 2015
David Velasco; Pedro Seoane; M. Gonzalo Claros
The use of RNA-Seq has transformed the way sequencing reads are analyzed, allowing for qualitative and quantitative studies of transcriptomes. These studies always include an important collection (usually > 40%) of unknown transcripts. In this study, we improve the capability of Full-LengtherNext, an algorithm developed in our laboratory to annotate, analyze and correct de novo transcriptomes, to detect potentially coding sequences. Here we analyze five software implementations of coding sequence predictors and show that the use of high-quality sequences at the training stage, proper threshold selection during score interrogation and the adaptation of the algorithm to its input type have a profound effect on the accuracy of the prediction. TransDecoder, the best performing algorithm in our tests, was thus added to the Full-LengtherNext pipeline, significantly improving its coding prediction reliability. Moreover, these analyses served to make inferences about the quality of the sample and to extract the subset of species-specific (perhaps novel) genes discovered in the transcriptome assembly. Indirectly, we also demonstrated that Full-LengtherNext sequence classification is appropriate and worth taking into consideration.
International Conference on Bioinformatics and Biomedical Engineering | 2017
Elena Rojano; Pedro Seoane; Anibal Bueno-Amoros; James R. Perkins; Juan Antonio Garcia-Ranea
Recent advances in sequencing technologies allow researchers to investigate diseases resulting from genomic variation. This allows us to further develop the concept of precision medicine and determine the best treatment for each patient. We have focused on developing tools for studying genomic loci associated with pathological traits from the perspective of network analysis. We obtained patient information from the DECIPHER database, including the genomic regions affected by copy number variations (CNVs) and the patients' pathologies described as Human Phenotype Ontology phenotypes. We used different metrics to calculate association values between phenotypes and affected genomic regions in order to determine which method best fits our data. The results obtained in this work can be used in prediction systems to determine and rank the genomic regions associated with a given phenotype, in order to help clinicians with their diagnosis.
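The abstract does not name the association metrics that were compared. Purely as an illustration of what one phenotype-region association value could look like, the sketch below scores each pair by the Jaccard index of the patient sets that carry the phenotype and the affected region. The choice of Jaccard, the function name, and the input layout are assumptions for the example, not the method used in the paper.

```python
from collections import defaultdict
from itertools import product


def jaccard_associations(patients):
    """Score phenotype-region pairs by shared patients (Jaccard index).

    patients: dict mapping patient id -> (set of phenotype ids,
    set of affected region ids). The layout is hypothetical.
    """
    patients_by_phenotype = defaultdict(set)
    patients_by_region = defaultdict(set)
    for patient_id, (phenotypes, regions) in patients.items():
        for phenotype in phenotypes:
            patients_by_phenotype[phenotype].add(patient_id)
        for region in regions:
            patients_by_region[region].add(patient_id)

    scores = {}
    for phenotype, region in product(patients_by_phenotype, patients_by_region):
        with_phenotype = patients_by_phenotype[phenotype]
        with_region = patients_by_region[region]
        shared = with_phenotype & with_region
        if shared:  # keep only pairs observed together in some patient
            scores[(phenotype, region)] = len(shared) / len(with_phenotype | with_region)
    return scores
```

Sorting the returned scores for a fixed phenotype gives a ranked list of candidate regions; a different metric could be swapped in by replacing the final ratio.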
International Conference on Bioinformatics and Biomedical Engineering | 2017
Marina Espigares; Pedro Seoane; Rocío Bautista; Julia Quintana; Luis A. Hernández Gómez; M. Gonzalo Claros
Gene expression analyses of non-model organisms must start with the construction of a highly accurate de novo transcriptome as a reference. The best way to determine the suitability of any de novo transcriptome assembly is to compare it with other well-known “reference” transcriptomes. In this study, we took six complete plant transcriptomes (Arabidopsis thaliana, Vitis vinifera, Zea mays, Populus trichocarpa, Triticum aestivum and Oryza sativa) and compared them using a set of metrics in a principal component analysis, which indicated that A. thaliana and P. trichocarpa were the best references. This has been automated using AutoFlow. A primary assembly of short reads from the Illumina platform (50 nt, single reads) and long reads from Roche/454 technology from Castanea sativa was performed individually using k-mers from 25 to 35 and different assemblers (Oases v2, SOAPdenovoTrans, RAY, MIRA4 and MINIMUS). The resulting contigs were then reconciled with the aim of obtaining the best transcriptome. Oases and SOAP were used for the assembly of short reads, MIRA and MINIMUS for the assembly of long reads or the reconciliations, and RAY, which can compute de novo transcript assemblies from heterogeneous (long and short reads) next-generation sequencing data, was included to avoid the reconciliation step. A total of 90 different assemblies were generated in a single run of the pipeline. A hierarchical clustering on the PCA components (HCPC) was implemented to automatically identify the best assembly strategies: the one with the shortest distance in HCPC to the two plant reference transcriptomes is selected. In this approach, reconciliation of Roche/454 long reads with Illumina contigs produces more complete and accurate gene reconstructions than other combinations. Surprisingly, reconstructions based only on Illumina and those created with RAY seem to be less accurate.
For this specific study, the most complete and accurate transcriptome corresponds to the Illumina contigs obtained with SOAPdenovoTrans and reassembled with 454 long reads using MIRA4. This is only one example of transcriptome building. Many other assemblies can be performed just by changing parameters, k-mers, sequencing technology, assemblers, reference organisms, etc. The pipeline in AutoFlow is easily customizable for those purposes.
Plant Biotechnology Journal | 2014
Javier Canales; Rocío Bautista; Philippe Label; Josefa Gómez-Maldonado; Isabelle Lesur; Noe Fernandez-Pozo; Marina Rueda-López; Darío Guerrero-Fernández; Vanessa Castro-Rodríguez; Hicham Benzekri; Rafael A. Cañas; M. A. Guevara; Andreia Rodrigues; Pedro Seoane; Caroline Teyssier; Alexandre Morel; François Ehrenmann; Grégoire Le Provost; Céline Lalanne; Céline Noirot; Christophe Klopp; Isabelle Reymond; Angel García-Gutiérrez; Jean-François Trontin; Marie-Anne Lelu-Walter; Célia Miguel; María Teresa Cervera; Francisco R. Cantón; Christophe Plomion; Luc Harvengt