Filipe de Sousa | Researchain

Publication

Featured research published by Filipe de Sousa.

Methods in Ecology and Evolution | 2013

Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data

Johan Bengtsson-Palme; Martin Ryberg; Martin Hartmann; Sara Branco; Zheng Wang; Anna Godhe; Pierre De Wit; Marisol Sánchez-García; Ingo Ebersberger; Filipe de Sousa; Anthony S. Amend; Ari Jumpponen; Martin Unterseher; Erik Kristiansson; Kessy Abarenkov; Yann J. K. Bertrand; Kemal Sanli; K. Martin Eriksson; Unni Vik; Vilmar Veldre; R. Henrik Nilsson

Summary 1. The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. 2. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2 – as well as full-length ITS sequences – from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. 3. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. 4. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.

Microbes and Environments | 2015

A Comprehensive, Automatically Updated Fungal ITS Sequence Dataset for Reference-Based Chimera Control in Environmental Sequencing Efforts

R. Henrik Nilsson; Leho Tedersoo; Martin Ryberg; Erik Kristiansson; Martin Hartmann; Martin Unterseher; Teresita M. Porter; Johan Bengtsson-Palme; Donald M. Walker; Filipe de Sousa; Hannes A. Gamper; Ellen Larsson; Karl-Henrik Larsson; Urmas Kõljalg; Robert C. Edgar; Kessy Abarenkov

The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric—artificially joined—DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.

Fungal Diversity | 2014

Improving ITS sequence data for identification of plant pathogenic fungi

R. Henrik Nilsson; Kevin D. Hyde; Julia Pawłowska; Martin Ryberg; Leho Tedersoo; Anders Bjørnsgard Aas; Siti Aisyah Alias; Artur Alves; Cajsa Lisa Anderson; Alexandre Antonelli; A. Elizabeth Arnold; Barbara Bahnmann; Mohammad Bahram; Johan Bengtsson-Palme; Anna Berlin; Sara Branco; Putarak Chomnunti; Asha J. Dissanayake; Rein Drenkhan; Hanna Friberg; Tobias Guldberg Frøslev; Bettina Halwachs; Martin Hartmann; Béatrice Henricot; Ruvishika S. Jayawardena; Ari Jumpponen; Håvard Kauserud; Sonja Koskela; Tomasz Kulik; Kare Liimatainen

Summary Plant pathogenic fungi are a large and diverse assemblage of eukaryotes with substantial impacts on natural ecosystems and human endeavours. These taxa often have complex and poorly understood life cycles, lack observable, discriminatory morphological characters, and may not be amenable to in vitro culturing. As a result, species identification is frequently difficult. Molecular (DNA sequence) data have emerged as crucial information for the taxonomic identification of plant pathogenic fungi, with the nuclear ribosomal internal transcribed spacer (ITS) region being the most popular marker. However, international nucleotide sequence databases are accumulating numerous sequences of compromised or low-resolution taxonomic annotations and substandard technical quality, making their use in the molecular identification of plant pathogenic fungi problematic. Here we report on a concerted effort to identify high-quality reference sequences for various plant pathogenic fungi and to re-annotate incorrectly or insufficiently annotated public ITS sequences from these fungal lineages. A third objective was to enrich the sequences with geographical and ecological metadata. The results – a total of 31,954 changes – are incorporated in and made available through the UNITE database for molecular identification of fungi (http://unite.ut.ee), including standalone FASTA files of sequence data for local BLAST searches, use in the next-generation sequencing analysis platforms QIIME and mothur, and related applications. The present initiative is just a beginning to cover the wide spectrum of plant pathogenic fungi, and we invite all researchers with pertinent expertise to join the annotation effort.

PLOS ONE | 2014

Phylogenetic properties of 50 nuclear loci in Medicago (Leguminosae) generated using multiplexed sequence capture and next-generation sequencing

Filipe de Sousa; Yann J. K. Bertrand; Stephan Nylinder; Bengt Oxelman; Jonna S. Eriksson; Bernard E. Pfeil

Next-generation sequencing technology has increased the capacity to generate molecular data for plant biological research, including phylogenetics, and can potentially contribute to resolving complex phylogenetic problems. The evolutionary history of Medicago L. (Leguminosae: Trifoliae) remains unresolved due to incongruence between published phylogenies. Identification of the processes causing this genealogical incongruence is essential for the inference of a correct species phylogeny of the genus and requires that more molecular data, preferably from low-copy nuclear (LCN) genes, are obtained across different species. Here we report the development of 50 novel LCN markers in Medicago and assess the phylogenetic properties of each marker. We used the genomic resources available for Medicago truncatula Gaertn., hybridisation-based gene enrichment (sequence capture) techniques and next-generation sequencing to generate sequences. This approach proves to be a cost-effective alternative to amplicon sequencing in phylogenetic studies at the genus or tribe level and allows for an increase in number and size of targeted loci. Substitution rate estimates for each of the 50 loci are provided, and an overview of the variation in substitution rates among a large number of low-copy nuclear genes in plants is presented for the first time. Aligned sequences of major species lineages of Medicago and its sister genus are made available and can be used in further probe development for sequence-capture of the same markers.

Systematic Biology | 2015

Assignment of Homoeologs to Parental Genomes in Allopolyploids for Species Tree Inference, with an Example from Fumaria (Papaveraceae)

Yann J. K. Bertrand; Anne-Cathrine Scheen; Thomas Marcussen; Bernard E. Pfeil; Filipe de Sousa; Bengt Oxelman

There is a rising awareness that species trees are best inferred from multiple loci while taking into account processes affecting individual gene trees, such as substitution model error (failure of the model to account for the complexity of the data) and coalescent stochasticity (presence of incomplete lineage sorting [ILS]). Although most studies have been carried out in the context of dichotomous species trees, these processes also operate in more complex evolutionary histories involving multiple hybridizations and polyploidy. Recently, methods have been developed that accurately handle ILS in allopolyploids, but they are thus far restricted to networks of diploids and tetraploids. We propose a procedure that improves on this limitation by designing a workflow that assigns homoeologs to hypothetical diploid ancestral genomes prior to genome tree construction. Conflicting assignment hypotheses are evaluated against substitution model error and coalescent stochasticity. Incongruence that cannot be explained by stochastic mechanisms needs to be explained by other processes (e.g., homoploid hybridization or paralogy). The data can then be filtered to build multilabeled genome phylogenies using inference methods that can recover species trees, either in the face of substitution model error and coalescent stochasticity alone, or while simultaneously accounting for hybridization. Methods are already available for folding the resulting multilabeled genome phylogeny into a network. We apply the workflow to the reconstruction of the reticulate phylogeny of the plant genus Fumaria (Papaveraceae) with ploidal levels ranging from 2x to 14x. We describe the challenges in recovering nuclear NRPB2 homoeologs in high ploidy species while combining in vivo cloning and direct sequencing techniques.
Using parametric bootstrapping simulations we assign nuclear homoeologs and chloroplast sequences (four concatenated loci) to their common hypothetical diploid ancestral genomes. As these assignments hinge on effective population size assumptions, we investigate how varying these assumptions impacts the recovered multilabeled genome phylogeny.

Plant Systematics and Evolution | 2016

Patterns of phylogenetic incongruence in Medicago found among six loci

Filipe de Sousa; Yann J. K. Bertrand; Bernard E. Pfeil

The species phylogeny of Medicago L. (Leguminosae) remains unresolved, as there is significant incongruence between the published gene phylogenies. Here, we compare six of these gene phylogenies of Medicago, inferred from unlinked loci from the nuclear, chloroplast and mitochondrial genomes. Data from all loci were re-analysed, including gap-coding of initial data sets, and dated phylogenies were produced. The patterns of species relationships observed in the six dated phylogenies are compatible with several different biological processes, such as incomplete lineage sorting and hybridisation. A subset of the original sampling that included 29 taxa was also analysed using coalescent-based tree distance comparisons. The observed topological distances suggest that differences between gene phylogenies cannot be solely attributed to incomplete lineage sorting. Hybridisation is strongly suspected to have occurred in the history of many taxa in the genus, because of overlapping divergence times between suspected hybrids and each parental lineage, confirming earlier results based on only two genes. An attempt to reconcile the conflicting histories in a multispecies coalescent analysis, using multiple labels for taxa with hybrid histories, did not produce satisfactory results and may be fatally limited. We conclude that although the currently available data are not sufficient to clarify relationships in Medicago, many cases of hybridisation are probable. The phylogenetic history of the genus is therefore better understood as a network and not a single tree. This raises concerns over previous studies that have used single gene trees as summaries of the history of species relationships.

Systematic Biology | 2017

Using Genomic Location and Coalescent Simulation to Investigate Gene Tree Discordance in Medicago L.

Filipe de Sousa; Yann J. K. Bertrand; Jeff J. Doyle; Bengt Oxelman; Bernard E. Pfeil

Abstract.— Several well‐documented evolutionary processes are known to cause conflict between species‐level phylogenies and gene‐level phylogenies. Three of the most challenging processes for species tree inference are incomplete lineage sorting, hybridization and gene duplication, which may result in unwarranted comparisons of paralogous genes. Several existing methods have dealt with these processes but none has yet been able to untangle all three at once. Here, we propose a stepwise method by which these processes can be discerned using information on genomic location coupled with coalescent simulations. In the first step, highly discordant genes within genomic blocks (putative paralogs) are identified and excluded from the data set and, in the second step, blocks of linked genes are grouped according to their hybrid history. Existing multispecies coalescent software can then be applied to recover the principal tree(s) that make up the species tree/network without violating the underlying model. The potential of the approach is evaluated on simulated data derived from a species network composed of nine species, of which one is of hybrid origin, and displaying a single‐gene duplication that leads to paralogous comparisons. We apply our method to an empirical set of 12 genes from 7 species sampled in the plant genus Medicago that display phylogenetic discordance. We identify the causes of the discordance and demonstrate that the Medicago orbicularis lineage experienced an episode of ancient hybridization. Our results show promise as a new way to explore phylogenetic sequence data that can significantly improve species tree inference in presence of hybridization and undetected paralogy or other causes leading to extremely discordant gene trees. [Coalescent simulation; gene tree; genomic location; hybridization; incomplete lineage sorting; paralogy; phylogenetic incongruence; principal tree; species tree.]

Molecular Phylogenetics and Evolution | 2017

A cryptic species produced by autopolyploidy and subsequent introgression involving Medicago prostrata (Fabaceae)

Jonna S. Eriksson; J.L. Blanco-Pastor; Filipe de Sousa; Yann J. K. Bertrand; Bernard E. Pfeil

Although hybridisation through genome duplication is well known, hybridisation without genome duplication (homoploid hybrid speciation, HHS) is not. Few well-documented cases have been reported. A possible instance of HHS in Medicago prostrata Jacq. was suggested previously, based on only two genes and one individual. We tested whether this species was formed through HHS by sampling eight nuclear loci and 22 individuals, with additional individuals from related species, using gene capture and Illumina sequencing. Phylogenetic inference and coalescent simulations were performed to infer the causes of gene tree incongruence. We found no evidence that phylogenetic differences among M. prostrata individuals were the result of HHS. Instead, an autopolyploid origin of tetraploids with introgression from tetraploids of the M. sativa complex is likely. We argue that tetraploid M. prostrata individuals constitute a new species, characterised by a partially non-overlapping distribution and distinctive alleles (from the M. sativa complex). No gene flow from tetraploid to diploid M. prostrata is apparent, suggesting partial reproductive isolation. Thus, speciation via autopolyploidy appears to have been reinforced by introgression. This raises the intriguing possibility that introgressed alleles may be responsible for the increased range exploited by tetraploid M. prostrata with respect to that of the diploids.

Kew Bulletin | 2010

A revision of the South American genus Apuleia (Leguminosae, Cassieae)

Filipe de Sousa; Gwilym P. Lewis; Julie A. Hawkins

Summary Apuleia Mart., a genus of the Leguminosae native to South America, is revised. Species limits within the genus were tested using morphometrics and shape analysis of leaflets and fruits. Morphological evidence indicates that although there is great variation in Apuleia, the genus cannot be reliably separated into different species or infraspecific taxa. Apuleia is monospecific, comprising the single species A. leiocarpa (Vogel) J. F. Macbr.

BMC Evolutionary Biology | 2018

Allele phasing is critical to revealing a shared allopolyploid origin of Medicago arborea and M. strasseri (Fabaceae)

Jonna S. Eriksson; Filipe de Sousa; Yann J. K. Bertrand; Alexandre Antonelli; Bengt Oxelman; Bernard E. Pfeil

Background
Whole genome duplication plays a central role in plant evolution. There are two main classes of polyploid formation: autopolyploids, which arise within one species by doubling of similar homologous genomes, and allopolyploids (hybrid polyploids), which arise via hybridization and subsequent doubling of nonhomologous (homoeologous) genomes. The distinction between polyploid origins can be made using gene phylogenies, if alleles from each genome can be correctly retrieved. We examined whether two closely related tetraploid Mediterranean shrubs (Medicago arborea and M. strasseri) have an allopolyploid origin, a question that has remained unsolved despite substantial previous research. We sequenced and analyzed ten low-copy nuclear genes from these and related species, phasing all alleles. To test the efficacy of allele phasing on the ability to recover the evolutionary origin of polyploids, we compared these results to analyses using unphased sequences.

Results
In eight of the gene trees the alleles inferred from the tetraploids formed two clades, in a non-sister relationship. Each of these clades was more closely related to alleles sampled from other species of Medicago, a pattern typical of allopolyploids. However, we also observed that alleles from one of the remaining genes formed two clades that were sister to one another, as is expected for autopolyploids. Trees inferred from unphased sequences were very different, with the tetraploids often placed in poorly supported and different positions compared to results obtained using phased alleles.

Conclusions
The complex phylogenetic history of M. arborea and M. strasseri is explained predominantly by shared allotetraploidy. We also observed that an increase in woodiness is correlated with polyploidy in this group of species and present a new possibility that woodiness could be a transgressive phenotype.
Correctly phased homoeologues are likely to be critical for inferring the hybrid origin of allopolyploid species, when most genes retain more than one homoeologue. Ignoring homoeologous variation by merging the homoeologues can obscure the signal of hybrid polyploid origins and produce inaccurate results.
