
Sandra Smit

Wageningen University and Research Centre


Publication

Featured research published by Sandra Smit.

Bioinformatics | 2016

PanTools: representation, storage and exploration of pan-genomic data.

Siavash Sheikhizadeh; M. Eric Schranz; Mehmet Akdel; Dick de Ridder; Sandra Smit

MOTIVATION Next-generation sequencing technology is generating a wealth of highly similar genome sequences for many species, paving the way for a transition from single-genome to pan-genome analyses. Accordingly, genomics research is going to switch from reference-centric to pan-genomic approaches. We define the pan-genome as a comprehensive representation of multiple annotated genomes, facilitating analyses on the similarity and divergence of the constituent genomes at the nucleotide, gene and genome structure level. Current pan-genomic approaches do not thoroughly address scalability, functionality and usability. RESULTS We introduce a generalized De Bruijn graph as a pan-genome representation, as well as an online algorithm to construct it. This representation is stored in a Neo4j graph database, which makes our approach scalable to large eukaryotic genomes. Besides the construction algorithm, our software package, called PanTools, currently provides functionality for annotating pan-genomes, adding sequences, grouping genes, retrieving gene sequences or genomic regions, reconstructing genomes and comparing and querying pan-genomes. We demonstrate the performance of the tool using datasets of 62 E. coli genomes, 93 yeast genomes and 19 Arabidopsis thaliana genomes. AVAILABILITY AND IMPLEMENTATION The Java implementation of PanTools is publicly available at http://www.bif.wur.nl.
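The idea of a pan-genome graph in which sequence shared between genomes collapses into shared nodes can be illustrated with a minimal sketch. This is not PanTools' algorithm or data model: it builds a plain colored De Bruijn graph in memory (PanTools stores its generalized graph in Neo4j), ignores reverse complements and does not compact unbranched paths, all of which a production tool would need. The function names and toy genomes are hypothetical.

```python
from collections import defaultdict


def build_pangenome_graph(genomes: dict[str, str], k: int):
    """Build a colored De Bruijn graph from several genomes (illustrative sketch).

    Returns two dicts:
      nodes: k-mer -> set of genome names containing that k-mer
      edges: (k-mer, next k-mer) -> set of genome names in which the two
             k-mers occur consecutively (they overlap by k - 1 bases)
    """
    nodes: dict[str, set[str]] = defaultdict(set)
    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        prev = None
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer].add(name)  # record which genome "colors" this node
            if prev is not None:
                edges[(prev, kmer)].add(name)
            prev = kmer
    return dict(nodes), dict(edges)


def core_kmers(nodes: dict[str, set[str]], genome_names) -> set[str]:
    """K-mers present in every genome, i.e. the shared (core) part of the graph."""
    wanted = set(genome_names)
    return {kmer for kmer, owners in nodes.items() if owners >= wanted}


genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA"}  # hypothetical toy sequences
nodes, edges = build_pangenome_graph(genomes, k=3)
print(len(nodes), sorted(core_kmers(nodes, genomes)))  # prints: 8 ['ACG', 'CGA', 'CGT']
```

The shared k-mers are stored once and colored by both genomes, while genome-specific k-mers form branches colored by a single genome; this sharing is what lets a graph scale better than storing every genome as a separate string.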

Plant Journal | 2014

Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

Saulo Alves Aflitos; Elio Schijlen; Hans de Jong; Dick de Ridder; Sandra Smit; Richard Finkers; Jun Wang; Gengyun Zhang; Ning Li; Likai Mao; Freek T. Bakker; Rob Dirks; Timo M. Breit; Barbara Gravendeel; Henk Huits; Darush Struss; Ruth Swanson-Wagner; Hans van Leeuwen; Roeland C. H. J. van Ham; Laia Fito; Laetitia Guignier; Myrna Sevilla; Philippe Ellul; Eric Ganko; Arvind Kapur; Emannuel Reclus; Bernard de Geus; Henri van de Geest; Bas te Lintel Hekkert; Jan C. van Haarst

We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.
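For reference, the textbook form of the dN/dS ratio normalizes observed differences by the number of available sites rather than comparing raw SNP counts. The abstract does not state which counting method or correction was used, so the following is only one common formulation (Nei-Gojobori counting with a Jukes-Cantor correction); the symbols N_d and S_d (observed nonsynonymous and synonymous differences) and N and S (numbers of nonsynonymous and synonymous sites) are introduced here for illustration:

```latex
\begin{aligned}
p_N &= N_d / N, & p_S &= S_d / S,\\
d_N &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), & d_S &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right),\\
\omega &= d_N / d_S .
\end{aligned}
```

Values of ω above 1 are usually read as evidence of positive selection, values below 1 as purifying selection, and values near 1 as neutral evolution.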

G3: Genes, Genomes, Genetics | 2014

Fluorescence in situ hybridization and optical mapping to correct scaffold arrangement in the tomato genome

Lindsay A. Shearer; Lorinda K. Anderson; Hans de Jong; Sandra Smit; Jose Luis Goicoechea; Bruce A. Roe; Axin Hua; James J. Giovannoni; Stephen M. Stack

The order and orientation (arrangement) of all 91 sequenced scaffolds in the 12 pseudomolecules of the recently published tomato (Solanum lycopersicum, 2n = 2x = 24) genome sequence were positioned based on marker order in a high-density linkage map. Here, we report the arrangement of these scaffolds determined by two independent physical methods, bacterial artificial chromosome–fluorescence in situ hybridization (BAC-FISH) and optical mapping. By localizing BACs at the ends of scaffolds to spreads of tomato synaptonemal complexes (pachytene chromosomes), we showed that 45 scaffolds, representing one-third of the tomato genome, were arranged differently than predicted by the linkage map. These scaffolds occur mostly in pericentric heterochromatin where 77% of the tomato genome is located and where linkage mapping is less accurate due to reduced crossing over. Although useful for only part of the genome, optical mapping results were in complete agreement with scaffold arrangement by FISH but often disagreed with scaffold arrangement based on the linkage map. The scaffold arrangement based on FISH and optical mapping changes the positions of hundreds of markers in the linkage map, especially in heterochromatin. These results suggest that similar errors exist in pseudomolecules from other large genomes that have been assembled using only linkage maps to predict scaffold arrangement, and these errors can be corrected using FISH and/or optical mapping. Of note, BAC-FISH also permits estimates of the sizes of gaps between scaffolds, and unanchored BACs are often visualized by FISH in gaps between scaffolds and thus represent starting points for filling these gaps.

Genome Biology and Evolution | 2015

The Genome of Winter Moth (Operophtera brumata) Provides a Genomic Perspective on Sexual Dimorphism and Phenology.

Martijn F. L. Derks; Sandra Smit; Lucia Salis; Elio Schijlen; Alex Bossers; Christa Mateman; Agata S. Pijl; Dick de Ridder; M.A.M. Groenen; Marcel E. Visser; Hendrik-Jan Megens

The winter moth (Operophtera brumata) belongs to one of the most species-rich families in Lepidoptera, the Geometridae (approximately 23,000 species). This family is of great economic importance as most species are herbivorous and capable of defoliating trees. Genome assembly of the winter moth allows the study of genes and gene families, such as the cytochrome P450 gene family, which is known to be vital in plant secondary metabolite detoxification and host-plant selection. It also enables exploration of the genomic basis for female brachyptery (wing reduction), a feature of sexual dimorphism in winter moth, and for seasonal timing, a trait extensively studied in this species. Here we present a reference genome for the winter moth, the first geometrid and largest sequenced Lepidopteran genome to date (638 Mb) including a set of 16,912 predicted protein-coding genes. This allowed us to assess the dynamics of evolution on a genome-wide scale using the P450 gene family. We also identified an expanded gene family potentially linked to female brachyptery, and annotated the genes involved in the circadian clock mechanism as main candidates for involvement in seasonal timing. The genome will contribute to Lepidopteran genomic resources and comparative genomics. In addition, the genome enhances our ability to understand the genetic and molecular basis of insect seasonal timing and thereby provides a reference for future evolutionary and population studies on the winter moth.

Briefings in Bioinformatics | 2016

Computational pan-genomics: status, promises and challenges

Tobias Marschall; Manja Marz; Thomas Abeel; Louis J. Dijkstra; Bas E. Dutilh; Ali Ghaffaari; Paul J. Kersey; Wigard P. Kloosterman; Veli Mäkinen; Adam M. Novak; Benedict Paten; David Porubsky; Eric Rivals; Can Alkan; Jasmijn A. Baaijens; Paul I. W. de Bakker; Valentina Boeva; Raoul J. P. Bonnal; Francesca Chiaromonte; Rayan Chikhi; Francesca D. Ciccarelli; Robin Cijvat; Erwin Datema; Cornelia M. van Duijn; Evan E. Eichler; Corinna Ernst; Eleazar Eskin; Erik Garrison; Mohammed El-Kebir; Gunnar W. Klau

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
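The shift from representing reference genomes as strings to representing them as graphs can be made concrete with a toy sequence graph: nodes hold sequence segments and each genome is a path through them, so a single-nucleotide difference becomes a small "bubble" instead of a second copy of the whole sequence. The node IDs, segments and path names below are hypothetical, and real graph formats such as GFA carry considerably more information.

```python
# Toy sequence graph: node id -> DNA segment (hypothetical example data)
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# Each genome is stored as a walk through the graph rather than as its own string
paths = {"reference": [1, 2, 4], "sample": [1, 3, 4]}


def spell(path: list[int]) -> str:
    """Reconstruct the linear sequence of a genome by concatenating its nodes."""
    return "".join(nodes[node_id] for node_id in path)


def edges_from_paths(paths: dict[str, list[int]]) -> set[tuple[int, int]]:
    """Collect the directed edges implied by consecutive nodes on every path."""
    return {(a, b) for path in paths.values() for a, b in zip(path, path[1:])}


for name, path in paths.items():
    print(name, spell(path))
# reference ACGTATTCA
# sample ACGTGTTCA
print(sorted(edges_from_paths(paths)))  # [(1, 2), (1, 3), (2, 4), (3, 4)]
```

Only the differing base is stored per variant; the shared segments are stored once, however many genomes traverse them.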

Briefings in Bioinformatics | 2015

Making the difference: integrating structural variation detection tools

Ke Lin; Sandra Smit; Guusje Bonnema; Gabino F. Sanchez-Perez; Dick de Ridder

From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, the various callers are increasingly combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
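One integration strategy often used when combining callers (an assumption here, not a recommendation taken from the review) is to merge calls from different tools when they reciprocally overlap, then keep only events supported by at least two callers. The minimal sketch below uses hypothetical caller names, a 50% reciprocal-overlap threshold and simple greedy clustering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str   # name of the tool that produced the call (hypothetical labels below)
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    svtype: str   # e.g. "DEL", "INV"


def reciprocal_overlap(a: SVCall, b: SVCall, min_frac: float = 0.5) -> bool:
    """True if a and b are the same SV type on the same chromosome and the
    overlap covers at least min_frac of each call's length."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def consensus_calls(calls: list[SVCall], min_callers: int = 2,
                    min_frac: float = 0.5) -> list[list[SVCall]]:
    """Greedily cluster overlapping calls, then keep clusters supported by
    at least min_callers distinct callers."""
    clusters: list[list[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        for cluster in clusters:
            if any(reciprocal_overlap(call, other, min_frac) for other in cluster):
                cluster.append(call)
                break
        else:  # no existing cluster matched: start a new one
            clusters.append([call])
    return [cl for cl in clusters if len({c.caller for c in cl}) >= min_callers]


calls = [
    SVCall("split_read_caller", "chr1", 1000, 2000, "DEL"),
    SVCall("paired_end_caller", "chr1", 1100, 2050, "DEL"),
    SVCall("paired_end_caller", "chr1", 5000, 5200, "DEL"),
    SVCall("split_read_caller", "chr2", 1000, 2000, "DEL"),
]
for cluster in consensus_calls(calls):
    print([(c.caller, c.chrom, c.start, c.end) for c in cluster])
```

Raising `min_callers` trades sensitivity for precision. Reciprocal overlap is unsuitable for insertions, which span almost no reference sequence, so pipelines typically match those by breakpoint distance instead.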

Theoretical and Applied Genetics | 2012

Bin mapping of tomato diversity array (DArT) markers to genomic regions of Solanum lycopersicum × Solanum pennellii introgression lines

Antoinette Van Schalkwyk; Peter Wenzl; Sandra Smit; Rosa Lopez-Cobollo; Andrzej Kilian; Gerard J. Bishop; Charles A. Hefer; Dave K. Berger

Marker-trait association studies in tomato have progressed rapidly due to the availability of several populations developed between wild species and domesticated tomato. However, in the absence of whole genome sequences for each wild species, molecular marker methods for whole genome comparisons and fine mapping are required. We describe the development and validation of a diversity arrays technology (DArT) platform for tomato using an introgression line (IL) population consisting of wild Solanum pennellii introgressed into Solanum lycopersicum (cv. M82). A tomato diversity array consisting of 6,912 clones from domesticated tomato and twelve wild tomato/Solanaceous species was constructed. We successfully bin-mapped 990 polymorphic DArT markers together with 108 RFLP markers across the IL population, increasing the number of markers available for each S. pennellii introgression by tenfold on average. A subset of DArT markers from ILs previously associated with increased levels of lycopene and carotene were sequenced, and 44% matched protein coding genes. The bin-map position and order of sequenced DArT markers correlated well with their physical position on scaffolds of the draft tomato genome sequence (SL2.40). The utility of sequenced DArT markers was illustrated by converting several markers in both the S. pennellii and S. lycopersicum phases to cleaved amplified polymorphic sequence (CAPS) markers. Genotype scores from the CAPS markers confirmed the genotype scores from the DArT hybridizations used to construct the bin map. The tomato diversity array provides additional “sequence-characterized” markers for fine mapping of QTLs in S. pennellii ILs and wild tomato species.
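The core idea of bin mapping in an introgression line population can be sketched as grouping markers by the exact set of ILs in which the wild allele is scored: markers sharing a pattern fall in the same bin, the genomic segment defined by overlapping introgressions. This is a simplified illustration, not the authors' pipeline; it ignores missing data, scoring errors and the ordering of bins along each chromosome, and the marker IDs and IL names are hypothetical.

```python
from collections import defaultdict


def bin_markers(scores: dict[str, dict[str, int]]) -> dict[frozenset, list[str]]:
    """Group markers into bins by the set of ILs that carry the wild allele.

    scores maps marker -> {IL name -> 1 if the S. pennellii allele is scored,
    0 if the S. lycopersicum (M82) allele is scored}.
    Markers with an identical pattern fall in the same bin.
    """
    bins: dict[frozenset, list[str]] = defaultdict(list)
    for marker, calls in scores.items():
        pattern = frozenset(il for il, allele in calls.items() if allele == 1)
        bins[pattern].append(marker)
    return dict(bins)


# Hypothetical scores for three markers across two overlapping ILs
scores = {
    "m1": {"IL1-1": 1, "IL1-2": 1},  # in the region shared by both ILs
    "m2": {"IL1-1": 1, "IL1-2": 0},  # only in IL1-1
    "m3": {"IL1-1": 1, "IL1-2": 1},
}
for pattern, markers in bin_markers(scores).items():
    print(sorted(pattern), markers)
```

Markers m1 and m3 land in the same bin because they are polymorphic in exactly the same ILs, which is why adding markers increases resolution only up to the number of distinct overlap patterns in the population.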

BMC Genomics | 2017

Coping with living in the soil: the genome of the parthenogenetic springtail Folsomia candida

Anna Faddeeva-Vakhrusheva; Ken Kraaijeveld; Martijn F. L. Derks; Seyed Yahya Anvar; Valeria Agamennone; Wouter Suring; Andries A. Kampfraath; Jacintha Ellers; Giang Le Ngoc; Cornelis A.M. van Gestel; Janine Mariën; Sandra Smit; Nico M. van Straalen; Dick Roelofs

Background: Folsomia candida is a model in soil biology, belonging to the family of Isotomidae, subclass Collembola. It reproduces parthenogenetically in the presence of Wolbachia and exhibits remarkable physiological adaptations to stress. To better understand these features and adaptations to life in the soil, we studied its genome in the context of its parthenogenetic lifestyle.

Results: We applied Pacific Biosciences sequencing and assembly to generate a reference genome for F. candida of 221.7 Mbp, comprising only 162 scaffolds. The complete genome of its endosymbiont Wolbachia was also assembled and turned out to be the largest strain identified so far. Substantial gene family expansions and lineage-specific gene clusters were linked to stress response. A large number of genes (809) were acquired by horizontal gene transfer (HGT). A substantial fraction of these genes are involved in lignocellulose degradation. The presence of genes involved in antibiotic biosynthesis was also confirmed. Intra-genomic rearrangements of collinear gene clusters were observed, of which 11 were organized as palindromes. The Hox gene cluster of F. candida showed major rearrangements compared to the arthropod consensus cluster, resulting in a disorganized cluster.

Conclusions: The expansion of stress response gene families suggests that stress defense was important to facilitate colonization of soils. The large number of HGT genes related to lignocellulose degradation could be beneficial to unlock carbohydrate sources in soil, especially those contained in decaying plant and fungal organic matter. Intra- as well as inter-scaffold duplications of gene clusters may be a consequence of its parthenogenetic lifestyle. This high-quality genome will be instrumental for evolutionary biologists investigating deep phylogenetic lineages among arthropods and will provide the basis for a more mechanistic understanding in soil ecology and ecotoxicology.

Genome Biology and Evolution | 2016

Gene Family Evolution Reflects Adaptation to Soil Environmental Stressors in the Genome of the Collembolan Orchesella cincta

Anna Faddeeva-Vakhrusheva; Martijn F. L. Derks; Seyed Yahya Anvar; Valeria Agamennone; Wouter Suring; Sandra Smit; Nico M. van Straalen; Dick Roelofs

Collembola (springtails) are detritivorous hexapods that inhabit the soil and its litter layer. The ecology of the springtail Orchesella cincta is extensively studied in the context of adaptation to anthropogenically disturbed areas. Here, we present a draft genome of an O. cincta reference strain with an estimated size of 286.8 Mbp, containing 20,249 genes. In total, 446 gene families are expanded and 1,169 gene families evolved specific to this lineage. Besides these gene families involved in general biological processes, we observe gene clusters participating in xenobiotic biotransformation. Furthermore, we identified 253 cases of horizontal gene transfer (HGT). Although the largest percentage of them originated from bacteria (37.5%), we observe an unusually high percentage (30.4%) of such genes of fungal origin. The majority of foreign genes are involved in carbohydrate metabolism and cellulose degradation. Moreover, some foreign genes (e.g., bacillopeptidases) expanded after HGT. We hypothesize that horizontally transferred genes could be advantageous for food processing in a soil environment that is full of decaying organic material. Finally, we identified several lineage-specific genes, expanded gene families, and horizontally transferred genes, associated with altered gene expression as a consequence of genetic adaptation to metal stress. This suggests that these genome features may be preadaptations on which natural selection can act. In conclusion, this genome study provides a solid foundation for further analysis of evolutionary mechanisms of adaptation to environmental stressors.

Proceedings of the National Academy of Sciences of the United States of America | 2018

Comparative genomics of the nonlegume Parasponia reveals insights into evolution of nitrogen-fixing rhizobium symbioses

R. van Velzen; Rens Holmer; F. Bu; L.J.J. Rutten; A.L. van Zeijl; Weizhong Liu; Luca Santuari; Q. Cao; Trupti Sharma; Defeng Shen; Yuda Purwana Roswanjaya; T. Wardhani; M. Seifi Kalhor; Joelle Jansen; D.J. van den Hoogen; Berivan Güngör; Marijke Hartog; Jan Hontelez; Jan Verver; Wei-Cai Yang; Elio Schijlen; Rimi Repin; Menno Schilthuizen; M.E. Schranz; Renze Heidstra; Kana Miyata; Elena Fedorova; Wouter Kohlen; A.H.J. Bisseling; Sandra Smit

Significance: Fixed nitrogen is essential for plant growth. Some plants, such as legumes, can host nitrogen-fixing bacteria within cells in root organs called nodules. Nodules are considered to have evolved in parallel in different lineages, but the genetic changes underlying this evolution remain unknown. Based on gene expression in the nitrogen-fixing nonlegume Parasponia andersonii and the legume Medicago truncatula, we find that nodules in these different lineages may share a single origin. Comparison of the genomes of Parasponia with those of related nonnodulating plants reveals evidence of parallel loss of genes that, in legumes, are essential for nodulation. Taken together, this raises the possibility that nodulation originated only once and was subsequently lost in many descendant lineages.

Nodules harboring nitrogen-fixing rhizobia are a well-known trait of legumes, but nodules also occur in other plant lineages, with rhizobia or the actinomycete Frankia as microsymbiont. It is generally assumed that nodulation evolved independently multiple times. However, molecular-genetic support for this hypothesis is lacking, as the genetic changes underlying nodule evolution remain elusive. We conducted genetic and comparative genomics studies by using Parasponia species (Cannabaceae), the only nonlegumes that can establish nitrogen-fixing nodules with rhizobium. Intergeneric crosses between Parasponia andersonii and its nonnodulating relative Trema tomentosa demonstrated that nodule organogenesis, but not intracellular infection, is a dominant genetic trait. Comparative transcriptomics of P. andersonii and the legume Medicago truncatula revealed utilization of at least 290 orthologous symbiosis genes in nodules. Among these are key genes that, in legumes, are essential for nodulation, including NODULE INCEPTION (NIN) and RHIZOBIUM-DIRECTED POLAR GROWTH (RPG). Comparative analysis of genomes from three Parasponia species and related nonnodulating plant species shows evidence of parallel loss in nonnodulating species of putative orthologs of NIN, RPG, and NOD FACTOR PERCEPTION. Parallel loss of these symbiosis genes indicates that these nonnodulating lineages lost the potential to nodulate. Taken together, our results challenge the view that nodulation evolved in parallel and raise the possibility that nodulation originated ∼100 Mya in a common ancestor of all nodulating plant species, but was subsequently lost in many descendant lineages. This will have profound implications for translational approaches aimed at engineering nitrogen-fixing nodules in crop plants.
