bioRxiv | 2019

High throughput, high fidelity genotyping and de novo discovery of allelic variants at the self-incompatibility locus in natural populations of Brassicaceae from short read sequencing data


Abstract


Plant self-incompatibility (SI) is a genetic system that prevents selfing and enforces outcrossing. Because of strong balancing selection, the genes encoding SI are predicted to maintain extraordinarily high levels of polymorphism, both in the number of S-alleles that segregate in SI species and in the nucleotide sequence divergence among distinct S-allelic lines. However, these two combined features also make documenting the polymorphism of these genes methodologically challenging. So far, this has largely prevented the comprehensive analysis of complete allelic series in natural populations, and it has also precluded obtaining complete genic sequences for many S-alleles. Here, we present a novel methodological approach based on a computationally optimized comparison of short Illumina sequencing reads from genomic DNA to a database of known nucleotide sequences of the extracellular domain of the S-locus receptor kinase (eSRK). By examining mapping patterns along the reference sequences, we obtain highly reliable predictions of the S-genotypes of individuals collected in natural populations of Arabidopsis halleri. Furthermore, by de novo assembly of the filtered short reads, we obtain full-length eSRK sequences even when the initial sequence in the database was only partial, and we discover new SRK alleles that were not initially present in the database. After including these new alleles in the reference database, we were able to resolve the complete diploid SI genotypes of all individuals. Beyond the specific case of Brassicaceae S-alleles, our approach can be readily applied to other polymorphic loci, provided that reference allelic sequences are available.
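The idea of calling genotypes from mapping patterns along reference sequences can be illustrated with a minimal Python sketch. It assumes reads have already been mapped to each reference eSRK allele and summarized as per-position read depths. The function names, the allele labels, and the thresholds (`min_depth`, `min_breadth`) are hypothetical; this is not the authors' pipeline, and the values are not theirs. The working assumption is that an allele actually carried by an individual is covered along nearly its whole length, whereas reads from a different, divergent allele cross-map only to conserved stretches and leave gaps.

```python
from statistics import mean


def coverage_stats(depths, min_depth=1):
    """Return (breadth, mean depth) for one reference allele.

    breadth = fraction of reference positions with depth >= min_depth.
    """
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), mean(depths)


def call_s_genotype(profiles, min_depth=1, min_breadth=0.95):
    """Call up to two S-alleles (diploid) from per-allele coverage profiles.

    profiles: dict mapping a (hypothetical) allele name to a list of
    read depths, one per position of that allele's reference sequence.
    Alleles whose breadth of coverage reaches min_breadth are kept and
    ranked by mean depth; at most two are returned.
    """
    candidates = []
    for allele, depths in profiles.items():
        breadth, depth = coverage_stats(depths, min_depth)
        if breadth >= min_breadth:
            candidates.append((depth, allele))
    # Highest mean depth first; a diploid carries at most two alleles.
    candidates.sort(reverse=True)
    return [allele for _, allele in candidates[:2]]


profiles = {
    "Ah01": [10] * 100,            # fully covered
    "Ah04": [8] * 95 + [0] * 5,    # 95% covered
    "Ah12": [3] * 40 + [0] * 60,   # patchy: likely cross-mapping only
}
print(call_s_genotype(profiles))
```

If only one allele passes the filter, the individual may be homozygous, or its second allele may simply be absent from the reference database. In the abstract, adding newly discovered alleles to the database is what allowed complete diploid genotypes to be resolved for all individuals.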

DOI 10.1101/752717
Language English
Journal bioRxiv
