bioRxiv | 2021

High-throughput Interpretation of Killer-cell Immunoglobulin-like Receptor Short-read Sequencing Data with PING

 
 
 
 
 
 
 
 
 
 

Abstract


The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes makes short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies. Author summary Killer cell immunoglobulin-like receptors (KIR) serve a critical role in regulating natural killer cell function. They are encoded by highly polymorphic genes within a complex genomic region that has proven difficult to interrogate owing to structural variation and extensive sequence homology. While methods for sequencing KIR genes have matured, there is a lack of bioinformatic support to accurately interpret KIR short-read sequencing data. The extensive structural variation of KIR, both the small-scale nucleotide insertions and deletions and the large-scale gene duplications and deletions, coupled with the extensive sequence similarity among KIR genes presents considerable challenges to bioinformatic analyses. PING addressed these issues through a highly-dynamic alignment workflow, which constructs individualized references that reflect the determined copy number and genotype makeup of a sample. This alignment workflow is enabled by a custom alignment processing pipeline, which scaffolds reads aligned to all reference sequences from the same gene into an overall gene alignment, enabling processing of these alignments as if a single reference sequence was used regardless of the number of sequences or of any insertions or deletions present in the component sequences. Together, these methods provide a novel and robust workflow for the accurate interpretation of KIR short-read sequencing data.

Volume None
Pages None
DOI 10.1101/2021.03.24.436770
Language English
Journal bioRxiv

Full Text