bioRxiv | 2021

Variant Library Annotation Tool (VaLiAnT): an oligonucleotide library design and annotation tool for Saturation Genome Editing and other Deep Mutational Scanning experiments


Abstract


Motivation: Recent advances in CRISPR/Cas9 technology allow the functional analysis of genetic variants at single-nucleotide resolution whilst maintaining genomic context (Findlay et al., 2018). This approach, known as saturation genome editing (SGE), is a distinct type of deep mutational scanning (DMS) that systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries, which are introduced into the genome by homology-directed repair (HDR). The technology is applicable to research fields including disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and the study of host-pathogen interactions. Here we present the Variant Library Annotation Tool (VaLiAnT), which generates saturation mutagenesis oligonucleotide libraries from user-defined genomic coordinates and standardised input files. The software is intentionally versatile: the species, genomic reference sequences and transcriptomic annotations are all specified by the user. Genomic ranges, directionality and frame information are taken into account to allow perturbations at both the nucleotide and amino acid level.

Results: Coordinates for a genomic range, which may include exonic and/or intronic sequence, are provided by the user to retrieve the corresponding oligonucleotide reference sequence. A user-specified range within this sequence is then subjected to systematic nucleotide and/or amino acid saturating mutator functions; each discrete mutation is returned as a separate sequence, building up the final oligonucleotide library. If desired, variant accessions from genetic information repositories, such as ClinVar and gnomAD, that fall within the user-specified ranges are also incorporated into the library.
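As an illustration of the nucleotide-level saturation described above, the following is a minimal sketch and not VaLiAnT's own code or API. The names `Variant` and `snv_saturate` are hypothetical, offsets are 0-based and half-open, and the reference is assumed to be an uppercase string of A, C, G and T. Each single-nucleotide substitution in the target range is returned as a separate full-length sequence, with the flanking sequence left unchanged.

```python
from typing import Iterator, NamedTuple

BASES = "ACGT"


class Variant(NamedTuple):
    """One single-nucleotide substitution (hypothetical record layout)."""
    pos: int    # 0-based offset within the reference sequence
    ref: str    # reference base at that offset
    alt: str    # substituted base
    oligo: str  # full reference sequence carrying the substitution


def snv_saturate(ref_seq: str, start: int, end: int) -> Iterator[Variant]:
    """Yield every single-nucleotide substitution in ref_seq[start:end]."""
    if not 0 <= start <= end <= len(ref_seq):
        raise ValueError("target range must lie within the reference sequence")
    for pos in range(start, end):
        ref = ref_seq[pos]
        for alt in BASES:
            if alt != ref:
                oligo = ref_seq[:pos] + alt + ref_seq[pos + 1:]
                yield Variant(pos, ref, alt, oligo)


# Toy reference: mutate only the three bases at offsets 6 to 8 ("ATG").
library = list(snv_saturate("GGATCCATGAAACCC", 6, 9))
```

Three target positions with three alternative bases each give nine sequences. Amino acid level mutators would additionally need the frame and strand information mentioned in the abstract, which this sketch omits.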
For SGE library generation, base reference sequences can be modified to include PAM (protospacer adjacent motif) and protospacer ‘protection edits’ that prevent Cas9 from cutting incorporated oligonucleotide tracts. Mutator functions then modify this protected reference sequence to generate variant sequences. Constant regions are excluded from editing so that specific adapters can anneal for downstream cloning and amplification from the library pool. A metadata file is generated that records annotation information for each variant sequence to aid computational analysis. A library file is also generated, containing unique sequences (exact duplicates are removed) ready for submission to commercial synthesis platforms, along with a VCF file listing all variants for analysis and quality control. VaLiAnT provides a novel means to systematically retrieve, mutate and annotate genomic sequences for oligonucleotide library generation. Specific features for SGE library generation can be employed, and other applications are possible.

Availability and Implementation: VaLiAnT is a command-line tool written in Python. Source code, testing data, example library input and output files, and executables are available at https://github.com/cancerit/VaLiAnT. A user manual with step-by-step instructions is available at https://github.com/cancerit/VaLiAnT/wiki. The software is freely available for non-commercial use (see the licence for details: https://github.com/cancerit/VaLiAnT/blob/develop/LICENSE).
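The SGE-specific steps described in the Results (protection edits applied to the reference before mutation, constant flanking regions, and removal of exact duplicates) can be sketched as follows. This is a toy illustration under stated assumptions, not VaLiAnT's implementation: the function names, the adapter sequences and the choice of protection edit are all hypothetical, and in a real design the user chooses the edits.

```python
from typing import Dict, Iterable, List


def apply_protection_edits(ref_seq: str, edits: Dict[int, str]) -> str:
    """Return ref_seq with substitutions applied (0-based offset -> new base)."""
    seq = list(ref_seq)
    for pos, base in edits.items():
        if seq[pos] == base:
            raise ValueError(f"edit at offset {pos} does not change the reference")
        seq[pos] = base
    return "".join(seq)


def unique_sequences(seqs: Iterable[str]) -> List[str]:
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(seqs))


# Toy reference with an NGG PAM ("TGG") at offsets 4 to 6.
reference = "ACGTTGGAC"
# Hypothetical protection edit: G -> C at offset 5 disrupts the PAM.
protected = apply_protection_edits(reference, {5: "C"})

# Naive mutator over offsets 0 to 2 that also emits the unchanged base,
# so the protected reference appears three times before deduplication.
variants = [
    protected[:i] + base + protected[i + 1:]
    for i in range(3)
    for base in "ACGT"
]

# Hypothetical constant adapters; these regions are never mutated.
FIVE_PRIME, THREE_PRIME = "AAAA", "TTTT"
library = unique_sequences(FIVE_PRIME + v + THREE_PRIME for v in variants)
```

Twelve sequences are generated, three of which are identical to the protected reference, so ten unique oligonucleotides remain. Mutating the protected sequence rather than the original reference means every oligonucleotide in the library carries the protection edit.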

DOI 10.1101/2021.01.19.427318
Language English
Journal bioRxiv

Full Text