bioRxiv | 2019

Structure of the space of taboo-free sequences

 
 

Abstract


Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition sequence can be lethal. Motivated by this observation, we study the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. We refer to the set of prohibited substrings as the taboo-set and to any allowed string as a taboo-free string. We consider the taboo-free Hamming graph, whose vertices are the taboo-free strings of length n and whose edges connect two taboo-free strings if their Hamming distance equals 1. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids deleterious taboos. We describe the construction of the vertex set of this graph. Then we state conditions under which the graph and its suffix subgraphs are connected. Moreover, we provide a simple algorithm that can determine, for an arbitrary taboo-set, whether all these graphs are connected. We conclude that bacterial taboo-free Hamming graphs are nearly always connected, although 4 properly chosen taboos are enough to disconnect one of the suffix subgraphs.
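The graph described in the abstract can be illustrated with a small brute-force sketch. It is not the algorithm announced in the abstract: it simply enumerates every string of length n, discards those containing a taboo, and counts connected components by breadth-first search over Hamming-distance-1 neighbours. The function names (`taboo_free_strings`, `hamming_neighbors`, `count_components`) and the toy taboo-sets are hypothetical, chosen only for illustration, and the enumeration is exponential in n, so it is only practical for short strings.

```python
from collections import deque
from itertools import product


def taboo_free_strings(alphabet, taboos, n):
    """Return all length-n strings over `alphabet` that contain no taboo as a substring."""
    result = []
    for letters in product(alphabet, repeat=n):
        s = "".join(letters)
        if not any(t in s for t in taboos):
            result.append(s)
    return result


def hamming_neighbors(s, alphabet, vertices):
    """Yield the taboo-free strings at Hamming distance exactly 1 from s."""
    for i, current in enumerate(s):
        for letter in alphabet:
            if letter != current:
                candidate = s[:i] + letter + s[i + 1:]
                if candidate in vertices:
                    yield candidate


def count_components(alphabet, taboos, n):
    """Count connected components of the taboo-free Hamming graph (brute force)."""
    vertices = set(taboo_free_strings(alphabet, taboos, n))
    seen = set()
    components = 0
    for start in sorted(vertices):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            s = queue.popleft()
            for t in hamming_neighbors(s, alphabet, vertices):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return components


if __name__ == "__main__":
    # Toy example over a binary alphabet: forbidding "01" and "10" leaves only
    # the constant strings, which are at Hamming distance n from each other.
    print(count_components("01", ["01", "10"], 3))
    # Toy example over the DNA alphabet with a single hypothetical taboo.
    print(count_components("ACGT", ["AA"], 3))
```

The graph is connected exactly when `count_components` returns 1. In the binary toy example the two surviving strings, `000` and `111`, cannot reach each other by single substitutions, so the graph is disconnected.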

DOI 10.1101/824847
Language English
Journal bioRxiv

Full Text