On Critical Relative Distance of DNA Codes for Additive Stem Similarity

Abstract

We consider DNA codes based on the nearest-neighbor (stem) similarity model, which adequately reflects the "hybridization potential" of two DNA sequences. Our aim is to present a survey of bounds on the rate of DNA codes with respect to a thermodynamically motivated similarity measure called an additive stem similarity. These results yield a method to analyze and compare known samples of the nearest-neighbor "thermodynamic weights" associated with stacked pairs that occur in DNA secondary structures.


arXiv preprint [cs.IT], Jan.

A. D’yachkov, A. Voronina

Department of Probability Theory, Faculty of Mechanics and Mathematics, Moscow State University, Moscow, 119992, Russia

A. Macula, T. Renz

Air Force Res. Lab., IFTC, Rome Research Site, Rome NY 13441, USA

V. Rykov

Department of Mathematics, University of Nebraska at Omaha, 6001 Dodge St., Omaha, NE 68182-0243, USA


I. INTRODUCTION

Single strands of DNA are represented by oriented sequences with elements from the alphabet $\mathcal{A} \triangleq \{A, C, G, T\}$. The reverse complement (Watson-Crick transformation) of a DNA strand is defined by first reversing the order of the letters and then replacing each letter $x$ by its complement $\bar{x}$, namely $A$ by $T$, $C$ by $G$, and vice versa. For example, the reverse complement of $AACG$ is $CGTT$. For a strand $\mathbf{x} = (x_1 x_2 \dots x_{n-1} x_n) \in \mathcal{A}^n = \{A, C, G, T\}^n$, let

\[ \tilde{\mathbf{x}} = (\bar{x}_n \bar{x}_{n-1} \dots \bar{x}_2 \bar{x}_1) \in \mathcal{A}^n = \{A, C, G, T\}^n \tag{1} \]

denote its reverse complement. If $\mathbf{y} = \tilde{\mathbf{x}}$, then $\mathbf{x} = \tilde{\mathbf{y}}$ for any $\mathbf{x} \in \mathcal{A}^n$. If $\mathbf{x} = \tilde{\mathbf{x}}$, then $\mathbf{x}$ is called a self reverse complementary sequence. If $\mathbf{x} \neq \tilde{\mathbf{x}}$, then the pair $(\mathbf{x}, \tilde{\mathbf{x}})$ is called a pair of mutually reverse complementary sequences.

A (perfect) Watson-Crick duplex is the joining of oppositely directed $\mathbf{x}$ and $\tilde{\mathbf{x}}$ so that every letter of one strand is paired with its complementary letter on the other strand in the double helix structure, i.e., $\mathbf{x}$ and $\tilde{\mathbf{x}}$ are "perfectly compatible." However, when two oppositely directed DNA strands that are not necessarily complementary are "sufficiently compatible," they too are capable of coalescing into a double-stranded DNA duplex. The process of forming DNA duplexes from single strands is referred to as DNA hybridization. Crosshybridization occurs when two oppositely directed and non-complementary DNA strands form a duplex.

In general, crosshybridization is undesirable, as it usually leads to experimental error. To increase the accuracy and throughput of the applications listed in [1]-[5], one wants collections of DNA strands that are as large and as mutually incompatible as possible, so that no crosshybridization can take place. It is straightforward to view this problem as one of coding theory [6].

DNA nanotechnology often requires collections of DNA strands, called free energy gap codes [7], that correctly "self-assemble" into Watson-Crick duplexes and do not produce erroneous crosshybridizations.
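The reverse-complement map (1) and the notion of a self reverse complementary sequence can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this illustration; strands are assumed to be plain strings over the letters A, C, G, T.

```python
# Watson-Crick complement of each letter of the alphabet {A, C, G, T}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Reverse the strand and complement every letter, as in (1)."""
    return "".join(COMPLEMENT[letter] for letter in reversed(strand))


def is_self_reverse_complementary(strand: str) -> bool:
    """True when the strand coincides with its own reverse complement."""
    return strand == reverse_complement(strand)


print(reverse_complement("AACG"))             # CGTT, the example from the text
print(is_self_reverse_complementary("ACGT"))  # True
print(is_self_reverse_complementary("AACG"))  # False
```

Applying `reverse_complement` twice returns the original strand, which mirrors the statement that $\mathbf{y} = \tilde{\mathbf{x}}$ implies $\mathbf{x} = \tilde{\mathbf{y}}$.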
When a free energy gap code consists entirely of pairs of mutually reverse complementary DNA strands, it is called a DNA tag-antitag system [4] or a DNA code [7]-[13].

The best biological model known to date that is commonly used to estimate hybridization energy is the "nearest-neighbor" similarity model introduced in [1]. Roughly, it implies that the hybridization energy of any two DNA strands should be calculated as the sum of the thermodynamic weights of all stems formed in the process of hybridization. A stem is defined as a pair of consecutive DNA letters of either strand that has coalesced with a pair of consecutive DNA letters of the other strand. This biological model leads to a special similarity function on the space $\mathcal{A}^n$.

The first constructions of DNA codes known to the authors were suggested in [9]-[10]. They were based on conventional Hamming distance codes. Some methods of combinatorial coding theory have been developed [14]-[15] as a means by which such DNA codes can be found. From the very beginning it was understood that the hybridization energy of DNA strands should be modeled by a similarity function for sequences from $\mathcal{A}^n$. However, Hamming similarity does not adequately capture the idea of the "nearest-neighbor" similarity model. It is therefore no surprise that further research focused primarily on the search for an appropriate similarity function.

One example of such a function was proposed in [16], where it was calculated as the sum of the weights of all elements constituting the longest common Hamming subsequence. Later attempts included deletion similarity [8], which was introduced earlier by Levenshtein [17], and block similarity [12]-[13]. Both functions are non-additive, which allows for the consideration of such cases as shifts of DNA sequences along each other. Nevertheless, none of them captures the essence of the "nearest-neighbor" similarity model.

In 2008 we published our first work [18] devoted to the study of stem similarity functions. There we considered the simplest case, in which the similarity between two sequences from $\mathcal{A}^n$ is equal to the number of stems in the longest common Hamming subsequence of these two sequences. A common stem is understood as a block of length 2 that contains two adjacent elements of both initial sequences. In [19], we introduced the concept of an additive stem $w$-similarity for an arbitrary weight function $w = w(a,b) > 0$ defined for all 16 elements $(ab) \in \mathcal{A}^2$, called stems. To calculate the additive stem $w$-similarity between two DNA sequences, one adds up the weights of all stems in the longest common Hamming subsequence between them (see Definition 1 below). Finally, our recent works [20]-[21] deal with the non-additive stem $w$-similarity function previously introduced in [7]. That model also counts the weights of all stems formed between two DNA sequences, the only difference being that these stems are contained not in a common Hamming subsequence but in a common subsequence in the sense of the Levenshtein insertion-deletion metric. For a more detailed discussion of the applicability of the proposed constructions to modeling DNA hybridization assays, see [7].

In this report we summarize the main results of [19] on the asymptotic behavior of the maximal size of DNA codes for the additive stem $w$-similarity function. We show how these results lead to a possible criterion, called the critical relative $w$-distance of DNA codes, for distinguishing between weight samples $w(a,b)$ found in different experiments. We also explain how our analysis suggests algorithms for composing DNA ensembles of optimal size for a given length of DNA strands.

II. ADDITIVE STEM $w$-SIMILARITY MODEL

A. Notations and Definitions

The symbol $\triangleq$ denotes definitional equalities, and the symbol $[n] \triangleq \{1, 2, \dots, n\}$ denotes the set of integers from 1 to $n$. Let $w = w(a,b) > 0$, $a, b \in \mathcal{A}$, be a weight function such that

\[ w(a,b) = w(\bar{b}, \bar{a}), \quad a, b \in \mathcal{A}. \tag{2} \]

Condition (2) means that $w(a,b)$ is invariant under the Watson-Crick transformation.

Definition 1: [7], [19]. For $\mathbf{x}, \mathbf{y} \in \mathcal{A}^n$, the number

\[ S_w(\mathbf{x}, \mathbf{y}) \triangleq \sum_{i=1}^{n-1} s_i^w(\mathbf{x}, \mathbf{y}), \quad \text{where} \quad s_i^w(\mathbf{x}, \mathbf{y}) \triangleq \begin{cases} w(a,b) & \text{if } x_i = y_i = a,\ x_{i+1} = y_{i+1} = b, \\ 0 & \text{otherwise}, \end{cases} \tag{3} \]

is called an additive stem $w$-similarity between $\mathbf{x}$ and $\mathbf{y}$.

The function $S_w(\mathbf{x}, \tilde{\mathbf{y}})$ is used to model a thermodynamic similarity (hybridization energy) between DNA sequences $\mathbf{x}$ and $\mathbf{y}$. By (1) and (2), the function $S_w$ satisfies

\[ S_w(\mathbf{x}, \mathbf{y}) = S_w(\mathbf{y}, \mathbf{x}) \le S_w(\mathbf{x}, \mathbf{x}), \quad \mathbf{x}, \mathbf{y} \in \mathcal{A}^n, \tag{4} \]

and, in addition,

\[ S_w(\mathbf{x}, \tilde{\mathbf{y}}) = S_w(\mathbf{y}, \tilde{\mathbf{x}}), \quad \mathbf{x}, \mathbf{y} \in \mathcal{A}^n. \tag{5} \]

Identity (5) implies the symmetry property of hybridization energy between DNA sequences $\mathbf{x}$ and $\mathbf{y}$ [7]-[13].

Example 1: In [18] we considered constant weights $w = w(a,b) \equiv 1$, $a, b \in \mathcal{A}$, for which the additive stem 1-similarity $S_1(\mathbf{x}, \mathbf{y})$, $0 \le S_1(\mathbf{x}, \mathbf{y}) \le S_1(\mathbf{x}, \mathbf{x}) = n - 1$, is the above-mentioned number of stems in the longest common Hamming subsequence between $\mathbf{x}$ and $\mathbf{y}$.

Example 2: Table 1 shows a biologically motivated collection of weights $w(a,b) \triangleq U(a,b)$ called unified weights [2].

[Table 1: Unified weights $U(a,b)$, 1998; rows $a = A, C, G, T$, columns $b = A, C, G, T$; entries not preserved.]

The given values $U(a,b)$ are based on weight samples from [2] and [5] and are the nearest-neighbor "thermodynamic weights" (e.g., free energy of formation) associated with stacked pairs that occur in DNA secondary structures. See [3] for an introduction to the nearest-neighbor model.

Taking into account inequality (4), we give

Definition 2: [7], [19]. The number

\[ D_w(\mathbf{x}, \mathbf{y}) \triangleq S_w(\mathbf{x}, \mathbf{x}) - S_w(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n-1} \eta_i^w(\mathbf{x}, \mathbf{y}), \quad \eta_i^w(\mathbf{x}, \mathbf{y}) \triangleq s_i^w(\mathbf{x}, \mathbf{x}) - s_i^w(\mathbf{x}, \mathbf{y}) \ge 0, \tag{6} \]

is called an additive stem $w$-distance between $\mathbf{x}, \mathbf{y} \in \mathcal{A}^n$.

Let $\mathbf{x}(j) \triangleq (x_1(j) x_2(j) \dots x_n(j)) \in \mathcal{A}^n$, $j \in [N]$, be the codewords of a $q$-ary code $X = \{\mathbf{x}(1), \mathbf{x}(2), \dots, \mathbf{x}(N)\}$ of length $n$ and size $N$, where $N = 2, 4, \dots$ is an even number. Let $D$, $0 < D \le \max_{\mathbf{x} \in \mathcal{A}^n} S_w(\mathbf{x}, \mathbf{x})$, be an arbitrary positive number.

Definition 3: [7], [19]. A code $X$ is called a DNA code of distance $D$ for the additive stem $w$-similarity (3) (or an $(n, D)_w$-code) if the following two conditions are fulfilled.

(i) For any integer $j \in [N]$, there exists $j' \in [N]$, $j' \neq j$, such that $\mathbf{x}(j') = \widetilde{\mathbf{x}(j)} \neq \mathbf{x}(j)$. In other words, $X$ is a collection of $N/2$ pairs of mutually reverse complementary sequences.

(ii) The minimal $w$-distance of the code $X$ satisfies

\[ D_w(X) \triangleq \min_{j \neq j'} D_w(\mathbf{x}(j), \mathbf{x}(j')) \ge D. \tag{7} \]

Let $N_w(n, D)$ be the maximal size of DNA $(n, D)_w$-codes for the distance (6). If $d > 0$ is a fixed number, then

\[ R_w(d) \triangleq \lim_{n \to \infty} \frac{\log N_w(n, nd)}{n}, \quad d > 0, \tag{8} \]

is called the rate of DNA $(n, nd)_w$-codes for the relative distance $d > 0$.

B. Construction

Theorem 1: If $n = 2t + 1$, $t = 1, 2, \dots$, then $N_1(n, n - 1) = 16$.

Proof: Codewords of an $(n, n-1)_1$-code must not contain any common stems with each other. In particular, the first stems $(x_1(u) x_2(u))$, $u \in [N]$, of the codewords are pairwise distinct. Note that $|\mathcal{A}^2| = 16$, and hence for any $(n, n-1)_1$-code $X = \{\mathbf{x}(1), \dots, \mathbf{x}(N)\}$,

\[ N = \left| \{ (x_1(u) x_2(u)),\ u \in [N] \} \right| \le |\mathcal{A}^2| = 16. \]

Thus, $N_1(n, n-1) \le 16$. Obviously, for odd $n$, the set $\mathcal{A}^n$ does not contain self reverse complementary words. For a stem $\mathbf{a} = (a_1 a_2) \in \mathcal{A}^2$, define

\[ \mathbf{x}(\mathbf{a}) \triangleq (a_1 a_2 a_1 a_2 \dots a_2 a_1 a_2 a_1) \in \mathcal{A}^n. \]

The code $X_r = \{\mathbf{x}(\mathbf{a}),\ \mathbf{a} \in \mathcal{A}^2\}$, $|X_r| = 4^2 = 16$, constitutes a DNA $(n, n-1)_1$-code of size 16 for the additive stem 1-similarity. Theorem 1 is proved.

Example 3: For instance, if $n = 5$, $D = n - 1$, then the pairs of mutually reverse complementary codewords of the code $X_r$ are:

(AAAAA, TTTTT), (ACACA, TGTGT), (CCCCC, GGGGG), (CACAC, GTGTG), (AGAGA, TCTCT), (ATATA, TATAT), (CGCGC, GCGCG), (CTCTC, GAGAG).

Remark 1: Note that for any weight function $w$, the additive stem $w$-similarity $S_w(\mathbf{x}(\mathbf{a}), \mathbf{x}(\mathbf{b})) = 0$ for $\mathbf{a}, \mathbf{b} \in \mathcal{A}^2$, $\mathbf{a} \neq \mathbf{b}$. Hence, the minimal $w$-distance (7) of the code $X_r$ is

\[ D_w(X_r) = \min_j S_w(\mathbf{x}(j), \mathbf{x}(j)) \ge 2t \cdot \underline{w}, \quad \text{where} \quad \underline{w} \triangleq \min_{a, b \in \mathcal{A}} w(a,b). \]

Thus, for any weight function $w$, the code $X_r$ is also an $(n, (n-1) \cdot \underline{w})_w$-code. For example, for the additive stem $U$-similarity of Example 2, the number $D_U(X_r) = 2t$. Therefore, the code $X_r$ is an $(n, n-1)_U$-code.

C. Bounds on Rate $R_w(d)$

Let $p \triangleq \{p(a,b),\ a, b \in \mathcal{A}\}$ be an arbitrary joint probability distribution on the set of stems $(ab) \in \mathcal{A}^2$, i.e.,

\[ \sum_{a, b \in \mathcal{A}} p(a,b) = 1, \quad p(a,b) \ge 0 \ \text{for any } a, b \in \mathcal{A}. \]

To describe bounds on the rate $R_w(d)$, we consider joint probability distributions $p$ whose marginal probabilities coincide, i.e., for any $a \in \mathcal{A}$,

\[ p(a) \triangleq \sum_{b \in \mathcal{A}} p(a,b) = \sum_{b \in \mathcal{A}} p(b,a), \quad p(a) > 0, \tag{9} \]

and for which, in addition, the function $p(a,b)$, like the weight function (2), is invariant under the Watson-Crick transformation, i.e.,

\[ p(a,b) = p(\bar{b}, \bar{a}) \ \text{for any } a, b \in \mathcal{A}. \tag{10} \]

Let

\[ p(b \mid a) \triangleq \frac{p(a,b)}{p(a)}, \qquad \overleftarrow{p}(b \mid a) \triangleq \frac{p(b,a)}{p(a)} \]

denote the corresponding conditional probabilities. It is easy to check that for distributions $p$ with properties (9)-(10) and for the corresponding conditional probabilities, the following equalities hold for any $a, b \in \mathcal{A}$:

\[ p(a) = p(\bar{a}), \qquad p(b \mid a) = \overleftarrow{p}(\bar{b} \mid \bar{a}). \tag{11} \]

For a fixed weight function (2), introduce the values

\[ T_w \triangleq \max_{(9)} T_w(p), \qquad T_w(p) \triangleq \sum_{a, b \in \mathcal{A}} \left( p(a,b) - p^2(a,b) \right) w(a,b), \tag{12} \]

where the maximum is taken over all distributions $p$ for which condition (9) holds. Note that if the weight function is invariant under the Watson-Crick transformation, then the maximizing distribution of (12) satisfies conditions (10)-(11).

Applying an analog of the conventional Plotkin bound [6], one can prove

Theorem 2: [19] If $d \ge T_w$, then $R_w(d) = 0$.

Let $\mathbf{x} = (x_1 x_2 \dots x_n) \in \mathcal{A}^n$ be the stationary Markov chain with initial distribution $p(a)$, $a \in \mathcal{A}$, and transition matrix $P = \| p(b \mid a) \|$, $a, b \in \mathcal{A}$, i.e.,

\[ \Pr\{x_i = a\} \triangleq p(a), \qquad \Pr\{x_{i+1} = b \mid x_i = a\} \triangleq p(b \mid a) \tag{13} \]

for any $a, b \in \mathcal{A}$ and $i \in [n-1]$.

Let a distribution $p$ satisfy (9), and let the following Markov condition $M$ also be fulfilled: the transition matrix $P$ must define a Markov chain $\mathbf{x} = (x_1 x_2 \dots x_n)$ such that for any pair of states $a, b \in \mathcal{A}$ there exists an integer $m \in [4]$ for which the conditional probability $\Pr\{x_{m+1} = b \mid x_1 = a\} > 0$.

Theorem 3: [19] For any probability distribution $p$ satisfying condition (9) and the Markov condition $M$, and any relative distance $d$, $0 < d < T_w(p)$, the rate $R_w(d) > 0$.

Theorem 3 is established using the ensemble of random codes in which independent codewords $\mathbf{x} = (x_1 x_2 \dots x_n)$ are identically distributed in accordance with the Markov chain (13) and, by (11), the corresponding reverse complement codewords $\tilde{\mathbf{x}} = (\bar{x}_n \bar{x}_{n-1} \dots \bar{x}_2 \bar{x}_1)$ have the same distribution (13) as well. In addition, the proof of Theorem 3 is based on the Perron-Frobenius theorem (see [22], Theorem 3.1.1).

Let $T_w(p)$ be defined by (12) and

\[ T_w^M \triangleq \max_{(9),\, M} T_w(p). \tag{14} \]

If $T_w = T_w^M$, then the corresponding weight function $w = w(a,b)$ is called regular, and non-regular otherwise. If a weight function $w = w(a,b)$ is regular, then $T_w$ is called the critical relative distance of $(n, dn)_w$-codes.

From Theorems 2 and 3 it follows

Corollary 1: [19] If a weight function $w = w(a,b)$ is regular, then the maximal size of $(n, nd)_w$-codes increases exponentially with increasing $n$ if and only if $0 < d < T_w$.

Remark 2: Theorem 3 suggests that the construction of optimal random DNA codes for the additive stem $w$-similarity should be based on the generation of independent Markov chains with transition matrix $P$ and initial distribution $p(a)$ such that the corresponding distribution $p$ attains the maximum in (14).

III. WEIGHT SAMPLE ANALYSIS BASED ON CRITERION OF CRITICAL RELATIVE DISTANCE

In this section, we discuss samples of the weight function (or, briefly, weight samples) $w = w(a,b)$, $a, b \in \mathcal{A}$, taken from SantaLucia (1998) (see Table 1 in [2]). In Tables 2-8, we present the weights $w(A,A) = w(T,T)$ and samples of relative weights $\tilde{w}(a,b)$ with respect to $w(A,A)$, i.e., for any $a, b \in \mathcal{A}$,

\[ \tilde{w} = \tilde{w}(a,b) \triangleq \frac{w(a,b)}{w(A,A)}, \qquad \tilde{w}(a,b) = \tilde{w}(\bar{b}, \bar{a}). \tag{15} \]

The dimensionless numbers $\tilde{w}(a,b)$ are convenient for mutual comparison and for comparison with the unified weights of Table 1.

Each of Tables 2-8 lists $w(A,A)$ and the $4 \times 4$ array of relative weights $\tilde{w}(a,b)$, with rows $a = A, C, G, T$ and columns $b = A, C, G, T$ (entries not preserved):

Table 2: Gotoh, 1981. $w(A,A) = 0.\ldots$

Table 3: Vologodskii, 1984. $w(A,A) = 0.\ldots$

Table 4: Blake, 1991. $w(A,A) = 0.\ldots$

Table 5: Benight, 1992. $w(A,A) = 0.\ldots$

Table 6: SantaLucia, 1996. $w(A,A) = 1.\ldots$

Table 7: Sugimoto, 1996. $w(A,A) = 1.\ldots$

Table 8: Breslauer, 1986. $w(A,A) = 1.\ldots$
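As a computational aside, the normalization (15) and the functional $T_w(p)$ of (12), which is maximized in the analysis that follows, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes that the summand of (12) is $(p(a,b) - p^2(a,b))\,w(a,b)$, it uses the constant weights $w \equiv 1$ of Example 1 with the uniform distribution rather than any table values, and all function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
STEMS = [a + b for a, b in product(ALPHABET, repeat=2)]  # the 16 stems (ab)


def relative_weights(w):
    """Normalize a weight sample by w(A, A), as in (15)."""
    return {stem: w[stem] / w["AA"] for stem in STEMS}


def is_watson_crick_invariant(f, tol=1e-12):
    """Check f(a, b) = f(complement(b), complement(a)), cf. (2) and (10)."""
    return all(
        abs(f[a + b] - f[COMPLEMENT[b] + COMPLEMENT[a]]) <= tol
        for a, b in product(ALPHABET, repeat=2)
    )


def has_equal_marginals(p, tol=1e-12):
    """Check condition (9): both marginals of p coincide and are positive."""
    for a in ALPHABET:
        first = sum(p[a + b] for b in ALPHABET)
        second = sum(p[b + a] for b in ALPHABET)
        if abs(first - second) > tol or first <= 0:
            return False
    return True


def t_w(p, w):
    """Assumed form of T_w(p): sum over stems of (p - p**2) * w."""
    return sum((p[s] - p[s] ** 2) * w[s] for s in STEMS)


constant_w = {s: 1.0 for s in STEMS}    # Example 1: w(a, b) = 1
uniform_p = {s: 1 / 16 for s in STEMS}  # uniform joint distribution on stems

print(has_equal_marginals(uniform_p), is_watson_crick_invariant(uniform_p))
print(t_w(uniform_p, constant_w))  # 16 * (1/16 - 1/256) = 0.9375
```

For constant weights the sum equals $1 - \sum_{a,b} p^2(a,b)$, which is largest for the uniform distribution, so under the assumed form of (12) the value $15/16$ printed above is the maximum over all distributions. For a non-constant weight sample the maximization would have to be carried out numerically over the distributions satisfying (9).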

A. Analysis of Tables 1-8 for Additive $\tilde{w}$-Distance

Analysis of Table 1 and Tables 3-7: The given weight samples are regular, and the maximum in (12) is attained when $p(a,b) = 0$ for every stem $(ab) \in L$, where the set $L$ of forbidden stems in the Markov chain (13) maximizing (12) has the form

\[ L \triangleq \{ (AT), (TA), (AA), (TT) \}. \tag{16} \]

Below, in Table 1' and Tables 3'-7', we present the estimated values of the joint probabilities $p(a,b)$ and marginal probabilities $p(a)$ for which the maximum in (12) is attained. Values of the critical relative distance $T_{\tilde{w}}$ are given as well. Each table lists $p(a,b)$ with rows $a = A, C, G, T$ and columns $b = A, C, G, T$, followed by a column for $p(a)$ (entries not preserved):

Table 1': Unified weights $U(a,b)$. $T_U = 1.\ldots$

Table 3': Vologodskii, 1984. $T_{\tilde{w}} = 1.\ldots$

Table 4': Blake, 1991. $T_{\tilde{w}} = 1.\ldots$

Table 5': Benight, 1992. $T_{\tilde{w}} = 1.\ldots$

Table 6': SantaLucia, 1996. $T_{\tilde{w}} = 1.\ldots$

Table 7': Sugimoto, 1996. $T_{\tilde{w}} = 1.\ldots$

Analysis of Table 2: The given weight sample is regular, and the maximum in (12) is attained when $p(a,b) = 0$ for every stem $(ab) \in L$, where the set $L$ of forbidden stems in the Markov chain (13) maximizing (12) has the form

\[ L = \{ (AT), (TA), (AA), (TT), (AG), (CT) \}. \tag{17} \]

Below, in Table 2', we present the estimated values of the joint probabilities $p(a,b)$ and marginal probabilities $p(a)$ for which the maximum in (12) is attained. The estimated value of the critical relative distance, $T_{\tilde{w}} = 2.\ldots$, is given as well.

Table 2': Gotoh, 1981. $T_{\tilde{w}} = 2.\ldots$ (entries not preserved)

Analysis of Table 8: The given weight sample $\tilde{w}$ is non-regular, because the maximum in (12) is attained (with the maximal value $T_{\tilde{w}} = 1.\ldots$) for a probability distribution $p'(a,b)$, $(ab) \in \mathcal{A}^2$, which does not satisfy the Markov condition $M$. The distribution $p'(a,b)$ and its marginals $p'(a)$ are given in Table 8':

Table 8': Breslauer, 1986. $T'_{\tilde{w}} = 1.\ldots$ (entries not preserved)

This implies that for the weight sample $\tilde{w}$ from Table 8, we cannot estimate the critical relative distance of optimal DNA codes based on the additive stem $\tilde{w}$-similarity.

B. Conclusion

For the regular weight samples from Tables 2-7 (T2-T7), the descriptive analysis and comparison of critical parameters are summarized as follows:

\[ \begin{array}{l|cccccc} & \text{T2} & \text{T3} & \text{T4} & \text{T5} & \text{T6} & \text{T7} \\ \hline L & (17) & (16) & (16) & (16) & (16) & (16) \\ T_{\tilde{w}} & \ldots.60 & 1.61 & 1.97 & 1.58 & 1.55 & 1.\ldots \end{array} \]

where the corresponding set $L$ of forbidden stems in codewords of optimal DNA codes, for which the critical relative distance $T_{\tilde{w}}$ can be attained, is defined by (16) or by (17).

REFERENCES

[1] K. J. Breslauer, R. Frank, H. Blocker, L. A. Markey, "Predicting Duplex DNA Stability from the Base Sequence," Proc. National Academy of Sciences USA, vol. 83, pp. 3746–3750, 1986.

[2] J. SantaLucia, "A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics," Proc. National Academy of Sciences USA, vol. 95, pp. 1460–1465, 1998.

[3] M. Zuker, D. Mathews, D. Turner, "Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide," in RNA Biochemistry and Biotechnology, J. Barciszewski & B. F. C. Clark, Eds. NATO ASI Series, Kluwer Academic Publishers, 1999.

[4] L. Kaderali, A. Deshpande, J. Nolan, P. White, "Primer-design for multiplexed genotyping," Nucleic Acids Res., vol. 31, pp. 1796–1802, 2003.

[5] J. SantaLucia, D. Hicks, "The thermodynamics of DNA structural motifs," Annu. Rev. Biophys. Biomol. Struct., vol. 33, pp. 415–440, 2004.

[6] F. J. MacWilliams, N. J. A. Sloane, The Theory of Error-correcting Codes, Amsterdam, The Netherlands: North Holland, 1977.

[7] M. A. Bishop, A. G. D’yachkov, A. J. Macula, T. E. Renz, V. V. Rykov, "Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes," Journal of Computational Biology, vol. 14, n. 8, pp. 1088–1104, 2007.

[8] A. G. D’yachkov, P. A. Vilenkin, D. C. Torney, P. S. White, "Reverse-Complement Similarity Codes for DNA Sequences," in Proc. 2000 IEEE Int. Symp. Information Theory, Sorrento, Italy, 2000, pp. 330.

[9] V. V. Rykov, A. J. Macula, C. M. Korzelius, D. C. Engelhart, D. C. Torney, P. C. White, "DNA Sequences Constructed on the Basis of Quaternary Cyclic Codes," Proceedings of 4-th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, Florida, USA, July 2000.

[10] A. Marathe, A. E. Condon, R. M. Corn, "On combinatorial DNA design," J. Comp. Biol., vol. 8, pp. 201–219, 2001.

[11] A. G. D’yachkov, P. L. Erdos, A. J. Macula, V. V. Rykov, D. C. Torney, C. S. Tung, P. A. Vilenkin, P. S. White, "Exordium for DNA Codes," J. Comb. Optimization, vol. 7, n. 4, pp. 369–379, 2003.

[12] A. G. D’yachkov, A. J. Macula, T. E. Renz, P. A. Vilenkin, I. K. Ismagilov, "New Results on DNA Codes," in Proc. IEEE Int. Symp. Information Theory, Adelaide, South Australia, Australia, 2005, pp. 283–288.

[13] A. G. D’yachkov, A. J. Macula, D. C. Torney, P. A. Vilenkin, P. S. White, I. K. Ismagilov, R. S. Sarbayev, "On DNA Codes," Problems of Information Transmission, vol. 41, n. 4, pp. 349–367, 2005.

[14] O. Milenkovic, N. Kashyap, "New Constructions of Codes for DNA computing," Proc. 2005 International Workshop on Coding and Cryptography (WCC 2005), Bergen, Norway, 2005, pp. 204–213.

[15] T. Abualrub, A. Ghrayeb, X. N. Zeng, "Construction of cyclic codes over GF(4) for DNA computing," Journal of the Franklin Institute, vol. 343, n. 4-5, pp. 448–457, 2006.

[16] A. G. D’yachkov, D. C. Torney, "On similarity codes," IEEE Trans. Inform. Th., vol. 46, n. 4, pp. 1558–1664, 2000.

[17] V. I. Levenshtein, "Efficient Reconstruction of Sequences from Their Subsequences and Supersequences," J. Comb. Th., Ser. A, vol. 93, pp. 310–332, 2001.

[18] A. G. D’yachkov, A. N. Voronina, "DNA Codes Based on Stem Hamming Similarity," in Proc. 11th Int. Workshop Algebraic and Combinatorial Coding Theory, Pamporovo, Bulgaria, 2008, pp. 85–91.

[19] A. G. D’yachkov, A. N. Voronina, "DNA Codes for Additive Stem Similarity," Problems of Information Transmission, vol. 45, n. 2, pp. 348–367, 2009.

[20] A. G. D’yachkov, A. J. Macula, T. E. Renz, V. V. Rykov, "Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences," in …, Toronto, Canada, 2008, pp. 2292–2296.

[21] A. G. D’yachkov, A. N. Voronina, A. J. Macula, T. E. Renz, V. V. Rykov, "DNA Codes for the Nearest-Neighbor Similarity," submitted to IEEE Trans. Inform. Th.

[22] Dembo, A., Zeitouni, O.,