DNA codes over two noncommutative rings of order four


arXiv preprint [cs.IT]

Jon-Lark Kim^{a,*}, Dong Eun Ohk^{a}

^a Department of Mathematics, Sogang University, Seoul 04107, South Korea

Abstract

DNA codes based on error-correcting codes have been successful in DNA-based computation and storage. Since there are four nucleobases in DNA, two well-known algebraic structures, the finite field $GF(4)$ and the integer modular ring $\mathbb{Z}_4$, have been used. However, due to the various possibilities of DNA sequences, it is natural to ask whether there are other algebraic structures consisting of four elements.

In this paper, we describe a new type of DNA codes over two noncommutative rings $E$ and $F$ of order four with characteristic 2. Our DNA codes are based on quasi self-dual codes over $E$ and $F$. Using quasi self-duality, we can describe fixed GC-content constraint weight distributions and reverse-complement constraint minimum distributions of those codes.

Keywords: Coding theory, DNA codes, Quasi self-dual codes, codes over rings

1. Introduction

L. Adleman [1] performed a computation using DNA strands to solve an instance of the Hamiltonian path problem, giving birth to DNA computing. Since then, DNA computing and DNA storage have been developed. This development requires several theories for the construction of DNA sequences satisfying various constraints. Algebraic coding theory has contributed to the construction of DNA codes with constraints [13].

Most constructions of DNA codes use linear codes over the finite field with four elements $GF(4)$ or the integer modular ring $\mathbb{Z}_4$, both of which are well-known commutative rings of order four. It is natural to consider whether there are other finite rings of order four which might produce a new class of DNA codes. According to the literature [7], B. Fine classified the 11 finite rings of order four. We observe that the reverse-complement condition of DNA sequences can be translated as a product of multiplication in a ring and prove that exactly eight rings of order four out of 11 can be studied.

In fact, there are some linear codes over rings of order four which are neither $GF(4)$ nor $\mathbb{Z}_4$ [2], [3], [4]. In particular, we construct DNA codes satisfying constraints over the two noncommutative rings $E$ and $F$ in the notation of [7].

This paper consists of six sections. In Section 2, we introduce DNA codes and some definitions of DNA codes. All finite rings of order four will be covered and we define some generalized maps on DNA codes. In Sections 3 and 4, we construct quasi self-dual (QSD) DNA codes based on the QSD codes over $E$ which were considered in [3]. We also calculate important values of QSD DNA codes over $E$, including the number of inequivalent codes, the GC-weight distribution and the minimum distance of a fixed GC-content subcode under reverse-complement constraints. In Section 5, we also define QSD DNA codes over $F$ and compute GC-weight distributions. The classification of QSD DNA codes with $n \le 8$ is summarized in Section 6.

* Corresponding author. Email address: [email protected] (Jon-Lark Kim)

Preprint submitted to Elsevier, February 16, 2021

2. Preliminaries

DNA coding theory is concerned with designing nucleic acid systems using error-correcting codes. DNA (deoxyribonucleic acid) is a molecule composed of double strands built by pairing the four units adenine, thymine, guanine, and cytosine, denoted by $A$, $T$, $G$ and $C$ respectively, which are called nucleotides. These nucleotides are joined in chains which are bound together by hydrogen bonds. $A$ and $T$ are joined by 2 hydrogen bonds, while $G$ and $C$ are joined by 3 hydrogen bonds. Thus these bonds make the complementary base pairings $\{A, T\}$ and $\{G, C\}$, called the Watson-Crick complement. We denote it by $A^C = T$ and $G^C = C$, or equivalently $T^C = A$ and $C^C = G$. So this complement map is a bijection on the set $\{A, T, G, C\}$.

A DNA sequence is a sequence of nucleotides. The ends of a DNA sequence are chemically polar, with a $5'$ end and a $3'$ end, which implies that the strands are oriented. Given a sequence with the orientation $5' \to 3'$, the reverse complement arises naturally. For instance, the DNA sequence $5'$-$TCGGCAACATG$-$3'$ has the complement $3'$-$AGCCGTTGTAC$-$5'$. If we arrange these sequences to have the same orientation, then there are two sequences

$5'$-$TCGGCAACATG$-$3'$ and $5'$-$CATGTTGCCGA$-$3'$.

Note that one sequence is the reverse complement of the other.

A DNA sequence means one strand of DNA. Sets of DNA strands are needed for DNA computing. Thus we define a DNA code as a fixed set of sequences over $A, T, G, C$, which are also called codewords.
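The worked example above can be reproduced in a few lines. The following is a minimal Python sketch; the function names `complement`, `reverse` and `reverse_complement` are illustrative choices, not notation from the paper.

```python
# Watson-Crick complement on the four nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq: str) -> str:
    """Complement every nucleotide, keeping the positions fixed."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse(seq: str) -> str:
    """Read the strand in the opposite direction."""
    return seq[::-1]

def reverse_complement(seq: str) -> str:
    """Complement of the reversed strand (equal to the reverse of the complement)."""
    return complement(reverse(seq))

strand = "TCGGCAACATG"              # read 5' -> 3'
print(complement(strand))           # partner strand, read 3' -> 5'
print(reverse_complement(strand))   # partner strand rewritten 5' -> 3'
```

Applying `reverse_complement` twice returns the original strand, which reflects that the complement map is an involution on $\{A, T, G, C\}$.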

Definition 1. An $(n, M)$ DNA code $C$ is a set of $M$ codewords of length $n$ over the four-letter alphabet $\{A, T, G, C\}$. A DNA codeword, or DNA sequence, is a codeword of a DNA code.

In general, a DNA code does not need to have an algebraic structure. However, some DNA computation and DNA storage applications require error correction. Furthermore, to use algebraic coding theory we expect the set of DNA sequences to have an algebraic structure. Therefore in DNA coding theory, we identify the set $\{A, T, G, C\}$ with a ring of order 4. Since there are 4 types of nucleotides, DNA codes can be constructed from algebraic structures with 4 elements such as the finite field $GF(4)$ or the finite ring $\mathbb{Z}_4$. In fact, any code over 4 elements can be a DNA code, but it will be difficult to analyze its properties.

Definition 2. Let $x = (x_1 x_2 \cdots x_n)$ be given (i.e., $x_i \in \{A, T, G, C\}$). The reverse of $x$, denoted by $x^R$, is the codeword $(x_n x_{n-1} \cdots x_1)$. The complement of $x$, denoted by $x^C$, is the codeword $(x_1^C x_2^C \cdots x_n^C)$. The reverse complement of $x$ is $x^{RC} = (x^R)^C = (x^C)^R$.

We can easily check that $(x^R)^C = (x^C)^R$ for any DNA sequence $x$. Using these definitions, we can give constraints on DNA codes.

Definition 3.

Let $C$ be a DNA code, and $d_H$ be the Hamming distance. The code $C$ has the reverse constraint if there exists $d \ge 0$ such that $d_H(x^R, y) \ge d$ for all $x, y \in C$. The code $C$ has the reverse-complement constraint if there exists $d \ge 0$ such that $d_H(x^{RC}, y) \ge d$ for all $x, y \in C$.

Note that the reverse map $R : F^n \to F^n$ is not a linear map ($F$ is a 4-element ring). This reverse map is a permutation of coordinates, so it is not independent of permutation equivalence. For example, let $C = \{ATTC, CGGA\}$. Then $(ATTC)^R = (CTTA)$ and $(CGGA)^R = (AGGC)$, so that $d_H(x^R, y) \ge 2$ for all $x, y \in C$. The permuted code $C' = \{ATCT, CGAG\}$ has $\min\{d_H(x^R, y)\} = 4$, since $C'^R = \{TCTA, GAGC\}$. So for these reverse constraints, we do not consider permutation equivalence.

In genetics, it is required to compute the GC-content. The GC-content is the percentage of $G$ and $C$ in a DNA. Since a GC pair is held by 3 hydrogen bonds and an AT pair is held by 2 hydrogen bonds, high GC-content DNAs are more stable than low GC-content DNAs. On the other hand, if the GC-content is too high, DNA replication becomes difficult. Therefore we need to set a proper GC-content. In DNA coding theory, we define the GC-content as the number of coordinates equal to $G$ or $C$.

Definition 4.

Let $C$ be a DNA code and $x$ be a codeword in $C$. The GC-content of $x$ is the number of $G$ and $C$ in $x$. The DNA code $C$ has a fixed GC-content constraint if each codeword in $C$ has the same GC-content.

Many codes do not satisfy the fixed GC-content constraint. For the fixed GC-content constraint, we need to calculate the set of codewords which have the same GC-content. Therefore we need the GC-weight enumerator.

Definition 5.

Let $C$ be a code over 4 elements $\{a_1, a_2, a_3, a_4\}$, regarded as a DNA code. The complete weight enumerator of the code $C$, $CWE_C(w, x, y, z)$, is defined by

$$CWE_C(w, x, y, z) = \sum_{c \in C} w^{n_{a_1}(c)} x^{n_{a_2}(c)} y^{n_{a_3}(c)} z^{n_{a_4}(c)},$$

where $n_\alpha(c)$ is the number of occurrences of $\alpha \in \{a_1, a_2, a_3, a_4\}$ in a codeword $c$. The GC-weight enumerator of the code $C$, $GCW_C(x, y)$, is the weight enumerator that counts the number of coordinates in $\{G, C\}$ and $\{A, T\}$, which is defined by

$$GCW_C(x, y) = CWE_C(x, x, y, y) = \sum_{c \in C} x^{n_G(c)} x^{n_C(c)} y^{n_A(c)} y^{n_T(c)}.$$

We can get the size of a subcode $C_k$ which has a fixed GC-content using the polynomial $GCW_C(x, y)$. If $GCW_C(x, y) = \sum a_i x^i y^{n-i}$, then the subcode has order $|C_k| = a_k$, where $C_k = \{c \in C \mid c$ has a fixed GC-content $k$ (or $n - k$)$\}$.

To date, most DNA codes are constructed using $GF(4)$ or $\mathbb{Z}_4$. However, B. Fine classified rings of order $p^2$ up to isomorphism, and so there are 11 finite rings of 4 elements [7]. It is possible to construct DNA codes using other finite rings. The main goal of this paper is to construct DNA codes over rings other than $GF(4)$ and $\mathbb{Z}_4$, especially the ring $E$.

ring name   ring presentation                                                              char
A           $\langle a;\ 4a = 0,\ a^2 = a \rangle$                                           4
B           $\langle a;\ 4a = 0,\ a^2 = 2a \rangle$                                          4
C           $\langle a;\ 4a = 0,\ a^2 = 0 \rangle$                                           4
D           $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = ba = 0 \rangle$           2
E           $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = a,\ ba = b \rangle$       2
F           $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = b,\ ba = a \rangle$       2
G           $\langle a, b;\ 2a = 2b = 0,\ a^2 = 0,\ b^2 = b,\ ab = ba = a \rangle$           2
H           $\langle a, b;\ 2a = 2b = 0,\ a^2 = 0,\ b^2 = b,\ ab = ba = 0 \rangle$           2
I           $\langle a, b;\ 2a = 2b = 0,\ a^2 = b,\ ab = 0 \rangle$                          2
J           $\langle a, b;\ 2a = 2b = 0,\ a^2 = b^2 = 0 \rangle$                             2
K           $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = a + b,\ ab = ba = b \rangle$       2

Table 1: Classification table of finite rings of order 4
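Definitions 3 to 5 can be exercised directly on the two small codes used in the reverse-constraint example. The following Python sketch is illustrative only; the helper names `min_reverse_distance` and `gc_weight_distribution` are assumptions, not notation from the paper. The distribution returned by `gc_weight_distribution` lists the coefficients $a_k$ of $GCW_C(x, y)$.

```python
from collections import Counter

def hamming(x: str, y: str) -> int:
    """Hamming distance between two words of equal length."""
    return sum(s != t for s, t in zip(x, y))

def min_reverse_distance(code) -> int:
    """min d_H(x^R, y) over all ordered pairs (x, y) in the code, including x == y."""
    return min(hamming(x[::-1], y) for x in code for y in code)

def gc_content(word: str) -> int:
    """Number of coordinates equal to G or C."""
    return sum(base in "GC" for base in word)

def gc_weight_distribution(code) -> Counter:
    """Maps k to the number of codewords with GC-content k."""
    return Counter(gc_content(word) for word in code)

code = ["ATTC", "CGGA"]
permuted = ["ATCT", "CGAG"]   # same code with coordinates 3 and 4 swapped

print(min_reverse_distance(code))       # 2
print(min_reverse_distance(permuted))   # 4
print(gc_weight_distribution(code))
```

The two minima differ although the codes are permutation equivalent, which is why the reverse constraints are not studied up to permutation equivalence.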

Proposition 2.1.

The following propositions hold.

(1) $A \cong \mathbb{Z}_4$ and $K \cong GF(4)$, which have a multiplicative identity.

(2) $D \cong (\mathbb{Z}_2 + \mathbb{Z}_2) \cong \mathbb{Z}_2[u]/(u^2 - u)$ and $G \cong \mathbb{Z}_2[u]/(u^2 - 1) \cong \mathbb{Z}_2[u]/(u^2)$; these two rings also have a multiplicative identity. The remaining rings do not have a two-sided multiplicative identity.

(3) In the above rings, only $E$ and $F$ are noncommutative rings.

(4) Any product of two elements in $C$ or $J$ is zero.

+    0    a    2a   3a
0    0    a    2a   3a
a    a    2a   3a   0
2a   2a   3a   0    a
3a   3a   0    a    2a

+    0    a    b    c
0    0    a    b    c
a    a    0    c    b
b    b    c    0    a
c    c    b    a    0

Table 2: Addition tables of the rings of order 4; char 4 and char 2

As mentioned above, most DNA codes are constructed using $A \cong \mathbb{Z}_4$ or $K \cong GF(4)$. J. Liang and L. Wang constructed cyclic DNA codes using a ring of the form $\mathbb{F}_2 + u\mathbb{F}_2$, which is isomorphic to $G$. K. Guenda and T. Gulliver constructed DNA codes over the same ring, $\mathbb{F}_2[u]/(u^2 - 1) \cong G$ [16]. N. Bennenni et al. introduced other cyclic DNA codes over the ring $\mathbb{F}_2 + v\mathbb{F}_2$ with $v^2 = v$ [5]. This ring satisfies $\mathbb{F}_2 + v\mathbb{F}_2 \cong D$. Even though there are some papers on algebraic codes over the rings $E$, $H$ and $I$, DNA codes over those rings have not been constructed. Thus we focus on the other rings.

Before constructing DNA codes over rings, we need to define maps which make it easy to compute the complement and the GC-content. This means that the complement map and the GC-content map should be defined over finite rings.

Definition 6.

Let $R$ be a ring of order 4 and $f : \{A, T, C, G\} \to R$ be a proper representation map, meaning that $f$ is bijective. A complement map $\varphi$ over $R$ is a bijection defined by $\varphi(f(x)) = f(x^C)$.

We can check that $\varphi(x) \ne x$ and $\varphi(\varphi(x)) = x$, since $x^C \ne x$ and $(x^C)^C = x$. We denote $\varphi$ by $\varphi(x) = x^C$. This $\varphi$ is a bijection on $R$, so we can define this map $\varphi$ easily. The question is whether a simple definition of $\varphi$ exists. To be specific, we want an element $\alpha \in R$ satisfying $x^C = x + \alpha$. By the addition tables of the rings, we can find such an $\alpha$ in any finite ring of order 4. If the finite ring has char 4, define $x^C = x + 2a$. If the finite ring has char 2, any nonzero element of $R$ can be $\alpha$.

Let $C$ be an additive code over $R$, a ring of order 4, and let $\mathbf{x} = (\alpha\alpha \cdots \alpha)$. We can calculate $y^C$ by $y^C = y + \mathbf{x}$. If the codeword $\mathbf{x} \in C$, then the code $C^C = \{c^C \mid c \in C\}$ is the same as the original code $C$. So it implies that the reverse-complement constraint and the reverse constraint are equal in $C$.

Definition 7.

Let $R$ be a ring of order 4. A GC-content map $\psi : R \to GF(2)$ is a function defined by

$$\psi(x) = \begin{cases} 1, & \text{if } f^{-1}(x) = C \text{ or } G, \\ 0, & \text{otherwise,} \end{cases}$$

where $f : \{A, T, C, G\} \to R$ is a bijection.

The map $\psi$ can be defined as $\psi : R \to A$ where $A = \{0, r\}$ and $r\,(\ne 0) \in R$. This definition can be extended to $\psi : R \to A \hookrightarrow GF(2)$. For the GC-content map we also want a simple definition of the form $\psi(x) = \beta x$ for some $\beta \in R$.

Proposition 2.2.

We can define the natural GC-content map over every finite ring of order 4, except the rings $C$, $J$ and $K$.

Proof. Note that this $\beta$ satisfies $\beta x = \beta y$ for some $x \ne y$. If we define $\psi(x) = \beta x$ as follows,

$$\beta = \begin{cases} 2a, & \text{ring } A, \text{ and } 0^C = 2a \\ a \text{ or } 3a, & \text{ring } B, \text{ and } 0^C = 2a \\ a, & \text{ring } D, \text{ and } 0^C = b \\ a \text{ or } b \text{ or } c, & \text{ring } E, \text{ and } 0^C = c \\ a, & \text{ring } G, \text{ and } 0^C = a \\ b \text{ or } c, & \text{ring } H, \text{ and } 0^C = a \\ a \text{ or } c, & \text{ring } I, \text{ and } 0^C = b \end{cases}$$

then the map $\psi$ is well-defined.

In the ring $F$, there is no element $\beta$ for which $x \mapsto \beta x$ gives such a two-valued map. However, if we define $\psi(x) = x\beta$ where $\beta = a$ or $b$ or $c$ and $0^C = c$, then this $\psi$ is well-defined.

Since the rings $C$ and $J$ satisfy $x \cdot y = 0$ for all $x, y$, the element $\beta$ and the map $\psi$ satisfying $\psi(x) = \beta x$ do not exist.

The ring $K$ is a field, so there is no element $\beta$ satisfying $\beta x = \beta y$ for some $x \ne y$, except zero.

Therefore the rings $C$, $J$ and $K$ do not have the natural GC-content map.

The complement map $\varphi$ and the GC-content map $\psi$ can be defined on a DNA code $C(n, M)$ and its codeword $x = (x_1 \cdots x_n)$ by $\varphi(x) = (\varphi(x_1) \cdots \varphi(x_n))$ and $\psi(x) = (\psi(x_1) \cdots \psi(x_n))$. Therefore $\varphi(C)$ is another DNA code and $\psi(C)$ is a binary code. From now on let $\varphi$ and $\psi$ be the maps on a code. Note that the Hamming weight $d_H(\psi(x))$ is the number of $G$s and $C$s in the codeword $x$. So it is natural that $\psi$ is called the GC-content map.
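The cases of Proposition 2.2 for the two noncommutative rings can be checked by exhaustive search. The sketch below is a hedged illustration: the tables `MUL_E` and `MUL_F` are derived here from the presentations of $E$ and $F$ in Table 1 (with $c = a + b$), and the helper names are assumptions made for this example.

```python
ELEMENTS = "0abc"   # c = a + b

def add(x, y):
    """Addition in a characteristic-2 ring of order 4 (Klein four-group)."""
    return ELEMENTS[ELEMENTS.index(x) ^ ELEMENTS.index(y)]

# Row x, column y (columns ordered 0, a, b, c) holds the product x*y.
MUL_E = {"0": "0000", "a": "0aa0", "b": "0bb0", "c": "0cc0"}   # ab = a, ba = b
MUL_F = {"0": "0000", "a": "0abc", "b": "0abc", "c": "0000"}   # ab = b, ba = a

def mul(table, x, y):
    return table[x][ELEMENTS.index(y)]

def is_gc_map(psi):
    """psi must vanish on {0, c} and take one common nonzero value on {a, b}."""
    return psi["0"] == "0" and psi["c"] == "0" and psi["a"] == psi["b"] != "0"

def left_betas(table):
    """All beta for which x -> beta*x is a GC-content map for the pairing 0^C = c."""
    return [b for b in ELEMENTS if is_gc_map({x: mul(table, b, x) for x in ELEMENTS})]

def right_betas(table):
    """All beta for which x -> x*beta is a GC-content map for the pairing 0^C = c."""
    return [b for b in ELEMENTS if is_gc_map({x: mul(table, x, b) for x in ELEMENTS})]

print(left_betas(MUL_E))    # ['a', 'b', 'c']
print(left_betas(MUL_F))    # []
print(right_betas(MUL_F))   # ['a', 'b', 'c']
```

The search agrees with the proof: left multiplication works in $E$, while in $F$ only right multiplication gives a two-valued map with fibers $\{0, c\}$ and $\{a, b\}$.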

3. Quasi self-dual codes over E

Following Alahmadi et al., a quasi self-dual code over the ring $E$ is defined [3]. Recall that the ring $E$ is defined by two generators $a$ and $b$ with the relations

$$E = \langle a, b \mid 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = a,\ ba = b \rangle.$$

Its multiplication table is given as follows.

×   0   a   b   c
0   0   0   0   0
a   0   a   a   0
b   0   b   b   0
c   0   c   c   0

Table 3: Multiplication table of the ring E

Since the ring $E$ is noncommutative, we first should define a linear $E$-code. A linear $E$-code is a one-sided $E$-submodule of $E^n$. Define an inner product on $E^n$ as $(x, y) = \sum_{i=1}^{n} x_i y_i$ where $x, y \in E^n$. (The product is the multiplication in $E$.)

Definition 8 ([3]). Let $C$ be a linear $E$-code. The right dual $C^{\perp_R}$ of $C$ is the right module $C^{\perp_R} = \{y \in E^n \mid \forall x \in C,\ (x, y) = 0\}$. The left dual $C^{\perp_L}$ of $C$ is the left module $C^{\perp_L} = \{y \in E^n \mid \forall x \in C,\ (y, x) = 0\}$. The code $C$ is left self-dual (resp. right self-dual) if $C = C^{\perp_L}$ (resp. $C = C^{\perp_R}$). The code $C$ is self-dual if it is both left and right self-dual. The code $C$ is self-orthogonal if $\forall x, y \in C$, $(x, y) = 0$. A quasi self-dual (QSD) code is a self-orthogonal code of size $2^n$.

The ring $E$ is local with maximal ideal $J = \{0, c\}$, and its residue field is $E/J \cong GF(2)$. Thus any element $e \in E$ can be written as $e = as + ct$ where $s, t \in \{0, 1\} = GF(2)$, using the natural action of $GF(2)$ on $E$. Denote by $r : E \to E/J \cong GF(2)$ the map of reduction modulo $J$. Thus $r(0) = r(c) = 0$ and $r(a) = r(b) = 1$. Then this map $r$ can be the GC-content map $\psi$. Define $f : \{A, T, G, C\} \to E$ by $f(A) = 0$, $f(T) = c$, $f(G) = a$ and $f(C) = b$. Then $r(f(C)) = r(f(G)) = 1$, and the others go to 0. We can check that $\psi(x) = ax$ satisfies $0^C = c$ and $\psi(0) = \psi(c) = 0$, $\psi(a) = \psi(b) = a$. Moreover $\mathrm{Im}(\psi) = \{0, a\} \cong GF(2)$. Therefore $\psi \cong r$. So now let $\psi = r$. Then it can be extended to a map from $E^n$ to $GF(2)^n$. And since the pairing is given by $\{A, T\} \to \{0, c\}$ and $\{G, C\} \to \{a, b\}$, we should define $x^C = x + c$.

Definition 9 ([3]). Let $C$ be a code of length $n$ over $E$. The residue code of $C$ is $\mathrm{res}(C) = \{\psi(y) \in GF(2)^n \mid y \in C\}$. The torsion code of $C$ is $\mathrm{tor}(C) = \{x \in GF(2)^n \mid cx \in C\}$. Both codes are binary codes.
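Definition 9 can be made concrete on a small code. The sketch below is only an illustration: codewords are written as strings over the symbols `0`, `a`, `b`, `c`, and the example code (eight words of length 3) is a choice made here, not an example from the paper.

```python
from itertools import product

# Reduction modulo J = {0, c}: r(0) = r(c) = 0 and r(a) = r(b) = 1.
R = {"0": 0, "c": 0, "a": 1, "b": 1}

def residue_code(code):
    """Binary image of the code under the coordinatewise reduction map."""
    return {tuple(R[s] for s in word) for word in code}

def torsion_code(code):
    """All binary x such that c*x (c where x_i = 1, 0 elsewhere) lies in the code."""
    n = len(next(iter(code)))
    return {x for x in product((0, 1), repeat=n)
            if "".join("c" if bit else "0" for bit in x) in code}

# Illustrative code a*B + c*B^perp with B = <110>, B^perp = <110, 001>.
code = {"000", "cc0", "00c", "ccc", "aa0", "bb0", "aac", "bbc"}

print(sorted(residue_code(code)))   # [(0, 0, 0), (1, 1, 0)]
print(sorted(torsion_code(code)))
```

Here the residue code has dimension 1 and the torsion code has dimension 2, so the two dimensions add up to the length 3.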

Theorem 3.1 ([3]). If $C$ is a QSD code over $E$, then $C = a\,\mathrm{res}(C) \oplus c\,\mathrm{tor}(C)$ as modules.

Theorem 3.2 ([3]). For any QSD $E$-linear code $C$, we have (1) $\mathrm{res}(C) \subseteq \mathrm{res}(C)^\perp$, (2) $\mathrm{tor}(C) = \mathrm{res}(C)^\perp$, (3) $|C| = 2^{\dim(\mathrm{res}(C)) + \dim(\mathrm{tor}(C))}$.

We can construct QSD $E$-codes by the above theorems.

Theorem 3.3 ([3]). Let $B$ be a self-orthogonal binary $[n, k]$ code. The code $C$ over the ring $E$ defined by the relation $C = aB + cB^\perp$ is a QSD code. Its residue code is $B$ and its torsion code is $B^\perp$.

By the above theorem, the classification of QSD $E$-codes is equivalent to the classification of their residue codes. Moreover, two QSD codes $C$ and $C'$ are permutation equivalent if and only if their residue codes are permutation equivalent. Therefore we get the following corollary.

Corollary 3.4. Let $N(n, k)$ be the number of inequivalent QSD codes over $E$ where $n$ is the length and $k$ is the dimension of their residue codes. Then $N(n, k) = \Psi(n, k)$ where $\Psi(n, k)$ is the number of inequivalent binary self-orthogonal $[n, k]$ codes.

We need the classification of inequivalent binary self-orthogonal codes. Hou et al. classified the case when $k \le$ […] and $n \le 40$ [10]. Pless classified the case when $k = n/2$ and $n \le 20$ [15].
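The construction $C = aB + cB^\perp$ of Theorem 3.3 can be verified numerically for small lengths. In the following Python sketch a ring element $as + ct$ is stored as the pair `(s, t)`; this encoding and all function names are assumptions made for the example. The product rule used in `mul_E` is read off Table 3: $xy = x$ when $y \in \{a, b\}$ and $xy = 0$ otherwise.

```python
from itertools import product

def span(generators, n):
    """All GF(2)-linear combinations of the binary generator rows."""
    words = {(0,) * n}
    for g in generators:
        words |= {tuple(x ^ y for x, y in zip(w, g)) for w in words}
    return words

def dual(code, n):
    """Binary dual code by exhaustive search (adequate for small n)."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(x * y for x, y in zip(v, w)) % 2 == 0 for w in code)}

def mul_E(x, y):
    """Product in E of x = (s, t) and y = (s', t'): equals x if s' = 1, else 0."""
    return (x[0] * y[0], x[1] * y[0])

def inner_E(x, y):
    """Inner product sum_i x_i * y_i, computed in E."""
    s = t = 0
    for xi, yi in zip(x, y):
        ps, pt = mul_E(xi, yi)
        s, t = s ^ ps, t ^ pt
    return (s, t)

def build_code(generators, n):
    """C = a*B + c*B^perp; each coordinate is the pair (u_i, v_i)."""
    B = span(generators, n)
    B_perp = dual(B, n)
    return {tuple(zip(u, v)) for u in B for v in B_perp}

def is_quasi_self_dual(code, n):
    """Self-orthogonal with respect to inner_E and of size 2^n."""
    orthogonal = all(inner_E(x, y) == (0, 0) for x in code for y in code)
    return orthogonal and len(code) == 2 ** n

print(is_quasi_self_dual(build_code([(1, 1, 0, 0), (0, 0, 1, 1)], 4), 4))  # True
print(is_quasi_self_dual(build_code([(1, 0, 0)], 3), 3))                   # False
```

The second call uses a residue code that is not self-orthogonal, so the resulting code has the right size $2^3$ but fails the orthogonality test, as Theorem 3.2 requires.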

Lemma 3.5.

The number of inequivalent binary self-orthogonal $[n, k, 2]$ codes is the number of inequivalent binary self-orthogonal $[n-2, k-1]$ codes.

Proof. Let $C$ be a binary self-orthogonal $[n, k, 2]$ code. Take $x \in C$ with $\mathrm{wt}(x) = 2$. Then we can puncture the positions of the nonzero coordinates of $x$ and get a self-orthogonal $[n-2, k-1]$ code. Conversely, we can get $[n, k, 2]$ codes from $[n-2, k-1]$ codes by adding an extra vector of weight 2. Obviously if two codes are equivalent, then the induced codes are also equivalent.

We can calculate the number of self-orthogonal $[14, 6]$ codes and self-orthogonal $[15, 6]$ codes using the lemma and the paper of I. Bouyukliev [6].

Lemma 3.6. The number of inequivalent binary self-orthogonal $[14, 6]$ codes is 27. The number of inequivalent binary self-orthogonal $[15, 6]$ codes is 48.

Proof. Note that the largest minimum distance of binary $[14, 6]$ codes is 5. So we can let $d \le 5$, where $d$ is the minimum distance. Since binary self-orthogonal codes have only even weights, we consider the cases $d = 2$ and $d = 4$. By the above lemma, the number of self-orthogonal $[14, 6, 2]$ codes is the number of self-orthogonal $[12, 5]$ codes, that is 15. By the paper of I. Bouyukliev, there exist twelve self-orthogonal $[14, 6, 4]$ codes [6]. Hence the number of self-orthogonal $[14, 6]$ codes is $15 + 12 = 27$. Similarly, we count the self-orthogonal $[15, 6, 2]$ codes and $[15, 6, 4]$ codes. They are 23 and 25, respectively. Even though there exists a $[15, 6, 6]$ code, it cannot be self-orthogonal. So the number of self-orthogonal $[15, 6]$ codes is $23 + 25 = 48$.

Values of $N(n, k)$:

n    k = 0     1     2     3     4     5     6     7
11       1     5    10    14    12     4
12       1     6    16    26    28    15     3
13       1     6    16    30    36    23     6
14       1     7    23    51    75    61    27     4
15       1     7    23    58    98    94    48    10

4. Quasi self-dual DNA codes over E

Theorem 4.1 ([3]). Let $C$ be a QSD code over $E$. Then

$$CWE_C(w, x, y, z) = J(\mathrm{res}(C), \mathrm{tor}(C))(w, x, y, z),$$

where $J(A, B)$ of two binary linear codes $A, B$ is the joint weight enumerator defined by

$$J(A, B)(w, x, y, z) = \sum_{u \in A,\, v \in B} w^{i(u,v)} x^{j(u,v)} y^{k(u,v)} z^{l(u,v)},$$

and $i, j, k, l$ are the numbers of indices $\iota \in \{1, \cdots, n\}$ with $(u_\iota, v_\iota) = (0,0), (0,1), (1,0)$ and $(1,1)$, respectively.

Theorem 4.2. Let $C$ be a QSD code over $E$. Then

$$GCW_C(x, y) = \sum_{i=0}^{n} 2^{n-k} A_i(\mathrm{res}(C))\, x^i y^{n-i},$$

where $2^n = |C|$, $k = \dim(\mathrm{res}(C))$ and $A_i(\mathrm{res}(C))$ is the binary weight distribution of $\mathrm{res}(C)$.

Proof. By MacWilliams [14],

$$W_{\mathrm{res}(C)}(x, y) = \frac{1}{2^{\dim(\mathrm{tor}(C))}} J(\mathrm{res}(C), \mathrm{tor}(C))(x, x, y, y).$$

So

$$GCW_C(x, y) = CWE_C(x, x, y, y) = J(\mathrm{res}(C), \mathrm{tor}(C))(x, x, y, y) = 2^{\dim(\mathrm{tor}(C))} W_{\mathrm{res}(C)}(x, y) = 2^{n-k} W_{\mathrm{res}(C)}(x, y).$$

For example, let $\mathrm{res}(C) = \langle (1 \cdots 1\, 0 \cdots 0) \rangle$, where the codeword has $m$ ones and $n - m$ zeros ($m$ is even). Then $\mathrm{res}(C)$ is a 1-dimensional code, therefore

$$GCW_C(x, y) = 2^{n-1} x^m y^{n-m} + 2^{n-1} y^n.$$

We can get $(n, 2^{n-1})$ DNA codes which have the fixed GC-content $m$ and 0, respectively.

We want to give reverse (and reverse-complement) constraints for QSD DNA codes. Since any residue code has the zero vector, its QSD DNA code has the vector $(cc \cdots c)$. Since the complement map is defined by $x^C = x + c$ in the ring $E$, $x^C \in C$ for any $x \in C$ and for any QSD DNA code $C$. Then $\min\{d_H(x^{RC}, y) \mid x, y \in C\} = 0$. Thus we need to give reverse-complement constraints to a subcode which has a fixed GC-content.

Definition 10.

Let $C$ be a QSD DNA code of length $n$. Let $C_m$ be the subcode of $C$ which has the fixed GC-content $m$. This $C_m$ has the permutation-equivalent codes $P_m = \{\sigma(C_m) \mid \sigma \in S_n\}$. Define

$$d^m_{RC} := \max_{C' \in P_m} \{d_{C'}\}, \quad \text{where } d_{C'} = \min\{d_H(x^{RC}, y) \mid x, y \in C'\}.$$
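For very small lengths, Definition 10 can be evaluated by brute force over all coordinate permutations. The following Python sketch is a hedged illustration rather than the authors' method (the paper reports that some values were computed with MAGMA): it builds $C = aB + cB^\perp$ with the labeling $f(A) = 0$, $f(T) = c$, $f(G) = a$, $f(C) = b$ from Section 3, and every function name is an assumption. The running time grows like $n!$, so it is only meant for toy cases.

```python
from itertools import permutations, product

# A ring element a*s + c*t is the pair (s, t); labeling 0 -> A, c -> T, a -> G, b -> C.
SYMBOL = {(0, 0): "A", (0, 1): "T", (1, 0): "G", (1, 1): "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def span(generators, n):
    """All GF(2)-linear combinations of the binary generator rows."""
    words = {(0,) * n}
    for g in generators:
        words |= {tuple(x ^ y for x, y in zip(w, g)) for w in words}
    return words

def dual(code, n):
    """Binary dual code by exhaustive search."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(x * y for x, y in zip(v, w)) % 2 == 0 for w in code)}

def qsd_dna_code(generators, n):
    """DNA words of C = a*B + c*B^perp, where B is spanned by the generators."""
    res = span(generators, n)
    tor = dual(res, n)
    return {"".join(SYMBOL[pair] for pair in zip(u, v)) for u in res for v in tor}

def reverse_complement(word):
    return "".join(COMPLEMENT[s] for s in reversed(word))

def hamming(x, y):
    return sum(s != t for s, t in zip(x, y))

def d_rc(code):
    """min d_H(x^RC, y) over all ordered pairs (x, y), including x == y."""
    return min(hamming(reverse_complement(x), y) for x in code for y in code)

def d_m_rc(generators, n, m):
    """Maximum of d_rc over all permutations of the GC-content-m subcode.

    m must be a GC-content that actually occurs in the code.
    """
    sub = [w for w in qsd_dna_code(generators, n)
           if sum(s in "GC" for s in w) == m]
    return max(d_rc(["".join(w[i] for i in sigma) for w in sub])
               for sigma in permutations(range(n)))

print(d_m_rc([(1, 1, 0)], 3, 2))      # 2
print(d_m_rc([(1, 1, 0, 0)], 4, 2))   # 4
```

For residue codes generated by a single vector, these brute-force values can be compared with the closed form stated in Theorem 4.3.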

It is clear that $d^0_{RC} = 0$, since $(cc \cdots c)$ is in $C$.

Theorem 4.3.

Let $C$ be a QSD code over $E$ satisfying $\mathrm{res}(C) = \langle a \rangle = \langle (1 \cdots 1\, 0 \cdots 0) \rangle$ where $d_H(a) = m$ ($m$ is even). Then $d^m_{RC} = \min\{2m, 2(n - m)\}$.

Proof. Take $\sigma_1, \sigma_2 \in S_n$. Suppose that $d_H(\sigma_1(a), \sigma_1(a)^R) \le d_H(\sigma_2(a), \sigma_2(a)^R)$. Denote $\sigma(a) = (x_1 \cdots x_n)$ where $x_i \in GF(2)$. Then $d_H(\sigma(a), \sigma(a)^R)$ is the number of $x_i$'s with $x_i \ne x_{n+1-i}$.

We claim that $d_{\sigma(C_m)} = d_H(\sigma(a), \sigma(a)^R)$. If $x_j \ne x_{n+1-j}$ for some $j$, then $a x_j + c t_1 \ne a x_{n+1-j} + c t_2$ for any $t_1, t_2 \in GF(2)$ (since $a(x_j + x_{n+1-j}) = a \ne c(t_1 + t_2)$). So $d_H(x^{RC}, y) \ge 2$ for $x, y \in \sigma(C_m)$ (any $x, y$ is generated by $\sigma(a)$, and so the difference is counted in the $j$th and $(n+1-j)$th positions). If there exist $k$ coordinates $j$ which satisfy $x_j \ne x_{n+1-j}$, then $d_H(x^{RC}, y) \ge k$ for any $x, y \in \sigma(C_m)$. Therefore $d_{\sigma(C_m)} = d_H(\sigma(a), \sigma(a)^R)$. This claim means that we can get the minimum distance of the subcode $\sigma(C_m)$ using the distance of $\sigma(a)$.

Then by the assumption we get $d_{\sigma_1(C_m)} \le d_{\sigma_2(C_m)}$. Therefore we need to increase the number of $x_i$'s satisfying $x_i \ne x_{n+1-i}$. If $m < n/2$, then we can take $\sigma \in S_n$ with $\sigma(a) = a = (1 \cdots 1\, 0 \cdots 0)$, so that there are $2m$ positions of $x_i$'s satisfying $x_i \ne x_{n+1-i}$. Thus $d_{\sigma(C_m)} = 2m$. If $m \ge n/2$, then also $\sigma(a) = a = (1 \cdots 1\, 0 \cdots 0)$ has $2(n - m)$ positions of $x_i$'s satisfying $x_i \ne x_{n+1-i}$, so that $d_{\sigma(C_m)} = 2(n - m)$. Therefore

$$d^m_{RC} = \begin{cases} 2m, & \text{if } m < n/2, \\ 2(n - m), & \text{if } m \ge n/2. \end{cases}$$

If $2m < 2(n - m)$, then $m < n/2$ and $d^m_{RC} = 2m$. If $2m \ge 2(n - m)$, then $m \ge n/2$ and $d^m_{RC} = 2(n - m)$. Thus $d^m_{RC} = \min\{2m, 2(n - m)\}$.

Theorem 4.4.

Let C be a QSD code over E satisfying res p C q “ Bˆ a a ˙F “ Bˆ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ˙F where d H p a q “ m and d H p a q “ m ( m and m are positive even integers).Let m “ m ` m . Then the following hold. If m “ m , then d m RC “ d m RC “ min t m, p n ´ t n { u q ´ m u and d mRC “ t m, n ´ m u . If m ‰ m , then d m RC “ t m , n ´ m u , d m RC “ t m , n ´ m u and d mRC “ t m, n ´ m u .Note that p n ´ t n { u q ´ m “ n ´ m, if n is even n ´ m ` , if n is odd .Proof. - Case 1. Suppose m “ m “ m { C m is generated by one vector p a ` a q , so by Theorem 4.3, d mRC “ t m, n ´ m u . Assume that m ă n { p a q or p a q can have 2 m “ m positionsof x i ’s satisfying x i ‰ x n ´ i . Thus d m RC “ m “ m .Now assume that n { ď m . Let σ p a q “ p ¨ ¨ ¨ ¨ ¨ ¨ q . By theassumption σ p a q has to form that σ p a q “ p ¨ ¨ ¨ x m ` ¨ ¨ ¨ x n q where x i P GF p q .Let n be even. To avoid the coincidence, we should take x i ’s such that x m ` “ ¨ ¨ ¨ “ x n { “

1. Locate the rest of ones x n ´ m ` n { ` “ ¨ ¨ ¨ “ x n “ d H p σ p a q , p σ p a q R qq “ p n { ´ m q “ n ´ m , d H p σ p a q , p σ p a q R qq “ d H p σ p a q , p σ p a q R qq “ m “ m . Since m ď n , so d m RC “ n ´ m .Next, let n be odd. If we take the same progress as the n even case,we can get d H p σ p a q , p σ p a q R qq “ p t n { u ´ m q “ t n { u ´ m . How-ever, since n is odd, we can let x t n { u ` “

1, which is in σ p a q . In thatcase, d H p σ p a q , p σ p a q R qq “ t n { u ´ m ` d H p σ p a q , p σ p a q R qq “ m , d H p σ p a q , p σ p a q R qq “ m ´

2. Since n { ă m , so t n { u ` ď m . Then2 t n { u ` ď m ` ď m (since 2 ď m ).Therefore d m RC “ t n { u ´ m `

2. Then d m RC can be formed as d m RC “ p n ´ t n { u q ´ m . Thus d m RC “ m, if m ă n { p n ´ t n { u q ´ m, if m ě n { m ă n {

2, then m ă n ´ m ď p n ´ t n { u q´ m . Thus d m RC “ min t m, p n ´ t n { u q ´ m u .- Case 2. Suppose m ‰ m . Then the subcode with ﬁxed GC -contentconstraint m is generated by one vector a . So by Theorem 4.3, d m RC “ t m , n ´ m u . In the same argument, we can get the following results: d m RC “ t m , n ´ m u , and C m is generated by one vector p a ` a q , so d mRC “ t m, n ´ m u . 14 heorem 4.5. Let C be a QSD code over E satisfying res p C q “ Bˆ a a ˙F “ Bˆ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ˙F where d H p a q “ m ` m , d H p a q “ m ` m and d H p a X a q “ m ( m , m and m are positive even integers). Then the following holds. If m , m and m are all distinct, then d m i ` m j RC “ t m i ` m j , n ´p m i ` m j qu for all ď i ‰ j ď . Without loss of generality, let m “ m ‰ m . Then d m ` m RC “ d m RC “ t m , n ´ m u and d m ` m RC “ d m ` m RC is d m ` m RC “ $’&’% p m ` m q if m ` m ă n { n ´ m ´ δ n if n { ď m ` m ă n { ` m p t n { u ´ m q if n { ď m . If m “ m “ m , then d m ` m RC “ d m ` m RC “ d m ` m RC is d m ` m RC “ $’&’% m if m ă n { n ´ m ´ δ n if n { ď m ă n { p t n { u ´ m q if n { ď m . where δ n “ $’&’% if n ” if n ” if n ” .Proof. - Case 1. If m , m and m are all distinct, then the subcodes whichhave ﬁxed GC -content constraint are generated by one vector. So d m i ` m j RC “ t m i ` m j , n ´ p m i ` m j qu is obvious.- Case 2. Suppose m ‰ m “ m . Then this case is obviously the same asthe case m “ m ‰ m . And let a “ a ` a . Then res p C q can be generatedby a and a . Since a “ a ` a , so d H p a q “ d H p a q` d H p a q´ d H p a X a q “ m ` m and d H p a X a q “ m . Therefore the case m ‰ m “ m is thesame as the case m “ m ‰ m .So now suppose that m “ m ‰ m . Then only the code a ` a generates the codeword which has ﬁxed m ` m GC -content. Thus d m RC “ t m , n ´ m u is obvious. 15f m ` m ` m “ m ` m ď n {

2, then we can easily check that d m ` m RC “ t m ` m , n ´ p m ` m qu . Note that 2 m ` m ď n { t m ` m , n ´ p m ` m qu “ p m ` m q . So d m ` m RC “ p m ` m q .Assume that m ` m ` m “ m ` m ą n { m ` m “ m ă n { σ p a q “ p x ¨ ¨ ¨ x n q and σ p a q “ p y ¨ ¨ ¨ y n q where x i , y i P GF p q .Then we can let x “ ¨ ¨ ¨ x m “ x m ` “ ¨ ¨ ¨ “ x m “ y “ ¨ ¨ ¨ “ y m “ y m ` “ ¨ ¨ ¨ “ y m “ n ” x m ` “ ¨ ¨ ¨ “ x n { “ “ y m ` “ ¨ ¨ ¨ “ y n { . Then rest 1’s should belocated in x n { ` , . . . , x n and y n { ` , . . . , y n . Since n ” m ´ p n { ´ m q so it is even. If we let 1’s to one side,the coincidence will be increasing. Thus we can take x n ´ m ´ m { ` n { ` “¨ ¨ ¨ “ x n ´ m ` m { ´ n { “

1. Then the number of 1’s is m ´ n { ` m and the middle point is between n ´ m and n ´ m `

1. In this case d H p σ p a q , σ p a q R q “ n ´ m . The other Hamming distance is not smallerthan n ´ m .If n ” m ´ n { ` m is not even sowe cannot divide into half. So one side has more 1’s, and then the minimumdistance value is decreasing exactly 2. If n ” x t n { u ` “ y t n { u ` “

1. Then the minimum distance value is decreasing exactly 1.Therefore d m ` m RC “ n ´ m ´ δ n .Lastly assume that n { ď m ` m “ m . Denote σ p a q “ p x ¨ ¨ ¨ x n q and σ p a q “ p y ¨ ¨ ¨ y n q . Let x “ ¨ ¨ ¨ “ x m “ y t n { u ` “ ¨ ¨ ¨ “ y t n { u ` m “

1. And let x m ` “ ¨ ¨ ¨ “ x m ` m { “ y t n { u ` m ` “ ¨ ¨ ¨ “ y t n { u ` m “ m { “

1. Then d m ` m RC “ ˆ p m { q ` ˆ p t n { u ´ m ´ m { q “ p t n { u ´ m q .- Case 3. Assume that m “ m “ m . Then we can apply the samemethodas the case 2.For example, let res p C q “ Bˆ ˙F . Then by the formula, d RC “ d RC “

2. By these results, we can calculate some $d_{RC}$ values. The table in the conclusion shows some proper values of $d_{RC}$ for each length and dimension of residue codes. The tables list specific $d_{RC}$ values up to the classification of QSD DNA codes with $n \le 8$.

5. Quasi self-dual DNA codes over F

The ring $F$ is defined by

$$F = \langle a, b \mid 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = b,\ ba = a \rangle.$$

Thus its multiplication table is given as follows.

×   0   a   b   c
0   0   0   0   0
a   0   a   b   c
b   0   a   b   c
c   0   0   0   0

Table 4: Multiplication table of the ring F

The rings $E$ and $F$ are not isomorphic. Even though $(x, y)_E \ne (x, y)_F$ for inner products, we can define a QSD DNA code over the ring $F$ similarly. Let a linear $F$-code be a one-sided $F$-submodule of $F^n$.

Definition 11.

Let $x, y \in F^n$ where $x = (x_1, \cdots, x_n)$ and $y = (y_1, \cdots, y_n)$. Define an inner product of $x, y$ as $(x, y) = \sum x_i y_i$. Let $C$ be a linear $F$-code. The right dual $C^{\perp_R}$ of $C$ is the right module $C^{\perp_R} = \{y \in F^n \mid \forall x \in C,\ (x, y) = 0\}$. The left dual $C^{\perp_L}$ of $C$ is the left module $C^{\perp_L} = \{y \in F^n \mid \forall x \in C,\ (y, x) = 0\}$. The code $C$ is left self-dual (resp. right self-dual) if $C = C^{\perp_L}$ (resp. $C = C^{\perp_R}$). The code $C$ is self-dual if it is equal to both of its duals. The code $C$ is self-orthogonal if $\forall x, y \in C$, $(x, y) = 0$. A quasi self-dual (QSD) code is a self-orthogonal code of size $2^n$.

Remark that $(x, y)_E \ne (x, y)_F$ as an inner product. However, if $C$ is QSD in the ring $E$, then so is it in the ring $F$.

Theorem 5.1.

Let $C$ be a QSD code over the ring $E$. Then by a map $f : E \to F$, $f(C)$ is a QSD code over the ring $F$.

Proof. Define a bijection $f : E \to F$ by $f(a_E) = a_F$, $f(b_E) = b_F$ and $f(c_E) = c_F$. Let $C$ be a QSD code over the ring $E$. Take $x, y \in C$, denoted by $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. Writing $(x_i, y_i)$ for the product $x_i y_i$, we have

$$(x, y)_E = \sum_{i=1}^{n} (x_i, y_i) = \sum_{x_m, y_m \in \{0, c\}} (x_m, y_m) + \sum_{x_m \notin \{0, c\}} (x_m, c) + \sum_{y_n \notin \{0, c\}} (c, y_n) + \sum_{x_m, y_n \notin \{0, c\}} (x_m, y_n)$$

$$= \sum_{y_n \notin \{0, c\}} (c, y_n) + \sum_{x_m, y_n \notin \{0, c\}} (x_m, y_n) = 0.$$

Hence there is an even number of $a$ terms or $b$ terms in the summation $\sum_{x_m, y_n \notin \{0, c\}} (x_m, y_n)$. Therefore the number of coordinates of $a$'s and $b$'s in $y$ is even. If the number of $a$'s is odd, then $(y, y) \ne 0$. So both the number of $a$'s and the number of $b$'s are even. Hence every codeword in $C$ has an even number of $a$-positions and $b$-positions. Then

$$\sum_{i=1}^{n} (f(x_i), f(y_i)) = \sum_{x_m \notin \{0, c\}} (f(x_m), f(c)) + \sum_{x_m, y_n \notin \{0, c\}} (f(x_m), f(y_n)) = 0.$$

So we can regard a QSD code over the ring $E$ as a QSD code over the ring $F$.

Definition 12.

Let $C$ be a code of length $n$ over $F$. The residue code of $C$ is $\mathrm{res}(C) = \{\psi(y) \mid y \in C\}$. The torsion code of $C$ is $\mathrm{tor}(C) = \{x \in GF(2)^n \mid cx \in C\}$, where $\psi : F \to GF(2)$ is the map with $\psi(0) = \psi(c) = 0$ and $\psi(a) = \psi(b) = 1$, or $\psi(x) = xa$.

By Table 4, the map $\psi(x) = xa$ has image $\{0, a\} \cong GF(2)$, so this map $\psi$ is well-defined.

Lemma 5.2.

Every element $f \in F$ can be written as $f = as + ct$ where $s, t \in GF(2)$.

Proof. Since $F$ is isomorphic to the ring $E$ as an additive group, $F$ also has this decomposition.

Corollary 5.3. If $C$ is a QSD code over $F$, then $C = a\,\mathrm{res}(C) \oplus c\,\mathrm{tor}(C)$ as modules.

Corollary 5.4.

Let $N(n, k)$ be the number of inequivalent QSD codes over $F$ where $n$ is the length and $k$ is the dimension of their residue codes. Then $N(n, k) = \Psi(n, k)$ where $\Psi(n, k)$ is the number of inequivalent binary self-orthogonal $[n, k]$ codes.

Corollary 5.5.

Let $C$ be a QSD code over $F$. Then

$$GCW_C(x, y) = \sum_{i=0}^{n} 2^{n-k} A_i(\mathrm{res}(C))\, x^i y^{n-i},$$

where $2^n = |C|$, $k = \dim(\mathrm{res}(C))$ and $A_i(\mathrm{res}(C))$ is the binary weight distribution of $\mathrm{res}(C)$.

Therefore we can check that QSD DNA codes over $F$ have the same GC-weight distribution as those over $E$.
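Theorem 5.1 and Corollary 5.5 can be checked on a toy example. The sketch below is illustrative only: a ring element $as + ct$ is stored as the pair `(s, t)`, so that the map $f$ of Theorem 5.1 is the identity on pairs, and the product rules are read off Tables 3 and 4. All names are assumptions made for this example.

```python
from collections import Counter
from itertools import product

def span(generators, n):
    """All GF(2)-linear combinations of the binary generator rows."""
    words = {(0,) * n}
    for g in generators:
        words |= {tuple(x ^ y for x, y in zip(w, g)) for w in words}
    return words

def dual(code, n):
    """Binary dual code by exhaustive search."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(x * y for x, y in zip(v, w)) % 2 == 0 for w in code)}

def mul_E(x, y):
    """E: x*y = x if y is a or b, else 0."""
    return (x[0] * y[0], x[1] * y[0])

def mul_F(x, y):
    """F: x*y = y if x is a or b, else 0."""
    return (x[0] * y[0], x[0] * y[1])

def self_orthogonal(code, mul):
    """True if sum_i x_i * y_i = 0 for all x, y in the code, with the given product."""
    def inner(x, y):
        s = t = 0
        for xi, yi in zip(x, y):
            ps, pt = mul(xi, yi)
            s, t = s ^ ps, t ^ pt
        return (s, t)
    return all(inner(x, y) == (0, 0) for x in code for y in code)

def gc_distribution(code):
    """GC-content is the number of coordinates a or b, i.e. with s = 1."""
    return Counter(sum(s for s, _ in word) for word in code)

n = 4
B = span([(1, 1, 1, 1)], n)                       # self-orthogonal, k = 1
code = {tuple(zip(u, v)) for u in B for v in dual(B, n)}

print(self_orthogonal(code, mul_E), self_orthogonal(code, mul_F))  # True True
print(gc_distribution(code))   # GC-content 0 and 4, each 2^(n-k) = 8 times
```

The individual products differ between the two rings (for example $ab = a$ in $E$ but $ab = b$ in $F$), yet the same set of words is self-orthogonal for both inner products and has the same GC-weight distribution.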

6. Conclusion

In this paper, we construct QSD DNA codes over $E$. For each DNA code, the GC-weight enumerator is obtained. This yields the (nonlinear) subcodes which have a fixed GC-content. In particular, some minimum distances under the reverse-complement constraint in the ring $E$ are calculated for $n \le 8$. The tables of $d_{RC}$ are below. Some values of $d_{RC}$ are computed by MAGMA programming. The QSD DNA codes over the ring $F$ are almost the same as in the case of the ring $E$, so the tables below also apply to them.

References

[1] L. M. Adleman, “Molecular computation of solutions to combinatorial problems,” Science, 266(5187) (1994): 1021–1024.

[2] A. Alahmadi, A. Alkathiry, A. Altassan, W. Basaffar, A. Bonnecaze, H. Shoaib, and P. Solé, “Type IV codes over a non-local non-unital ring,”

Proyecciones (Antofagasta) , 39(4) (2020): 963–978.19 k Residue code Generator Matrix d mRC n @` ¨ ¨ ¨ ˘D ` c ¨ ¨ ¨ c ˘ d RC “

02 1 @` ˘D ` a a ˘ d RC “

03 1 @` ˘D ˆ a a

00 0 c ˙ d RC “

24 1 @` ˘D ¨˝ a a c

00 0 0 c ˛‚ d RC “

44 1 @` ˘D ¨˝ a a a ac c c c ˛‚ d RC “

04 2 Bˆ ˙F ˆ a a a a ˙ d RC “ ,d RC “

05 1 @` ˘D ¨˚˚˝ a a c c

00 0 0 0 c ˛‹‹‚ d RC “

45 1 @` ˘D ¨˚˚˝ a a a a c c c c c ˛‹‹‚ d RC “

25 2 Bˆ ˙F ¨˝ a a a a

00 0 0 0 c ˛‚ d RC “ ,d RC “

26 1 @` ˘D ¨˚˚˚˚˝ a a c c c

00 0 0 0 0 c ˛‹‹‹‹‚ d RC “ n k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˝ a a a a c c c c c

00 0 0 0 0 c ˛‹‹‹‹‚ d RC “

46 1 @` ˘D ¨˚˚˚˚˝ a a a a a ac c c c c c c c ˛‹‹‹‹‚ d RC “

06 2 Bˆ ˙F ¨˚˚˝ a a a a c

00 0 0 0 0 c ˛‹‹‚ d RC “ ,d RC “

46 2 Bˆ ˙F ¨˚˚˝ a a a a a ac c c c ˛‹‹‚ d RC “ ,d RC “ ,d RC “

06 2 Bˆ ˙F ¨˚˚˝ a a a a a a a a c c c c c ˛‹‹‚ d RC “

46 2

C¨˝ ˛‚G ¨˝ a a a a a a ˛‚ d RC “ ,d RC “ ,d RC “

07 1 @` ˘D ¨˚˚˚˚˚˚˝ a a c c c c

00 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ n k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˚˚˝ a a a a c c c c c c

00 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “

67 1 @` ˘D ¨˚˚˚˚˚˚˝ a a a a a a c c c c c c c c c ˛‹‹‹‹‹‹‚ d RC “

27 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a c c

00 0 0 0 0 0 c ˛‹‹‹‹‚ d RC “ ,d RC “

67 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a a a c c c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “

27 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a a a a a

00 0 c c c c c c ˛‹‹‹‹‚ d RC “

47 3

C¨˝ ˛‚G ¨˚˚˝ a a a a a a

00 0 0 0 0 0 c ˛‹‹‚ d RC “ ,d RC “ ,d RC “ k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a c c c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “

48 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a c c c c c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “

88 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a a a c c c c c c c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “

48 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‚ d RC “

08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ n k Residue code Generator Matrix d mRC Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a c c c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ ,d RC “

48 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ ,d RC “

08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “

08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a a c c c c c c

00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “

48 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a a a ac c c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “

48 3

C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a c

00 0 0 0 0 0 0 c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “ n k Residue code Generator Matrix d mRC C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a ac c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “ ,d RC “

08 3

C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a a a a c c c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “

48 3

C¨˝ ˛‚G ¨˚˚˚˚˚˚˚˚˝ a a a a a a a a

00 0 a a a a c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‚ d RC “

48 3

C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a a a a a a c cc c c c ˛‹‹‹‹‚ d RC “ ,d RC “

48 4

C¨˚˚˝ ˛‹‹‚G ¨˚˚˝ a a a a a a a a ˛‹‹‚ d RC “ ,d RC “ ,d RC “ ,d RC “

08 4

C¨˚˚˝ ˛‹‹‚G ¨˚˚˚˚˚˚˚˚˚˚˝ a a a a a a a a a a a a a a a ac c c c c c c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‹‹‚ d RC “

3] A. Alahmadi, A. Altassan, W. Basaﬀar, A. Bonnecaze, H. Shoaib, P.Sol´e, Type VI codes over a non-unital ring, to appear in Journal ofAlgebra and Its Applications, Available from https://hal.archives-ouvertes.fr/hal-02433480/document [4] A. Alahmadi, A. Altassan, W. Basaﬀar, A. Bonnecaze, H. Shoaib, and P.Sol´e, “Quasi Type IV codes over a non-unital ring,” preprint, Availablefrom https://hal.archives-ouvertes.fr/hal-02544399/document [5] N. Bennenni, K. Guenda, and S. Mesnager, “New DNA Cyclic Codesover Rings,”

Advances in Math. of Comm.,

Journal of Combina-torial Mathematics and Combinatorial Computing,

59 (2006): 33-87.[7] B. Fine, “Classiﬁcation of ﬁnite rings of order p ,” Mathematics Maga-zine,

The-oretical Computer Science, F ` u F for DNA computing,” Applicable Algebra in Engineering, Communica-tion and Computing,

IEEE trans. Inform. Theory,

Elec-tronic J. of Combinatorics

10 (2003): R33.[12] J. Liang and L. Wang, “On cyclic DNA codes over F ` u F ,” Journalof Applied Mathematics and Computing,

The art of DNA strings: sixteenyears of DNA coding theory, arXiv preprint arXiv:1607.00266[14] F. S. MacWilliams and N. J. A. Sloane,

The theory of error correctingcodess