DNA codes over two noncommutative rings of order four
Jon-Lark Kim^{a,*}, Dong Eun Ohk^{a}
^{a} Department of Mathematics, Sogang University, Seoul 04107, South Korea
Abstract
DNA codes based on error-correcting codes have been successful in DNA-based computation and storage. Since there are four nucleobases in DNA, two well-known algebraic structures, the finite field $GF(4)$ and the integer modular ring $\mathbb{Z}_4$, have been used. However, due to the various possibilities of DNA sequences, it is natural to ask whether there are other algebraic structures consisting of four elements. In this paper, we describe a new type of DNA codes over two noncommutative rings $E$ and $F$ of order four with characteristic 2. Our DNA codes are based on quasi self-dual codes over $E$ and $F$. Using quasi self-duality, we can describe fixed GC-content constraint weight distributions and reverse-complement constraint minimum distributions of those codes.

Keywords: Coding theory, DNA codes, quasi self-dual codes, codes over rings
1. Introduction
L. Adleman [1] performed a computation using DNA strands to solve an instance of the Hamiltonian path problem, giving birth to DNA computing. Since then, DNA computing and DNA storage have been developed. This development requires several theories for the construction of DNA sequences satisfying various constraints. Algebraic coding theory has contributed to the construction of DNA codes with constraints [13].

Most constructions of DNA codes use linear codes over the finite field with four elements $GF(4)$ or the integer modular ring $\mathbb{Z}_4$, both of which are well-known commutative rings of order four. It is natural to consider whether there are other finite rings of order four which might produce a new class of DNA codes. According to the literature [7], B. Fine classified the 11 finite rings of order four. We observe that the reverse-complement condition of DNA sequences can be translated as a product of multiplication in a ring and prove that exactly eight rings of order four out of 11 can be studied. In fact, there are some linear codes over rings of order four which are neither $GF(4)$ nor $\mathbb{Z}_4$ [2], [3], [4]. In particular, we construct DNA codes satisfying constraints over the two noncommutative rings $E$ and $F$ in the notation of [7].

This paper consists of six sections. In Section 2, we introduce DNA codes and some definitions of DNA codes. All finite rings of order four are covered and we define some generalized maps on DNA codes. In Sections 3 and 4, we construct quasi self-dual (QSD) DNA codes based on the QSD codes over $E$ which were considered in [3]. We also calculate important values of QSD DNA codes over $E$, including the number of inequivalent codes, the GC-weight distribution and the minimum distance of a fixed GC-content subcode under the reverse-complement constraint. In Section 5, we also define QSD DNA codes over $F$ and compute GC-weight distributions. The classification of QSD DNA codes with $n \le 8$ is given in the conclusion.

* Corresponding author
Email address: [email protected] (Jon-Lark Kim)
Preprint submitted to Elsevier, February 16, 2021
2. Preliminaries
DNA coding theory is concerned with designing nucleic acid systems using error-correcting codes. DNA, deoxyribonucleic acid, is a molecule composed of double strands built by pairing the four units Adenine, Thymine, Guanine, and Cytosine, denoted by $A$, $T$, $G$ and $C$ respectively, which are called nucleotides. These nucleotides are joined in chains which are bound together with hydrogen bonds. $A$ and $T$ have 2 hydrogen bonds while $G$ and $C$ have 3 hydrogen bonds. Thus these joints make complementary base pairings, which are $\{A, T\}$ and $\{G, C\}$. This is called the Watson-Crick complement. We denote it by $A^C = T$ and $G^C = C$, or equivalently $T^C = A$ and $C^C = G$. So this complement map is a bijection on the set $\{A, T, G, C\}$.

A DNA sequence is a sequence of the nucleotides. The ends of a DNA sequence are chemically polar with $5'$ and $3'$ ends, which implies that the strands are oriented. Given a sequence with the orientation $5' \to 3'$, the reverse complement is involved naturally. For instance, the DNA sequence $5'-TCGGCAACATG-3'$ has the complement $3'-AGCCGTTGTAC-5'$. If we arrange these sequences to have the same orientation, then there are two sequences

$5'-TCGGCAACATG-3'$ and $5'-CATGTTGCCGA-3'$.

Note that one sequence is the reverse complement of the other.

A DNA sequence means one strand of DNA. A set of DNA strands is needed for DNA computing. Thus we define a DNA code as a fixed set of sequences consisting of $A, T, G, C$, which are also called codewords.
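The complement and reverse-complement operations described above can be sketched in a few lines of Python. This is only an illustration on plain strings; the function names are chosen here and do not come from the paper.

```python
# Watson-Crick pairing: A <-> T and G <-> C.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Complement every nucleotide, keeping the order of the strand."""
    return "".join(WATSON_CRICK[base] for base in seq)


def reverse(seq: str) -> str:
    """Read the strand in the opposite orientation."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Complement of the reversed strand (equal to the reversed complement)."""
    return complement(reverse(seq))


strand = "TCGGCAACATG"
print(complement(strand))          # AGCCGTTGTAC
print(reverse_complement(strand))  # CATGTTGCCGA
```

The two printed strings are the complement and the reverse complement of the example strand used in the text.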
Definition 1. An $(n, M)$ DNA code $C$ is a set of codewords of length $n$ and size $M$ over the four-letter alphabet $\{A, T, G, C\}$. A DNA codeword, or a DNA sequence, is a codeword of a DNA code.
In general, a DNA code does not need to have an algebraic structure. However, some DNA computation and DNA storage applications require error correction. Furthermore, to use algebraic coding theory we expect the set of DNA sequences to have an algebraic structure. Therefore, in DNA coding theory, we identify the set $\{A, T, G, C\}$ with a ring of order 4. Since there are 4 types of nucleotides, DNA codes can be constructed from algebraic structures with 4 elements such as the finite field $GF(4)$ or the finite ring $\mathbb{Z}_4$. In fact, any code over 4 elements can be a DNA code, but without such a structure its properties are difficult to analyze.

Definition 2.
Let $x = (x_1 x_2 \cdots x_n)$ be given (i.e., $x_i \in \{A, T, G, C\}$). The reverse of $x$, denoted by $x^R$, is the codeword $(x_n x_{n-1} \cdots x_1)$. The complement of $x$, denoted by $x^C$, is the codeword $(x_1^C x_2^C \cdots x_n^C)$. The reverse complement of $x$ is $x^{RC} = (x^R)^C = (x^C)^R$.

We can easily check that $(x^R)^C = (x^C)^R$ for any DNA sequence $x$. Using these definitions, we can give constraints on DNA codes.

Definition 3.
Let $C$ be a DNA code, and let $d_H$ be the Hamming distance. The code $C$ has the reverse constraint if there exists $d \ge 0$ such that $d_H(x^R, y) \ge d$ for all $x, y \in C$. The code $C$ has the reverse-complement constraint if there exists $d \ge 0$ such that $d_H(x^{RC}, y) \ge d$ for all $x, y \in C$.

Note that the reverse map $R : F^n \to F^n$ is not a linear map ($F$ is a 4-element ring). This reverse map is a permutation, so it is not independent of permutation equivalence. For example, let $C = \{ATTC, CGGA\}$. Then $(ATTC)^R = CTTA$ and $(CGGA)^R = AGGC$, so that $d_H(x^R, y) \ge 2$ for all $x, y \in C$. The permuted code $C' = \{ATCT, CGAG\}$ has $\min\{d_H(x^R, y)\} = 4$, since $C'^R = \{TCTA, GAGC\}$. So for these reverse constraints, we do not consider permutation equivalence.

In genetics, it is required to compute the GC-content. The GC-content is the percentage of $G$ and $C$ in a DNA strand. Since a $GC$ pair is held by 3 hydrogen bonds and an $AT$ pair is held by 2 hydrogen bonds, DNA with high GC-content is more stable than DNA with low GC-content. On the other hand, if the GC-content is too high, DNA replication becomes difficult. Therefore we need to set a proper GC-content. In DNA coding theory, we define the GC-content as the number of coordinates equal to $G$ or $C$.

Definition 4.
Let $C$ be a DNA code and $x$ be a codeword in $C$. The GC-content of $x$ is the number of $G$ and $C$ in $x$. The DNA code $C$ has a fixed GC-content constraint if each codeword in $C$ has the same GC-content.

Many codes do not satisfy the fixed GC-content constraint. For the fixed GC-content constraint, we need to determine the set of codewords which have the same GC-content. Therefore we need the GC-weight enumerator.

Definition 5.
Let $C$ be a code over four elements $\{a_1, a_2, a_3, a_4\}$ and let $\mathcal{C}$ be a DNA code. The complete weight enumerator of the code $C$, $CWE_C(w, x, y, z)$, is defined by

$$CWE_C(w, x, y, z) = \sum_{c \in C} w^{n_{a_1}(c)} x^{n_{a_2}(c)} y^{n_{a_3}(c)} z^{n_{a_4}(c)},$$

where $n_\alpha(c)$ is the number of occurrences of $\alpha \in \{a_1, a_2, a_3, a_4\}$ in a codeword $c$. The GC-weight enumerator of the code $\mathcal{C}$, $GCW_{\mathcal{C}}(x, y)$, is the weight enumerator that counts the number of coordinates in $\{G, C\}$ and $\{A, T\}$, which is defined by

$$GCW_{\mathcal{C}}(x, y) = CWE_{\mathcal{C}}(x, x, y, y) = \sum_{c \in \mathcal{C}} x^{n_G(c)} x^{n_C(c)} y^{n_A(c)} y^{n_T(c)}.$$

We can get the size of a subcode $\mathcal{C}_k$ which has a fixed GC-content using the polynomial $GCW_{\mathcal{C}}(x, y)$. If $GCW_{\mathcal{C}}(x, y) = \sum a_i x^i y^{n-i}$, then the subcode has order $|\mathcal{C}_k| = a_k$, where $\mathcal{C}_k = \{c \in \mathcal{C} \mid c \text{ has a fixed GC-content } k \text{ (or } n - k)\}$.

To date, most DNA codes are constructed using $GF(4)$ or $\mathbb{Z}_4$. However, B. Fine classified rings of order $p^2$ up to isomorphism, and so there are 11 finite rings of 4 elements [7]. It is possible to construct DNA codes using other finite rings. The main goal of this paper is to construct DNA codes over rings other than $GF(4)$ and $\mathbb{Z}_4$, especially the ring $E$.

Ring   Presentation                                                                    Char
A      $\langle a;\ 4a = 0,\ a^2 = a \rangle$                                          4
B      $\langle a;\ 4a = 0,\ a^2 = 2a \rangle$                                         4
C      $\langle a;\ 4a = 0,\ a^2 = 0 \rangle$                                          4
D      $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = ba = 0 \rangle$          2
E      $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = a,\ ba = b \rangle$      2
F      $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = b,\ ba = a \rangle$      2
G      $\langle a, b;\ 2a = 2b = 0,\ a^2 = 0,\ b^2 = b,\ ab = ba = a \rangle$          2
H      $\langle a, b;\ 2a = 2b = 0,\ a^2 = 0,\ b^2 = b,\ ab = ba = 0 \rangle$          2
I      $\langle a, b;\ 2a = 2b = 0,\ a^2 = b,\ ab = 0 \rangle$                         2
J      $\langle a, b;\ 2a = 2b = 0,\ a^2 = b^2 = 0 \rangle$                            2
K      $\langle a, b;\ 2a = 2b = 0,\ a^2 = a,\ b^2 = a + b,\ ab = ba = b \rangle$      2

Table 1: Classification table of finite rings of order 4
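As a small illustration of how the presentations in Table 1 determine a ring, the following Python sketch builds the multiplication of $E$ and $F$ from the four products of the generators and checks two basic properties. It is a sketch under stated assumptions: elements are stored as coordinate pairs $(s, t)$ standing for $s \cdot a + t \cdot b$ with $c = a + b$, and the helper names (`make_mul`, `is_commutative`, `has_identity`) are hypothetical.

```python
# Elements of a characteristic-2 ring of order 4, written as s*a + t*b.
ELEMS = {"0": (0, 0), "a": (1, 0), "b": (0, 1), "c": (1, 1)}
NAMES = {coords: name for name, coords in ELEMS.items()}


def add(x, y):
    """Addition is coordinatewise XOR because 2a = 2b = 0."""
    (s1, t1), (s2, t2) = ELEMS[x], ELEMS[y]
    return NAMES[(s1 ^ s2, t1 ^ t2)]


def make_mul(gen_products):
    """Extend the products of the generators a, b bilinearly to all elements."""
    def mul(x, y):
        result = "0"
        for g, s in zip("ab", ELEMS[x]):
            for h, t in zip("ab", ELEMS[y]):
                if s and t:
                    result = add(result, gen_products[(g, h)])
        return result
    return mul


# Relations taken from the rows E and F of Table 1.
mul_E = make_mul({("a", "a"): "a", ("a", "b"): "a", ("b", "a"): "b", ("b", "b"): "b"})
mul_F = make_mul({("a", "a"): "a", ("a", "b"): "b", ("b", "a"): "a", ("b", "b"): "b"})


def is_commutative(mul):
    return all(mul(x, y) == mul(y, x) for x in ELEMS for y in ELEMS)


def has_identity(mul):
    return any(all(mul(e, x) == x == mul(x, e) for x in ELEMS) for e in ELEMS)


print(is_commutative(mul_E), is_commutative(mul_F))  # False False
print(has_identity(mul_E), has_identity(mul_F))      # False False
```

Under this encoding the product in $F$ is the product in $E$ with the factors swapped, which is why left multiplication in $E$ and right multiplication in $F$ play the same role later on.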
Proposition 2.1.
The following propositions hold.
(1) $A \cong \mathbb{Z}_4$ and $K \cong GF(4)$, which have a multiplicative identity. The other rings do not have a multiplicative identity.
(2) $D \cong (\mathbb{Z}_2 + \mathbb{Z}_2) \cong \mathbb{Z}_2[u]/(u^2 - u)$ and $G \cong \mathbb{Z}_2[u]/(u^2 - 1) \cong \mathbb{Z}_2[u]/(u^2)$.
(3) In the above rings, only $E$ and $F$ are noncommutative rings.
(4) Any product of two elements in $C$ or $J$ is zero.

Char 4:
+    0    a    2a   3a
0    0    a    2a   3a
a    a    2a   3a   0
2a   2a   3a   0    a
3a   3a   0    a    2a

Char 2:
+    0    a    b    c
0    0    a    b    c
a    a    0    c    b
b    b    c    0    a
c    c    b    a    0

Table 2: Addition tables of the rings of order 4; char 4 and char 2

As mentioned above, most DNA codes are constructed using $A \cong \mathbb{Z}_4$ or $K \cong GF(4)$. J. Liang and L. Wang constructed cyclic DNA codes using the ring $\mathbb{F}_2 + u\mathbb{F}_2$ with $u^2 = 0$ [12]. This ring satisfies $\mathbb{F}_2 + u\mathbb{F}_2 \cong G$. K. Guenda and T. Gulliver constructed DNA codes over the same ring $\mathbb{F}_2 + u\mathbb{F}_2$ with $u^2 = 0$, and the ring $\mathbb{F}_2[u]/(u^2 - 1) \cong G$ has also been used [16]. N. Bennenni et al. introduced other cyclic DNA codes over the ring $\mathbb{F}_2 + v\mathbb{F}_2$ with $v^2 = v$ [5]. This ring satisfies $\mathbb{F}_2 + v\mathbb{F}_2 \cong D$. Even though there are some papers on algebraic codes over the rings $E$, $H$ and $I$, DNA codes over those rings have not been constructed. Thus we focus on the other rings.

Before constructing DNA codes over rings, we need to define maps which make it easy to calculate the complement and the GC-content. This means that the complement map and the GC-content map should be defined over finite rings.

Definition 6.
Let $R$ be a ring of order 4 and let $f : \{A, T, C, G\} \to R$ be a proper representation map, meaning that $f$ is bijective. A complement map $\varphi$ over $R$ is a bijection defined by $\varphi(f(x)) = f(x^C)$.

We can check that $\varphi(x) \ne x$ and $\varphi^2(x) = x$ since $x^C \ne x$ and $(x^C)^C = x$. We denote $\varphi$ by $\varphi(x) = x^C$. This $\varphi$ is a bijection on $R$, so we can define this map easily. The question is whether a simple definition of $\varphi$ exists. To be specific, we want to find an element $\alpha \in R$ satisfying $x^C = x + \alpha$. By the addition tables of the rings, we can find such $\alpha$ in any finite ring of order 4. If the finite ring has characteristic 4, define $x^C = x + 2a$. If the finite ring has characteristic 2, any nonzero element of $R$ can be $\alpha$.

Let $C$ be an additive code over $R$, which is a ring of order 4, and let $\mathbf{x} = (\alpha \alpha \cdots \alpha)$. We can calculate $y^C$ by $y^C = y + \mathbf{x}$. If the codeword $\mathbf{x} \in C$, then the code $C^C = \{c^C \mid c \in C\}$ is the same as the original code $C$. So it implies that the reverse-complement constraint and the reverse constraint are equal in $C$.

Definition 7.
Let $R$ be a ring of order 4. A GC-content map $\psi : R \to GF(2)$ is a function defined by

$$\psi(x) = \begin{cases} 1, & \text{if } f^{-1}(x) = C \text{ or } G \\ 0, & \text{otherwise} \end{cases}$$

where $f : \{A, T, C, G\} \to R$ is a bijection.

The map $\psi$ can be defined as $\psi : R \to A$ where $A = \{0, r\}$ and $r\ (\ne 0) \in R$. This definition can be extended to $\psi : R \to A \hookrightarrow GF(2)$. For the GC-content map we also want a simple definition of the form $\psi(x) = \beta x$ for some $\beta \in R$.

Proposition 2.2.
We can define the natural GC-content map over the finite rings of order 4, except the rings $C$, $J$ and $K$.

Proof. Note that this $\beta$ satisfies $\beta x = \beta y$ for some $x \ne y$. If we define $\psi(x) = \beta x$ as follows

$$\beta = \begin{cases} 2a, & \text{ring } A, \text{ and } 0^C = 2a \\ a \text{ or } 3a, & \text{ring } B, \text{ and } 0^C = 2a \\ a, & \text{ring } D, \text{ and } 0^C = b \\ a \text{ or } b \text{ or } c, & \text{ring } E, \text{ and } 0^C = c \\ a, & \text{ring } G, \text{ and } 0^C = a \\ b \text{ or } c, & \text{ring } H, \text{ and } 0^C = a \\ a \text{ or } c, & \text{ring } I, \text{ and } 0^C = b \end{cases}$$

then the map $\psi$ is well-defined.

In the ring $F$, left multiplication by $a$ or $b$ is injective and left multiplication by $c$ is identically zero, so there is no element $\beta$ for which $\psi(x) = \beta x$ is a GC-content map. However, if we define $\psi(x) = x\beta$ where $\beta = a$ or $b$ or $c$ and $0^C = c$, then this $\psi$ is well-defined.

Since the rings $C$ and $J$ satisfy $x \cdot y = 0$ for all $x, y$, the element $\beta$ and the map $\psi$ satisfying $\psi(x) = \beta x$ do not exist.

The ring $K$ is a field, so there is no element $\beta$ satisfying $\beta x = \beta y$ for some $x \ne y$, except zero.

Therefore the rings $C$, $J$ and $K$ do not have the natural GC-content map.

The complement map $\varphi$ and the GC-content map $\psi$ can be defined on a DNA code $C(n, M)$ and its codeword $x = (x_1 \cdots x_n)$ by $\varphi(x) = (\varphi(x_1) \cdots \varphi(x_n))$ and $\psi(x) = (\psi(x_1) \cdots \psi(x_n))$. Therefore $\varphi(C)$ is another DNA code and $\psi(C)$ is a binary code. From now on let $\varphi$ and $\psi$ be the maps on a code. Note that the Hamming weight $d_H(\psi(x))$ is the number of $G$s and $C$s in the codeword $x$. So it is natural that $\psi$ is called the GC-content map.
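The cases for the rings $E$ and $F$ in the proof can be checked by brute force. The Python sketch below is illustrative only: the multiplication tables are written out by hand from the presentations in Table 1 (they agree with Tables 3 and 4), and the function `gc_multipliers` is a hypothetical helper that lists every $\beta$ for which multiplication by $\beta$ sends $\{0, c\}$ to $0$ and merges $a$ with $b$.

```python
ELEMS = "0abc"
# TABLE[x][j] is the product of x with the j-th element of "0abc".
MUL_E = {"0": "0000", "a": "0aa0", "b": "0bb0", "c": "0cc0"}
MUL_F = {"0": "0000", "a": "0abc", "b": "0abc", "c": "0000"}


def mul(table, x, y):
    return table[x][ELEMS.index(y)]


def fibers(func):
    """Partition the ring elements by their image under func."""
    groups = {}
    for x in ELEMS:
        groups.setdefault(func(x), set()).add(x)
    return {frozenset(group) for group in groups.values()}


# Complement pairing for 0^C = c, i.e. x^C = x + c: {0, c} and {a, b}.
PAIRING = {frozenset("0c"), frozenset("ab")}


def gc_multipliers(table, side):
    """Nonzero beta for which x -> beta*x (left) or x -> x*beta (right)
    is constant exactly on the complement pairs."""
    good = []
    for beta in "abc":
        if side == "left":
            func = lambda x, b=beta: mul(table, b, x)
        else:
            func = lambda x, b=beta: mul(table, x, b)
        if fibers(func) == PAIRING:
            good.append(beta)
    return good


print(gc_multipliers(MUL_E, "left"))   # ['a', 'b', 'c']
print(gc_multipliers(MUL_F, "left"))   # []
print(gc_multipliers(MUL_F, "right"))  # ['a', 'b', 'c']
```

The output matches the proof: every nonzero $\beta$ works on the left in $E$, none works on the left in $F$, and every nonzero $\beta$ works on the right in $F$.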
3. Quasi self-dual codes over E

Following Alahmadi et al., a quasi self-dual code over the ring $E$ is defined [3]. Recall that the ring $E$ is defined by two generators $a$ and $b$ with the relations

$$E = \langle a, b \mid 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = a,\ ba = b \rangle.$$

Its multiplication table is given as follows.

×    0    a    b    c
0    0    0    0    0
a    0    a    a    0
b    0    b    b    0
c    0    c    c    0

Table 3: Multiplication table of the ring E

Since the ring $E$ is noncommutative, we first should define a linear $E$-code. A linear $E$-code is a one-sided $E$-submodule of $E^n$. Define an inner product on $E^n$ as $(x, y) = \sum_{i=1}^{n} x_i y_i$ where $x, y \in E^n$. (The product is the multiplication in $E$.)

Definition 8 ([3]). Let $C$ be a linear $E$-code. The right dual $C^{\perp_R}$ of $C$ is the right module $C^{\perp_R} = \{y \in E^n \mid \forall x \in C,\ (x, y) = 0\}$. The left dual $C^{\perp_L}$ of $C$ is the left module $C^{\perp_L} = \{y \in E^n \mid \forall x \in C,\ (y, x) = 0\}$. The code $C$ is left self-dual (resp. right self-dual) if $C = C^{\perp_L}$ (resp. $C = C^{\perp_R}$). The code $C$ is self-dual if it is both left and right self-dual. The code $C$ is self-orthogonal if $\forall x, y \in C,\ (x, y) = 0$. A quasi self-dual (QSD) code is a self-orthogonal code of size $2^n$.

The ring $E$ is local with maximal ideal $J = \{0, c\}$, and its residue field is $E/J \cong GF(2)$. Thus any element $e \in E$ can be written $e = as + ct$ where $s, t \in \{0, 1\} = GF(2)$, with a natural action of $GF(2)$ on $E$. Denote by $r : E \to E/J \cong GF(2)$ the map of reduction modulo $J$. Thus $r(0) = r(c) = 0$ and $r(a) = r(b) = 1$. Then this map $r$ can be the GC-content map $\psi$. Define $f : \{A, T, G, C\} \to E$ by $f(A) = 0$, $f(T) = c$, $f(G) = a$ and $f(C) = b$. Then $r(f(C)) = r(f(G)) = 1$, and the others go to 0. We can check that $\psi(x) = ax$ satisfies $0^C = c$ and $\psi(0) = \psi(c) = 0$, $\psi(a) = \psi(b) = a$. Moreover $\mathrm{Im}(\psi) = \{0, a\} \cong GF(2)$. Therefore $\psi \cong r$. So now let $\psi = r$. Then it can be extended from $E^n$ to $GF(2)^n$. Since the pairing is given by $\{A, T\} \to \{0, c\}$ and $\{G, C\} \to \{a, b\}$, we should define $x^C = x + c$.

Definition 9 ([3]). Let $C$ be a code of length $n$ over $E$. The residue code of $C$ is $\mathrm{res}(C) = \{\psi(y) \in GF(2)^n \mid y \in C\}$. The torsion code of $C$ is $\mathrm{tor}(C) = \{x \in GF(2)^n \mid cx \in C\}$. Both codes are binary codes.
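To make the identification concrete, here is a minimal Python sketch of the map $f$, the complement $x^C = x + c$ and the GC-content map $\psi = r$. The coordinate encoding $e = as + ct \mapsto (s, t)$ follows the decomposition above; the function names are assumptions made for this illustration.

```python
# Coordinates of e = a*s + c*t in E, with s, t in GF(2); note b = a + c.
COORDS = {"0": (0, 0), "a": (1, 0), "c": (0, 1), "b": (1, 1)}
NAME = {st: e for e, st in COORDS.items()}
# The bijection f of this section: A -> 0, T -> c, G -> a, C -> b.
TO_RING = {"A": "0", "T": "c", "G": "a", "C": "b"}
TO_DNA = {e: base for base, e in TO_RING.items()}


def add(x, y):
    """Addition in E is coordinatewise XOR (characteristic 2)."""
    (s1, t1), (s2, t2) = COORDS[x], COORDS[y]
    return NAME[(s1 ^ s2, t1 ^ t2)]


def psi(x):
    """Reduction modulo J = {0, c}: keeps only the coefficient s."""
    return COORDS[x][0]


def encode(dna):
    return [TO_RING[base] for base in dna]


def decode(word):
    return "".join(TO_DNA[e] for e in word)


def complement(dna):
    """Watson-Crick complement computed in E as x -> x + c."""
    return decode(add(e, "c") for e in encode(dna))


def gc_content(dna):
    """Hamming weight of psi applied coordinatewise."""
    return sum(psi(e) for e in encode(dna))


strand = "TCGGCAACATG"
print(complement(strand))  # AGCCGTTGTAC
print(gc_content(strand))  # 6
```

Adding $c$ flips only the coordinate $t$, so it swaps $0 \leftrightarrow c$ and $a \leftrightarrow b$, which is exactly the pairing $\{A, T\}$, $\{G, C\}$.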
Theorem 3.1 ([3]). If $C$ is a QSD code over $E$, then $C = a\,\mathrm{res}(C) \oplus c\,\mathrm{tor}(C)$ as modules.

Theorem 3.2 ([3]). For any QSD $E$-linear code $C$, we have
(1) $\mathrm{res}(C) \subseteq \mathrm{res}(C)^\perp$,
(2) $\mathrm{tor}(C) = \mathrm{res}(C)^\perp$,
(3) $|C| = 2^{\dim(\mathrm{res}(C)) + \dim(\mathrm{tor}(C))}$.

We can construct QSD $E$-codes by the above theorems.

Theorem 3.3 ([3]). Let $B$ be a self-orthogonal binary $[n, k]$ code. The code $C$ over the ring $E$ defined by the relation $C = aB + cB^\perp$ is a QSD code. Its residue code is $B$ and its torsion code is $B^\perp$.

By the above theorem, the classification of QSD $E$-codes is equivalent to the classification of their residue codes. Moreover, two QSD codes $C$ and $C'$ are permutation equivalent if and only if their residue codes are permutation equivalent. Therefore we get the following corollary.

Corollary 3.4.
Let $N(n, k)$ be the number of inequivalent QSD codes over $E$ where $n$ is the length and $k$ is the dimension of their residue codes. Then $N(n, k) = \Psi(n, k)$ where $\Psi(n, k)$ is the number of inequivalent binary self-orthogonal $[n, k]$ codes.

We need the classification of inequivalent binary self-orthogonal codes. Hou et al. classified the case when $k \le \ldots$ and $n \le 40$ [10]. Pless classified the case when $k = n/2$ and $n \le 20$ [15].
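The construction $C = aB + cB^\perp$ of Theorem 3.3 can be tried out on a small example. The following Python sketch is only an illustration: it uses the binary code $B = \langle (1,1,0,0) \rangle$ as an assumed example, computes $B^\perp$ by exhaustive search, and checks the size and self-orthogonality conditions with the multiplication table of $E$. All helper names are hypothetical.

```python
from itertools import product

# Ring E in coordinates: e = a*s + c*t with s, t in GF(2).
COORDS = {"0": (0, 0), "a": (1, 0), "c": (0, 1), "b": (1, 1)}
NAME = {st: e for e, st in COORDS.items()}
# Multiplication table of E: MUL_E[x][j] is x times the j-th element of "0abc".
MUL_E = {"0": "0000", "a": "0aa0", "b": "0bb0", "c": "0cc0"}


def add(x, y):
    (s1, t1), (s2, t2) = COORDS[x], COORDS[y]
    return NAME[(s1 ^ s2, t1 ^ t2)]


def mul(x, y):
    return MUL_E[x]["0abc".index(y)]


def inner(x, y):
    """Inner product (x, y) = sum of x_i * y_i, computed in E."""
    total = "0"
    for xi, yi in zip(x, y):
        total = add(total, mul(xi, yi))
    return total


def span(gens, n):
    """All GF(2)-linear combinations of the generator rows."""
    words = {(0,) * n}
    for g in gens:
        words |= {tuple(p ^ q for p, q in zip(w, g)) for w in words}
    return words


def dual(code, n):
    """Binary dual code, found by exhaustive search (fine for small n)."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(p * q for p, q in zip(v, u)) % 2 == 0 for u in code)}


def qsd_code(B, n):
    """C = aB + cB^perp as a set of words over the alphabet 0, a, b, c."""
    return {"".join(NAME[(s, t)] for s, t in zip(u, v))
            for u in B for v in dual(B, n)}


n = 4
B = span([(1, 1, 0, 0)], n)  # self-orthogonal binary [4, 1] code
C = qsd_code(B, n)
print(len(C) == 2 ** n)                               # size condition
print(all(inner(x, y) == "0" for x in C for y in C))  # self-orthogonality
```

Both checks print `True` for this example, and reading off the $s$-coordinates of the codewords recovers $B$ as the residue code.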
Lemma 3.5. The number of inequivalent binary self-orthogonal $[n, k, 2]$ codes is the number of inequivalent binary self-orthogonal $[n-2, k-1]$ codes.

Proof. Let $C$ be a binary self-orthogonal $[n, k, 2]$ code. Take $x \in C$ with $wt(x) = 2$. Then we can puncture the positions of the nonzero coordinates of $x$ and obtain a self-orthogonal $[n-2, k-1]$ code. Conversely, we can get $[n, k, 2]$ codes from $[n-2, k-1]$ codes by adding an extra vector of weight 2. Obviously, if two codes are equivalent, then the induced codes are also equivalent.

We can calculate the number of self-orthogonal $[14, 6]$ codes and self-orthogonal $[15, 6]$ codes using the lemma and the paper of I. Bouyukliev [6].

Lemma 3.6. The number of inequivalent binary self-orthogonal $[14, 6]$ codes is 27. The number of inequivalent binary self-orthogonal $[15, 6]$ codes is 48.

Proof. Note that the largest minimum distance of binary $[14, 6]$ codes is 5. So we can let $d \le 4$, where $d$ is the minimum distance. Since binary self-orthogonal codes have only even weights, we consider the cases $d = 2$ and $d = 4$. By the above lemma, the number of self-orthogonal $[14, 6, 2]$ codes is the number of self-orthogonal $[12, 5]$ codes, that is 15. By the paper of I. Bouyukliev, there exist twelve self-orthogonal $[14, 6, 4]$ codes [6]. Hence the number of self-orthogonal $[14, 6]$ codes is $15 + 12 = 27$. In the same way we count the self-orthogonal $[15, 6, 2]$ codes and $[15, 6, 4]$ codes. They are 23 and 25, respectively. Even though there exists a $[15, 6, 6]$ code, it cannot be self-orthogonal. So the number of $[15, 6]$ codes is $23 + 25 = 48$.

Values of $N(n, k)$:

n \ k    0    1    2    3    4    5    6    7
11       1    5   10   14   12    4
12       1    6   16   26   28   15    3
13       1    6   16   30   36   23    6
14       1    7   23   51   75   61   27    4
15       1    7   23   58   98   94   48   10
4. Quasi self-dual DNA codes over E

Theorem 4.1 ([3]). Let $C$ be a QSD code over $E$. Then

$$CWE_C(w, x, y, z) = J(\mathrm{res}(C), \mathrm{tor}(C))(w, x, y, z)$$

where $J(A, B)$ of two binary linear codes $A, B$ is the joint weight enumerator defined by

$$J(A, B)(w, x, y, z) = \sum_{u \in A,\, v \in B} w^{i(u,v)} x^{j(u,v)} y^{k(u,v)} z^{l(u,v)},$$

and $i, j, k, l$ are the numbers of indices $\iota \in \{1, \cdots, n\}$ with $(u_\iota, v_\iota) = (0,0), (0,1), (1,0)$ and $(1,1)$, respectively.

Theorem 4.2. Let $C$ be a QSD code over $E$. Then

$$GCW_C(x, y) = \sum_{i=0}^{n} 2^{n-k} A_i(\mathrm{res}(C))\, x^i y^{n-i}$$

where $2^n = |C|$, $k = \dim(\mathrm{res}(C))$ and $A_i(\mathrm{res}(C))$ is the binary weight distribution of $\mathrm{res}(C)$.

Proof. By MacWilliams [14],

$$W_{\mathrm{res}(C)}(x, y) = \frac{1}{2^{\dim(\mathrm{tor}(C))}} J(\mathrm{res}(C), \mathrm{tor}(C))(x, x, y, y).$$

So

$$GCW_C(x, y) = CWE_C(x, x, y, y) = J(\mathrm{res}(C), \mathrm{tor}(C))(x, x, y, y) = 2^{\dim(\mathrm{tor}(C))} W_{\mathrm{res}(C)}(x, y) = 2^{n-k} W_{\mathrm{res}(C)}(x, y).$$

For example, let $\mathrm{res}(C) = \langle (1 \cdots 1\, 0 \cdots 0) \rangle$, where the codeword has $m$ ones and $n - m$ zeros ($m$ is even). Then $\mathrm{res}(C)$ is a 1-dimensional code, therefore

$$GCW_C(x, y) = 2^{n-1} x^m y^{n-m} + 2^{n-1} y^n.$$

We get $(n, 2^{n-1})$ DNA codes which have fixed GC-content $m$ and $0$, respectively.

We want to give reverse (and reverse-complement) constraints for QSD DNA codes. Since any residue code contains the zero vector, its QSD DNA code contains the vector $(cc \cdots c)$. Since the complement map is defined by $x^C = x + c$ in the ring $E$, $x^C \in C$ for any $x \in C$ and for any QSD DNA code $C$. In particular $\mathbf{0}^{RC} = (cc \cdots c) \in C$, so $\min\{d_H(x^{RC}, y) \mid x, y \in C\} = 0$. Thus we need to give reverse-complement constraints to a subcode which has a fixed GC-content.

Definition 10.
Let $C$ be a QSD DNA code of length $n$. Let $C_m$ be the subcode of $C$ which has fixed GC-content $m$. This $C_m$ has the permutation-equivalent codes $P_m = \{\sigma(C_m) \mid \sigma \in S_n\}$. Define

$$d^m_{RC} := \max_{C' \in P_m} \{d_{C'}\}, \quad \text{where } d_{C'} = \min\{d_H(x^{RC}, y) \mid x, y \in C'\}.$$
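For short lengths, $d^m_{RC}$ can be computed directly from this definition. The Python sketch below does so for QSD codes whose residue code is generated by a single vector of even weight $m$. It rests on an assumption made for this illustration: permuting $C_m$ amounts to choosing the support of the residue generator, so the maximum over $P_m$ is taken over all supports of size $m$. Codewords are stored as tuples of coordinate pairs $(s, t)$ with $e = as + ct$; the function names are hypothetical.

```python
from itertools import combinations, product


def subcode(support, n):
    """Codewords a*u + c*v, with u the indicator vector of `support`
    and v running over the binary dual of <u>."""
    u = tuple(1 if i in support else 0 for i in range(n))
    words = []
    for v in product((0, 1), repeat=n):
        if sum(ui * vi for ui, vi in zip(u, v)) % 2 == 0:
            words.append(tuple(zip(u, v)))
    return words


def reverse_complement(word):
    """Reverse the coordinates, then add c, which flips the coordinate t."""
    return tuple((s, t ^ 1) for s, t in reversed(word))


def hamming(x, y):
    return sum(p != q for p, q in zip(x, y))


def d_code(words):
    """d_C = min d_H(x^RC, y) over all pairs of codewords."""
    return min(hamming(reverse_complement(x), y) for x in words for y in words)


def d_rc(n, m):
    """Maximum of d_C over all placements of the m ones."""
    return max(d_code(subcode(set(sup), n)) for sup in combinations(range(n), m))


for n in range(2, 7):
    print(n, {m: d_rc(n, m) for m in range(2, n + 1, 2)})
```

For these lengths the computed values equal $2\min\{m, n - m\}$, for instance $d^2_{RC} = 4$ for $n = 4$ and $d^4_{RC} = 2$ for $n = 5$. The search is exponential in $n$, so it is only meant for lengths of this size.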
It is clear that $d^0_{RC} = 0$, since $(cc \cdots c)$ is in $C$.

Theorem 4.3.
Let $C$ be a QSD code over $E$ satisfying $\mathrm{res}(C) = \langle a_1 \rangle = \langle (1 \cdots 1\, 0 \cdots 0) \rangle$ where $d_H(a_1) = m$ ($m$ is even). Then $d^m_{RC} = 2\min\{m, n - m\}$.

Proof. Take $\sigma_1, \sigma_2 \in S_n$. Suppose that

$$d_H(\sigma_1(a_1), \sigma_1(a_1)^R) \le d_H(\sigma_2(a_1), \sigma_2(a_1)^R).$$

Denote $\sigma_1(a_1) = (x_1 \cdots x_n)$ where $x_i \in GF(2)$. Then $d_H(\sigma_1(a_1), \sigma_1(a_1)^R)$ is the number of $x_i$'s with $x_i \ne x_{n+1-i}$. We claim that $d_{\sigma_1(C_m)} = d_H(\sigma_1(a_1), \sigma_1(a_1)^R)$.

If $x_j \ne x_{n+1-j}$ for some $j$, then $a x_j + c t_1 \ne a x_{n+1-j} + c t_2$ for any $t_1, t_2 \in GF(2)$ (since $a(x_j + x_{n+1-j}) = a \ne c(t_1 + t_2)$). So $d_H(x^{RC}, y) \ge 2$ for any $x, y \in \sigma_1(C_m)$ (any such $x, y$ is generated by $\sigma_1(a_1)$, and so the difference appears in the $j$-th and $(n+1-j)$-th positions). If there exist $k$ coordinates $j$ which satisfy $x_j \ne x_{n+1-j}$, then $d_H(x^{RC}, y) \ge k$ for any $x, y \in \sigma_1(C_m)$. Therefore $d_{\sigma_1(C_m)} = d_H(\sigma_1(a_1), \sigma_1(a_1)^R)$. This claim means that we can get the minimum distance of the subcode $\sigma(C_m)$ using the distance of $\sigma(a_1)$.

Then by the assumption we get $d_{\sigma_1(C_m)} \le d_{\sigma_2(C_m)}$. Therefore we need to increase the number of $x_i$'s satisfying $x_i \ne x_{n+1-i}$. If $m < n/2$, then we can take $\sigma \in S_n$ with $\sigma(a_1) = a_1 = (1 \cdots 1\, 0 \cdots 0)$, so that there are $2m$ positions of $x_i$'s satisfying $x_i \ne x_{n+1-i}$. Thus $d_{\sigma(C_m)} = 2m$. If $m \ge n/2$, then $\sigma(a_1) = a_1 = (1 \cdots 1\, 0 \cdots 0)$ also has $2(n - m)$ positions of $x_i$'s satisfying $x_i \ne x_{n+1-i}$, so that $d_{\sigma(C_m)} = 2(n - m)$. Therefore

$$d^m_{RC} = \begin{cases} 2m, & \text{if } m < n/2 \\ 2(n - m), & \text{if } m \ge n/2. \end{cases}$$

If $2m < 2(n - m)$, then $m < n/2$, so $d^m_{RC} = 2m$. If $2m \ge 2(n - m)$, then $m \ge n/2$, so $d^m_{RC} = 2(n - m)$. Thus $d^m_{RC} = 2\min\{m, n - m\}$.

Theorem 4.4.
Let $C$ be a QSD code over $E$ satisfying

$$\mathrm{res}(C) = \left\langle \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \right\rangle = \left\langle \begin{pmatrix} 1 \cdots 1 & 0 \cdots 0 & 0 \cdots 0 \\ 0 \cdots 0 & 1 \cdots 1 & 0 \cdots 0 \end{pmatrix} \right\rangle$$

where $d_H(a_1) = m_1$ and $d_H(a_2) = m_2$ ($m_1$ and $m_2$ are positive even integers). Let $m = m_1 + m_2$. Then the following hold.
(1) If $m_1 = m_2$, then $d^{m_1}_{RC} = d^{m_2}_{RC} = \min\{m, 2(n - \lfloor n/2 \rfloor) - m\}$ and $d^m_{RC} = 2\min\{m, n - m\}$.
(2) If $m_1 \ne m_2$, then $d^{m_1}_{RC} = 2\min\{m_1, n - m_1\}$, $d^{m_2}_{RC} = 2\min\{m_2, n - m_2\}$ and $d^m_{RC} = 2\min\{m, n - m\}$.

Note that

$$2(n - \lfloor n/2 \rfloor) - m = \begin{cases} n - m, & \text{if } n \text{ is even} \\ n - m + 1, & \text{if } n \text{ is odd}. \end{cases}$$

Proof. - Case 1. Suppose $m_1 = m_2 = m/2$. $C_m$ is generated by one vector $(a_1 + a_2)$, so by Theorem 4.3, $d^m_{RC} = 2\min\{m, n - m\}$. Assume that $m < n/2$. Then $(a_1)$ or $(a_2)$ can have $2m_1 = m$ positions of $x_i$'s satisfying $x_i \ne x_{n+1-i}$. Thus $d^{m_1}_{RC} = 2m_1 = m$.

Now assume that $n/2 \le m$. Let $\sigma(a_1) = (1 \cdots 1\, 0 \cdots 0)$. By the assumption $\sigma(a_2)$ has to be of the form $\sigma(a_2) = (0 \cdots 0\, x_{m_1+1} \cdots x_n)$ where $x_i \in GF(2)$.

Let $n$ be even. To avoid the coincidence, we should take $x_i$'s such that $x_{m_1+1} = \cdots = x_{n/2} =$
1. Locate the rest of ones x n ´ m ` n { ` “ ¨ ¨ ¨ “ x n “ d H p σ p a q , p σ p a q R qq “ p n { ´ m q “ n ´ m , d H p σ p a q , p σ p a q R qq “ d H p σ p a q , p σ p a q R qq “ m “ m . Since m ď n , so d m RC “ n ´ m .Next, let n be odd. If we take the same progress as the n even case,we can get d H p σ p a q , p σ p a q R qq “ p t n { u ´ m q “ t n { u ´ m . How-ever, since n is odd, we can let x t n { u ` “
1, which is in σ p a q . In thatcase, d H p σ p a q , p σ p a q R qq “ t n { u ´ m ` d H p σ p a q , p σ p a q R qq “ m , d H p σ p a q , p σ p a q R qq “ m ´
2. Since n { ă m , so t n { u ` ď m . Then2 t n { u ` ď m ` ď m (since 2 ď m ).Therefore d m RC “ t n { u ´ m `
2. Then d m RC can be formed as d m RC “ p n ´ t n { u q ´ m . Thus d m RC “ m, if m ă n { p n ´ t n { u q ´ m, if m ě n { m ă n {
2, then m ă n ´ m ď p n ´ t n { u q´ m . Thus d m RC “ min t m, p n ´ t n { u q ´ m u .- Case 2. Suppose m ‰ m . Then the subcode with fixed GC -contentconstraint m is generated by one vector a . So by Theorem 4.3, d m RC “ t m , n ´ m u . In the same argument, we can get the following results: d m RC “ t m , n ´ m u , and C m is generated by one vector p a ` a q , so d mRC “ t m, n ´ m u . 14 heorem 4.5. Let C be a QSD code over E satisfying res p C q “ Bˆ a a ˙F “ Bˆ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ¨ ˙F where d H p a q “ m ` m , d H p a q “ m ` m and d H p a X a q “ m ( m , m and m are positive even integers). Then the following holds. If m , m and m are all distinct, then d m i ` m j RC “ t m i ` m j , n ´p m i ` m j qu for all ď i ‰ j ď . Without loss of generality, let m “ m ‰ m . Then d m ` m RC “ d m RC “ t m , n ´ m u and d m ` m RC “ d m ` m RC is d m ` m RC “ $’&’% p m ` m q if m ` m ă n { n ´ m ´ δ n if n { ď m ` m ă n { ` m p t n { u ´ m q if n { ď m . If m “ m “ m , then d m ` m RC “ d m ` m RC “ d m ` m RC is d m ` m RC “ $’&’% m if m ă n { n ´ m ´ δ n if n { ď m ă n { p t n { u ´ m q if n { ď m . where δ n “ $’&’% if n ” if n ” if n ” .Proof. - Case 1. If m , m and m are all distinct, then the subcodes whichhave fixed GC -content constraint are generated by one vector. So d m i ` m j RC “ t m i ` m j , n ´ p m i ` m j qu is obvious.- Case 2. Suppose m ‰ m “ m . Then this case is obviously the same asthe case m “ m ‰ m . And let a “ a ` a . Then res p C q can be generatedby a and a . Since a “ a ` a , so d H p a q “ d H p a q` d H p a q´ d H p a X a q “ m ` m and d H p a X a q “ m . Therefore the case m ‰ m “ m is thesame as the case m “ m ‰ m .So now suppose that m “ m ‰ m . Then only the code a ` a generates the codeword which has fixed m ` m GC -content. Thus d m RC “ t m , n ´ m u is obvious. 15f m ` m ` m “ m ` m ď n {
2, then we can easily check that d m ` m RC “ t m ` m , n ´ p m ` m qu . Note that 2 m ` m ď n { t m ` m , n ´ p m ` m qu “ p m ` m q . So d m ` m RC “ p m ` m q .Assume that m ` m ` m “ m ` m ą n { m ` m “ m ă n { σ p a q “ p x ¨ ¨ ¨ x n q and σ p a q “ p y ¨ ¨ ¨ y n q where x i , y i P GF p q .Then we can let x “ ¨ ¨ ¨ x m “ x m ` “ ¨ ¨ ¨ “ x m “ y “ ¨ ¨ ¨ “ y m “ y m ` “ ¨ ¨ ¨ “ y m “ n ” x m ` “ ¨ ¨ ¨ “ x n { “ “ y m ` “ ¨ ¨ ¨ “ y n { . Then rest 1’s should belocated in x n { ` , . . . , x n and y n { ` , . . . , y n . Since n ” m ´ p n { ´ m q so it is even. If we let 1’s to one side,the coincidence will be increasing. Thus we can take x n ´ m ´ m { ` n { ` “¨ ¨ ¨ “ x n ´ m ` m { ´ n { “
1. Then the number of 1’s is m ´ n { ` m and the middle point is between n ´ m and n ´ m `
1. In this case d H p σ p a q , σ p a q R q “ n ´ m . The other Hamming distance is not smallerthan n ´ m .If n ” m ´ n { ` m is not even sowe cannot divide into half. So one side has more 1’s, and then the minimumdistance value is decreasing exactly 2. If n ” x t n { u ` “ y t n { u ` “
1. Then the minimum distance value is decreasing exactly 1.Therefore d m ` m RC “ n ´ m ´ δ n .Lastly assume that n { ď m ` m “ m . Denote σ p a q “ p x ¨ ¨ ¨ x n q and σ p a q “ p y ¨ ¨ ¨ y n q . Let x “ ¨ ¨ ¨ “ x m “ y t n { u ` “ ¨ ¨ ¨ “ y t n { u ` m “
1. And let x m ` “ ¨ ¨ ¨ “ x m ` m { “ y t n { u ` m ` “ ¨ ¨ ¨ “ y t n { u ` m “ m { “
1. Then d m ` m RC “ ˆ p m { q ` ˆ p t n { u ´ m ´ m { q “ p t n { u ´ m q .- Case 3. Assume that m “ m “ m . Then we can apply the samemethodas the case 2.For example, let res p C q “ Bˆ ˙F . Then by the formula, d RC “ d RC “
2. By these lemmas, we can calculate some $d_{RC}$ values. The tables in the conclusion show the values of $d_{RC}$ for each length and dimension of the residue codes, following the classification of QSD DNA codes with $n \le 8$.

5. Quasi self-dual DNA codes over F

The ring $F$ is defined by

$$F = \langle a, b \mid 2a = 2b = 0,\ a^2 = a,\ b^2 = b,\ ab = b,\ ba = a \rangle.$$

Thus its multiplication table is given as follows.

×    0    a    b    c
0    0    0    0    0
a    0    a    b    c
b    0    a    b    c
c    0    0    0    0

Table 4: Multiplication table of the ring F

The rings $E$ and $F$ are not isomorphic. Even though $(x, y)_E \ne (x, y)_F$ for inner products, we can define a QSD DNA code over the ring $F$ similarly. Let a linear $F$-code be a one-sided $F$-submodule of $F^n$.

Definition 11.
Let $x, y \in F^n$ where $x = (x_1, \cdots, x_n)$ and $y = (y_1, \cdots, y_n)$. Define an inner product of $x, y$ as $(x, y) = \sum x_i y_i$. Let $C$ be a linear $F$-code. The right dual $C^{\perp_R}$ of $C$ is the right module $C^{\perp_R} = \{y \in F^n \mid \forall x \in C,\ (x, y) = 0\}$. The left dual $C^{\perp_L}$ of $C$ is the left module $C^{\perp_L} = \{y \in F^n \mid \forall x \in C,\ (y, x) = 0\}$. The code $C$ is left self-dual (resp. right self-dual) if $C = C^{\perp_L}$ (resp. $C = C^{\perp_R}$). The code $C$ is self-dual if it is equal to both of its duals. The code $C$ is self-orthogonal if $\forall x, y \in C,\ (x, y) = 0$. A quasi self-dual (QSD) code is a self-orthogonal code of size $2^n$.

Remark that $(x, y)_E \ne (x, y)_F$ as an inner product. However, if $C$ is QSD over the ring $E$, then it is also QSD over the ring $F$.

Theorem 5.1.
Let $C$ be a QSD code over the ring $E$. Then, by a map $f : E \to F$, $f(C)$ is a QSD code over the ring $F$.

Proof. Define a bijection $f : E \to F$ by $f(a_E) = a_F$, $f(b_E) = b_F$ and $f(c_E) = c_F$. Let $C$ be a QSD code over the ring $E$. Take $x, y \in C$, denoted by $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. Then

$$(x, y)_E = \sum_{i=1}^{n} (x_i, y_i) = \sum_{x_m, y_m = c} (x_m, y_m) + \sum_{x_m \ne 0 \text{ nor } c} (x_m, c) + \sum_{y_n \ne 0 \text{ nor } c} (c, y_n) + \sum_{\substack{x_m \ne 0 \text{ nor } c \\ y_n \ne 0 \text{ nor } c}} (x_m, y_n)$$

$$= \sum_{y_n \ne 0 \text{ nor } c} (c, y_n) + \sum_{\substack{x_m \ne 0 \text{ nor } c \\ y_n \ne 0 \text{ nor } c}} (x_m, y_n) = 0.$$

So there is an even number of $a$ terms or $b$ terms in the summation $\sum_{x_m \ne 0 \text{ nor } c,\ y_n \ne 0 \text{ nor } c} (x_m, y_n)$. Therefore the number of coordinates of $a$'s and $b$'s in $y$ is even. If the number of $a$'s is odd, then $(y, y) \ne 0$. So both the number of $a$'s and the number of $b$'s are even. Hence every codeword in $C$ has an even number of $a$-positions and of $b$-positions. Then

$$\sum_{i=1}^{n} (f(x_i), f(y_i)) = \sum_{x_m \ne 0 \text{ nor } c} (f(x_m), f(c)) + \sum_{\substack{x_m \ne 0 \text{ nor } c \\ y_n \ne 0 \text{ nor } c}} (f(x_m), f(y_n)) = 0.$$

So we can regard a QSD code over the ring $E$ as a QSD code over the ring $F$.

Definition 12.
Let $C$ be a code of length $n$ over $F$. The residue code of $C$ is $\mathrm{res}(C) = \{\psi(y) \mid y \in C\}$. The torsion code of $C$ is $\mathrm{tor}(C) = \{x \in GF(2)^n \mid cx \in C\}$, where $\psi : F \to GF(2)$ is the map $\psi(0) = \psi(c) = 0$ and $\psi(a) = \psi(b) = 1$, or $\psi(x) = xa$.

The map $\psi(x) = xa$ has the image $\{0, a\} \cong GF(2)$, so this map $\psi$ is well-defined.

Lemma 5.2.
Every element $f \in F$ can be written $f = as + ct$ where $s, t \in GF(2)$.

Proof. Since $F$ is isomorphic to the ring $E$ as an additive group, $F$ also has this decomposition.

Corollary 5.3. If $C$ is a QSD code over $F$, then $C = a\,\mathrm{res}(C) \oplus c\,\mathrm{tor}(C)$ as modules.

Corollary 5.4.
Let $N(n, k)$ be the number of inequivalent QSD codes over $F$ where $n$ is the length and $k$ is the dimension of their residue codes. Then $N(n, k) = \Psi(n, k)$ where $\Psi(n, k)$ is the number of inequivalent binary self-orthogonal codes.

Corollary 5.5.
Let $C$ be a QSD code over $F$. Then

$$GCW_C(x, y) = \sum_{i=0}^{n} 2^{n-k} A_i(\mathrm{res}(C))\, x^i y^{n-i}$$

where $2^n = |C|$, $k = \dim(\mathrm{res}(C))$ and $A_i(\mathrm{res}(C))$ is the binary weight distribution of $\mathrm{res}(C)$.

Therefore QSD codes over $F$ have the same GC-weight distribution as those over $E$.
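The formula for the GC-weight enumerator depends only on the decomposition $C = a\,\mathrm{res}(C) \oplus c\,\mathrm{tor}(C)$, so it can be checked numerically without using the multiplication of $E$ or $F$. The Python sketch below does this for one assumed example, the binary code $B = \langle (1,1,0,0,0,0), (0,0,1,1,1,1) \rangle$, and assumes the nucleotide assignment $0 \to A$, $c \to T$, $a \to G$, $b \to C$ from Section 3. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Coordinates (s, t) of e = a*s + c*t, mapped to nucleotides via
# 0 -> A, c -> T, a -> G, b -> C.
TO_DNA = {(0, 0): "A", (0, 1): "T", (1, 0): "G", (1, 1): "C"}


def span(gens, n):
    """All GF(2)-linear combinations of the generator rows."""
    words = {(0,) * n}
    for g in gens:
        words |= {tuple(p ^ q for p, q in zip(w, g)) for w in words}
    return words


def dual(code, n):
    """Binary dual code by exhaustive search (fine for small n)."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(p * q for p, q in zip(v, u)) % 2 == 0 for u in code)}


def dna_code(B, n):
    """DNA words of C = a*B + c*B^perp."""
    B_perp = sorted(dual(B, n))
    return ["".join(TO_DNA[(s, t)] for s, t in zip(u, v))
            for u in sorted(B) for v in B_perp]


def gc_distribution(words):
    """Number of codewords for each GC-content."""
    return Counter(w.count("G") + w.count("C") for w in words)


n = 6
B = span([(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 1, 1)], n)
k = len(B).bit_length() - 1            # dimension of the residue code
words = dna_code(B, n)
weights = Counter(sum(u) for u in B)   # weight distribution A_i of B
predicted = {i: 2 ** (n - k) * count for i, count in weights.items()}
print(dict(gc_distribution(words)) == predicted)  # True
```

Here every weight $0, 2, 4, 6$ occurs once in $B$, so each fixed GC-content subcode has $2^{6-2} = 16$ codewords.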
6. Conclusion
In this paper, we construct QSD DNA codes over $E$. For each DNA code, the GC-weight enumerator is obtained. This yields the (nonlinear) subcodes which have a fixed GC-content. In particular, some minimum distances with the reverse-complement constraint in the ring $E$ are calculated for $n \le 8$. The tables of $d_{RC}$ are below. Some values of $d_{RC}$ are computed by MAGMA programming. The QSD DNA codes over the ring $F$ are almost the same as in the case of the ring $E$, so the tables below apply to them as well.

References

[1] L. M. Adleman, “Molecular computation of solutions to combinatorial problems,” Science, 266(5187) (1994): 1021–1024.
[2] A. Alahmadi, A. Alkathiry, A. Altassan, W. Basaffar, A. Bonnecaze, H. Shoaib, and P. Solé, “Type IV codes over a non-local non-unital ring,”
Proyecciones (Antofagasta) , 39(4) (2020): 963–978.19 k Residue code Generator Matrix d mRC n @` ¨ ¨ ¨ ˘D ` c ¨ ¨ ¨ c ˘ d RC “
02 1 @` ˘D ` a a ˘ d RC “
03 1 @` ˘D ˆ a a
00 0 c ˙ d RC “
24 1 @` ˘D ¨˝ a a c
00 0 0 c ˛‚ d RC “
44 1 @` ˘D ¨˝ a a a ac c c c ˛‚ d RC “
04 2 Bˆ ˙F ˆ a a a a ˙ d RC “ ,d RC “
05 1 @` ˘D ¨˚˚˝ a a c c
00 0 0 0 c ˛‹‹‚ d RC “
45 1 @` ˘D ¨˚˚˝ a a a a c c c c c ˛‹‹‚ d RC “
25 2 Bˆ ˙F ¨˝ a a a a
00 0 0 0 c ˛‚ d RC “ ,d RC “
26 1 @` ˘D ¨˚˚˚˚˝ a a c c c
00 0 0 0 0 c ˛‹‹‹‹‚ d RC “ n k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˝ a a a a c c c c c
00 0 0 0 0 c ˛‹‹‹‹‚ d RC “
46 1 @` ˘D ¨˚˚˚˚˝ a a a a a ac c c c c c c c ˛‹‹‹‹‚ d RC “
06 2 Bˆ ˙F ¨˚˚˝ a a a a c
00 0 0 0 0 c ˛‹‹‚ d RC “ ,d RC “
46 2 Bˆ ˙F ¨˚˚˝ a a a a a ac c c c ˛‹‹‚ d RC “ ,d RC “ ,d RC “
06 2 Bˆ ˙F ¨˚˚˝ a a a a a a a a c c c c c ˛‹‹‚ d RC “
46 2
C¨˝ ˛‚G ¨˝ a a a a a a ˛‚ d RC “ ,d RC “ ,d RC “
07 1 @` ˘D ¨˚˚˚˚˚˚˝ a a c c c c
00 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ n k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˚˚˝ a a a a c c c c c c
00 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “
67 1 @` ˘D ¨˚˚˚˚˚˚˝ a a a a a a c c c c c c c c c ˛‹‹‹‹‹‹‚ d RC “
27 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a c c
00 0 0 0 0 0 c ˛‹‹‹‹‚ d RC “ ,d RC “
67 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a a a c c c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “
27 2 Bˆ ˙F ¨˚˚˚˚˝ a a a a a a a a
00 0 c c c c c c ˛‹‹‹‹‚ d RC “
47 3
C¨˝ ˛‚G ¨˚˚˝ a a a a a a
00 0 0 0 0 0 c ˛‹‹‚ d RC “ ,d RC “ ,d RC “ k Residue code Generator Matrix d mRC @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a c c c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “
48 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a c c c c c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “
88 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a a a c c c c c c c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‹‹‚ d RC “
48 1 @` ˘D ¨˚˚˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‚ d RC “
08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ n k Residue code Generator Matrix d mRC Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a c c c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ ,d RC “
48 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “ ,d RC “
08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a ac c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “
08 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a a c c c c c c
00 0 0 0 0 0 0 c ˛‹‹‹‹‹‹‚ d RC “
48 2 Bˆ ˙F ¨˚˚˚˚˚˚˝ a a a a a a a a a ac c c c c c c c c ˛‹‹‹‹‹‹‚ d RC “ ,d RC “
48 3
C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a c
00 0 0 0 0 0 0 c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “ n k Residue code Generator Matrix d mRC C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a ac c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “ ,d RC “
08 3
C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a a a a c c c c c ˛‹‹‹‹‚ d RC “ ,d RC “ ,d RC “
48 3
C¨˝ ˛‚G ¨˚˚˚˚˚˚˚˚˝ a a a a a a a a
00 0 a a a a c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‚ d RC “
48 3
C¨˝ ˛‚G ¨˚˚˚˚˝ a a a a a a a a a a a a c cc c c c ˛‹‹‹‹‚ d RC “ ,d RC “
48 4
C¨˚˚˝ ˛‹‹‚G ¨˚˚˝ a a a a a a a a ˛‹‹‚ d RC “ ,d RC “ ,d RC “ ,d RC “
08 4
C¨˚˚˝ ˛‹‹‚G ¨˚˚˚˚˚˚˚˚˚˚˝ a a a a a a a a a a a a a a a ac c c c c c c c c c c c c c c c ˛‹‹‹‹‹‹‹‹‹‹‚ d RC “
[3] A. Alahmadi, A. Altassan, W. Basaffar, A. Bonnecaze, H. Shoaib, P. Solé, Type VI codes over a non-unital ring, to appear in Journal of Algebra and Its Applications. Available from https://hal.archives-ouvertes.fr/hal-02433480/document
[4] A. Alahmadi, A. Altassan, W. Basaffar, A. Bonnecaze, H. Shoaib, and P. Solé, “Quasi Type IV codes over a non-unital ring,” preprint. Available from https://hal.archives-ouvertes.fr/hal-02544399/document
[5] N. Bennenni, K. Guenda, and S. Mesnager, “New DNA Cyclic Codes over Rings,”
Advances in Math. of Comm.,
Journal of Combina-torial Mathematics and Combinatorial Computing,
59 (2006): 33-87.[7] B. Fine, “Classification of finite rings of order p ,” Mathematics Maga-zine,
The-oretical Computer Science, F ` u F for DNA computing,” Applicable Algebra in Engineering, Communica-tion and Computing,
IEEE trans. Inform. Theory,
Elec-tronic J. of Combinatorics
10 (2003): R33.[12] J. Liang and L. Wang, “On cyclic DNA codes over F ` u F ,” Journalof Applied Mathematics and Computing,
The art of DNA strings: sixteen years of DNA coding theory, arXiv preprint arXiv:1607.00266
[14] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes