Fast matrix multiplication techniques based on the Adleman-Lipton model
ARAN NAYEBI
Abstract.
On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen's fast matrix multiplication algorithm with DNA based on an $n$-moduli set in the residue number system, thereby demonstrating the viability of computational mathematics with DNA. As a result, a general scalable implementation of this model in the DNA computing paradigm is presented and can be generalized to the application of all fast matrix multiplication algorithms on a DNA computer. We also discuss the practical capabilities and issues of this scalable implementation. Fast methods of matrix computations with DNA are important because they also allow for the efficient implementation of other algorithms (i.e., inversion, computing determinants, and graph theory) with DNA.

Mathematics Subject Classification. Primary 65F05, 03D10; Secondary 68Q10, 68Q05, 03D80.
Key words and phrases. DNA computing; residue number system; logic and arithmetic operations; Strassen algorithm.

1. Introduction
The multiplication of matrices is a fundamental operation applicable to a diverse range of algorithms, from computing determinants, inverting matrices, and solving linear systems to graph theory. Indeed, Bunch and Hopcroft [12] proved that given an algorithm for multiplying two $n \times n$ matrices in $O(n^\alpha)$ operations, where $2 < \alpha \le 3$, the triangular factorization of a permutation of any $n \times n$ nonsingular matrix, as well as its inverse, can be found in $O(n^\alpha)$ operations. The standard method of square matrix multiplication requires $2n^3$ operations. Let $\omega$ be the smallest number such that $O(n^{\omega + \epsilon})$ multiplications suffice for all $\epsilon > 0$. Strassen [16] showed that two matrices of order $m2^k$ can be multiplied with $m^3 7^k$ multiplications and $(5 + m)m^2 7^k - 6(m2^k)^2$ additions; thus, by recursive application of Strassen's algorithm, the product of two matrices can be computed by at most $4.7\, n^{\log_2 7}$ operations. Following Strassen's work, Coppersmith and Winograd [2] were able to improve the exponent to 2.38. Their approach and those of subsequent researchers rely on the same framework: for some $k$, they devise a method to multiply matrices of order $k$ with $m \ll k^3$ multiplications and recursively apply this technique to show that $\omega < \log_k m$ [14]. It was long supposed that $\omega$ could take on the value of 2, though until recently without much supporting evidence. Using a group-theoretic construction, Cohn, Kleinberg, Szegedy, and Umans [7] rederived the Coppersmith-Winograd algorithm and described several families of wreath product groups that yield nontrivial upper bounds on $\omega$, the best asymptotic result being 2.41. They also presented two conjectures, either of which would imply an exponent of 2.

Unfortunately, although these improvements to Strassen's algorithm are theoretically optimal, they lack pragmatic value. In practice, only the Strassen algorithm is fully implemented and utilized, as follows. For even integers $m$, $n$, and $k$, let $X \in \mathbb{R}^{m \times k}$ and $Y \in \mathbb{R}^{k \times n}$ be matrices with product $Q \in \mathbb{R}^{m \times n}$, and set
\[
X = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}, \quad
Y = \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}, \quad
Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix},
\]
where $X_{ij} \in \mathbb{R}^{m/2 \times k/2}$, $Y_{ij} \in \mathbb{R}^{k/2 \times n/2}$, and $Q_{ij} \in \mathbb{R}^{m/2 \times n/2}$. Then perform the following to compute $Q = XY$:
\begin{align*}
M_1 &:= (X_{11} + X_{22})(Y_{11} + Y_{22}), \\
M_2 &:= (X_{21} + X_{22})Y_{11}, \\
M_3 &:= X_{11}(Y_{12} - Y_{22}), \\
M_4 &:= X_{22}(-Y_{11} + Y_{21}), \\
M_5 &:= (X_{11} + X_{12})Y_{22}, \\
M_6 &:= (-X_{11} + X_{21})(Y_{11} + Y_{12}), \\
M_7 &:= (X_{12} - X_{22})(Y_{21} + Y_{22}), \\
Q_{11} &= M_1 + M_4 - M_5 + M_7, \\
Q_{12} &= M_3 + M_5, \\
Q_{21} &= M_2 + M_4, \\
Q_{22} &= M_1 - M_2 + M_3 + M_6.
\end{align*}
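As a point of reference, the scheme above translates directly into code; the following is a minimal NumPy sketch for square matrices of even order (our illustration, not the DNA implementation developed later in this paper):

```python
import numpy as np

def strassen_step(X, Y):
    """One level of Strassen's seven-product scheme for even-order square matrices."""
    n = X.shape[0] // 2
    X11, X12, X21, X22 = X[:n, :n], X[:n, n:], X[n:, :n], X[n:, n:]
    Y11, Y12, Y21, Y22 = Y[:n, :n], Y[:n, n:], Y[n:, :n], Y[n:, n:]
    M1 = (X11 + X22) @ (Y11 + Y22)
    M2 = (X21 + X22) @ Y11
    M3 = X11 @ (Y12 - Y22)
    M4 = X22 @ (-Y11 + Y21)
    M5 = (X11 + X12) @ Y22
    M6 = (-X11 + X21) @ (Y11 + Y12)
    M7 = (X12 - X22) @ (Y21 + Y22)
    Q11 = M1 + M4 - M5 + M7
    Q12 = M3 + M5
    Q21 = M2 + M4
    Q22 = M1 - M2 + M3 + M6
    return np.block([[Q11, Q12], [Q21, Q22]])

X, Y = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(strassen_step(X, Y), X @ Y)
```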
Even if the dimension of the matrices is not even, or if the matrices are not square, it is easy to pad the matrices with zeros and perform the aforementioned algorithm.

Typically, computations such as this one are performed using electronic components on a silicon substrate. In fact, it is a commonly held notion that most computers should follow this model. In the last decade, however, a newer and more revolutionary form of computing has come about, known as DNA computing. DNA's key advantage is that it can make computers much smaller than before, while at the same time maintaining the capacity to store prodigious amounts of data. Since Adleman's [9] pioneering paper, DNA computing has become a rapidly evolving field with its primary focus on developing DNA algorithms for NP-complete problems. However, unlike quantum computing in recent years, the viability of computational mathematics on a DNA computer has not yet been fully demonstrated, as the field of DNA-based computing has largely converged on controlling and mediating information processing for nanostructures and molecular movements. In fact, only recently have the primitive operations of mathematics (i.e., addition, subtraction, multiplication, and division) been implemented. Thus, the general problem dealt with in this paper is to explore the feasibility of computational mathematics with DNA. Fujiwara, Matsumoto, and Chen [1] presented a DNA representation of binary integers using single strands and gave procedures for primitive mathematical operations through simple manipulations of DNA. It is important to note that the work of Fujiwara et al. [1] and that of subsequent researchers has relied upon a fixed-base number system. The fixed-base number system is a bottleneck for many algorithms, as it restricts the speed at which arithmetic operations can be performed and increases the complexity of the algorithm; parallel arithmetic operations are simply not feasible in a fixed-base number system because of carry propagation. Recently, Zheng, Xu, and Li [17] presented an improved DNA representation of an integer based on the residue number system (RNS) and gave algorithms for arithmetic operations in $\mathbb{Z}_M = \{0, 1, \cdots, M-1\}$, where $\mathbb{Z}_M$ is the ring of integers with respect to modulo $M$. Their results exploit the massive parallelism in DNA, mainly because of the carry-free property of all arithmetic operations (except division, of course) in RNS.

In this paper we present a parallelization method for performing Strassen's fast matrix multiplication on a DNA computer. Although DNA-based methods for the multiplication of boolean [8] and real-numbered [6] matrices have been devised, these approaches use digraphs and are not divide-and-conquer like Strassen's algorithm (and hence are not particularly efficient when used with DNA). Divide-and-conquer algorithms particularly benefit from the parallelism of the DNA computing paradigm because distinct sub-processes can be executed on different processors.
The critical problem addressed in this paper is to provide a DNA implementation of Strassen's algorithm, while keeping in mind that in recent years it has been shown that the biomolecular operations suggested by the Adleman-Lipton model are not very reliable in practice. More specifically, the objectives we aim to accomplish in this research paper are the following:
• To provide in §2 the necessary preliminary theory of the residue number system and the Adleman-Lipton model;
• To establish a systematic approach in §3.1 for representing matrices in RNS by way of single DNA strands;
• Next, based on this representation system, we describe in §3.2 procedures for matrix arithmetic in RNS with DNA;
• And lastly, we present in §4 a general scalable DNA implementation of the Strassen algorithm at an unspecified recursion level $r$.
Our approach uses the Cannon algorithm at the bottom level (within a tube containing a memory strand) and the Strassen algorithm at the top level (between memory strands). We show that the Strassen-Cannon algorithm decreases in complexity as the recursion level $r$ increases [3]. If the Cannon algorithm is replaced by other parallel matrix multiplication algorithms at the bottom level (such as the Fox algorithm), our result still holds. The difficulty that arises is that in order to use the Strassen algorithm at the top level, we must determine the sub-matrices after the recursive execution of the Strassen formula $r$ times and then find the resultant matrix. On a sequential machine, this problem is trivial; on a parallel machine, however, the situation becomes much more arduous. Nguyen, Lavallée, and Bui [3] present a method for electronic computers to determine all the nodes at the unspecified level $r$ in the execution tree of the Strassen algorithm, thereby allowing for the direct calculation of the resultant matrix from the sub-matrices calculated by parallel matrix multiplication algorithms at the bottom level. We show that this result can theoretically be obtained with DNA; combined with a storage map of sub-matrices to DNA strands and with the use of the Cannon algorithm at the bottom level, this yields a general scalable implementation of the Strassen algorithm on Adleman's DNA computer. We should note that this implementation is primarily theoretical, because in practice the Adleman-Lipton model is not always feasible, as explained in §5. The reason we concentrate on the Strassen algorithm is that it offers superior performance to the traditional algorithm for matrices of practical size [3]. However, our methods are also applicable to all fast matrix multiplication algorithms on a DNA computer, as these algorithms are always in recursive form [15]. In addition, our results can be used to implement other algorithms, such as inversion and computing determinants, on a DNA computer, since matrix multiplication is almost ubiquitous in application.

2. Preliminary Theory
2.1. The Residue Number System.
The residue number system is defined by a set of pairwise coprime moduli $P = \{q_{n-1}, \cdots, q_0\}$, and an integer in RNS is represented as a vector of residues with respect to the moduli set $P$. As a consequence of the Chinese remainder theorem, for any integer $x \in [0, M-1]$, where $M = \prod_{i=0}^{n-1} q_i$, the RNS representation is unique. As stated by Zheng, Xu, and Li [17], the vector $(x_{n-1}, \cdots, x_0)$ denotes the residue representation of $x$.

It has been previously mentioned that one of the important characteristics of RNS is that all arithmetic operations except for division are carry-free. Thus, for any two integers $x \to (x_{n-1}, \cdots, x_0) \in \mathbb{Z}_M$ and $y \to (y_{n-1}, \cdots, y_0) \in \mathbb{Z}_M$, we obtain the following from [5]:

(2.1.1) $|x \circ y|_M \to \left( |x_{n-1} \circ y_{n-1}|_{q_{n-1}}, \cdots, |x_0 \circ y_0|_{q_0} \right)$,

in which $\circ$ is any operation of addition, subtraction, or multiplication.
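To make the carry-free property (2.1.1) concrete, the following minimal Python sketch encodes integers against a small hypothetical moduli set (the moduli values and function names are our own illustrations):

```python
from functools import reduce

# Hypothetical pairwise-coprime moduli set P = {q_2, q_1, q_0}.
P = (7, 8, 9)
M = reduce(lambda a, b: a * b, P)  # M = 504

def to_rns(x):
    """Residue representation (x_{n-1}, ..., x_0) of x in Z_M."""
    return tuple(x % q for q in P)

def rns_op(xs, ys, op):
    """Carry-free componentwise operation per (2.1.1)."""
    return tuple(op(a, b) % q for a, b, q in zip(xs, ys, P))

x, y = 123, 321
assert rns_op(to_rns(x), to_rns(y), lambda a, b: a + b) == to_rns((x + y) % M)
assert rns_op(to_rns(x), to_rns(y), lambda a, b: a * b) == to_rns((x * y) % M)
```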
2.2. The Adleman-Lipton Model.

In this section we present a theoretical and practical basis for our algorithms. In the Adleman-Lipton model, we define a test tube $T$ as a multi-set of (oriented) DNA sequences over the nucleotide alphabet $\{A, G, C, T\}$. The following operations can be performed:
• $Merge(T_1, T_2)$: merge the contents of tube $T_1$ and tube $T_2$, and store the result in tube $T_1$;
• $Copy(T_1, T_2)$: make a copy of the contents of tube $T_1$ and store the result in tube $T_2$;
• $Detect(T)$: for a given tube $T$, this operation returns "True" if tube $T$ contains at least one DNA strand, else it returns "False";
• $Separation(T_1, X, T_2)$: from all the DNA strands in tube $T_1$, take out only those containing the sequences of $X$ over the alphabet $\{A, G, C, T\}$ and place them in tube $T_2$;
• $Selection(T_1, l, T_2)$: remove all strands of length $l$ from tube $T_1$ into tube $T_2$;
• $Cleavage(T, \sigma_0\sigma_1)$: given a tube $T$ and a sequence $\sigma_0\sigma_1$, for every strand containing $\begin{bmatrix} \sigma_0\sigma_1 \\ \overline{\sigma_0\sigma_1} \end{bmatrix}$, the cleavage operation can be performed as such:
\[
\begin{bmatrix} \alpha \sigma_0 \sigma_1 \beta \\ \overline{\alpha \sigma_0 \sigma_1 \beta} \end{bmatrix}
\xrightarrow{\;Cleavage(T,\, \sigma_0\sigma_1)\;}
\begin{bmatrix} \alpha \sigma_0 \\ \overline{\alpha \sigma_0} \end{bmatrix},
\begin{bmatrix} \sigma_1 \beta \\ \overline{\sigma_1 \beta} \end{bmatrix},
\]
where the overhead bar denotes the complementary strand;
• $Annealing(T)$: produce all feasible double strands in tube $T$ and store the results in tube $T$ (the assumption here is that ligation is executed after annealing);
• $Denaturation(T)$: disassociate every double strand in tube $T$ into two single strands and store the results in tube $T$;
• $Empty(T)$: empty tube $T$.
According to [5], the complexity of each of the aforementioned operations is $O(1)$.
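Purely as a software analogy (not part of the Adleman-Lipton model itself), a tube can be mimicked as a multiset of strings; the sketch below mocks up three of the operations above under that assumption:

```python
from collections import Counter

# A test tube as a multiset of single strands (strings over {A, G, C, T}).
Tube = Counter

def merge(t1: Tube, t2: Tube) -> None:
    """Merge(T1, T2): pour the contents of T2 into T1."""
    t1.update(t2)
    t2.clear()

def detect(t: Tube) -> bool:
    """Detect(T): is there at least one strand in the tube?"""
    return sum(t.values()) > 0

def separation(t1: Tube, x: str, t2: Tube) -> None:
    """Separation(T1, X, T2): move strands containing the subsequence x into T2."""
    for s in [s for s in t1 if x in s]:
        t2[s] += t1.pop(s)

t1, t2 = Tube({"ACGTAC": 2, "GGCC": 1}), Tube()
separation(t1, "CGT", t2)
assert detect(t2) and t2["ACGTAC"] == 2
```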
2.3. Revised Adleman-Lipton Model through Ligation by Selection.

In practice, the recursive properties of our implementation of the Strassen-Cannon algorithm require a massive ligation step that is not feasible. The reason is that, in practice, the biomolecular operations suggested by the Adleman-Lipton model are not completely reliable: this ligation step cannot produce longer molecules as required by our implementation, and certainly not more than 10-15 ligations in a row. Not to mention that both the complexity of the tube contents and the efficiency of the enzyme would obscure the results. As a result of these considerations, the operations $Separation(T_1, X, T_2)$ and $Annealing(T)$ presented in §2.2 are carried out through ligation by selection (LBS), as presented by Kodumal and Santi [13]. The benefits of LBS include:
• the avoidance of the need to isolate, purify, and ligate individual fragments;
• the evasion of the need for specialized MCS linkers;
• and, most importantly, the ease with which parallel processing of operations may be applied.
Hence, in order for the Adleman-Lipton model to be more reliable in the recursive operations our implementation of Strassen's algorithm requires, we replace the ligation procedure of §2.2 with ligation by selection.

3. DNA Matrix Operations in RNS
3.1. DNA Representation of a Matrix in RNS.
We extend the DNA representation of integers in RNS presented in [17] to represent an entire matrix $Y$ in RNS by way of single DNA strands. Let $Y$ be a $t \times t$ matrix:
\[
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1t} \\
y_{21} & y_{22} & \cdots & y_{2t} \\
\vdots & \vdots & \ddots & \vdots \\
y_{t1} & y_{t2} & \cdots & y_{tt}
\end{pmatrix}.
\]
The key here is the RNS representation of each element $y_{qr}$ of the hypothetical matrix $Y$, with $1 \le q \le t$ and $1 \le r \le t$, by way of DNA strands.

We first utilize the improved DNA representation of $n$ binary numbers with $m$ binary bits described in [17] for the alphabet $\Sigma$:
\[
\Sigma = \{A_i, B_j, C_0, C_1, E_0, E_1, D_0, D_1, 0, 1 \mid 0 \le i \le M-1,\ 0 \le j \le m\}.
\]
Here, $A_i$ indicates the address of $M$ integers in RNS; $B_j$ denotes the binary bit position; $C_0$, $C_1$, $E_0$, $E_1$, $D_0$, and $D_1$ are used in the Cleavage and Separation operations; and 0 and 1 are binary numbers. Thus, in the residue digit position, the value of the bit of $y_{qr}$ with a bit address of $i$ and a bit position of $j$ can be represented by a single DNA strand $(S_{i,j})_{y_{qr}}$:

(3.1.1) $(S_{i,j})_{y_{qr}} = (D_0 B_j E_0 E_1 A_i C_0 C_1 V D_1)_{y_{qr}}$, for $V \in \{0, 1\}$.

Hence, the matrix $Y$ can be represented as
\[
Y = \begin{pmatrix}
(D_0 B_j E_0 E_1 A_i C_0 C_1 V D_1)_{y_{11}} & \cdots & (D_0 B_j E_0 E_1 A_i C_0 C_1 V D_1)_{y_{1t}} \\
\vdots & \ddots & \vdots \\
(D_0 B_j E_0 E_1 A_i C_0 C_1 V D_1)_{y_{t1}} & \cdots & (D_0 B_j E_0 E_1 A_i C_0 C_1 V D_1)_{y_{tt}}
\end{pmatrix},
\]
where each strand-element is not necessarily distinct. The reader should keep in mind that $M$ integers in RNS defined by the $n$-moduli set $P$ can be represented by $2M(m+1)$ different memory strands, whereas in the binary system the representation of $M$ integers requires $2M\left(\sum_{i=0}^{n-1} m_i\right)$ different memory strands.
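For intuition only, the memory-strand scheme (3.1.1) can be mocked up symbolically; the encoding routine below is our own illustration of the alphabet usage, not a biochemical protocol:

```python
def strand(i: int, j: int, v: int) -> str:
    """Memory strand per (3.1.1): D0 Bj E0 E1 Ai C0 C1 V D1 for bit value v."""
    assert v in (0, 1)
    return f"D0 B{j} E0 E1 A{i} C0 C1 {v} D1"

def encode_residue(i: int, residue: int, m: int) -> list[str]:
    """Encode one residue digit (m bits, bit address i) as m strands."""
    return [strand(i, j, (residue >> j) & 1) for j in range(m)]

# Example: residue digit 5 = 101_2 at address i = 3, with m = 3 bits.
print(encode_residue(3, 5, 3))
```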
3.2. Residue Number Arithmetic with Matrices.

From (2.1.1), it is apparent that the operation $\circ$ is carry-free, thereby allowing for the employment of parallel procedures in all residue digits. In [17], two properties are given for the modular operation involving two integers $x \to (x_{n-1}, \cdots, x_0)$ and $y \to (y_{n-1}, \cdots, y_0)$ in RNS defined by the set $P = \{2^{m_{n-1}}, 2^{m_{n-2}} - 1, \cdots, 2^{m_0} - 1\}$.

Lemma 3.2.1. For $\forall j, m_{n-1} \in \mathbb{N}$, if $j < m_{n-1}$ then $|2^j|_{2^{m_{n-1}}} = 2^j$; else $|2^j|_{2^{m_{n-1}}} = 0$.

Lemma 3.2.2. For $l = 0, \cdots, n-2$, let $x_l + y_l = z_l$, where $z_l = (z_{l(m_l)}, \cdots, z_{l0})_2$. If $z_l > 2^{m_l} - 1$, then $|z_l|_{2^{m_l}-1} = 1 + \sum_{j=0}^{m_l - 1} z_{lj}\, 2^j$.

Next, the procedures RNSAdd and RNSDiff add and subtract, respectively, two integers in RNS defined by the moduli set $P$; their pseudocode is given in [17]. Either procedure takes as input $n$ tubes $T_l^{x_{qr}}$ and $T_l^{y_{qr}}$ (for $l = 0, \cdots, n-1$) containing the memory strands representing the elements $x_{qr}$ and $y_{qr}$ of the $t \times t$ matrices $X$ and $Y$, respectively. Once either operation is complete, it returns $n$ tubes $T_l^{Rsum}$ and $T_l^{Rdiff}$ containing the result of residue addition or subtraction, respectively. We also use the following $n$ temporary tubes for RNSAdd, namely $T_{temp}^l$, $T_{sum}^l$, and $T_{sum'}^l$; similarly, for RNSDiff, the $n$ temporary tubes $T_{temp}^l$, $T_{diff}^l$, and $T_{diff'}^l$ are used.

Thus, based on Lemma 3.2.1 and Lemma 3.2.2, we introduce the following two algorithms for matrix addition and subtraction in RNS, which will be used when dealing with the block matrices in Strassen's algorithm. For the sake of example, we add (and subtract) the hypothetical $t \times t$ matrices $X$ and $Y$. Essentially, the RNSMatrixAdd and RNSMatrixDiff algorithms employ RNSAdd and RNSDiff in a nested FOR loop.
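The arithmetic behind Lemma 3.2.2 is the familiar end-around carry for a modulus of the form $2^m - 1$; a small Python sketch (our naming, not that of [17]) follows:

```python
def rns_add_mod_2m_minus_1(x: int, y: int, m: int) -> int:
    """Add two residues modulo 2^m - 1 via the end-around carry of Lemma 3.2.2."""
    z = x + y                      # at most m+1 bits
    if z > (1 << m) - 1:           # carry out: fold it back in (1 + low m bits)
        z = 1 + (z & ((1 << m) - 1))
    return z

m = 4                              # modulus 2^4 - 1 = 15
assert rns_add_mod_2m_minus_1(9, 11, m) == (9 + 11) % 15
```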
Matrix Addition. The procedure RNSMatrixAdd is defined as:

Algorithm 3.1: RNSMatrixAdd($T^X$, $T^Y$)
  for $q \leftarrow 1$ to $t$ do
    for $r \leftarrow 1$ to $t$ do
      RNSAdd($T_{n-1}^{x_{qr}}, \cdots, T_0^{x_{qr}}, T_{n-1}^{y_{qr}}, \cdots, T_0^{y_{qr}}$);

Matrix Subtraction.
The procedure RNSMatrixDiff is defined as:

Algorithm 3.2: RNSMatrixDiff($T^X$, $T^Y$)
  for $q \leftarrow 1$ to $t$ do
    for $r \leftarrow 1$ to $t$ do
      RNSDiff($T_{n-1}^{x_{qr}}, \cdots, T_0^{x_{qr}}, T_{n-1}^{y_{qr}}, \cdots, T_0^{y_{qr}}$);
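In conventional software terms, RNSMatrixAdd and RNSMatrixDiff amount to elementwise modular arithmetic carried out independently in every residue layer; a minimal NumPy sketch under a hypothetical moduli set:

```python
import numpy as np

P = (7, 8, 9)  # hypothetical pairwise-coprime moduli set

def to_rns_matrix(A, moduli=P):
    """Stack of residue matrices: one t x t layer per modulus."""
    return np.stack([A % q for q in moduli])

def rns_matrix_add(X_rns, Y_rns, moduli=P):
    """RNSMatrixAdd analogue: carry-free addition in every residue layer in parallel."""
    q = np.array(moduli).reshape(-1, 1, 1)
    return (X_rns + Y_rns) % q

def rns_matrix_diff(X_rns, Y_rns, moduli=P):
    """RNSMatrixDiff analogue: carry-free subtraction in every residue layer."""
    q = np.array(moduli).reshape(-1, 1, 1)
    return (X_rns - Y_rns) % q

t = 4
X = np.random.randint(0, 100, (t, t))
Y = np.random.randint(0, 100, (t, t))
assert np.array_equal(rns_matrix_add(to_rns_matrix(X), to_rns_matrix(Y)),
                      to_rns_matrix(X + Y))
```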
4. Strassen's Algorithm Revisited

4.1. Bottom-Level Matrix Multiplication.
Although a vast repository of traditional matrix multiplication algorithms can be used between processors (or, in our case, test tubes containing memory strands; for the sake of brevity, we shall just use the term "memory strand" or "strand"), we will employ the Cannon algorithm [10], since it can be used on matrices of any dimension. We will only discuss square strand arrangements and square matrices for simplicity's sake. Assume that we have $p$ memory strands, organized logically in a $\sqrt{p} \times \sqrt{p}$ mesh. For $0 \le i, j \le \sqrt{p} - 1$, the strand in the $i$th row and $j$th column has coordinates $(i, j)$. The matrices $X$, $Y$, and their matrix product $Q$ are of size $t \times t$, and again as a simplifying assumption, let $t$ be a multiple of $\sqrt{p}$. All matrices will be partitioned into $\sqrt{p} \times \sqrt{p}$ grids of $s \times s$ sub-matrices, where $s = t/\sqrt{p}$. As described in [3], the mesh can be perceived as an amalgamation of rings of memory strands in both the horizontal and vertical directions (opposite sides of the mesh are linked with a torus interconnection). A successful DNA implementation of Cannon's algorithm requires communication between the strands of each ring in the mesh: the blocks of matrix $X$ are passed in parallel to the left along the horizontal rings, and the blocks of matrix $Y$ are passed upward along the vertical rings. Let $X_{ij}$, $Y_{ij}$, and $Q_{ij}$ denote the blocks of $X$, $Y$, and $Q$ stored in the strand with coordinates $(i, j)$. The Cannon algorithm on a DNA computer can be described as such:

Algorithm 4.1: Cannon($T^{X_{ij}}$, $T^{Y_{ij}}$)
  for $l \leftarrow 1$ to $i$ do
    LeftShift($T^{X_{ij}}$)
  for $l \leftarrow 1$ to $j$ do
    UpShift($T^{Y_{ij}}$)
  for all strands $(i, j)$ do
    ValueAssignment($T^{X_{ij} Y_{ij}}$, $T^{Q_{ij}}$)
  do ($\sqrt{p} - 1$) times
    LeftShift($T^{X_{ij}}$)
    UpShift($T^{Y_{ij}}$)
    ValueAssignment($T^{RNSMatrixAdd(T^{Q_{ij}},\, T^{X_{ij} Y_{ij}})}$, $T^{Q_{ij}}$)
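To clarify the data movement that the tube operations encode, here is a plain NumPy sketch of Cannon's algorithm on a $\sqrt{p} \times \sqrt{p}$ grid of blocks, with ordinary matrix arithmetic standing in for ValueAssignment and RNSMatrixAdd:

```python
import numpy as np

def cannon_multiply(X, Y, grid):
    """Cannon's algorithm: X, Y are t x t, partitioned on a grid x grid mesh."""
    t = X.shape[0]
    s = t // grid                                      # block size
    blk = lambda A, i, j: A[i*s:(i+1)*s, j*s:(j+1)*s].copy()
    XB = [[blk(X, i, j) for j in range(grid)] for i in range(grid)]
    YB = [[blk(Y, i, j) for j in range(grid)] for i in range(grid)]
    # Initial skew: row i of X shifted left i times; column j of Y shifted up j times.
    XB = [[XB[i][(j + i) % grid] for j in range(grid)] for i in range(grid)]
    YB = [[YB[(i + j) % grid][j] for j in range(grid)] for i in range(grid)]
    Q = [[np.zeros((s, s), dtype=X.dtype) for _ in range(grid)] for _ in range(grid)]
    for _ in range(grid):                              # sqrt(p) multiply-shift rounds
        for i in range(grid):
            for j in range(grid):
                Q[i][j] += XB[i][j] @ YB[i][j]         # ValueAssignment + RNSMatrixAdd
        XB = [[XB[i][(j + 1) % grid] for j in range(grid)] for i in range(grid)]
        YB = [[YB[(i + 1) % grid][j] for j in range(grid)] for i in range(grid)]
    return np.block(Q)

t, grid = 6, 3
X, Y = np.random.randint(0, 9, (t, t)), np.random.randint(0, 9, (t, t))
assert np.array_equal(cannon_multiply(X, Y, grid), X @ Y)
```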
Note that the procedure UpShift can be derived from Zheng et al.'s [17] LeftShift. Now we examine the run-time of the Cannon algorithm. The run-time can be decomposed into the communication time and the computation time. The total communication time is

(4.1.1) $2\sqrt{p}\,\alpha + \dfrac{2B\beta t^2}{\sqrt{p}}$,

and the computation time is

(4.1.2) $\dfrac{2t^3 t_{comp}}{p}$,

where $t_{comp}$ is the execution time for one arithmetic operation, $\alpha$ is the latency, and $\beta$ is the sequence-transfer rate; the total latency is $2\sqrt{p}\,\alpha$, and the total sequence-transfer time is $2\sqrt{p}\,\beta B (t/\sqrt{p})^2$, with $B$ the number of sequences needed to store one entry of the matrices. According to [3], the running time is

(4.1.3) $T(t) = \dfrac{2t^3 t_{comp}}{p} + 2\sqrt{p}\,\alpha + \dfrac{2B\beta t^2}{\sqrt{p}}$.
4.2. Matrix Storage Pattern.

The primary difficulty is to be able to store the different sub-matrices of the Strassen algorithm in different strands, and these sub-matrices must be copied or moved to the appropriate strands as tasks are spawned. Hence, we present here a storage map of sub-matrices to strands based on the result of Luo and Drake [11] for electronic computers. Essentially, if we allow each strand to have a portion of each sub-matrix at each recursion level, then we can make it possible for all strands to act as one strand. As a result, the addition and subtraction of the block matrices performed in the Strassen algorithm at all recursion levels can be performed in parallel without any inter-strand communication [3]. Each strand performs its local sub-matrix additions and subtractions in RNS (via RNSMatrixAdd and RNSMatrixDiff described in §3.2). Suppose the Strassen algorithm is applied at the top level down to recursion level $r$, with the block sizes chosen so that $t/2^r$ and $(t/2^r)/\sqrt{p}$ are integers. Then the run-time of the Strassen-Cannon algorithm satisfies

(4.2.1) $T(t) = 18\, T_{add}\!\left(\dfrac{t}{2}\right) + 7\, T\!\left(\dfrac{t}{2}\right)$,

where $T_{add}(t/2)$ is the run-time to add or subtract block matrices of order $t/2$. Unfolding this recursion $r$ times and applying (4.1.3) at the bottom level gives

$T(t) \approx \left(\dfrac{7}{8}\right)^r \dfrac{2t^3 t_{comp}}{p} + 5\left(\dfrac{7}{4}\right)^r \dfrac{t^2 t_{comp}}{p} + 2 \cdot 7^r \sqrt{p}\,\alpha$.

Since the asymptotically significant term $(7/8)^r\, 2t^3 t_{comp}/p$ decreases as the recursion level $r$ increases, for $t$ significantly large the Strassen-Cannon algorithm should be faster than the Cannon algorithm. Even if the Cannon algorithm is replaced at the bottom level by other parallel matrix multiplication algorithms, the same result holds.
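The claim that the dominant term shrinks with $r$ can be sanity-checked numerically; the sketch below evaluates the approximate cost model above with illustrative (entirely hypothetical) machine constants:

```python
# Approximate Strassen-Cannon cost model; all constants are illustrative assumptions.
t_comp, alpha = 1e-9, 1e-6   # hypothetical time per arithmetic op and per-message latency
p, t = 64, 4096              # number of strands (processors) and matrix order

def strassen_cannon_time(r: int) -> float:
    comp = (7 / 8) ** r * 2 * t ** 3 * t_comp / p   # dominant multiplication term
    adds = 5 * (7 / 4) ** r * t ** 2 * t_comp / p   # block addition/subtraction overhead
    lat = 2 * 7 ** r * p ** 0.5 * alpha             # latency: 7^r bottom-level calls
    return comp + adds + lat

for r in range(4):   # r = 0 corresponds to plain Cannon
    print(r, strassen_cannon_time(r))
```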
4.3. Recursion Removal.

As has been previously discussed, in order to use the Strassen algorithm between strands (at the top level), we must determine the sub-matrices after $r$ recursive executions and then determine the resultant matrix from these sub-matrices. Nguyen et al. [3] recently presented a method on electronic computers to ascertain all of the nodes in the execution tree of the Strassen algorithm at the unspecified recursion level $r$ and to determine the relation between the sub-matrices and the resultant matrix at level $r$; we extend it to the DNA computing paradigm. At each step, the algorithm executes a multiplication between two factors, namely linear combinations of the elements of the matrices $X$ and $Y$, respectively. Since we can consider each factor as the sum of all elements from each matrix with coefficient 0, -1, or 1 [3], we can represent these coefficients with the RNS representation of numbers with DNA strands described in §3.1: the coefficient 0 is encoded by memory strands of the form (3.1.1) carrying the residue digits of 0, the coefficient -1 by those of $M - 1$ (since $-1 \equiv M - 1$ in $\mathbb{Z}_M$), and the coefficient 1 by those of 1. For the sake of brevity, we shall denote these three representations as $(0)_{RNS}$, $(-1)_{RNS}$, and $(1)_{RNS}$, respectively. This coefficient is obtained for each element in each recursive call and is dependent upon both the index of the call and the location of the element in the division of the matrix into 4 sub-matrices [3]. If we view the Strassen-Cannon algorithm's execution as an execution tree [3], then each scalar multiplication corresponds to a leaf of the execution tree, and the path from the root to the leaf represents the recursive calls leading to the corresponding multiplication. Furthermore, at the leaf, the coefficient of each element (either $(0)_{RNS}$, $(-1)_{RNS}$, or $(1)_{RNS}$)
can be determined by the combination of all computations in the path from the root, since all of the computations are linear and can thus be combined in the leaf (which we will denote by $t_l$).

Utilizing the nomenclature of [3], Strassen's formula can be depicted as follows. For $l = 0, \cdots, 6$:

(4.3.1) $t_l = \left( \sum_{i,j=0,1} x_{ij}\, SX(l, i, j) \right) \times \left( \sum_{i,j=0,1} y_{ij}\, SY(l, i, j) \right)$,

and

(4.3.2) $q_{ij} = \sum_{l=0}^{6} t_l\, SQ(l, i, j)$,

in which

$SX$:
l \ ij    00          01          10          11
0         (1)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
1         (0)_RNS     (0)_RNS     (1)_RNS     (1)_RNS
2         (1)_RNS     (0)_RNS     (0)_RNS     (0)_RNS
3         (0)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
4         (1)_RNS     (1)_RNS     (0)_RNS     (0)_RNS
5         (-1)_RNS    (0)_RNS     (1)_RNS     (0)_RNS
6         (0)_RNS     (1)_RNS     (0)_RNS     (-1)_RNS

$SY$:
l \ ij    00          01          10          11
0         (1)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
1         (1)_RNS     (0)_RNS     (0)_RNS     (0)_RNS
2         (0)_RNS     (1)_RNS     (0)_RNS     (-1)_RNS
3         (-1)_RNS    (0)_RNS     (1)_RNS     (0)_RNS
4         (0)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
5         (1)_RNS     (1)_RNS     (0)_RNS     (0)_RNS
6         (0)_RNS     (0)_RNS     (1)_RNS     (1)_RNS

$SQ$:
l \ ij    00          01          10          11
0         (1)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
1         (0)_RNS     (0)_RNS     (1)_RNS     (-1)_RNS
2         (0)_RNS     (1)_RNS     (0)_RNS     (1)_RNS
3         (1)_RNS     (0)_RNS     (1)_RNS     (0)_RNS
4         (-1)_RNS    (1)_RNS     (0)_RNS     (0)_RNS
5         (0)_RNS     (0)_RNS     (0)_RNS     (1)_RNS
6         (1)_RNS     (0)_RNS     (0)_RNS     (0)_RNS

(the entries of $SX$, $SY$, and $SQ$ follow directly from the coefficients of $X_{ij}$, $Y_{ij}$, and $M_{l+1}$ in the Strassen formulas of §1).
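As a consistency check (ours, not part of the original presentation), the three tables can be tested against the Strassen formulas of §1: for scalar blocks, (4.3.1)-(4.3.2) must reproduce the $2 \times 2$ product.

```python
import numpy as np

# Coefficient tables SX, SY, SQ; rows l = 0..6, columns ij = 00, 01, 10, 11.
SX = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
SY = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
SQ = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

x = np.random.randint(0, 10, 4)   # x_00, x_01, x_10, x_11
y = np.random.randint(0, 10, 4)
t = (SX @ x) * (SY @ y)           # the seven products t_0..t_6 of (4.3.1)
q = SQ.T @ t                      # q_00, q_01, q_10, q_11 via (4.3.2)

X, Y = x.reshape(2, 2), y.reshape(2, 2)
assert np.array_equal(q.reshape(2, 2), X @ Y)
```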
At recursion level $k$, $t_l$ can be represented as follows. For $l = 0, \cdots, 7^k - 1$:

(4.3.3) $t_l = \left( \sum_{i,j=0}^{2^k - 1} x_{ij}\, SX_k(l, i, j) \right) \times \left( \sum_{i,j=0}^{2^k - 1} y_{ij}\, SY_k(l, i, j) \right)$,

and

(4.3.4) $q_{ij} = \sum_{l=0}^{7^k - 1} t_l\, SQ_k(l, i, j)$.

It is easy to see that $SX_1 = SX$, $SY_1 = SY$, and $SQ_1 = SQ$; however, the difficulty that arises is to determine the values of the matrices $SX_k$, $SY_k$, and $SQ_k$ in order to have a general algorithm. The following relations, in which $l_1 \cdots l_k$ denotes the base-7 representation of $l$ and $i_1 \cdots i_k$, $j_1 \cdots j_k$ the base-2 representations of $i$ and $j$, were proved in [4], and we shall prove that these results hold with DNA:

(4.3.5) $SX_k(l, i, j) = \prod_{r=1}^{k} SX(l_r, i_r, j_r)$,

(4.3.6) $SY_k(l, i, j) = \prod_{r=1}^{k} SY(l_r, i_r, j_r)$,

(4.3.7) $SQ_k(l, i, j) = \prod_{r=1}^{k} SQ(l_r, i_r, j_r)$.

First we shall extend the definition of the tensor product to arrays of arbitrary dimensions [4] by representing the tensor product in RNS by way of single DNA strands.
Proposition 4.3.1.
Let $A$ and $B$ be arrays of the same dimension $l$ and of sizes $m_1 \times m_2 \times \cdots \times m_l$ and $n_1 \times n_2 \times \cdots \times n_l$, respectively, whose elements are represented in RNS by way of DNA strands as presented in §3.1. Then the tensor product $P = A \otimes B$ is an array of dimension $l$ and of size $m_1 n_1 \times m_2 n_2 \times \cdots \times m_l n_l$ in which each element of $A$ is replaced by the product of that element and $B$; that is,

$P[i_1, i_2, \cdots, i_l] = A[k_1, k_2, \cdots, k_l]\, B[h_1, h_2, \cdots, h_l]$,

where $i_j = k_j n_j + h_j$ for $1 \le \forall j \le l$ (the terms $k_j n_j$ and $h_j$ are added with RNSAdd). This product can be computed with the algorithm RNSMult, which is realized by a series of operations of the RNSAdd algorithm detailed in [17].

If we let $P = \otimes_{i=1}^{n} A_i = (\cdots((A_1 \otimes A_2) \otimes A_3) \cdots \otimes A_n)$, where $A_i$ is an array of dimension $l$ and of size $m_{i1} \times m_{i2} \times \cdots \times m_{il}$, the following theorem allows us to directly compute the elements of $P$. All products and sums of elements can be computed with RNSMult and RNSAdd, respectively.
Theorem 4.3.2.
If we let $j_k = \sum_{s=1}^{n} \left( h_{sk} \prod_{r=s+1}^{n} m_{rk} \right)$, then $P[j_1, j_2, \cdots, j_l] = \prod_{i=1}^{n} A_i[h_{i1}, h_{i2}, \cdots, h_{il}]$.

Proof. We give a proof by induction. For $n = 1$ and $n = 2$, the statement is true. Assume it is true for $n$; we shall prove that it is true for $n + 1$, i.e., that $P_{n+1}[v_1, v_2, \cdots, v_l] = \prod_{i=1}^{n+1} A_i[h_{i1}, h_{i2}, \cdots, h_{il}]$, where

$v_k = \sum_{s=1}^{n+1} \left( h_{sk} \prod_{r=s+1}^{n+1} m_{rk} \right)$, for $1 \le \forall k \le l$.

We have $P_{n+1} = P_n \otimes A_{n+1}$. Furthermore, by definition,

$P_{n+1}[j_1, j_2, \cdots, j_l] = P_n[p_1, p_2, \cdots, p_l]\, A_{n+1}[h_{(n+1)1}, h_{(n+1)2}, \cdots, h_{(n+1)l}] = \prod_{i=1}^{n+1} A_i[h_{i1}, h_{i2}, \cdots, h_{il}]$,

where

$j_k = \sum_{s=1}^{n} \left( h_{sk} \prod_{r=s+1}^{n+1} m_{rk} \right) + h_{(n+1)k} = \sum_{s=1}^{n+1} \left( h_{sk} \prod_{r=s+1}^{n+1} m_{rk} \right)$. ∎

Theorem 4.3.3. $SX_k = \otimes_{i=1}^{k} SX$, $SY_k = \otimes_{i=1}^{k} SY$, and $SQ_k = \otimes_{i=1}^{k} SQ$.

Proof. We give a proof by induction. For $k = 1$, the statement is true. Assume it is true for $k$; we shall prove that it is true for $k + 1$. According to (4.3.3) and (4.3.4), at level $k + 1$ of the execution tree, for $0 \le l \le 7^{k+1} - 1$:

$T_l = \sum_{0 \le i,j \le 2^{k+1}-1} X_{k+1,ij}\, SX_{k+1}(l, i, j) \times \sum_{0 \le i,j \le 2^{k+1}-1} Y_{k+1,ij}\, SY_{k+1}(l, i, j)$.

It follows from (4.3.1) and (4.3.2) that at level $k + 2$, for $0 \le l \le 7^{k+1} - 1$ and $0 \le l' \le 6$:

(4.3.8) $T_l[l'] = \sum_{0 \le i',j' \le 1} \sum_{0 \le i,j \le 2^{k+1}-1} X_{k+1,ij}[i', j']\, SX_{k+1}(l, i, j)\, SX(l', i', j') \times \sum_{0 \le i',j' \le 1} \sum_{0 \le i,j \le 2^{k+1}-1} Y_{k+1,ij}[i', j']\, SY_{k+1}(l, i, j)\, SY(l', i', j')$,

where $X_{k+1,ij}[i', j']$ and $Y_{k+1,ij}[i', j']$ are the four sub-matrices obtained by partitioning the matrices $X_{k+1,ij}$ and $Y_{k+1,ij}$ (we use $i'$ and $j'$ to denote the sub-matrix's quarter).

We represent $l, l'$ in base 7 and $i, j, i', j'$ in base 2. Since $X_{k+1,ij}[i', j'] = X_{k+2,ij}[ii', jj']$, then for $0 \le ll'_{(7)} \le 7^{k+2} - 1$:

(4.3.9) $M[ll'_{(7)}] = \sum_{0 \le ii'_{(2)},\, jj'_{(2)} \le 2^{k+2}-1} X_{k+2}[ii'_{(2)}, jj'_{(2)}]\, SX_{k+1}(l, i, j)\, SX(l', i', j') \times \sum_{0 \le ii'_{(2)},\, jj'_{(2)} \le 2^{k+2}-1} Y_{k+2}[ii'_{(2)}, jj'_{(2)}]\, SY_{k+1}(l, i, j)\, SY(l', i', j')$.
Moreover, it directly follows from (4.3.3) and (4.3.4) that for $0 \le ll'_{(7)} \le 7^{k+2} - 1$:

(4.3.10) $M[ll'_{(7)}] = \sum_{0 \le ii'_{(2)},\, jj'_{(2)} \le 2^{k+2}-1} X_{k+2}[ii'_{(2)}, jj'_{(2)}]\, SX_{k+2}\!\left(ll'_{(7)}, ii'_{(2)}, jj'_{(2)}\right) \times \sum_{0 \le ii'_{(2)},\, jj'_{(2)} \le 2^{k+2}-1} Y_{k+2}[ii'_{(2)}, jj'_{(2)}]\, SY_{k+2}\!\left(ll'_{(7)}, ii'_{(2)}, jj'_{(2)}\right)$.

From (4.3.9) and (4.3.10), we have

$SX_{k+2}\!\left(ll', ii', jj'\right) = SX_{k+1}(l, i, j)\, SX(l', i', j')$ and $SY_{k+2}\!\left(ll', ii', jj'\right) = SY_{k+1}(l, i, j)\, SY(l', i', j')$.

Thus,

$SX_{k+2} = SX_{k+1} \otimes SX = \otimes_{i=1}^{k+2} SX$, $SY_{k+2} = SY_{k+1} \otimes SY = \otimes_{i=1}^{k+2} SY$, and $SQ_{k+2} = SQ_{k+1} \otimes SQ = \otimes_{i=1}^{k+2} SQ$. ∎
From Theorem 4.3.2 and Theorem 4.3.3, relations (4.3.5), (4.3.6), and (4.3.7) follow. As a consequence of (4.3.3)-(4.3.7), we can form the following sub-matrix products. For $l = 0, \cdots, 7^r - 1$:

(4.3.11) $T_l = \left( \sum_{i,j=0}^{2^r - 1} X_{ij} \prod_{u=1}^{r} SX(l_u, i_u, j_u) \right) \times \left( \sum_{i,j=0}^{2^r - 1} Y_{ij} \prod_{u=1}^{r} SY(l_u, i_u, j_u) \right)$.

As a result of the storage map of sub-matrices to strands presented in §4.2, the factors

$\sum_{i,j=0}^{2^r - 1} X_{ij} \prod_{u=1}^{r} SX(l_u, i_u, j_u)$ and $\sum_{i,j=0}^{2^r - 1} Y_{ij} \prod_{u=1}^{r} SY(l_u, i_u, j_u)$

can be locally determined within each strand, and their product $T_l$ can be computed by the DNA implementation of the Cannon algorithm presented in §4.1. All of the sub-matrices are then added with the RNSMatrixAdd algorithm presented in §3.2 to obtain the resultant matrix:

(4.3.12) $Q_{ij} = \sum_{l=0}^{7^r - 1} T_l\, SQ_r(l, i, j) = \sum_{l=0}^{7^r - 1} T_l \left( \prod_{u=1}^{r} SQ(l_u, i_u, j_u) \right)$.
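To illustrate recursion removal concretely, here is a NumPy sketch (ours; ordinary integer arithmetic stands in for the RNS strand operations, and np.kron plays the role of the RNSMult-based tensor product) that builds the level-$r$ tables as Kronecker powers per Theorem 4.3.3 and evaluates (4.3.11)-(4.3.12):

```python
import numpy as np
from functools import reduce

SX = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
SY = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
SQ = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

def kron_power(S, r):
    """Level-r coefficient table S_r = S (x) ... (x) S (Theorem 4.3.3)."""
    return reduce(np.kron, [S] * r)

def block_coords(c, r):
    """Decode a Kronecker column index c (base-4 digits c_u = 2*i_u + j_u) to (i, j)."""
    i = j = 0
    for u in range(r):
        d = (c // 4 ** (r - 1 - u)) % 4   # u-th base-4 digit, most significant first
        i, j = 2 * i + (d >> 1), 2 * j + (d & 1)
    return i, j

r, t = 2, 8
s = t // 2 ** r                           # sub-matrix order at recursion level r
X = np.random.randint(0, 5, (t, t))
Y = np.random.randint(0, 5, (t, t))
coords = [block_coords(c, r) for c in range(4 ** r)]
XB = [X[i*s:(i+1)*s, j*s:(j+1)*s] for i, j in coords]
YB = [Y[i*s:(i+1)*s, j*s:(j+1)*s] for i, j in coords]

SXr, SYr, SQr = kron_power(SX, r), kron_power(SY, r), kron_power(SQ, r)
# (4.3.11): the 7^r leaf products of locally combined factors.
T = [sum(c * B for c, B in zip(SXr[l], XB)) @ sum(c * B for c, B in zip(SYr[l], YB))
     for l in range(7 ** r)]
# (4.3.12): reassemble the resultant matrix from the T_l.
Q = np.zeros_like(X)
for c, (i, j) in enumerate(coords):
    Q[i*s:(i+1)*s, j*s:(j+1)*s] = sum(SQr[l][c] * T[l] for l in range(7 ** r))
assert np.array_equal(Q, X @ Y)
```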
5. Conclusion

Our general scalable implementation can be used for all matrix multiplication algorithms that employ fast matrix multiplication at the top level (between strands) on a DNA computer. Moreover, since the computational complexity of these algorithms decreases as the recursion level $r$ increases, we can now seek optimal algorithms for all particular cases. Of course, as mentioned previously in this paper, the current science of DNA computing does not guarantee a perfect implementation of the Strassen algorithm as described herein; for now, these results should be regarded as primarily theoretical in nature.
References
1. A. Fujiwara, K. Matsumoto, and W. Chen, Procedures for logic and arithmetic operations with DNA molecules, Int. J. Found. Comput. Sci. (2004): 461–474.
2. D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, J. Symb. Comp. (1990): 251–280.
3. D. K. Nguyen, I. Lavallée, and M. Bui, A general scalable implementation of fast matrix multiplication algorithms on distributed memory computers, Proceedings of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks (2005): 116–122.
4. D. K. Nguyen, I. Lavallée, and M. Bui, A new direction to parallelize Winograd's algorithm on distributed memory computers, Modeling, Simulation and Optimization of Complex Processes: Proceedings of the Third International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam: 445–457.
5. G. Paun, G. Rozenberg, and A. Salomaa, DNA Computing, Springer-Verlag, 1998.
6. G. Zhang and S. Wang, Matrix multiplication based on DNA computing, ICNC (2009): 167–170.
7. H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans, Group-theoretic algorithms for matrix multiplication, Proceedings of the 46th Annual Symposium on Foundations of Computer Science, 23–25 October 2005, Pittsburgh, PA, IEEE Computer Society: 379–388.
8. J. S. Oliver, Matrix multiplication with DNA, Journal of Molecular Evolution (1997): 161–167.
9. L. Adleman, Molecular computation of solutions to combinatorial problems, Science (1994): 1021–1024.
10. L. E. Cannon, A cellular computer to implement the Kalman filter algorithm, Ph.D. Thesis, Montana State University (1969): 1–228.
11. Q. Luo and J. B. Drake, A scalable parallel Strassen's matrix multiplication algorithm for distributed-memory computers, Proceedings of the 1995 ACM Symposium on Applied Computing (1995): 221–226.
12. J. R. Bunch and J. E. Hopcroft, Triangular factorization and inversion by fast matrix multiplication, Math. Comp. (1974): 231–236.
13. S. Kodumal and D. Santi, DNA ligation by selection, BioTechniques (2004): 34–40.
14. S. Robinson, Toward an optimal algorithm for matrix multiplication, SIAM News (2005): 1–3.
15. V. Pan, How can we speed up matrix multiplication?, SIAM Review (1984): 393–416.
16. V. Strassen, Gaussian elimination is not optimal, Numer. Math. (1969): 354–356.
17. X. Zheng, J. Xu, and W. Li, Parallel DNA arithmetic operation based on n-moduli set, Appl. Math. Comp. (2009): 177–184.