Pattern Denoising in Molecular Associative Memory using Pairwise Markov Random Field Models
Dharani Punithan
[email protected]
Institute of Computer Technology
Seoul National University
South Korea
Byoung-Tak Zhang
[email protected]
Dept. of Computer Science and Engineering
Seoul National University
South Korea

Abstract
We propose an in silico molecular associative memory model for pattern learning, storage and denoising using the Pairwise Markov Random Field (PMRF) model. Our PMRF-based molecular associative memory model extracts locally distributed features from the exposed examples, learns and stores the patterns in the molecular associative memory, and denoises the given noisy patterns via DNA computation based operations. Thus, our computational molecular model demonstrates the content-addressability functionality of human memory. Our molecular simulation results show that the averaged mean squared error between the learned and denoised patterns remains low up to a substantial percentage of random noise.

1 Introduction

Memory is a crucial part of the learning process in both animals and humans. It is the mental process of encoding, storing and retrieving. Among the different types of memories, the most useful one is associative memory (AM). AM stores data in a distributed fashion and is addressed through its contents. Hence, AM is also known as a content-addressable memory (CAM) [1]. AM works by learning patterns and retrieving or reconstructing the previously learned pattern that most closely resembles a given noisy pattern. Thus, it has applications in pattern matching, pattern recognition, robotics, etc. This type of memory is robust and fault-tolerant, as it exhibits error-correction ability.

DNA works as a "memory" that stores genetic information in cellular organisms. The striking features of DNA [2-4], such as self-assembly, huge information storage capacity and massive parallelism, are similar to those of the brain [5-7]. Hence, associative memory can be realized at the molecular level, and such a memory can be vaster than the human brain [2, 8]. Some recent studies [9, 10] show that molecular systems can exhibit brain-like cognition. In our in silico molecular simulation, we demonstrate the potential of molecular associative memory using the popular image processing tool, the PMRF. To our knowledge, our work is the first to denoise patterns using molecular algorithms.

This work is based on [11] and is also an extension of our previous works [12, 13]. In our previous works, we used mutation in the learning phase, which we avoid in this work so as to match the conventional DNA computing based bio-algorithms. In addition to the recall functionality of our previous works, here we propose molecular methods to denoise the noisy patterns iteratively using the PMRF model. To summarize, the tasks of our proposed molecular associative memory model are 1) to learn and store a set of patterns (digits from 0 to 9) when exposed to MNIST [14] training examples and 2) to denoise the noisy patterns iteratively. We combine DNA-based bio-molecular operations such as hybridization, melting and amplification with the PMRF model to demonstrate these functionalities. We use PMRF formulations, but the involved computations are based on hybridization reactions. We mainly take advantage of the hybridization operations to implement the proposed molecular content-addressable memory.
2 Pairwise Markov Random Field Model

Consider an undirected graph G = (V, E) on a two-dimensional lattice L, where the nodes (V) represent random variables {X_ij} and the edges (E) represent associations (conditional dependencies) between two nodes. Such a graph is called a Markov Random Field (MRF) [15-17], X, if it satisfies the Markovian property p(x_ij | x_kl, (k, l) ∈ L, (k, l) ≠ (i, j)) = p(x_ij | x_kl, (k, l) ∈ N_ij) for all (i, j) ∈ L, where i and k are row indices, j and l are column indices, x_ij and x_kl are the realizations of the random variables associated with the specified lattice points, and N_ij is the neighborhood of (i, j).

The local conditional probability for each node can be defined using clique potentials. A clique (c) is defined to be either a single node or a collection of nodes in which every node is a neighbor of every other node. Each clique is assigned a potential V_c(·). The sum of all clique potentials for a realization x is called the energy function [15]:

U(x) = \sum_{(i,j) \in L} \sum_{c \in C_{ij}} V_c(x)    (1)

where C_ij is the set of all cliques associated with node x_ij and V_c(x) is the clique potential associated with a clique c. The number of nodes in a clique is called the order of the clique. Potentials of order one and two are called unary and pairwise, respectively. A Pairwise Markov Random Field (PMRF) [15-17] over the graph is associated with a set of unary (node) potentials and a set of pairwise (edge) clique potentials, which implies that the clique order is at most two. For a PMRF, the energy is defined as [15]:

U(x) = \sum_{(i,j) \in L} \Big[ V(x_{ij}) + \sum_{kl \in N_{ij}} V(x_{ij}, x_{kl}) \Big]    (2)

where V(x_ij) and V(x_ij, x_kl) are the unary and the pairwise clique potentials, respectively. The local conditional probabilities for a PMRF are defined as in equation (3) [15]:

p(x_{ij} \mid x_{kl}, kl \in N_{ij}) = \frac{\exp\big[ V(x_{ij}) + \sum_{kl \in N_{ij}} V(x_{ij}, x_{kl}) \big]}{\sum_{x_{ij}} \exp\big[ V(x_{ij}) + \sum_{kl \in N_{ij}} V(x_{ij}, x_{kl}) \big]}    (3)

PMRFs are attractive because of their simplicity. These graphical models are popular in the field of statistical physics and have applications in computer vision, computational biology, information extraction, etc.

3 DNA Computing Operations

DNA consists of four different bases: A (adenine), T (thymine), C (cytosine) and G (guanine). These bases are connected together to form a single-stranded DNA sequence. Two single strands bind to form a double-stranded DNA helix by the Watson-Crick complementarity rule [18], whereby adenine bonds with thymine (A-T) and vice versa (T-A), and cytosine bonds with guanine (C-G) and vice versa (G-C). This base-pairing of complementary single-stranded molecules to form a double-stranded DNA is called hybridization (or annealing). The reverse process, a double-stranded helix yielding its two constituent single strands, is called melting (or denaturation). The process of multiplying the copies of DNA strands is called amplification.

4 Molecular Associative Memory Model

Molecular memory is modeled as a set of m two-dimensional weighted graphs M = {G_m = (V_m, C_m, W_m)}, each of size N × N, where m represents the number of binary patterns to be learned (digits from 0 to 9), V_m is the set of all nodes representing the pixels {x^m_ij} of the m-th pattern, i and j are the row and column indices of the pixel location, C_m is the set of all unary (first-order) and pairwise (second-order) cliques in the second-order (8-point) neighborhood system of the m-th pattern, and W_m represents the weights of the nodes of the m-th pattern. We set N = 28, as each MNIST example is of size 28 × 28.
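As an illustration of the second-order (8-point) neighborhood system and of equation (3), the following Python sketch computes the local energy and the local conditional probability of a single site on a small binary lattice. The agreement-based potentials used here are simple placeholders for the sketch, not the hybridization-defined potentials introduced later.

```python
import numpy as np

# Second-order (8-point) neighborhood offsets
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              ( 0, -1),          ( 0, 1),
              ( 1, -1), ( 1, 0), ( 1, 1)]

def unary(value):
    """Placeholder unary potential V(x_ij)."""
    return 0.0

def pairwise(value, neighbour):
    """Placeholder pairwise potential V(x_ij, x_kl): reward agreement."""
    return 1.0 if value == neighbour else 0.0

def local_energy(x, i, j, value):
    """V(x_ij) + sum over kl in N_ij of V(x_ij, x_kl) for a candidate value at (i, j)."""
    e = unary(value)
    n_rows, n_cols = x.shape
    for di, dj in NEIGHBOURS:
        k, l = i + di, j + dj
        if 0 <= k < n_rows and 0 <= l < n_cols:
            e += pairwise(value, x[k, l])
    return e

def conditional_prob(x, i, j, value):
    """Equation (3) for binary labels {0, 1}."""
    num = np.exp(local_energy(x, i, j, value))
    den = sum(np.exp(local_energy(x, i, j, v)) for v in (0, 1))
    return num / den

# Example: probability that the centre pixel of a small binary patch is white (1)
patch = np.array([[0, 1, 1],
                  [0, 0, 1],
                  [0, 0, 1]])
print(conditional_prob(patch, 1, 1, 1))
```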
4.1 Molecular Learning and Storage of Patterns

In our previous works [12, 13], all pixels of all patterns in the memory were initially black. On training, we extracted the information from the training examples and the molecular memories were mutated with respect to the foreground pixels of the training images. In this work, we avoid mutation so as to match the DNA computing based bio-algorithms. Hence, we create all possible unary and pairwise cliques in the initial memory. For each pixel location (row and column indices), we create both black (background pixel) and white (foreground pixel) DNA molecules, as we learn binary patterns. Then, for each pixel location and pixel color, we create all possible unary and pairwise cliques. We construct m such bags of DNA single strands. Each single strand represents either a unary (pixel) or a pairwise clique. We form the molecules from the four-letter DNA alphabet A, T, G, C. For example, the information of a pixel (node), i.e., its location (row and column indices) and color (black or white), is encoded into a DNA sequence such as 'GTGGTTA': 'GTG' (the first three bases) represents the row index (i) of the pixel, 'GTT' (the next three bases) represents the column index (j) of the pixel, and 'A' (the last base) represents the color of the binary pixel. We combine two such sequences to form the pairwise cliques.

We then re-encode the character-based DNA sequence into a 2 × n matrix, where n is the number of bases of the DNA sequence. Each DNA base is re-encoded into a vector: A as [1, 0]^T, T as [-1, 0]^T, G as [0, 1]^T, and C as [0, -1]^T. This initial molecular memory is trained on the MNIST examples to memorize the patterns (digits from 0 to 9). During learning, the weights of the unary single strands of the memory are updated using the conditional probabilities (see Algorithm 1). After memorization, the model recalls the stored pattern that has the maximum weighted score of the DNA molecules for a given noisy pattern. We then denoise the given noisy pattern iteratively by computing energies (see Algorithm 2). We use the PMRF model for computing the conditional probabilities, weighted scores and energies. These formulations depend only on clique potentials. We can define our own clique potentials according to our problem, as long as they emphasize some specific features [16]. In our modeling, we define the clique potentials in terms of hybridization reactions: on a complete hybridization, we assign one to the respective clique potential, and zero otherwise.

Algorithm 1 Molecular Learning and Storage of Patterns
Input: Initial molecular memory and MNIST training examples.
Output: A set of learned molecular patterns from 0 to 9 (M).
• Loop over MNIST training examples
    Read the grayscale MNIST image and get the label
    Binarize the MNIST image and remove noise
    Encode each pixel information of the image into character-based DNA molecules
    Form single-strands of unary and pairwise cliques of DNA molecules
    Re-encode character-based DNA molecules into vector-based numerical DNA molecules
    Hybridization:
        Bind single-strands of the MNIST image with single-strands of the m-th memory pattern matching the MNIST label
        if (hybridization reactions are complete) then
            the corresponding memory clique potentials (V(x^m_ij) and V(x^m_ij, x^m_kl)) are set to one
        else
            the clique potentials are set to zero
        end if
    Melting:
        Separate single-strands of the MNIST image and the m-th memory pattern
        Compute local conditional probabilities p(x^m_ij = 1 | x^m_N_ij) with the clique potentials
    Amplification:
        Update the weights (w^m_ij) of the single-strands of the memory using the conditional probabilities
• Set weights of the memory single-strands below a small threshold to zero
• Normalize the weights of the memory single-strands

Each MNIST image (G_{m_t}) is mapped to a realization of a PMRF such that the nodes represent the pixels (x^{m_t}_ij) of the image. We binarize each grayscale MNIST training image and remove noise if present. Each pixel in the image comprises the location (row and column) and the color (black or white) information. Each pixel's information is encoded into character-based DNA molecules. The DNA molecules representing the pixel locations are complementary to the memory strands. We form unary and pairwise single strands in the second-order neighborhood system. We re-encode each character-based DNA molecule into the respective vector-based numerical DNA molecule, as mentioned before. The single strands of the training image are hybridized with the single strands of the m-th memory pattern corresponding to the label of that training image. The addition of two single strands (one from the memory and one from the training example) yielding a zero matrix indicates a complete hybridization. On complete hybridizations, the clique potentials (V(x^m_ij) and V(x^m_ij, x^m_kl)) are set to one; otherwise to zero. We then separate the set of DNA single strands representing the training example exposed in that iteration; this is known as the melting operation. Then, we compute the conditional probabilities of the foreground (white) pixels (x^m_ij = 1) given the neighborhood (x^m_N_ij) at the pixel location (i, j) of the m-th memory pattern based on equation (3). For binary random variables, the conditional probabilities are computed as in equation (4) [17]:

p(x^m_{ij} = 1 \mid x^m_{N_{ij}}) = \frac{\exp\big[ V(x^m_{ij} = 1) + \sum_{kl \in N_{ij}} V(x^m_{ij} = 1, x^m_{kl}) \big]}{\sum_{x^m_{ij} \in \{0,1\}} \exp\big[ V(x^m_{ij}) + \sum_{kl \in N_{ij}} V(x^m_{ij}, x^m_{kl}) \big]}    (4)

We use the computed conditional probabilities (equation (4)) to update the weights (w^m_ij) of the foreground pixels of the m-th memory pattern (equation (5)):

w^m_{ij}(\mathrm{new}) = w^m_{ij}(\mathrm{old}) + \eta \, p(x^m_{ij} = 1 \mid x^m_{N_{ij}})    (5)

where \eta = 1 / (1 + \exp(-\gamma (iterNum - stepSize))) is a sigmoid decay learning rate, \gamma is the decay rate, iterNum is the current training iteration number, and stepSize = 100. This weight update step is referred to as amplification. Learning strengthens the weights of the respective foreground pixels of the memory. After training, we set the small weights of the foreground pixels to zero. Then, we normalize the weights so that the sum of all weights of the foreground pixels of a pattern is one. The whole learning procedure is given in Algorithm 1.
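The following Python sketch shows one way to realize the pixel-to-DNA encoding, the zero-matrix test for a complete hybridization, and the weight update of equation (5) in simulation. The 3-base index codebook and the colour-base assignment are hypothetical choices made only for this sketch; the 2-component base vectors and the zero-matrix criterion follow the description above.

```python
import numpy as np

# 2-component base vectors: complementary bases sum to the zero vector
BASE_VEC = {'A': np.array([1, 0]), 'T': np.array([-1, 0]),
            'G': np.array([0, 1]), 'C': np.array([0, -1])}
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

def index_code(idx):
    """Hypothetical 3-base code for a row or column index (the actual codebook may differ)."""
    alphabet = 'ATGC'
    return ''.join(alphabet[(idx // 4 ** k) % 4] for k in range(3))

def encode_pixel(i, j, color):
    """Character-based strand: 3 bases for the row, 3 for the column, 1 for the colour.
    The colour-base assignment (white -> 'A', black -> 'T') is an assumption of this sketch."""
    return index_code(i) + index_code(j) + ('A' if color == 1 else 'T')

def to_matrix(strand):
    """Re-encode a character-based strand into a 2 x n numerical matrix."""
    return np.column_stack([BASE_VEC[b] for b in strand])

def complement(strand):
    """Watson-Crick complement, used here to build matching memory strands."""
    return ''.join(COMPLEMENT[b] for b in strand)

def clique_potential(image_strand, memory_strand):
    """Hybridization-defined potential: 1 if the two 2 x n matrices add up to the
    zero matrix (complete hybridization), 0 otherwise."""
    total = to_matrix(image_strand) + to_matrix(memory_strand)
    return 1.0 if not total.any() else 0.0

def update_weight(w_old, p_cond, eta):
    """Equation (5): amplification of a foreground-pixel weight."""
    return w_old + eta * p_cond

# Example: a memory strand stored as the complement of the pixel strand it represents
pixel = encode_pixel(3, 7, 1)                # white pixel at location (3, 7)
memory = complement(encode_pixel(3, 7, 1))   # matching memory strand
print(clique_potential(pixel, memory))                               # complete hybridization -> 1.0
print(clique_potential(pixel, complement(encode_pixel(3, 7, 0))))    # colour mismatch -> 0.0
```

In the simulation, eta follows the sigmoid decay schedule described above and p_cond is the conditional probability of equation (4).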
4.2 Molecular Recall and Denoising of Patterns

Our model finds the memory pattern closest to a given noisy pattern (G_{m'}) by applying hybridization operations and computing the weighted average of the local clique potentials. The information of each pixel (x^{m'}_ij) in the noisy pattern is encoded into character-based DNA molecules. The encoded location information is complementary to the memory DNA molecules representing that location. The unary and pairwise cliques of DNA sequences at each pixel location in the second-order neighborhood system of the noisy pattern are formed. We then re-encode each character-based DNA molecule into a numerical vector-based DNA molecule. The single strands of the noisy pattern are hybridized with the single strands of the memory. The clique potentials (V(x^m_ij) and V(x^m_ij, x^m_kl)) are set to one on complete hybridizations, and to zero otherwise. The weighted score (equation (6)) is computed for each memory pattern (digits from 0 to 9), and the softmax (equation (7)) of the scores is computed to retrieve the closest memory pattern. All single strands of the noisy pattern are separated after each weighted score computation.

score_m = \sum_{x^m_{ij} = 1} w^m_{ij} \, \frac{ V(x^m_{ij}) + \sum_{kl \in N_{ij}} V(x^m_{ij}, x^m_{kl}) }{ |C_{ij}| }, \quad m = 0, \ldots, 9    (6)

where |C_ij| is the cardinality of the clique set.

\sigma(score_m) = \frac{\exp(score_m)}{\sum_{l=0}^{9} \exp(score_l)}, \quad m = 0, \ldots, 9    (7)

The next step is to denoise the noisy pattern. We hybridize the single strands of the best-matched memory pattern with the single strands of the noisy pattern. On complete hybridizations, the clique potentials (V(x^{m'}_ij) and V(x^{m'}_ij, x^{m'}_kl)) are set to one; otherwise to zero. We then compute the energy as defined in equation (8):

U(x^{m'}) = \sum_{ij} \Big[ V(x^{m'}_{ij}) + \sum_{kl \in N_{ij}} V(x^{m'}_{ij}, x^{m'}_{kl}) \Big]    (8)

We randomly pick a pixel location and, if the color of the pixel is black (white), we add new character-based DNA molecules with pixel color white (black) for that location. We form the possible pairwise cliques and re-encode them into numerical DNA molecules. We hybridize the newly added single strands with the memory single strands and compute the energy again (equation (8)). The DNA molecules corresponding to the higher energy are separated by melting. We repeat this process for all the pixels of the noisy pattern. The whole denoising procedure is given in Algorithm 2.
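The following sketch illustrates the recall computation of equations (6) and (7), assuming the clique potentials of each memory pattern have already been set to 0 or 1 by the hybridization reactions; the container names (weights, unary_pot, pair_pots, foreground) are placeholders for this sketch.

```python
import numpy as np

def weighted_score(weights, unary_pot, pair_pots, foreground):
    """Equation (6): weighted average of local clique potentials over foreground pixels.

    weights[i, j]     - learned weight w_ij of the memory pattern
    unary_pot[i, j]   - V(x_ij), set to 0/1 by the hybridization reactions
    pair_pots[(i, j)] - list of pairwise potentials V(x_ij, x_kl) for kl in N_ij
    foreground        - iterable of (i, j) locations with x_ij = 1
    """
    score = 0.0
    for i, j in foreground:
        local = unary_pot[i, j] + sum(pair_pots[(i, j)])
        n_cliques = 1 + len(pair_pots[(i, j)])        # |C_ij|: one unary plus the pairwise cliques
        score += weights[i, j] * local / n_cliques
    return score

def softmax(scores):
    """Equation (7): softmax over the ten memory-pattern scores."""
    e = np.exp(scores - np.max(scores))               # shift for numerical stability
    return e / e.sum()

# Recall: compute a score per stored digit and pick the best-matched pattern, e.g.
# scores = np.array([weighted_score(W[m], U[m], P[m], F[m]) for m in range(10)])
# best_label = int(np.argmax(softmax(scores)))
```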
Algorithm 2 Molecular Denoising of Noisy Patterns
Input: Noisy stored pattern and the set of memory patterns (M) (from 0 to 9).
Output: Denoised pattern.
• Read the given noisy pattern
• Encode each pixel into character-based DNA molecules
• Form single-strands of unary and pairwise cliques of DNA molecules
• Re-encode each character-based DNA molecule into numerical DNA molecules
• Get the best-matched pattern from memory
    for m = 0 to 9 do
        Hybridization:
            Bind single-strands of the noisy image and the m-th memory pattern
            Set memory potentials (V(x^m_ij) and V(x^m_ij, x^m_kl)) to one on complete hybridizations
            Compute the weighted score (score_m) for the m-th memory pattern
        Melting:
            Separate the single-strands of the noisy image and the m-th memory pattern
        m := m + 1
    end for
    Compute the softmax σ(score_m) of the scores and get the label of the best-matched memory pattern
• if (label of the best-matched memory pattern == label of the noisy pattern) then
    Hybridization:
        Bind single-strands of the noisy image and the best-matched memory pattern
        Set noisy potentials (V(x^{m'}_ij) and V(x^{m'}_ij, x^{m'}_kl)) to one on complete hybridizations
    oldEnergy := Compute the energy of the noisy pattern based on the hybridization reactions
    Loop over the pixels of the noisy pattern
        Pick a random pixel from the noisy pattern
        if (the pixel color is black) then
            Add new character-based DNA molecules for that location with pixel color white
        else if (the pixel color is white) then
            Add new character-based DNA molecules for that location with pixel color black
        end if
        Add all possible unary and pairwise cliques correspondingly
        Re-encode the new character-based DNA molecules to numerical DNA molecules
        Hybridization:
            Bind the new noisy single-strands with the memory single-strands
            Set noisy potentials (V(x^{m'}_ij) and V(x^{m'}_ij, x^{m'}_kl)) to one on complete hybridizations
        newEnergy := Compute the energy with the changed pixel color
        if (newEnergy < oldEnergy) then
            oldEnergy := newEnergy
            Melting: Separate the old single-strands
        else
            Melting: Separate the newly added single-strands
        end if
  end if
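As a rough illustration of equation (8), the sketch below computes the energy of a noisy pattern against the best-matched memory pattern, abstracting a complete hybridization as agreement between the corresponding pixels (and pixel pairs); this abstraction is an assumption of the sketch. Algorithm 2 compares such energies before and after a candidate pixel flip and melts the strands of the higher-energy configuration.

```python
import numpy as np

# Second-order (8-point) neighborhood offsets
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              ( 0, -1),          ( 0, 1),
              ( 1, -1), ( 1, 0), ( 1, 1)]

def energy(noisy, memory):
    """Equation (8): sum over ij of [ V(x'_ij) + sum over kl in N_ij of V(x'_ij, x'_kl) ].

    A potential is 1 on a complete hybridization and 0 otherwise; here a complete
    hybridization is abstracted as agreement with the best-matched memory pattern."""
    n_rows, n_cols = noisy.shape
    u = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            u += 1.0 if noisy[i, j] == memory[i, j] else 0.0          # unary potential
            for di, dj in NEIGHBOURS:
                k, l = i + di, j + dj
                if 0 <= k < n_rows and 0 <= l < n_cols:
                    pair_match = noisy[i, j] == memory[i, j] and noisy[k, l] == memory[k, l]
                    u += 1.0 if pair_match else 0.0                   # pairwise potential
    return u

# Energy before and after flipping one randomly chosen pixel, as in Algorithm 2
rng = np.random.default_rng(0)
noisy = rng.integers(0, 2, size=(28, 28))
memory = rng.integers(0, 2, size=(28, 28))
old_energy = energy(noisy, memory)
i, j = rng.integers(28), rng.integers(28)
noisy[i, j] = 1 - noisy[i, j]                 # tentatively flip black <-> white
new_energy = energy(noisy, memory)            # Algorithm 2 compares this against old_energy
```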
5 Results

In this section, we present the results of the two tasks of our molecular content-addressable memory model: 1) learning of patterns from exposed examples and their storage in memory, and 2) denoising of given noisy patterns.
Figure 1 shows the patterns learned and stored in our molecular associative memory. We use the MNIST training dataset to train the model, with an equal number of training examples for each digit (from 0 to 9). We extract the features (unary and pairwise cliques) from these training examples, encode them into DNA molecules and learn the exposed examples via DNA computing based bio-operations. On training, the weights of the foreground pixels of the memory patterns are strengthened by computing PMRF-based conditional probabilities. In our modeling, the local conditional probabilities are computed in terms of hybridization reactions.
Figure 1: Patterns Learned and Stored in Molecular Associative Memory
Figure 2: Noisy Versions of a Pattern at Different Random Noise Percentages ((a) 10%, (b) 20%, (c) 30%, (d) 40%, (e) 50%)

We create an artificial dataset by adding random noise to the learned patterns at different noise percentages (from 10% to 50%). We randomly change the pixel color (from black to white and vice versa) of a pattern for the given percentage of its pixels. We then encode each pixel into DNA molecules for further processing. Adding 50% of noise makes the patterns essentially random; hence, we examine our model up to 50% of noise. For each pattern and for each noise percentage, we create a set of noisy patterns. The noisy samples of a pattern at different noise percentages are shown in Figure 2.
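The noisy test patterns can be generated, for example, with the following sketch, assuming each stored pattern is held as a binary array; the pixels to flip are chosen uniformly at random so that the requested percentage of pixels changes colour.

```python
import numpy as np

def add_flip_noise(pattern, noise_pct, rng=np.random.default_rng(0)):
    """Flip (black <-> white) a noise_pct percentage of randomly chosen pixels."""
    noisy = pattern.copy()
    n_pixels = pattern.size
    n_flip = int(round(n_pixels * noise_pct / 100.0))
    flat = rng.choice(n_pixels, size=n_flip, replace=False)
    rows, cols = np.unravel_index(flat, pattern.shape)
    noisy[rows, cols] = 1 - noisy[rows, cols]
    return noisy

# e.g. noisy = add_flip_noise(stored_pattern, noise_pct=30)
```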
Table 1: Recall Accuracy (%) per Pattern at a Fixed Noise Percentage

Our model recalls the stored pattern given a noisy pattern by computing the weighted average of the local clique potentials, which are defined using the hybridization reactions. The average recall accuracies for noisy patterns at different noise percentages are depicted in Figure 3. The average recall accuracies are high up to a moderate noise percentage and drop owing to heavy randomness at higher noise levels. The average recall accuracies for each of the patterns at a fixed noise percentage are shown in Table 1.
Figure 3: Recall Accuracy (%) and MSE vs. Noise Percentage (%)

Figure 4: Denoising of Patterns (from 0 to 9) with Noise at Selected Epochs
Figure 5: Denoising of Patterns (from 0 to 9) with Noise at Selected Epochs
Figure 6: MSEs for Typical Noisy Samples (Patterns 0 to 9) during Denoising Epochs

On a successful recall, we denoise the noisy pattern iteratively using hybridization reactions. We present both qualitative (Figures 4 and 5) and quantitative (Figure 3) results of denoising. The denoising of typical patterns (digits from 0 to 9) at selected epochs is shown in Figures 4 and 5. We compute the Mean Squared Error (MSE) at each epoch of denoising of the typical patterns; the results are shown in Figure 6. We observe a linear reconstruction of all the patterns with our proposed molecular denoising algorithm. The averaged MSEs of all the patterns at different noise percentages are shown in Figure 3; the MSEs remain low up to a substantial percentage of noise.

6 Conclusion

We demonstrate that associative memory can be realized at the molecular level, involving only local features, using Pairwise Markov Random Field (PMRF) models. We apply DNA based bio-molecular operations together with PMRF models for extracting, storing, learning, recalling and denoising information. The results show that our proposed molecular simulation of associative memory denoises information with low MSE up to a substantial percentage of noise. Our molecular computation model, like the human brain, is able to recall and reconstruct (denoise) the patterns when noisy patterns are provided.

Acknowledgments

This work was partly supported by the Samsung Research Funding Center of Samsung Electronics (SRFC-IT1401-12), the Institute for Information & Communications Technology Promotion (2015-0-00310-SW.StarLab, 2017-0-01772-VTT, 2018-0-00622-RMI, 2019-0-01367-BabyMind) and a Korea Institute for Advancement of Technology (P0006720-GENKO) grant funded by the Korea government. The ICT at Seoul National University provided research facilities for the study.
References

[1] T. Kohonen. Self-organization and associative memory. Springer-Verlag New York, Inc., 3rd edition, 1989.
[2] E. B. Baum. Building an associative memory vastly larger than the brain. Science, 268(5210):583-585, 1995.
[3] L. M. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266(5187):1021-1024, 1994.
[4] Paul W. K. Rothemund, Nick Papadakis, and Erik Winfree. Algorithmic self-assembly of DNA Sierpinski triangles. PLoS Biology, 2(12):e424, 2004.
[5] D. B. Chklovskii, B. W. Mel, and K. Svoboda. Cortical rewiring and information storage. Nature, 431(7010):782-788, 2004.
[6] Eran Zaidel and Marco Iacoboni. The parallel brain: the cognitive neuroscience of the corpus callosum. MIT Press, 2003.
[7] George M. Whitesides and Bartosz Grzybowski. Self-assembly at all scales. Science, 295(5564):2418-2421, 2002.
[8] John H. Reif, Thomas H. LaBean, Michael Pirrung, Vipul S. Rana, Bo Guo, Carl Kingsford, and Gene S. Wickham. Experimental construction of very large scale DNA databases with associative search capability. In International Workshop on DNA-Based Computers, pages 231-247. Springer, 2001.
[9] Lulu Qian, Erik Winfree, and Jehoshua Bruck. Neural network computation with DNA strand displacement cascades. Nature, 475(7356):368-372, 2011.
[10] Kevin MacVittie, Jan Halámek, Vladimir Privman, and Evgeny Katz. A bioinspired associative memory system based on enzymatic cascades. Chemical Communications, 49(62):6962-6964, 2013.
[11] Byoung-Tak Zhang. Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory. IEEE Computational Intelligence Magazine, 3(3):49-63, 2008.
[12] Dharani Punithan and Byoung-Tak Zhang. Molecular associative memory with spatial auto-logistic model for pattern recall. Procedia Computer Science, 123:373-379, 2018. (Proceedings of BICA 2017).
[13] Dharani Punithan and Byoung-Tak Zhang. Ising model based molecular associative memory for pattern recall. In Proceedings of Korea Computer Congress, pages 941-942. Korean Institute of Information Scientists and Engineers (KIISE), 2017.
[14] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist.
[15] Stan Z. Li. Markov random field modeling in image analysis. Springer Publishing Company, Inc., 3rd edition, 2009.
[16] Chee Sun Won and Robert M. Gray. Stochastic image processing. Springer Science & Business Media, 2013.
[17] Martin Beckerman. Adaptive cooperative systems. John Wiley & Sons, Inc., 1997.
[18] J. D. Watson and F. H. C. Crick. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature, 171(4356):737-738, 1953.