Floating-Point Multiplication Using Neuromorphic Computing
Karn Dubey Urja Kothari Shrisha Rao
Abstract
Neuromorphic computing describes the use of VLSI systems to mimic neuro-biological architectures, and is also looked at as a promising alternative to the traditional von Neumann architecture. Any new computing architecture would need a system that can perform floating-point arithmetic. In this paper, we describe a neuromorphic system that performs IEEE 754-compliant floating-point multiplication. The complex process of multiplication is divided into smaller sub-tasks performed by the components Exponent Adder, Bias Subtractor, Mantissa Multiplier, and Sign and OF/UF. We study the effect of the number of neurons per bit on accuracy and bit error rate, and estimate the optimal number of neurons needed for each component.
Keywords:
IEEE 754, floating-point arithmetic, neuromorphic computing, Neural Engineering Framework (NEF)
1 Introduction

Neuromorphic computing has recently become prominent as a possible future alternative to the traditional von Neumann architecture (Zargham, 1996) of computing. Some of the problems commonly faced when working with classical CMOS-based von Neumann machines are the limitations on their energy efficiency, and the absolute limits to speed and scaling on account of physical limits (Mead, 1990; Koch and Segev, 2003). Though Moore's Law held for long and made possible rapid and sustained progress in hardware performance (Moore, 1965), it is now quite clear that this will not last. Hence, there is a need to look for alternative computing architectures, including neuromorphic computing (Wang et al., 2017; Kim et al., 2015; Esser et al., 2016). The von Neumann architecture also has an inherent problem, commonly called the "von Neumann bottleneck," caused by the limited bandwidth between the CPU and the main memory. Thus, newer architectures often avoid a wide gap between processing and main memory (Monroe, 2014; Moore, 1965).

Rapid growth in cognitive applications is one of the important motivations for interest in neuromorphic computing, which promises the ability to perform a high number of complex functions through parallel operation. Neural solutions are possible for machine learning problems that involve complex mathematical calculations (Eliasmith, 2013; Pastur-Romay et al., 2017). There have been some attempts to develop systems of computation on neuromorphic architectures (Koch and Segev, 2003; Gosmann and Eliasmith, 2016), but not much has been done in the specific area of numerical computation, particularly floating-point arithmetic.

Floating-point arithmetic (IEEE, 2019) is ubiquitous in scientific as well as general computing. It is a basic operation that should be supported by any computational architecture. In this paper, we describe a system which can perform the multiplication of two IEEE 754-compliant floating-point numbers on a neuromorphic architecture. Our work is an extension of George et al. (2019), who showed how floating-point addition can be achieved using neuromorphic computing. We have designed a modular architecture which performs the conventional multiplication process (Erle et al., 2009), but instead of logic gates it uses groups of neurons as the basic unit. The architecture is easily scalable to double-precision floating-point numbers.

The system is designed on the basis of the Neural Engineering Framework (NEF) which, as the name suggests, provides a basic framework to develop a neuromorphic system. For the implementation, simulation and testing of our design we used Nengo (Nengo, c; Bekolay et al., 2014), a graphical and scripting-based software package for simulating large-scale neural systems. To use Nengo, we define groups of neurons called ensembles, and then form connections between them based on what computation (Nengo, a,b) should be performed.

The architecture is divided into four components: Exponent Adder, Bias Subtractor, Mantissa Multiplier, and Sign/Overflow and Underflow. The Exponent Adder uses a stage-wise adder which takes two 8-bit exponents and produces an 8-bit output along with a carry. The Bias Subtractor takes the output of the Exponent Adder, subtracts the bias, and produces an 8-bit output; the subtraction is done using the 2's complement method. The Mantissa Multiplier is the core of our system design; it follows a stage-wise process, taking two 23-bit mantissa inputs and producing a 23-bit resultant mantissa (see Section 3.3).
Our system also indicates if there is an overflow or underflow during the exponent addition process (see Section 3.5).

We used two performance analysis metrics, Mean Absolute Error (MAE) and Mean Encoded Error (MEE), to estimate the performance of our system. We have also observed the effect on accuracy of varying the number of neurons in each component of our system.

The rest of the paper is structured as follows. We first give a brief description of the IEEE 754 floating-point multiplication process in Section 2.1, and then briefly describe the Neural Engineering Framework (NEF) and its three basic principles, representation, transformation and dynamics, in Section 2.2. After this we explain the overall architecture in Section 3 using Figure 3. Section 4 deals with the two metrics that we have used to evaluate our system, the Mean Absolute Error (MAE) and the Mean Encoded Error (MEE). In Section 4.1 we describe the relationship between the number of neurons and accuracy, and in Section 4.2 the relationship between the number of neurons and bit error. In Section 4.3 we describe how we estimated the optimal number of neurons required for all the ensembles, and list them in Table 1. Finally, we present the conclusions of our work in Section 5.
2 Preliminaries

First we briefly discuss the floating-point multiplication process as per the IEEE 754 standard (Erle et al., 2009); then we describe the Neural Engineering Framework (NEF), which we have used to design, simulate and evaluate our system (Stewart, 2012).
2.1 Floating-Point Multiplication

Figure 2 illustrates the overall process of multiplication of two floating-point numbers, Input 1 and Input 2, represented in binary format. Figure 1 is an example of how a 32-bit floating-point number is represented according to the IEEE 754 standard (IEEE, 2019). A sign bit is used to represent whether the number is positive or negative, and 8 and 23 bits are used to represent the exponent and mantissa values respectively. While designing this system we assumed that both inputs, i.e., the two floating-point numbers, are represented according to the IEEE 754 standard in binary representation.

Figure 1: IEEE 754 32-bit floating-point representation

Figure 2: Process for multiplication of floating-point numbers

In Figure 2, the exponents E1 and E2 are added, the bias value (127) is subtracted from their sum, and the difference is placed in the exponent field (see Figure 1). Each mantissa is of 24 bits (23 bits + 1 hidden bit). The mantissas M1 and M2 are multiplied to give a 48-bit output; if the 48th bit is 1, then the result is normalized by right-shifting and incrementing the resultant exponent (if it is 0, then nothing further is to be done). To find the resultant mantissa, we take the first 24 bits (23 bits + 1 hidden bit). The resultant sign field is the XOR of the two sign bits S1 and S2. For a better understanding of the above algorithm, see Yi and Ding (2009).
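To make these steps concrete, the same algorithm can be traced in ordinary Python at the bit level. The sketch below is ours, not part of the neuromorphic system; it handles normal (non-zero, non-special) numbers only and truncates the product instead of rounding.

```python
import struct

def bits_of(x: float) -> int:
    """Reinterpret a float as its IEEE 754 single-precision bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fp32_multiply(a: float, b: float) -> float:
    """Multiply two floats by the IEEE 754 steps described above."""
    xa, xb = bits_of(a), bits_of(b)
    sa, sb = xa >> 31, xb >> 31
    ea, eb = (xa >> 23) & 0xFF, (xb >> 23) & 0xFF
    ma = (xa & 0x7FFFFF) | (1 << 23)     # 24-bit mantissa 1.M
    mb = (xb & 0x7FFFFF) | (1 << 23)

    s_out = sa ^ sb                      # sign: XOR of the sign bits
    e_out = ea + eb - 127                # add exponents, subtract bias
    prod = ma * mb                       # 24 x 24 -> 48-bit product
    if prod & (1 << 47):                 # 48th bit set: normalize
        prod >>= 1
        e_out += 1                       # overflow/underflow of e_out
                                         # is not handled here (cf. Sec. 3.5)
    m_out = (prod >> 23) & 0x7FFFFF      # keep the 23 bits below the hidden bit

    out = (s_out << 31) | (e_out << 23) | m_out
    return struct.unpack(">f", struct.pack(">I", out))[0]

print(fp32_multiply(1.5, 2.5))           # 3.75
```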
2.2 Neural Engineering Framework

The Neural Engineering Framework (NEF) (Stewart, 2012; Voelker and Eliasmith, 2017; Voelker et al., 2017) is a computational framework used for mapping computations onto biological networks of spiking neurons. It provides a general way to generate circuits whose analytically determined synaptic weights provide the desired functionality. The NEF consists of three principles: representation, transformation, and dynamics (Nengo, c; Eliasmith and Anderson, 2002). Using these principles we can construct complex neural models.

2.2.1 Representation

Neural representations are defined by the combination of nonlinear encoding and weighted linear decoding. (We use the notation given by Stewart (2012).) If x is the value represented by a neural ensemble and e_i is the encoding vector for which neuron i fires most strongly, then the activity a_i of each neuron can be written as

    a_i = G_i[\alpha_i e_i \cdot x + b_i], \quad i = 1, \ldots, n    (1)

where G_i is the neural nonlinearity, \alpha_i is the gain parameter, and b_i is the constant background bias current for the neuron. Given the activities, the value of x can be estimated by finding linear decoders d_i:

    \hat{x} = \sum_i a_i d_i    (2)

Finding the decoding weights d_i can be seen as a least-squares minimization problem, as d_i is the set of weights that minimizes the difference between x and its estimate (Stewart, 2012):

    d = \Gamma^{-1} \Upsilon    (3)
    \Gamma_{ij} = \sum_x a_i a_j    (4)
    \Upsilon_j = \sum_x a_j x    (5)

2.2.2 Transformation

Section 2.2.1 shows how to encode and decode a vector in the distributed activity of a population of neurons. To perform computation, these neurons need to be connected, and information needs to be transferred from one group of neurons to another. This is done via synaptic connections; in other words, we want our connections to compute some function. Transformation is used for the approximation of these functions (Stewart, 2012). A transformation is another weighted linear decoding, now approximating a function f(x); the decoding weights d^{f(x)} can be computed as

    d^{f(x)} = \Gamma^{-1} \Upsilon^{f(x)}    (6)
    \Gamma_{ij} = \sum_x a_i a_j    (7)
    \Upsilon_j^{f(x)} = \sum_x a_j f(x)    (8)

In general, the more nonlinear and discontinuous a function is, the lower the accuracy of its computation. Accuracy also depends on other factors like the neuron properties, the number of neurons, and the encoding method. The NEF uses the same trick seen in support vector machines (Cristianini and Shawe-Taylor, 2000) to allow complex functions to be computed in a single set of connections, through the choice of e_i, \alpha_i and b_i. The function f(x) is constructed as a linear sum of the tuning curves of the neurons, so a wider variety of tuning curves leads to better function approximation (Stewart, 2012).

2.2.3 Dynamics

The dynamics of neural systems can also be modeled in the NEF using control-theoretic state variables; the NEF provides a direct method for computing dynamic functions of the form

    \frac{dx}{dt} = F(x) + H(u)    (9)

where x is the value being represented, u is some input, and F and H are arbitrary functions.
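As a concrete illustration of the first two principles, the following minimal Nengo sketch, written in the style of the Nengo documentation examples cited above, encodes a scalar in an ensemble of LIF neurons and uses a connection to approximate a function. The neuron counts and the choice of f(x) = x^2 are ours, not the paper's.

```python
import numpy as np
import nengo

with nengo.Network() as model:
    stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))  # input signal
    # Representation: 100 LIF neurons encode the scalar x.
    x = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(stim, x)
    # Transformation: the connection's decoders are solved so that
    # the next ensemble receives an approximation of f(x) = x^2.
    y = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(x, y, function=lambda v: v ** 2)
    probe = nengo.Probe(y, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)
# sim.data[probe] now holds the decoded estimate of x(t)^2.
```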
3 System Architecture

We have designed a system that performs floating-point multiplication according to the IEEE standard (IEEE, 2019). Figure 3 illustrates the system architecture. The two inputs are represented as (S1, M1, E1) and (S2, M2, E2), and the output is represented as (S_out, M_out, E_out). Here S_i denotes the sign bit, M_i the mantissa bits, and E_i the exponent bits, where i ∈ {1, 2, out}. This representation follows the IEEE 754 32-bit floating-point standard (IEEE, 2019). Each of the components is described in the following subsections.

Figure 3: Architecture diagram for single-precision IEEE floating-point multiplication

3.1 Neural Model and Encoding

For simulation we use the Leaky Integrate-and-Fire (LIF) neuron model. We create the neural ensembles using the Nengo library to represent input information. The values of two properties of each ensemble, its radius and its dimension, are set in the same way as in George et al. (2019). We have also used the same encoding scheme as George et al. (2019) to transfer the output of one ensemble as an input to another ensemble. For the AND ensemble (Section 3.3) we have used the following encoding scheme:

    E(\hat{x}_i) = \begin{cases} 1, & \hat{x}_i \geq 1.5 \\ 0, & \text{otherwise} \end{cases}    (10)
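A minimal sketch of how such an AND ensemble could be wired in Nengo is shown below: the two input bits sum inside one ensemble, and the threshold of (10) is applied in the decoding connection. The neuron count is illustrative, and the paper's exact wiring may differ.

```python
import nengo

with nengo.Network() as model:
    a = nengo.Node(1.0)  # input bit A_i
    b = nengo.Node(1.0)  # input bit B_j
    # The AND ensemble represents the sum of the two bits, a value in [0, 2].
    and_ens = nengo.Ensemble(n_neurons=200, dimensions=1, radius=2)
    nengo.Connection(a, and_ens)
    nengo.Connection(b, and_ens)
    # Decode through the threshold of Eq. (10): output 1 iff the sum >= 1.5.
    out = nengo.Node(size_in=1)
    nengo.Connection(and_ens, out,
                     function=lambda x: 1.0 if x[0] >= 1.5 else 0.0)
    probe = nengo.Probe(out, synapse=0.01)
```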
3.2 Exponent Adder

As shown in Figure 3, the Exponent Adder takes three inputs: E1, E2 and a normalization bit produced by the Mantissa Multiplier (see Section 3.3). It adds the 8-bit E1, the 8-bit E2 and the normalization bit (as C_in), and produces an 8-bit output E' and a carry bit C_out. To implement this stage-wise addition process, we construct a network that takes two inputs (the corresponding bits a_i and b_i of the two exponents, where 0 ≤ i ≤ 7) and represents them using two different ensembles, say the A ensemble and the B ensemble. These two ensembles are then connected through synaptic connections to a third ensemble, say the C ensemble, which then represents the sum of the A and B ensembles. The adder is implemented in the same way as in the prior literature (George et al., 2019; Nengo, a). The C_out bit produced by the Exponent Adder is used in the calculation of overflow and underflow (see Section 3.5).
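The bit-slice just described can be sketched along the lines of the Nengo addition example. The neuron counts here are illustrative, and the full extraction of the sum and carry bits follows George et al. (2019) and is only indicated.

```python
import nengo

with nengo.Network() as model:
    a_in = nengo.Node(1.0)  # bit a_i of exponent E1
    b_in = nengo.Node(1.0)  # bit b_i of exponent E2
    A = nengo.Ensemble(n_neurons=300, dimensions=1)
    B = nengo.Ensemble(n_neurons=300, dimensions=1)
    # C receives both decoded values, so it represents a_i + b_i in [0, 2].
    C = nengo.Ensemble(n_neurons=300, dimensions=1, radius=2)
    nengo.Connection(a_in, A)
    nengo.Connection(b_in, B)
    nengo.Connection(A, C)
    nengo.Connection(B, C)
    # The carry bit can be decoded from C with the thresholding of Eq. (10).
    carry = nengo.Node(size_in=1)
    nengo.Connection(C, carry,
                     function=lambda x: 1.0 if x[0] >= 1.5 else 0.0)
    probe = nengo.Probe(C, synapse=0.01)
```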
3.3 Mantissa Multiplier

The Mantissa Multiplier component is the core of our system. It is a stage-wise process, whose working is shown in Figure 5. We use an AND ensemble and adders as the building blocks for multiplication (see Figure 4). The AND ensemble implements neuromorphic AND logic; its encoding scheme is given in (10). We connect two inputs to the AND ensemble: if both inputs are 1 then the represented value is more than 1.5, so the output is set to 1; otherwise it is 0.

Figure 4: Building block of the Mantissa Multiplier component, consisting of an AND ensemble and an adder

Figure 5: Process for multiplication of floating-point numbers

The working and connection of each block at every stage is described below in detail, taking two mantissas A and B (a bit-level sketch of this block arithmetic follows the list):

• Each block j of stage i is given four inputs: A_i, B_j, the sum s_in produced by block (j + 1) of stage (i − 1), and the carry c_in from block (j − 1) of stage i, where 0 ≤ i, j ≤ 23.

• As shown in Figure 5, the last block of each stage i takes the c_out of the previous stage's last block as its s_in.

• The AND ensemble of each block of every stage performs an AND operation on A_i and B_j and outputs A_i B_j.

• The adder of each block performs a 3-bit addition of A_i B_j, s_in and c_in, and produces s_out and c_out (George et al., 2019; Nengo, a).

• The s_out and c_out produced as outputs are fed as inputs to the next stage and the next block respectively.

The first block of every stage is given c_in as 0. The output obtained at each stage ensemble is encoded and fed to the next stage ensemble as input; encoding the output at each stage helps to filter and boost the output signal. At each stage, the first block's s_out represents an output bit of the mantissa, as shown in Figure 5. At the end of this process we get a 48-bit product. If the 48th bit is 1, then we set the normalization bit and right-shift the product by one, which thereby results in incrementing the exponent by one (see Section 3.2). The resultant product is in the 1.M form as per the IEEE standard; we take the first 23 bits of M and store them as the resultant mantissa M_out.
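The following plain-Python model (ours, for checking the arithmetic only) mirrors the block structure above; in the neuromorphic system, each AND and each adder is realized by a group of neurons rather than by Python operators.

```python
def mantissa_multiply(A, B):
    """Bit-level model of the stage-wise array multiplication.
    A and B are 24-bit mantissas given as lists of bits, LSB first."""
    n = len(A)
    acc = [0] * (2 * n)              # 48-bit running product
    for i in range(n):               # stage i
        carry = 0
        for j in range(n):           # block j of stage i
            and_bit = A[i] & B[j]    # the AND ensemble's output A_i B_j
            total = acc[i + j] + and_bit + carry   # 3-bit addition
            acc[i + j] = total % 2   # s_out, passed on to the next stage
            carry = total // 2       # c_out, fed to block j + 1
        acc[i + n] = carry           # last block's carry becomes the next
                                     # stage's incoming sum bit
    return acc                       # 48-bit product, LSB first

# Quick check with 3 x 3 on (toy) 4-bit inputs: product bits give 9.
print(mantissa_multiply([1, 1, 0, 0], [1, 1, 0, 0]))
```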
3.4 Bias Subtractor

As shown in Figure 3, this component subtracts the bias from the result of the exponent addition. The subtraction is done using the 2's complement method (Lilja and Sapatnekar, 2005): we take the 2's complement of the bias and then perform an addition. To compute the 2's complement, we design a converter that takes the 8-bit bias, represented using a neural ensemble, forms its 1's complement by flipping the bits, and then uses the 8-bit adder to add 1 to the 1's complement. The final output is stored as the resultant exponent E_out.

3.5 Sign and OF/UF

This component computes the S_out bit of the output along with the OF/UF (overflow/underflow) flag, which can then be used for rounding. It computes the output sign bit S_out by performing a neuromorphic XOR operation on the two sign bits S1 and S2 (George et al., 2019). Overflow is indicated by setting the OF/UF flag to 1 if a carry is found during the exponent addition.

4 Performance Analysis

We simulated the individual components of the system and integrated them to arrive at a fully functional IEEE floating-point multiplier. We probed the outputs of each component at a time interval of 10 ms and computed the errors in each. We used the following two metrics, Mean Absolute Error (MAE) and Mean Encoded Error (MEE), to evaluate the performance of each component; accuracy is reported as a percentage derived from the MAE:

    \text{Mean Absolute Error} = \frac{\sum |\text{Computed}_{\text{val}} - \text{Actual}_{\text{val}}|}{\text{number of values}}

    \text{Accuracy} = (1 - \text{Mean Absolute Error}) \times 100

    \text{Mean Encoded Error} = \frac{\sum (\text{Actual}_{\text{bit}} \oplus \text{Encoded}_{\text{bit}})}{\text{number of bits}}

For the MEE, we encoded the output value of each component and compared it with the actual bit value; in other words, we calculated the Hamming distance between the encoded bit values and the actual bit values and averaged it over all the bits.
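Stated in code, with function names of our own choosing, the two metrics are:

```python
import numpy as np

def mean_absolute_error(computed, actual):
    """MAE over the decoded values probed from a component."""
    return np.mean(np.abs(np.asarray(computed) - np.asarray(actual)))

def mean_encoded_error(encoded_bits, actual_bits):
    """MEE: per-bit XOR (Hamming distance) between the encoded output
    bits and the reference bits, averaged over all bits."""
    e = np.asarray(encoded_bits, dtype=int)
    a = np.asarray(actual_bits, dtype=int)
    return np.mean(e ^ a)

# Accuracy as a percentage, derived from the MAE:
accuracy = (1 - mean_absolute_error([0.98, 1.03], [1.0, 1.0])) * 100
```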
4.1 Number of Neurons and Accuracy

Figure 6 illustrates the accuracy of the Mantissa Multiplier as a function of the number of neurons per ensemble. (For the Bias Subtractor and the Exponent Adder we get very similar graphs.)

Figure 6: Accuracy vs. number of neurons per ensemble for the Mantissa Multiplier

We varied the number of neurons per bit from 100 to a maximum of 800 and observed the accuracy across all components. The accuracy initially increases with the number of neurons, but beyond a threshold number of neurons the increase is no longer significant. For the Mantissa Multiplier component, accuracy increases rapidly until the number of neurons reaches 300; after that there is no significant improvement.

4.2 Number of Neurons and Bit Error

For each component we observed that the bit error is high when the number of neurons is very low. In the Mantissa Multiplier, when the number of neurons per bit is below 200, we get 1 bit in error out of 48 bits, which is roughly 2%. After increasing the number of neurons to 300, we get no bit errors. For the Exponent Adder and the Bias Subtractor we get no bit errors even with fewer than 200 neurons per bit.
4.3 Optimal Number of Neurons

We observed in Section 4.1 that the accuracy increases with an increase in the number of neurons. We estimated the optimal number of neurons required for all the ensembles, as listed in Table 1.

Table 1: Number of neurons for each ensemble

Component            Number of neurons
Exponent Adder       300
Bias Subtractor      300
Mantissa Multiplier  600
Sign and OF/UF       100
5 Conclusions

In this paper we describe an approach to building an IEEE 754 standard floating-point unit using neuromorphic hardware with spiking neurons. Such devices can mimic aspects of the brain's structure, and may be an energy-efficient alternative to the classical von Neumann architecture. A neuromorphic floating-point unit is a critical step in developing an alternative, neuromorphic CPU architecture.

Our architecture carries out the complex floating-point multiplication process. The most complex part of the process is the Mantissa Multiplier, which we have realized successfully by using stage-wise multiplication and a robust encoding scheme. The architecture is easily scalable to double-precision floating-point numbers. We check for the presence of overflow and underflow errors, which can then be handled separately. We have studied the effect of the number of neurons on accuracy and bit error. Finally, we derive the optimal number of neurons required for each component, giving an indication of the hardware resources required to implement this approach.
References
Trevor Bekolay, James Bergstra, Eric Hunsberger, Travis DeWolf, Terrence Stewart, Daniel Rasmussen, Xuan Choo, Aaron Voelker, and Chris Eliasmith. Nengo: a Python tool for building large-scale functional brain models. Frontiers in Neuroinformatics, 7, January 2014.

Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000. doi: 10.1017/CBO9780511801389.

Chris Eliasmith. How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford Series on Cognitive Models and Architectures, September 2013. ISBN 9780199794546.

Chris Eliasmith and Charles H. Anderson. Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems. MIT Press, 2002. ISBN 9780262050715.

Mark A. Erle, Brian J. Hickmann, and Michael J. Schulte. Decimal Floating-Point Multiplication. IEEE Trans. Comput., 58(7):902–916, July 2009. doi: 10.1109/TC.2008.218.

Steven K. Esser, Paul A. Merolla, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy, Alexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R. Barch, Carmelo di Nolfo, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner, and Dharmendra S. Modha. Convolutional networks for fast, energy-efficient neuromorphic computing. PNAS, 113(41):11441–11446, October 2016.

Arun M. George, Rahul Sharma, and Shrisha Rao. IEEE 754 Floating-Point Addition for Neuromorphic Architecture. Neurocomputing, 366:74–85, November 2019. URL http://doi.org/10.1016/j.neucom.2019.05.093.

Jan Gosmann and Chris Eliasmith. Optimizing semantic pointer representations for symbol-like processing in spiking neural networks. PLoS ONE, 11, February 2016. URL https://doi.org/10.1371/journal.pone.0149928.

IEEE. IEEE Standard for Floating-Point Arithmetic, July 2019. URL http://doi.org/10.1109/IEEESTD.2008.4610935.

Yongtae Kim, Yong Zhang, and Peng Li. Energy Efficient Approximate Arithmetic for Error Resilient Neuromorphic Computing. IEEE Trans. VLSI Syst., 23(11):2733–2737, November 2015. doi: 10.1109/TVLSI.2014.2365458.

Christof Koch and Idan Segev, editors. Methods in Neuronal Modeling: From Ions to Networks. MIT Press, Cambridge, MA, 2nd edition, January 2003.

David J. Lilja and Sachin S. Sapatnekar. Designing Digital Computer Systems with Verilog. Cambridge University Press, 2005.

Carver Mead. Neuromorphic Electronic Systems. Proc. IEEE, 78(10):1629–1636, October 1990.

Don Monroe. Neuromorphic computing gets ready for the (really) big time. Communications of the ACM, 57(6):13–15, 2014.

Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):114–117, April 1965.

Nengo. Addition example, a. Accessed June 27, 2020.

Nengo. Multiplication example, b. Accessed June 27, 2020.

Nengo. Documentation. nengo.ai/documentation, c. Accessed June 27, 2020.

L. A. Pastur-Romay, A. B. Porto-Pazos, F. Cedron, and A. Pazos. Parallel computing for brain simulation. Current Topics in Medicinal Chemistry, 17(14):1646–1668, 2017. ISSN 1568-0266/1873-4294. doi: 10.2174/1568026617666161104105725.

Terrence C. Stewart. A technical overview of the neural engineering framework. AISB Quarterly, 35, October 2012. URL http://compneuro.uwaterloo.ca/files/publications/stewart.2012d.pdf.

Aaron R. Voelker and Chris Eliasmith. Methods for applying the Neural Engineering Framework to neuromorphic hardware. arXiv:1708.08133 [q-bio.NC], August 2017.

Aaron R. Voelker, Ben V. Benjamin, Terrence C. Stewart, Kwabena Boahen, and Chris Eliasmith. Extending the Neural Engineering Framework for nonideal silicon synapses. In IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, May 2017.

Qian Wang, Youjie Li, Botang Shao, Siddhartha Dey, and Peng Li. Energy efficient parallel neuromorphic architectures with approximate arithmetic on FPGA. Neurocomputing, 221:146–158, January 2017.

Kui Yi and Yue-Hua Ding. 32 bit Multiplication and Division ALU Design Based on RISC Structure. In Twenty-First International Joint Conference on Artificial Intelligence (IJCAI 2009), Hainan Island, China, April 2009.

Mehdi Zargham. Computer Architecture: Single and Parallel Systems. Prentice-Hall, 1996.