A Parallel Bitstream Generator for Stochastic Computing
Yawen Zhang, Runsheng Wang, Xinyue Zhang, Zherui Zhang, Jiahao Song, Zuodong Zhang, Yuan Wang, Ru Huang
Key Laboratory of Microelectronic Devices and Circuits (MOE), Institute of Microelectronics, Peking University, Beijing 100871, China Email: [email protected]; [email protected]
Abstract: Stochastic computing (SC) offers high error tolerance and low hardware cost, and has great potential in applications such as neural networks and image processing. However, the bitstream generator, which converts a binary number to bitstreams, accounts for a large share of the area and energy consumption, which weakens the advantage of SC. In this paper, we propose a novel technique for generating bitstreams in parallel, which needs only one clock cycle for conversion and significantly reduces the hardware cost. Synthesis results demonstrate that the proposed parallel bitstream generator improves area by 2.5× and energy consumption by 712×.

I. INTRODUCTION
As a promising alternative to conventional binary computing, stochastic computing (SC) [1-2], which represents data by the probability of a "1" in a bitstream, has high error tolerance under low operating voltage [3], showing great potential for applications in the internet-of-things (IoT). Moreover, SC achieves complex arithmetic operations with simple logic gates. For example, multiplication can be realized by an AND gate and scaled addition by a MUX, as shown in Fig. 1. However, the bitstream generator occupies over 80% of the area [4-5] and consumes a large share of the energy of the whole SC circuit, which reduces the advantage of SC. Therefore, reducing the area and energy consumption of bitstream generators is one of the key issues in stochastic computing.

Fig. 1. Stochastic number representation. Examples of multiplication and addition in stochastic computing.

In this paper, we address these problems with a novel parallel bitstream generator based on thermometer coding, which requires only a simple decoder to decode a binary number into a bitstream synchronously. Compared with the traditional bitstream generator, the proposed parallel bitstream generator achieves low hardware cost and low energy consumption.

II. ACCURACY OF DIFFERENT CODING METHODS
Traditional approaches convert binary numbers to stochastic bitstreams using a pseudo-random number source and a comparator. The number source can follow one of three coding methods (Fig. 2): linear feedback shift register (LFSR) sequences [1], low-discrepancy (LD) sequences [6], and thermometer coding [7].
Fig. 1 panels: stochastic number representation (example value 3/8); an example of multiplication using an AND gate (result 1/8); an example of scaled addition using a MUX unit.
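The two operations illustrated in Fig. 1 can be checked numerically. The following is a minimal Python sketch, assuming independent pseudo-random bitstreams drawn from Python's `random` module rather than any of the generators studied in this paper; the helper names `to_stream` and `value` are hypothetical and chosen only for illustration.

```python
import random

def to_stream(p, length, rng):
    """Encode probability p as a bitstream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def value(stream):
    """Decode a bitstream: the fraction of 1s."""
    return sum(stream) / len(stream)

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 1 << 16                     # a long stream keeps random fluctuation small
x = to_stream(0.5, n, rng)
y = to_stream(0.25, n, rng)
sel = to_stream(0.5, n, rng)    # select input of the MUX

product = [a & b for a, b in zip(x, y)]                      # AND gate
scaled_sum = [a if s else b for a, b, s in zip(x, y, sel)]   # 2-to-1 MUX

print(value(product))      # close to 0.5 * 0.25 = 0.125
print(value(scaled_sum))   # close to (0.5 + 0.25) / 2 = 0.375
```

The results are only approximate because the streams are random; the residual error shrinks as the stream gets longer, which is the accuracy versus bitstream length trade-off examined in Section II.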
Fig. 2. Three coding methods in stochastic computing. Traditional bitstream generators with three coding methods and the proposed parallel bitstream generator with thermometer coding.

Ref. [6] shows that bitstreams based on LD sequences are more accurate than those based on LFSR sequences, owing to their low correlation. However, as shown in Fig. 3, LFSR sequences can also achieve relatively low discrepancy and low correlation if the initial seeds are selected carefully. The mean square error (MSE) of two-input multiplication results (the AND-gate output of the two bitstreams) is used to evaluate the correlation between two bitstream generators. Taking 8-bit LFSR sequences as an example, as shown in Fig. 4, the MSE varies considerably with the LFSR initial seeds. In the min-MSE case, the error of most multiplication results is almost zero, which means that the correlation between these two LFSR sequences is very low. For a fair comparison, all evaluations in the following are based on the min-MSE LFSR sequences.

Fig. 3. Discrepancy of LFSR sequences and base-2 LD sequences.
Fig. 2 block labels: Traditional Bitstream Generator (LFSR sequences: N-bit LFSR, register, comparator; low-discrepancy sequences: N-bit counter, register, comparator; thermometer coding sequences: N/2-bit counters, register, comparator); Parallel Bitstream Generator.

Fig. 2 table: binary input (A) and thermometer coding output (Y).

Decimal   Binary (A)   Thermometer coding (Y)
0         0 0 0        0 0 0 0 0 0 0
1         0 0 1        0 0 0 0 0 0 1
2         0 1 0        0 0 0 0 0 1 1
3         0 1 1        0 0 0 0 1 1 1
4         1 0 0        0 0 0 1 1 1 1
5         1 0 1        0 0 1 1 1 1 1
6         1 1 0        0 1 1 1 1 1 1
7         1 1 1        1 1 1 1 1 1 1
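The binary-to-thermometer mapping of Fig. 2 can be modelled in a few lines of software. The sketch below is a behavioural illustration only, not the authors' hardware description; the function name `binary_to_thermometer` and the choice to list the rightmost output bit last are assumptions made here.

```python
def binary_to_thermometer(x, n_bits):
    """Return the (2**n_bits - 1)-bit thermometer code of x, ones at the right."""
    width = (1 << n_bits) - 1
    if not 0 <= x <= width:
        raise ValueError("x out of range for n_bits")
    # Output bit k (k = 0 is the rightmost) is 1 exactly when k < x.
    # Each bit depends only on x, so all bits can be produced at once.
    return [1 if k < x else 0 for k in reversed(range(width))]

for x in range(8):
    print(x, binary_to_thermometer(x, 3))
```

Because no output bit depends on a clocked counter or on a previous output bit, a combinational decoder of this kind can emit the whole stream together, which is consistent with the one-clock conversion claimed for the proposed generator.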
Generator                            Two independent bit streams
LFSR seq.            LFSR (seed1)    0 15 11 9 8 4 2 1 12 6 3 13 10 5 14 7
                     LFSR (seed2)    12 6 3 13 10 5 14 7 0 15 11 9 8 4 2 1
LD seq.              Counter         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
                     Counter+Conv.   0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15
Thermometer coding   Counter         0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
                     Counter         0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
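The number sequences listed above can be reproduced with a short sketch. Several points are assumptions inferred from the listed values rather than stated in the paper: the "Conv." step is taken to be a bit reversal of the counter, and the nonzero LFSR values for seed 1 happen to match a right-shifting 4-bit Galois LFSR with feedback mask `0b1100`. The extra 0 state that appears in the listed LFSR rows is not modelled here. All function names are hypothetical.

```python
def galois_lfsr(seed, mask, count):
    """Right-shifting Galois LFSR: XOR in `mask` whenever a 1 is shifted out."""
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask
    return out

def bit_reverse(value, n_bits):
    """Reverse the n_bits-bit binary representation of value."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def compare(source, x):
    """Comparator stage: emit 1 whenever the source number is below x."""
    return [1 if r < x else 0 for r in source]

lfsr = galois_lfsr(seed=15, mask=0b1100, count=15)   # nonzero states only
ld = [bit_reverse(i, 4) for i in range(16)]          # "Counter+Conv." row
therm_a = [i % 4 for i in range(16)]                 # fast 2-bit counter
therm_b = [i // 4 for i in range(16)]                # slow 2-bit counter

print(lfsr)
print(ld)
print(compare(ld, 6))   # six 1s spread across the 16-bit stream
```

In each traditional generator the comparator consumes one source number per clock, so a stream of length 16 takes 16 clocks to produce.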
Fig. 3 panels: LFSR sequences with different seed; base-2 LD sequences.
Fig. 4. The MSE calculation results using different 8-bit LFSR initial seeds. The calculation error of the AND gate using LFSR sequences with the min (left) and the max (right) MSE.

Fig. 4 axes: percentage versus MSE (10^-4); panels: Min MSE, Max MSE.

Fig. 5 shows the MSE of two 4-bit input multiplication for the three coding methods. At a bitstream length (BSL) of 16, the MSE of LFSR sequences is the smallest. As the BSL increases, the MSE of LD sequences decreases rapidly and becomes smaller than that of LFSR sequences. Due to rounding error, the MSE of thermometer coding is large, but it drops abruptly to 0 when the BSL reaches 256. This also illustrates that thermometer coding does not have progressive accuracy [6]. The error of two 4-bit input multiplication under different BSLs is displayed in Fig. 6.

Fig. 5. The MSE of two 4-bit input multiplication results in three coding methods.
Fig. 5 axes: MSE of input multiplication versus bit stream length (16, 32, 64, 128, 256); series: LD sequence, LFSR sequence, thermometer coding.
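The behaviour of thermometer coding across bitstream lengths can be illustrated with a small experiment. This is a sketch under stated assumptions, not the authors' evaluation setup: inputs are truncated (rather than rounded) to the available number of levels, only square bitstream lengths (16, 64, 256) are considered, and the two streams are driven by a fast and a slow counter as in the thermometer rows of Fig. 2. The names `thermometer_pair` and `multiplication_mse` are hypothetical.

```python
def thermometer_pair(x, y, n_bits, levels):
    """Two thermometer-coded streams of length levels**2 for n_bits-bit inputs.

    Assumption: inputs are truncated to `levels` quantization steps.
    """
    full = 1 << n_bits
    xq = x * levels // full
    yq = y * levels // full
    length = levels * levels
    sx = [1 if (i % levels) < xq else 0 for i in range(length)]    # fast counter
    sy = [1 if (i // levels) < yq else 0 for i in range(length)]   # slow counter
    return sx, sy

def multiplication_mse(n_bits, levels):
    """MSE of AND-gate multiplication over all input pairs."""
    full = 1 << n_bits
    total = 0.0
    for x in range(full):
        for y in range(full):
            sx, sy = thermometer_pair(x, y, n_bits, levels)
            estimate = sum(a & b for a, b in zip(sx, sy)) / len(sx)
            exact = (x / full) * (y / full)
            total += (estimate - exact) ** 2
    return total / (full * full)

for levels in (4, 8, 16):
    print(levels * levels, multiplication_mse(4, levels))
```

At 256 bits each 4-bit input is represented without truncation, and the AND of the two streams contains exactly x times y ones, so the error vanishes; at shorter lengths the truncation error dominates.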
Fig. 6. The calculation error of multiplication results with different BSL in three coding methods.

III. PARALLEL BITSTREAM GENERATOR
By observing the three existing coding methods [1][6-7] for bitstreams in SC, we found that, to some extent, they are all deterministic in practice. In particular, the thermometer coding method is the most regular and becomes fully accurate once a certain BSL is reached. Since thermometer coding is simple and regular, we propose a parallel bitstream generator based on it, which uses a binary-to-thermometer decoder as shown in Fig. 2. The proposed parallel bitstream generator reduces the latency to only one clock cycle and lowers the hardware cost as well.

Fig. 7. Synthesis results in different bitstream generators, in terms of area and energy.

We use Synopsys Design Compiler to synthesize the different bitstream generators with the TSMC 40nm technology library. Fig. 7 shows the synthesis results of the bitstream generators in terms of area, power, and energy.
Compared with the traditional bitstream generator, the proposed parallel bitstream generator greatly reduces area and energy consumption. Although traditional bitstream generators can share the same pseudo-random number source, the comparators cannot be shared [8], and thus their hardware costs are still higher than that of the parallel bitstream generator. Owing to the simplicity of the decoder and full parallelism, the proposed parallel bitstream generator performs much better than traditional bitstream generators in terms of area (Fig. 8), power (Fig. 9), and energy (Fig. 10).

Fig. 7 panels: energy (pJ), area (um²), and power (uW) at 16, 64, and 256 BSL for LFSR sequences, low-discrepancy sequences, traditional thermometer coding, and parallel thermometer coding; component legend: Decoder, LFSR, Counter, Therm. Counter, Comp.

In order to compare the synthesis results based on different coding methods fairly, we assume that the MSE of two 4-bit input multiplication cannot exceed (𝑖𝑛𝑝𝑢𝑡 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛) ⁄ , as shown in Fig. 5. Under this accuracy constraint, as shown in the TABLE, the proposed parallel bitstream generator achieves at least 2.5× area improvement, 5.6× power improvement, and 712× energy improvement compared with traditional bitstream generators. It is worth noting that the bit generation efficiency of the proposed parallel bitstream generator reaches 31605 bit/pJ at 256 BSL, an improvement of at least 1425×.

IV. SUMMARY
In this paper, a parallel bitstream generator using a binary-to-thermometer decoder is proposed for the first time. We also provide a reasonable method for analyzing the accuracy of different coding methods in SC. Comparing different bitstream generators under the same accuracy constraint, the proposed parallel bitstream generator achieves at least 2.5× area improvement and 712× energy improvement.

ACKNOWLEDGMENTS
This work was partly supported by NSFC (61522402 and 61421005) and the 111 Project (B18001). The authors would like to thank Weikang Qian for the helpful discussions.

REFERENCES
[1] J. P. Hayes, DAC, 2015.
[2] W. Qian, et al., Trans. on Computers, 2011.
[3] Y. Zhang, et al., IEDM, 2017.
[4] P. Li, et al., VLSI Systems, 2014.
[5] M. H. Najafi, et al., VLSI Systems, 2017.
[6] A. Alaghi and J. P. Hayes, DATE, 2014.
[7] D. Jenson and M. Riedel, ICCAD, 2016.
[8] M. Yang, et al., ISVLSI, 2018.
Fig. 8 plot: area (um²) versus bit stream length (16 to 256); series: Parallel Therm. Sequences, Trad. Therm. Sequences, LD Sequences, LFSR Sequences.
Fig. 9 plot: power (uW) versus bit stream length (16 to 256); same four series.
Fig. 10 plot: energy (pJ) versus bit stream length (16 to 256); same four series.
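TABLE
Columns: Coding Method; Component; Bit Stream Length; Bit Precision; Power (uW); Area (um²); Latency (us); Energy (pJ); Bit Generation Efficiency (bit/pJ).
Coding methods listed: LFSR Sequences; Traditional Thermometer Coding; Parallel Thermometer Coding (this work).
Parallel Thermometer Coding (this work): Power 0.81 uW; Area 20.11 um²; Latency 0.01 us; Energy 0.008 pJ; Bit Generation Efficiency 31605 bit/pJ.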
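The figures reported for the parallel generator are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes that energy is power multiplied by latency, that the power and latency units are microwatts and microseconds, and that the bitstream length is the 256 BSL quoted in Section III; the variable names are chosen here for illustration.

```python
# Values taken from the table row for the parallel generator (this work).
power_uw = 0.81        # power in microwatts (unit assumed)
latency_us = 0.01      # latency in microseconds (unit assumed)
bsl = 256              # bitstream length quoted in the text

# uW * us = pJ
energy_pj = power_uw * latency_us
efficiency = bsl / energy_pj          # bits generated per picojoule

print(round(energy_pj, 4))   # 0.0081
print(round(efficiency))     # 31605
```

The computed energy rounds to the 0.008 pJ listed in the table, and the computed efficiency matches the reported 31605 bit/pJ.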