Efficient Programmable Random Variate Generation Accelerator from Sensor Noise
James Timothy Meech, Phillip Stanley-Marbell, Senior Member, IEEE
Abstract—We introduce a method for non-uniform random number generation based on sampling a physical process in a controlled environment. We demonstrate one proof-of-concept implementation of the method, which doubles the speed of Monte Carlo integration of a univariate Gaussian. We show that the supply voltage and temperature of the physical process must be measured and compensated for to prevent the mean and standard deviation of the random number generator from drifting. The method we present and our detailed empirical hardware measurements demonstrate the feasibility of programmable non-uniform random variate generation from low-power sensors and the effect of ADC quantization on the statistical qualities of the approach.
Index Terms—Sensor, Noise, Bayesian, Inference, Non-uniform, Random.
I. INTRODUCTION

Current software-based methods of non-uniform random variate generation are slow and inefficient [1], [2]. We present a programmable system capable of generating Gaussian random variates by extracting the noise properties of a MEMS sensor and demonstrate its principle and application. Sampling a random physical process that has a known distribution provides a continuous random variable with a theoretically unlimited sample rate. Table I compares several state-of-the-art methods of generating non-uniform random variates. Gaussian random variate generation is typically an order of magnitude slower and less efficient than uniform random variate generation [1]. We propose a method with the potential to be superior to all of the state-of-the-art methods in terms of sample rate and efficiency, consisting of a physical noise source and an ADC. In a hardware implementation, the sample rate of the ADC limits the random number generation rate.
A. Somewhat Related Work: Uniform Random Variates

Uniform random numbers are widely used in cryptography [3]. The hardware non-uniform random number generators in Table I are fundamentally different from prior work on uniform random number generators [4], [5], [6], [7], [8], [9]. Work on uniform random number generators is often based on some non-uniform physical entropy source, but these publications do not describe the distribution of the source. This omission makes it impossible to compare them directly to the programmable random variate accelerator (PRVA) method that we present in Section III.
James Timothy Meech and Phillip Stanley-Marbell are with the Department of Electrical Engineering, University of Cambridge, Cambridge, CB3 0FA UK (e-mail: [email protected]).

TABLE I
COMPARISON OF STATE-OF-THE-ART PRVA METHODS. PRVA: PROGRAMMABLE RANDOM VARIATE ACCELERATOR, CPU: CENTRAL PROCESSING UNIT, GPU: GRAPHICS PROCESSING UNIT, MPPA: MASSIVELY PARALLEL PROCESSOR ARRAY, FPGA: FIELD-PROGRAMMABLE GATE ARRAY, MR: MEMRISTOR, PD: PHOTON DETECTION, RET: RESONANCE ENERGY TRANSFER, PDI: PHOTODIODE, EN: ELECTRONIC NOISE, EXP: EXPONENTIAL, *: THIS WORK.

Source  Speed      Energy     Dist(s)    PRVA  Publication
CPU     890 Mb/s   3.17 Mb/J  Normal     Yes   [1], 2009
GPU     12.9 Gb/s  108 Mb/J   Normal     Yes   [1], 2009
MPPA    860 Mb/s   403 Mb/J   Normal     Yes   [1], 2009
FPGA    12.1 Gb/s  645 Mb/J   Normal     Yes   [1], 2009
MR      6000 b/s   120 Gb/J   Unknown    No    [10], 2017
PD      1.77 Gb/s  -          Normal     No    [11], 2017
RET     2.89 Gb/s  578 Gb/J   Exp        Yes   [12], 2018
PDI     17.4 Gb/s  -          Husimi     No    [13], 2018
PD      66.0 Mb/s  -          Arbitrary  Yes   [14], 2018
PD      320 Mb/s   -          Exp        No    [15], 2018
FPGA    6.40 Gb/s  -          Normal     Yes   [16], 2019
PD      8.25 Gb/s  -          Normal     No    [17], 2019
EN      13.8 kb/s  209 kb/J   Normal     Yes   [*], 2019
B. Generating Non-Uniform Random Variates Is Hard
The inversion and accept-reject methods are used in software for generating samples from non-uniform random variates [18]. Let U and F be uniform and non-uniform random variables respectively and F⁻¹ the analytical closed-form solution for the inverse cumulative distribution function of F [18]. Algorithm 1 shows the inversion method. The inversion method requires that F⁻¹ has an analytical closed-form solution. The Gaussian distribution has no analytical closed-form solution for F⁻¹ and so cannot be used with the inversion method [2]. The accept-reject method must be used instead. Let U and F be independent random variables and u and f the densities of U and F on R^d, d-dimensional Euclidean space. Let c ≥ 1 be a constant such that the condition f(x) ≤ cu(x) holds for all x. Algorithm 2 shows the accept-reject method for generating samples from F. The accept-reject method requires more math operations than the inversion method to transform a sample [18]. This causes it to take more clock cycles to compute and therefore more time to transform each sample. When using the accept-reject method, samples deviating from the desired distribution are rejected [18]. There was no direct effort to optimize speed or energy efficiency in this work. Back-of-the-envelope calculations show room for increases in speed and energy efficiency.

Algorithm 1: Inversion method.
Result: Sample from non-uniform random variate F
  Generate uniform [0,1] random variate U
  RETURN F ← F⁻¹(U)

Algorithm 2: Accept-reject method.
Initialization: T = 2, U = 1
Result: Sample from non-uniform random variate F
  repeat
    Generate uniform [0,1] random variate U
    Generate uniform [0,1] random variate F
    Set T ← f(F) / (c u(F))
  until U ≤ T
  RETURN F

C. Uses of Non-Uniform Random Variates
Non-uniform random variate generators are fundamental to applications employing Monte Carlo methods [19], such as population balance modeling of the crystallization process [20], ray tracing [21], and financial computing [22]. When conducting Bayesian inference we must evaluate Bayes' theorem to calculate the probability that a belief B is true given new data D, which we denote P(B | D). To do this we need the probability that the belief is true regardless of our data (P(B)), the probability that the data is true given the belief (P(D | B)), and the probability that the data is true regardless of the belief (P(D)). We will refer to P(B) as the prior, P(D | B) as the likelihood, and P(D) as the marginal likelihood. The posterior, P(B | D), is defined as:

P(B | D) = P(D | B) P(B) / P(D).    (1)

We calculate the marginal likelihood by integrating the joint density P(B, D) [23]. In practice, the analytical calculation of the marginal likelihood is impossible for all but the simplest joint distributions [23]. We instead sample from the joint distribution P(B, D) to obtain summary statistics that we can use to describe it [23]. These random samples from bespoke probability distributions must be produced using a non-uniform random variate generator. This kind of computation is performed on low-power embedded systems such as drones for particle filter localization [24]. In the particle filter algorithm, Equation 1 must be evaluated once every time step [25].

D. Contributions
1) The idea that physical noise sources such as MEMS sensors can be used in a PRVA (Section I).
2) Estimation of the performance increase and error reduction achieved by using a PRVA for Monte Carlo integration (Section II).
3) Investigation of the impact of temperature and supply voltage on the noise distribution obtained from a commercial MEMS sensor (Section III-A).

II. MOTIVATING EXAMPLE
We performed Monte Carlo integration of a Gaussian with a mean of µ = 980 and standard deviation of σ = 7, using samples from a Gaussian with the same mean and variance. We ran the experiment with a Gaussian generated by the C++ random library and repeated it with samples from the PRVA collected at 3 V and 20 °C. The sensor-generated random variates were saved in a file and then presented to the C++ benchmark program in an array. We uniformly interpolate between the points in the PRVA-generated Gaussian using a [-1, 1] uniform C++ random number generator. We assume that the PRVA can produce a sample in the same amount of time that it takes to perform a read from memory. The current PRVA cannot do this, but it is possible with fast ADCs and custom analog-out accelerometers. We performed the same integration using samples from a uniform distribution with various ranges. Let E be the error of the integration, t the time taken by the integration, N the number of random samples, and D the distribution (either uniform or Gaussian). Let samps be the array of random variates, A the area, b and h the rectangle base and height, and f the probability density function of the Gaussian for integration. Algorithm 3 shows the integration scheme that we used. We repeated each process 1000 times and calculated the average time t and error E. Figure 1 shows that the PRVA outperforms the C++ uniform random number generator for most ranges; it is only outperformed by uniform generators whose range closely brackets the Gaussian. The proportion of the uniform probability density function overlapping the Gaussian density decreases as the range of the uniform distribution is increased. For a given function it is impossible to know beforehand which range of uniform random numbers will produce a sufficiently small bound on the error of integration. To avoid this problem we sample from a distribution that closely matches the distribution to be integrated.
The divergence between the C++ Gaussian random number generator and the PRVA is due to the lack of unique numbers in the tails of the distribution. The PRVA performs the task up to 1.4 times faster than the C++ Gaussian random number generator with eight threads and always twice as fast with one thread.

Algorithm 3: Monte Carlo integration.
Result: Error E and time t
  Timer start
  Generate N random samples from distribution D
  Sort the N random samples
  for all pairs of adjacent samples do
    b = samps[i] − samps[i−1]
    h = (f(samps[i]) + f(samps[i−1])) / 2
    A += b × h
  end
  E = abs(1 − A)
  Timer stop
  t = stop − start
  RETURN E, t

Fig. 1. The error of Monte Carlo integration depends upon the range for uniform random numbers but not for Gaussian random numbers. The generators compared are C++ uniform generators with ranges up to ±1000000, the C++ Gaussian, and the interpolated hardware Gaussian. The error bars were plotted using a 90% confidence interval on the mean. The code was compiled with g++-mp-7 c++11 and run on a 2.8 GHz Intel Core i7 CPU with OpenMP to utilize all eight processor threads.
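Algorithm 3 can be sketched in C++ as follows. This is our illustrative reconstruction, not the authors' benchmark code: it substitutes std::normal_distribution for the PRVA samples, omits the timing, and the names (gaussian_pdf, monte_carlo_error) are ours.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Gaussian probability density with mean mu and standard deviation sigma.
double gaussian_pdf(double x, double mu, double sigma) {
    const double PI = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * PI));
}

// Algorithm 3 sketch: trapezoidal Monte Carlo integration of the density f
// using n samples drawn from the same distribution. Returns |1 - A|, the
// absolute error against the known unit integral of a normalized density.
double monte_carlo_error(std::size_t n, double mu, double sigma, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(mu, sigma);

    std::vector<double> samps(n);
    for (auto &s : samps) s = dist(gen);
    std::sort(samps.begin(), samps.end());

    double area = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        const double b = samps[i] - samps[i - 1];                       // base
        const double h = 0.5 * (gaussian_pdf(samps[i], mu, sigma) +
                                gaussian_pdf(samps[i - 1], mu, sigma)); // mean height
        area += b * h;
    }
    return std::fabs(1.0 - area);
}
```

Because the samples concentrate where the density is large, the trapezoids cover almost all of the probability mass, which is why sampling from a matched distribution beats a poorly chosen uniform range.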
Fig. 2. Experimental setup: a microcontroller samples the BMX055 over I2C (SCL/SDA) inside a thermal chamber, with the sensor on a tilt-and-rotate stage on a vibration isolation platform; a Keithley 2450 SourceMeter supplies VDD. Powering the sensor and the I2C interface separately allowed more accurate measurement of the power consumed by the sensor. The microcontroller consumed approximately 66 mW from its USB supply.
III. METHODOLOGY
A PRVA based on physical noise sources must have negligible drift of the mean and standard deviation over time. Drift would cause errors in calculations using the output of the PRVA. Any environmental parameter that causes non-negligible drift must be measured and compensated for. We sampled the z-axis of a MEMS accelerometer (the accelerometer in the Bosch BMX055) to obtain the distributions.
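One way such compensation could be structured is sketched below. The first-order linear drift model, its coefficients, and all names (DriftModel, compensate) are our assumptions for illustration only; a real PRVA would fit the model to calibration measurements like those in Section IV.

```cpp
// Hypothetical first-order drift model: the mean and standard deviation of
// the sensor noise are approximated as linear functions of temperature T
// (degrees C) and supply voltage V (volts). Coefficients are placeholders,
// not measured values; they would come from a calibration table.
struct DriftModel {
    double mu0, mu_T, mu_V;          // mean model:   mu0 + mu_T*T + mu_V*V
    double sigma0, sigma_T, sigma_V; // stddev model: sigma0 + sigma_T*T + sigma_V*V

    double mean(double T, double V) const { return mu0 + mu_T * T + mu_V * V; }
    double stddev(double T, double V) const { return sigma0 + sigma_T * T + sigma_V * V; }
};

// Compensate a raw sample: map it to a zero-mean, unit-variance variate
// using the modelled mean and standard deviation at the measured (T, V).
double compensate(double raw, double T, double V, const DriftModel &m) {
    return (raw - m.mean(T, V)) / m.stddev(T, V);
}
```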
A. Temperature-Controlled Experiments
Figure 2 shows the experimental setup. We placed the microcontroller, accelerometer, tilt-and-rotate stage, and vibration isolation platform inside a Binder MK56 thermal chamber. We connected a microcontroller to the sensor via I2C for a 1154 Hz sample rate. Orders of magnitude higher sample rates are possible using an off-the-shelf ADC and an analog-out accelerometer. We used a Keithley 2450 source measurement unit to power the sensor and measure the current drawn. We set the chamber temperature to 25 °C and allowed 30 minutes for the temperature of the sensor to equilibrate whilst constantly sampling z-axis acceleration values from it. We then sampled values from the BMX055 sensor at a 3.6 V supply voltage. We repeated this for all the voltages from 3.6 V down to 1.4 V in 0.2 V decrements. We then repeated this process for temperatures from 25 °C down to -5 °C in 5 °C decrements.

B. Quantization Investigation
We investigated the effect of quantization on the Kullback-Leibler (KL) divergence between a discrete distribution and its ideal fitted curve. We used the MATLAB normrnd function to generate values from a Gaussian distribution with the same mean and standard deviation as the BMX055 z-axis at 2.6 V and 10 °C. We then discretized the values into various numbers of bins, fitted a Gaussian distribution to them, and calculated the KL divergence between the fitted distribution and the actual distribution.

IV. RESULTS AND DISCUSSION
We calculated the KL divergence between two discrete distributions using the following equation [26]. Let P and Q be discrete probability distributions, x a given sample value, and χ the sample space:

D_KL(P || Q) = − Σ_{x ∈ χ} P(x) log( Q(x) / P(x) ).    (2)

Figure 3(a) shows the BMX055 z-axis acceleration distribution at 2.6 V and 10 °C. We found that the KL divergence between the distribution from the BMX055 z-axis and its fitted Gaussian (0.00263) was more than an order of magnitude smaller than the equivalent result for a MATLAB-generated distribution (0.0392). This shows that the distribution of random numbers would produce accurate results in applications such as particle filters where the distribution represents the state [25]. We rounded the MATLAB-generated floats for comparison with the BMX055-generated integers. For reference, the KL divergence between a MATLAB-generated uniform distribution with range [µ − 3σ, µ + 3σ] and its fitted Gaussian is 0.116. We averaged the KL divergence calculations over 100 distributions to account for the random variations in the measurement. The numbers generated by the sensor are closer to an ideal Gaussian distribution than those generated by MATLAB. We used a custom multi-sensor embedded system to perform the initial investigation into which sensors could be used to generate non-uniform random numbers [27]. We found that the BMX055 provided the highest resolution and closest fit to a Gaussian. The noise from the BME680 temperature, pressure, and humidity sensor, the HDC1000 temperature sensor, the MMA8451 accelerometer, the AMG8833 temperature sensor, and the MAG3110 magnetometer was also Gaussian but with a lower resolution. Figures 4(a) and 4(b) show how voltage and temperature affect the mean and standard deviation of the BMX055 z-axis acceleration measurement. Temperature has a greater effect on the mean and standard deviation than voltage. Both voltage and temperature have a greater effect upon the standard deviation than the mean. A PRVA based on this phenomenon should measure and compensate for both the temperature and the supply voltage of the sensor. Figure 3(b) shows the effect of increasing the bin size on the divergence between a distribution of values and its ideal fitted distribution. This shows that increased quantization decreases the difference between a distribution and its fitted Gaussian.
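Equation 2 can be computed directly from two histograms over the same bins. The sketch below is ours (the paper used MATLAB); note that −Σ P log(Q/P) is the same quantity as Σ P log(P/Q), which is the form used here.

```cpp
#include <cmath>
#include <vector>

// Discrete KL divergence of Equation 2: D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)).
// p and q are (possibly unnormalized) histograms over the same bins; both are
// normalized to probability distributions first. Bins where p is zero
// contribute nothing to the sum.
double kl_divergence(std::vector<double> p, std::vector<double> q) {
    double p_sum = 0.0, q_sum = 0.0;
    for (double v : p) p_sum += v;
    for (double v : q) q_sum += v;

    double d = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double pi = p[i] / p_sum;
        const double qi = q[i] / q_sum;
        if (pi > 0.0 && qi > 0.0) d += pi * std::log(pi / qi);
    }
    return d;
}
```

Applied to the measured histogram as p and the fitted Gaussian evaluated at the bin centers as q, this reproduces the kind of comparison reported above.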
Fig. 3. (a) Distribution of BMX055 accelerometer z-axis noise (acceleration in mg) and closest fitted distribution. The KL divergence, or difference between the data and the fitted distribution, is 0.00263, demonstrating that the accelerometer noise closely matches a Gaussian distribution. (b) The effect of increasing discretization (number of bins) on the KL divergence whilst keeping the total number of samples constant at 100,000. Increased quantization decreases KL divergence.
Fig. 4. (a) The mean of BMX055 z-axis distributions decreases with increasing temperature and increases with increasing supply power. The y-axis is normalized to the maximum observed value. The x-axis shows the power consumed by the BMX055 sensor alone. (b) The standard deviation of BMX055 z-axis distributions decreases with increasing temperature and supply power. Both panels show curves for temperatures from -5 °C to 25 °C in 5 °C steps.
V. GAUSSIAN TO GAUSSIAN TRANSFORM
A univariate Gaussian can be transformed to any other univariate Gaussian with one multiplication and one addition. This is significantly less computation than the accept-reject method, which requires at least 10 operations per accept-reject test: an exponential, a square, a square root, a subtraction, a comparison, and five divisions/multiplications. The accept-reject method may need to repeat this set of operations numerous times for each random variate. The CPU must also generate two uniform random numbers as input for each accept-reject sampling attempt. Rapid addition and multiplication can be achieved using fast adders and multipliers implemented on an FPGA; Figure 5 shows how this could be achieved. The CPU requests a distribution by specifying parameters to the transform circuitry, which proceeds to transform the input Gaussian to fit the requested output distribution. The transform circuitry then stores the values in a small high-speed cache that the CPU can read from. Offloading the transformation to dedicated hardware leaves the processor free to execute other instructions, which will improve performance. This architecture has not yet been implemented in an FPGA but will be the subject of future work.
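Concretely, a sample x from N(µ_in, σ_in²) maps to N(µ_out, σ_out²) via y = (σ_out/σ_in)·x + (µ_out − (σ_out/σ_in)·µ_in): one multiply and one add once the scale and offset are precomputed, which is what the FPGA transform circuitry would do. The C++ sketch below is ours; the class name is a placeholder.

```cpp
// Gaussian-to-Gaussian transform from Section V. The constructor precomputes
// scale = sigma_out/sigma_in and offset = mu_out - scale*mu_in, so each
// sample needs exactly one multiplication and one addition, matching the
// multiply-add stage of Figure 5.
struct GaussTransform {
    double scale, offset;

    GaussTransform(double mu_in, double sigma_in, double mu_out, double sigma_out)
        : scale(sigma_out / sigma_in),
          offset(mu_out - (sigma_out / sigma_in) * mu_in) {}

    // Map one input sample to the requested output distribution.
    double operator()(double x) const { return scale * x + offset; }
};
```

For example, GaussTransform(980.0, 7.0, 0.0, 1.0) standardizes samples from the sensor distribution of Section II to zero mean and unit variance.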
Fig. 5. Gaussian-to-Gaussian transform implementation: the sensor feeds a multiply-add stage on the FPGA, which writes to a cache read by the CPU. The CPU can request distributions with certain parameters and they are presented to it in the cache.
VI. CONCLUSION
Sensors are a feasible entropy source for a PRVA with a higher sample rate and greater efficiency than the state-of-the-art. The mean and standard deviation of the noise produced by the z-axis of a commercial accelerometer depend upon the temperature of the environment and the supply voltage. Both should be measured and compensated for in the FPGA transform circuitry. Quantizing a Gaussian distribution decreases the KL divergence between it and a fitted Gaussian. A PRVA can double the speed of Monte Carlo integration of a univariate Gaussian on a single thread. The PRVA sampling rate can be increased by substituting the BMX055 for a set of analog-out accelerometers (ADXL335s) and high-speed ADCs (ADS54J60s). This method would require a customized version of the ADXL335 with the proof mass fixed in place, amplification so that the noise distribution covers at least 90% of the ADC range, and no low-pass filtering on its output.

VII. ACKNOWLEDGEMENTS
This research is supported by an Alan Turing Institute award TU/B/000096 under EPSRC grant EP/N510129/1, by EPSRC grant EP/R022534/1, by EPSRC grant EP/V004654/1, and by EPSRC grant EP/L015889/1.

REFERENCES

[1] D. B. Thomas, L. Howes, and W. Luk, "A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2009, pp. 63–72.
[2] D. Thomas and W. Luk, "Efficient hardware generation of random variates with arbitrary distributions," in , 2006, pp. 57–66.
[3] D. E. Eastlake, J. I. Schiller, and S. Crocker, "Randomness requirements for security," RFC, vol. 4086, pp. 1–48, 2005.
[4] C. Hennebert, H. Hossayni, and C. Lauradoux, "Entropy harvesting from physical sensors," in Proceedings of the Sixth ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2013, pp. 149–154.
[5] K. Wallace, K. Moran, E. Novak, G. Zhou, and K. Sun, "Toward sensor-based random number generation for mobile and IoT devices," IEEE Internet of Things Journal, vol. 3, no. 6, pp. 1189–1201, 2016.
[6] S.-M. Cho, E. Hong, and S.-H. Seo, "Random number generator using sensors for drone," IEEE Access, vol. 8, pp. 30343–30354, 2020.
[7] K. Lee and M. Lee, "True random number generator (TRNG) utilizing FM radio signals for mobile and embedded devices in multi-access edge computing," Sensors, vol. 19, no. 19, p. 4130, 2019.
[8] B. Perach and S. Kvatinsky, "An asynchronous and low-power true random number generator using STT-MTJ," IEEE Transactions on Very Large Scale Integration Systems, vol. 27, no. 11, pp. 2473–2484, 2019.
[9] S. Srinivasan, S. Mathew, R. Ramanarayanan, F. Sheikh, M. Anders, H. Kaul, V. Erraguntla, R. Krishnamurthy, and G. Taylor, "2.4 GHz 7 mW all-digital PVT-variation tolerant true random number generator in 45 nm CMOS," in . IEEE, 2010, pp. 203–204.
[10] H. Jiang, D. Belkin, S. E. Savelev, S. Lin, Z. Wang, Y. Li, S. Joshi, R. Midya, C. Li, M. Rao et al., "A novel true random number generator based on a stochastic diffusive memristor," Nature Communications, vol. 8, no. 1, pp. 1–9, 2017.
[11] D. G. Marangon, G. Vallone, and P. Villoresi, "Source-device-independent ultrafast quantum random number generation," Physical Review Letters, vol. 118, no. 6, p. 060503, 2017.
[12] X. Zhang, R. Bashizade, C. LaBoda, C. Dwyer, and A. R. Lebeck, "Architecting a stochastic computing unit with molecular optical devices," in . IEEE, 2018, pp. 301–314.
[13] M. Avesani, D. G. Marangon, G. Vallone, and P. Villoresi, "Source-device-independent heterodyne-based quantum random number generator at 17 Gbps," Nature Communications, vol. 9, no. 1, pp. 1–7, 2018.
[14] L. Nguyen, P. Rehain, Y. M. Sua, and Y.-P. Huang, "Programmable quantum random number generator without postprocessing," Optics Letters, vol. 43, no. 4, pp. 631–634, 2018.
[15] A. Tomasi, A. Meneghetti, N. Massari, L. Gasparini, D. Rucatti, and H. Xu, "Model, validation, and characterization of a robust quantum random number generator based on photon arrival time comparison," Journal of Lightwave Technology, vol. 36, no. 18, pp. 3843–3854, 2018.
[16] Y. Hu, Y. Wu, Y. Chen, G. C. Wan, and M. S. Tong, "Gaussian random number generator: Implemented in FPGA for quantum key distribution," International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, vol. 32, no. 3, p. e2554, 2019.
[17] X. Guo, C. Cheng, M. Wu, Q. Gao, P. Li, and Y. Guo, "Parallel real-time quantum random number generator," Optics Letters, vol. 44, no. 22, pp. 5566–5569, 2019.
[18] L. Devroye, Non-Uniform Random Variate Generation. Springer-Verlag, 1986, p. 42, ISBN: 1461386454.
[19] D. B. Thomas and W. Luk, "Resource efficient generators for the floating-point uniform and exponential distributions," in , 2008, pp. 102–107.
[20] J. R. van Peborgh Gooch and M. J. Hounslow, "Monte Carlo simulation of size-enlargement mechanisms in crystallization," AIChE Journal, vol. 42, no. 7, pp. 1864–1874, 1996.
[21] R. L. Cook, "Stochastic sampling in computer graphics," ACM Trans. Graph., vol. 5, no. 1, pp. 51–72, 1986.
[22] G. L. Zhang, P. H. W. Leong, C. H. Ho, K. H. Tsoi, C. C. C. Cheung, D. Lee, R. C. C. Cheung, and W. Luk, "Reconfigurable acceleration for Monte Carlo based financial simulation," in Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005, pp. 215–222.
[23] B. Lambert, A Student's Guide to Bayesian Statistics. SAGE, 2018, pp. 23–50, ISBN: 1473916364.
[24] J. M. Pak, C. K. Ahn, Y. S. Shmaliy, and M. T. Lim, "Improving reliability of particle filter-based localization in wireless sensor networks via hybrid particle/FIR filtering," IEEE Transactions on Industrial Informatics, vol. 11, no. 5, pp. 1089–1098, 2015.
[25] A. Doucet, N. De Freitas, and N. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice. Springer, 2001, pp. 3–14.
[26] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951.
[27] P. Stanley-Marbell and M. Rinard, "Warp: A hardware platform for efficient multimodal sensing with adaptive approximation,"