Efficient Programmable Random Variate Generation Accelerator from Sensor Noise
James Timothy Meech, Phillip Stanley-Marbell, Senior Member, IEEE
Abstract—We introduce a method for non-uniform random number generation based on sampling a physical process in a controlled environment. We demonstrate one proof-of-concept implementation of the method, which doubles the speed of Monte Carlo integration of a univariate Gaussian. We show that the supply voltage and temperature of the physical process must be measured and compensated for to prevent the mean and standard deviation of the random number generator from drifting. The method we present and our detailed empirical hardware measurements demonstrate the feasibility of programmable non-uniform random variate generation from low-power sensors and the effect of ADC quantization on the statistical qualities of the approach.
Index Terms—Sensor, Noise, Bayesian, Inference, Non-uniform, Random.
I. INTRODUCTION

Current software-based methods of non-uniform random variate generation are slow and inefficient [1], [2]. We present a programmable system capable of generating Gaussian random variates by extracting the noise properties of a MEMS sensor and demonstrate its principle and application. Sampling a random physical process that has a known distribution provides a continuous random variable with a theoretically unlimited sample rate. Table I compares several state-of-the-art methods of generating non-uniform random variates. Gaussian random variate generation is typically an order of magnitude slower and less efficient than uniform random variate generation [1]. We propose a method with the potential to be superior to all of the state-of-the-art methods in terms of sample rate and efficiency, consisting of a physical noise source and an ADC. In a hardware implementation, the sample rate of the ADC limits the random number generation rate.
A. Somewhat Related Work: Uniform Random Variates

Uniform random numbers are widely used in cryptography [3]. The hardware non-uniform random number generators in Table I are fundamentally different from prior work on uniform random number generators [4], [5], [6], [7], [8], [9]. Work on uniform random number generators is often based on some non-uniform physical entropy source, but these publications do not describe the distribution of the source. This omission makes it impossible to compare them directly to the programmable random variate accelerator (PRVA) method that we present in Section III.
James Timothy Meech and Phillip Stanley-Marbell are with the Department of Electrical Engineering, University of Cambridge, Cambridge, CB3 0FA UK (e-mail: [email protected]).

TABLE I
COMPARISON OF STATE-OF-THE-ART PRVA METHODS. PRVA: PROGRAMMABLE RANDOM VARIATE ACCELERATOR, CPU: CENTRAL PROCESSING UNIT, GPU: GRAPHICS PROCESSING UNIT, MPPA: MASSIVELY PARALLEL PROCESSOR ARRAY, FPGA: FIELD-PROGRAMMABLE GATE ARRAY, MR: MEMRISTOR, PD: PHOTON DETECTION, RET: RESONANCE ENERGY TRANSFER, PDI: PHOTODIODE, EN: ELECTRONIC NOISE, EXP: EXPONENTIAL, *: THIS WORK.

Source  Speed      Energy     Dist(s)    PRVA  Publication
CPU     890 Mb/s   3.17 Mb/J  Normal     Yes   [1], 2009
GPU     12.9 Gb/s  108 Mb/J   Normal     Yes   [1], 2009
MPPA    860 Mb/s   403 Mb/J   Normal     Yes   [1], 2009
FPGA    12.1 Gb/s  645 Mb/J   Normal     Yes   [1], 2009
MR      6000 b/s   120 Gb/J   Unknown    No    [10], 2017
PD      1.77 Gb/s  -          Normal     No    [11], 2017
RET     2.89 Gb/s  578 Gb/J   Exp        Yes   [12], 2018
PDI     17.4 Gb/s  -          Husimi     No    [13], 2018
PD      66.0 Mb/s  -          Arbitrary  Yes   [14], 2018
PD      320 Mb/s   -          Exp        No    [15], 2018
FPGA    6.40 Gb/s  -          Normal     Yes   [16], 2019
PD      8.25 Gb/s  -          Normal     No    [17], 2019
EN      13.8 kb/s  209 kb/J   Normal     Yes   [*], 2019
B. Generating Non-Uniform Random Variates Is Hard
The inversion and accept-reject methods are used in software for generating samples from non-uniform random variates [18]. Let U and F be uniform and non-uniform random variables respectively and F⁻¹ the analytical closed-form solution for the inverse cumulative distribution function of F [18]. Algorithm 1 shows the inversion method. The inversion method requires that F⁻¹ has an analytical closed-form solution. The Gaussian distribution has no analytical closed-form solution for F⁻¹ and so cannot be used with the inversion method [2]. The accept-reject method must be used instead. Let U and F be independent random variables and u and f the densities of U and F on R^d, d-dimensional Euclidean space. Let c ≥ 1 be a constant such that the condition f(x) ≤ cu(x) holds for all x. Algorithm 2 shows the accept-reject method for generating samples from F. The accept-reject method requires more math operations than the inversion method to transform a sample [18]. This causes it to take more clock cycles to compute and therefore more time to transform each sample. When using the accept-reject method, samples deviating from the desired distribution are rejected [18]. There was no direct effort to optimize speed or energy efficiency in this work. Back-of-the-envelope calculations show room for increases in speed and energy efficiency.

Algorithm 1: Inversion method.
Result: Sample from non-uniform random variate F
  Generate uniform [0,1] random variate U
  RETURN F ← F⁻¹(U)

Algorithm 2: Accept-reject method.
Initialization: T = 2, U = 1
Result: Sample from non-uniform random variate F
  repeat
    Generate uniform [0,1] random variate U
    Generate uniform [0,1] random variate F
    Set T ← f(F) / (c u(F))
  until U ≤ T
  RETURN F

C. Uses of Non-Uniform Random Variates
Non-uniform random variate generators are fundamental to applications employing Monte Carlo methods [19], such as population balance modeling of the crystallization process [20], ray tracing [21], and financial computing [22]. When conducting Bayesian inference we must evaluate Bayes' theorem to calculate the probability that a belief B is true given new data D, which we denote P(B | D). To do this we need the probability that the belief is true regardless of our data (P(B)), the probability that the data is true given the belief (P(D | B)), and the probability that the data is true regardless of the belief (P(D)). We will refer to P(B) as the prior, P(D | B) as the likelihood, and P(D) as the marginal likelihood. The posterior, P(B | D), is defined as:

P(B | D) = P(D | B) P(B) / P(D).    (1)

We calculate the marginal likelihood by integrating the joint density P(B, D) [23]. In practice, the analytical calculation of the marginal likelihood is impossible for all but the simplest joint distributions [23]. We instead sample from the joint distribution P(B, D) to obtain summary statistics that we can use to describe it [23]. These random samples from bespoke probability distributions must be produced using a non-uniform random variate generator. This kind of computation is performed on low-power embedded systems such as drones for particle filter localization [24]. In the particle filter algorithm, Equation 1 must be evaluated once every time step [25].

D. Contributions
1) The idea that physical noise sources such as MEMS sensors can be used in a PRVA (Section I).
2) Estimation of the performance increase and error reduction achieved by using a PRVA for Monte Carlo integration (Section II).
3) Investigation of the impact of temperature and supply voltage on the noise distribution obtained from a commercial MEMS sensor (Section III-A).

II. MOTIVATING EXAMPLE
We performed Monte Carlo integration of a Gaussian with a mean of µ = 980 and standard deviation of σ = 7, using samples from a Gaussian with the same mean and variance. We ran the experiment with a Gaussian generated by the C++ random library and repeated it with samples from the PRVA collected at 3 V and 20 °C. The sensor-generated random variates were saved in a file and then presented to the C++ benchmark program in an array. We uniformly interpolate between the points in the PRVA-generated Gaussian using a [-1, 1] uniform C++ random number generator. We assume that the PRVA can produce a sample in the same amount of time that it takes to perform a read from memory. The current PRVA cannot do this, but it is possible with fast ADCs and custom analog-out accelerometers. We performed the same integration using samples from a uniform distribution with various ranges. Let E be the error of the integration, t the time taken by the integration, N the number of random samples, and D the distribution (either uniform or Gaussian). Let samps be the array of random variates, A the area, b and h the rectangle base and height, and f the probability density function of the Gaussian for integration. Algorithm 3 shows the integration scheme that we used. We repeated each process 1000 times and calculated the average time t and error E. Figure 1 shows that the PRVA outperforms the C++ uniform random number generator for most ranges; it is only outperformed by uniform generators whose range closely brackets the Gaussian. The proportion of the uniform probability density function overlapping the Gaussian density decreases as the range of the uniform distribution is increased. For a given function it is impossible to know beforehand which range of uniform random numbers will produce a sufficiently small bound on the error of integration. To avoid this problem we sample from a distribution that closely matches the distribution to be integrated.
The divergence between the C++ Gaussian random number generator and the PRVA is due to the lack of unique numbers in the tails of the distribution. The PRVA performs the task up to 1.4 times faster than the C++ Gaussian random number generator with eight threads and always twice as fast with one thread.

Algorithm 3: Monte Carlo integration.
Result: Error E and time t
  Timer start
  Generate N random samples from distribution D
  Sort the N random samples
  for all pairs of adjacent samples do
    b = samps[i] − samps[i−1]
    h = (f(samps[i]) + f(samps[i−1])) / 2
    A += b × h
  end
  E = abs(1 − A)
  Timer stop
  t = stop − start
  RETURN E, t

Fig. 1. The error of Monte Carlo integration depends upon the range for uniform random numbers but not for Gaussian random numbers. The generators compared are C++ uniform generators with ranges up to ±1000000, the C++ Gaussian, and the interpolated hardware Gaussian. The error bars were plotted using a 90% confidence interval on the mean. The code was compiled with g++-mp-7 c++11 and run on a 2.8 GHz Intel Core i7 CPU with OpenMP to utilize all eight processor threads.
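Algorithm 3 can be sketched in C++ as follows. This is our illustrative reconstruction, not the authors' benchmark code: it substitutes std::normal_distribution for the PRVA samples, omits the timing, and the names (gaussian_pdf, monte_carlo_error) are ours.

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Gaussian probability density with mean mu and standard deviation sigma.
double gaussian_pdf(double x, double mu, double sigma) {
    const double PI = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * PI));
}

// Algorithm 3 sketch: trapezoidal Monte Carlo integration of the density f
// using n samples drawn from the same distribution. Returns |1 - A|, the
// absolute error against the known unit integral of a normalized density.
double monte_carlo_error(std::size_t n, double mu, double sigma, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(mu, sigma);

    std::vector<double> samps(n);
    for (auto &s : samps) s = dist(gen);
    std::sort(samps.begin(), samps.end());

    double area = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        const double b = samps[i] - samps[i - 1];                       // base
        const double h = 0.5 * (gaussian_pdf(samps[i], mu, sigma) +
                                gaussian_pdf(samps[i - 1], mu, sigma)); // mean height
        area += b * h;
    }
    return std::fabs(1.0 - area);
}
```

Because the samples concentrate where the density is large, the trapezoids cover almost all of the probability mass, which is why sampling from a matched distribution beats a poorly chosen uniform range.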
Fig. 2. Experimental setup: a microcontroller samples the BMX055 over I2C (SCL/SDA) inside a thermal chamber, with the sensor on a tilt-and-rotate stage on a vibration isolation platform; a Keithley 2450 SourceMeter supplies VDD. Powering the sensor and the I2C interface separately allowed more accurate measurement of the power consumed by the sensor. The microcontroller consumed approximately 66 mW from its USB supply.
III. METHODOLOGY
A PRVA based on physical noise sources must have negligible drift of the mean and standard deviation over time. Drift would cause errors in calculations using the output of the PRVA. Any environmental parameter that causes non-negligible drift must be measured and compensated for. We sampled the z-axis of a MEMS accelerometer (the accelerometer in the Bosch BMX055) to obtain the distributions.
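One way such compensation could be structured is sketched below. The first-order linear drift model, its coefficients, and all names (DriftModel, compensate) are our assumptions for illustration only; a real PRVA would fit the model to calibration measurements like those in Section IV.

```cpp
// Hypothetical first-order drift model: the mean and standard deviation of
// the sensor noise are approximated as linear functions of temperature T
// (degrees C) and supply voltage V (volts). Coefficients are placeholders,
// not measured values; they would come from a calibration table.
struct DriftModel {
    double mu0, mu_T, mu_V;          // mean model:   mu0 + mu_T*T + mu_V*V
    double sigma0, sigma_T, sigma_V; // stddev model: sigma0 + sigma_T*T + sigma_V*V

    double mean(double T, double V) const { return mu0 + mu_T * T + mu_V * V; }
    double stddev(double T, double V) const { return sigma0 + sigma_T * T + sigma_V * V; }
};

// Compensate a raw sample: map it to a zero-mean, unit-variance variate
// using the modelled mean and standard deviation at the measured (T, V).
double compensate(double raw, double T, double V, const DriftModel &m) {
    return (raw - m.mean(T, V)) / m.stddev(T, V);
}
```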
A. Temperature-Controlled Experiments
Figure 2 shows the experimental setup. We placed the microcontroller, accelerometer, tilt-and-rotate stage, and vibration isolation platform inside a Binder MK56 thermal chamber. We connected a microcontroller to the sensor via I2C for a 1154 Hz sample rate. Orders of magnitude higher sample rates are possible using an off-the-shelf ADC and an analog-out accelerometer. We used a Keithley 2450 source measurement unit to power the sensor and measure the current drawn. We set the chamber temperature to 25 °C and allowed 30 minutes for the temperature of the sensor to equilibrate whilst constantly sampling z-axis acceleration values from it. We then sampled values from the BMX055 sensor at a 3.6 V supply voltage. We repeated this for all the voltages from 3.6 V down to 1.4 V in 0.2 V decrements. We then repeated this process for temperatures from 25 °C down to -5 °C in 5 °C decrements.

B. Quantization Investigation
We investigated the effect of quantization on the Kullback-Leibler (KL) divergence between a discrete distribution and its ideal fitted curve. We used the MATLAB normrnd function to generate values from a Gaussian distribution with the same mean and standard deviation as the BMX055 z-axis at 2.6 V and 10 °C. We then discretized the values into various numbers of bins, fitted a Gaussian distribution to them, and calculated the KL divergence between the fitted distribution and the actual distribution.

IV. RESULTS AND DISCUSSION
We calculated the KL divergence between two discrete distributions using the following equation [26]. Let P and Q be discrete probability distributions, x a given sample value, and χ the sample space:

D_KL(P || Q) = − Σ_{x ∈ χ} P(x) log( Q(x) / P(x) ).    (2)

Figure 3(a) shows the BMX055 z-axis acceleration distribution at 2.6 V and 10 °C. We found that the KL divergence between the distribution from the BMX055 z-axis and its fitted Gaussian (0.00263) was more than an order of magnitude smaller than the equivalent result for a MATLAB-generated distribution (0.0392). This shows that the distribution of random numbers would produce accurate results in applications such as particle filters where the distribution represents the state [25]. We rounded the MATLAB-generated floats for comparison with the BMX055-generated integers. For reference, the KL divergence between a MATLAB-generated uniform distribution with range [µ − 3σ, µ + 3σ] and its fitted Gaussian is 0.116. We averaged the KL divergence calculations over 100 distributions to account for the random variations in the measurement. The numbers generated by the sensor are closer to an ideal Gaussian distribution than those generated by MATLAB. We used a custom multi-sensor embedded system to perform the initial investigation into which sensors could be used to generate non-uniform random numbers [27]. We found that the BMX055 provided the highest resolution and closest fit to a Gaussian. The noise from the BME680 temperature, pressure, and humidity sensor, the HDC1000 temperature sensor, the MMA8451 accelerometer, the AMG8833 temperature sensor, and the MAG3110 magnetometer was also Gaussian but with a lower resolution. Figures 4(a) and 4(b) show how voltage and temperature affect the mean and standard deviation of the BMX055 z-axis acceleration measurement. Temperature has a greater effect on the mean and standard deviation than voltage. Both voltage and temperature have a greater effect upon the standard deviation than the mean. A PRVA based on this phenomenon should measure and compensate for both the temperature and the supply voltage of the sensor. Figure 3(b) shows the effect of increasing the bin size on the divergence between a distribution of values and its ideal fitted distribution. This shows that increased quantization decreases the difference between a distribution and its fitted Gaussian.
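Equation 2 can be computed directly from two histograms over the same bins. The sketch below is ours (the paper used MATLAB); note that −Σ P log(Q/P) is the same quantity as Σ P log(P/Q), which is the form used here.

```cpp
#include <cmath>
#include <vector>

// Discrete KL divergence of Equation 2: D_KL(P||Q) = sum_x P(x) log(P(x)/Q(x)).
// p and q are (possibly unnormalized) histograms over the same bins; both are
// normalized to probability distributions first. Bins where p is zero
// contribute nothing to the sum.
double kl_divergence(std::vector<double> p, std::vector<double> q) {
    double p_sum = 0.0, q_sum = 0.0;
    for (double v : p) p_sum += v;
    for (double v : q) q_sum += v;

    double d = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double pi = p[i] / p_sum;
        const double qi = q[i] / q_sum;
        if (pi > 0.0 && qi > 0.0) d += pi * std::log(pi / qi);
    }
    return d;
}
```

Applied to the measured histogram as p and the fitted Gaussian evaluated at the bin centers as q, this reproduces the kind of comparison reported above.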
Fig. 3. (a) Distribution of BMX055 accelerometer z-axis noise (acceleration in mg) and closest fitted distribution. The KL divergence, or difference between the data and the fitted distribution, is 0.00263, demonstrating that the accelerometer noise closely matches a Gaussian distribution. (b) The effect of increasing discretization (number of bins) on the KL divergence whilst keeping the total number of samples constant at 100,000. Increased quantization decreases KL divergence.
Fig. 4. (a) The mean of BMX055 z-axis distributions decreases with increasing temperature and increases with increasing supply power. The y-axis is normalized to the maximum observed value. The x-axis shows the power consumed by the BMX055 sensor alone. (b) The standard deviation of BMX055 z-axis distributions decreases with increasing temperature and supply power. Both panels show curves for temperatures from -5 °C to 25 °C in 5 °C steps.
V. GAUSSIAN TO GAUSSIAN TRANSFORM
A univariate Gaussian can be transformed to any other univariate Gaussian with one multiplication and one addition. This is significantly less computation than the accept-reject method, which requires at least 10 operations per accept-reject test: an exponential, a square, a square root, a subtraction, a comparison, and five divisions/multiplications. The accept-reject method may need to repeat this set of operations numerous times for each random variate. The CPU must also generate two uniform random numbers as input for each accept-reject sampling attempt. Rapid addition and multiplication can be achieved using fast adders and multipliers implemented on an FPGA; Figure 5 shows how this could be achieved. The CPU requests a distribution by specifying parameters to the transform circuitry, which proceeds to transform the input Gaussian to fit the requested output distribution. The transform circuitry then stores the values in a small high-speed cache that the CPU can read from. Offloading the transformation to dedicated hardware leaves the processor free to execute other instructions, which will improve performance. This architecture has not yet been implemented in an FPGA but will be the subject of future work.
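Concretely, a sample x from N(µ_in, σ_in²) maps to N(µ_out, σ_out²) via y = (σ_out/σ_in)·x + (µ_out − (σ_out/σ_in)·µ_in): one multiply and one add once the scale and offset are precomputed, which is what the FPGA transform circuitry would do. The C++ sketch below is ours; the class name is a placeholder.

```cpp
// Gaussian-to-Gaussian transform from Section V. The constructor precomputes
// scale = sigma_out/sigma_in and offset = mu_out - scale*mu_in, so each
// sample needs exactly one multiplication and one addition, matching the
// multiply-add stage of Figure 5.
struct GaussTransform {
    double scale, offset;

    GaussTransform(double mu_in, double sigma_in, double mu_out, double sigma_out)
        : scale(sigma_out / sigma_in),
          offset(mu_out - (sigma_out / sigma_in) * mu_in) {}

    // Map one input sample to the requested output distribution.
    double operator()(double x) const { return scale * x + offset; }
};
```

For example, GaussTransform(980.0, 7.0, 0.0, 1.0) standardizes samples from the sensor distribution of Section II to zero mean and unit variance.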
Fig. 5. Gaussian-to-Gaussian transform implementation: the sensor feeds a multiply-add stage on the FPGA, which writes to a cache read by the CPU. The CPU can request distributions with certain parameters and they are presented to it in the cache.
VI. CONCLUSION
Sensors are a feasible entropy source for a PRVA with a higher sample rate and greater efficiency than the state-of-the-art. The mean and standard deviation of the noise produced by the z-axis of a commercial accelerometer depend upon the temperature of the environment and the supply voltage. Both should be measured and compensated for in the FPGA transform circuitry. Quantizing a Gaussian distribution decreases the KL divergence between it and a fitted Gaussian. A PRVA can double the speed of Monte Carlo integration of a univariate Gaussian on a single thread. The PRVA sampling rate can be increased by substituting the BMX055 for a set of analog-out accelerometers (ADXL335s) and high-speed ADCs (ADS54J60s). This method would require a customized version of the ADXL335 with the proof mass fixed in place, amplification so that the noise distribution covers at least 90% of the ADC range, and no low-pass filtering on its output.

VII. ACKNOWLEDGEMENTS
This research is supported by an Alan Turing Institute award TU/B/000096 under EPSRC grant EP/N510129/1, by EPSRC grant EP/R022534/1, by EPSRC grant EP/V004654/1, and by EPSRC grant EP/L015889/1.

REFERENCES

[1] D. B. Thomas, L. Howes, and W. Luk, "A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2009, pp. 63–72.
[2] D. Thomas and W. Luk, "Efficient hardware generation of random variates with arbitrary distributions," in , 2006, pp. 57–66.
[3] D. E. Eastlake, J. I. Schiller, and S. Crocker, "Randomness requirements for security," RFC, vol. 4086, pp. 1–48, 2005.
[4] C. Hennebert, H. Hossayni, and C. Lauradoux, "Entropy harvesting from physical sensors," in Proceedings of the Sixth ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2013, pp. 149–154.
[5] K. Wallace, K. Moran, E. Novak, G. Zhou, and K. Sun, "Toward sensor-based random number generation for mobile and IoT devices," IEEE Internet of Things Journal, vol. 3, no. 6, pp. 1189–1201, 2016.
[6] S.-M. Cho, E. Hong, and S.-H. Seo, "Random number generator using sensors for drone," IEEE Access, vol. 8, pp. 30343–30354, 2020.
[7] K. Lee and M. Lee, "True random number generator (TRNG) utilizing FM radio signals for mobile and embedded devices in multi-access edge computing," Sensors, vol. 19, no. 19, p. 4130, 2019.
[8] B. Perach and S. Kvatinsky, "An asynchronous and low-power true random number generator using STT-MTJ," IEEE Transactions on Very Large Scale Integration Systems, vol. 27, no. 11, pp. 2473–2484, 2019.
[9] S. Srinivasan, S. Mathew, R. Ramanarayanan, F. Sheikh, M. Anders, H. Kaul, V. Erraguntla, R. Krishnamurthy, and G. Taylor, "2.4 GHz 7 mW all-digital PVT-variation tolerant true random number generator in 45 nm CMOS," in . IEEE, 2010, pp. 203–204.
[10] H. Jiang, D. Belkin, S. E. Savelev, S. Lin, Z. Wang, Y. Li, S. Joshi, R. Midya, C. Li, M. Rao et al., "A novel true random number generator based on a stochastic diffusive memristor," Nature Communications, vol. 8, no. 1, pp. 1–9, 2017.
[11] D. G. Marangon, G. Vallone, and P. Villoresi, "Source-device-independent ultrafast quantum random number generation," Physical Review Letters, vol. 118, no. 6, p. 060503, 2017.
[12] X. Zhang, R. Bashizade, C. LaBoda, C. Dwyer, and A. R. Lebeck, "Architecting a stochastic computing unit with molecular optical devices," in . IEEE, 2018, pp. 301–314.
[13] M. Avesani, D. G. Marangon, G. Vallone, and P. Villoresi, "Source-device-independent heterodyne-based quantum random number generator at 17 Gbps," Nature Communications, vol. 9, no. 1, pp. 1–7, 2018.
[14] L. Nguyen, P. Rehain, Y. M. Sua, and Y.-P. Huang, "Programmable quantum random number generator without postprocessing," Optics Letters, vol. 43, no. 4, pp. 631–634, 2018.
[15] A. Tomasi, A. Meneghetti, N. Massari, L. Gasparini, D. Rucatti, and H. Xu, "Model, validation, and characterization of a robust quantum random number generator based on photon arrival time comparison," Journal of Lightwave Technology, vol. 36, no. 18, pp. 3843–3854, 2018.
[16] Y. Hu, Y. Wu, Y. Chen, G. C. Wan, and M. S. Tong, "Gaussian random number generator: Implemented in FPGA for quantum key distribution," International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, vol. 32, no. 3, p. e2554, 2019.
[17] X. Guo, C. Cheng, M. Wu, Q. Gao, P. Li, and Y. Guo, "Parallel real-time quantum random number generator," Optics Letters, vol. 44, no. 22, pp. 5566–5569, 2019.
[18] L. Devroye, Non-Uniform Random Variate Generation. Springer-Verlag, 1986, p. 42, ISBN: 1461386454.
[19] D. B. Thomas and W. Luk, "Resource efficient generators for the floating-point uniform and exponential distributions," in , 2008, pp. 102–107.
[20] J. R. van Peborgh Gooch and M. J. Hounslow, "Monte Carlo simulation of size-enlargement mechanisms in crystallization," AIChE Journal, vol. 42, no. 7, pp. 1864–1874, 1996.
[21] R. L. Cook, "Stochastic sampling in computer graphics," ACM Trans. Graph., vol. 5, no. 1, pp. 51–72, 1986.
[22] G. L. Zhang, P. H. W. Leong, C. H. Ho, K. H. Tsoi, C. C. C. Cheung, D. Lee, R. C. C. Cheung, and W. Luk, "Reconfigurable acceleration for Monte Carlo based financial simulation," in Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005, pp. 215–222.
[23] B. Lambert, A Student's Guide to Bayesian Statistics. SAGE, 2018, pp. 23–50, ISBN: 1473916364.
[24] J. M. Pak, C. K. Ahn, Y. S. Shmaliy, and M. T. Lim, "Improving reliability of particle filter-based localization in wireless sensor networks via hybrid particle/FIR filtering," IEEE Transactions on Industrial Informatics, vol. 11, no. 5, pp. 1089–1098, 2015.
[25] A. Doucet, N. De Freitas, and N. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice. Springer, 2001, pp. 3–14.
[26] S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951.
[27] P. Stanley-Marbell and M. Rinard, "Warp: A hardware platform for efficient multimodal sensing with adaptive approximation,"