Abstract

Parallel Monte Carlo simulations often expose faults in random number generators


Why the Results of Parallel and Serial Monte Carlo Simulations May Differ


A parallel Monte Carlo simulation is a sampling of a stochastic process performed on multiple concurrently active processors. The counterpart serial Monte Carlo simulation samples the same stochastic process but on a uniprocessor. (Only simulations in which the process being sampled is supposed to be the same in both cases are discussed here.) Yet, in practice, it is often observed (but not as often reported) that the stochastic properties of these two processes differ.

A typical example of such a state of affairs is reported in [MR03], where a substantial difference is observed between statistics obtained using a parallel algorithm introduced in [L87] and the comparable statistics obtained using the corresponding serial algorithm introduced in [BKL75]. The authors of [MR03] propose to compensate for the alleged damage due to parallelization (the true origin of which appears to be unknown to them, though some speculations are offered) with another damage that "bends the structure" in the opposite direction: they modify the algorithm in [L87] so as to fit the two outcomes.

When a parallelization is done correctly, as in [L87], a mathematical IF-THEN theorem can be proven that assures the two stochastic processes are identical. Yet a computer experiment shows that the processes differ. This can only mean that the IF conditions of the theorem are not satisfied in the experiment. Where can the IF conditions fail? The stochastic process generated by either serial or parallel simulation is formed by feeding a source stochastic process, based on a random number generator, to the deterministic mechanism of the algorithm.
The theorem claims that the two resulting processes, for the parallel and for the serial algorithm, are identical, provided the source process satisfies certain properties. That the resulting processes turn out to differ can only mean that the source process does not satisfy the assumed properties.

The previous simple argument is general and applicable to many simulations. For example, a difference in the simulation outcomes between a parallel and the serial simulation was noticed in [KLE96]. The simulation task in [KLE96] was rather different from that in [MR03], but the reason for the fault was the same: a bad random number generator.

It is a well-known textbook recommendation that a good random number generator has to be employed if one expects to obtain statistically valid results in Monte Carlo simulations. A new "twist" is that if the random number generator is not good, the faulty results will quite probably be exposed during the parallel simulations but not necessarily during the serial ones. The faults will be detected by comparing the parallel runs with the serial runs, or by comparing parallel runs among themselves when those runs are made under different mappings of the task onto the parallel machine and/or using different numbers of processors to host the task. By contrast, in serial-only Monte Carlo simulations, there is usually no inherent mechanism to detect statistical errors. Without obtaining comparable results in a different way, such as via analytical estimates or by using differently arranged simulations, the errors have a good chance of remaining unnoticed.

For example, in [MR03], not only are the reported statistics obtained in parallel runs incorrect, as is noticed in [MR03], but the statistics obtained in serial runs have to be faulty too, as long as the same faulty random number generator is used. Yet the authors of [MR03] "bend" only the parallel algorithm to eliminate the differences.
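
The remark that a serial-only run carries no built-in error check can be illustrated with a small Python sketch. It is not taken from any of the cited papers: the faulty source (a uniform sample raised to the power 1.1), the function names, and the choice of the sample mean as the statistic are all assumptions made for illustration. The sketch only shows that a serial estimate computed from a skewed source looks perfectly plausible until it is compared with a value obtained in a different way, here the analytical mean 1/2.

```python
import random


def sample_mean(draw, n):
    """Serial Monte Carlo estimate of E[x] from n calls to draw()."""
    return sum(draw() for _ in range(n)) / n


def make_good(seed):
    """Source assumed to be uniform on (0, 1)."""
    rng = random.Random(seed)
    return rng.random


def make_faulty(seed, skew=1.1):
    """Hypothetical faulty source: u ** skew with skew > 1 makes
    smaller values more probable than larger ones."""
    rng = random.Random(seed)
    return lambda: rng.random() ** skew


N = 200_000
EXACT = 0.5  # analytical mean of a uniform variable on (0, 1)

good_estimate = sample_mean(make_good(1), N)
faulty_estimate = sample_mean(make_faulty(1), N)

print(f"good source:   {good_estimate:.4f}  (exact {EXACT})")
print(f"faulty source: {faulty_estimate:.4f}  (exact {EXACT})")
```

Under these assumptions the faulty estimate settles near 1/2.1, about 0.476, instead of 0.5. Nothing in the serial run itself signals the error; only the comparison with the independently known value does.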
Apparently the authors of [MR03] trust the serial results.

Now that we have diagnosed the ailment, let us suggest a cure that does not require one to "bend" good algorithms. In the overwhelming majority of Monte Carlo simulations, the source stochastic process mentioned above is a sequence of independent samples of a random value uniformly distributed on the interval $(0, 1)$. [...] $-\log(x)$, where $x$ is sampled on $(0, 1)$. If smaller values of $x$ are less probable than larger values, then the reported clock is slow, and with each tick the average time lag of the reported clock vs. the correct clock increases. This would make the results of both parallel and serial runs inaccurate, but to different degrees, because parallel and serial runs differ in the number of ticks and in the size of increments for the same time increment of the correct clock.

A simple way to test the uniformity of the distribution of variable $x$ is to use another variable $y = f(x)$ instead of $x$, where $f$ transforms the interval $(0, 1)$ onto itself without changing uniformity. For example, take $f(x) = f_1(x) = 1 - x$, or take $f(x) = f_2(x)$, where $f_2(x) = x + 1/2$ for $x < 1/2$ and $f_2(x) = x - 1/2$ for $x \ge 1/2$, or take a composition such as $f(x) = f_1(f_2(x))$, and so on. If statistical averages change, the distribution of $x$ is not uniform (and/or other faults are present in the random number generator).

A number of ways exist to fix the distribution non-uniformity. A simple and practical fix is as follows. Recognize a subinterval $(a, b)$, $a < b$, of the interval $(0, 1)$ such that the density of the distribution is satisfactorily uniform on $(a, b)$. Instead of feeding in variable $x$, feed in variable $y$ derived from $x$ as follows: when the drawn $x$ does not belong to $(a, b)$, discard that $x$ and draw again, and when the inequality $a < x < b$ holds, take $y = (x - a)/(b - a)$.

References

[MR03] P. A. Mahobar and A. D. Rollett, Asynchronous Parallel Potts Model for Simulation of Grain Growth, 'Materials Science and Technology (MS and T '03)' conference incorporating 'Modeling, Microstructure and Control in Ferrous and Non-Ferrous Industry' symposium, ed. F. Kongoli et al., TMS and ISS, Chicago, Nov. 9-12, pp. 399-412 (2003). Also see: mimp.mems.cmu.edu/papers/2003 20.pdf

[L87] B. D. Lubachevsky, Efficient Parallel Simulations of Asynchronous Cellular Arrays, Complex Systems (1987), no. 6, 1099-1123, S. Wolfram (ed.). Also see: arXiv:cs/0502039.

[BKL75] A. B. Bortz et al., A New Algorithm for Monte Carlo Simulation of Ising Spin Systems, J. Comp. Physics (1975), pp. 10-18. Also see: tphex.hep.by/Journal of Computational Physics/Volume 17/1/2.pdf

[KLE96] K. Kumaran et al., Massively Parallel Simulations of ATM Systems, 10th Workshop on Parallel and Distributed Simulations (PADS'96). Also see: