ParaMonte: A high-performance serial/parallel Monte Carlo simulation library for C, C++, Fortran
Amir Shahmoradi (1, 2) and Fatemeh Bagheri
(1) Department of Physics, The University of Texas, Arlington, TX
(2) Data Science Program, The University of Texas, Arlington, TX
Submitted: 29 September 2020
License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
Summary
ParaMonte (standing for Parallel Monte Carlo) is a serial and MPI/Coarray-parallelized library of Monte Carlo routines for sampling mathematical objective functions of arbitrary dimensions, in particular, the posterior distributions of Bayesian models in data science, Machine Learning, and scientific inference. The ParaMonte library has been developed with the design goal of unifying the automation, accessibility, high performance, scalability, and reproducibility of Monte Carlo simulations. The current implementation of the library includes ParaDRAM, a Parallel Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo sampler, accessible from a wide range of programming languages including C, C++, and Fortran, with a unified Application Programming Interface and simulation environment across all supported programming languages. The ParaMonte library is MIT-licensed and is permanently located and maintained at https://github.com/cdslaborg/paramonte.
Statement of need
Monte Carlo simulation techniques, in particular Markov Chain Monte Carlo (MCMC), are among the most popular methods of quantifying uncertainty in scientific inference problems. Extensive work has been done over the past decades to develop Monte Carlo simulation programming environments that aim to partially or fully automate uncertainty quantification via Markov Chain Monte Carlo simulations. Example open-source libraries in C/C++/Fortran include MCSim in C (Bois, 2009), the MCMCLib and QUESO (Prudencio & Schulz, 2012) libraries in C++, and mcmcf90 in Fortran (Haario, Laine, Mira, & Saksman, 2006). These packages, however, mostly serve the users of one particular programming language environment. Some are able to perform only serial simulations, while others are inherently parallelized. Furthermore, the majority of the existing packages have significant dependencies on other external libraries. Such dependencies can make building these packages an extremely complex and arduous task due to software version incompatibilities, a phenomenon that has become known among software developers as dependency hell.
The ParaMonte library presented in this work aims to address these problems by providing a standalone, high-performance serial/parallel Monte Carlo simulation environment with the following principal design goals:
• Full automation of the library's build process and of all Monte Carlo simulations, to ensure the highest level of user-friendliness and minimal time investment for building, running, and post-processing Monte Carlo and MCMC simulations.
• Interoperability of the core of the library with as many programming languages as currently possible, including C, C++, and Fortran, as well as MATLAB and Python via the ParaMonte::MATLAB and ParaMonte::Python libraries.
• High-performance, meticulously low-level implementation of the library that guarantees the fastest possible Monte Carlo simulations, without compromising the reproducibility of the simulation or the extensive external reporting of the simulation progress and results.
• Parallelizability of all simulations via both the MPI and PGAS/Coarray communication paradigms, while requiring zero parallel-coding effort from the user.
• Zero external-library dependencies, to ensure hassle-free library builds and Monte Carlo simulation runs.
• Fully deterministic reproducibility and automatically enabled restart functionality for all ParaMonte simulations, up to 16 digits of decimal precision if requested by the user.
• Comprehensive reporting and post-processing of each simulation and its results, as well as their efficient compact storage in external files, to ensure the simulation results will be comprehensible and reproducible at any time in the distant future.
Amir Shahmoradi, (2020). ParaMonte: A high-performance serial/parallel Monte Carlo simulation library for C, C++, Fortran. Journal of Open Source Software, 0(0), 0000. https://doi.org/
The Build process
The ParaMonte library is permanently located on GitHub and is available to view at https://github.com/cdslaborg/paramonte. The build process of the library is fully automated, and extensive, detailed instructions are available on the documentation website of the library. For the convenience of users, each release of the ParaMonte library's source code also includes prebuilt, ready-to-use copies of the library for the x64 architecture on Windows, Linux, and macOS, in all supported programming languages, including C, C++, and Fortran. These prebuilt libraries ship with language-specific example codes and build scripts that fully automate building and running the examples.
Where the prebuilt libraries cannot be used, users can call the Bash and Batch build scripts provided in the source code of the library to fully automate the build process. The ParaMonte build scripts can automatically install any missing components that may be required for a successful build, including the GNU C/C++/Fortran compilers, the CMake build software, and the MPI/Coarray parallelism libraries. All of these tasks are performed only with the explicit permission of the user. The ParaMonte build scripts are heavily inspired by the impressive OpenCoarrays software (Fanfarillo et al., 2014), developed and maintained by the Sourcery Institute.
The ParaDRAM sampler
The current implementation of the ParaMonte library includes the Parallel Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo (ParaDRAM) sampler (Shahmoradi & Bagheri, 2020, 2020a, 2020b; Kumbhare & Shahmoradi, 2020), as well as several other samplers whose development is in progress as of writing this manuscript. The ParaDRAM algorithm is a variant of the DRAM algorithm of Haario et al. (2006) and can be used in serial or parallel mode.
In brief, the ParaDRAM sampler continuously adapts the shape and scale of the proposal distribution throughout the simulation to increase the efficiency of the sampler. This is in contrast to traditional MCMC samplers, where the proposal distribution remains fixed [...] *_report.txt files of every simulation performed by the ParaMonte samplers.
Monitoring Convergence
Although the continuous adaptation of the proposal distribution increases the sampling efficiency of the ParaDRAM sampler, it breaks the ergodicity and reversibility conditions of Markov Chain Monte Carlo methods. Nevertheless, the convergence of the resulting pseudo-Markov chain to the target density function is guaranteed as long as the amount of adaptation of the proposal distribution decreases monotonically throughout the simulation (Shahmoradi & Bagheri, 2020).
Ideally, the diminishing adaptation criterion of adaptive MCMC methods can be monitored by measuring the total variation distance (TVD) between subsequent adaptively updated proposal distributions. Except in trivial cases, however, the analytical or numerical computation of the TVD is almost always intractable. To circumvent this problem, we have introduced a novel technique in the ParaDRAM algorithm to continuously measure the amount of adaptation of the proposal distribution throughout adaptive MCMC simulations: instead of computing the TVD directly, the sampler computes an upper bound on its value. The mathematics of computing this AdaptationMeasure upper bound is detailed extensively in Shahmoradi & Bagheri (2020, 2020a). The computed upper bound is always a real number between 0 and 1, with 0 indicating that two proposal distributions are identical and 1 indicating completely different proposal distributions. The AdaptationMeasure is automatically computed with every proposal adaptation and is reported in the output chain files of all ParaDRAM simulations. It can subsequently be visualized to verify that the diminishing adaptation criterion of the ParaDRAM sampler holds.
Figure 1: An illustration of the diminishing adaptation of the proposal distribution of the ParaDRAM sampler for an example problem of sampling a 4-dimensional Multivariate Normal distribution. The monotonically decreasing adaptivity evidenced in this plot guarantees the Markovian property and the asymptotic ergodicity of the resulting Markov chain from the ParaDRAM sampler.
Figure 1 depicts the evolution of the adaptation measure for an example problem of sampling a 4-dimensional MultiVariate Normal (MVN) distribution. A non-diminishing evolution of the AdaptationMeasure can also be a strong indicator of a lack of convergence of the Markov chain to the target density. The evolution of the covariance matrix of the proposal distribution for the same sampling problem is shown in Figure 2.
Figure 2: A 3-dimensional illustration of the dynamic adaptation of the covariance matrix of the 4-dimensional MultiVariate Normal (MVN) proposal distribution of the ParaDRAM sampler for an example problem of sampling a 4-dimensional MVN distribution.
Parallelism
Two modes of parallelism are currently implemented for all ParaMonte samplers:
• The Perfect Parallelism (multi-chain): In this mode, independent instances of the adaptive MCMC sampler run concurrently. Once all simulations are complete, the ParaDRAM sampler compares the output samples from all processors with each other to ensure that none of the output chains shows evidence of a lack of convergence to the target density.
• The Fork-Join Parallelism (single-chain): In this mode, a single processor is responsible for collecting and dispatching the information generated by all processors, to create a single Markov chain of all visited states with the help of all processors.
For each parallel simulation in the Fork-Join mode, the ParaMonte samplers automatically compute the speedup gained compared to the serial mode. In addition, the speedup for a wide range of processor counts is automatically computed and reported in the output *_report.txt files that are generated for all simulations. The contribution of each processor to the construction of the chain is also reported along with the output visited states in the output *_chain.* files. These reports are particularly useful for finding the optimal number of processors for a given problem: first run a short simulation to predict the optimal number of processors from the sampler's output information, then perform the production run with that number of processors. For a comprehensive description and algorithmic details, see Shahmoradi & Bagheri (2020, 2020a).
Figure 3: An illustration of the contributions of 512 Intel Xeon Phi 7250 processors to a ParaMonte-ParaDRAM simulation parallelized via the Fork-Join paradigm. The predicted best-fit Geometric distribution from the post-processing phase of the ParaDRAM simulation is shown by the black line. The data used in this figure is automatically generated for each parallel simulation performed via any of the ParaMonte samplers.
As we argue in Shahmoradi & Bagheri (2020, 2020a), the contribution of the processors to the construction of a Markov chain in the Fork-Join parallelism paradigm follows a Geometric distribution. Figure 3 depicts the processor contributions to an example ParaDRAM simulation of a variant of Himmelblau's function, together with the Geometric fit to the distribution of processor contributions. Figure 4 illustrates an example of the predicted strong-scaling behavior of the sampler and the predicted speedup for a range of processor counts.
Figure 4: A comparison of the actual strong-scaling behavior of an example ParaMonte-ParaDRAM simulation from 1 to 1088 processors with the strong-scaling behavior predicted during the post-processing phase of ParaDRAM simulations. The data used in this figure is automatically generated for each parallel simulation performed via any of the ParaMonte samplers.
Efficient compact storage of the output chain
Efficient, continuous external storage of the output of ParaDRAM simulations is essential both for post-processing the results and for the restart functionality of the simulations, should any interruptions happen at runtime. However, as the number of dimensions or the complexity of the target density increases, such external storage of the output can easily become a challenge and a bottleneck for the speed of an otherwise high-performance ParaDRAM sampler. Given currently available computational technologies, input/output (IO) to external hard drives can be 2-3 orders of magnitude slower than Random Access Memory (RAM) storage.
To alleviate the effects of such external-IO bottlenecks, the ParaDRAM sampler implements a novel method of storing the resulting MCMC chains in a small, compact, yet human-readable ASCII format in external output files. This compact-chain format (as opposed to the verbose (Markov) chain format) leads to a significant speedup of the simulation while requiring 4-100 times less external memory to store the chains in the external output files. The exact reduction in external memory usage depends on the efficiency of the sampler: the compact format keeps only the uniquely visited states, so the more often proposals are rejected, the more repeated states are collapsed. Additionally, the user can set the format of the output file to binary, further reducing the memory footprint of the simulation while increasing the simulation speed. The implementation details of this compact-chain format are discussed extensively in Shahmoradi & Bagheri (2020, 2020a).
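The relationship between the verbose and compact formats can be sketched as run-length encoding: a rejected proposal repeats the current state, so consecutive repeats can be stored once together with a weight. The C sketch below compresses a one-dimensional verbose chain into (state, weight) pairs and expands it back. It illustrates the idea only; the function names are hypothetical, the actual ParaDRAM file layout is described in the cited papers, and a real sampler would increment the weight when a proposal is rejected instead of comparing floating-point values after the fact.

```c
#include <assert.h>
#include <stddef.h>

/* Compress a 1-D verbose chain into compact (state, weight) pairs by merging
   runs of identical consecutive states. state and weight must have room for
   n entries. Returns the number of compact entries. */
size_t compact_chain(const double *verbose, size_t n, double *state, size_t *weight)
{
    assert(n == 0 || (verbose != NULL && state != NULL && weight != NULL));
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        if (m > 0 && verbose[i] == state[m - 1]) {
            ++weight[m - 1]; /* repeated state after a rejected proposal */
        } else {
            state[m] = verbose[i]; /* newly visited state */
            weight[m] = 1;
            ++m;
        }
    }
    return m;
}

/* Expand compact (state, weight) pairs back into the verbose chain.
   Returns the verbose length, i.e. the sum of the weights. */
size_t expand_chain(const double *state, const size_t *weight, size_t m, double *verbose)
{
    size_t n = 0;
    for (size_t k = 0; k < m; ++k)
        for (size_t w = 0; w < weight[k]; ++w)
            verbose[n++] = state[k];
    return n;
}
```

For example, the verbose chain 1, 1, 1, 2, 3, 3 compacts to the states 1, 2, 3 with weights 3, 1, 2; the lower the acceptance rate, the larger the average weight and the greater the saving.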
Sample refinement
In addition to the output *_progress.txt, *_report.txt, and *_chain.txt files, each ParaDRAM simulation generates a *_sample.txt file containing the final refined, decorrelated sample from the objective function. To do so, the sampler computes the Integrated AutoCorrelation (IAC) of the chain along each dimension of the domain of the objective function. However, most existing methods for computing the IAC tend to underestimate this quantity.
Therefore, to ensure that the final sample resulting from a ParaDRAM simulation is fully decorrelated, we have implemented a novel approach that aggressively and recursively refines the resulting Markov chain until no trace of autocorrelation is left in the final refined sample. This approach optionally involves two separate phases of sample refinement:
1. In the first phase, the Markov chain is decorrelated recursively, for as long as needed, based on the IAC of its compact format, where only the uniquely visited states are kept in the (compact) chain.
2. Once the Markov chain is refined such that its compact format is fully decorrelated, the second phase of decorrelation begins, during which the Markov chain is decorrelated based on the IAC of the chain in its verbose (Markov) format. This process is repeated recursively for as long as there is any residual autocorrelation in the refined sample.
We have noticed empirically, via numerous experiments, that this aggressive recursive approach is superior to other existing methods of sample refinement in generating final refined samples that are neither too small in size nor autocorrelated.
The restart functionality
Each ParaMonte sampler can automatically restart an existing interrupted simulation, whether serial or parallel. All that is required is to rerun the interrupted simulation with the same output file names. The ParaMonte samplers automatically detect the presence of an incomplete simulation in the output files and restart the simulation from where it left off. Furthermore, if the user sets the seed of the sampler's random number generator before running the simulation, the ParaMonte samplers can regenerate the same chain that would have been produced had the simulation not been interrupted in the first place. Such fully deterministic reproducibility into the future is guaranteed with 16 digits of decimal precision for the results of any ParaMonte simulation. To our knowledge, this is a unique feature of the ParaMonte library that does not appear to exist in any of the contemporary libraries for Markov Chain Monte Carlo simulations.
Documentation and Repository
Acknowledgements
We thank the Texas Advanced Computing Center for providing the supercomputer time for testing and development of this library.
References
Bois, F. Y. (2009). GNU MCSim: Bayesian statistical inference for SBML-coded systems biology models. Bioinformatics, (11), 1453–1454.
Fanfarillo, A., Burnus, T., Cardellini, V., Filippone, S., Nagle, D., & Rouson, D. (2014). OpenCoarrays: Open-source transport layers supporting coarray Fortran compilers. In Proceedings of the 8th international conference on partitioned global address space programming models (pp. 1–11).
Haario, H., Laine, M., Mira, A., & Saksman, E. (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing, (4), 339–354.
Kumbhare, S., & Shahmoradi, A. (2020). Parallel adaptive Monte Carlo optimization, sampling, and integration in C/C++, Fortran, MATLAB, and Python. Bulletin of the American Physical Society.
Osborne, J. A., Shahmoradi, A., & Nemiroff, R. J. (2020). A multilevel empirical Bayesian approach to estimating the unknown redshifts of 1366 BATSE catalog long-duration gamma-ray bursts.
Osborne, J. A., Shahmoradi, A., & Nemiroff, R. J. (2020). A Multilevel Empirical Bayesian Approach to Estimating the Unknown Redshifts of 1366 BATSE Catalog Long-Duration Gamma-Ray Bursts. arXiv e-prints, arXiv:2006.01157.
Prudencio, E., & Schulz, K. (2012). The parallel C++ statistical library QUESO: Quantification of uncertainty for estimation, simulation and optimization. In M. Alexander, P. D'Ambra, A. Belloum, G. Bosilca, M. Cannataro, M. Danelutto, B. Martino, et al. (Eds.), Euro-Par 2011: Parallel processing workshops, Lecture notes in computer science (Vol. 7155, pp. 398–407). Springer Berlin Heidelberg. ISBN: 978-3-642-29736-6
Shahmoradi, A. (2013). A multivariate fit luminosity function and world model for long gamma-ray bursts. The Astrophysical Journal, (2), 111.
Shahmoradi, A. (2013). Gamma-Ray bursts: Energetics and Prompt Correlations. arXiv e-prints, arXiv:1308.1097.
Shahmoradi, A., & Bagheri, F. (2020). ParaDRAM: A cross-language toolbox for parallel high-performance delayed-rejection adaptive Metropolis Markov chain Monte Carlo simulations. arXiv preprint arXiv:2008.09589.
Shahmoradi, A., & Bagheri, F. (2020a). ParaDRAM: A Cross-Language Toolbox for Parallel High-Performance Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo Simulations. arXiv e-prints, arXiv:2008.09589.
Shahmoradi, A., & Bagheri, F. (2020b, August). ParaMonte: Parallel Monte Carlo library.
Shahmoradi, A., & Nemiroff, R. (2014). Classification and energetics of cosmological gamma-ray bursts. In American Astronomical Society meeting abstracts (Vol. 223).
Shahmoradi, A., & Nemiroff, R. J. (2015). Short versus long gamma-ray bursts: A comprehensive study of energetics and prompt gamma-ray correlations. Monthly Notices of the Royal Astronomical Society, (1), 126–143.
Shahmoradi, A., & Nemiroff, R. J. (2019). A Catalog of Redshift Estimates for 1366 BATSE Long-Duration Gamma-Ray Bursts: Evidence for Strong Selection Effects on the Phenomenological Prompt Gamma-Ray Correlations. arXiv e-prints, arXiv:1903.06989.