ParaMonte: A high-performance serial/parallel Monte Carlo simulation library for C, C++, Fortran

Abstract

ParaMonte (standing for Parallel Monte Carlo) is a serial and MPI/Coarray-parallelized library of Monte Carlo routines for sampling mathematical objective functions of arbitrary dimensions, in particular, the posterior distributions of Bayesian models in data science, Machine Learning, and scientific inference. The ParaMonte library has been developed with the design goal of unifying the **automation**, **accessibility**, **high-performance**, **scalability**, and **reproducibility** of Monte Carlo simulations. The current implementation of the library includes **ParaDRAM**, a **Para**llel **D**elayed-**R**ejection **A**daptive **M**etropolis Markov Chain Monte Carlo sampler, accessible from a wide range of programming languages including C, C++, Fortran, with a unified Application Programming Interface and simulation environment across all supported programming languages. The ParaMonte library is MIT-licensed and is permanently located and maintained at [https://github.com/cdslaborg/paramonte](https://github.com/cdslaborg/paramonte).

Full PDF

ParaMonte: A high-performance serial/parallel Monte Carlo simulation library for C, C++, Fortran

Amir Shahmoradi (1, 2) and Fatemeh Bagheri

1. Department of Physics, The University of Texas, Arlington, TX
2. Data Science Program, The University of Texas, Arlington, TX

Submitted: 29 September 2020

License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Summary

The current implementation of the library includes ParaDRAM, a Parallel Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo sampler, accessible from a wide range of programming languages including C, C++, Fortran, with a unified Application Programming Interface and simulation environment across all supported programming languages. The ParaMonte library is MIT-licensed and is permanently located and maintained at https://github.com/cdslaborg/paramonte.

Statement of need

Monte Carlo simulation techniques, in particular Markov Chain Monte Carlo (MCMC), are among the most popular methods of quantifying uncertainty in scientific inference problems. Extensive work has been done over the past decades to develop Monte Carlo simulation programming environments that aim to partially or fully automate the problem of uncertainty quantification via Markov Chain Monte Carlo simulations. Example open-source libraries in C/C++/Fortran include MCSim in C (Bois, 2009), the MCMCLib and QUESO (Prudencio & Schulz, 2012) libraries in C++, and mcmcf90 in Fortran (Haario, Laine, Mira, & Saksman, 2006). These packages, however, mostly serve the users of one particular programming language environment. Some are able to perform only serial simulations while others are inherently parallelized. Furthermore, the majority of the existing packages have significant dependencies on other external libraries. Such dependencies can potentially make the build process of the packages an extremely complex and arduous task due to software version incompatibilities, a phenomenon that has become known as dependency hell among software developers.

The ParaMonte library presented in this work aims to address the aforementioned problems by providing a standalone high-performance serial/parallel Monte Carlo simulation environment with the following principal design goals:

• Full automation of the library’s build process and all Monte Carlo simulations to ensure the highest level of user-friendliness of the library and minimal time investment requirements for building, running, and post-processing of the Monte Carlo and MCMC simulations.
• Interoperability of the core of the library with as many programming languages as currently possible, including C, C++, Fortran, as well as MATLAB and Python via the ParaMonte::MATLAB and ParaMonte::Python libraries.
• High-Performance, meticulously-low-level implementation of the library that guarantees the fastest-possible Monte Carlo simulations, without compromising the reproducibility of the simulation or the extensive external reporting of the simulation progress and results.
• Parallelizability of all simulations via both MPI and PGAS/Coarray communication paradigms while requiring zero-parallel-coding efforts from the user.
• Zero external-library dependencies to ensure hassle-free library builds and Monte Carlo simulation runs.
• Fully-deterministic reproducibility and automatically-enabled restart functionality for all ParaMonte simulations, up to 16 digits of decimal precision if requested by the user.
• Comprehensive-reporting and post-processing of each simulation and its results, as well as their efficient compact storage in external files to ensure the simulation results will be comprehensible and reproducible at any time in the distant future.

Amir Shahmoradi, (2020). ParaMonte: A high-performance serial/parallel Monte Carlo simulation library for C, C++, Fortran. Journal of Open Source Software, 0(0), 0000. https://doi.org/

The Build process

The ParaMonte library is permanently located on GitHub and is available to view at: https://github.com/cdslaborg/paramonte. The build process of the library is fully automated. Extensive detailed instructions are also available on the documentation website of the library.

For the convenience of the users, each release of the ParaMonte library’s source code also includes prebuilt, ready-to-use copies of the library for x64 architecture on Windows, Linux, macOS, in all supported programming languages, including C, C++, Fortran. These prebuilt libraries automatically ship with language-specific example codes and build scripts that fully automate the process of building and running the examples.

Where the prebuilt libraries cannot be used, the users can simply call the Bash and Batch build-scripts that are provided in the source code of the library to fully automate the build process of the library. The ParaMonte build scripts are capable of automatically installing any missing components that may be required for the library’s successful build, including the GNU C/C++/Fortran compilers and the cmake build software, as well as the MPI/Coarray parallelism libraries. All of these tasks are performed with the explicit permission granted by the user. The ParaMonte build scripts are heavily inspired by the impressive OpenCoarrays software (Fanfarillo et al., 2014) developed and maintained by the Sourcery Institute.

The ParaDRAM sampler

The current implementation of the ParaMonte library includes the Parallel Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo (ParaDRAM) sampler (Shahmoradi & Bagheri, 2020), (Shahmoradi & Bagheri, 2020a), (Shahmoradi & Bagheri, 2020b), (Kumbhare & Shahmoradi, 2020), and several other samplers whose development is in progress as of writing this manuscript. The ParaDRAM algorithm is a variant of the DRAM algorithm of (Haario et al., 2006) and can be used in serial or parallel mode.

In brief, the ParaDRAM sampler continuously adapts the shape and scale of the proposal distribution throughout the simulation to increase the efficiency of the sampler. This is in contrast to the traditional MCMC samplers where the proposal distribution remains fixed [...]

[...] *_report.txt files of every simulation performed by the ParaMonte samplers.

Monitoring Convergence

Although the continuous adaptation of the proposal distribution increases the sampling efficiency of the ParaDRAM sampler, it breaks the ergodicity and reversibility conditions of the Markov Chain Monte Carlo methods. Nevertheless, the convergence of the resulting pseudo-Markov chain to the target density function is guaranteed as long as the amount of adaptation of the proposal distribution decreases monotonically throughout the simulation (Shahmoradi & Bagheri, 2020).

Ideally, the diminishing adaptation criterion of the adaptive MCMC methods can be monitored by measuring the total variation distance (TVD) between subsequent adaptively-updated proposal distributions. Except for trivial cases, however, the analytical or numerical computation of TVD is almost always intractable. To circumvent this problem, we have introduced a novel technique in the ParaDRAM algorithm to continuously measure the amount of adaptation of the proposal distribution throughout adaptive MCMC simulations. This is done by computing an upper bound on the value of TVD instead of a direct computation of the TVD.

The mathematics of computing this AdaptationMeasure upper bound is extensively detailed in (Shahmoradi & Bagheri, 2020), (Shahmoradi & Bagheri, 2020a). The computed upper bound is always a real number between 0 and 1, with 0 indicating the identity of two proposal distributions and 1 indicating completely different proposal distributions. The AdaptationMeasure is automatically computed with every proposal adaptation and is reported to the output chain files for all ParaDRAM simulations. It can be subsequently visualized to verify that the diminishing adaptation criterion of the ParaDRAM sampler is satisfied.
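To make the idea of bounding the TVD concrete, the following is a minimal sketch, not the ParaDRAM implementation. It assumes that two successive proposals are multivariate normal and uses a standard Hellinger-type inequality, TVD ≤ sqrt(1 − BC²), where BC is the Bhattacharyya coefficient between the two densities. The function names are hypothetical, and the exact AdaptationMeasure formula used by ParaDRAM is the one given in the cited references, which may differ from this bound.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_det(low):
    """Log-determinant of A from its Cholesky factor."""
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))

def mahalanobis_sq(low, d):
    """d^T A^{-1} d via forward substitution with the Cholesky factor of A."""
    y = []
    for i in range(len(d)):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return sum(v * v for v in y)

def tvd_upper_bound(mean1, cov1, mean2, cov2):
    """Upper bound sqrt(1 - BC^2) on the TVD between two Gaussian proposals."""
    n = len(mean1)
    avg = [[0.5 * (cov1[i][j] + cov2[i][j]) for j in range(n)] for i in range(n)]
    diff = [mean1[i] - mean2[i] for i in range(n)]
    l_avg, l1, l2 = cholesky(avg), cholesky(cov1), cholesky(cov2)
    # Bhattacharyya distance between N(mean1, cov1) and N(mean2, cov2).
    d_b = 0.125 * mahalanobis_sq(l_avg, diff) + 0.5 * (
        log_det(l_avg) - 0.5 * (log_det(l1) + log_det(l2))
    )
    bc = math.exp(-d_b)  # Bhattacharyya coefficient, in (0, 1]
    return math.sqrt(max(0.0, 1.0 - bc * bc))

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    slightly_wider = [[1.1, 0.0], [0.0, 1.1]]
    print(tvd_upper_bound([0.0, 0.0], identity, [0.0, 0.0], slightly_wider))
```

Like the measure described above, this quantity lies between 0 and 1 and equals 0 when the two proposals are identical, so a sequence of such values that shrinks over successive adaptations is consistent with diminishing adaptation.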

Figure 1: An illustration of the diminishing adaptation of the proposal distribution of the ParaDRAM sampler for an example problem of sampling a 4-dimensional Multivariate Normal distribution. The monotonically-decreasing adaptivity evidenced in this plot guarantees the Markovian property and the asymptotic ergodicity of the resulting Markov chain from the ParaDRAM sampler.

Figure 1 depicts the evolution of the adaptation measure for an example problem of sampling a 4-dimensional MultiVariate Normal Distribution. A non-diminishing evolution of the AdaptationMeasure can also be a strong indicator of the lack of convergence of the Markov chain to the target density. The evolution of the covariance matrices of the proposal distribution for the same sampling problem is shown in Figure 2.

Figure 2: A 3-dimensional illustration of the dynamic adaptation of the covariance matrix of the 4-dimensional MultiVariate Normal (MVN) proposal distribution of the ParaDRAM sampler for an example problem of sampling a 4-dimensional MVN distribution.

Parallelism

Two modes of parallelism are currently implemented for all ParaMonte samplers:

• The Perfect Parallelism (multi-Chain): In this mode, independent instances of the adaptive MCMC sampler run concurrently. Once all simulations are complete, the ParaDRAM sampler compares the output samples from all processors with each other to ensure no evidence for a lack of convergence to the target density exists in any of the output chains.
• The Fork-Join Parallelism (single-Chain): In this mode, a single processor is responsible for collecting and dispatching information, generated by all processors, to create a single Markov Chain of all visited states with the help of all processors.

For each parallel simulation in the Fork-Join mode, the ParaMonte samplers automatically compute the speedup gained compared to the serial mode. In addition, the speedup for a wide range of the number of processors is also automatically computed and reported in the output *_report.txt files that are automatically generated for all simulations. The processor contributions to the construction of each chain are also reported along with output visited states in the output *_chain.* files. These reports are particularly useful for finding the optimal number of processors for a given problem at hand, by first running a short simulation to predict the optimal number of processors from the sampler’s output information, followed by the production run using the optimal number of processors. For a comprehensive description and algorithmic details see (Shahmoradi & Bagheri, 2020), (Shahmoradi & Bagheri, 2020a).

Figure 3: An illustration of the contributions of 512 Intel Xeon Phi 7250 processors to a ParaMonte-ParaDRAM simulation parallelized via the Fork-Join paradigm. The predicted best-fit Geometric distribution from the post-processing phase of the ParaDRAM simulation is shown by the black line. The data used in this figure is automatically generated for each parallel simulation performed via any of the ParaMonte samplers.

As we argue in (Shahmoradi & Bagheri, 2020), (Shahmoradi & Bagheri, 2020a), the contribution of the processors to the construction of a Markov Chain in the Fork-Join parallelism paradigm follows a Geometric distribution. Figure 3 depicts the processor contributions to an example ParaDRAM simulation of a variant of Himmelblau’s function and the Geometric fit to the distribution of processor contributions. Figure 4 illustrates an example predicted strong-scaling behavior of the sampler and the predicted speedup by the sampler for a range of processor counts.
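The Geometric shape of the processor contributions can be illustrated with a small idealized model. The sketch below assumes that, in each Fork-Join round, every processor evaluates one proposal from the same current state, each proposal is accepted independently with the same probability, and the first accepted proposal in rank order is the one used. It also ignores all communication overhead. These assumptions and the function names are illustrative only; the speedup model actually reported in the *_report.txt files is described in the cited references.

```python
def processor_contributions(accept_prob, n_proc):
    """Expected fraction of accepted states contributed by each processor rank.

    Rank i contributes only if ranks 0..i-1 all rejected, which gives a
    Geometric distribution truncated at n_proc processors.
    """
    q = 1.0 - accept_prob
    norm = 1.0 - q ** n_proc  # probability that at least one proposal is accepted
    return [accept_prob * q ** rank / norm for rank in range(n_proc)]

def idealized_speedup(accept_prob, n_proc):
    """Expected number of chain states gained per parallel round.

    A serial sampler gains exactly one state per function evaluation, so this
    ratio is the speedup in the absence of any communication cost.
    """
    q = 1.0 - accept_prob
    return (1.0 - q ** n_proc) / accept_prob

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(idealized_speedup(0.25, n), 3))
```

Under these assumptions the speedup saturates at 1/accept_prob as the number of processors grows, which is one way to see why a short trial run can be used to pick a sensible processor count before the production run.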

Figure 4: A comparison of the actual strong scaling behavior of an example ParaMonte-ParaDRAM simulation from 1 to 1088 processors with the strong-scaling behavior predicted during the post-processing phases of ParaDRAM simulations. The data used in this figure is automatically generated for each parallel simulation performed via any of the ParaMonte samplers.

Efficient compact storage of the output chain

Efficient continuous external storage of the output of ParaDRAM simulations is essential for both the post-processing of the results and the restart functionality of the simulations, should any interruptions happen at runtime. However, as the number of dimensions or the complexity of the target density increases, such external storage of the output can easily become a challenge and a bottleneck in the speed of an otherwise high-performance ParaDRAM sampler. Given the currently-available computational technologies, input/output (IO) to external hard-drives can be 2-3 orders of magnitude slower than the Random Access Memory (RAM) storage.

To alleviate the effects of such external-IO speed bottlenecks, the ParaDRAM sampler implements a novel method of carefully storing the resulting MCMC chains in a small compact, yet ASCII human-readable, format in external output files. This compact-chain (as opposed to the verbose (Markov)-chain) format leads to significant speedup of the simulation while requiring 4-100 times less external memory to store the chains in the external output files. The exact amount of reduction in the external memory usage depends on the efficiency of the sampler. Additionally, the format of the output file can be set by the user to binary, further reducing the memory foot-print of the simulation while increasing the simulation speed. The implementation details of this compact-chain format are extensively discussed in (Shahmoradi & Bagheri, 2020), (Shahmoradi & Bagheri, 2020a).
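The relationship between the verbose and compact formats can be sketched as a run-length encoding. The snippet below assumes that the compact chain keeps each uniquely-visited state once together with a weight counting how many consecutive steps the sampler stayed there; the (state, weight) layout and the function names are assumptions for illustration and do not reproduce the column layout of the actual *_chain.* files.

```python
from itertools import groupby

def compact_chain(verbose_chain):
    """Collapse runs of consecutive identical states into (state, weight) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(verbose_chain)]

def expand_chain(compact):
    """Rebuild the verbose (Markov) chain by repeating each state `weight` times."""
    return [state for state, weight in compact for _ in range(weight)]

if __name__ == "__main__":
    # A verbose chain in which rejected proposals repeat the current state.
    verbose = [(0.0, 1.0)] * 3 + [(0.4, 0.9)] * 2 + [(0.1, 1.2)]
    compact = compact_chain(verbose)
    print(compact)
    print(expand_chain(compact) == verbose)
```

In this picture the storage saving is roughly the inverse of the acceptance rate, which is consistent with the statement that the reduction depends on the efficiency of the sampler.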

Sample refinement

In addition to the output *_progress.txt, *_report.txt, *_chain.txt files, each ParaDRAM sampling generates a *_sample.txt file containing the final refined decorrelated sample from the objective function. To do so, the sampler computes the Integrated AutoCorrelation (IAC) of the chain along each dimension of the domain of the objective function. However, the majority of existing methods for the calculation of IAC tend to underestimate this quantity.

Therefore, to ensure the final sample resulting from a ParaDRAM simulation is fully decorrelated, we have implemented a novel approach that aggressively and recursively refines the resulting Markov chain from a ParaDRAM simulation until no trace of autocorrelation is left in the final refined sample. This approach optionally involves two separate phases of sample refinement:

1. At the first stage, the Markov chain is decorrelated recursively, for as long as needed, based on the IAC of its compact format, where only the uniquely-visited states are kept in the (compact) chain.
2. Once the Markov chain is refined such that its compact format is fully decorrelated, the second phase of decorrelation begins, during which the Markov chain is decorrelated based on the IAC of the chain in its verbose (Markov) format. This process is repeated recursively for as long as there is any residual autocorrelation in the refined sample.

We have empirically noticed, via numerous experimentations, that this recursive aggressive approach is superior to other existing methods of sample refinement in generating final refined samples that are neither too small in size nor autocorrelated.

The restart functionality

Each ParaMonte sampler is automatically capable of restarting an existing interrupted simulation, whether in serial or parallel. All that is required is to rerun the interrupted simulation with the same output file names. The ParaMonte samplers automatically detect the presence of an incomplete simulation in the output files and restart the simulation from where it was left off. Furthermore, if the user sets the seed of the random number generator of the sampler prior to running the simulation, the ParaMonte samplers are capable of regenerating the same chain that would have been produced if the simulation had not been interrupted in the first place. Such fully-deterministic reproducibility into-the-future is guaranteed with 16 digits of decimal precision for the results of any ParaMonte simulation. To our knowledge, this is a unique feature of the ParaMonte library that does not appear to exist in any of the contemporary libraries for Markov Chain Monte Carlo simulations.
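The idea of a seeded, file-based restart can be illustrated with a toy random-walk Metropolis sampler. This is a sketch under stated assumptions and not the ParaMonte restart mechanism: the one-state-per-line file layout, the replay of random draws, and the names `run_sampler` and `log_target` are all hypothetical, and a half-written last line after a real crash is not handled.

```python
import math
import os
import random
import tempfile

def log_target(x):
    """Log-density of a standard normal target, up to an additive constant."""
    return -0.5 * x * x

def run_sampler(path, n_total, seed):
    """Random-walk Metropolis that appends one state per line to `path`.

    If `path` already holds a partial chain, the run resumes after its last state.
    """
    chain = []
    if os.path.exists(path):
        with open(path) as handle:
            chain = [float(line) for line in handle if line.strip()]
    rng = random.Random(seed)
    # Replay the draws consumed by the steps already on disk, so the generator
    # is in the same state it would have in an uninterrupted run.
    for _ in chain:
        rng.gauss(0.0, 1.0)
        rng.random()
    x = chain[-1] if chain else 0.0
    with open(path, "a") as handle:
        for _ in range(len(chain), n_total):
            proposal = x + rng.gauss(0.0, 1.0)
            log_ratio = log_target(proposal) - log_target(x)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x = proposal
            handle.write(repr(x) + "\n")  # repr() round-trips floats exactly
            chain.append(x)
    return chain

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_chain.txt")
        run_sampler(path, 50, seed=2020)           # stands in for an interrupted run
        resumed = run_sampler(path, 100, seed=2020)  # rerun with the same file name
        print(len(resumed))
```

Rerunning with the same output file name and the same seed continues the chain exactly where the earlier run stopped, which mirrors the usage pattern described above.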

Documentation and Repository

Acknowledgements

We thank the Texas Advanced Computing Center for providing the supercomputer time for testing and development of this library.

References

Bois, F. Y. (2009). GNU MCSim: Bayesian statistical inference for SBML-coded systems biology models. Bioinformatics, (11), 1453–1454.

Fanfarillo, A., Burnus, T., Cardellini, V., Filippone, S., Nagle, D., & Rouson, D. (2014). OpenCoarrays: Open-source transport layers supporting Coarray Fortran compilers. In Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models (pp. 1–11).

Haario, H., Laine, M., Mira, A., & Saksman, E. (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing, (4), 339–354.

Kumbhare, S., & Shahmoradi, A. (2020). Parallel adaptive Monte Carlo optimization, sampling, and integration in C/C++, Fortran, MATLAB, and Python. Bulletin of the American Physical Society.

Osborne, J. A., Shahmoradi, A., & Nemiroff, R. J. (2020). A multilevel empirical Bayesian approach to estimating the unknown redshifts of 1366 BATSE catalog long-duration gamma-ray bursts.

Osborne, J. A., Shahmoradi, A., & Nemiroff, R. J. (2020). A Multilevel Empirical Bayesian Approach to Estimating the Unknown Redshifts of 1366 BATSE Catalog Long-Duration Gamma-Ray Bursts. arXiv e-prints, arXiv:2006.01157.

Prudencio, E., & Schulz, K. (2012). The parallel C++ statistical library QUESO: Quantification of uncertainty for estimation, simulation and optimization. In M. Alexander, P. D’Ambra, A. Belloum, G. Bosilca, M. Cannataro, M. Danelutto, B. Martino, et al. (Eds.), Euro-Par 2011: Parallel processing workshops, Lecture notes in computer science (Vol. 7155, pp. 398–407). Springer Berlin Heidelberg. ISBN: 978-3-642-29736-6

Shahmoradi, A. (2013). A multivariate fit luminosity function and world model for long gamma-ray bursts. The Astrophysical Journal, (2), 111.

Shahmoradi, A. (2013). Gamma-Ray bursts: Energetics and Prompt Correlations. arXiv e-prints, arXiv:1308.1097.

Shahmoradi, A., & Bagheri, F. (2020). ParaDRAM: A cross-language toolbox for parallel high-performance delayed-rejection adaptive Metropolis Markov chain Monte Carlo simulations. arXiv preprint arXiv:2008.09589.

Shahmoradi, A., & Bagheri, F. (2020a). ParaDRAM: A Cross-Language Toolbox for Parallel High-Performance Delayed-Rejection Adaptive Metropolis Markov Chain Monte Carlo Simulations. arXiv e-prints, arXiv:2008.09589.

Shahmoradi, A., & Bagheri, F. (2020b, August). ParaMonte: Parallel Monte Carlo library.

Shahmoradi, A., & Nemiroff, R. (2014). Classification and energetics of cosmological gamma-ray bursts. In American Astronomical Society meeting abstracts (Vol. 223).

Shahmoradi, A., & Nemiroff, R. J. (2015). Short versus long gamma-ray bursts: A comprehensive study of energetics and prompt gamma-ray correlations. Monthly Notices of the Royal Astronomical Society, (1), 126–143.

Shahmoradi, A., & Nemiroff, R. J. (2019). A Catalog of Redshift Estimates for 1366 BATSE Long-Duration Gamma-Ray Bursts: Evidence for Strong Selection Effects on the Phenomenological Prompt Gamma-Ray Correlations. arXiv e-prints, arXiv:1903.06989.