HMCF - Hamiltonian Monte Carlo Sampling for Fields - A Python framework for HMC sampling with NIFTy
Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00
Hamiltonian Monte Carlo Sampling for Fields
Christoph Lienhard, Torsten A. Enßlin
Abstract
HMCF ("Hamiltonian Monte Carlo for Fields") is a software add-on for the NIFTy ("Numerical Information Field Theory") framework implementing Hamiltonian Monte Carlo (HMC) sampling in Python. HMCF as well as NIFTy are designed to address inference problems in high-dimensional, spatially correlated setups such as image reconstruction. They are optimized to deal with the typically high number of degrees of freedom.
HMCF adds an HMC sampler to NIFTy that automatically adjusts the many free parameters steering the HMC sampling machinery. A wide variety of features ensure efficient full-posterior sampling for high-dimensional inference problems. These features include integration step size adjustment, evaluation of the mass matrix, convergence diagnostics, higher-order symplectic integration, and simultaneous sampling of parameters and hyperparameters in Bayesian hierarchical models.
Keywords: Python, Bayesian statistics, field inference, Hamiltonian sampling.
1. Introduction
HMCF implements a Hamiltonian Monte Carlo (HMC) sampler (Duane, Kennedy, Pendleton, and Roweth 1987) for the NIFTy ("Numerical Information Field Theory") framework (Selig, Bell, Junklewitz, Oppermann, Reinecke, Greiner, Pachajoa, and Enßlin 2013; Steininger, Dixit, Frank, Greiner, Hutschenreuter, Knollmüller, Leike, Porqueres, Pumpe, Reinecke et al. 2017) and runs with Python 3 on Unix-like systems. The main purpose of HMCF is to create samples which are distributed according to arbitrary, once-differentiable probability distributions. These samples can, for example, be used to approximate expectation values in cases where brute-force integration is infeasible.
NIFTy is a Python library developed for computational work with information field theory (IFT, Enßlin, Frommert, and Kitaura (2009); Enßlin and Weig (2010)). IFT extends classical probability theory onto functional spaces.
NIFTy is interesting for spatially correlated inference problems such as image reconstruction (Selig, Vacca, Oppermann, and Enßlin 2015; Junklewitz, Bell, Selig, and Enßlin 2016; Pumpe, Reinecke, and Enßlin 2018) or work on geospatial datasets (Buscombe 2016). A main advantage is the resolution-independent calculation of statistical estimates. With HMCF, Bayesian models already implemented in NIFTy can easily be reused for an HMC sampling approach. This can help estimating the impact of approximations present in other approaches, or enable tackling entirely new problems.
At the heart of HMCF lies the HMCSampler class, which constructs an HMC sampler based on a predefined NIFTy Energy class describing a probability distribution P(x) as an energy Ψ(x) = −log(P(x)). Samples drawn from this distribution are saved to disk as HDF5 archives (The HDF Group 2018).
To ensure a successful sampling process, HMCF implements a variety of additional features. The sampler calculates a convergence measure to determine when the burn-in phase has finished; several predefined strategies for how exactly to calculate this measure are available. It is critical for an HMC sampler to use a proper mass matrix, which is why HMCF can recalculate it several times during burn-in, achieving better performance by orders of magnitude in comparison to using an identity as mass matrix. Another important sampling parameter, the integration step size of the symplectic integrator, is also adjusted such that a predefined acceptance rate is matched; again, the exact adjustment strategy can be chosen from a predefined set of options. Although for most purposes a second-order leapfrog integrator is sufficient, higher-order integrators are available as well. Furthermore, HMCF uses multi-processing in that individual Markov chains run on separate cores.
HMCF is optimized to ease the work with HMC sampling. All of the above can be done in only a few lines of code if a well-defined NIFTy Energy class is available.
There are many software packages for HMC sampling available in many different languages. But unlike HMCF, most packages are static in that they use the identity as the mass matrix in general or need a mass matrix specified in the beginning. Since, especially in high-dimensional cases, a good mass matrix estimate is crucial for a successful sampling process, we will concentrate on packages which estimate the mass matrix.
A very popular cross-platform package for HMC is Stan (Stan Development Team 2017). Stan provides interfaces for R, Python, shell, MATLAB, Julia, Stata, Mathematica and Scala. Its biggest advantage is the C++ back-end, which makes it by far the fastest sampler if the same parameters such as integration step size and mass matrix are chosen. Another notable advantage over HMCF is an implementation of the No-U-Turn Sampler (NUTS, Hoffman and Gelman (2014)), which can be seen as an extension of the standard HMC approach.
In Stan the mass matrix is set to the identity initially, but is recalculated from the generated samples during the burn-in phase. The mass matrix can, but does not have to, be restricted to a diagonal matrix.
HMCF differs in that the user is able to define their own initial mass matrix; the Stan developers announced such a feature for future versions, though. Using the samples generated by the initial chain itself involves the risk of having highly correlated samples in case the sampler was malfunctioning due to the usage of the initial mass matrix. To avoid this, HMCF uses samples drawn from the curvature of the full posterior at the current position to reevaluate the mass matrix. We found this approach to be much more efficient. Reevaluated mass matrices are always diagonal in HMCF, but since it is targeted at high-dimensional problems where non-sparse matrices cannot be represented explicitly, this is not really a disadvantage. Furthermore, more recent NIFTy-based algorithms use harmonic-space degrees of freedom as field variables (Knollmüller, Steininger, and Enßlin 2017), which fits better to a mass matrix being diagonal in these field parameters.
Another important package for HMC sampling in Python is pyMC3 (Salvatier, Wiecki, and Fonnesbeck 2018). pyMC3 provides a huge variety of different samplers among other functions for statistical applications. When it comes to the HMC sampler in pyMC3, the main difference to HMCF is that the mass matrix is again evaluated based on the samples of the Markov chain itself, which might be problematic as described in the paragraph above. Again, pyMC3 has a NUTS feature.
Apart from that, the main advantages of HMCF are that it is easy to use for algorithms already written in NIFTy, and its optimization for high-dimensional statistical problems.
This introduction is followed by a short installation guide. In section 3 we give an introduction to HMC sampling on a theoretical/mathematical level. Afterwards, we illustrate the work with NIFTy and HMCF using a simple Bayesian hierarchical model as an example in section 4. This document ends with a short summary in section 5 on why there is a need for a distinct HMCF package.
2. Installation
HMCF relies on the following other Python packages:

NumPy: The very basic Python package for numerical analysis on multidimensional arrays.

SciPy (Oliphant 2007): Library implementing advanced algorithms and functions in Python.

h5py: A Python wrapper for the HDF5 file format.

NIFTy (4.1.0 or newer, Steininger et al. (2017); Martin Reinecke (2018)): A package for statistical field inference problems.

matplotlib (Hunter 2007), optional: A package for producing figures. Necessary for the HMCF tools sub-package.

PyQt5, optional: Necessary for the tools sub-package.
NIFTy supports multi-processing in many calculations via mpi4py (Dalcín, Paz, and Storti 2005), but HMCF needs to restrict each individual Markov chain to one core. If mpi4py is already installed, the user should switch multi-processing off by setting the environment variables MKL_NUM_THREADS and OMP_NUM_THREADS to 1 in a terminal:

export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
Installing HMCF along with all required packages is possible via

pip install git+https://gitlab.mpcdf.mpg.de/ift/HMCF
3. Hamiltonian Monte Carlo Sampling
The goal of most statistical problems is to find expectation values ⟨f(x)⟩_P(x) of a function f(x) given a distribution P(x), with

  ⟨f(x)⟩_P(x) = ∫_X f(x) P(x) dx,

where X is the space of all possible values for x. However, especially in high-dimensional cases the integral may become intractable. One approach to circumvent this problem is to use a form of Monte Carlo integration. Samples (x_i) which are distributed like P(x) are used to approximate the expectation value:

  ⟨f(x)⟩_P(x) ≈ (1/N) Σ_{i=1}^N f(x_i)    (1)

The law of large numbers ensures that this approximation converges to the true expectation value in non-pathological situations. Then, the problem is reduced to finding a strategy to generate the samples (x_i). While there are straightforward solutions in some cases such as normal distributions, often more advanced algorithms are needed. A large subset of such algorithms are Markov chain Monte Carlo (MCMC) methods, which have in common that a Markov process is constructed which ultimately converges to the wanted target distribution P(x). HMC sampling is an MCMC method especially applicable to once-differentiable probability distributions on high-dimensional spaces.

The Hamiltonian Monte Carlo (HMC) approach (first introduced by Duane et al. (1987); good summaries: Neal et al. (2011); Betancourt (2017)) uses a variation of the Metropolis-Hastings algorithm (Hastings 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller 1953) with less random-walk behavior. The idea is to describe the Markov process as a physical Hamiltonian time evolution, exploiting properties of these dynamics which are advantageous for MCMC methods. The samples (x_i) can then be thought of as snapshots of the trajectory of a particle moving through an energy landscape Ψ(x). This energy is defined by the target distribution P(x) via

  Ψ(x) := −log(P(x))

This convention originates from statistical physics, where the probability of a system being in state i with energy E(i) is

  P(i) ∝ exp(−βE(i))

where β is a temperature-dependent scaling parameter.

Additionally, a new normally distributed random variable p ∈ X with covariance M, called momentum, is introduced. The negative logarithm of the joint distribution P(x, p) then looks like a typical physical Hamiltonian:

  H(x, p) = ½ pᵀ M⁻¹ p + Ψ(x) + const

The central idea of HMC sampling is to evolve this system in time according to Hamilton's equations of motion

  ẋ_k = ∂H/∂p_k = [M⁻¹p]_k
  ṗ_k = −∂H/∂x_k = −∂Ψ/∂x_k    (2)

for k = 1, …, dim(X). After some time T the position we arrived at is considered as a new Metropolis-Hastings proposal (x(T), p(T)) =: (x*, p*). This approach is possible since Hamiltonian dynamics have some convenient properties such as volume preservation. Also, in theory, the new sample is exactly as probable as the starting point since the process is energy conserving. In practice, however, the discretization of the integration in time leads to errors in the energy conservation, which is why an accept-reject step is necessary, where the proposal is accepted with probability

  ρ_A = min(1, exp(−ΔE)),    (3)

where ΔE = H(x*, p*) − H(x, p).

The whole algorithm then looks like this:

  Set initial x_0
  for i = 1 to N do
      Generate momentum sample p ∼ N(0, M)
      Evolve system for time T
      New position: (x*, p*) = (x(T), p(T))  (MH proposal)
      Generate sample r ∼ U([0, 1])
      if r ≤ ρ_A then
          Set x_i = x*
      else
          Set x_i = x_{i−1}
      end
  end

The resampling of p in each iteration ensures that the whole parameter space X can be reached. At this point we omit the full proof that the samples (x_i) are then distributed like P(x); see Neal et al. (2011) for details.
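To make equation (1) concrete, here is a small illustrative NumPy sketch (not part of HMCF) estimating ⟨x²⟩ for a standard normal distribution, a case where direct sampling is possible; HMC is needed precisely when it is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples x_i from P(x) = N(0, 1) and approximate the
# expectation value <f(x)>_P via equation (1), with f(x) = x^2.
N = 100_000
x = rng.standard_normal(N)
estimate = (x**2).mean()   # should be close to the true value <x^2> = 1
```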
For HMC to work, the integrator for the time evolution of the system needs to be symplectic. Most of the time the leapfrog integrator is used, which is a second-order symplectic integrator. Symplectic integrators of higher orders, based on work presented in Yoshida (1990), are possible as well in HMCF. They have proven to be advantageous for HMC sampling in high-dimensional non-linear cases (Blanes, Casas, and Sanz-Serna 2014).

One step in time of length ε with the leapfrog integrator is calculated via

  p(t + ε/2) = p(t) − (ε/2) ∂Ψ/∂x |_{x(t)}
  x(t + ε) = x(t) + ε M⁻¹ p(t + ε/2)
  p(t + ε) = p(t + ε/2) − (ε/2) ∂Ψ/∂x |_{x(t+ε)}    (4)

This single leapfrog step is applied L times to generate a new sample, such that T = εL. The integration step size ε determines the overall acceptance rate of the sampling process. The advantages of HMC are only present if the acceptance rate is in the range of 0.6 to 0.9 (Beskos, Pillai, Roberts, Sanz-Serna, and Stuart 2013). To relate ε to the acceptance rate, we developed an approximation further discussed in appendix A.

Finally, for an HMC sampling process to work properly it is crucial to find a good mass matrix M, which serves as covariance of the momentum p. There are several approaches, but one very popular strategy is to use samples from the chain itself and base the mass matrix on the variance of these samples:

  M⁻¹ = (1/N) Σ_{i=1}^N (x_i − μ)(x_i − μ)ᵀ    (5)

with μ being the mean value. However, in specific cases other approaches might be better, for example as documented in Taylor et al. (2008). We found that using samples from the chain itself is unfeasible in high-dimensional cases. In HMCF the samples are generated by drawing samples from the curvature of Ψ(x) at the current position of the chain. In other words: for the purpose of estimating the mass matrix, we approximate the target distribution P(x) with a normal distribution and then draw samples from this distribution.
4. Using HMCF
In this section we try to provide some intuition for working with HMCF. It is assumed that the reader is familiar with Python.

We will use a simple Bayesian hierarchical model as an example to introduce the most important functionalities of HMCF. For a comprehensive documentation of all features see appendix B. An implementation of this example is part of the HMCF repository and can be found in demos/multi_field_demo.py. We first introduce the model on a theoretical level, then give a short overview on how the model is implemented within NIFTy in section 4.2. In section 4.3 we show how sampling with HMCF works, and finally introduce features for retrieving and displaying the results in section 4.4.
Consider a telescope observing the large-scale structure of the universe: billions of galaxies producing photons that eventually reach the earth. This photon flux reaching the sensors of the telescope is denoted x. The spatial properties of x can be described as a mathematical field living on a continuous space.

Our telescope measures this photon flux x, which means that it converts it into an electronic signal based on how many photons reach each of its CCD sensors. Errors in lenses and small differences between individual sensors alter the true nature of the original x field, but can be coped with by calibration. In our model, this transformation is represented by a linear response operator R acting on x. Note that the domain and the target domain of R are different: whereas, in theory, the domain of x is continuous, the output of the telescope is definitely discretized. In addition to the response of the instrument, we assume a certain Gaussian noise n due to imperfect electronics, with known covariance N. The measured data d produced by the telescope is then described by the measurement equation

  d = R(x) + n.    (6)

What we are interested in is the "true" signal, based on the data d we got from the telescope. This information is reflected in the conditional probability P(x | d). In terms of Bayesian statistics this is often referred to as the posterior, and Bayes' formula relates the posterior to our assumed model and prior knowledge we may have on our signal x:

  P(x | d) ∝ P(d | x) P(x)

While the likelihood P(d | x) is defined by our model in equation (6) and the noise statistics, the prior P(x) needs more consideration. Observations dating back to Hubble (1934) suggest that, at least to some extent, a log-normal prior is sufficient for the large-scale structure. We therefore introduce another field s = log(x) with a multivariate Gaussian distribution and covariance S, such that x is log-normally distributed.
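The measurement equation (6) is easy to mimic with plain NumPy; the following sketch is purely illustrative (the masked square stands in for a broken sensor region, and none of these names come from the package):

```python
import numpy as np

rng = np.random.default_rng(41)

n_pix = 100
# Stand-in for the photon flux x: a log-normal field (here without the
# proper spatial correlation structure, just for illustration).
x = np.exp(0.5 * rng.standard_normal((n_pix, n_pix)))

def response(x):
    # A linear response R: identity, except for a "broken" square of
    # sensors that returns zeros.
    out = x.copy()
    out[70:90, 70:90] = 0.0
    return out

noise_std = 0.1                       # sqrt of the (diagonal) covariance N
n = noise_std * rng.standard_normal(x.shape)
d = response(x) + n                   # measurement equation (6)
```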
We further want to ensure that our field s is somewhat smooth. This can be enforced by imposing a power law P(k) on the covariance S in its harmonic representation,

  S_{kk'} = δ_{kk'} P(k).

This power law can be chosen such that high curvatures in position space get punished and are therefore improbable. For illustration, we assume a power law

  P(k) = ( l_c / (1 + l_c k) )⁴    (7)

with the correlation length l_c essentially defining how strong curvature is allowed to be.

If we do not know the correlation length, we can treat it as a free hyperparameter, making the problem a full Bayesian hierarchical model. Since l_c is strictly positive, we assume another log-normal prior

  P(l_c) ∝ (1/l_c) exp( −(log l_c − μ_c)² / (2σ_c²) )

where μ_c and σ_c are appropriate parameters for this hyperprior. Our full posterior then turns into

  P(s, l_c | d) ∝ P(d | s) P(s | l_c) P(l_c)    (8)

with the likelihood

  P(d | s) = ∫ δ(d − Reˢ − n) P(n) dn ∝ exp( −½ (d − Reˢ)ᵀ N⁻¹ (d − Reˢ) )

and the prior for s

  P(s | l_c) = |2πS|^(−1/2) exp( −½ sᵀ S⁻¹ s )

The dependence on l_c is encoded in the covariance S. The determinant of S is

  |S| = det S = Π_k P(k)

However, for HMC sampling we need a potential Ψ(s, l_c), i.e., the negative logarithm of the posterior in (8). For better readability, we divide the potential into a prior and a likelihood part as well, such that

  Ψ(s, l_c) = Ψ_l(s, l_c) + Ψ_p(l_c)    (9)

with

  Ψ_l(s, l_c) = −log P(d | s) − log P(s | l_c)
  Ψ_p(l_c) = −log P(l_c)

We omit terms constant in l_c and s, since they are not important for HMC sampling, and arrive at

  Ψ_l(s, l_c) = ½ (d − Reˢ)ᵀ N⁻¹ (d − Reˢ) + ½ ( sᵀ S⁻¹ s + Σ_k log(P(k)) )
  Ψ_p(l_c) = (log l_c − μ_c)² / (2σ_c²) + log(l_c)    (10)

Additionally, the gradient of the potential Ψ(s, l_c) is needed for the time evolution part during HMC sampling. For the likelihood part, the gradient boils down to the following expressions:

  ∂_s Ψ_l = S⁻¹ s − (Reˢ) ⊙ N⁻¹ (d − Reˢ)
  ∂_{l_c} Ψ_l = ½ sᵀ (∂_{l_c} S⁻¹) s + 2 Σ_k ( 1/l_c − k/(1 + l_c k) )    (11)

where ⊙ denotes point-wise multiplication of vectors. For deriving the exact expression for ∂_{l_c} S⁻¹, observe that [S⁻¹]_{kk'} = δ_{kk'} (P(k))⁻¹ and therefore

  [∂_{l_c} S⁻¹]_{kk'} = −4 (1 + l_c k)³ / l_c⁵ δ_{kk'}

For the prior part we get

  ∂_s Ψ_p = 0
  ∂_{l_c} Ψ_p = (1/l_c) ( (log l_c − μ_c)/σ_c² + 1 )    (12)

NIFTy
In this section we will take a look at how such a model can be implemented in NIFTy in general. For more details on NIFTy see Martin Reinecke (2018).

In NIFTy there are a variety of classes representing certain aspects of a typical inference problem, among which the most important ones are:

Domain: A base class representing the underlying space. For example, a regular grid can be defined as an RGSpace, which inherits from Domain.

Field / MultiField: A class representing fields such as x, n or d. It carries information about the underlying Domain as well as the field's value on this domain. The Field supports every arithmetic operation. Other operations such as trigonometric or exponential functions can be applied point-wise using e.g., ift.exp(x). The notation for these functions is the same as the one used by NumPy. MultiFields are essentially dictionaries of Fields which also support all operations above. This can be used to represent all free parameters in a hierarchical model in just one object.
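Conceptually, a MultiField behaves like a dictionary of fields on which arithmetic acts entry-wise; the following plain-Python toy (not the actual NIFTy class) illustrates the idea:

```python
import numpy as np

class ToyMultiField(dict):
    """Dictionary of named NumPy arrays with entry-wise arithmetic,
    loosely mimicking how a NIFTy MultiField bundles all free
    parameters (e.g. 'signal' and 'l_c') into a single object.
    Purely illustrative, not the NIFTy implementation."""

    def __mul__(self, c):
        return ToyMultiField({k: v * c for k, v in self.items()})

    def __add__(self, other):
        return ToyMultiField({k: self[k] + other[k] for k in self})

sl = ToyMultiField(signal=np.zeros((4, 4)) + 2.0, l_c=np.array([0.7]))
scaled = sl * 0.5   # scales every entry at once
```

Scaling such an object entry-wise is exactly the kind of operation used later when building initial chain positions from a known MultiField.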
LinearOperator: An abstract base class for explicitly or implicitly defined linear operators such as the response R. The LinearOperator class carries information about the operator's Domain as well as its target domain. The operator can be applied to a Field x in various ways by calling one of the methods times(x), inverse_times(x), adjoint_times(x), or adjoint_inverse_times(x), although not every one of these methods is available for every linear operator.

Energy: An abstract base class representing the negative logarithm of a distribution P(x). An Energy instance is only defined at a certain value of a Field or MultiField, but the same energy for a different Field y on the same Domain can be obtained by calling the at(y) method. Additionally, the Energy class defines a gradient at the position as well as the curvature. This class is the most important one for HMCF since it defines the potential Ψ(s, l_c) and thereby the whole model.

The model introduced in the previous section can be implemented as a NIFTy Energy, but since this paper is about HMCF we will not go into detail with this. The curious reader can, however, look into the demonstration script and will find the implementations of the likelihood and the prior energy from (10), along with their respective gradients in (11) and (12), in /demos/energies.py in the package's repository.
Implementation of the Hierarchical Model in NIFTy
This and the following section summarize the demos/multi_field_demo.py script. The reader may want to have a look at the script to have an overview. We first import NIFTy and HMCF among other packages via

import nifty4 as ift
import hmcf
A mock dataset for our algorithm to operate on is generated using the get_hierarchical_data function, which was written for this purpose only.

d, sl, R, N = get_hierarchical_data()

It returns NIFTy objects for the data field d, the free parameters s and l_c (as the MultiField sl), the response R and the noise covariance N. d is a Field containing the data; sl is a MultiField with entries 'signal' for s and 'l_c' for l_c. The mock dataset is generated by first sampling a signal field s with the power spectrum defined in (7) and a pre-defined value for l_c. Together with noise drawn from a normal distribution with covariance N, the measurement equation (6) is applied and the mock data set is available. The signal space as well as the data space consist of 100 × 100 pixels, which means that, together with l_c, our problem has 10,001 free parameters. In figure 1 this mock data set, generated with random seed 41, can be observed. For this simple example the instrument R transforms the photon flux perfectly to an electrical signal, except for a square in the bottom right region of the image where the instrument is broken and just returns zeros.

Figure 1: Output of the get_hierarchical_data function. The first image on the left displays the original photon flux x before it hits the detector. For this simple example the instrument transforms the photon flux perfectly to an electrical signal, except for a square in the bottom right region of the image where it is broken. The instrument returns just zeros in that region. The resulting image is displayed in the middle figure. Finally, in the right figure, the data field d is displayed, where Gaussian random noise was added.

For our sampling process we wrote a NIFTy Energy class called SimpleHierarchicalEnergy which implements the potential Ψ(s, l_c) described in (9). The constructor needs the following arguments:

position : NIFTy MultiField
The position (s, l_c) where the energy Ψ(s, l_c) is evaluated. The MultiField has two entries: 'signal' and 'l_c'. The NIFTy Field in 'signal' is the harmonic representation of the s parameter.
Field
The data vector. R : NIFTy
LinearOperator
The instrument’s response. N : NIFTy
LinearOperator
The covariance of the noise. l_c_mu : float Hyperparameter for the log-normal distribution of l c . l_c_sig2 : float Hyperparameter for the log-normal distribution of l c . HT : NIFTy
LinearOperator
Harmonic transformation operator, capable of transforming the position[’signal’] field from the harmonic representation to the position space. inverter : NIFTy
Minimizer
Numerical method for inverting
NIFTy
LinearOperator s. This is necessary to be ableto sample from the curvature of the energy which is required if the mass matrix issupposed to be reevaluated.Using a
MultiField as position enables us to sample the signal and the hyperparameter l c at the same time. d , R and N are already given by the get_hierarchical_data function.Values for l_c_mu and l_c_sig2 are chosen such that values for l c in the range of 0 .
05 to 10 . are probable (the true value used during data generation is 0 . s_space = sl['signal'].domainHT = ift.HarmonicTransformOperator(s_space) and as the inverter needed for the mass reevaluation we use a conjugate gradient implemen-tation in NIFTy : ICI = ift.GradientNormController(iteration_limit=2000,tol_abs_gradnorm=1e-3)inverter = ift.ConjugateGradient(controller=ICI)
Finally, for the initial definition of the energy we use sl as position, since it has the correct MultiField structure. An instance of the energy for our model is then created by calling the constructor

energy = SimpleHierarchicalEnergy(sl, d, R, N, HT, -0.3, 2.,
                                  inverter=inverter)
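As a quick consistency check of the hyperprior terms (10) and (12), one can compare the analytic derivative of Ψ_p(l_c) against a finite difference, here with the μ_c = −0.3 and σ_c² = 2 values passed to the constructor above (a hedged sketch, independent of HMCF):

```python
import numpy as np

mu_c, sig2_c = -0.3, 2.0   # hyperprior parameters as used in the demo

def psi_p(l_c):
    # Psi_p(l_c) = (log l_c - mu_c)^2 / (2 sigma_c^2) + log l_c,  eq. (10)
    return (np.log(l_c) - mu_c) ** 2 / (2 * sig2_c) + np.log(l_c)

def grad_psi_p(l_c):
    # d Psi_p / d l_c = ((log l_c - mu_c)/sigma_c^2 + 1) / l_c,  eq. (12)
    return ((np.log(l_c) - mu_c) / sig2_c + 1.0) / l_c

# Central finite difference at an arbitrary point.
l_c = 0.7
h = 1e-6
fd = (psi_p(l_c + h) - psi_p(l_c - h)) / (2 * h)
```

The two values agree to high precision, which is exactly the kind of gradient check one should run for any hand-written Energy implementation.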
HMCF
Up to this point we only introduced and used NIFTy objects. But now that an Energy is properly defined, we can start using HMCF. First, we create an instance of the HMCSampler class. This object represents the HMC sampler, and its constructor has only one required argument: the potential Ψ. However, for this example we also set the optional num_processes argument, which states the number of chains or CPU cores we use during sampling, and the sample_transform argument, which is necessary since we sample s but are mainly interested in the photon flux x = exp(s).

def sample_trafo(s):
    val = dict(s)
    val['signal'] = ift.exp(HT(val['signal']))
    return ift.MultiField(val=val)

sampler = hmcf.HMCSampler(energy, num_processes=6,
                          sample_transform=sample_trafo)

The sample_transform argument requires a function and represents essentially f in (1). It is important that the sample_transform function takes NIFTy Fields or MultiFields of the same kind as the position argument of the Energy instance which is passed to the constructor of the HMCSampler class (in particular, they need to live on the same domain). The output of the function can be any kind of Field or MultiField regardless of what was put in.

Before we start sampling we need to define initial positions for the Markov chains. In principle this could be completely random but, unfortunately, in high-dimensional cases we need to be somewhere close to non-vanishing parts of the target distribution, because otherwise the gradients are so large that numerical integration during time evolution will fail in any case. In this example we know the true signal sl and will use small deviations from it, but under real circumstances one would need to first use an approximating algorithm to get close to the mean or maximum-a-posteriori solution of the problem and start from there. This can be done using algorithms already implemented in NIFTy. Afterwards, we call the run method of our sampler, which has again only one required argument: the number of samples per Markov chain drawn after the burn-in phase has finished. Additionally, we set the optional argument x_initial with a list of initial positions (the length of this list does not have to be the same as the number of Markov chains).

x_initial = [sl*c for c in [.5, .7, 1.5, 2.]]
sampler.run(500, x_initial=x_initial)
This will initiate a sampling process where the sampler starts in a burn-in phase, during which the integration step size ε from (4) is adjusted to meet the default target acceptance rate of 0.8. Additionally, the mass matrix is reevaluated once in the beginning, and then the sampler waits until the individual Markov chains have converged with respect to a measure based on diagnostics first introduced by Gelman and Rubin (1992). All of this can be adjusted to the user's needs and specific problem; a detailed description of all options can be found in appendix B. The whole sampling process may take up to 10 minutes depending on the machine the script is executed on. If the sampling process seems to freeze in the beginning, this is probably due to the mass reevaluation, which can take some time. A much shorter execution time is possible by setting the n_pixels argument of the get_hierarchical_data function to a lower value.

After some time the sampler is finished, and one can access the mean value (of the transformed samples) as a NIFTy MultiField via the mean attribute of the sampler. To get the photon flux values as a NumPy array one can write

mean_val = sampler.mean['signal'].to_global_data()
The 'signal' statement selects the NIFTy Field representing the signal s, and the to_global_data() method returns the Field's value on the regular grid as a NumPy array. To get the inferred mean value of the correlation length l_c as a float, the following statement does the trick:

l_c_val = sampler.mean['l_c'].to_global_data()[0]
print(l_c_val)
HDF5 file: mean_val = hmcf.load_mean(path/to/file)['signal']l_c_val = hmcf.load_mean(path/to/file)['l_c'][0]
Obviously, this is possible even if the
HMCSampler instance was deleted already (for exampleafter a reboot of your system).The variance of the samples can be obtained in the same way using the var attribute of the
HMCSampler class or calling the load_var function. The results of the sampling process aredisplayed in figure 2.The most prominent difference between the original flux x and the HMC mean value is wherethe instrument was broken in the bottom right region. In particular, the standard deviationof the samples drawn from the full posterior distribution is remarkably high there as onewould expect since information is missing.The samples itself can be loaded by either using the samples attribute of the HMCSampler class or calling the load function of
HMCF . As an example the shape of the returned
NumPy array can be displayed with print(hmcf.load(path/to/file.h5)['signal'].shape) (6, 500, 100, 100) where the first element reflects the number of independent Markov chains, the second elementis the number of sample each chain generated and the third and fourth element reflect theregular grid on which the whole process was carried out.To have a better overview of these samples we can use the show_trajectories functionof
HMCF which displays the chain trajectories through parameter space by pixel. It is aninteractive GUI consisting of two graphs as depicted in figure 3. The left graph shows theinferred mean value and the right graph trajectories of each chain for one selected pixel. Thepixel can either be set in the top row by defining the coordinates and then clicking on “show”or by just clicking on some pixel in the left graph. The function itself is located in the
HMCF tools sub-package and can be called by executing4
HMC Sampling for Fields x hmc meandifference hmc std0.50.00.51.01.52.02.53.0 0.50.00.51.01.52.02.53.00.10.20.30.40.50.60.70.8 0.080.100.120.140.160.180.200.22
Figure 2: Results of the multi_field_demo.py script. The upper row shows the true photon flux x along with the reconstructed picture based on the samples generated by the HMC algorithm. The reconstructed picture gets very blurry where the instrument was broken in the bottom right region and information got lost. The second row displays the absolute difference between the original flux and the reconstructed picture and the standard deviation of the HMC samples. Again, the region in which the instrument is broken is very prominent in both cases.

from hmcf.tools import show_trajectories
show_trajectories(path/to/file, field_name='signal')

where the field_name argument defines which element of the MultiField is displayed.

Figure 3: The show_trajectories GUI. In the left graph the inferred mean value of the field 'signal' in our example is displayed. In the right graph the trajectories of the six Markov chains at the pixel coordinates (x = 12, y = 52) (as stated in the top row) are shown.
5. Summary
Efficient HMC sampling with the high number of degrees of freedom of a numerically represented field is a very complicated task. HMCF takes care of most challenges arising while working on such problems. It provides good default values and adjustment strategies for crucial parameters such as the integration step size or the mass matrix. Nonetheless, the user is still able to customize many details of how the sampler deals with a given problem. These features include:

• Simultaneous sampling of all free parameters and hyperparameters
• Setting the order of the symplectic integrator
• Defining the adjustment strategy for ε and related properties
• Defining the convergence measure strategy and related properties
• Defining how often and how well the mass matrix is reevaluated
• Providing a clear, in-depth overview of relevant parameters of all chains, especially during the burn-in phase
Apart from a diverse set of options to choose from, the structure of the module even eases the creation of new, customized options. We explained the usage of HMCF and demonstrated its performance using the demonstrator coming with the HMCF package.
Acknowledgments
We would like to thank all members of the Information Field Theory group at MPA for many useful discussions, in particular (in alphabetical order): Philipp Arras, Jakob Knollmüller, Reimar Leike, Daniel Pumpe and Martin Reinecke.

References
Beskos A, Pillai N, Roberts G, Sanz-Serna JM, Stuart A (2013). “Optimal tuning of the hybrid Monte Carlo algorithm.” Bernoulli, 19(5A), 1501–1534.

Betancourt M (2017). “A conceptual introduction to Hamiltonian Monte Carlo.” arXiv preprint arXiv:1701.02434.

Betancourt M, Byrne S, Girolami M (2014). “Optimizing the integrator step size for Hamiltonian Monte Carlo.” arXiv preprint arXiv:1411.6669.

Blanes S, Casas F, Sanz-Serna JM (2014). “Numerical integrators for the Hybrid Monte Carlo method.” SIAM Journal on Scientific Computing, 36(4), A1556–A1580.

Dalcín L, Paz R, Storti M (2005). “MPI for Python.” Journal of Parallel and Distributed Computing, 65(9), 1108–1115. ISSN 0743-7315. doi:10.1016/j.jpdc.2005.03.010.

Buscombe D (2016). PySESA: Python framework for spatially explicit spectral analysis. Python package version 0.0.19, URL https://github.com/dbuscombe-usgs/pysesa.

Duane S, Kennedy AD, Pendleton BJ, Roweth D (1987). “Hybrid Monte Carlo.” Physics Letters B, 195(2), 216–222.

Enßlin TA, Frommert M, Kitaura FS (2009). “Information field theory for cosmological perturbation reconstruction and nonlinear signal analysis.” Physical Review D, 80(10), 105005.

Enßlin TA, Weig C (2010). “Inference with minimal Gibbs free energy in information field theory.” Physical Review E, 82(5), 051112.

Gelman A, Rubin DB (1992). “Inference from iterative simulation using multiple sequences.” Statistical Science, 7(4), 457–472.

Hanson KM (2001). “Markov Chain Monte Carlo posterior sampling with the Hamiltonian method.” In Medical Imaging 2001: Image Processing, volume 4322, pp. 456–468. International Society for Optics and Photonics.

Hastings WK (1970). “Monte Carlo sampling methods using Markov chains and their applications.” Biometrika, 57(1), 97–109.

Hoffman MD, Gelman A (2014). “The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15(1), 1593–1623.

Hubble E (1934). “The distribution of extra-galactic nebulae.” The Astrophysical Journal, 79, 8.

Hunter JD (2007). “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering, 9(3), 90–95. ISSN 1521-9615. doi:10.1109/MCSE.2007.55.

Junklewitz H, Bell M, Selig M, Enßlin T (2016). “RESOLVE: A new algorithm for aperture synthesis imaging of extended emission in radio astronomy.” Astronomy & Astrophysics, 586, A76.

Knollmüller J, Steininger T, Enßlin TA (2017). “Inference of signals with unknown correlation structure from non-linear measurements.” arXiv preprint arXiv:1711.02955.

Reinecke M, Steininger T, Selig M (2018). nifty4: a package implementing numerical methods for Information Field Theory. Max-Planck-Society. Python package version 4.1.0, URL http://ift.pages.mpcdf.de/NIFTy.

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953). “Equation of state calculations by fast computing machines.” The Journal of Chemical Physics, 21(6), 1087–1092.

Neal RM (2011). “MCMC using Hamiltonian dynamics.” In Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC.

Oliphant TE (2007). “Python for Scientific Computing.” Computing in Science & Engineering, 9(3), 10–20. ISSN 1521-9615. doi:10.1109/MCSE.2007.58.

Pumpe D, Reinecke M, Enßlin TA (2018). “Denoising, Deconvolving and Decomposing multi-Dimensional Photon Observations - The D4PO Algorithm.” arXiv preprint arXiv:1802.02013.

Python Software Foundation (2018). logging: Python framework for logging messages. Part of Python since 2.3 including 3.x, URL https://docs.python.org/3.6/library/logging.html.

Salvatier J, Wiecki T, Fonnesbeck C (2018). pyMC3: Python package for all sorts of MCMC algorithms. Python package version 3.4.1, URL https://docs.pymc.io.

Selig M, Bell MR, Junklewitz H, Oppermann N, Reinecke M, Greiner M, Pachajoa C, Enßlin TA (2013). “NIFTY - Numerical Information Field Theory. A versatile PYTHON library for signal inference.” Astronomy & Astrophysics, 554, A26. doi:10.1051/0004-6361/201321236.

Selig M, Vacca V, Oppermann N, Enßlin TA (2015). “The denoised, deconvolved, and decomposed Fermi γ-ray sky - An application of the D3PO algorithm.” Astronomy & Astrophysics, 581, A126.

Stan Development Team (2017). PyStan: Python package for HMC and NUTS. Python package version 2.16.0.0, URL http://mc-stan.org.

Steininger T, Dixit J, Frank P, Greiner M, Hutschenreuter S, Knollmüller J, Leike R, Porqueres N, Pumpe D, Reinecke M, et al. (2017). “NIFTy 3 - Numerical Information Field Theory - A Python framework for multicomponent signal inference on HPC clusters.” arXiv preprint arXiv:1708.01073.

Taylor J, Ashdown M, Hobson M (2008). “Fast optimal CMB power spectrum estimation with Hamiltonian sampling.” Monthly Notices of the Royal Astronomical Society, 389(3), 1284–1292.

The HDF Group. HDF5. URL https://portal.hdfgroup.org.

Yoshida H (1990). “Construction of higher order symplectic integrators.” Physics Letters A, 150(5-7), 262–268.
A. Dynamic Step-Size Adjustment
Hamiltonian Monte Carlo needs much more computation time per sample proposal than other MCMC approaches. This disadvantage is compensated by much higher acceptance rates and lower autocorrelation between samples. Because of the energy-conserving property of Hamiltonian dynamics, the acceptance rate depends only on the performance of the numerical integrator. The numerical integrator has one free parameter: the step size ε. In principle, the bigger it is, the bigger the integration error ΔE and thereby the smaller the acceptance rate. On the other hand, a too small value for ε results in a sampler which does not explore the typical set on bearable timescales. In practice an acceptance rate of 0.7 to 0.9 (Beskos et al. 2013) is aimed for. HMCF therefore adjusts ε during the burn-in phase such that a user-defined acceptance rate is matched. The first step in constructing a good strategy is to derive a relation between the acceptance rate r_A and the integration error ΔE. We developed the following approximation. Given the acceptance probability ρ_A from equation (3), the expected acceptance rate is given by

r_A(\epsilon) = \big\langle \rho_A(\Delta E) \big\rangle_{P(\Delta E | \epsilon)} = \big\langle \min\big(1, e^{-\Delta E}\big) \big\rangle_{P(\Delta E | \epsilon)}    (13)

where P(ΔE | ε) is the probability distribution for ΔE conditioned on ε. To tackle the min function properly, it is assumed that the probability distribution for the sign of ΔE does not depend on the absolute value of ΔE:

P(\Delta E) \approx P(|\Delta E|)\, P(\mathrm{sgn}(\Delta E))

This reflects the plausible situation that errors are symmetrically probable displacements of trajectories in regions of the phase space that are dominated by a potential gradient and not by a minimum. In this case we can further assume that P(sgn(ΔE) = 1) ≈ P(sgn(ΔE) = −1) ≈ 0.5. With this, equation (13) can be written as

r_A(\epsilon) = \frac{1}{2} \Big( 1 + \big\langle e^{-|\Delta E|} \big\rangle_{P(|\Delta E| \,|\, \epsilon)} \Big) \approx \frac{1}{2} \big( 2 - \langle |\Delta E| \rangle \big) + O\big(\langle |\Delta E|^2 \rangle\big),

where the exponential function was expanded to first order. In practice, a certain value for r_A, like 0.8, is chosen and ε is adjusted such that

\langle |\Delta E| \rangle \overset{!}{=} 2\,\big(1 - r_A(\epsilon)\big) =: \Delta E_\text{wanted}    (14)

This is the relation that lies at the core of most epsilon-adjusting strategies available in HMCF. Note that even if a step is accepted for sure because ΔE is negative, adjusting ε is still possible since only the absolute value of ΔE is needed. This is of great use in cases where nearly every step during burn-in produces a negative ΔE (this happens sometimes if the Markov chains start far off the mean value). For more on how ε is adjusted exactly, see section B.3.

B. Detailed Software Description
This section describes all features and functionalities of HMCF.

B.1. The HMCSampler Class

The HMCF package is optimized for a fast HMC implementation for a NIFTy Energy class. At its heart lies the HMCSampler class handling the whole sampling process. It is able to run several Markov chains on different CPUs using the Python multiprocessing module. The samples are saved to an HDF5 file which is generated every time a new run is initialized and can be loaded back as needed via the package's own load function. During the burn-in phase HMCSampler takes care of adjusting the integration step size ε (see equation (4)) such that a user-defined acceptance rate is reached, as well as setting and possibly reevaluating the mass matrix M. After a run has finished, the mean of the samples is calculated. Of course, in practice one may want to fine-tune some of the specific features and parameters implemented in HMCF. This section is dedicated to introducing and explaining those.
Instantiating

An instance of HMCSampler is created with the following arguments, of which only the first is mandatory:

potential : NIFTy Energy
    The HMC potential Ψ(x). Also defines the domain on which the sampling takes place through its position attribute.

sample_transform : func, optional
    In some cases it is preferable to sample a field not in its position space but in another domain, such as its harmonic space representation, or maybe even in a domain from which there is no linear transformation to the position space. To ensure correct calculation of expectation values such as the mean or the variance, the samples are transformed by sample_transform before being saved to disk. The sample_transform function has to have exactly one argument, which is a Field or MultiField similar to the position attribute of the NIFTy Energy given as the potential argument. The sample_transform function has to return a Field or MultiField. There are, however, no further restrictions on the exact structure of this Field or MultiField.

num_processes : int
    Number of cores involved in the sampling process. This is equal to the number of individual Markov chains started when the instance method run is called.
    Default: 1

sampler_dir_path : str
    A path where the HDF5 file containing the samples is going to go.
    Default: a new folder called “samples” in the ‘__main__’ script's directory
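To illustrate what a sample_transform is expected to do, here is a self-contained NumPy sketch (deliberately not using NIFTy; the transform, field shapes and the averaging step are illustrative assumptions, not HMCF code). It maps harmonic-space samples to position space before statistics are computed:

```python
import numpy as np

def sample_transform(harmonic_sample):
    # Hypothetical transform: map a harmonic-space sample to position
    # space via an inverse FFT, keeping the real part.
    return np.fft.ifft2(harmonic_sample).real

# Draw a few fake "samples" in harmonic space and average them in
# position space, mimicking what happens before samples hit the disk.
rng = np.random.default_rng(0)
samples = [rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
           for _ in range(4)]
transformed = np.array([sample_transform(s) for s in samples])
mean_in_position_space = transformed.mean(axis=0)
print(mean_in_position_space.shape)  # (8, 8)
```

In HMCF the same idea applies, except that the argument and return value are NIFTy Field or MultiField objects rather than plain arrays.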
Running the Sampling Process

In principle the sampling process can be started immediately after creating an instance of HMCSampler by calling the run() method with the following arguments, of which again only the first is mandatory:

num_samples : int
    Number of samples to be drawn per chain after burn-in.

max_burn_in : int, optional
    Maximum number of steps for the chain to converge before it is forced into sampling mode. If no value is stated, forced transition is not going to happen. In this case the chain will only start the actual sampling process if it has converged.

convergence_tolerance : float, optional
    If the convergence measure for the sampling process falls below this value, the chain is assumed to have converged and starts with the actual sampling process. If no value is stated, the tolerance property of the HMCSampler property convergence is used. The default value for said property is 1. For more on this see section B.2.

target_acceptance_rate : float, optional
    Value between 0. and 1., stating what ratio of accepted to rejected samples is preferred. The integration step size is adjusted during burn-in to approximately match this ratio. If not stated, the corresponding property of the epsilon property is used (for which the default value is 0.8).

order : int
    The order of the symplectic integrator. The default value corresponds to a simple leapfrog integration.
    Default: 2

mass : NIFTy EndomorphicOperator, optional
    HMC mass matrix used during sampling (or until it is reevaluated). For more on the mass matrix see section B.4. If no mass is given, an identity matrix is used (at least as initial guess).

x_initial : NIFTy Field or list of NIFTy Fields, optional
    Starting point(s) for the HMC sampler. If more than one Markov chain needs to be initialized, they get their respective initial positions by iterating through the list. The list does not have to have the same length as the number of chains. If there are more chains than elements in the list, some starting positions are reused for the additional chains. If only a Field is given, all chains get the same initial position. If no initial field is passed, a random sample is drawn from a Gaussian distribution centered at the position of the Energy instance given to the constructor of HMCSampler, with the Energy's metric as covariance.
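The rule for recycling starting positions over chains described above can be sketched as follows (plain Python; assign_initial_positions is a hypothetical helper, not part of the HMCF API):

```python
from itertools import cycle, islice

def assign_initial_positions(x_initial, num_chains):
    """Cycle through the provided starting points so that every chain
    gets one, reusing entries when there are more chains than points.
    A single (non-list) starting point is shared by all chains."""
    if not isinstance(x_initial, list):
        x_initial = [x_initial]
    return list(islice(cycle(x_initial), num_chains))

print(assign_initial_positions(['a', 'b'], 5))  # ['a', 'b', 'a', 'b', 'a']
```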
Getting the Results of an HMC-Run

After a run has finished, the sampled mean as a NIFTy Field or MultiField is accessible via the instance's property ‘mean’.

In [1]: hmc_sampler = HMCSampler(nifty_energy, num_processes=5)
In [2]: hmc_sampler.run(200)
In [3]: hmc_sampler.mean
Out[3]:
nifty4.Field instance
- domain = DomainTuple, len: 1
  RGSpace(shape=(256, 256), distances=(0.5, 0.5), harmonic=True)
- val = array([[ 0.63, -0.16, ...,  1.04, -0.64],
               ...,
               [ 0.03,  0.02, ...,  1.22,  0.21]])

Accessing the values of the samples is possible via the samples property. It consists of a (2 + n)-dimensional NumPy ndarray where the first dimension represents the different Markov chains, the second dimension represents the individual samples and the other n dimensions represent the values of the sampled NIFTy Fields. If the sampled objects were MultiFields, a dictionary of NumPy ndarrays is returned. Remember, though, that calling this will load all samples into memory, which might crash the process if not enough memory is available.

In [4]: all_samples = hmc_sampler.samples
In [5]: all_samples.shape
Out[5]: (5, 200, 256, 256)
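Since the samples form a plain (2 + n)-dimensional array, pooled statistics can be illustrated with NumPy alone (a sketch with fake data, conceptually similar to what the mean and var properties expose):

```python
import numpy as np

# Fake samples array shaped (chains, samples_per_chain, *field_shape),
# mimicking the layout of HMCSampler's samples property.
rng = np.random.default_rng(1)
all_samples = rng.standard_normal((5, 200, 16, 16))

# Pool chains and samples to estimate per-pixel mean and variance.
mean = all_samples.mean(axis=(0, 1))
var = all_samples.var(axis=(0, 1), ddof=1)
print(mean.shape, var.shape)  # (16, 16) (16, 16)
```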
Attributes and Properties of HMCSampler

The HMCSampler has a number of other properties and attributes which are mostly used for fine-tuning the sampling process. These are:

potential : NIFTy Energy, read-only
    The potential Ψ for the HMC sampler.

sampler_dir : str
    Setting or getting the path to the directory where the sample files are stored. Corresponds to the parameter of the same name passed in the constructor of the HMCSampler class.

save_burn_in_samples : bool
    Whether or not to save the samples generated during the burn-in phase to disk. Be aware that setting this to True (default value) together with a high or non-existent max_burn_in parameter (in the constructor of the HMCSampler class) could fill your hard drive.
    Default: True

burn_in_samples : NumPy ndarray or dict (str -> ndarray), read-only
    The same as the samples property but with the samples generated during the burn-in phase.

var : NIFTy Field or MultiField, read-only
    The variance of the samples (after sample_transform).

convergence : HMCF Convergence
    For choosing how to calculate the convergence measure (see section B.2).
    Default: HansonConvergence if num_processes == 1 else GelmanRubinConvergence

epsilon : HMCF Epsilon
    For choosing how to adjust the integration step size parameter during burn-in (see section B.3).
    Default: EpsilonPowerLawDivergence

mass : HMCF Mass
    For choosing how to handle the HMC mass during sampling. For more on this see section B.4.

display : HMCF Display
    For choosing how to display the progress of the sampling process (see section B.5).
    Default: LineByLineDisplay

n_limits : list of ints
    To avoid periodic trajectories the number of leapfrog integration steps is randomized. n_limits defines the range from which the number of integration steps is drawn uniformly.
    Default: [60, 70]

B.2. Convergence
The Convergence class handles everything related to the convergence of the Markov chain(s). In principle, a chain in HMCF has converged if a convergence measure, calculated for each degree of freedom in the sampled NIFTy Field or MultiField, drops below a given tolerance. Additionally, HMCF implements intermediate steps of convergence via so-called convergence levels. For the time being their main purpose is to define a time when the HMC mass is reevaluated during burn-in (see also section B.4). A chain is said to have converged with respect to its current convergence level if

\max(\text{measure}) < \text{tolerance} \cdot 10^{\,\text{level}}    (15)

In other words: the level is the number of digits the decimal separator of the tolerance is shifted to the left. The idea behind this is to decrease the level by one each time an intermediate convergence is reached, while at that point recalculating the mass matrix. If the level drops to zero, equation (15) simplifies to max(measure) < tolerance, and the next time this requirement is met, the Markov chain has finished the burn-in phase. It remains to explain how the convergence measure is calculated. There are several different approaches implemented in HMCF as child classes of the Convergence class. Choosing one of them is done by setting the convergence property of the HMCSampler class with one of the Convergence's child classes, e.g.:

hmc_sampler.convergence = GelmanRubinConvergence
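The level mechanics of equation (15) amount to a one-line check, sketched here in plain Python (illustrative only, not HMCF's internal code):

```python
def has_converged(measure_max, tolerance, level):
    """Intermediate convergence check in the spirit of equation (15):
    a chain has converged with respect to its current level if the
    largest per-pixel measure is below tolerance * 10**level."""
    return measure_max < tolerance * 10 ** level

print(has_converged(0.8, 1.0, level=0))   # True: final convergence
print(has_converged(12.0, 1.0, level=1))  # False: above relaxed bound
print(has_converged(8.0, 1.0, level=1))   # True: intermediate level met
```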
For now there are four different possibilities:

MeanConvergence
    This rather simple way of calculating the convergence needs at least two Markov chains. It compares the mean of the samples from all chains (total_mean) to the means of the individual chains (chain_means). The measure is defined as abs(chain_mean / total_mean - 1.) such that it fulfills the non-negativity and the identity-of-indiscernibles criteria for metrics. It proves to be rather unstable if, e.g., the total mean is close to zero.

VarConvergence
    Very similar to MeanConvergence, only with the variances of individual chains and all chains. The measure is equal to abs(chain_var / total_var - 1.).

HansonConvergence
    So far the only convergence measure which can be used even if there is only one chain. It follows Hanson (2001) (again: the measure is the absolute value of the ratio minus one).

GelmanRubinConvergence
    An implementation of the Gelman and Rubin convergence criterion, very popular among MCMC practitioners (Gelman and Rubin 1992) (again: the measure is the absolute value of the ratio minus one).
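As an illustration, a MeanConvergence-style measure abs(chain_mean / total_mean - 1.) can be computed with NumPy alone (a sketch with synthetic chains; not HMCF's implementation):

```python
import numpy as np

def mean_convergence_measure(samples):
    """Per-pixel measure in the spirit of MeanConvergence, one value
    per chain and pixel. samples has shape
    (chains, samples_per_chain, *field_shape)."""
    chain_means = samples.mean(axis=1)      # (chains, *field_shape)
    total_mean = samples.mean(axis=(0, 1))  # (*field_shape,)
    return np.abs(chain_means / total_mean - 1.0)

# Four well-mixed fake chains fluctuating around the same mean of 5.
rng = np.random.default_rng(2)
samples = 5.0 + 0.1 * rng.standard_normal((4, 100, 8, 8))
measure = mean_convergence_measure(samples)
print(measure.shape)        # (4, 8, 8)
print(measure.max() < 1.0)  # True for these well-mixed chains
```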
Attributes and Properties of Convergence

Regardless of which Convergence child class has been used, additional features can be set via its class properties, e.g. the locality property, which defines the number of recent samples considered when calculating the convergence (see below for details):

hmc_sampler.convergence = GelmanRubinConvergence
hmc_sampler.convergence.locality = 200

For the common user the following properties are the most important ones:

locality : int
    The number of recent samples to be considered in calculating the convergence measure. On the one hand this is a form of ‘forgetting’ very old parts of the chain's trajectory which do not represent the current state of convergence. On the other hand this is necessary because of memory issues, i.e. if the burn-in phase takes very long, the memory would blow up since every sample ever created would have to be available to calculate the measure.
    Default: 250

tolerance : float
    Equivalent to the convergence_tolerance parameter of the HMCSampler's run method. In fact, setting this property as described above only has an effect if the (optional) convergence_tolerance parameter is not passed to the run method. In practice the latter approach might be slightly more convenient. If the maximum value of a chain's convergence measure is below this value, the chain is said to have converged and transitions from the burn-in phase into the sampling phase. See also: converged (below).
    Default: 1.

The following additional properties of Convergence are mostly only important for HMCSampler itself and not of relevance for the common user:

converged : NumPy ndarray of bool (1 dim)
    Contains the information of whether the individual chains have converged with respect to the following law: converged = measure_max < tolerance * 10**level

measure : NumPy ndarray (1 + n dim)
    Represents the value of the measure (calculated depending on which child class of the Convergence class has been used) for each element of the degrees of freedom in the sampled NIFTy Field. The first dimension represents the individual chains.

measure_max : NumPy ndarray (1 dim)
    The highest value of the Convergence class property ‘measure’ per chain.

level : NumPy ndarray of int (1 dim)
    See class property converged. The idea is that after a Markov chain has converged with respect to its current level, the level is decreased by one. There are Convergence class methods dec_level and inc_level for decreasing and increasing the level by 1, respectively. For more details on these methods see below. Setting this property is also possible with a simple int which sets the whole NumPy ndarray to that value.

quota : NumPy ndarray of float (1 dim)
    The ratio of elements in the sampler's position NIFTy Field which have converged with respect to tolerance and level (i.e. the ‘intermediate’ convergence).
Additional Methods of Convergence

Internally the convergence levels are decreased and increased by calling

dec_level(chain_identifier=None)
    Decreases the convergence level of chain_identifier (int) by one. If chain_identifier is None, the level of all chains is decreased by one. Either way, if the level of a chain is already zero it is left unchanged.

inc_level(chain_identifier=None)
    Increases the convergence level of chain_identifier (int) by one. If chain_identifier is None, the level of all chains is increased by one.

The convergence level is set under the hood depending on specific properties of the Mass class at the beginning of the HMCSampler's run method.

B.3. Epsilon
The ε parameter defines the leapfrog integration step size (equation (4)). In principle, the bigger it is, the bigger the integration error ΔE and thereby the smaller the acceptance rate. To achieve an approximate acceptance rate defined via the target_acceptance_rate parameter of HMCSampler's run method, ε has to be adjusted during burn-in. Every adjusting strategy relies on the considerations presented in appendix A connecting the target_acceptance_rate to a wanted integration error ΔE_wanted. In HMCF the Epsilon class, much like the Convergence class, is just a base class; much more interesting for the common user are its child classes, which define exactly how ε is adjusted. The class also keeps track of how much ε has changed in recent steps and how close the mean value of recent integration errors ⟨ΔE⟩ is to ΔE_wanted. If ε has not changed very much and ⟨ΔE⟩ ≈ ΔE_wanted, Epsilon is said to have converged. If Epsilon has converged, its value is locked.
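The connection between the target acceptance rate and ΔE_wanted from relation (14), together with a simple multiplicative update in the spirit of the EpsilonSimple strategy described below, can be sketched as follows (illustrative plain Python; the function names are hypothetical, not HMCF's API):

```python
import random

def delta_e_wanted(target_acceptance_rate):
    # Relation (14) from appendix A: <|dE|> should equal 2 * (1 - r_A).
    return 2.0 * (1.0 - target_acceptance_rate)

def adjust_epsilon(epsilon, delta_e, target_acceptance_rate,
                   change_range=0.1, rng=random.Random(0)):
    """Shrink epsilon when the integration error is too large, grow it
    when the error is too small, with a random factor to avoid cycles."""
    wanted = delta_e_wanted(target_acceptance_rate)
    if abs(delta_e) > wanted:
        factor = rng.uniform(1.0 - change_range, 1.0)
    else:
        factor = rng.uniform(1.0, 1.0 + change_range)
    return epsilon * factor

eps = adjust_epsilon(0.005, delta_e=1.2, target_acceptance_rate=0.8)
print(0.0 < eps < 0.005)  # True: error above 0.4, so epsilon shrinks
```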
Available Adjusting Strategies

EpsilonConst
    ε stays constant throughout the whole sampling process. The value can be set by setting its val attribute:

    hmc_sampler.epsilon = EpsilonConst
    hmc_sampler.epsilon.val = 0.005

EpsilonSimple
    ε gets reduced or increased if ΔE is bigger or smaller than ΔE_wanted, respectively, independent of the absolute value of ΔE. In practice, EpsilonSimple has an attribute change_range (float between 0 and 1, default: 0.1), which can be set via:

    hmc_sampler.epsilon = EpsilonSimple
    hmc_sampler.epsilon.change_range = 0.2

    This attribute is only available in EpsilonSimple. Given the change_range, the current value of ε is multiplied by a factor drawn from a uniform distribution U([a, b]) where

    [a, b] = [1 - change_range, 1]      if ΔE > ΔE_wanted
    [a, b] = [1, 1 + change_range]      if ΔE < ΔE_wanted

    The randomness is necessary to prevent recurrent behavior if the integration error ΔE is very sensitive to ε.

EpsilonPowerLaw
    ε gets adjusted just like EpsilonSimple, but the change_range is now defined by the relative difference between ΔE and ΔE_wanted (EpsilonPowerLaw has no attribute change_range!). Given this class's attribute power (positive int, default: 5), set via

    hmc_sampler.epsilon = EpsilonPowerLaw
    hmc_sampler.epsilon.power = 4

    the change_range from EpsilonSimple is defined as:

    \text{change\_range} = \left| \frac{\Delta E - \Delta E_\text{wanted}}{\Delta E + \Delta E_\text{wanted}} \right|^{\text{power}}    (16)

EpsilonPowerLawDivergence
    In practice, working with Poissonian or log-normal distributions on high-dimensional spaces, the integration error ΔE proved to be very sensitive to small changes in ε. With this class ε is adjusted just like EpsilonPowerLaw, with the difference that in case of a divergent ΔE (e.g. an overflow occurs during integration) the change_range becomes more sensitive. A divergence_counter keeps track of the number of times a divergent behavior was detected, and the change_range defined in equation (16) gets a prefactor 2^(−divergence_counter).

EpsilonExponential
    In this case a simple connection between ΔE and ε is assumed:

    |\Delta E|(\epsilon) = a \cdot \epsilon^b    (17)

    where a > 0 and b > 0, such that |ΔE| tends to zero if ε does and diverges for large ε. Former ‘measurements’, i.e. sampling steps of ΔE given ε, are used to calculate a and b. This approach asks for a rather large value of the locality property (see below). ε is adjusted by rearranging equation (17) such that:

    \epsilon_\text{new} = \left( \frac{|\Delta E_\text{wanted}|}{a} \right)^{1/b}    (18)

EpsilonOneOverX
    In this case another connection between ΔE and ε is assumed:

    |\Delta E|(\epsilon) = \frac{a}{(\epsilon_0 - \epsilon)^b} + \text{const}    (19)

    where a > 0, b > 0 and |ΔE|(ε = 0) = 0. The idea behind this relation is an updated version of EpsilonExponential where there is a finite ε₀ for which ΔE already diverges. a and b are again fitted with former ΔEs given ε, whereas ε₀ is set to the current value of ε every time a divergent behavior is detected. ε gets adjusted by rearranging equation (19), such that

    \epsilon_\text{new} = \epsilon_0 - \epsilon_0 \left( \frac{\epsilon_0^{\,b}\, \Delta E_\text{wanted}}{a} + 1 \right)^{-1/b}    (20)
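The EpsilonExponential idea of fitting |ΔE|(ε) = a · ε^b to past measurements and inverting the fit can be sketched with NumPy (a sketch under the stated power-law assumption; propose_epsilon is a hypothetical helper, not HMCF's code):

```python
import numpy as np

def propose_epsilon(eps_history, abs_dE_history, dE_wanted):
    """Fit |dE|(eps) = a * eps**b in log-log space from past
    measurements, then invert the fit to hit the wanted error,
    mirroring equations (17) and (18)."""
    b, log_a = np.polyfit(np.log(eps_history), np.log(abs_dE_history), 1)
    a = np.exp(log_a)
    return (dE_wanted / a) ** (1.0 / b)

# Synthetic history generated from |dE| = 2000 * eps**2:
eps_hist = np.array([0.001, 0.002, 0.004, 0.008])
dE_hist = 2000.0 * eps_hist ** 2
eps_new = propose_epsilon(eps_hist, dE_hist, dE_wanted=0.4)
print(round(eps_new, 4))  # 0.0141, the exact inverse sqrt(0.4 / 2000)
```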
Epsilon
Regardless of which
Epsilon class has been used, additional features can be set via its classproperties, e.g. the locality property which defines the scope of fitting points for classes like
EpsilonExponential and for calculating the convergence measure (see below for details): hmc_sampler.epsilon = EpsilonPowerLawDivergencehmc_sampler.epsilon.locality = 20
For the common user the following properties and attributes are the most important ones: val : float The (current) value of (cid:15) . It is possible to use this property to set a good initial guess(although most of the time unnecessary).Default: 0.005 ournal of Statistical Software locality : int The number of recent samples to be considered in calculating the convergence measureof epsilon .Default: 50 target_acceptance_rate : float Equivalent to the target_acceptance_rate parameter of the
HMCSampler ’s run method.In fact, setting this property only has an effect if the (optional) target_acceptance_rate parameter is not passed to the run method.Default: 0.8 convergence_tolerance : float Essentially the same thing as the tolerance property of
Convergence (section B.2) butfor epsilon
A value > epsilon is calculated (see below)Default: 0.5 divergence_threshold : float The value of ∆ E for which the integration is said to have diverged.Default: 1 E epsilon_limits : list of float Minimum and maximum value for
Epsilon . If the adjusting algorithm proposes a value‘out of range’ the value gets coerced.Default: [1.E-40, 10]
Under the hood, HMCSampler uses the following additional properties of Epsilon:

converged : bool, read-only
    Whether or not epsilon has converged, i.e. whether the convergence_measure is smaller than the convergence_tolerance.

measure : list of float, read-only
    The convergence measure for epsilon contains two values. The first is the relative variance of the value ε; the second represents how close the mean value of ΔE is to ΔE_wanted. If both are smaller than convergence_tolerance, epsilon has converged.

B.4. Mass
Finding an appropriate mass matrix is the most challenging task for a good HMC sampler.
HMCF provides the user with the standard evaluation procedure introduced in equation (5)as well as the possibility to define a problem specific mass matrix.(Re-)evaluation is done by drawing samples { x ( i ) } from the curvature at the current position ofthe NIFTy
Energy given as potential parameter in the constructor. To keep the complexityof the problem bearable only the diagonal of the mass matrix in equation (5) is calculatedand used: M jk = δ jk N − N X i =1 x ( i ) j ! − (21)0 HMC Sampling for Fields
This also removes the problem that, for a non-degenerate mass matrix in n dimensions, at least n independent samples are required. For typical applications in NIFTy, n easily reaches very large values.

The idea behind the HMCF Mass class is to make it easy to define a strategy for handling the mass matrix of an HMC process. This includes reevaluating the mass matrix several times during the burn-in phase and defining an initial mass matrix. By default the identity is used as the mass matrix, but an initial mass matrix can be evaluated without reaching any level of convergence. By default there is one such initial mass evaluation and no reevaluations.

get_initial_mass : bool
Whether or not to evaluate an initial mass. Setting this to True/False will increase/decrease reevaluations by 1, respectively.
Default: True

reevaluations : NumPy ndarray of int
The number of reevaluations (including the initial one if set) for each chain. Reevaluation takes place if the chain has converged with respect to its current convergence level introduced in section B.2. Setting the number of reevaluations is also possible (and recommended) with a simple int, which sets all chains to that value.
Default: numpy.ones(num_chains) (i.e., one reevaluation for every chain)

operator : NIFTy EndomorphicOperator
The actual mass operator. If there is a problem-specific mass operator, it can be set with this property before initiating the run. If get_initial_mass is True, setting operator will set get_initial_mass to False and decrease reevaluations by 1.
Default: Identity (as NIFTy DiagonalOperator)

num_samples : int
Defines the number of samples drawn from the curvature when reevaluating the mass matrix.
Default: 50

shared : bool
If True, reevaluating a mass matrix is done at the mean of all individual chain positions. All chains have to meet the conditions for evaluation mentioned above. Afterwards each chain gets the same mass matrix. If False, each chain reevaluates its mass matrix individually if the chain meets the conditions for evaluation.
Default: False
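As an illustration of the diagonal mass evaluation in equation (21), the estimate can be sketched in plain NumPy. This is a minimal sketch assuming zero-mean curvature samples; the function name diagonal_mass is hypothetical and not part of HMCF.

```python
import numpy as np

def diagonal_mass(samples):
    # samples: shape (N, n) -- N curvature samples of an n-dimensional field.
    # Diagonal of equation (21): M_jj = ( N^-1 sum_i (x_j^(i))^2 )^(-1).
    second_moment = np.mean(samples ** 2, axis=0)
    return 1.0 / second_moment

rng = np.random.default_rng(0)
# Zero-mean samples with per-dimension standard deviations 1 and 2:
samples = rng.normal(size=(100_000, 2)) * np.array([1.0, 2.0])
mass = diagonal_mass(samples)
# mass[0] is close to 1, mass[1] close to 1/4
```

With only the diagonal retained, the estimate is well defined even when far fewer than n samples are available, which is the point made in the text above.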
B.5. Display
Naturally, HMC sampling can take quite a while. Disadvantageous parameter settings might lead to a malfunctioning sampling process, and being able to discover a pathological run early can save hours or even days. For this reason, HMCF offers a number of display modes for diagnostic purposes. On the other hand, displaying indicators of tens of parallel running chains might be overwhelming. All available display options have in common that they display more or less information based on a logging level, just like the logging levels in the logging package (Python Software Foundation 2018) of the Python standard library. It is in fact possible to use the appropriate module variables from the logging package. These are:

CRITICAL (numerical value: 50)
ERROR (numerical value: 40)
WARNING (numerical value: 30)
INFO (numerical value: 20)
DEBUG (numerical value: 10)
NOTSET (numerical value: 0)

The logging level can be set using the display's method setLevel, e.g.,

from logging import INFO
hmc_sampler.display.setLevel(INFO)

Similar to the
Epsilon and
Convergence classes, there are several Display classes which define the three displaying modes.
Display : Displays nothing at all. Serves as base class for the two other display classes.
LineByLineDisplay : If the level of the
HMCF logger is set to
INFO or below, certain indicators are printed at every sampling step, as depicted in figure 4a). Otherwise only warnings and errors are printed.
TableDisplay : The most advanced version of displaying indicators. A table is generated and dynamically updated, containing information for each chain, as depicted in figure 4b). This class relies heavily on the curses Python package and therefore alters the terminal behavior during sampling.

The columns in
TableDisplay display the following parameters:

ch : The chain number or 'identifier'.

acc. rate : The acceptance rate for each chain. During burn-in this is only the acceptance rate of the last 10 samples, since it highly depends on the value of Epsilon. After burn-in, acc. rate displays the overall acceptance rate.

dEnergy : The most recent value of the integration error ∆E.

Convergence
  conv. : Whether the chain has converged with respect to the current convergence level.
  lev : The current convergence level.
  meas : The maximum value of the convergence measure of the chain.
  quota : The percentage of points in the sampled NIFTy Field which have converged with respect to the current convergence level.

Epsilon
  value : The current value of Epsilon.
  conv. : Whether or not Epsilon has converged with respect to its measure.
  meas. : The maximum value of the Epsilon convergence measure.
Additional Methods of
Display
Regardless of which
Display class has been used, the quantity of displayed information can be changed via the setLevel method.

Figure 4: a) LineByLineDisplay with level = DEBUG during burn-in. b) TableDisplay with level = DEBUG during burn-in.

setLevel(level) This is equivalent to the setLevel function in the
Python package logging.
Default: logging.INFO
B.6. Additional Functions
HMCF provides a number of additional functions targeted at easing the handling of the HDF5 files generated during sampling. These files have rather cryptic default names of the form 'runYYYY-MM-DD_hh-mm-ss.h5', where Y, M, D, h, m and s represent year, month, day, hour, minute and second digits at the time of the initialization of that run, respectively. There are three functions handling data stored in these files: load, load_mean, load_var.

The load
Function
The load function takes the following arguments, of which only the first is mandatory. Keep in mind that for high-dimensional problems the amount of data loaded into memory can easily be several GiB.

path : str
Either the path to the HDF5 file or a path to the directory where the HDF5 file(s) are stored. If path is a directory, the latest 'run' file (with respect to the file name) is loaded.

attr_type : str
The 'attribute' to load. For files generated with the HMCSampler class, possible values are: 'burn_in' and 'samples'. For 'burn_in' the samples generated during burn-in are loaded (of course, this is only possible if the HMCSampler class attribute save_burn_in_samples was set to True). For 'samples' the samples generated after burn-in are loaded.
Default: 'samples'

start : int
Only loads the samples from step start onward.
Default: 0

stop : int, optional
Only loads the samples up to step stop. Loads until the end if no value is passed.

step : int
Only loads every nth sample, where n is given by step.
Default: 1

In all cases the function returns a NumPy ndarray if Fields were sampled and Python dictionaries if MultiFields were sampled. In the latter case the respective NumPy ndarray for each individual field can be obtained by using the key used in the MultiField. In all cases the array has 2 + n dimensions, where again, the first and second dimension represent chains and steps, respectively. If not all chains have the same number of samples (e.g., for attr_type = 'burn_in', every chain needs a different number of steps to reach convergence), shorter chains are filled with numpy.nans in the output array to match the size of the longest chain.
The load_mean
Function
The load_mean function calculates the mean value based on the samples saved to an HDF5 file.

path : str
Either the path to the HDF5 file or a path to the directory where the HDF5 file(s) are stored. If path is a directory, the latest 'run' file (with respect to the file name) is loaded.

domain : NIFTy Domain or NIFTy MultiDomain, optional
If domain is given, the function output is a NIFTy Field or a NIFTy MultiField with the respective domain and the calculated mean as value.

The function returns the mean value either as a NumPy ndarray, a Python dictionary of NumPy ndarrays, a NIFTy Field, or a NIFTy MultiField, depending on what was sampled originally and whether the domain argument is given.
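As a sketch of what load_mean computes from a loaded sample array: given the (chains, steps, field dimensions) layout with NaN padding described for load above, the mean is taken over the chain and step axes. The array below is hypothetical.

```python
import numpy as np

# Hypothetical sample array in the layout returned by load:
# 2 chains, up to 2 steps, a 2-pixel field; the second chain is
# one step shorter and padded with NaN.
samples = np.array([[[1.0, 2.0], [3.0, 4.0]],
                    [[5.0, 6.0], [np.nan, np.nan]]])

# Mean over chains and steps, ignoring the NaN padding:
mean = np.nanmean(samples, axis=(0, 1))
# mean == [3., 4.]
```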
The load_var
Function
The load_var function calculates the variance based on the samples saved to an HDF5 file.

path : str
Either the path to the HDF5 file or a path to the directory where the HDF5 file(s) are stored. If path is a directory, the latest 'run' file (with respect to the file name) is loaded.

domain : NIFTy Domain, optional
If domain is given, the function output is a NIFTy Field with domain domain and the calculated variance as value.

The function returns the variance either as a NumPy ndarray, a Python dictionary of NumPy ndarrays, a NIFTy Field, or a NIFTy MultiField, depending on what was sampled originally and whether the domain argument is given.
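Analogously to the mean, the variance can be sketched on the same hypothetical (chains, steps, field dimensions) array with NaN padding:

```python
import numpy as np

samples = np.array([[[1.0, 2.0], [3.0, 4.0]],
                    [[5.0, 6.0], [np.nan, np.nan]]])

# Variance over chains and steps, ignoring NaN padding (population
# variance, i.e. ddof=0, the numpy.nanvar default):
var = np.nanvar(samples, axis=(0, 1))
# var == [8/3, 8/3]
```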
B.7. Tools
Tools is a sub-module of
HMCF which provides two handy functions to evaluate an already finished sampling process. One, show_trajectories, provides a GUI to quickly visualize the individual Markov chains; the other, get_autocorrelation, calculates the autocorrelation of the chains.
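The autocorrelation computed by get_autocorrelation can be sketched for a single chain as follows. This is a minimal NumPy illustration, not the HMCF implementation; it assumes the bar denotes mean-subtracted samples, and the function name is hypothetical.

```python
import numpy as np

def autocorrelation(chain, t):
    # chain: samples of one chain, shape (steps, *field_shape).
    # Returns sum_i xbar^(i) * xbar^(i+t), with xbar the
    # mean-subtracted samples, evaluated pixel-wise.
    xbar = chain - chain.mean(axis=0)
    n = xbar.shape[0]
    return np.sum(xbar[: n - t] * xbar[t:], axis=0)

rng = np.random.default_rng(1)
chain = rng.normal(size=(1000, 3))   # 1000 steps, 3-pixel field
ac0 = autocorrelation(chain, 0)      # t = 0: sum of squared deviations
ac1 = autocorrelation(chain, 1)
```

For well-mixing chains of independent samples, ac1 should be small compared to ac0.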
The show_trajectories
Function
A simple GUI for visualizing Markov chains for one- or two-dimensional problems. The GUI is divided into two graphs, as displayed in figure 3. The left one represents the underlying problem space, i.e., its geometry, with either the 'mean' value or a similar user-defined field. The right graph displays the trajectory of the different chains for a selected pixel in the left graph.

path : str
Either the path to an HDF5 file or a path to the directory where the HDF5 file(s) are stored. If path is a directory, the latest 'run' file (with respect to the file name) in said directory is loaded.

reference_field : NIFTy Field or MultiField, NumPy ndarray or dict(str -> ndarray), optional
The field displayed on the left graph of the GUI. If none is given, the mean value of the samples is used as reference.

field_name : str, optional
If MultiFields were used, this argument is mandatory and specifies which of the sub-fields is supposed to be displayed.

solution : NIFTy Field or MultiField, NumPy ndarray or dict(str -> ndarray), optional
In case a 'right' answer is available (e.g., in a mock data example), it can be passed here and is displayed in the trajectories graph as a horizontal line as additional reference.

start : int
Only loads the samples from step start onward.
Default: 0

stop : int, optional
Only loads the samples up to step stop. Loads until the end if no value is passed.

step : int
Only loads every nth sample, where n is given by step.
Default: 1

The get_autocorrelation
Function
Calculates the autocorrelation of the samples (after burn-in) for a given t, where t is the shift in

auto\_corr[t] = \sum_i \bar{x}^{(i)}\, \bar{x}^{(i+t)}   (22)

The function takes the following arguments:

path : str
Either the path to an HDF5 file or a path to the directory where the HDF5 file(s) are stored. If path is a directory, the latest 'run' file (with respect to the file name) in said directory is loaded.

shift : int
The shift t as described above.
Default: 1

field_name : str, optional
If MultiFields were used, this argument is mandatory and specifies which of the sub-fields is supposed to be used.

The function returns a 1 + n dimensional NumPy ndarray, where the first dimension represents the different chains and the other n dimensions the dimensionality of the problem.

Affiliation:
Christoph Lienhard
Max Planck Institute for Astrophysics
Karl-Schwarzschild-Str. 1
D-85741 Garching, Germany
and
Ludwig-Maximilians-Universität München
Geschwister-Scholl-Platz 1
80539 Munich, Germany
and
Technische Universität München
Arcisstr. 21
80333 Munich, Germany
E-mail: [email protected]

Torsten Enßlin
Max Planck Institute for Astrophysics
Karl-Schwarzschild-Str. 1
D-85741 Garching, Germany
and
Ludwig-Maximilians-Universität München
Geschwister-Scholl-Platz 1
80539 Munich, Germany
Journal of Statistical Software published by the Foundation for Open Access Statistics