Bayesim: a tool for adaptive grid model fitting with Bayesian inference
BBayesim: a tool for adaptive grid model fitting with Bayesian inference
Rachel Kurchin, ∗ Giuseppe Romano, and Tonio Buonassisi † Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Bayesian inference is a widely used and powerful analytical technique in fields such as astronomyand particle physics but has historically been underutilized in some other disciplines includingsemiconductor devices. In this work, we introduce bayesim , a Python package that utilizes adaptivegrid sampling to efficiently generate a probability distribution over multiple input parameters to aforward model using a collection of experimental measurements. We discuss the implementationchoices made in the code, showcase two examples in photovoltaics, and discuss general prerequisitesfor the approach to apply to other systems.
INTRODUCTION
There are a plethora of examples across diverse scientific and engineering fields of mathematical models used tosimulate the results of experimental observations. In many cases, there are input parameters to these models whichare difficult to determine via direct measurement, and it is desirable to invert the numerical model – that is, use theexperimental observations to determine values of the input parameters. Bayesian inference, or Bayesian parameterestimation (BPE) is a fruitful framework within which to do such fitting, since the resulting posterior probabilitydistribution over the parameters of interest can give rich insights into not just the most likely values of the parameters,but also uncertainty about these values and the potentially complicated ways in which they can covary to give equallygood fits to observations. It has been widely used for decades in fields such as cosmology [17], particle physics [3, 18],and biology [6, 20], with well-developed and widely used code packages specialized to these purposes. [5, 16] However,the use of Bayesian inference in materials science and related contexts is much more recent, having only emerged inthe past few years. [1, 19, 21]We have previously demonstrated the value of a BPE approach in using automated high-throughput temperature-and illumination-dependent current-voltage measurements (
JV T i ) to fit material/interface properties and defectrecombination parameters in photovoltaic (PV) absorbers [1, 9], i.e. as a replacement for direct experimental charac-terization via probabilitistic inversion of a forward model. In cases such as these, when the data model is not a simpleanalytical equation but rather a computationally intensive numerical model, efficient, nonredundant sampling of theparameter space when computing likelihoods becomes critical to making the fit feasible.In this work, we introduce bayesim , a Python-based code that utilizes adaptive grid sampling to perform BPE.It is designed to be easy to use by anyone with basic familiarity with Python. We discuss the structure of thecode, its implementation, and provide several examples of its usage. While the authors’ expertise is in the realmof semiconductor physics and thus the examples herein are drawn from that space, we also discuss the generalcharacteristics of a problem amenable to this approach so that researchers from other fields might adopt it as well.
TECHNICAL BACKGROUND
Bayes’ Theorem is a relationship between conditional probabilities. It states P ( H | E ) = P ( H ) P ( E | H ) P ( E ) , (1)where the notation P ( A | B ) indicates the probability of A being true given that B is true. H is a hypothesis and E the observed evidence . P ( H ) is termed the prior , P ( E | H ) the likelihood , P ( H | E ) the posterior , and P ( E ) is anormalizing constant. If there are n pieces of evidence, this can generalize to an iterative process where P ( H |{ E , E , ...E n } ) = P ( H |{ E , E ...E n − } ) P ( E n | H ) P ( E n ) , (2)i.e. as each new piece of evidence is fed in, the posterior probability distribution is updated to accommodate it andshould in general become more concentrated into one region of parameter space. In a multidimensional parameterestimation problem, each hypothesis H is a tuple of possible values for the fitting parameters, i.e. a point in the a r X i v : . [ phy s i c s . d a t a - a n ] O c t FIG. 1. a) High-level flowchart of inputs and outputs to bayesim . Observed data as a function of experimental conditionsand modeled data over those same conditions as well as fitting parameters are fed in, and the code produces a probabilitydistribution over the fitting parameter values. b) 2D schematic of adaptive grid sampling approach, illustrating progressivesubdivision in higher-probability regions of parameter space. parameter space, while the evidence E is an observed output of a measurement as a function of various experimentalconditions.In order to compute a likelihood, a model capable of simulating that observed output as a function of both thefitting parameters and the experimental conditions is required. In bayesim , likelihoods are calculated for each point inparameter space using a Gaussian where the argument is the difference between observed and simulated output at thatpoint, and the standard deviation is the sum of experimental uncertainty and model uncertainty. The experimentaluncertainty is a number provided by the user that quantifies any noise/irreproducibility inherent to the measurement,while the model uncertainty is calculated by bayesim and reflects the sparseness of the parameter space grid, i.e. howmuch simulated output changes from one grid point to another.A high-level summary of what bayesim does is shown in Figure 1a. The flowchart shows that observed (as a functionof experimental conditions { C } and simulated (as a function of fitting parameters { P } and experimental conditions { C } ) outputs are compared to produce a probability distribution over { P } . Figure 1b schematically indicates theadaptive grid sampling approach for a hypothetical two-dimensional parameter space, wherein grid boxes exceedingsome threshold probability are subdivided and lower-probability regions discarded, allowing attainment of a highfitting precision without needing to sample the entire parameter space at the same high density. SOFTWARE ARCHITECTURE AND INTERFACE
The basic structure of bayesim is shown in Figure 2a. Detailed and up-to-date documentation of classes, functions,and example applications is maintained online. [11] The top-level object with which users interact is implemented inthe
Model class. The params module defines classes to store information about the various types of parameters (fittingparameters, experimental conditions, and measured output) while the
Pmf class stores the probability distributionand implements the manipulations required for Bayesian updates.Figure 2b is a flowchart of using the code, with the main inference loop surrounded by a dotted line. One firstneeds to provide the observed and modeled data. Observed data is accepted in HDF5 format, while modeled datacan be an HDF5 file (in which case bayesim will determine the fitting parameter space grid from this file), or if the
FIG. 2. a) bayesim software structure. b) Detailed flowchart of use of bayesim , with calculation steps and various levels ofiteration indicated. Boxes in blue indicate steps that can or must be done explicitly by the user, grey steps that are done bythe code with behavior tunable by the user, orange diamonds control points for staying in or leaving a loop, and green the finaloutput. fitting parameter space has been predefined, one can also attach a Python function that computes the model data(e.g. an interface to a numerical model). Observed data should have associated experimental uncertainties, or theuser can indicate a fixed value of uncertainty to use for all observations upon attaching the data.Once modeled data is attached, model uncertainty can be computed. At each point in the space of experimentalconditions, a matrix of the simulated output variable as a function of the fitting parameters is constructed and ateach point in this matrix, the largest difference along any index is computed. This is set as the model uncertainty forthis set of experimental conditions at this point in parameter space.At this point, the Bayesian inference loop can proceed. We start with a bounded uniform prior distribution overthe entire parameter space grid and the observed data in a randomized order. We feed in one piece of evidence(i.e. one observed data point) and compute the likelihood using a Gaussian. The standard deviation for each pointin the parameter space is the sum of the experimental uncertainy of that observation and the model uncertaintyat that point. The posterior distribution is computed as a Bayesian update on the prior. Next, we check whetherthe threshold condition for concentration of probability (defined as at least some percentage (by default 90%) of theprobability mass residing in at most some percentage (by default 5%) of the parameter space volume. If this thresholdis satisfied, the current posterior is stored and the prior reset to a uniform distribution. If not, the next observation isfed in. This procedure is repeated until the requisite number (by default, 80%) of observed data points has been used,with the final posterior being the average of every stored posterior. The posterior is calculated in this way (ratherthan a Bayesian multiplication through every single observed point) to avoid numerical errors that can arise if verysmall numbers are multiplied together. This procedure of running the Bayesian loop multiple times until the desirednumber of pieces of evidence are used is encapsulated in the
Model.run() function.Now the grid can be subdivided. The user defines a threshold probability (by default 0.001) for a grid box to possess,and all boxes with greater than this probability, as well as all their immediate neighbors, are divided in two alongevery parameter. bayesim saves a list of the new model calculations that need to be done for inference to proceedagain, and the user provides these, again either as a callable or an HDF5 file. For the next round of inference, theuser can also define a “relaxation” parameter from 0 to 1.0 (default 0) that will mix in some fraction of the posteriorfrom the previous step with the uniform distribution over this new grid.This overall procedure can be repeated as many times as the user wishes to get a more precise fit – eventually, theconcentration of probability becomes limited by experimental rather than model uncertainty. There is also an optionto define a minimum width for a box along any parameter direction, and if a box is already less than this width, bayesim will not subdivide in that direction.The most flexible way to interact with bayesim is via Python scripting or through literate programming in a Jupyternotebook. There is also a command line interface (CLI) for users less familiar with coding in Python planned forfuture release. bayesim relies on a variety of external open-source packages. These include numpy [15] and scipy [8] for a varietyof mathematical functions and vectorized implementations, joblib [4] for simple parallelism, deepdish [13] for savingand loading HDF5 files, pandas [14] for data manipulation, and matplotlib [7] for visualization.
APPLICATION EXAMPLESIdeal Diode Model
As a first example, we fit a simple two-parameter model of solar cell current density J (as a function of voltage V and temperature T ) known as the ideal diode model: J ( V, T ) = J L + J (cid:18) exp qVnkT − (cid:19) , (3)where k = 8 . × − eV/K is Boltzmann’s constant, by convention J L (the light current) is negative and J (the saturation current) is positive but strongly dependent on temperature, a dependence we can approximate as: J ≈ B (cid:48) T /n exp (cid:18) − E g nkT (cid:19) , (4)where E g is the zero-temperature bandgap of the material. For this example, we will set this to 1.2 eV and J L to-0.03 mA/cm . There are thus two parameters to be fit, the coefficient B (cid:48) (which can range over a wide variety ofvalues) and the ideality factor n (which is by definition between 1 and 2).As validation, we generate the “observations” using the same model we’ll use for fitting, with parameter values of n = 1 .
36 and B (cid:48) = 258, and four temperature values of 150, 200, 250, and 300 K. These “observations” are shownin Figure 3a. When bayesim imports observed data, it can optionally discard data points that are very close by toeach other in order to reduce computational load on modeling less informative points – the observed points that wereactually imported (59 out of 305 total “observations”) are shown as overlaid scatter dots.Figure 3b shows the posterior distributions after the first inference run and after each round of subdivision,demonstrating that bayesim gets progressively closer to the “true” values as the loop continues iterating. In addition,on the first inference run (prior to any subdivision), an average of 45% of grid points had larger model uncertaintythan experimental uncertainty, while after the second subdivision, only 1.9% did, while the average absolute error ofthe highest-probability model decreased from 0.03 after the first run to 0.007 after the second subdivision.In addition, the initial grid had 400 boxes, the singly subdivided grid 164, and the final grid (4x initial resolution)had 144, meaning the overall additional computational effort to quadruple the fit precision was less than producing theinitial simulation set. In comparison, using a “brute-force” grid approach, reaching 4x precision would take 4 n timesthe computational load in n dimensions, i.e. 16x for this case. The exact factor of improvement is problem-dependent,but in all cases the load is significantly less than the exponential scaling of the prior approach. SnS Solar Cell
This example demonstrates a more realistic application of bayesim , i.e. when the model being used to computelikelihoods is more computationally expensive. In this case, we will repeat the analysis from Ref. [1], using SCAPS [2](a coupled drift-diffusion/Poisson solver) as the model, which takes 10-15 core-seconds for a single J-V sweep.We start on a significantly coarser grid than Ref. [1] and show only one level of subdivision to make the examplefeasible to run on a personal computer. Figure 4 shows the results from this fit done in bayesim compared tothe higher-resolution grid analysis done previously. Note that the inference step in full-resolution grid took a fewhours, while the adaptive grid fit takes a few minutes. While the fit precision is of course lower at only one levelof subdivision, the extracted values are generally in agreement and more importantly, the relationship between themobility and lifetime (namely, that the product, i.e. the diffusion length), is constrained, emerges just as clearly. The
FIG. 3. a) “Observations” generated by ideal diode model, Equation 3, with overlaid scatter to show which points were actuallyimported and used by bayesim , b) PMF’s after the first inference run, second (one subdivision), and third (two subdivision),in blue, orange, and green, respectively. Values used to generate data indicated in red.
FIG. 4. PMFs produced by bayesim , after initial run and after one subdivision (blue and green, respectively). In red, PMFfrom Reference [1]. Inset: device structure (illuminated from above) with bulk SnS ( µ , τ fit) and SnS/Zn(O,S) interface (∆ E c , S eff fit) indicated. differences in fitted values stem from the fact that the prior published analysis didn’t account for model uncertainty,while this feature is included in bayesim . CONCLUSIONS
We have introduced bayesim , a Python-based adaptive grid sampled Bayesian inference code which is flexible tothe model used to calculated likelihood as well as the parameter space over which to fit. We provided two examples ofapplication of the code in the space of characterizing PV devices using JVT(i) measurements, as this is the authors’area of expertise. However, nothing about the code or the approach limits it to application in PV materials. Anysituation where there is: • a forward model that captures the relevant physics (particularly one that is relatively expensive to evaluate),and • a set of measurements of the output of that model as a function of several experimental conditionswould be amenable to bayesim . The online documentation page [11] maintains a list of examples which will growas the authors become aware of/implement more. The latest version of the code can always be found in the Githubrepository [10], and the most recent stable and thoroughly-tested version is installable via PyPI [12]. The onlinedocumentation also hosts other useful information, such as detailed documentation of every class and function in bayesim , a more extensive technical background section than is included here, a list of planned future features, andan updated manual for usage of the code. ACKNOWLEDGEMENTS
R.C.K. acknowledges the funding of a Blue Waters Graduate Fellowship. This work was supported by a TOTALSA research grant funded through MITei, the MIT Energy Initiative. ∗ [email protected] † [email protected][1] Riley E Brandt, Rachel C Kurchin, Vera Steinmann, Daniil Kitchaev, Chris Roat, Sergiu Levcenco, Gerbrand Ceder,Thomas Unold, Tonio Buonassisi, and Helmholtz Zentrum Berlin. Rapid semiconductor device characterization throughBayesian parameter estimation. Joule , 1(4):843–856, 2017. doi:10 . . joule . . . .[2] Marc Burgelman, P Nollet, and S Degrave. Modelling polycrystalline semiconductor solar cells. Thin Solid Films , 361–362:527–532, 2000. doi:10 . .[3] Jorge de Blas, Marco Ciuchini, Enrico Franco, Diptimoy Ghosh, Satoshi Mishima, Maurizio Pierini, Laura Reina, andLuca Silvestrini. Global bayesian analysis of the higgs-boson couplings. Nuclear and Particle Physics Proceedings , 273-275:834 – 840, 2016. 37th International Conference on High Energy Physics (ICHEP). doi:https://doi . org/10 . . nuclphysbps . . . .[4] Joblib developers. Joblib. https://joblib . readthedocs . io/en/latest/ .[5] F. Feroz, M. P. Hobson, and M. Bridges. Multinest: an efficient and robust bayesian inference toolfor cosmology and particle physics. Monthly Notices of the Royal Astronomical Society , 398(4):1601–1614,2009. arXiv:/oup/backfile/content public/journal/mnras/398/4/10 . . . . . x/2/mnras0398-1601 . pdf , doi:10 . . . . . x .[6] John P. Huelsenbeck, Fredrik Ronquist, Rasmus Nielsen, and Jonathan P. Bollback. Bayesian inference of phylogeny and itsimpact on evolutionary biology. Science , 294(5550):2310–2314, 2001. arXiv:http://science . sciencemag . org/content/294/5550/2310 . full . pdf , doi:10 . . .[7] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing In Science & Engineering , 9(3):90–95, 2007. doi:10 . . . .[8] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001. URL: . scipy . org/ .[9] Rachel C. Kurchin, Jeremy R. Poindexter, Daniil Kitchaev, Ville V¨ah¨anissi, Carlos del Ca˜nizo, Liu Zhe, Hannu S. Laine,Chris Roat, Sergiu Levcenco, Gerbrand Ceder, and Tonio Buonassisi. Semiconductor parameter extraction via current-voltage characterization and bayesian inference methods. In Proceedings of the 2018 7th World Conference on PhotovoltaicEnergy Conversion . IEEE, 6 2018.[10] Rachel C. Kurchin and Giuseppe Romano. Bayesim. https://github . com/pv-lab/bayesim .[11] Rachel C. Kurchin and Giuseppe Romano. bayesim. https://pv-lab . github . io/bayesim/ build/html/index . html , 2018.[12] Rachel C. Kurchin and Giuseppe Romano. bayesim. https://pypi . org/project/bayesim , 2018.[13] Gustav Larsson and Mark Stoehr. deepdish. https://github . com/uchicago-cs/deepdish .[14] Wes McKinney. Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference ,pages 51–56, 2010. [15] Travis E. Oliphant.
A guide to NumPy . Trelgol Publishing, 2006.[16] Sjors H.W. Scheres. Relion: Implementation of a bayesian approach to cryo-em structure determination.
Journal ofStructural Biology , 180(3):519 – 530, 2012. doi:https://doi . org/10 . . jsb . . . .[17] Roberto Trotta. Bayes in the sky: Bayesian inference and model selection in cosmology. Contemporary Physics , 49(2):71–104, 2008. arXiv:https://doi . org/10 . , doi:10 . .[18] Roberto Trotta, Farhan Feroz, Mike Hobson, Leszek Roszkowski, and Roberto Ruiz de Austri. The impact of priors andobservables on parameter inferences in the constrained mssm. Journal of High Energy Physics , 2008(12):024, 2008. URL: http://stacks . iop . org/1126-6708/2008/i=12/a=024 .[19] Tsuyoshi Ueno, Trevor David Rhone, Zhufeng Hou, Teruyasu Mizoguchi, and Koji Tsuda. Combo: An efficientbayesian optimization library for materials science. Materials Discovery , 4:18 – 21, 2016. doi:https://doi . org/10 . . md . . . .[20] Darren J. Wilkinson. Bayesian methods in bioinformatics and computational systems biology. Briefings in Bioinformatics ,8(2):109–116, 2007. arXiv:/oup/backfile/content public/journal/bib/8/2/10 . . pdf , doi:10 . .[21] Dezhen Xue, Prasanna V. Balachandran, Ruihao Yuan, Tao Hu, Xiaoning Qian, Edward R. Dougherty, and Turab Look-man. Accelerated search for batio3-based piezoelectrics with vertical morphotropic phase boundary using bayesian learning. Proceedings of the National Academy of Sciences , 113(47):13301–13306, 2016. . pnas . org/content/113/47/13301 . full . pdf , doi:10 . .1607412113