[PDF] Bayesim: a tool for adaptive grid model fitting with Bayesian inference

Abstract

Bayesian inference is a widely used and powerful analytical technique in fields such as astronomy and particle physics but has historically been underutilized in some other disciplines including semiconductor devices. In this work, we introduce Bayesim, a Python package that utilizes adaptive grid sampling to efficiently generate a probability distribution over multiple input parameters to a forward model using a collection of experimental measurements. We discuss the implementation choices made in the code, showcase two examples in photovoltaics, and discuss general prerequisites for the approach to apply to other systems.

Full PDF

BBayesim: a tool for adaptive grid model ﬁtting with Bayesian inference

Rachel Kurchin, ∗ Giuseppe Romano, and Tonio Buonassisi † Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Bayesian inference is a widely used and powerful analytical technique in ﬁelds such as astronomyand particle physics but has historically been underutilized in some other disciplines includingsemiconductor devices. In this work, we introduce bayesim , a Python package that utilizes adaptivegrid sampling to eﬃciently generate a probability distribution over multiple input parameters to aforward model using a collection of experimental measurements. We discuss the implementationchoices made in the code, showcase two examples in photovoltaics, and discuss general prerequisitesfor the approach to apply to other systems.

INTRODUCTION

There are a plethora of examples across diverse scientiﬁc and engineering ﬁelds of mathematical models used tosimulate the results of experimental observations. In many cases, there are input parameters to these models whichare diﬃcult to determine via direct measurement, and it is desirable to invert the numerical model – that is, use theexperimental observations to determine values of the input parameters. Bayesian inference, or Bayesian parameterestimation (BPE) is a fruitful framework within which to do such ﬁtting, since the resulting posterior probabilitydistribution over the parameters of interest can give rich insights into not just the most likely values of the parameters,but also uncertainty about these values and the potentially complicated ways in which they can covary to give equallygood ﬁts to observations. It has been widely used for decades in ﬁelds such as cosmology [17], particle physics [3, 18],and biology [6, 20], with well-developed and widely used code packages specialized to these purposes. [5, 16] However,the use of Bayesian inference in materials science and related contexts is much more recent, having only emerged inthe past few years. [1, 19, 21]We have previously demonstrated the value of a BPE approach in using automated high-throughput temperature-and illumination-dependent current-voltage measurements (

JV T i ) to ﬁt material/interface properties and defectrecombination parameters in photovoltaic (PV) absorbers [1, 9], i.e. as a replacement for direct experimental charac-terization via probabilitistic inversion of a forward model. In cases such as these, when the data model is not a simpleanalytical equation but rather a computationally intensive numerical model, eﬃcient, nonredundant sampling of theparameter space when computing likelihoods becomes critical to making the ﬁt feasible.In this work, we introduce bayesim , a Python-based code that utilizes adaptive grid sampling to perform BPE.It is designed to be easy to use by anyone with basic familiarity with Python. We discuss the structure of thecode, its implementation, and provide several examples of its usage. While the authors’ expertise is in the realmof semiconductor physics and thus the examples herein are drawn from that space, we also discuss the generalcharacteristics of a problem amenable to this approach so that researchers from other ﬁelds might adopt it as well.

TECHNICAL BACKGROUND

Bayes’ Theorem is a relationship between conditional probabilities. It states P ( H | E ) = P ( H ) P ( E | H ) P ( E ) , (1)where the notation P ( A | B ) indicates the probability of A being true given that B is true. H is a hypothesis and E the observed evidence . P ( H ) is termed the prior , P ( E | H ) the likelihood , P ( H | E ) the posterior , and P ( E ) is anormalizing constant. If there are n pieces of evidence, this can generalize to an iterative process where P ( H |{ E , E , ...E n } ) = P ( H |{ E , E ...E n − } ) P ( E n | H ) P ( E n ) , (2)i.e. as each new piece of evidence is fed in, the posterior probability distribution is updated to accommodate it andshould in general become more concentrated into one region of parameter space. In a multidimensional parameterestimation problem, each hypothesis H is a tuple of possible values for the ﬁtting parameters, i.e. a point in the a r X i v : . [ phy s i c s . d a t a - a n ] O c t FIG. 1. a) High-level ﬂowchart of inputs and outputs to bayesim . Observed data as a function of experimental conditionsand modeled data over those same conditions as well as ﬁtting parameters are fed in, and the code produces a probabilitydistribution over the ﬁtting parameter values. b) 2D schematic of adaptive grid sampling approach, illustrating progressivesubdivision in higher-probability regions of parameter space. parameter space, while the evidence E is an observed output of a measurement as a function of various experimentalconditions.In order to compute a likelihood, a model capable of simulating that observed output as a function of both theﬁtting parameters and the experimental conditions is required. In bayesim , likelihoods are calculated for each point inparameter space using a Gaussian where the argument is the diﬀerence between observed and simulated output at thatpoint, and the standard deviation is the sum of experimental uncertainty and model uncertainty. The experimentaluncertainty is a number provided by the user that quantiﬁes any noise/irreproducibility inherent to the measurement,while the model uncertainty is calculated by bayesim and reﬂects the sparseness of the parameter space grid, i.e. howmuch simulated output changes from one grid point to another.A high-level summary of what bayesim does is shown in Figure 1a. The ﬂowchart shows that observed (as a functionof experimental conditions { C } and simulated (as a function of ﬁtting parameters { P } and experimental conditions { C } ) outputs are compared to produce a probability distribution over { P } . Figure 1b schematically indicates theadaptive grid sampling approach for a hypothetical two-dimensional parameter space, wherein grid boxes exceedingsome threshold probability are subdivided and lower-probability regions discarded, allowing attainment of a highﬁtting precision without needing to sample the entire parameter space at the same high density. SOFTWARE ARCHITECTURE AND INTERFACE

The basic structure of bayesim is shown in Figure 2a. Detailed and up-to-date documentation of classes, functions,and example applications is maintained online. [11] The top-level object with which users interact is implemented inthe

Model class. The params module deﬁnes classes to store information about the various types of parameters (ﬁttingparameters, experimental conditions, and measured output) while the

Pmf class stores the probability distributionand implements the manipulations required for Bayesian updates.Figure 2b is a ﬂowchart of using the code, with the main inference loop surrounded by a dotted line. One ﬁrstneeds to provide the observed and modeled data. Observed data is accepted in HDF5 format, while modeled datacan be an HDF5 ﬁle (in which case bayesim will determine the ﬁtting parameter space grid from this ﬁle), or if the

FIG. 2. a) bayesim software structure. b) Detailed ﬂowchart of use of bayesim , with calculation steps and various levels ofiteration indicated. Boxes in blue indicate steps that can or must be done explicitly by the user, grey steps that are done bythe code with behavior tunable by the user, orange diamonds control points for staying in or leaving a loop, and green the ﬁnaloutput. ﬁtting parameter space has been predeﬁned, one can also attach a Python function that computes the model data(e.g. an interface to a numerical model). Observed data should have associated experimental uncertainties, or theuser can indicate a ﬁxed value of uncertainty to use for all observations upon attaching the data.Once modeled data is attached, model uncertainty can be computed. At each point in the space of experimentalconditions, a matrix of the simulated output variable as a function of the ﬁtting parameters is constructed and ateach point in this matrix, the largest diﬀerence along any index is computed. This is set as the model uncertainty forthis set of experimental conditions at this point in parameter space.At this point, the Bayesian inference loop can proceed. We start with a bounded uniform prior distribution overthe entire parameter space grid and the observed data in a randomized order. We feed in one piece of evidence(i.e. one observed data point) and compute the likelihood using a Gaussian. The standard deviation for each pointin the parameter space is the sum of the experimental uncertainy of that observation and the model uncertaintyat that point. The posterior distribution is computed as a Bayesian update on the prior. Next, we check whetherthe threshold condition for concentration of probability (deﬁned as at least some percentage (by default 90%) of theprobability mass residing in at most some percentage (by default 5%) of the parameter space volume. If this thresholdis satisﬁed, the current posterior is stored and the prior reset to a uniform distribution. If not, the next observation isfed in. This procedure is repeated until the requisite number (by default, 80%) of observed data points has been used,with the ﬁnal posterior being the average of every stored posterior. The posterior is calculated in this way (ratherthan a Bayesian multiplication through every single observed point) to avoid numerical errors that can arise if verysmall numbers are multiplied together. This procedure of running the Bayesian loop multiple times until the desirednumber of pieces of evidence are used is encapsulated in the

Model.run() function.Now the grid can be subdivided. The user deﬁnes a threshold probability (by default 0.001) for a grid box to possess,and all boxes with greater than this probability, as well as all their immediate neighbors, are divided in two alongevery parameter. bayesim saves a list of the new model calculations that need to be done for inference to proceedagain, and the user provides these, again either as a callable or an HDF5 ﬁle. For the next round of inference, theuser can also deﬁne a “relaxation” parameter from 0 to 1.0 (default 0) that will mix in some fraction of the posteriorfrom the previous step with the uniform distribution over this new grid.This overall procedure can be repeated as many times as the user wishes to get a more precise ﬁt – eventually, theconcentration of probability becomes limited by experimental rather than model uncertainty. There is also an optionto deﬁne a minimum width for a box along any parameter direction, and if a box is already less than this width, bayesim will not subdivide in that direction.The most ﬂexible way to interact with bayesim is via Python scripting or through literate programming in a Jupyternotebook. There is also a command line interface (CLI) for users less familiar with coding in Python planned forfuture release. bayesim relies on a variety of external open-source packages. These include numpy [15] and scipy [8] for a varietyof mathematical functions and vectorized implementations, joblib [4] for simple parallelism, deepdish [13] for savingand loading HDF5 ﬁles, pandas [14] for data manipulation, and matplotlib [7] for visualization.

APPLICATION EXAMPLESIdeal Diode Model

As a ﬁrst example, we ﬁt a simple two-parameter model of solar cell current density J (as a function of voltage V and temperature T ) known as the ideal diode model: J ( V, T ) = J L + J (cid:18) exp qVnkT − (cid:19) , (3)where k = 8 . × − eV/K is Boltzmann’s constant, by convention J L (the light current) is negative and J (the saturation current) is positive but strongly dependent on temperature, a dependence we can approximate as: J ≈ B (cid:48) T /n exp (cid:18) − E g nkT (cid:19) , (4)where E g is the zero-temperature bandgap of the material. For this example, we will set this to 1.2 eV and J L to-0.03 mA/cm . There are thus two parameters to be ﬁt, the coeﬃcient B (cid:48) (which can range over a wide variety ofvalues) and the ideality factor n (which is by deﬁnition between 1 and 2).As validation, we generate the “observations” using the same model we’ll use for ﬁtting, with parameter values of n = 1 .

36 and B (cid:48) = 258, and four temperature values of 150, 200, 250, and 300 K. These “observations” are shownin Figure 3a. When bayesim imports observed data, it can optionally discard data points that are very close by toeach other in order to reduce computational load on modeling less informative points – the observed points that wereactually imported (59 out of 305 total “observations”) are shown as overlaid scatter dots.Figure 3b shows the posterior distributions after the ﬁrst inference run and after each round of subdivision,demonstrating that bayesim gets progressively closer to the “true” values as the loop continues iterating. In addition,on the ﬁrst inference run (prior to any subdivision), an average of 45% of grid points had larger model uncertaintythan experimental uncertainty, while after the second subdivision, only 1.9% did, while the average absolute error ofthe highest-probability model decreased from 0.03 after the ﬁrst run to 0.007 after the second subdivision.In addition, the initial grid had 400 boxes, the singly subdivided grid 164, and the ﬁnal grid (4x initial resolution)had 144, meaning the overall additional computational eﬀort to quadruple the ﬁt precision was less than producing theinitial simulation set. In comparison, using a “brute-force” grid approach, reaching 4x precision would take 4 n timesthe computational load in n dimensions, i.e. 16x for this case. The exact factor of improvement is problem-dependent,but in all cases the load is signiﬁcantly less than the exponential scaling of the prior approach. SnS Solar Cell

This example demonstrates a more realistic application of bayesim , i.e. when the model being used to computelikelihoods is more computationally expensive. In this case, we will repeat the analysis from Ref. [1], using SCAPS [2](a coupled drift-diﬀusion/Poisson solver) as the model, which takes 10-15 core-seconds for a single J-V sweep.We start on a signiﬁcantly coarser grid than Ref. [1] and show only one level of subdivision to make the examplefeasible to run on a personal computer. Figure 4 shows the results from this ﬁt done in bayesim compared tothe higher-resolution grid analysis done previously. Note that the inference step in full-resolution grid took a fewhours, while the adaptive grid ﬁt takes a few minutes. While the ﬁt precision is of course lower at only one levelof subdivision, the extracted values are generally in agreement and more importantly, the relationship between themobility and lifetime (namely, that the product, i.e. the diﬀusion length), is constrained, emerges just as clearly. The

FIG. 3. a) “Observations” generated by ideal diode model, Equation 3, with overlaid scatter to show which points were actuallyimported and used by bayesim , b) PMF’s after the ﬁrst inference run, second (one subdivision), and third (two subdivision),in blue, orange, and green, respectively. Values used to generate data indicated in red.

FIG. 4. PMFs produced by bayesim , after initial run and after one subdivision (blue and green, respectively). In red, PMFfrom Reference [1]. Inset: device structure (illuminated from above) with bulk SnS ( µ , τ ﬁt) and SnS/Zn(O,S) interface (∆ E c , S eﬀ ﬁt) indicated. diﬀerences in ﬁtted values stem from the fact that the prior published analysis didn’t account for model uncertainty,while this feature is included in bayesim . CONCLUSIONS

We have introduced bayesim , a Python-based adaptive grid sampled Bayesian inference code which is ﬂexible tothe model used to calculated likelihood as well as the parameter space over which to ﬁt. We provided two examples ofapplication of the code in the space of characterizing PV devices using JVT(i) measurements, as this is the authors’area of expertise. However, nothing about the code or the approach limits it to application in PV materials. Anysituation where there is: • a forward model that captures the relevant physics (particularly one that is relatively expensive to evaluate),and • a set of measurements of the output of that model as a function of several experimental conditionswould be amenable to bayesim . The online documentation page [11] maintains a list of examples which will growas the authors become aware of/implement more. The latest version of the code can always be found in the Githubrepository [10], and the most recent stable and thoroughly-tested version is installable via PyPI [12]. The onlinedocumentation also hosts other useful information, such as detailed documentation of every class and function in bayesim , a more extensive technical background section than is included here, a list of planned future features, andan updated manual for usage of the code. ACKNOWLEDGEMENTS

R.C.K. acknowledges the funding of a Blue Waters Graduate Fellowship. This work was supported by a TOTALSA research grant funded through MITei, the MIT Energy Initiative. ∗ [email protected] † [email protected][1] Riley E Brandt, Rachel C Kurchin, Vera Steinmann, Daniil Kitchaev, Chris Roat, Sergiu Levcenco, Gerbrand Ceder,Thomas Unold, Tonio Buonassisi, and Helmholtz Zentrum Berlin. Rapid semiconductor device characterization throughBayesian parameter estimation. Joule , 1(4):843–856, 2017. doi:10 . . joule . . . .[2] Marc Burgelman, P Nollet, and S Degrave. Modelling polycrystalline semiconductor solar cells. Thin Solid Films , 361–362:527–532, 2000. doi:10 . .[3] Jorge de Blas, Marco Ciuchini, Enrico Franco, Diptimoy Ghosh, Satoshi Mishima, Maurizio Pierini, Laura Reina, andLuca Silvestrini. Global bayesian analysis of the higgs-boson couplings. Nuclear and Particle Physics Proceedings , 273-275:834 – 840, 2016. 37th International Conference on High Energy Physics (ICHEP). doi:https://doi . org/10 . . nuclphysbps . . . .[4] Joblib developers. Joblib. https://joblib . readthedocs . io/en/latest/ .[5] F. Feroz, M. P. Hobson, and M. Bridges. Multinest: an eﬃcient and robust bayesian inference toolfor cosmology and particle physics. Monthly Notices of the Royal Astronomical Society , 398(4):1601–1614,2009. arXiv:/oup/backfile/content public/journal/mnras/398/4/10 . . . . . x/2/mnras0398-1601 . pdf , doi:10 . . . . . x .[6] John P. Huelsenbeck, Fredrik Ronquist, Rasmus Nielsen, and Jonathan P. Bollback. Bayesian inference of phylogeny and itsimpact on evolutionary biology. Science , 294(5550):2310–2314, 2001. arXiv:http://science . sciencemag . org/content/294/5550/2310 . full . pdf , doi:10 . . .[7] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing In Science & Engineering , 9(3):90–95, 2007. doi:10 . . . .[8] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientiﬁc tools for Python, 2001. URL: . scipy . org/ .[9] Rachel C. Kurchin, Jeremy R. Poindexter, Daniil Kitchaev, Ville V¨ah¨anissi, Carlos del Ca˜nizo, Liu Zhe, Hannu S. Laine,Chris Roat, Sergiu Levcenco, Gerbrand Ceder, and Tonio Buonassisi. Semiconductor parameter extraction via current-voltage characterization and bayesian inference methods. In Proceedings of the 2018 7th World Conference on PhotovoltaicEnergy Conversion . IEEE, 6 2018.[10] Rachel C. Kurchin and Giuseppe Romano. Bayesim. https://github . com/pv-lab/bayesim .[11] Rachel C. Kurchin and Giuseppe Romano. bayesim. https://pv-lab . github . io/bayesim/ build/html/index . html , 2018.[12] Rachel C. Kurchin and Giuseppe Romano. bayesim. https://pypi . org/project/bayesim , 2018.[13] Gustav Larsson and Mark Stoehr. deepdish. https://github . com/uchicago-cs/deepdish .[14] Wes McKinney. Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference ,pages 51–56, 2010. [15] Travis E. Oliphant.

A guide to NumPy . Trelgol Publishing, 2006.[16] Sjors H.W. Scheres. Relion: Implementation of a bayesian approach to cryo-em structure determination.

Journal ofStructural Biology , 180(3):519 – 530, 2012. doi:https://doi . org/10 . . jsb . . . .[17] Roberto Trotta. Bayes in the sky: Bayesian inference and model selection in cosmology. Contemporary Physics , 49(2):71–104, 2008. arXiv:https://doi . org/10 . , doi:10 . .[18] Roberto Trotta, Farhan Feroz, Mike Hobson, Leszek Roszkowski, and Roberto Ruiz de Austri. The impact of priors andobservables on parameter inferences in the constrained mssm. Journal of High Energy Physics , 2008(12):024, 2008. URL: http://stacks . iop . org/1126-6708/2008/i=12/a=024 .[19] Tsuyoshi Ueno, Trevor David Rhone, Zhufeng Hou, Teruyasu Mizoguchi, and Koji Tsuda. Combo: An eﬃcientbayesian optimization library for materials science. Materials Discovery , 4:18 – 21, 2016. doi:https://doi . org/10 . . md . . . .[20] Darren J. Wilkinson. Bayesian methods in bioinformatics and computational systems biology. Brieﬁngs in Bioinformatics ,8(2):109–116, 2007. arXiv:/oup/backfile/content public/journal/bib/8/2/10 . . pdf , doi:10 . .[21] Dezhen Xue, Prasanna V. Balachandran, Ruihao Yuan, Tao Hu, Xiaoning Qian, Edward R. Dougherty, and Turab Look-man. Accelerated search for batio3-based piezoelectrics with vertical morphotropic phase boundary using bayesian learning. Proceedings of the National Academy of Sciences , 113(47):13301–13306, 2016. . pnas . org/content/113/47/13301 . full . pdf , doi:10 . .1607412113