[PDF] Removing background and estimating a unit height of atomic steps from a scanning probe microscopy image using a statistical model

Abstract

We present a statistical method to remove background and estimate a unit height of atomic steps of an image obtained using a scanning probe microscope. We adopt a mixture model consisting of multiple statistical distributions to describe an image. This statistical approach provides a comprehensive way to subtract a background surface even in the presence of atomic steps as well as to evaluate terrace heights in a single framework. Moreover, it also enables us to extract further quantitative information by introducing additional prior knowledge about the image. An example of this extension is estimating a unit height of atomic steps together with the terrace heights. We demonstrate the capability of our method for a topographic image of a Cu(111) surface taken using a scanning tunneling microscope. The background subtraction corrects all terraces to be parallel to a horizontal plane and the precision of the estimated unit height reaches the order of a picometer. An open-source implementation of our method is available on the web.

Full PDF

RRemoving background and estimating a unit height of atomic steps from a scanningprobe microscopy image using a statistical model

Yuhki Kohsaka

RIKEN Center for Emergent Matter Science, Wako, Saitama 351-0198,Japan a) (Dated: 2 March 2021) We present a statistical method to remove background and estimate a unit height ofatomic steps of an image obtained using a scanning probe microscope. We adopt amixture model consisting of multiple statistical distributions to describe an image.This statistical approach provides a comprehensive way to subtract a backgroundsurface even in the presence of atomic steps as well as to evaluate terrace heightsin a single framework. Moreover, it also enables us to extract further quantitativeinformation by introducing additional prior knowledge about the image. An exampleof this extension is estimating a unit height of atomic steps together with the terraceheights. We demonstrate the capability of our method for a topographic image ofa Cu(111) surface taken using a scanning tunneling microscope. The backgroundsubtraction corrects all terraces to be parallel to a horizontal plane and the preci-sion of the estimated unit height reaches the order of a picometer. An open-sourceimplementation of our method is available on the web. a) Electronic mail: [email protected] a r X i v : . [ phy s i c s . i n s - d e t ] F e b . INTRODUCTION Scanning probe microscopes are ubiquitously used to measure and analyze various prop-erties . The probe is driven typically by piezo actuators that are deformed by applyingvoltages. This actuation mechanism realizes high spatial resolution down to atomic reso-lution. At the same time, however, observed images are often distorted due to non-linearresponse, hysteresis, and creep of piezo actuators as well as thermal drift and ﬂuctuationsin the environment around a scanning head. Such distortions are obstacles to subsequentanalysis and must be corrected.The image correction in the surface-normal direction, or background subtraction, is usu-ally performed using a least-squares ﬁt. It is simple and well-deﬁned in the absence of atomicsteps. In the presence of atomic steps, however, even such an elementary subtraction cannot be done in a straightforward manner. One needs to classify image pixels as belongingto one of the terraces in advance of the background subtraction. Even if one can do so,a subtracted result is subject to the rather arbitrary classiﬁcation of pixels. Nevertheless,the background subtraction in the presence of atomic steps is of quantitative importance toevaluate the heights of atomic steps, which not only carry a crucial information to identifysurface terminations but also are exploited to calibrate piezo actuators in the surface-normaldirection.Here we present a statistical method for the background subtraction working regardless ofthe presence or absence of atomic steps. The key idea is a mixture model consisting of a linearcombination of multiple statistical distributions to describe an image. By incorporating abackground surface into the model as parameters of the distributions, the labeling of terraces,subtraction of a background surface, and estimation of terrace heights are all simultaneouslyachieved as a result of maximum likelihood estimation. Moreover, the statistical approachenables us to estimate another quantity by introducing a prior information. We estimatea unit height of atomic steps as an exemplary extension. The unit height is an inter-planedistance between adjacent planes normal to the surface and is a dimension necessary forthe calibration of the piezo actuator in the surface-normal direction. We apply our methodto a topographic image of a Cu(111) surface taken with a scanning tunneling microscope.We demonstrate that the background is well subtracted and that the unit height can beestimated with high precision at the picometer order. Our code, implemented not only for2he demonstration but also for generic images and written in Python, is available as anopen-source software . II. METHODSA. Model

To formulate our model, we begin with a simple situation: no step, no tilt, and nodistortion (Fig. 1(a)). Corrugations in an image then solely stem from atoms, impurities,and random noise. The heights distribute around a central value and a histogram of theheights can be approximated by the probability density function of a statistical distribution f (Fig. 1(b)). As a basic assumption, we regard that height at each pixel is independentlygenerated from f . Let t n be height at n -th pixel of an image, where n is numbered from 1to N in the chronological order of a measurement. N is the total number of pixels in theimage. The likelihood function to obtain the image is p ( t | µ, σ ) = N (cid:89) n =1 f ( t n | µ, σ ) , (1)where t = ( t , · · · , t N ) (cid:62) is a vector representation of the image, and µ and σ are the locationand scaling parameters of f , respectively. Although f can generally have more parameters,we consider the normal distribution and the Cauchy distribution in this paper for simplicity,written as f ( t n | µ, σ ) =  πσ ) / exp (cid:26) − ( t n − µ ) σ (cid:27) (normal)1 π σ ( t n − µ ) + σ (Cauchy) (2) µ and σ mean height of the terrace and magnitude of corrugations of the terrace, respectively.The normal distribution is considered because of its relevance to least-squares ﬁt and theCauchy distribution is chosen as a heavy-tail distribution, the importance of which we willsee later.Next we consider steps and terraces. Suppose that there are M terraces diﬀerent in heightin the image (Fig. 1(c) and Fig. 1(d)). Steps and terraces are introduced by extending Eq. (1)to a linear combination of f , p ( t | µ , σ , π ) = N (cid:89) n =1 M (cid:88) m =1 π m f ( t n | µ m , σ m ) , (3)3 σ μ μ μ σ σ σ g log g poly (a)(c)(e) (b)(d) FIG. 1. A schematic introduction of our model. The blue curves in (a), (c), and (e) representtopographic images. (a) No step, no tilt, and no distortion. (c) Steps are included in (a). (e) Atilt and a distortion are included in (c). The orange broken lines depict the tilt and the distortion.(b), (d) Histograms of (a) and (c), respectively. The orange curves indicate the distributions f inEq. (1) and (cid:80) m π m f in Eq. (3), respectively. where µ = ( µ µ · · · µ M ) (cid:62) and σ = ( σ σ · · · σ M ) (cid:62) are the location and scale parametersof terraces, respectively. π = ( π π · · · π M ) (cid:62) are mixing coeﬃcients satisfying constraints π m > , M (cid:88) m =1 π m = 1 . (4)Now we include a tilt and distortions. We consider positional and chronological terms(Fig. 1(e)). As positional terms, we adopt a polynomial surface, g poly ( w ; r n ) = w (cid:62) φ ( r n ),where w = ( w , w , w , · · · ) (cid:62) and φ ( x, y ) = (1 , x, y, x , xy, y , · · · ) (cid:62) are polynomial coeﬃ-cients and bases, respectively, and r n = ( x n , y n ) (cid:62) is the lateral position of the n -th pixel.The linear term denotes a tilt and the higher-order terms represents non-linear distortions.Chronological terms describe creep of a piezo actuator diminishing logarithmically in time , g log ( A , τ ; n ) = (cid:80) Jj =1 A j ln( n + τ j ), where A = ( A A · · · A J ) (cid:62) and τ = ( τ τ · · · τ J ) (cid:62) p ( t | θ ) = N (cid:89) n =1 M (cid:88) m =1 π m f mn , (5) f mn = f ( t n | µ m + g poly ( w ; r n ) + g log ( A , τ ; n ) , σ m ) , (6)where θ is a set of parameters, θ = { µ , σ , w , A , τ , π } , introduced for brevity of notation.The degree that n -th pixel belongs in terrace m is given by the responsibility , γ mn = π m f mn (cid:80) Mm (cid:48) =1 π m (cid:48) f m (cid:48) n . (7)The responsibility not only works to label terraces but also plays an important role forcalculation of the maximum likelihood estimation as described below. B. Background subtraction and unit height of atomic steps

Background subtraction, or removal of the tilt and distortions, is achieved by t n − g poly ( ˆ w ; r n ) − g log ( ˆ A , ˆ τ ; n ) , (8)where the parameters with the hat ( ˆ w , ˆ A , and ˆ τ ) are those maximizing the likelihoodfunction Eq. (5), i.e. , the maximum likelihood estimate,ˆ θ ≡ { ˆ µ , ˆ σ , ˆ w , ˆ A , ˆ τ , ˆ π } = arg max θ p ( t | θ ) . (9)Since all the parameters are simultaneously optimized, we can estimate terrace heights andlabel terraces together with the background subtraction. When M = 1 and f is a normaldistribution, this procedure is equivalent to a standard least-squares ﬁt . In this sense, ourmodel includes least-squares ﬁt as a special case.To estimate a unit height of atomic steps, we introduce a prior probability for µ m . Since µ m is a height of a terrace, it is expected to be close to an integer multiple of a unit height c with an appropriate oﬀset µ , µ m ∼ j m c + µ , (10)where j m is an integer. This is equivalent tocos 2 π ( µ m − µ ) c ∼ . (11)5herefore, a prior probability for µ m suited to estimate c is the von Mises distribution, p ( µ m | c , ϕ , κ ) = 12 πI ( κ ) exp (cid:26) κ cos (cid:18) πµ m c − ϕ (cid:19)(cid:27) , (12)where ϕ and κ are the location parameter and the concentration parameter, respectively,and I the modiﬁed Bessel function of order 0. κ is a hyperparameter controlling strengthof periodicity of µ m . An estimate of c is obtained by maximizing the posterior probability, (cid:110) ˆ θ , ˆ c , ˆ ϕ (cid:111) = arg max θ ,c ,ϕ p ( t | θ ) M (cid:89) m =1 p ( µ m | c , ϕ , κ ) . (13) C. Optimization

The technical goals for the background subtraction and the unit height estimation are toﬁnd a maximum likelihood solution (Eq. (9)) and a maximum posterior solution (Eq. (13)),respectively. The expectation-maximization (EM) algorithm, which is an iterative methodrepeating an expectation (E) step and a maximization (M) step until convergence, is apowerful method for this purpose . We outline the procedure of optimization using theEM algorithm speciﬁcally as follows.1. Choose initial parameters. This is critical to prevent the iteration loop below frombeing caught by a meaningless local maximum. As it is not trivial to directly determinea set of parameters (cid:8) w (1) , µ (1) , σ (1) , π (1) (cid:9) , instead deﬁne an initial responsibility γ (0) mn by setting γ (0) mn = 1 if n -th pixel is likely to belong in terrace m . Details are providedin Appendix A.2. Calculate the objective function, which is log likelihood ln p ( t | θ ) for the backgroundsubtraction and log posterior ln p ( t | θ ) p ( µ | c , ϕ , κ ) for the unit height estimation.3. E step. Calculate the responsibility γ mn (Eq. (7)) with the current set of parameters θ ( i ) , where i is the iteration index starting from 1.4. M step. For the background subtraction, ﬁnd a new set of parameters given by θ ( i +1) = arg max θ L ( θ , γ mn ) , (14) L ( θ , γ mn ) = M (cid:88) m =1 N (cid:88) n =1 γ mn { ln π m + ln f mn } + λ (cid:32) M (cid:88) m =1 π m − (cid:33) , (15)6here λ is a Lagrange multiplier to take account of the constraint Eq. (4). For theunit height estimation, new parameters are (cid:110) θ ( i +1) , c ( i +1)0 , ϕ ( i +1)0 (cid:111) = arg max θ ,c ,ϕ (cid:40) L ( θ , γ mn ) + N M (cid:88) m =1 ln p ( µ m | c , ϕ , κ ) (cid:41) . (16)A solution is obtained using a gradient ascent method. We adopt ADAM . For thebackground subtraction, a solution of Eq. (14) is analytically obtained when g log = 0.This solution is practically important as an initial value for general cases because itcan be obtained quickly. See Appendix B for details.5. Recalculate the objective function. If the value does not satisfy a convergence criteria,return to (3). D. Experiments

The Cu(111) surface for the demonstration below was prepared in a ultra-high vacuumchamber by repeating Ar sputtering and subsequent annealing at 770 K. We measure it witha scanning tunneling microscope at 4.3 K. The tip was electrochemically etched tungstenwire cleaned by electron beam heating and a ﬁeld ion microscope. All heights in the followingare nominal values with a rough calibration using the conventional method of line proﬁles.

III. RESULTS AND DISCUSSION

To test and demonstrate the capability of our method, we apply it to an image measuredwith a scanning tunneling microscope. Figure 2(a) shows a constant-current image of aCu(111) surface taken in a ﬁeld of view of 200 nm . The terraces 5 and 5 (cid:48) are supposedto be the same height but shown in diﬀerent colors due to a tilt of the sample. Creep ofthe piezo actuator in the direction normal to the surface is prominent around the bottomof the image where the scan started. Both the tilt and the creep are highlighted in the lineproﬁle shown in Fig. 2(b). Because of the tilt and the distortion, the histogram of the image(Fig. 2(c)) shows four broad peaks even though ﬁve terraces diﬀerent in height exist in theimage.The initial responsibility γ (0) mn (Sec. II C) is set by clustering pixels such that each clusterconsists of pixels in a terrace. This is done by clustering neighboring pixels if their diﬀerence7 FIG. 2. (a) A 200 nm constant-current image of Cu(111). The sample bias voltage is -0.1 V andthe tunneling current is 10 pA. The numbers denote terraces. The line shows a trajectory of a lineproﬁle of (b) and the arrow indicates the direction in which the line proﬁle is sampled. (b) A lineproﬁle taken along the line shown in (a). (c) A histogram of the image shown in (a). in height is less than a threshold (Fig. 3). γ (0) mn at the pixels of the terrace 5 (cid:48) is intentionallyleft at 0 so that we can verify that the pixels of the terraces 5 and 5 (cid:48) are classiﬁed in thesame class.With the initial responsibility, we examine several diﬀerent models. The models areconstructed by combinations of a normal distribution or a Cauchy distribution for the modeldistribution, a linear plane or a quadratic surface for the polynomial term, and no or twologarithmic decay terms . Figure 4 shows the optimized responsibility γ mn (Eq. (7)) of thediﬀerent models. At colored pixels, γ mn exceeds a threshold for a certain m that is depictedby the color code, whereas γ mn does not exceed the threshold for any m at white pixels. The8 (0)1 n γ (0)2 n γ (0)3 n γ (0)4 n γ (0)5 n FIG. 3. The initial responsibility γ (0) mn obtained by clustering. Pixels set to 1 are shown in black.The clustering threshold is 10 pm. threshold is chosen to be 0.99, meaning that a colored pixel is suggested to belong almostexclusively in a single terrace. We note that the terraces 5 and 5 (cid:48) are shown in the samecolor and the other terraces are shown in diﬀerent colors. The independence of this featureto the details of the models indicates the aptitude of the statistical approach.Meanwhile, the diﬀerences between γ mn shown in Fig. 4 suggest that the model distribu-tion must be chosen carefully. A behavior expected for an appropriate model is that pixelsaround impurities and steps are not classiﬁed exclusively into a terrace. The two distribu-tions are clearly diﬀerent in this regard even for the simplest background, a linear plane andno decay as shown in Fig. 4(a) and 4(b). White pixels are found as expected in Fig. 4(b).Moreover, given the creep of the piezo actuator shown in Fig. 2(b), the white pixels aroundthe bottom of Fig. 4(b) are also expected for models without the decay. In contrast, whitepixels are barely found in Fig. 4(a) and impurities in the terrace 4 (red) are classiﬁed intothe terrace 5 (purple). The deviation from the requisite behaviors found in Fig. 4(a) isascribed to sensitiveness of the normal distribution to outliers. The diﬀerences between thetwo distributions are enhanced by including more parameters in the model. For the Cauchydistribution, the white pixels near the top-right corner of Fig. 4(b) disappear by including aquadratic surface as shown in Fig. 4(d) and those around the bottom disappear by furtherincluding logarithmic decays as shown in Fig. 4(f) while pixels around impurities and steps9 m (a) (b)Normal Cauchy po l y ( ) po l y ( ) po l y ( ) , l og (c) (d)(e) (f) FIG. 4. The optimized responsibility γ mn for diﬀerent models. The model distribution is a normaldistribution for the left column (a, c, and e) and a Cauchy distribution for the right column (b,d, and f). The background is a linear plane for the top row (a and b), a quadratic surface for themiddle row (c and d), and a quadratic surface and log decays for the bottom row (e and f). Thecolor code denotes m satisfying γ mn > .

99. Namely, γ mn > .

99 for a certain m at colored pixels, γ mn ≤ .

99 for all m at white pixels. are kept white. For the normal distribution, however, such systematic improvements are notrealized (Fig. 4(c) and 4(e)). These distinct behaviors suggest that a heavy-tail distributionis suited as a model distribution. Hereafter we focus on the model of Fig. 4(f).A background-subtracted image (Eq. (8)) obtained from the model of Fig. 4(f) is shownin Fig. 5(a). The background is well subtracted as highlighted in the ﬂat line proﬁle of10 .10.0-0.1 H e i gh t ( n m ) P r obab ili t y -0.4 -0.2 0.0 0.2 0.4Height (nm)-0.4-0.20.00.2 H e i gh t ( n m )

50 nm(a)(b)(c)

FIG. 5. (a) A background-subtracted image. The model is the same as that of Fig. 4(e). The rawimage is the same as that of Fig. 2(a). The line shows the same trajectory shown in Fig. 2(a). (b)A line proﬁle sampled along the line and in direction of the arrow shown in (a). The range of thevertical axis is the same as that of Fig. 2(b). (c) A histogram of the image shown in (a). The binwidth is the same as that of Fig. 2(c).

Fig. 5(b). The histogram of the subtracted image (Fig. 5(c)) shows sharp 5 peaks at almostevenly spaced heights. The stark contrast between the histograms of Fig. 2(c) and Fig. 5(c)signiﬁes the importance of background subtraction for estimating the unit height of steps.To ﬁnd an appropriate range of the concentration parameter κ for the estimation ofEq. (13), we performed a grid search of the estimated unit height ˆ c together with theestimated terrace height ˆ µ for various κ and initial value of the unit height c (1)0 . The resultfor ˆ c is shown in Fig. 6(a) and 6(b), and for ˆ µ in Fig. 6(c) where we plot a measure of11 .220.200.18 c ( ) ( n m ) c ( n m )ˆ c ( ) ( n m ) c ( n m )ˆ c ( ) ( n m ) -15 -10 -5 0 5log κ D ( p m ) -15 -10 -5 0 5log κ (a)(b)(c) (d)(e)(f) FIG. 6. (a), (d) The estimated unit height of steps. The models are the same as used for ﬁgures 4(f)and 4(e), respectively. (b), (e) The same data as (a) and (d), respectively. The range of color scaleis adjusted to emphasize variations near the central value. (c), (f) The diﬀerence between ˆ µ and µ (1) deﬁned in Eq. (17). The models are the same as those of (a) and (d), respectively. A commonrange of color scale is used for each row of panels. diﬀerence between ˆ µ and the initial value of µ deﬁned by D = (cid:16)(cid:12)(cid:12) ˆ µ − µ (1) (cid:12)(cid:12) /M (cid:17) / . (17)We used the maximum likelihood estimate of µ obtained for Fig. 4(f) as µ (1) . For small κ ( κ < − ), the estimated results are almost the same as the initial values, ˆ c ∼ c (1)0 and D ∼

0, because little prior information is provided ( p ( µ | c , ϕ , κ ) ∼ I ( κ )). For large κ ( κ > ), ˆ c exhibits preference for c (1)0 probably because the optimization process stoppedat a local maximum as a result of strong coupling between c and µ . In the wide middlerange of κ (10 − < κ < ), ˆ c is insensitive to both κ and c (1)0 . This robustness enablesus to safely conclude that the estimated unit height for this image is ∼ c and D in the range of 10 < κ < reﬂect increase of coupling between c and µ . The conclusion is unchanged even if we use the model of Fig. 4(d), as shown inFig. 6(d)–(f). However, the parameter range where ˆ c is robustly estimated in Fig. 6(d) isnarrower than that in Fig. 6(a). The robustness therefore provides an indication whether12

00 nm 50 nm50 nm50 nm 100 nm 50 nm100 nm50 nm 100 nm50 nm(a) (b) (c) (d) (e)(f) (g) (h) (i) (j)

FIG. 7. Images used to examine variation of the estimated unit height. The top panels (a)-(e) areraw images and the bottom panels (f)-(j) are corresponding subtracted images. The scan areasare 400 nm for (c), (d), (h), and (i), and 200 nm for the rest. The estimated unit heights are0.204 nm for (a), 0.205 nm for (b), and 0.206 nm for (c)-(e). the model is reﬁned enough.We note that the estimated unit height is not accurate because of the rough calibrationof the piezo actuator used for the measurement but indicates the precision of our methodfor a given image. To examine variation of the estimated unit height, we repeated the sameanalysis for diﬀerent images taken using diﬀerent tips but at the same temperature (Fig. 7).We ﬁnd that the unit height is estimated to be 0.205 ± . IV. CONCLUSION

We have demonstrated a statistical method to remove background and to estimate a unitheight of steps from a topographic image of a scanning tunneling microscope. Our methoddescribes an image using a mixture model. Corrugations of a terrace are represented by astatistical distribution and multiple terraces are expressed by a linear combination of thedistributions. This statistical approach enables us to label terraces, subtract background,and estimate terrace heights. Since nothing speciﬁc to scanning tunneling microscopesis assumed, our method can be generally applicable to topographic images taken by any13ther scanning probe microscopes. Another virtue of our method is that additional priorknowledge is easily incorporated into the model. We have estimated a unit height of atomicsteps using this extension and showed that the precision of the estimated unit height reachesthe order of a picometer. This is just an example and further extensions are possible. Forexample, when a thin ﬁlm is grown on a substrate, ﬁlm thickness can be evaluated byintroducing another prior distribution. We therefore believe that our method not onlyimproves the subtraction method and calibration precision of scanning probe microscopesbut also provides a framework to extract quantitative information.

DATA AVAILABILITY

The data that support the ﬁndings of this study are available from the correspondingauthor upon reasonable request.

ACKNOWLEDGMENTS

The author appreciates Christopher J. Butler for valuable suggestions and critical read-ing of the manuscript and also acknowledges discussions with Tetsuo Hanaguri, TadashiMachida, and Yuuki Yasui. This work was supported by JSPS KAKENHI Grant number20H05280, and by MEXT as ”Program for Promoting Researches on the SupercomputerFugaku” (Basic Science for Emergence and Functionality in Quantum Matter –InnovativeStrongly-Correlated Electron Science by Integration of ”Fugaku” and Frontier Experiments–) (Project ID: hp200132).

Appendix A: Initial parameters

Initial parameters for the EM iteration loop are determined from the initial responsibility γ (0) mn as follows.1. Calculate average height of each terrace for pixels where γ (0) mn = 1, µ (0) m = (cid:80) Nn =1 γ (0) mn t n (cid:80) Nn =1 γ (0) mn (A1)14. Calculate the least-squares solution of w for each terrace for pixels where γ (0) mn = 1 byusing Eq. (B4), (B5), and (B6), w (0) m = (cid:0) A (0) m (cid:1) − b (0) m , (A2) A (0) m = N (cid:88) n =1 γ (0) mn φ ( r n ) φ ( r n ) (cid:62) , (A3) b (0) m = N (cid:88) n =1 γ (0) mn (cid:0) t n − µ (0) m (cid:1) φ ( r n ) . (A4)3. An initial value of w is given as a weighted average of w (0) m , w (1) = (cid:80) Mm =1 (cid:80) Nn =1 γ (0) mn w (0) k (cid:80) Mm =1 (cid:80) Nn =1 γ (0) mn (A5)4. Initial values of µ m , σ m , and π m are obtained as the least-squares solution by usingEq. (B1), (B2), and (B3), µ (1) m = (cid:80) Nn =1 γ (0) mn (cid:8) t n − w (1) (cid:62) φ ( r n ) (cid:9)(cid:80) Nn =1 γ (0) mn , (A6) σ (1) m =  (cid:80) Nn =1 γ (0) mn (cid:110) t n − µ (1) m − w (1) (cid:62) φ ( r n ) (cid:111) (cid:80) Nn =1 γ (0) mn  / , (A7) π (1) m = (cid:80) Nn =1 γ (0) mn (cid:80) Mm =1 (cid:80) Nn =1 γ (0) mn , (A8) Appendix B: Analytical solution

Since we consider either a normal distribution or a Cauchy distribution as the modeldistribution f , the solution of Eq. (14) is analytically obtained by solving ∇ θ L = 0 if the15hronological distortion g log = 0. When f is a normal distribution, the solution is µ m = (cid:80) Nn =1 γ mn (cid:8) t n − w (cid:62) φ ( r n ) (cid:9)(cid:80) Nn =1 γ mn , (B1) σ m = (cid:34) (cid:80) Nn =1 γ mn (cid:8) t n − µ m − w (cid:62) φ ( r n ) (cid:9) (cid:80) Nn =1 γ mn (cid:35) / , (B2) π m = (cid:80) Nn =1 γ mn (cid:80) Mm =1 (cid:80) Nn =1 γ mn , (B3) w = A − b , (B4) A = M (cid:88) m =1 N (cid:88) n =1 γ mn φ ( r n ) φ ( r n ) (cid:62) , (B5) b = M (cid:88) m =1 N (cid:88) n =1 γ mn ( t n − µ m ) φ ( r n ) . (B6)Note that µ m and w are mutually included in their solutions. Each element of θ ( i +1) isprovided as µ ( i +1) m = (cid:80) Nn =1 γ mn (cid:8) t n − w ( i ) (cid:62) φ ( r n ) (cid:9)(cid:80) Nn =1 γ mn , (B7) σ ( i +1) m =  (cid:80) Nn =1 γ mn (cid:110) t n − µ ( i +1) m − w ( i ) (cid:62) φ ( r n ) (cid:111) (cid:80) Nn =1 γ mn  / , (B8) w ( i +1) = A − b ( i +1) , (B9) b ( i +1) = M (cid:88) m =1 N (cid:88) n =1 γ mn (cid:0) t n − µ ( i +1) m (cid:1) φ ( r n ) . (B10)When f is a Cauchy distribution, the solution is, µ m = (cid:80) Nn =1 ˜ γ mn (cid:8) t n − w (cid:62) φ ( r n ) (cid:9)(cid:80) Nn =1 ˜ γ mn , (B11) σ m = (cid:80) Nn =1 γ mn π (cid:80) Nn =1 ˜ γ mn , (B12) A = M (cid:88) m =1 N (cid:88) n =1 ˜ γ mn σ m φ ( r n ) φ ( r n ) (cid:62) , (B13) b = M (cid:88) m =1 N (cid:88) n =1 ˜ γ mn σ m ( t n − µ m ) φ ( r n ) , (B14)˜ γ mn = γ mn f (cid:0) t n (cid:12)(cid:12) µ m + w (cid:62) φ ( r n (cid:1) , σ m ) . (B15)16 m and w are given as the same formulae as those of a normal distribution, (B3) and (B4),respectively. Each element of θ ( i +1) is provided as˜ γ ( i +1) mn = γ mn f (cid:0) t n (cid:12)(cid:12) µ ( i ) m + w ( i ) (cid:62) φ ( r n ) , σ ( i ) m (cid:1) (B16) µ ( i +1) m = (cid:80) Nn =1 ˜ γ ( i +1) mn (cid:8) t n − w ( i ) (cid:62) φ ( r n ) (cid:9)(cid:80) Nn =1 ˜ γ ( i +1) mn , (B17) σ ( i +1) m = (cid:80) Nn =1 γ mn π (cid:80) Nn =1 ˜ γ ( i +1) mn , (B18) w ( i +1) = (cid:0) A ( i +1) (cid:1) − b ( i +1) , (B19) A ( i +1) = M (cid:88) m =1 N (cid:88) n =1 ˜ γ ( i +1) mn σ ( i +1) m φ ( r n ) φ ( r n ) (cid:62) , (B20) b ( i +1) = M (cid:88) m =1 N (cid:88) n =1 ˜ γ ( i +1) mn σ ( i +1) m (cid:0) t n − µ ( i +1) m (cid:1) φ ( r n ) . (B21) REFERENCES R. Wiesendanger,

Scanning Probe Microscopy and Spectroscopy: Methods and Applications (Cambridge University Press, Cambridge, 1994). E. Meyer, H. J. Hug, and R. Bennewitz,

Scanning Probe Microscopy: The Lab on a Tip (Springer, New York, 2004). https://github.com/yuksk/sati . C. M. Bishop,

Pattern Recognition and Machine Learning (Springer, New York, 2006). A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from IncompleteData via the EM Algorithm,” J. Roy. Statist. Soc. Ser. B , 1–38 (1977). D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” (2017),arXiv:1412.6980. Two decay terms are considered because the piezo actuator of the scanning tunnelingmicroscope used to take the image is split into two sectors for motion in the surface-normaldirection. Y. J. Song, A. F. Otte, V. Shvarts, Z. Zhao, Y. Kuk, S. R. Blankenship, A. Band, F. M.Hess, and J. A. Stroscio, “A 10 mK scanning probe microscopy facility,” Rev. Sci. Inst. , 121101 (2010). 17 S. C. White, U. R. Singh, and P. Wahl, “A stiﬀ scanning tunneling microscopy head formeasurement at low temperatures and in high magnetic ﬁelds,” Rev. Sci. Inst. , 113708(2011). M. Assig, M. Etzkorn, A. Enders, W. Stiepany, C. R. Ast, and K. Kern, “A 10 mK scanningtunneling microscope operating in ultra high vacuum and high magnetic ﬁelds,” Rev. Sci.Inst. , 033903 (2013). S. Misra, B. B. Zhou, I. K. Drozdov, J. Seo, L. Urban, A. Gyenis, S. C. J. Kingsley,H. Jones, and A. Yazdani, “Design and performance of an ultra-high vacuum scanningtunneling microscope operating at dilution refrigerator temperatures and high magneticﬁelds,” Rev. Sci. Inst. , 103903 (2013). A. Roychowdhury, M. A. Gubrud, R. Dana, J. R. Anderson, C. J. Lobb, F. C. Wellstood,and M. Dreyer, “A 30 mK, 13.5 T scanning tunneling microscope with two independenttips,” Rev. Sci. Inst. , 043706 (2014). H. Von Allw¨orden, A. Eich, E. J. Knol, J. Hermenau, A. Sonntag, J. W. Gerritsen, D. Weg-ner, and A. A. Khajetoorians, “Design and performance of an ultra-high vacuum spin-polarized scanning tunneling microscope operating at 30 mK and in a vector magneticﬁeld,” Rev. Sci. Inst. , 033902 (2018). T. Machida, Y. Kohsaka, and T. Hanaguri, “A scanning tunneling microscope for spectro-scopic imaging below 90 mK in magnetic ﬁelds up to 17.5 T,” Rev. Sci. Inst. , 093707(2018). T. Balashov, M. Meyer, and W. Wulfhekel, “A compact ultrahigh vacuum scanning tun-neling microscope with dilution refrigeration,” Rev. Sci. Inst. , 113707 (2018). I. Battisti, G. Verdoes, K. van Oosten, K. M. Bastiaans, and M. P. Allan, “Deﬁnitionof design guidelines, construction, and performance of an ultra-stable scanning tunnelingmicroscope for spectroscopic imaging,” Rev. Sci. Inst. , 123705 (2018). S. Y. Guan, H. S. Liao, B. J. Juang, S. C. Chin, T. M. Chuang, and C. S. Chang, “Thedesign and the performance of an ultrahigh vacuum 3He fridge-based scanning tunnelingmicroscope with a double deck sample stage for in-situ tip treatment,” Ultramicroscopy196