Ideal-Theoretic Strategies for Asymptotic Approximation of Marginal Likelihood Integrals
arXiv [stat.CO], Feb
Shaowei Lin
Abstract
The accurate asymptotic evaluation of marginal likelihood integrals is a fundamental problem in Bayesian statistics. Following the approach introduced by Watanabe, we translate this into a problem of computational algebraic geometry, namely, to determine the real log canonical threshold of a polynomial ideal, and we present effective methods for solving this problem. Our results are based on resolution of singularities. They apply to parametric models where the Kullback-Leibler distance is upper and lower bounded by scalar multiples of some sum of squared real analytic functions. Such models include finite state discrete models.
Keywords: computational algebra, asymptotic approximation, marginal likelihood, learning coefficient, real log canonical threshold
The evaluation of marginal likelihood integrals is essential in model selection and has important applications in areas such as machine learning and computational biology. The exact evaluation of such integrals is a difficult problem [9, 21] and classical approximation formulas usually apply only for smooth models. Recent work by Watanabe and his collaborators [1,27–30] extended these formulas to a broad class of models with singularities. His work also uncovered interesting connections with resolution of singularities in algebraic geometry. The goal of this paper is to systematically study the algebraic geometry behind Watanabe’s formulas, and to develop symbolic algebra tools which allow the user to accurately evaluate the asymptotics of integrals in Bayesian statistics.

Watanabe showed that the key to understanding a singular model is monomializing the Kullback-Leibler function $K(\omega)$ of the model at the true distribution. While general algorithms exist for monomializing any analytic function [4,7], applying them to non-polynomial functions such as $K(\omega)$ can be computationally expensive. In practice, many singular models are parametrized by polynomials. Therefore, it is natural to ask if this polynomiality can be exploited in the analysis of such models. For simplicity, we explore this question for discrete statistical models. Our point of departure is to describe the asymptotics of the likelihood integral by the real log canonical threshold of an ideal in a polynomial ring. More generally, our results will be proved for rings of analytic functions, and they apply to all parametric models where the Kullback-Leibler distance is upper and lower bounded by scalar multiples of a sum of squared real analytic functions.

Consider a statistical model $\mathcal{M}$ on a finite discrete space $[k] = \{1, 2, \ldots, k\}$ parametrized by a real analytic map $p : \Omega \to \Delta_{k-1}$ where $\Omega$ is a compact subset of $\mathbb{R}^d$ and $\Delta_{k-1}$ is the probability simplex $\{x \in \mathbb{R}^k : x_i \geq 0, \sum x_i = 1\}$.
We assume that $\Omega$ is semianalytic, i.e. $\Omega = \{x \in \mathbb{R}^d : g_1(x) \geq 0, \ldots, g_l(x) \geq 0\}$ is defined by real analytic inequalities. Let $q \in \Delta_{k-1}$ be a point in the model with non-zero entries. Suppose a sample of size $N$ is drawn from the true distribution $q$, and let $U = (U_i)$ denote the vector of relative frequencies for this sample. Let $\varphi : \Omega \to \mathbb{R}$ be nearly analytic, i.e. $\varphi$ is a product $\varphi_a \varphi_s$ of functions where $\varphi_a$ is real analytic and $\varphi_s$ is positive and smooth. Consider a Bayesian prior defined by $|\varphi|$. Priors of this form are discussed in Remark 2.7. We are interested in the asymptotics, for large sample sizes $N$, of the marginal likelihood integral

$$Z(N) = \int_\Omega \prod_{i=1}^k p_i(\omega)^{N U_i} \, |\varphi(\omega)| \, d\omega. \tag{1}$$

The first few terms of the asymptotics of the log likelihood integral $\log Z(N)$ were derived by Watanabe. To state his result, we first recall that the Kullback-Leibler distance $K(\omega)$ between $q$ and $p(\omega)$ is

$$K(\omega) = \sum_{i=1}^k q_i \log \frac{q_i}{p_i(\omega)}.$$

This function satisfies $K(\omega) \geq 0$, with equality if and only if $p(\omega) = q$.

Theorem 1.1 (Watanabe [28]). Asymptotically as $N \to \infty$,

$$\log Z(N) = N \sum_{i=1}^k U_i \log q_i - \lambda \log N + (\theta - 1) \log \log N + \eta_N \tag{2}$$

where the positive rational number $\lambda$ is the smallest pole of the zeta function

$$\zeta(z) = \int_\Omega K(\omega)^{-z} \, |\varphi(\omega)| \, d\omega, \quad z \in \mathbb{C}, \tag{3}$$

$\theta$ is its multiplicity, and $\eta_N$ is a random variable whose expectation $\mathbb{E}[\eta_N]$ converges to a constant.

Here, $\lambda$ is known as the learning coefficient of the model at the distribution $q$. Because formula (2) generalizes the Bayesian information criterion [13, 28], the numbers $\lambda$ and $\theta$ are important in model selection. Indeed, the BIC corresponds to the case $(\lambda, \theta) = (d/2, 1)$ for smooth models. In algebraic geometry, $\lambda$ is also known as the real log canonical threshold [23] of $K$, a term that is motivated by the more familiar complex log canonical threshold (see Remark 3.1). We denote this algebraic invariant by $(\lambda, \theta) = \mathrm{RLCT}_\Omega(K; \varphi)$.

These thresholds may be defined for ideals in rings of real-valued analytic functions as well. Given an ideal $I = \langle f_1, \ldots, f_r \rangle$ generated by functions $f_i$ that are real analytic on a compact subset $\Omega \subset \mathbb{R}^d$, and a smooth amplitude function $\varphi : \mathbb{R}^d \to \mathbb{R}$, we consider the zeta function

$$\zeta(z) = \int_\Omega \left( f_1(\omega)^2 + \cdots + f_r(\omega)^2 \right)^{-z/2} |\varphi(\omega)| \, d\omega. \tag{4}$$

We show that if $\varphi$ is nearly analytic, then $\zeta(z)$ has an analytic continuation to the whole complex plane. Its poles are positive rational numbers with a smallest element $\lambda$ which we call the real log canonical threshold of $I$ with respect to $\varphi$ over $\Omega$. Let $\theta$ be the multiplicity of $\lambda$ as a pole of $\zeta(z)$ and define $\mathrm{RLCT}_\Omega(I; \varphi)$ to be the pair $(\lambda, \theta)$. Order these pairs such that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$. We will show that this pair does not depend on the choice of generators $f_1, \ldots, f_r$ for $I$. In the literature, real log canonical thresholds of ideals are not well-investigated [23]. For this reason, we formally prove many of their properties in Section 3.

With these definitions on hand, we now state our first main theorem. This result expresses the learning coefficient and its multiplicity directly in terms of the functions $p_1, \ldots, p_k$ parametrizing the model. Geometrically, it says that the learning coefficient is the real log canonical threshold of the fiber $p^{-1}(q) \subset \Omega$. The theorem is computationally very useful especially when the $p_i$ are polynomials or rational functions, and certain special cases have been applied by Sumio Watanabe and his collaborators [29, 30]. Our proof in Section 3 was inspired by a discussion with him. Now, recall that $\varphi = \varphi_a \varphi_s$ is nearly analytic.

Theorem 1.2.
Let $(\lambda, \theta)$ be the learning coefficient and multiplicity of the model $\mathcal{M}$ at $q > 0$. Let $I$ denote the ideal $\langle p(\omega) - q \rangle := \langle p_1(\omega) - q_1, \ldots, p_k(\omega) - q_k \rangle$, and let $\mathcal{V}$ be its zero-locus $\{\omega \in \Omega : p(\omega) = q\} = p^{-1}(q)$. Then,

$$(2\lambda, \theta) = \min_{x \in \mathcal{V}} \mathrm{RLCT}_{\Omega_x}(I; \varphi_a)$$

where each $\Omega_x$ is a sufficiently small neighborhood of $x$ in $\Omega$.

More generally, let $K(\omega)$ be any real analytic function on $\Omega$ that is bounded, for some constants $c_1, c_2 > 0$ and some real analytic $f_i(\omega)$ over $\Omega$, by

$$c_1 \sum_{i=1}^k f_i(\omega)^2 \leq K(\omega) \leq c_2 \sum_{i=1}^k f_i(\omega)^2.$$

Then, the real log canonical threshold $(\lambda, \theta) = \mathrm{RLCT}_\Omega(K; \varphi)$ satisfies

$$(2\lambda, \theta) = \mathrm{RLCT}_\Omega(I; \varphi_a)$$

where $I$ is the ideal $\langle f_1(\omega), \ldots, f_k(\omega) \rangle$.
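To make the role of the pair $(\lambda, \theta)$ concrete, the following minimal sketch evaluates the deterministic leading terms of formula (2) in Theorem 1.1. The function name, the sample inputs, and the decision to drop the random term $\eta_N$ are assumptions made for illustration only; they are not part of the paper's software.

```python
import math

def log_marginal_leading_terms(U, q, N, lam, theta):
    """Leading terms of formula (2): N * sum(U_i log q_i) - lam*log N + (theta-1)*log log N.

    The random O(1) term eta_N is deliberately omitted (illustrative assumption).
    """
    fit = N * sum(u * math.log(qi) for u, qi in zip(U, q))
    return fit - lam * math.log(N) + (theta - 1) * math.log(math.log(N))

# Hypothetical inputs: a two-state distribution and matching relative frequencies.
U = [0.5, 0.5]
q = [0.5, 0.5]

# Smooth (BIC-like) case with d = 2 parameters: (lam, theta) = (d/2, 1).
bic_like = log_marginal_leading_terms(U, q, N=1000, lam=1.0, theta=1)

# A hypothetical singular case with a smaller learning coefficient.
singular = log_marginal_leading_terms(U, q, N=1000, lam=0.5, theta=2)
```

Under these assumed values, the smaller $\lambda$ yields a larger approximate log marginal likelihood, which is the sense in which $\lambda$ measures model complexity.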
To prove this theorem and other properties of real log canonical thresholds, we recall Hironaka’s theorem on the resolution of singularities [16] and develop useful lemmas in Section 2. Our treatment differs from that of Watanabe [28] in the following way: we study the local behavior of real log canonical thresholds at points $x$ in the parameter space $\Omega$. In particular, we will be interested in the case where $x$ is on the boundary $\partial\Omega$. Example 2.8 is an illustration of how the threshold is affected by the inequalities $g_i \geq 0$ near $x$. This issue can be critical in singular model selection because the parameter space of one model is often contained in the boundary of another that is more complex.

After studying the local thresholds, we then show that the real log canonical threshold globally over $\Omega$ is the minimum of local thresholds at points $x$ in $\Omega$. Identifying where these minimum thresholds occur is by itself a difficult problem which we discuss in Section 2. As a consequence of our results, we write down explicit formulas for the coefficients in asymptotic expansions of Laplace integrals. Our formulas extend those of Arnol'd–Guseĭn-Zade–Varchenko [2] because they apply also to parameter spaces with boundary. Using this expansion to improve approximations of likelihood integrals will be the subject of future work.

Our next aim is to develop tools for computing or bounding real log canonical thresholds of ideals. Section 3 summarizes useful fundamental properties of real log canonical thresholds. In Section 4, we derive local thresholds in nondegenerate cases using an important tool from toric geometry involving Newton polyhedra. This method was invented by Varchenko [25] and applied to statistical models by Watanabe and Yamazaki [30]. Their formulas were defined for functions, but we develop extensions of these formulas for ideals.
We introduce a new notion of nondegeneracy for ideals, known as sos-nondegeneracy, and give the following bound for the real log canonical threshold of an ideal with respect to a monomial amplitude function $\omega^\tau := \omega_1^{\tau_1} \cdots \omega_d^{\tau_d}$. These monomial functions occur frequently when we apply a change of variables to resolve the singularities in a model. Newton polyhedra and their $\tau$-distances are defined in Section 4.

Theorem 1.3.
Let $I$ be a finitely generated ideal in the ring of functions which are real analytic on $\Omega$, and suppose the origin lies in the interior of $\Omega$. Then, for every sufficiently small neighborhood $\Omega_0$ of the origin,

$$\mathrm{RLCT}_{\Omega_0}(I; \omega^\tau) \leq (1/l_\tau, \theta_\tau)$$

where $l_\tau$ is the $\tau$-distance of the Newton polyhedron $\mathcal{P}(I)$ and $\theta_\tau$ its multiplicity. Equality occurs when $I$ is monomial or, more generally, sos-nondegenerate.

This theorem has two main consequences. Firstly, it tells us that the real log canonical threshold of an ideal can be computed by finding a change of variables which monomializes the ideal. Secondly, due to Theorems 1.1 and 1.2, upper bounds on real log canonical thresholds translate to asymptotic lower bounds on the likelihood integral of a statistical model, which in turn give upper bounds on the stochastic complexity of the model.

Currently, there are no programs for computing real log canonical thresholds. There are applications which compute resolutions of singularities, but our statistical problems are too big for them. We hope that our work is a step in bridging the gap. Some of our tools are implemented in a Singular library at https://w3id.org/people/shaoweilin/public/rlct.html . This library computes the Newton polyhedron of an ideal, computes $\tau$-distances, and checks if an ideal is sos-nondegenerate. Instructions and examples on using the library may be found at the above website.

In summary, the learning coefficient of a statistical model is a useful measure of the model complexity and plays an important role in model selection. Because computing this coefficient often requires careful analysis of the Kullback-Leibler function, we propose an ideal-theoretic approach to make this calculation more tractable. This method has several advantages. Firstly, it directly exploits polynomiality in the model parametrization. Secondly, the real log canonical threshold of an ideal is independent of the choice of generators, and this choice provides flexibility to our computations. Thirdly, it is easier to construct Newton polyhedra for polynomial ideals and to check their nondegeneracy (Proposition 3.2(3)), than for nonpolynomial Kullback-Leibler functions. We demonstrate these ideas in Section 5 by computing the learning coefficients of a discrete mixture model which comes from a study involving 132 schizophrenic patients.

To introduce some notation, given $x \in \mathbb{R}^d$, let $\mathcal{A}_x(\mathbb{R}^d)$ be the ring of real-valued functions $f : \mathbb{R}^d \to \mathbb{R}$ that are analytic at $x$. We sometimes shorten the notation to $\mathcal{A}_x$ when it is clear that we are working with the space $\mathbb{R}^d$. When $x = 0$, it is convenient to think of $\mathcal{A}_0$ as a subring of the formal power series ring $\mathbb{R}[[\omega_1, \ldots, \omega_d]] = \mathbb{R}[[\omega]]$. It consists of power series which are convergent in some neighborhood of the origin. For all $x$, $\mathcal{A}_x$ is isomorphic to $\mathcal{A}_0$ by translation. Given a subset $\Omega \subset \mathbb{R}^d$, let $\mathcal{A}_\Omega$ be the ring of real functions analytic at each point $x \in \Omega$. Locally, each function can be represented as a power series centered at $x$.
Given $f \in \mathcal{A}_\Omega$, define the analytic variety $V_\Omega(f) = \{\omega \in \Omega : f(\omega) = 0\}$ while for an ideal $I \subset \mathcal{A}_\Omega$, we set $V_\Omega(I) = \cap_{f \in I} V_\Omega(f)$. Lastly, given a finite multiset $S \subset \mathbb{R}$, let $\#\min S$ denote the number of times the minimum is attained in $S$.

In this section, we introduce Hironaka’s theorem on resolutions of singularities. We derive real log canonical thresholds of monomial functions, and demonstrate how such resolutions allow us to find the thresholds of non-monomial functions. We show that the threshold of a function over a compact set is the minimum of local thresholds, and present an example where the threshold at a boundary point depends on the boundary inequalities. We discuss the problem of locating singularities with the smallest threshold, and end this section with formulas for the asymptotic expansion of a Laplace integral.

Before we explore real log canonical thresholds of ideals, let us study those of functions. Given a compact subset $\Omega$ of $\mathbb{R}^d$, a real analytic function $f \in \mathcal{A}_\Omega$ with $f \not\equiv 0$, and a smooth function $\varphi : \mathbb{R}^d \to \mathbb{R}$, consider the zeta function

$$\zeta(z) = \int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega, \quad z \in \mathbb{C}. \tag{5}$$

This function is well-defined for $z \in \mathbb{R}_{\leq 0}$. If $\zeta(z)$ can be continued analytically to the whole complex plane $\mathbb{C}$, then all its poles are isolated points in $\mathbb{C}$. Moreover, if all its poles are real, then there exists a smallest positive pole $\lambda$. Let $\theta$ be the multiplicity of this pole. The pole $\lambda$ is the real log canonical threshold of $f$ with respect to $\varphi$ over $\Omega$. If $\zeta(z)$ has no poles, we set $\lambda = \infty$ and leave $\theta$ undefined. Let $\mathrm{RLCT}_\Omega(f; \varphi)$ be the pair $(\lambda, \theta)$. By abuse of notation, we sometimes refer to this pair as the real log canonical threshold of $f$. We order these pairs such that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$. Intuitively, considering the asymptotics of $\log Z(N)$ in Theorem 1.1, the ordering is defined in this way so that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if and only if

$$\lambda_1 \log N - (\theta_1 - 1) \log \log N > \lambda_2 \log N - (\theta_2 - 1) \log \log N$$

for sufficiently large $N$. Lastly, let $\mathrm{RLCT}_\Omega f$ denote $\mathrm{RLCT}_\Omega(f; 1)$ where 1 is the constant unit function.

We start with a simple class of functions for which it is easy to compute the real log canonical threshold. It is the class of monomials $\omega_1^{\kappa_1} \cdots \omega_d^{\kappa_d} = \omega^\kappa$.

Proposition 2.1.
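Let $\kappa = (\kappa_1, \ldots, \kappa_d)$ and $\tau = (\tau_1, \ldots, \tau_d)$ be vectors of non-negative integers. If $\Omega$ is the positive orthant $\mathbb{R}^d_{\geq 0}$ and $\phi : \mathbb{R}^d \to \mathbb{R}$ is compactly supported and smooth with $\phi(0) > 0$, then $\mathrm{RLCT}_\Omega(\omega^\kappa; \omega^\tau \phi) = (\lambda, \theta)$ where

$$\lambda = \min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}, \qquad \theta = \#\min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}.$$

Proof.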
See [2, Lemma 7.3]. The idea is to express $\phi(\omega)$ as $T_s(\omega) + R_s(\omega)$ where $T_s$ is the $s$-th degree Taylor polynomial and $R_s$ the difference. We then integrate the main term $|f|^{-z} T_s$ explicitly and show that the integral of the remaining term $|f|^{-z} R_s$ does not have smaller poles. This process gives the analytic continuation of $\zeta(z)$ to the whole complex plane, so we have the Laurent expansion

$$\zeta(z) = \sum_{\alpha > 0} \sum_{i=1}^d \frac{d_{\alpha,i}}{(z - \alpha)^i} + P(z) \tag{6}$$

where the poles $\alpha$ are positive rational numbers and $P(z)$ is a polynomial.

For non-monomial $f(\omega)$, Hironaka’s celebrated theorem [16] on the resolution of singularities tells us that we can always reduce to the monomial case. Here, a $d$-dimensional real analytic manifold is a topological space (second countable and Hausdorff) that can be covered by charts which are homeomorphic to open balls in $\mathbb{R}^d$ and where the transition maps between charts are real analytic maps.

Theorem 2.2 (Resolution of Singularities). Let $f$ be a non-constant real analytic function in some neighborhood $\Omega \subset \mathbb{R}^d$ of the origin with $f(0) = 0$. Then, there exists a triple $(M, W, \rho)$ where
a. $W \subset \Omega$ is a neighborhood of the origin,
b. $M$ is a $d$-dimensional real analytic manifold,
c. $\rho : M \to W$ is a real analytic map
satisfying the following properties.
i. $\rho$ is proper, i.e. the inverse image of any compact set is compact.
ii. $\rho$ is a real analytic isomorphism between $M \setminus V_M(f \circ \rho)$ and $W \setminus V_W(f)$.
iii. For any $y \in V_M(f \circ \rho)$, there exists a local chart $M_y$ with coordinates $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$ such that $y$ is the origin and

$$f \circ \rho(\mu) = a(\mu) \mu_1^{\kappa_1} \mu_2^{\kappa_2} \cdots \mu_d^{\kappa_d} = a(\mu) \mu^\kappa$$

where $\kappa_1, \kappa_2, \ldots, \kappa_d$ are non-negative integers and $a$ is a real analytic function with $a(\mu) \neq 0$ for all $\mu$. Furthermore, the Jacobian determinant equals

$$|\rho'(\mu)| = h(\mu) \mu_1^{\tau_1} \mu_2^{\tau_2} \cdots \mu_d^{\tau_d} = h(\mu) \mu^\tau$$

where $\tau_1, \tau_2, \ldots, \tau_d$ are non-negative integers and $h$ is a real analytic function with $h(\mu) \neq 0$ for all $\mu$.

We say that $(M, W, \rho)$ is a resolution of singularities or a desingularization of $f$ at the origin. The set of points in $M$ where $\rho$ is not one-to-one is the exceptional divisor. From properties (i) and (ii), it also follows that $\rho$ is surjective: if $x \in V_W(f)$, we pick a compact neighborhood $V$ of $x$ and a sequence $x_1, x_2, \ldots$ of points in $V \setminus V_W(f)$ converging to $x$. The sequence can be chosen off the variety because the variety has measure zero. Then, the preimages $\rho^{-1}(x_1), \rho^{-1}(x_2), \ldots$ contain a converging subsequence with limit $y$, and $\rho(y) = x$ by continuity.

Now, let us desingularize a list of functions simultaneously.

Corollary 2.3 (Simultaneous Resolutions). Let $f_1, \ldots, f_l$ be non-constant real analytic functions in some neighborhood $\Omega \subset \mathbb{R}^d$ of the origin with all $f_i(0) = 0$. Then, there exists a triple $(M, W, \rho)$ that desingularizes each $f_i$ at the origin.

Proof. The idea is to desingularize the product $f_1(\omega) \cdots f_l(\omega)$ and to show that such a resolution of singularities is also a resolution for each $f_i$. See [28, Thm 11] and [14, Lemma 2.3] for details.

For the rest of this section, let $\Omega = \{\omega \in \mathbb{R}^d : g_1(\omega) \geq 0, \ldots, g_l(\omega) \geq 0\}$ be compact and semianalytic. We also assume that $f, \varphi \in \mathcal{A}_\Omega$, and that $f, g_1, \ldots, g_l$ are not constant functions.

Lemma 2.4.
For each $x \in \Omega$, there is a neighborhood $\Omega_x$ of $x$ in $\Omega$ such that for all smooth functions $\phi$ on $\Omega_x$ with $\phi(x) > 0$,

$$\mathrm{RLCT}_{\Omega_x}(f; \varphi\phi) = \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof.
Let $x \in \Omega$. If $f(x) \neq 0$, then by the continuity of $f$, there exists a small neighborhood $\Omega_x$ where $0 < c_1 < |f(\omega)| < c_2$ for some constants $c_1, c_2$. Hence, for all smooth functions $\phi$, the zeta functions

$$\int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega \quad \text{and} \quad \int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega$$

do not have any poles, so the lemma follows in this case.

Suppose $f(x) = 0$. By Corollary 2.3, we have a simultaneous local resolution of singularities $(M, W, \rho)$ for the functions $f, \varphi, g_1, \ldots, g_l$ vanishing at $x$. For each point $y$ in the fiber $\rho^{-1}(x)$, we have a local chart $M_y$ satisfying property (iii) of Theorem 2.2. Since $\rho$ is proper, the fiber $\rho^{-1}(x)$ is compact so there is a finite subcover $\{M_y\}$. We claim that the image $\rho(\bigcup M_y)$ contains a neighborhood $W_x$ of $x$ in $\mathbb{R}^d$. Indeed, otherwise, there exists a bounded sequence $\{x_1, x_2, \ldots\}$ of points in $W \setminus \rho(\bigcup M_y)$ whose limit is $x$. We pick a sequence $\{y_1, y_2, \ldots\}$ where $\rho(y_i) = x_i$. Since the $x_i$ are bounded, the $y_i$ lie in a compact set so there is a convergent subsequence $\{\tilde{y}_i\}$ with limit $y^*$. The $\tilde{y}_i$ are not in the open set $\bigcup M_y$ so nor is $y^*$. But $\rho(y^*) = \lim \rho(\tilde{y}_i) = x$ so $y^* \in \rho^{-1}(x) \subset \bigcup M_y$, a contradiction.

Now, define $\Omega_x = W_x \cap \Omega$ and let $\{\mathcal{M}_y\}$ be the collection of all sets $\mathcal{M}_y = M_y \cap \rho^{-1}(\Omega_x)$ which have positive measure. Picking a partition of unity $\{\sigma_y(\mu)\}$ subordinate to $\{\mathcal{M}_y\}$ such that $\sigma_y$ is positive at $y$ for each $y$ [28, Theorem 6.5], we write the zeta function $\zeta(z) = \int_{\Omega_x} |f(\omega)|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega$ as

$$\sum_y \int_{\mathcal{M}_y} \left| f \circ \rho(\mu) \right|^{-z} |\varphi \circ \rho(\mu)| \, |\phi \circ \rho(\mu)| \, |\rho'(\mu)| \, \sigma_y(\mu) \, d\mu.$$

For each $y$, the boundary conditions $g_i \circ \rho(\mu) \geq 0$ become sign conditions on the coordinates $\mu_j$, because each $g_i \circ \rho$ is a monomial times a nonvanishing function. Hence $\mathcal{M}_y$ is the union of closed orthant neighborhoods of $y$. The integral over $\mathcal{M}_y$ is then the sum of integrals of the form

$$\zeta_y(z) = \int_{\mathbb{R}^d_{\geq 0}} \mu^{-\kappa z + \tau} \psi(\mu) \, d\mu$$

where $\kappa$ and $\tau$ are non-negative integer vectors while $\psi$ is a compactly supported smooth function with $\psi(0) > 0$. Note that $\kappa$ and $\tau$ do not depend on $\phi$ nor on the choice of orthant at $y$. By Proposition 2.1, the smallest pole of $\zeta_y(z)$ and its multiplicity are

$$\lambda_y = \min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}, \qquad \theta_y = \#\min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}.$$

Now, $\mathrm{RLCT}_{\Omega_x}(f; \varphi\phi) = \min_y \{(\lambda_y, \theta_y)\}$. Since this formula is independent of $\phi$, we set $\phi = 1$ and the lemma follows.

Proposition 2.5.
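Let $\phi : \Omega \to \mathbb{R}$ be positive and smooth. Then, for sufficiently small neighborhoods $\Omega_x$, the set $\{\mathrm{RLCT}_{\Omega_x}(f; \varphi) : x \in \Omega\}$ has a minimum and

$$\mathrm{RLCT}_\Omega(f; \varphi\phi) = \min_{x \in \Omega} \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof.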
Lemma 2.4 associates a small neighborhood to each point in the compact set $\Omega$, so there exists a finite subcover $\{\Omega_x : x \in S\}$. Let $\{\sigma_x(\omega)\}$ be a smooth partition of unity subordinate to this subcover where $\sigma_x(x) > 0$ for all $x \in S$. Then,

$$\int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega = \sum_{x \in S} \int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, \sigma_x(\omega) \, d\omega.$$

Hence,

$$\mathrm{RLCT}_\Omega(f; \varphi\phi) = \min_{x \in S} \mathrm{RLCT}_{\Omega_x}(f; \varphi\phi\sigma_x) = \min_{x \in S} \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Now, if $y \in \Omega \setminus S$, let $\Omega_y$ be a neighborhood of $y$ prescribed by Lemma 2.4 and consider the cover $\{\Omega_x : x \in S\} \cup \{\Omega_y\}$ of $\Omega$. After choosing a partition of unity subordinate to this cover and repeating the above argument, we get

$$\mathrm{RLCT}_\Omega(f; \varphi\phi) \leq \mathrm{RLCT}_{\Omega_y}(f; \varphi) \quad \text{for all } y \in \Omega.$$

Combining the two previously displayed equations proves the proposition.

Abusing notation, we now let $\mathrm{RLCT}_{\Omega_x}(f; \varphi)$ represent the real log canonical threshold for a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$. If $x$ is an interior point of $\Omega$, we denote the threshold at $x$ by $\mathrm{RLCT}_x(f; \varphi)$.

Corollary 2.6 (See also [28]). Given a compact semianalytic set $\Omega \subset \mathbb{R}^d$, a nearly analytic function $\varphi : \Omega \to \mathbb{R}$, and $f \in \mathcal{A}_\Omega$ satisfying $f(x) = 0$ for some $x \in \Omega$, the zeta function (5) can be continued analytically to $\mathbb{C}$. It has a Laurent expansion (6) whose poles are positive rational numbers with a smallest element.

Proof. The proofs of Lemma 2.4 and Proposition 2.5 outline a way to compute the Laurent expansion of the zeta function (5).
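The monomial formula of Proposition 2.1, which the proof of Lemma 2.4 applies chart by chart, is simple enough to sketch directly. The function name `rlct_monomial` is hypothetical, and skipping coordinates with $\kappa_j = 0$ (where the ratio is infinite) is a convention assumed here rather than something stated in the text.

```python
from fractions import Fraction

def rlct_monomial(kappa, tau):
    """Return (lambda, theta) for the monomial omega^kappa with amplitude omega^tau.

    lambda = min_j (tau_j + 1) / kappa_j, and theta counts how often the minimum
    is attained. Coordinates with kappa_j == 0 contribute an infinite ratio and
    are skipped (assumed convention). Returns None when no ratio is finite, which
    corresponds to lambda = infinity with theta undefined.
    """
    ratios = [Fraction(t + 1, k) for k, t in zip(kappa, tau) if k > 0]
    if not ratios:
        return None
    lam = min(ratios)
    theta = ratios.count(lam)  # number of times the minimum is attained
    return lam, theta

# Illustrative inputs: the monomial x * y^2 with constant amplitude.
example = rlct_monomial(kappa=(1, 2), tau=(0, 0))
```

Exact rational arithmetic is used so that ties between ratios, which determine the multiplicity $\theta$, are detected reliably.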
Remark 2.7. In our definition of real log canonical thresholds, we considered integrals with respect to densities $|\varphi(\omega)| \, d\omega$ for some nearly analytic function $\varphi$, while Watanabe only considers the special case where the density is $\varphi(\omega) \, d\omega$ for some smooth positive function $\varphi$. Our general case includes the situation where the density is multiplied by the absolute value of a Jacobian determinant under a change of variables. To prove the basic properties of real log canonical thresholds, we need to resolve the singularities of the variety $\varphi = 0$ together with those cut out by $f, g_1, \ldots, g_l$, as demonstrated in Lemma 2.4.

Example 2.8.
We now show that the threshold at a boundary point depends on the boundary inequalities. Consider the following two small neighborhoods of the origin in some larger compact set.

$$\Omega_1 = \{(x, y) \in \mathbb{R}^2 : 0 \leq x \leq y \leq \varepsilon\}$$
$$\Omega_2 = \{(x, y) \in \mathbb{R}^2 : 0 \leq y \leq x \leq \varepsilon\}$$

To compute the real log canonical threshold of the function $xy^2$ over these sets, we have the corresponding zeta functions below.

$$\zeta_1(z) = \int_0^\varepsilon \int_0^y x^{-z} y^{-2z} \, dx \, dy = \frac{\varepsilon^{-3z+2}}{(-z+1)(-3z+2)}$$
$$\zeta_2(z) = \int_0^\varepsilon \int_0^x x^{-z} y^{-2z} \, dy \, dx = \frac{\varepsilon^{-3z+2}}{(-2z+1)(-3z+2)}$$

This shows that $\mathrm{RLCT}_{\Omega_1}(xy^2) = 2/3$ while $\mathrm{RLCT}_{\Omega_2}(xy^2) = 1/2$.

Since the threshold over a set $\Omega \subset \mathbb{R}^d$ is the minimum of thresholds at points $x \in \Omega$, we want to know where this minimum is achieved. Let us study this problem topologically. Consider a locally finite collection $\mathcal{S}$ of pairwise disjoint submanifolds $S \subset \Omega$ such that $\Omega = \cup_{S \in \mathcal{S}} S$ and each $S$ is locally closed, i.e. the intersection of an open and a closed subset. Let $\overline{S}$ be the closure of $S$. We say $\mathcal{S}$ is a stratification of $\Omega$ if $S \cap \overline{T} \neq \emptyset$ implies $S \subset \overline{T}$ for all $S, T \in \mathcal{S}$. A stratification $\mathcal{S}$ of $\Omega$ is a refinement of another stratification $\mathcal{T}$ if $S \cap T \neq \emptyset$ implies $S \subset T$ for all $S \in \mathcal{S}$ and $T \in \mathcal{T}$.

Let the amplitude $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. Let $S_{(\lambda,\theta),1}, \ldots, S_{(\lambda,\theta),r}$ be the connected components of the set $\{x \in \Omega : \mathrm{RLCT}_{\Omega_x}(f; \varphi) = (\lambda, \theta)\}$, and let $\mathcal{S}$ denote the collection $\{S_{(\lambda,\theta),i}\}$ where we vary over all $\lambda$, $\theta$ and $i$. Now, define the order $\mathrm{ord}_x f$ to be the smallest degree of a monomial appearing in a series expansion of $f$ at $x \in \Omega$ [10]. The order does not depend on the choice of coordinates $\omega_1, \ldots, \omega_d$ because it is the largest integer $k$ such that $f \in \mathfrak{m}_x^k$ where $\mathfrak{m}_x = \{g \in \mathcal{A}_x : g(x) = 0\}$ is the vanishing ideal of $x$. Define $T_{l,1}, \ldots, T_{l,s}$ to be the connected components of the set $\{x \in \Omega : \mathrm{ord}_x f = l\}$ and let $\mathcal{T}$ be the collection $\{T_{l,j}\}$ where we vary over all $l$ and $j$. We conjecture the following relationship between $\mathcal{S}$ and $\mathcal{T}$. It implies that the minimum real log canonical threshold over a set must occur at a point of highest order.

Conjecture 2.9.
The collections $\mathcal{S}$ and $\mathcal{T}$ are stratifications of $\Omega$. Furthermore, if the amplitude $\varphi$ is a positive smooth function, then $\mathcal{S}$ refines $\mathcal{T}$.

Laplace integrals such as (1) occur frequently in physics, statistics and other applications. At first, the relationship between their asymptotic expansions and the zeta function (3) seems strange. The key is to write these integrals as

$$Z(N) = \int_\Omega e^{-N|f(\omega)|} |\varphi(\omega)| \, d\omega = \int_0^\infty e^{-Nt} v(t) \, dt$$
$$\zeta(z) = \int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_0^\infty t^{-z} v(t) \, dt$$

where $v(t)$ is the state density function [28] or Gelfand-Leray function [2]

$$v(t) = \frac{d}{dt} \int_{0 < |f(\omega)| < t} |\varphi(\omega)| \, d\omega$$

[…] 4] and Greenblatt [15].

Using this strategy, we now give explicit formulas for the asymptotic expansion of an arbitrary Laplace integral. Our formulas generalize those of Arnol'd–Guseĭn-Zade–Varchenko [2]. They relate the asymptotic coefficients $c_{\alpha,i}$ and the Laurent coefficients $d_{\alpha,i}$ in terms of derivatives $\Gamma^{(i)}$ of Gamma functions.

Theorem 2.10. Let $\Omega \subset \mathbb{R}^d$ be a compact semianalytic subset and $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. If $f \in \mathcal{A}_\Omega$ with $f(x) = 0$ for some $x \in \Omega$, the Laplace integral

$$Z(N) = \int_\Omega e^{-N|f(\omega)|} |\varphi(\omega)| \, d\omega$$

has the asymptotic expansion

$$\sum_\alpha \sum_{i=1}^d c_{\alpha,i} N^{-\alpha} (\log N)^{i-1}. \tag{10}$$

The $\alpha$ in this expansion range over positive rational numbers which are poles of

$$\zeta(z) = \int_{\Omega_\delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega \tag{11}$$

for any $\delta > 0$ and $\Omega_\delta = \{\omega \in \Omega : |f(\omega)| < \delta\}$. The coefficients $c_{\alpha,i}$ satisfy

$$c_{\alpha,i} = \frac{(-1)^i}{(i-1)!} \sum_{j=i}^d \frac{\Gamma^{(j-i)}(\alpha)}{(j-i)!} \, d_{\alpha,j} \tag{12}$$

where $d_{\alpha,j}$ is the coefficient of $(z - \alpha)^{-j}$ in the Laurent expansion of $\zeta(z)$.

Proof. First, set $\delta = 1$. We split the integral $Z(N)$ into two parts:

$$Z(N) = \int_{|f(\omega)| < 1} e^{-N|f(\omega)|} |\varphi(\omega)| \, d\omega + \int_{|f(\omega)| \geq 1} e^{-N|f(\omega)|} |\varphi(\omega)| \, d\omega.$$

The second integral is bounded above by $Ce^{-N}$ for some non-negative constant $C$, so asymptotically it goes to zero more quickly than any $N^{-\alpha}$. For the first integral, we write $\zeta(z)$ as the Mellin transform of the state density function $v(t)$.

$$\zeta(z) = \int_{|f(\omega)| < 1} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_0^1 t^{-z} v(t) \, dt.$$

By Corollary 2.6, $\zeta(z)$ has a Laurent expansion (6). Since $|f(\omega)| < 1$, by dominated convergence $\zeta(z) \to 0$ as $z \to -\infty$, so the polynomial part $P(z)$ is identically zero. Applying the inverse Mellin transform [3] to $\zeta(z)$, we get a series expansion (8) of the state density function $v(t)$. Applying the Laplace transform to $v(t)$ in turn gives the asymptotic expansion (7) of $Z(N)$. The formulas

$$\int_0^\infty e^{-Nt} t^{\alpha-1} (\log t)^i \, dt \approx \sum_{j=0}^i \binom{i}{j} (-1)^j \Gamma^{(i-j)}(\alpha) N^{-\alpha} (\log N)^j$$
$$\int_0^1 t^{-z} t^{\alpha-1} (\log t)^i \, dt = -i! \, (z - \alpha)^{-(i+1)}$$

from [2, Thm 7.4] and [28, Ex 4.7] give us the relations

$$c_{\alpha,i} = (-1)^{i-1} \sum_{j=i}^d \binom{j-1}{i-1} \Gamma^{(j-i)}(\alpha) \, b_{\alpha-1,j}, \qquad d_{\alpha,j} = -(j-1)! \, b_{\alpha-1,j}.$$

Equation (12) follows immediately. Finally, for all other values of $\delta$, we write

$$\int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_{\Omega_\delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega + \int_{|f(\omega)| \geq \delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega.$$

The last integral does not have any poles, so the principal parts of the Laurent expansions of the first two integrals are the same for all $\delta$.

In this section, we prove fundamental properties of real log canonical thresholds (RLCTs) which will allow us to calculate these thresholds more efficiently. The learning coefficient of a statistical model is shown to be the RLCT of the ideal generated by its defining equations.

In this section, let $\Omega \subset \mathbb{R}^d$ be a compact semianalytic subset and let $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. Given functions $f_1, \ldots, f_r \in \mathcal{A}_\Omega$, let $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi)$ be the smallest pole and multiplicity of the zeta function (4). Recall that these pairs are ordered by the rule $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$.
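Because this ordering reverses the comparison on multiplicities, it is easy to get wrong when taking a minimum over local thresholds. The short sketch below encodes the rule as a sort key; the helper names `rlct_key` and `min_rlct` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

def rlct_key(pair):
    """Sort key for the RLCT ordering.

    (lam1, th1) > (lam2, th2) if lam1 > lam2, or lam1 == lam2 and th1 < th2,
    so a larger multiplicity makes a pair smaller.
    """
    lam, theta = pair
    return (lam, -theta)

def min_rlct(pairs):
    """Minimum of a finite collection of (lambda, theta) pairs in the RLCT ordering."""
    return min(pairs, key=rlct_key)

# Illustrative local thresholds; the first two values of lambda are those of Example 2.8.
local_thresholds = [(Fraction(2, 3), 1), (Fraction(1, 2), 1), (Fraction(1, 2), 2)]
smallest = min_rlct(local_thresholds)
```

Negating the multiplicity inside the key lets Python's built-in tuple comparison implement the ordering without a custom comparator.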
For $x \in \Omega$, we define $\mathrm{RLCT}_{\Omega_x}(f_1, \ldots, f_r; \varphi)$ to be the threshold for a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$.

Remark 3.1. The (complex) log canonical threshold may be defined in a similar fashion. It is the smallest pole of the zeta function

$$\zeta(z) = \int_\Omega \left( |f_1(\omega)|^2 + \cdots + |f_r(\omega)|^2 \right)^{-z} d\omega.$$

Note that the $f_i^2$ have been replaced by $|f_i|^2$ and the exponent $-z/2$ is changed to $-z$. Crudely, this factor of 2 comes from the fact that $\mathbb{C}^d$ is a real vector space of dimension $2d$. The complex threshold is often different from the RLCT [23]. From the algebraic geometry point of view, more is known about complex log canonical thresholds than about real log canonical thresholds. Many results in this paper were motivated by their complex analogs [6, 17, 18, 20].

Now, we give several equivalent definitions of $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi)$ which are helpful in proofs of the fundamental properties.

Proposition 3.2. Given functions $f_1, \ldots, f_r \in \mathcal{A}_\Omega$ such that each $f_i \not\equiv 0$ and $V_\Omega(\langle f_1, \ldots, f_r \rangle)$ is nonempty, the pairs $(\lambda, \theta)$ defined below are all equal.
a. The logarithmic Laplace integral

$$\log Z(N) = \log \int_\Omega \exp\Big( -N \sum_{i=1}^r f_i(\omega)^2 \Big) |\varphi(\omega)| \, d\omega$$

is asymptotically $-\frac{\lambda}{2} \log N + (\theta - 1) \log \log N + O(1)$.
b. The zeta function

$$\zeta(z) = \int_\Omega \Big( \sum_{i=1}^r f_i(\omega)^2 \Big)^{-z/2} |\varphi(\omega)| \, d\omega$$

has a smallest pole $\lambda$ of multiplicity $\theta$.
c. The pair $(\lambda, \theta)$ is the minimum

$$\min_{x \in \Omega} \mathrm{RLCT}_{\Omega_x}(f_1, \ldots, f_r; \varphi).$$

In fact, it is enough to vary $x$ over $V_\Omega(\langle f_1, \ldots, f_r \rangle)$.

Proof. Item (b) is the original definition of the RLCT. The equivalence of (a) and (b) follows from Theorem 2.10, and that of (b) and (c) from Proposition 2.5. The last statement of (c) follows from the fact that the RLCT is $\infty$ for points $x \notin V_\Omega(\langle f_1, \ldots, f_r \rangle)$. See also [28, Thm 7.1].

Our first property describes the effect of the boundary on the RLCT.

Proposition 3.3. Let $x$ be a boundary point of $\Omega \subset \mathbb{R}^d$. Then, for every neighborhood $W$ of $x$ in $\mathbb{R}^d$,

$$\mathrm{RLCT}_W(f; \varphi) \leq \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof. For a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$, we have $\Omega_x \subset W$, so the corresponding Laplace integrals satisfy $Z_{\Omega_x}(N) \leq Z_W(N)$. By Proposition 3.2, this gives the opposite inequality on the RLCTs.

If the function whose RLCT we are finding is complicated, we may replace it with a simpler function that bounds it. Given $f, g \in \mathcal{A}_\Omega$, we say that $f$ and $g$ are equivalent in $\Omega$ if $c_1 f \leq g \leq c_2 f$ in $\Omega$ for some $c_1, c_2 > 0$.

Proposition 3.4 ([28, Remark 7.2]). Given $f, g \in \mathcal{A}_\Omega$, suppose that $0 \leq cf \leq g$ in $\Omega$ for some $c > 0$. Then, $\mathrm{RLCT}_\Omega(f; \varphi) \leq \mathrm{RLCT}_\Omega(g; \varphi)$.

Corollary 3.5. If $f, g$ are equivalent in $\Omega$, then $\mathrm{RLCT}_\Omega(f; \varphi) = \mathrm{RLCT}_\Omega(g; \varphi)$.

Note that, by the definition of the zeta function (4), $\mathrm{RLCT}_\Omega(f_1^2 + \cdots + f_r^2; \varphi) = (\lambda, \theta)$ implies $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi) = (2\lambda, \theta)$. From this, it seems that we should restrict ourselves to RLCTs of single and not multiple functions. However, as the next proposition shows, multiple functions are important because they allow us to work with ideals for which different generating sets can be chosen. This gives us freedom to switch between single and multiple functions in powerful ways. For instance, special cases of this proposition such as Lemmas 3 and 4 of [1] have been used to simplify computations.

Proposition 3.6. If two sets $\{f_1, \ldots, f_r\}$ and $\{g_1, \ldots, g_s\}$ of functions generate the same ideal $I \subset \mathcal{A}_\Omega$, then

$$\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi) = \mathrm{RLCT}_\Omega(g_1, \ldots, g_s; \varphi).$$

Define this pair to be $\mathrm{RLCT}_\Omega(I; \varphi)$.

Proof. Each $g_j$ can be written as a combination $h_1 f_1 + \cdots + h_r f_r$ of the $f_i$ where the $h_i$ are real analytic over $\Omega$. By the Cauchy-Schwarz inequality,

$$g_j^2 \leq \left( h_1^2 + \cdots + h_r^2 \right) \left( f_1^2 + \cdots + f_r^2 \right).$$

Because $\Omega$ is compact, the $h_i$ are bounded. Thus, summing over all the $g_j$, there is some constant $c > 0$ such that

$$\sum_{j=1}^s g_j^2 \leq c \sum_{i=1}^r f_i^2.$$

By Proposition 3.4, $\mathrm{RLCT}_\Omega(g_1, \ldots, g_s; \varphi) \leq \mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi)$ and by symmetry, the reverse is also true, so we are done. See also [23].

Now, consider functions $f_1, \ldots, f_r \in \mathcal{A}_X$ and $g_1, \ldots, g_s \in \mathcal{A}_Y$ where $X \subset \mathbb{R}^m$ and $Y \subset \mathbb{R}^n$ are compact semianalytic subsets. This occurs, for instance, when the $f_i$ and $g_j$ are polynomials with disjoint sets of indeterminates $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$. Let $\varphi_X : X \to \mathbb{R}$ and $\varphi_Y : Y \to \mathbb{R}$ be nearly analytic. Define $(\lambda_X, \theta_X) = \mathrm{RLCT}_X(f_1, \ldots, f_r; \varphi_X)$ and $(\lambda_Y, \theta_Y) = \mathrm{RLCT}_Y(g_1, \ldots, g_s; \varphi_Y)$. By composing with projections $X \times Y \to X$ and $X \times Y \to Y$, we may regard the $f_i$ and $g_j$ as functions analytic over $X \times Y$. Let $I_X$ and $I_Y$ be ideals in $\mathcal{A}_{X \times Y}$ generated by the $f_i$ and $g_j$ respectively. Recall that the sum $I_X + I_Y$ is generated by all the $f_i$ and $g_j$ while the product $I_X I_Y$ is generated by $f_i g_j$ for all $i, j$.

Proposition 3.7. The RLCTs for the sum and product of ideals $I_X$ and $I_Y$ are

$$\mathrm{RLCT}_{X \times Y}(I_X + I_Y; \varphi_X \varphi_Y) = (\lambda_X + \lambda_Y, \ \theta_X + \theta_Y - 1),$$

$$\mathrm{RLCT}_{X \times Y}(I_X I_Y; \varphi_X \varphi_Y) = \begin{cases} (\lambda_X, \theta_X) & \text{if } \lambda_X < \lambda_Y, \\ (\lambda_Y, \theta_Y) & \text{if } \lambda_X > \lambda_Y, \\ (\lambda_X, \theta_X + \theta_Y) & \text{if } \lambda_X = \lambda_Y. \end{cases}$$

Proof. Define $f(x) = f_1^2 + \cdots + f_r^2$ and $g(y) = g_1^2 + \cdots + g_s^2$, and let $Z_X(N)$ and $Z_Y(N)$ be the corresponding Laplace integrals. By Proposition 3.2,

$$\log Z_X(N) = -\tfrac{\lambda_X}{2} \log N + (\theta_X - 1) \log \log N + O(1)$$
$$\log Z_Y(N) = -\tfrac{\lambda_Y}{2} \log N + (\theta_Y - 1) \log \log N + O(1)$$

asymptotically. If $(\lambda, \theta) = \mathrm{RLCT}_{X \times Y}(I_X + I_Y; \varphi_X \varphi_Y)$, then

$$\begin{aligned} -\tfrac{\lambda}{2} \log N + (\theta - 1) \log \log N + O(1) &= \log \int_{X \times Y} e^{-N f(x) - N g(y)} |\varphi_X| |\varphi_Y| \, dx \, dy \\ &= \log \Big( \int_X e^{-N f(x)} |\varphi_X| \, dx \Big) \Big( \int_Y e^{-N g(y)} |\varphi_Y| \, dy \Big) \\ &= \log Z_X(N) + \log Z_Y(N) \\ &= -\tfrac{\lambda_X + \lambda_Y}{2} \log N + (\theta_X + \theta_Y - 2) \log \log N + O(1) \end{aligned}$$

and the first result follows. For the second result, note that

$$f(x) g(y) = f_1^2 g_1^2 + f_1^2 g_2^2 + \cdots + f_r^2 g_s^2.$$

Let $\zeta_X(z)$ and $\zeta_Y(z)$ be the zeta functions corresponding to $f(x)$ and $g(y)$. By
By Proposition 3.2, $(\lambda_X, \theta_X)$ and $(\lambda_Y, \theta_Y)$ are the smallest poles of $\zeta_X(z)$ and $\zeta_Y(z)$ while $\mathrm{RLCT}_{X \times Y}(I_X I_Y; \varphi_X \varphi_Y)$ is the smallest pole of
\begin{align*}
\zeta(z) &= \int_{X \times Y} \big( f(x) g(y) \big)^{-z/2} |\varphi_X| |\varphi_Y| \, dx \, dy \\
&= \Big( \int_X f(x)^{-z/2} |\varphi_X| \, dx \Big) \Big( \int_Y g(y)^{-z/2} |\varphi_Y| \, dy \Big) \\
&= \zeta_X(z) \, \zeta_Y(z).
\end{align*}
The second result then follows from the relationship between the poles.

Our last property tells us the behavior of RLCTs under a change of variables. Consider an ideal $I \subset \mathcal{A}_W$ where $W$ is a neighborhood of the origin. Let $M$ be a real analytic manifold and $\rho : M \to W$ be a proper real analytic map. Then, the pullback $\rho^* I$ is locally the ideal of real analytic functions on $M$ that is generated by $f \circ \rho$ for all $f \in I$ (also called the inverse image ideal sheaf [19]). If $\rho$ is an isomorphism between $M \setminus \mathcal{V}(\rho^* I)$ and $W \setminus \mathcal{V}(I)$, we say that $\rho$ is a change of variables away from $\mathcal{V}(I)$. Let $|\rho'|$ denote the Jacobian determinant of $\rho$. We call $(\rho^* I; (\varphi \circ \rho)|\rho'|)$ the pullback pair.

Proposition 3.8. Let $W$ be a neighborhood of the origin and $I \subset \mathcal{A}_W$ a finitely generated ideal. If $M$ is a real analytic manifold, $\rho : M \to W$ is a change of variables away from $\mathcal{V}(I)$ and $\mathcal{M} = \rho^{-1}(\Omega \cap W)$, then
\[ \mathrm{RLCT}_{\Omega_0}(I; \varphi) = \min_{x \in \rho^{-1}(0)} \mathrm{RLCT}_{\mathcal{M}_x}\big( \rho^* I; (\varphi \circ \rho)|\rho'| \big). \]

Proof. Let $f_1, \ldots, f_r$ generate $I$ and let $f = f_1^2 + \cdots + f_r^2$. Then, $\mathrm{RLCT}_{\Omega_0}(I; \varphi)$ is the smallest pole and multiplicity of the zeta function
\[ \zeta(z) = \int_{\Omega_0} f(\omega)^{-z/2} |\varphi(\omega)| \, d\omega \]
where $\Omega_0 \subset W$ is a sufficiently small neighborhood of the origin in $\Omega$. Applying the change of variables $\rho$, we have
\[ \zeta(z) = \int_{\rho^{-1}(\Omega_0)} f \circ \rho(\mu)^{-z/2} \, |\varphi \circ \rho(\mu)| \, |\rho'(\mu)| \, d\mu. \]
The proof of Lemma 2.4 shows that if $\Omega_0$ is sufficiently small, there are finitely many points $y \in \rho^{-1}(0)$ and a cover $\{\mathcal{M}_y\}$ of $\mathcal{M} = \rho^{-1}(\Omega_0)$ such that
\[ \zeta(z) = \sum_y \int_{\mathcal{M}_y} f \circ \rho(\mu)^{-z/2} \, |\varphi \circ \rho(\mu)| \, |\rho'(\mu)| \, \sigma_y(\mu) \, d\mu \]
where $\{\sigma_y\}$ is a partition of unity subordinate to $\{\mathcal{M}_y\}$.
Furthermore, the $f_i \circ \rho$ generate the pullback $\rho^* I$ and
\[ f \circ \rho = (f_1 \circ \rho)^2 + \cdots + (f_r \circ \rho)^2. \]
Therefore,
\[ \mathrm{RLCT}_{\mathcal{M}_y}\big( f \circ \rho; (\varphi \circ \rho)|\rho'|\sigma_y \big) = \mathrm{RLCT}_{\mathcal{M}_y}\big( \rho^* I; (\varphi \circ \rho)|\rho'| \big) \]
and the result follows from the two previously displayed equations.

We are now ready to prove Theorem 1.2 which was inspired by Watanabe.

Proof of Theorem 1.2. Let $Q(\omega) = \sum_{i=1}^k (p_i(\omega) - q_i)^2$. The learning coefficient is the RLCT of the Kullback-Leibler distance $K(\omega)$, so it is enough to show that $\mathrm{RLCT}_{\Omega_x} K = \mathrm{RLCT}_{\Omega_x} Q$ for each $x \in \mathcal{V}(K) = \mathcal{V}(Q)$. By Corollary 3.5, we only need to show that $K$ and $Q$ are equivalent in a sufficiently small neighborhood of $x$. Now, the Taylor expansion $-\log t = (1 - t) + \frac{1}{2}(1 - t)^2 + \cdots$ implies there are constants $c_1, c_2 > 0$ such that for all $t$ near 1,
\[ c_1 (t - 1)^2 \le -\log t + t - 1 \le c_2 (t - 1)^2. \tag{13} \]
Choosing a sufficiently small $W_x$ such that $p_i(\omega)/q_i$ is near 1, we have
\[ c_1 \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \le -\log \frac{p_i(\omega)}{q_i} + \frac{p_i(\omega)}{q_i} - 1 \le c_2 \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \]
for all $\omega \in W_x$. Multiplying by $q_i$, summing from $i = 1$ to $k$ and observing that the $p_i$ and the $q_i$ add up to 1, we get
\[ c_1 \sum_{i=1}^k q_i \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \le K(\omega) \le c_2 \sum_{i=1}^k q_i \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2. \]
Again, using the fact that the $q_i$ are non-zero, we have
\[ \frac{c_1}{\max_i q_i} \sum_{i=1}^k \big( p_i(\omega) - q_i \big)^2 \le K(\omega) \le \frac{c_2}{\min_i q_i} \sum_{i=1}^k \big( p_i(\omega) - q_i \big)^2 \]
which completes the claim. The more general statement for a real analytic $K(\omega)$ which is bounded by scalar multiples of a sum of squared functions follows from Proposition 2.5, Corollary 3.5 and the definition of $\mathrm{RLCT}_\Omega(I; \varphi)$.

4 Newton Polyhedra and Nondegeneracy

Given an analytic function $f \in \mathcal{A}_0(\mathbb{R}^d)$, we pick local coordinates $\{\omega_1, \ldots, \omega_d\}$ in a neighborhood of the origin. This allows us to represent $f$ as a power series $\sum_\alpha c_\alpha \omega^\alpha$ where $\omega = (\omega_1, \ldots, \omega_d)$ and each $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}^d$. Let $[\omega^\alpha] f$ denote the coefficient $c_\alpha$ of $\omega^\alpha$ in this expansion.
Define its Newton polyhedron $\mathcal{P}(f) \subset \mathbb{R}^d$ to be the convex hull
\[ \mathcal{P}(f) = \mathrm{conv}\,\{ \alpha + \alpha' : [\omega^\alpha] f \ne 0, \; \alpha' \in \mathbb{R}^d_{\ge 0} \}. \]
A subset $\gamma \subset \mathcal{P}(f)$ is a face if there exists $\beta \in \mathbb{R}^d$ such that
\[ \gamma = \{ \alpha \in \mathcal{P}(f) : \langle \alpha, \beta \rangle \le \langle \alpha', \beta \rangle \text{ for all } \alpha' \in \mathcal{P}(f) \}, \]
where $\langle \cdot, \cdot \rangle$ is the standard dot product. Dually, the normal cone at $\gamma$ is the set of all $\beta \in \mathbb{R}^d$ satisfying the above condition. Each $\beta$ lies in the non-negative orthant $\mathbb{R}^d_{\ge 0}$ because otherwise, the linear function $\langle \cdot, \beta \rangle$ does not have a minimum over the unbounded set $\mathcal{P}(f)$. As a result, the union of all the normal cones gives a partition $\mathcal{F}(f)$ of the non-negative orthant called the normal fan. Now, given a compact subset $\gamma \subset \mathbb{R}^d$, define the face polynomial
\[ f_\gamma = \sum_{\alpha \in \gamma \cap \mathbb{N}^d} c_\alpha \omega^\alpha. \]
Recall that $f_\gamma$ is singular at a point $x \in \mathbb{R}^d$ if $\mathrm{ord}_x f_\gamma \ge 2$, i.e.
\[ f_\gamma(x) = \frac{\partial f_\gamma}{\partial \omega_1}(x) = \cdots = \frac{\partial f_\gamma}{\partial \omega_d}(x) = 0. \]
We say that $f$ is nondegenerate if $f_\gamma$ is non-singular at all points in the torus $(\mathbb{R}^*)^d$ for all compact faces $\gamma$ of $\mathcal{P}(f)$; otherwise we say $f$ is degenerate. Now, we define the distance $l$ of $\mathcal{P}(f)$ to be the smallest $t \ge 0$ such that $(t, t, \ldots, t) \in \mathcal{P}(f)$. Let the multiplicity $\theta$ of $l$ be the codimension of the lowest-dimensional face of $\mathcal{P}(f)$ at this intersection of the diagonal with $\mathcal{P}(f)$. However, if $l = 0$, we leave $\theta$ undefined. These notions of nondegeneracy, distance and multiplicity were first coined and studied by Varchenko [25].

We now extend the above notions to ideals. For any ideal $I \subset \mathcal{A}_0$, define
\[ \mathcal{P}(I) = \mathrm{conv}\,\{ \alpha \in \mathbb{R}^d : [\omega^\alpha] f \ne 0 \text{ for some } f \in I \}. \]
Related to this geometric construction is the monomial ideal
\[ \mathrm{mon}(I) = \langle \omega^\alpha : [\omega^\alpha] f \ne 0 \text{ for some } f \in I \rangle. \]
Note that $I$ and $\mathrm{mon}(I)$ have the same Newton polyhedron, and if $I$ is generated by $f_1, \ldots, f_r$, then $\mathrm{mon}(I)$ is generated by the monomials $\omega^\alpha$ appearing in the $f_i$. One consequence is that $\mathcal{P}(f_1^2 + \cdots + f_r^2)$ is the scaled polyhedron $2\mathcal{P}(I)$. More importantly, the threshold of $I$ is bounded by that of $\mathrm{mon}(I)$.
To prove this result, we need the following lemma. Recall that by the Hilbert Basis Theorem or by Dickson's Lemma [10], $\mathrm{mon}(I)$ is finitely generated.

Lemma 4.1. Given $f \in \mathcal{A}_0(\mathbb{R}^d)$, let $S$ be a finite set of exponents $\alpha$ of monomials $\omega^\alpha$ which generate $\mathrm{mon}(\langle f \rangle)$. Then, there is a constant $c > 0$ such that
\[ |f(\omega)| \le c \sum_{\alpha \in S} |\omega|^\alpha \]
in a sufficiently small neighborhood of the origin.

Proof. Let $\sum_\alpha c_\alpha \omega^\alpha$ be the power series expansion of $f$. Because $f$ is analytic at the origin, there exists $\varepsilon > 0$ such that
\[ \sum_\alpha |c_\alpha| \, \varepsilon^{\alpha_1 + \cdots + \alpha_d} < \infty. \]
Let $S = \{ \alpha^{(1)}, \ldots, \alpha^{(s)} \}$ where the monomials $\omega^{\alpha^{(i)}}$ generate $\mathrm{mon}(\langle f \rangle)$. Then,
\[ f(\omega) = \omega^{\alpha^{(1)}} g_1(\omega) + \cdots + \omega^{\alpha^{(s)}} g_s(\omega) \]
for some power series $g_i(\omega)$. Each series $g_i(\omega)$ is absolutely convergent in the $\varepsilon$-neighborhood $U$ of the origin because $f$ is absolutely convergent in $U$. Thus, the $g_i(\omega)$ are analytic. Their absolute values are bounded above by some constant $c$ in $U$, and the lemma follows.

Below, $\mathrm{RLCT}_0(I; \varphi)$ denotes the RLCT of $I$ with respect to $\varphi$ at the origin.

Proposition 4.2. Let $I \subset \mathcal{A}_0$ be a finitely generated ideal and $\varphi : \mathbb{R}^d \to \mathbb{R}$ be nearly analytic at the origin. Then,
\[ \mathrm{RLCT}_0(I; \varphi) \le \mathrm{RLCT}_0(\mathrm{mon}(I); \varphi). \]

Proof. Given $f \in \mathcal{A}_0(\mathbb{R}^d)$, let $S$ be a finite set of generating exponents $\alpha$ for $\mathrm{mon}(\langle f \rangle)$. By Lemma 4.1 and the Cauchy-Schwarz inequality, there exist constants $c, c' > 0$ such that
\[ f^2 \le \Big( c \sum_{\alpha \in S} |\omega|^\alpha \Big)^2 \le c' \sum_{\alpha \in S} \omega^{2\alpha} \]
in a sufficiently small neighborhood of the origin. Therefore, if $f_1, \ldots, f_r$ generate $I$, then $f_1^2 + \cdots + f_r^2$ is bounded by a constant multiple of the sum of squares of the monomials generating $\mathrm{mon}(I)$. The result now follows from Proposition 3.4.

Given a compact subset $\gamma \subset \mathbb{R}^d$, define the face ideal
\[ I_\gamma = \langle f_\gamma : f \in I \rangle. \]
The next result tells us how to compute $I_\gamma$ for an ideal $I = \langle f_1, \ldots, f_r \rangle$.

Proposition 4.3. For all compact faces $\gamma \subset \mathcal{P}(I)$,
\[ I_\gamma = \langle f_{1\gamma}, \ldots, f_{r\gamma} \rangle. \]

Proof. By definition, $\langle f_{1\gamma}, \ldots, f_{r\gamma} \rangle \subset I_\gamma$.
For the other inclusion, it is enough to show that $f_\gamma \in \langle f_{1\gamma}, \ldots, f_{r\gamma} \rangle$ for all $f \in I$. First, we claim that if $\omega^\alpha = \omega^{\alpha'} \omega^{\alpha''}$ with $\alpha \in \gamma$ and $\omega^{\alpha'} \in \mathrm{mon}(I)$, then $\omega^{\alpha''} = 1$. Indeed, for all $\beta \in \mathbb{R}^d_{\ge 0}$ in the normal cone dual to $\gamma$, we have $\langle \alpha, \beta \rangle = \langle \alpha', \beta \rangle + \langle \alpha'', \beta \rangle$, but $\langle \alpha, \beta \rangle \le \langle \alpha', \beta \rangle$ so $\langle \alpha'', \beta \rangle = 0$. This implies that $\alpha' + k\alpha'' \in \gamma$ for all integers $k > 0$. Since $\gamma$ is compact, $\alpha''$ must be the zero vector so $\omega^{\alpha''} = 1$.

Now, if $f \in I$, then $f = h_1 f_1 + \cdots + h_r f_r$ for some analytic functions $h_1, \ldots, h_r$. Clearly, $f_\gamma = (h_1 f_1)_\gamma + \cdots + (h_r f_r)_\gamma$. By the above claim, $(h_i f_i)_\gamma = h_i(0) f_{i\gamma}$ where $h_i(0)$ is the constant term in $h_i$. Hence, $f_\gamma = h_1(0) f_{1\gamma} + \cdots + h_r(0) f_{r\gamma} \in \langle f_{1\gamma}, \ldots, f_{r\gamma} \rangle$ as required.

Remark 4.4. Let $\beta$ be a vector in the normal cone dual to the face $\gamma$ of $\mathcal{P}(I)$. Now, consider the weight order associated to $\beta$, and let $\mathrm{in}_\beta f$ be the sum of all the terms of $f$ that are maximal with respect to this (partial) order [10]. Let $\mathrm{in}_\beta I$ be the initial ideal
\[ \mathrm{in}_\beta I = \langle \mathrm{in}_\beta f : f \in I \rangle. \]
A set of functions $f_1, \ldots, f_r \in I$ is a Gröbner basis for $I$ if and only if
\[ \mathrm{in}_\beta I = \langle \mathrm{in}_\beta f_1, \ldots, \mathrm{in}_\beta f_r \rangle. \]
Comparing this statement with the previous result, one could ask why the generators $f_1, \ldots, f_r$ of $I$ need not be a Gröbner basis for Proposition 4.3 to hold. This confusion comes from incorrectly equating the face ideal $I_\gamma$ with the initial ideal $\mathrm{in}_\beta I$ when we only have the containment $I_\gamma \subset \mathrm{in}_\beta I$. For instance, if
\[ I = \langle f_1, f_2, f_3 \rangle = \langle xy - z^3, \; xz - y^3, \; yz - x^3 \rangle \]
and $\gamma$ is the convex hull of $\{(1,1,0), (1,0,1), (0,1,1)\}$, then $I_\gamma = \langle xy, xz, yz \rangle$. Meanwhile, $\beta = (1,1,1)$ and $\mathrm{in}_\beta I$ contains $y^4 - z^4 = z f_1 - y f_2$ but $y^4 - z^4 \notin I_\gamma$.

Lastly, we give several equivalent definitions of nondegeneracy for ideals. If an ideal $I$ satisfies these conditions, then we say that $I$ is sos-nondegenerate, where sos stands for sum-of-squares.

Proposition 4.5. Let $I \subset \mathcal{A}_0$ be an ideal. The following are equivalent:
1.
For some generating set $\{f_1, \ldots, f_r\}$ for $I$, $f_1^2 + \cdots + f_r^2$ is nondegenerate.
2. For all generating sets $\{f_1, \ldots, f_r\}$ for $I$, $f_1^2 + \cdots + f_r^2$ is nondegenerate.
3. For all compact faces $\gamma \subset \mathcal{P}(I)$, the variety $\mathcal{V}(I_\gamma) \subset \mathbb{R}^d$ does not intersect the torus $(\mathbb{R}^*)^d$.

Proof. Let $f_1, \ldots, f_r$ generate $I$ and let $f = f_1^2 + \cdots + f_r^2$. If $\gamma$ is a compact face of $\mathcal{P}(I)$, then the set $2\gamma$ is a compact face of $\mathcal{P}(f) = 2\mathcal{P}(I)$. Furthermore,
\[ f_{(2\gamma)} = f_{1\gamma}^2 + \cdots + f_{r\gamma}^2 \quad \text{and} \quad \frac{\partial f_{(2\gamma)}}{\partial \omega_i} = 2 f_{1\gamma} \frac{\partial f_{1\gamma}}{\partial \omega_i} + \cdots + 2 f_{r\gamma} \frac{\partial f_{r\gamma}}{\partial \omega_i}. \]
Now, $f_{1\gamma}^2 + \cdots + f_{r\gamma}^2 = 0$ if and only if $f_{1\gamma} = \cdots = f_{r\gamma} = 0$. It follows that $f$ is nondegenerate if and only if $\mathcal{V}(\langle f_{1\gamma}, \ldots, f_{r\gamma} \rangle) \cap (\mathbb{R}^*)^d = \mathcal{V}(I_\gamma) \cap (\mathbb{R}^*)^d = \emptyset$ for all compact faces $\gamma \subset \mathcal{P}(I)$. This proves (1) $\Leftrightarrow$ (3) and (2) $\Leftrightarrow$ (3).

Remark 4.6. The nondegeneracy of a function $f$ need not imply the sos-nondegeneracy of the ideal $\langle f \rangle$, e.g. $f = x + y$.

Remark 4.7. After finishing this paper, we discovered another notion of nondegeneracy for ideals of complex formal power series due to Saia [22], which was shown to be equivalent to the complex version of Proposition 4.5(3) [5].

A cone $\sigma$ is generated by vectors $v_1, \ldots, v_k \in \mathbb{R}^d$ if $\sigma = \{ \sum_i \lambda_i v_i : \lambda_i \ge 0 \}$. If $\sigma$ is generated by lattice vectors $v_i \in \mathbb{Z}^d$, then $\sigma$ is rational. If the origin is a face of $\sigma$, then $\sigma$ is pointed. A ray is a pointed one-dimensional cone. Every rational ray has a lattice generator of minimal length called the minimal generator, and every pointed rational polyhedral cone $\sigma$ is generated by the minimal generators of its one-dimensional faces. If these minimal generators are linearly independent over $\mathbb{R}$, then $\sigma$ is simplicial. A simplicial cone is smooth if its minimal generators also form part of a $\mathbb{Z}$-basis of $\mathbb{Z}^d$. A collection $\mathcal{F}$ of pointed rational polyhedral cones in $\mathbb{R}^d$ is a fan if the faces of every cone in $\mathcal{F}$ are in $\mathcal{F}$ and the intersection of any two cones in $\mathcal{F}$ is again in $\mathcal{F}$.
The support of $\mathcal{F}$ is the union of its cones as subsets of $\mathbb{R}^d$. If the support of $\mathcal{F}$ is the non-negative orthant, then $\mathcal{F}$ is locally complete. If every cone of $\mathcal{F}$ is simplicial (resp. smooth), then $\mathcal{F}$ is simplicial (resp. smooth). A fan $\mathcal{F}_1$ is a refinement of another fan $\mathcal{F}_2$ if the cones of $\mathcal{F}_1$ come from partitioning the cones of $\mathcal{F}_2$. See [12] for more details.

Given a smooth simplicial locally complete fan $\mathcal{F}$, we have a smooth toric variety $P(\mathcal{F})$ covered by open charts $U_\sigma \simeq \mathbb{R}^d$, one for each cone $\sigma$ of $\mathcal{F}$ that is maximal under inclusion. Furthermore, the blow-up $\rho_{\mathcal{F}} : P(\mathcal{F}) \to \mathbb{R}^d$ is defined as follows: for each maximal cone $\sigma$ of $\mathcal{F}$ minimally generated by $v_1, \ldots, v_d$ with $v_i = (v_{i1}, \ldots, v_{id})$, we have monomial maps $\rho_\sigma : U_\sigma \to \mathbb{R}^d$ on the open charts,
\begin{align*}
(\mu_1, \ldots, \mu_d) &\mapsto (\omega_1, \ldots, \omega_d), \\
\omega_1 &= \mu_1^{v_{11}} \mu_2^{v_{21}} \cdots \mu_d^{v_{d1}}, \\
\omega_2 &= \mu_1^{v_{12}} \mu_2^{v_{22}} \cdots \mu_d^{v_{d2}}, \\
&\;\;\vdots \\
\omega_d &= \mu_1^{v_{1d}} \mu_2^{v_{2d}} \cdots \mu_d^{v_{dd}}.
\end{align*}
Let $v = v_\sigma$ be the matrix $(v_{ij})$ where each minimal generator $v_i$ forms a row of $v$. We represent the above monomial map by $\omega = \mu^v$. If $v_{i+}$ represents the $i$-th row sum of $v$, the Jacobian determinant of this map is
\[ (\det v) \, \mu_1^{v_{1+} - 1} \cdots \mu_d^{v_{d+} - 1}. \]

We are now ready to connect these concepts. The next two theorems are due to Varchenko, see [25] and [2]. His notion of nondegeneracy does not include the condition $f_\gamma = 0$, but his proof [2, Lemma 8.9] actually supports the stronger notion. The setup is as follows: suppose $f$ is analytic in a neighborhood $W$ of the origin. Let $\mathcal{F}$ be any smooth simplicial refinement of the normal fan $\mathcal{F}(f)$ and $\rho_{\mathcal{F}}$ be the blow-up associated to $\mathcal{F}$. Set $M = \rho_{\mathcal{F}}^{-1}(W)$. Let $l$ be the distance of $\mathcal{P}(f)$ and $\theta$ its multiplicity.

Theorem 4.8. $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$ if $f$ is nondegenerate.

Theorem 4.9. $\mathrm{RLCT}_0 f = (1/l, \theta)$ if $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$.

We extend Theorem 4.9 to compute $\mathrm{RLCT}_0(f; \omega^\tau)$ for monomials $\omega^\tau$. Given a polyhedron $\mathcal{P}(f) \subset \mathbb{R}^d$ and a vector $\tau = (\tau_1, \ldots, \tau_d)$ of non-negative integers, let the $\tau$-distance $l_\tau$ be the smallest $t \ge 0$ such that $t(\tau_1 + 1, \ldots, \tau_d + 1) \in \mathcal{P}(f)$ and let the multiplicity $\theta_\tau$ be the codimension of the face at this intersection.

Theorem 4.10. $\mathrm{RLCT}_0(f; \omega^\tau) = (1/l_\tau, \theta_\tau)$ if $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$.

Proof. We follow roughly the proof in [2, §8] of Theorem 4.9. Let $\sigma$ be a maximal cone of $\mathcal{F}$. Because $\mathcal{F}$ refines $\mathcal{F}(f)$, $\sigma$ is a subset of some maximal cone $\sigma'$ of $\mathcal{F}(f)$. Let $\alpha \in \mathbb{R}^d$ be the vertex of $\mathcal{P}(f)$ dual to $\sigma'$. Let $v$ be the matrix whose rows are the minimal generators of $\sigma$ and $\rho$ the monomial map $\mu \mapsto \mu^v$. Under this map, the term $c_\alpha \omega^\alpha$ in $f$ becomes the leading monomial, so $f(\rho(\mu)) = g(\mu) \mu^{v\alpha}$ for some function $g$ satisfying $g(\mu) \ne 0$ for all $\mu \in U_\sigma$. Then,
\begin{align*}
|f(\omega)|^{-z} |\omega^\tau| \, d\omega &= |f(\rho(\mu))|^{-z} |\rho(\mu)|^\tau |\rho'(\mu)| \, d\mu \\
&= (\det v) \, |g(\mu)|^{-z} \, |\mu|^{-v\alpha z} \, \big| \mu^{v\tau} \mu_1^{v_{1+} - 1} \cdots \mu_d^{v_{d+} - 1} \big| \, d\mu.
\end{align*}
Thus, for the cone $\sigma$,
\[ (\lambda_\sigma, \theta_\sigma) = (\min S, \; \#\min S), \qquad S = \Big\{ \frac{v_i \cdot (\tau + 1)}{v_i \cdot \alpha} : 1 \le i \le d \Big\}, \]
where $\tau + 1 = (\tau_1 + 1, \ldots, \tau_d + 1)$ and $\#\min S$ is the number of indices $i$ at which the minimum is attained. We now give an interpretation for the elements of $S$. Fixing $i$, let $P$ be the affine hyperplane normal to $v_i$ passing through $\alpha$. Then, $(v_i \cdot \alpha)/(v_i \cdot (\tau + 1))$ is the distance of $P$ from the origin along the ray $\{ t(\tau + 1) : t \ge 0 \}$. Since $\mathrm{RLCT}_0(f; \omega^\tau) = \min_\sigma (\lambda_\sigma, \theta_\sigma)$, the result follows.

Remark 4.11. After finishing this paper, the author discovered that a similar result was proved by Vasil'ev [26] for complex analytic functions.

Monomial ideals play a special role in the theory of real log canonical thresholds of ideals. The proof of this next result is due to Piotr Zwiernik.

Proposition 4.12. Monomial ideals are sos-nondegenerate.

Proof. Let $f = f_1^2 + \cdots + f_r^2$ where $f_1, \ldots, f_r$ are monomials generating $I$. For each face $\gamma$ of $\mathcal{P}(I)$, $f_\gamma$ is also a sum of squares of monomials, so $f_\gamma$ does not have any zeros in $(\mathbb{R}^*)^d$ and the result now follows from Proposition 4.5(3).

Our tools now allow us to prove Theorem 1.3.
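For a monomial ideal in two variables, the $\tau$-distance and multiplicity can be computed by hand or by a few lines of code. The sketch below is only an illustration under stated assumptions: it applies Theorem 4.10 to $f = f_1^2 + \cdots + f_r^2$, uses $\mathcal{P}(f) = 2\mathcal{P}(I)$ and the rule that $\mathrm{RLCT}(f; \varphi) = (\lambda, \theta)$ gives $(2\lambda, \theta)$ for the ideal, and so returns $(1/l_\tau, \theta_\tau)$ with $l_\tau$ measured on $\mathcal{P}(I)$. The function name `rlct_monomial_ideal_2d` and its interface are hypothetical and are not part of the Singular library mentioned in the paper.

```python
from fractions import Fraction
from math import gcd


def rlct_monomial_ideal_2d(exponents, tau=(0, 0)):
    """Return (1/l_tau, theta_tau) for the monomial ideal generated by
    x^a * y^b, (a, b) in `exponents`, with amplitude x^tau[0] * y^tau[1].

    Hypothetical helper for illustration; restricted to two variables.
    """
    pts = sorted(set(tuple(p) for p in exponents))
    if not pts:
        raise ValueError("need at least one generator")
    w = (tau[0] + 1, tau[1] + 1)  # direction of the ray t * (tau + 1)

    # Facet normals of P(I) = conv(pts) + orthant: the two coordinate
    # directions (unbounded edges) and the normals of bounded edges.
    normals = {(1, 0), (0, 1)}
    for p in pts:
        for q in pts:
            dx, dy = q[0] - p[0], q[1] - p[1]
            if dx > 0 and dy < 0:
                nx, ny = -dy, dx
                k = gcd(nx, ny)
                nx, ny = nx // k, ny // k
                lowest = min(nx * a + ny * b for a, b in pts)
                # keep the line through p and q only if it supports P(I)
                if nx * p[0] + ny * p[1] == lowest:
                    normals.add((nx, ny))

    # t * w lies in P(I) iff t >= c / (beta . w) for every facet
    # {beta . alpha >= c}; the largest bound is the tau-distance, and the
    # number of facets attaining it is the codimension of the face hit.
    best, tight = Fraction(0), 0
    for nx, ny in normals:
        c = min(nx * a + ny * b for a, b in pts)
        t = Fraction(c, nx * w[0] + ny * w[1])
        if t > best:
            best, tight = t, 1
        elif t == best:
            tight += 1
    if best == 0:
        raise ValueError("distance is 0, so the multiplicity is undefined")
    return (1 / best, tight)


print(rlct_monomial_ideal_2d([(1, 0), (0, 1)]))          # ideal <x, y>
print(rlct_monomial_ideal_2d([(1, 1)]))                  # ideal <xy>
print(rlct_monomial_ideal_2d([(1, 0), (0, 1)], (1, 0)))  # <x, y> with amplitude x
```

For $\langle x, y \rangle$ the diagonal meets the edge $\alpha_1 + \alpha_2 = 1$ at $t = 1/2$, giving $(2, 1)$; for $\langle xy \rangle$ it meets the vertex $(1,1)$, giving $(1, 2)$. Exact rational arithmetic is used so that ties between facets, which determine the multiplicity, are detected reliably.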
As a special case, we have a formula for the RLCT of a monomial ideal with respect to a monomial amplitude function. The analogous formula for complex log canonical thresholds of monomial ideals was discovered and proved by Howald [17].

Proof of Theorem 1.3. If the ideal $I$ is sos-nondegenerate, then the equality follows from Proposition 4.5, Theorem 4.8 and Theorem 4.10. For all other ideals, the inequality is the result of Proposition 4.2 and Proposition 4.12.

Remark 4.13. Define the principal part $f_{\mathcal{P}}$ of $f$ to be $\sum_\alpha c_\alpha \omega^\alpha$ where the sum is over all $\alpha$ lying in some compact face $\gamma$ of $\mathcal{P}(f)$. The above theorems imply that if $f$ is nondegenerate, then $\mathrm{RLCT}_0 f = \mathrm{RLCT}_0 f_{\mathcal{P}}$. However, the latter is not true in general. For instance, if $f = (x + y)^2 + y^4$, then $f_{\mathcal{P}} = (x + y)^2$ but $\mathrm{RLCT}_0 f = (3/4, 1)$ and $\mathrm{RLCT}_0 f_{\mathcal{P}} = (1/2, 1)$.

Corollary 4.14. If $f \in \mathcal{A}_0(\mathbb{R}^d)$ has a local minimum at the origin with $f(0) = 0$ and its Hessian $(\partial^2 f / \partial \omega_i \partial \omega_j)$ is full rank, then $\mathrm{RLCT}_0 f = (d/2, 1)$.

Proof. Because its Hessian is full rank, there is a linear change of variables such that $f = \omega_1^2 + \cdots + \omega_d^2 + O(\omega^3)$. Thus, $f$ is nondegenerate and the Newton polyhedron $\mathcal{P}(f)$ has distance $l = 2/d$ with $\theta = 1$.

Corollary 4.15. Let $I$ be the ideal $\langle f_1, \ldots, f_s \rangle$, and suppose the Jacobian matrix $(\partial f_i / \partial \omega_j)$ has rank $r$ at $0$. Then, $\mathrm{RLCT}_0 I \le (\tfrac{1}{2}(r + d), 1)$.

Proof. Because the rank of $(\partial f_i / \partial \omega_j)$ is $r$, there is a linear change of variables such that the only linear monomials appearing in $I$ are $\omega_1, \ldots, \omega_r$. It follows that $\mathcal{P}(I)$ lies in the halfspace $\alpha_1 + \cdots + \alpha_r + \tfrac{1}{2}(\alpha_{r+1} + \cdots + \alpha_d) \ge 1$, so its distance is at least $1/(r + \tfrac{1}{2}(d - r)) = 2/(r + d)$.

In this section, we use our tools to compute the learning coefficients of a naïve Bayesian network $\mathcal{M}$ with two ternary random variables and two hidden states. It was designed by Evans, Gilula and Guttman [11] for investigating connections between the recovery time of 132 schizophrenic patients and the frequency of visits by their relatives.
Their data is summarized in the 3 × ≤ Y < 10 10 ≤ Y < 20 20 ≤ Y Totals Visited regularly 43 16 3 Visited rarely 6 11 10 Visited never 9 18 16 which we store as a 3 × q of relative frequencies. The model is given by p : Ω = ∆ × ∆ × ∆ × ∆ × ∆ → ∆ ω = ( t, a , a , b , b , c , c , d , d ) ( p ij ) p ij = ta i b j + (1 − t ) c i d j , i, j ∈ { , , } where a = 1 − a − a , a = ( a , a , a ) ∈ ∆ and similarly for b, c and d . Hence, a3 × I = Z Ω p p p p p p p p p dω which was computed exactly by Sturmfels, Xu and the author [21].We now estimate this integral using Watanabe’s asymptotic formula for thelog likelihood integral in Theorem 1.1. We assume that the data ˆ q was generatedby some true distribution q = ( q ij ) ∈ R × in the model. Ideally, we want q tobe equal to the matrix ˆ q of relative frequencies, but in general, the data ˆ q rarelylies in the model. In this example, the matrix ˆ q is not in the model because itis full rank. However, we should be able to find a distribution q in the modelthat is close to ˆ q , because in practice, we want to study models which describethe data well. A good candidate for q is the maximum likelihood distribution.Using the EM algorithm, this distribution is q = 1132 . . . . . . . . . which comes from the maximum likelihood estimate t = 0 . a , a ) = (0 . , . , ( b , b ) = (0 . , . , ( c , c ) = (0 . , . , ( d , d ) = (0 . , . . Note that the ML distribution q is indeed very close to the data ˆ q .Our next theorem summarizes how the asymptotics of log Z ( N ) depend on q .Let S i denote the set of rank i matrices in p (Ω), and S ∗ i ⊂ S i be the matrices withpositive entries. Before we prove this theorem, let us apply it to our statisticalproblem. Using the exact value of I computed by Lin–Sturmfels–Xu [21], we get( log I ) exact = − . . Meanwhile, if the BIC was erroneously applied with the dimension d = 9 of theparameter space, we would have( log I ) BIC = − . . 
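The BIC figure above differs from other approximations only through the $N$-dependent correction $-\lambda \log N + (\theta - 1) \log\log N$ of Watanabe's asymptotic formula, with the BIC corresponding to the pair $(\lambda, \theta) = (d/2, 1)$. The following minimal sketch evaluates that correction for the sample size $N = 132$ and parameter dimension $d = 9$ of this example. The function name `asymptotic_penalty` is an assumption made here for illustration, and the maximized log-likelihood term is deliberately left out.

```python
import math


def asymptotic_penalty(lam, theta, n):
    """Return -lam*log(n) + (theta - 1)*log(log(n)).

    This is only the correction term of the asymptotic expansion of
    log Z(N); the leading log-likelihood term is not included.
    """
    return -lam * math.log(n) + (theta - 1) * math.log(math.log(n))


N = 132  # number of patients in the data set
d = 9    # dimension of the parameter space

# BIC corresponds to the pair (lambda, theta) = (d/2, 1).
bic_penalty = asymptotic_penalty(d / 2, 1, N)
print(bic_penalty)
```

Any other pair $(\lambda, \theta)$, such as one obtained from a real log canonical threshold, can be passed in the same way; a smaller $\lambda$ gives a less negative correction, and a multiplicity $\theta > 1$ adds a positive $\log\log N$ term.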
On the other hand, by calculating the real log canonical threshold of the poly-nomial ideal h p ( ω ) − q i , we find that the learning coefficient of the model at theML distribution q is ( λ, θ ) = (7 / , I ) RLCT ≈ − . I .Our proposal to use the ML distribution as the true distribution q is admit-tedly simplistic, given that noise in the data will almost surely bring us to some q ∈ S ∗ . Nonetheless, our next theorem proves that the learning coefficient isalways smaller than the (9 / , 1) prescribed by the BIC. For deeper statisticaldiscussions, the reader should turn to Drton and Plummer [8] where they ad-dressed the paradox of circular reasoning in requiring true parameter values forthe asymptotic approximation of Bayesian integrals. They also proposed a novelalgorithm where the marginal likelihood is estimated as a weighted average ofthe contributions from all true distributions. We hope that mathematical anal-yses such as our next theorem will help inform these kinds of discussions, andprovide useful estimates and bounds for a variety of statistical computations. Theorem 5.1. The learning coefficient ( λ, θ ) of the model at q > is given by ( λ, θ ) = (cid:26) (5 / , if q ∈ S ∗ , (7 / , if q ∈ S ∗ . Therefore, asymptotically as N → ∞ , log Z ( N ) = N X i,j ˆ q ij log q ij − λ log N + ( θ − 1) log log N + η N where ˆ q is the matrix of relative frequencies of the data and η N is a randomvariable whose expectation E [ η N ] converges to a constant. We postpone the proof of this theorem to the end of the section. Let us beginwith a few remarks about our approach to this problem. Firstly, Theorem 1.2states that the learning coefficient ( λ, θ ) of the statistical model is given by(2 λ, θ ) = min ω ∗ ∈V RLCT Ω ω ∗ h p ( ω ) − q i where V is the fiber p − ( q ) = { ω ∈ Ω : p ( ω ) = q } over q . Instead of focusing ona fixed q and its fiber V , let us vary the parameter ω ∗ over all of Ω. 
For each $\omega^* \in \Omega$, we translate $\Omega$ so that $\omega^*$ is the origin and compute the RLCT of the ideal $\langle p(\omega + \omega^*) - p(\omega^*) \rangle$. This is the content of Proposition 5.2. The proof of Theorem 5.1 will then consist of minimizing these RLCTs over the fiber $\mathcal{V}$ for each $q$ in the model.

Secondly, in our computations, we will often be choosing different generators for our ideal and making appropriate changes of variables. Generators with few terms and small total degree are often highly desired. Another useful trick is to multiply or divide the generators by functions $f(\omega)$ satisfying $f(0) \ne 0$. Such functions are units in the ring $\mathcal{A}_0$ of real analytic functions so this multiplication or division will not change the ideal generated.

We will perform many of the computations by hand to demonstrate how the various properties from Section 3 can be applied. At points in the proof where RLCTs of monomial ideals are required, the Singular library from Section 1 comes in useful. We hope that some day the computation of learning coefficients for statistical models will be fully automated.

Thirdly, for the full proof of Proposition 5.2, we will have to analyze interactions between the model singularities and the boundary of the parameter space. Some of these interactions are messy. To improve the readability of the paper, we moved the detailed proof to the appendix while retaining some interesting computations in this section.

Finally, we come to our main proposition. Let us define the following subsets of $\Omega$.
These subsets stratify Ω according to the real log canonical threshold inthe manner described in Conjecture 2.9.Ω u = { ω ∗ ∈ Ω : t ∗ ∈ { , }} Ω m = { ω ∗ ∈ Ω : t ∗ / ∈ { , }} Ω m = { ω ∗ ∈ Ω m : a ∗ = c ∗ , b ∗ = d ∗ } Ω m kl = { ω ∗ ∈ Ω m : { i : a ∗ i = 0 } = k, { i : b ∗ i = 0 } = l } Ω m = { ω ∗ ∈ Ω m : ( b ∗ = d ∗ , a ∗ = c ∗ ) or ( a ∗ = c ∗ , b ∗ = d ∗ ) } Ω m = { ω ∗ ∈ Ω m : ( a ∗ = c ∗ , ∃ i a ∗ i = 0) or ( b ∗ = d ∗ , ∃ i b ∗ i = 0) } Ω m = { ω ∗ ∈ Ω m : a ∗ = c ∗ , b ∗ = d ∗ } Ω m ad = { ω ∗ ∈ Ω m : ∃ i, j a ∗ i = d ∗ j = 0 , c ∗ i = 0 , b ∗ j = 0 } Ω m bc = { ω ∗ ∈ Ω m : ∃ i, j b ∗ i = c ∗ j = 0 , d ∗ i = 0 , a ∗ j = 0 } Ω m = Ω m ad ∪ Ω m bc Ω m = Ω m ad ∩ Ω m bc . Proposition 5.2. Given ω ∗ ∈ Ω , let I be the ideal h p ( ω + ω ∗ ) − p ( ω ∗ ) i . Then, RLCT I = (5 , if ω ∗ ∈ Ω u , (6 , if ω ∗ ∈ Ω m , (6 , if ω ∗ ∈ Ω m ∪ Ω m ∪ Ω m ∪ Ω m , (7 , if ω ∗ ∈ Ω m , (7 , if ω ∗ ∈ Ω m ∪ Ω m , (8 , if ω ∗ ∈ Ω m , (6 , if ω ∗ ∈ Ω m \ Ω m , (7 , if ω ∗ ∈ Ω m , (7 , if ω ∗ ∈ Ω m \ Ω m , (8 , if ω ∗ ∈ Ω m \ Ω m , (9 , if ω ∗ ∈ Ω m . Proof Idea. We give a shortened analysis that ignores the effect of the boundaryof Ω on the RLCTs. The derived RLCTs will be smaller than the actual ones byProposition 3.3. A full proof involving boundary effects is given in the appendix.Our ideal I is generated by g ij = f ij ( ω + ω ∗ ) − f ij ( ω ∗ ) where f ij = ta i b j + (1 − t ) c i d j , i, j ∈ { , , } and a = b = c = d = 1. One can check that I is also generated by g , g ,25 , g , and g ij − ( d j + d ∗ j ) g i − ( a i + a ∗ i ) g j , i, j ∈ { , } which expand to give c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ where t ∗ = t ∗ , t ∗ = 1 − t ∗ , u ∗ i = a ∗ i − c ∗ i , v ∗ i = b ∗ i − d ∗ i . 
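The claim that the combinations $g_{ij} - (d_j + d_j^*) g_{i0} - (a_i + a_i^*) g_{0j}$ expand as stated can be spot-checked with exact rational arithmetic. The sketch below is a numerical check at random rational points, not a proof. It assumes the convention $t_1^* = t^*$ and $t_0^* = 1 - t^*$, writes the expansions directly in terms of $t^*$, and uses $a_0 = b_0 = c_0 = d_0 = 1$ as in the text; the helper names are hypothetical. With these conventions the combination equals the negative of $a_i d_j - a_i t_1^* v_j^* + d_j t_0^* u_i^*$, which generates the same ideal.

```python
import random
from fractions import Fraction


def rand_frac(rng):
    """Random rational number with small numerator and denominator."""
    return Fraction(rng.randint(-9, 9), rng.randint(1, 9))


def f(t, a, b, c, d, i, j):
    """Model polynomial f_ij = t*a_i*b_j + (1 - t)*c_i*d_j."""
    return t * a[i] * b[j] + (1 - t) * c[i] * d[j]


def check_identities(trials=20, seed=0):
    """Check the expansions of the generators at random rational points."""
    rng = random.Random(seed)
    for _ in range(trials):
        ts, t = rand_frac(rng), rand_frac(rng)  # t* and the offset t
        # Index 0 is the fixed coordinate: value 1 at the base point,
        # offset 0, so that a_0 = b_0 = c_0 = d_0 = 1 after shifting.
        star = {k: [Fraction(1), rand_frac(rng), rand_frac(rng)] for k in "abcd"}
        off = {k: [Fraction(0), rand_frac(rng), rand_frac(rng)] for k in "abcd"}
        sh = {k: [s + o for s, o in zip(star[k], off[k])] for k in "abcd"}

        def g(i, j):
            # g_ij = f_ij(omega + omega*) - f_ij(omega*)
            return (f(ts + t, sh["a"], sh["b"], sh["c"], sh["d"], i, j)
                    - f(ts, star["a"], star["b"], star["c"], star["d"], i, j))

        u = [star["a"][i] - star["c"][i] for i in range(3)]  # u*_i
        v = [star["b"][j] - star["d"][j] for j in range(3)]  # v*_j
        a, b, c, d = off["a"], off["b"], off["c"], off["d"]
        for i in (1, 2):
            if g(i, 0) != c[i] * (1 - ts - t) + a[i] * (ts + t) + t * u[i]:
                return False
            if g(0, i) != d[i] * (1 - ts - t) + b[i] * (ts + t) + t * v[i]:
                return False
            for j in (1, 2):
                lhs = g(i, j) - sh["d"][j] * g(i, 0) - sh["a"][i] * g(0, j)
                rhs = -(a[i] * d[j] - a[i] * ts * v[j] + d[j] * (1 - ts) * u[i])
                if lhs != rhs:
                    return False
    return True


print(check_identities())
```

Because `Fraction` arithmetic is exact, agreement at many random points is strong evidence for the polynomial identities, which have low degree in each variable.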
Note that P ( a i + a ∗ i ) = 1and P a ∗ i = 1 so P a i = 0 and similarly for b, c, d . Also, P u ∗ i = P a ∗ i − c ∗ i = 0.The same is true for v ∗ . We now do a case-by-case analysis. Case 1: ω ∗ ∈ Ω m . This implies t ∗ = 0 and t ∗ = 0. Since the indeterminates b , b , c , c appearonly in the first four polynomials, this suggests the change of variables c i = ( c ′ i − tu ∗ i − a i ( t ∗ + t )) / ( t ∗ − t ) , i = 1 , b i = ( b ′ i − tv ∗ i − d i ( t ∗ − t )) / ( t ∗ + t ) , i = 1 , t, a , a , b ′ , b ′ , c ′ , c ′ , d , d . In view of Proposition 3.8,the Jacobian determinant of this substitution is a constant, while the pullbackideal can be written as I + I where I = h b ′ , b ′ , c ′ , c ′ i and I is generated by a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ . The indeterminates in I and I are disjoint, so we may apply Proposition 3.7.The RLCT of I is (4 , I . Case 1.1: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0 or u ∗ = 0 , v ∗ = 0. Without loss of generality, weassume v ∗ = 0 , u ∗ = 0 , u ∗ = 0 ( u ∗ + u ∗ + u ∗ = 0 so at most one of them is zero)and substitute d i = ( d ′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) , i = 1 , . The resulting pullback of I is h d ′ , d ′ i . If ω ∗ lies in the interior of Ω, we use eitherNewton polyhedra or Proposition 3.7 to show that the RLCT of this monomialideal is (2 , ω ∗ lies on the boundary of Ω, the situation is more complicatedand we analyze it in detail in the appendix. Case 1.2: ω ∗ ∈ Ω m . u ∗ = 0 , v ∗ = 0. Without loss of generality, suppose that u ∗ = 0.If ω ∗ ∈ Ω m , we further assume that a ∗ = d ∗ j = 0 , u ∗ = 0 , v ∗ j = 0. Substituting d i = ( d ′ i + a t ∗ v ∗ i ) / ( a + t ∗ u ∗ ) , i = 1 , a = ( a ′ + a u ∗ ) /u ∗ , the pullback ideal is h a ′ , d ′ , d ′ i so the RLCT at an interior point is (3 , Case 1.3: ω ∗ ∈ Ω m . This implies u ∗ i = v ∗ i = 0 for all i . 
The pullback ideal can be written as h a , a ih d , d i whose RLCT over an interior point of Ω is (2 , 2) by Proposition 3.7. Case 2: ω ∗ ∈ Ω u . Without loss of generality, assume t ∗ = 0 and substitute c i = ( c ′ i − t ( a i + u ∗ i )) / (1 − t ) i = 1 , d i = ( d ′ i − t ( b i + v ∗ i )) / (1 − t ) i = 1 , . The pullback ideal is the sum of h c ′ , c ′ , d ′ , d ′ i and h t ih a + u ∗ , a + u ∗ ih b + v ∗ , b + v ∗ i . The RLCT of the first summand is (4 , h t i is (1 , 1) while that of h a + u ∗ , a + u ∗ i and h b + v ∗ , b + v ∗ i are at least (2 , 1) each. By Proposition 3.7,the RLCT of their product is (1 , 1) and that of the pullback ideal is (5 , Proof of Theorem 5.1. Given a matrix q = ( q ij ), the learning coefficient ( λ, θ )of the model at q is the minimum of RLCTs at points ω ∗ ∈ Ω where p ( ω ∗ ) = q .The theorem then follows from Proposition 5.2, Theorem 1.1 and the claims p (Ω u ) = S , p (Ω m ) ⊂ S , p (Ω m ) ⊂ S , p (Ω m ) / ∈ S ∗ . These four claims are easy to check from the definitions of the subsets of Ω. In this section, we give a full proof of Proposition 5.2 that considers the effect ofthe boundary of the parameter space Ω on the RLCTs. The next lemma comesin handy in dealing with boundary issues. It helps us in computing the RLCTsof monomial ideals at boundary points where the parameter space contains anice neighborhood Ω × Ω . Here, Ω is an orthant in the coordinates involvedin the monomials of I , while Ω is a small cone in the remaining coordinates.27 emma 6.1. Let Ω ⊂ { ( x , . . . , x d ) ∈ R d } be semianalytic. Let I be a monomialideal and ϕ a monomial function in x , . . . , x r . If there exists a vector ξ ∈ R d − r such that Ω × Ω ⊂ Ω for sufficiently small ε , Ω = { ( x , . . . , x r ) ∈ [0 , ε ] r } Ω = { ( x r +1 , . . . , x d ) = t ( ξ + ξ ′ ) for t ∈ [0 , ε ] , ξ ′ ∈ [ − ε, ε ] d − r } , then RLCT Ω ( I ; ϕ ) = RLCT ( I ; ϕ ) .Proof. Because I and | ϕ | remain unchanged by the flipping of signs of x , . . . 
, x r ,their threshold does not depend on the choice of orthant, so RLCT Ω ( I ; ϕ ) =RLCT ( I ; ϕ ). The lemma now follows from Proposition 3.7 and the fact thatthe threshold of the zero ideal over the cone neighborhood Ω is ( ∞ , − ). Detailed Proof of Proposition 5.2. Recall that the ideal I is generated by c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ We do a case-by-case analysis of the structure of I and the boundary of Ω. Case 1: ω ∗ ∈ Ω m . This implies t ∗ = 0 and t ∗ = 0. Since the indeterminates b , b , c , c appearonly in the first four polynomials, this suggests the change of variables c i = ( c ′ i − tu ∗ i − a i ( t ∗ + t )) / ( t ∗ − t ) , i = 1 , b i = ( b ′ i − tv ∗ i − d i ( t ∗ − t )) / ( t ∗ + t ) , i = 1 , t, a , a , b ′ , b ′ , c ′ , c ′ , d , d . In view of Proposition 3.8,the Jacobian determinant of this substitution is a constant. Case 1.1: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0 or u ∗ = 0 , v ∗ = 0. Without loss of generality, weassume v ∗ = 0 , u ∗ > d i = ( d ′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) , i = 1 , . The resulting pullback ideal is h b ′ , b ′ , c ′ , c ′ , d ′ , d ′ i . If ω ∗ lies in the interior ofΩ, we use either Newton polyhedra or Proposition 3.7 to show that the RLCTof this monomial ideal is (6 , ω ∗ lies on the boundary of Ω, the situationis more complicated. Since we are considering a subset of a neighborhood of ω ∗ , the corresponding Laplace integral from Proposition 3.2a is smaller so the28hreshold is at least (6 , − u ∗ = u ∗ + u ∗ , we cannot have u ∗ = u ∗ = 0. Suppose u ∗ = 0 and u ∗ = 0. We consider a blowup where one of the charts is given by the monomialmap t = s, a i = sa ′ i , c ′ = rs, c ′ = rsc ′′ , b ′ i = rsb ′′ i , d ′ i = rsd ′′ i . Here, the pullbackpair is ( h rs i ; r s ). 
Now, we study the inequalities which are active at ω ∗ . Forinstance, if b ∗ = 0, then ω ∗ lies on the boundary defined by 0 ≤ b + b ∗ . Afterthe various changes of variables, the inequalities are as shown below, where b ′′ = − b ′′ − b ′′ and similarly for c ′′ , d ′′ and a ′ . Note that the inequality for a ∗ = 0 is omitted because a ∗ = 0 implies u ∗ = − c ∗ ≤ 0. Similar conditions onthe u ∗ i , v ∗ i hold for the other inequalities. b ∗ i = 0 : 0 ≤ rs ( b ′′ i − d ′′ i ( t ∗ − s ) / ( t ∗ u ∗ + sa ′ )) / ( t ∗ + s ) d ∗ i = 0 : 0 ≤ rsd ′′ i / ( t ∗ u ∗ + sa ′ ) c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) + r ) / ( t ∗ − s ) c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) + rc ′′ ) / ( t ∗ − s ) u ∗ > c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) − r − rc ′′ ) / ( t ∗ − s ) u ∗ > a ∗ = 0 : 0 ≤ sa ′ u ∗ < a ∗ = 0 : 0 ≤ sa ′ u ∗ < b ∗ = b ∗ = 0, we choose coordinates b ′′ and b ′′ and set b ′′ = − b ′′ − b ′′ . The sameis done for the d ′′ i . The pullback pair is unchanged by these choices. Now, withcoordinates ( r, s ) and ( b ′′ i , b ′′ i , d ′′ j , d ′′ j , c ′′ , a ′ , a ′ ), we apply the lemma with thevector ξ = (2 , , u ∗ , u ∗ , , , ( rs ; r s ) = (6 , u ∗ , u ∗ is zero, suppose u ∗ = 0 , u ∗ = 0 without loss ofgenerality. If a ∗ = c ∗ = 0, then the arguments of the previous paragraph showthat the RLCT is again (6 , a ∗ = c ∗ = 0, we blow up the origin in R andconsider the chart where a = s, c ′ i = sc ′′ i , b ′ i = sb ′′ i , d ′ i = sd ′′ i . The pullback pairis ( h sb ′′ , sb ′′ , sc ′′ , sc ′′ , sd ′′ , sd ′′ i ; s ). The active inequalities for a ∗ = c ∗ = 0 are c ∗ = 0 : 0 ≤ s ( c ′′ − t ∗ + t ) / ( t ∗ − t ) a ∗ = 0 : 0 ≤ s. Near the origin in ( s, b ′′ , b ′′ , c ′′ , c ′′ , d ′′ , d ′′ ) ∈ R , these inequalities imply s = 0so the new region M defined by the active inequalities is not full at the origin.Thus, we can ignore the origin in computing the RLCT. 
All other points on theexceptional divisor of this blowup lie on some other chart of the blowup wherethe pullback pair is ( s ; s ), so the RLCT is at least (7 , c = s, c = sc ′′ , a = sa ′ , b ′ i = sb ′′ i , d ′ i = sd ′′ i , we have the active inequalitiesbelow. Note that c ∗ = 0 because u ∗ = − u ∗ < b ∗ i = 0 : 0 ≤ s ( b ′′ i − d ′′ i ( t ∗ − t ) / ( t ∗ u ∗ − ( sa ′ + a )) / ( t ∗ + t ) d ∗ i = 0 : 0 ≤ sd ′′ i / ( t ∗ u ∗ − ( sa ′ + a )) c ∗ = 0 : 0 ≤ ( sc ′′ − tu ∗ + ( sa ′ + a )( t ∗ + t )) / ( t ∗ − t ) c ∗ = 0 : 0 ≤ s (1 − a ′ ( t ∗ + t )) / ( t ∗ − t ) a ∗ = 0 : 0 ≤ sa ′ a ∗ = 0 : 0 ≤ a b ′′ i and d ′′ i , we find that the RLCTis (7 , 1) by using Lemma 6.1 with ξ = (2 , , u ∗ , u ∗ , , , , − 1) in coordinates( b ′′ i , b ′′ i , d ′′ j , d ′′ j , a ′ , a , c ′′ , t ). Case 1.2: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0. Without loss of generality, suppose that u ∗ = 0.If ω ∗ ∈ Ω m , we further assume that a ∗ = d ∗ j = 0 , u ∗ = 0 , v ∗ j = 0. Substituting d i = ( d ′ i + a t ∗ v ∗ i ) / ( a + t ∗ u ∗ ) , i = 1 , a = ( a ′ + a u ∗ ) /u ∗ , the pullback ideal is h a ′ , b ′ , b ′ , c ′ , c ′ , d ′ , d ′ i so the RLCT is at least (7 , a i = ( a ′ w ∗ i + a u ∗ i ) /u ∗ for i = 1 , , w ∗ i = 0 , , − ω ∗ is not in Ω m , we consider the blowup chart a ′ = s, b ′ i = sb ′′ i , c ′ i = sc ′′ i , d ′ i = sd ′′ i .The active inequalities are as follows. The symbol v − denotes v ∗ i ≤ b ∗ i = 0 : 0 ≤ [ sb ′′ i − tv ∗ i − ( sd ′′ i + a t ∗ v ∗ i )( t ∗ − t ) / ( t ∗ u ∗ + a )] / ( t ∗ + t ) v − c ∗ i = 0 : 0 ≤ [ sc ′′ i − tu ∗ i − ( sw ∗ i + a u ∗ i )( t ∗ + t ) /u ∗ ] / ( t ∗ − t ) u + a ∗ i = 0 : 0 ≤ ( sw ∗ i + a u ∗ i ) /u ∗ u − d ∗ i = 0 : 0 ≤ ( sd ′′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) v +The crux to understanding the inequalities is this: if a ∗ i = d ∗ j = 0 , u ∗ i = 0 , v ∗ j = 0,the coefficient of a appears with different signs in the inequalities for a ∗ i = 0and d ∗ j = 0. 
This makes it difficult to choose a suitable vector ξ for Lemma 6.1.Similarly, if b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0, the coefficient of u ∗ t + t ∗ a appears withdifferent signs. Fortunately, since ω ∗ / ∈ Ω m , we do not have such obstructionsand it is an easy exercise to find the vector ξ . Thus, the RLCT is (7 , ω ∗ ∈ Ω m \ Ω m , we blow up a = s, a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i . The active inequalities for a ∗ = d ∗ j = 0 imply that the new region M isnot full at the origin of this chart. Thus, we shift our focus to the other chartsof the blowup where the pullback pair is ( s ; s ), so the RLCT is at least (8 , a ′ = s, a = sa ′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i , we do not haveobstructions coming from any b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0 so it is again easy tofind the vector ξ for Lemma 6.1. The threshold is exactly (8 , ω ∗ ∈ Ω m , consider the following two charts out of the nine charts in theblowup of the origin in R .Chart 1: a = s, t = st ′ , a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i Chart 2: t = s, a = sa ′ , a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i The inequalities for a ∗ i = d ∗ j = 0 , u ∗ i = 0 , v ∗ j = 0 and b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0imply that the new region M is not full at points outside of the other sevencharts, so we may ignore these two charts in computing the RLCT. Indeed, forChart 1, the active inequalities a ∗ i = 0 : 0 ≤ s ( a ′′ w ∗ i + u ∗ i ) /u ∗ u − d ∗ i = 0 : 0 ≤ s ( d ′′ i + t ∗ v ∗ i ) / ( t ∗ u ∗ + s ) v +30ell us that a ′′ or d ′′ must be non-zero for M to be full. In Chart 2, suppose M is full at some point x where a ′′ = b ′′ = b ′′ = c ′′ = c ′′ = d ′′ = d ′′ = 0. Then, a ∗ i = 0 : 0 ≤ s ( a ′′ w ∗ i + a ′ u ∗ i ) /u ∗ u − d ∗ i = 0 : 0 ≤ s ( d ′′ i + a ′ t ∗ v ∗ i ) / ( t ∗ u ∗ + sa ′ ) v +imply that a ′ = 0 at x . 
However, if this is the case, the inequalities b ∗ i = 0 : 0 ≤ s [ b ′′ i − v ∗ i − ( d ′′ i + a ′ t ∗ v ∗ i )( t ∗ − s ) / ( t ∗ u ∗ + sa ′ )] / ( t ∗ + s ) v − c ∗ i = 0 : 0 ≤ s [ c ′′ i − u ∗ i − ( a ′′ w ∗ i + a ′ u ∗ i )( t ∗ + s ) /u ∗ ] / ( t ∗ − s ) u +forces b ′′ i or c ′′ i to be non-zero for some i , a contradiction. Thus, we shift our focusto the other seven charts where the pullback pair is ( s ; s ) and the RLCT is atleast (9 , a ′ = s, a = sa ′ , t = st ′ , b ′ i = sb ′′ i , c ′ i = sc ′′ i , d ′ i = sd ′′ i ,note that we cannot have both a ∗ = 0 and a ∗ = 0 because we assumed a ∗ = 0.It is now easy to find the vector ξ for Lemma 6.1, so the threshold is (9 , Case 1.3: ω ∗ ∈ Ω m . This implies u ∗ i = v ∗ i = 0 for all i . The pullback ideal can be written as h b ′ , b ′ , c ′ , c ′ i + h a , a ih d , d i whose RLCT over an interior point of Ω is (6 , 2) by Proposition 3.7. This occursin Ω m where none of the inequalities are active. Now, suppose the only activeinequalities come from a ∗ = c ∗ = 0. We blow up the origin in { ( a , c ′ ) ∈ R } . Inthe chart given by a = a ′ , c ′ = a ′ c ′′ , the new region M is not full at the origin,so we only need to study the chart where c ′ = c ′′ , a = c ′′ a ′ . The pullback pairbecomes ( h c ′′ i + h b ′ , b ′ , c ′ i + h a ih d , d i ; c ′′ ), and a simple application of Lemma6.1 and Proposition 3.7 shows that the threshold is (6 , − ( h b ′ , b ′ , c ′ , c ′ i + h a , a ih d , d i ; 1) (6 , a ∗ = 0 ( h b ′ , b ′ , c ′′ , c ′ i + h a ih d , d i ; c ′′ ) (6 , a ∗ = 0 , b ∗ = 0 ( h b ′′ , b ′ , c ′′ , c ′ i + h a ih d i ; b ′′ c ′′ ) (7 , a ∗ = 1 ( h b ′ , b ′ , c ′′ , c ′′ i ; c ′′ c ′′ ) (6 , a ∗ = 1 , b ∗ = 0 ( h b ′′ , b ′ , c ′′ , c ′′ i ; b ′′ c ′′ c ′′ ) (7 , a ∗ = 1 , b ∗ = 1 ( h b ′′ , b ′′ , c ′′ , c ′′ i ; b ′′ b ′′ c ′′ c ′′ ) (8 , a ∗ = c ∗ = 1 corresponds to a ∗ = a ∗ = c ∗ = c ∗ = 0. Here,we blow up the origins in { ( a , c ′ ) ∈ R } and { ( a , c ′ ) ∈ R } . 
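The repeated appeals to Proposition 3.7 can be traced with a small calculator. This is a sketch under an explicit assumption: that the proposition supplies the usual rules for ideals in disjoint sets of variables, namely that thresholds add for a sum of ideals (with multiplicities adding minus one), and that a product takes the smaller threshold (with multiplicities adding on a tie). The function names are hypothetical, and the value (n, 1) for an ideal generated by n coordinate functions at an interior point is likewise taken as given.

```python
def rlct_coordinates(n):
    """Assumed RLCT of <x_1, ..., x_n> at an interior point: (n, 1)."""
    return (n, 1)

def rlct_sum(p, q):
    """Assumed sum rule for ideals in disjoint variables."""
    (lam1, th1), (lam2, th2) = p, q
    return (lam1 + lam2, th1 + th2 - 1)

def rlct_product(p, q):
    """Assumed product rule for ideals in disjoint variables."""
    (lam1, th1), (lam2, th2) = p, q
    if lam1 < lam2:
        return (lam1, th1)
    if lam2 < lam1:
        return (lam2, th2)
    return (lam1, th1 + th2)

# <a_1, a_2><d_1, d_2>: two ideals of threshold (2, 1) multiplied together.
product_ad = rlct_product(rlct_coordinates(2), rlct_coordinates(2))

# <b'_1, b'_2, c'_1, c'_2> + <a_1, a_2><d_1, d_2> at an interior point.
case_1_3 = rlct_sum(rlct_coordinates(4), product_ad)

# Case 2: <c'_1, c'_2, d'_1, d'_2> + <t> * (two factors of threshold >= (2, 1)).
case_2 = rlct_sum(rlct_coordinates(4), rlct_product((1, 1), product_ad))

print(product_ad, case_1_3, case_2)
```

With these rules the product has threshold (2, 2) and the sum has threshold (6, 2), matching the interior values quoted from Proposition 3.7 in the text. In Case 2 the factor generated by t dominates the product, so the result does not depend on the exact thresholds of the other two factors as long as they exceed 1.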
As before, we can ignore the other charts and just consider the one where $a_1 = c''_1 a'_1$, $c'_1 = c''_1$, $a_2 = c''_2 a'_2$, $c'_2 = c''_2$. The pullback pair is $(\langle c''_1 \rangle + \langle c''_2 \rangle + \langle b'_1, b'_2 \rangle;\ c''_1 c''_2)$. If $b^*_i \neq 0$ for all $i$, the RLCT is $(6, 1)$ by Lemma 6.1 and Proposition 3.7.

Case 2: $\omega^* \in \Omega_u$. Without loss of generality, assume $t^*_1 = 0$ and substitute
\[
c_i = \frac{c'_i - t(a_i + u^*_i)}{1 - t}, \quad i = 1, 2, \qquad
d_i = \frac{d'_i - t(b_i + v^*_i)}{1 - t}, \quad i = 1, 2.
\]
The pullback ideal is the sum of $\langle c'_1, c'_2, d'_1, d'_2 \rangle$ and $\langle t \rangle \langle a_1 + u^*_1, a_2 + u^*_2 \rangle \langle b_1 + v^*_1, b_2 + v^*_2 \rangle$. Since $c'_3 = -c'_1 - c'_2$ and similarly for the $d'_i$, $a_i$, $b_i$, $u^*_i$ and $v^*_i$, it is useful to write this ideal more symmetrically as the sum of $\langle c'_1, c'_2, c'_3 \rangle$, $\langle d'_1, d'_2, d'_3 \rangle$ and
\[
\langle t \rangle \langle a_1 + u^*_1, a_2 + u^*_2, a_3 + u^*_3 \rangle \langle b_1 + v^*_1, b_2 + v^*_2, b_3 + v^*_3 \rangle.
\]
Meanwhile, the inequalities are
\[
\begin{aligned}
a^*_i = 0 &: \quad 0 \le a_i \\
c^*_i = 0 &: \quad 0 \le (c'_i - t(a_i + u^*_i))/(1 - t), && u^*_i \ge 0 \\
b^*_j = 0 &: \quad 0 \le b_j \\
d^*_j = 0 &: \quad 0 \le (d'_j - t(b_j + v^*_j))/(1 - t), && v^*_j \ge 0.
\end{aligned}
\]
We now relabel the indices of the $a_i$ and $c'_i$, without changing the $b_j$ and $d'_j$, so that the active inequalities are among those from $a^*_1 = 0$, $a^*_2 = 0$, $c^*_{i_1} = 0$, $c^*_{i_2} = 0$. The $b_j$ and $d'_j$ are thereafter also relabeled so that the inequalities come from $b^*_1 = 0$, $b^*_2 = 0$, $d^*_{j_1} = 0$, $d^*_{j_2} = 0$. We claim that the new region $M$ contains, for small $\varepsilon$, the orthant neighborhood
\[
\{(a_1, a_2, b_1, b_2, c_{i_1}, c_{i_2}, d_{j_1}, d_{j_2}, -t) \in [0, \varepsilon]^9\}.
\]
Indeed, the only problematic inequalities are
\[
\begin{aligned}
c^*_3 = 0 &: \quad 0 \le (c'_3 - t(-a_1 - a_2 + u^*_3))/(1 - t), && u^*_3 = 0 \\
d^*_3 = 0 &: \quad 0 \le (d'_3 - t(-b_1 - b_2 + v^*_3))/(1 - t), && v^*_3 = 0.
\end{aligned}
\]
However, these inequalities cannot occur because, for instance, $u^*_3 = 0$ and $c^*_3 = 0$ imply $a^*_3 = 0$, a contradiction since the $a_i$ were relabeled to avoid this. Finally, the threshold of $\langle t \rangle$ is $(1, 1)$ while those of $\langle a_1 + u^*_1, a_2 + u^*_2 \rangle$ and $\langle b_1 + v^*_1, b_2 + v^*_2 \rangle$ are at least $(2, 1)$ each.
By Proposition 3.7, the RLCT of their product is $(1, 1)$ and that of the pullback ideal is $(5, 1)$.

Acknowledgements. The author wishes to thank Christine Berkesch, Mathias Drton, Anton Leykin, Bernd Sturmfels, Zach Teitler, Sumio Watanabe and Piotr Zwiernik, as well as the anonymous reviewers for their many useful suggestions, discussions and corrections.

References

[1] M. Aoyagi and S. Watanabe: Stochastic complexities of reduced rank regression in Bayesian estimation, Neural Networks (2005) 924–933.
[2] V. I. Arnol'd, S. M. Guseĭn-Zade and A. N. Varchenko: Singularities of Differentiable Maps, Vol. II, Birkhäuser, Boston, 1985.
[3] J. Bertrand, P. Bertrand and J. Ovarlez: The Mellin Transform, in The Transforms and Applications Handbook: Second Edition, Chapter 12, Ed. A. D. Poularikas, CRC Press, Boca Raton, 2010.
[4] E. Bierstone and P. D. Milman: Resolution of singularities, Several complex variables, MSRI Publications (1999) 43–78.
[5] C. Bivià-Ausina: Nondegenerate ideals in formal power series rings, Rocky Mountain J. Math. (2004) 495–511.
[6] M. Blickle and R. Lazarsfeld: An informal introduction to multiplier ideals, Trends in Commutative Algebra, MSRI Publications (2004) 87–114.
[7] A. Bravo, S. Encinas and O. Villamayor: A simplified proof of desingularisation and applications, Rev. Math. Iberoamericana (2005) 349–458.
[8] M. Drton and M. Plummer: A Bayesian Information Criterion for Singular Models, J. R. Statist. Soc. B (2017) 1–38.
[9] M. Drton, B. Sturmfels and S. Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars, Birkhäuser, Basel, 2009.
[10] D. Eisenbud: Commutative Algebra with a view towards Algebraic Geometry, Graduate Texts in Mathematics, Springer-Verlag, New York, 1995.
[11] M. Evans, Z. Gilula and I. Guttman: Latent class analysis of two-way contingency tables by Bayesian methods, Biometrika (1989) 557–563.
[12] W. Fulton: Introduction to Toric Varieties, Annals of Mathematics Studies, Princeton University Press, Princeton, 1993.
[13] D. Geiger and D. Rusakov: Asymptotic model selection for naive Bayesian networks, J. Mach. Learn. Res. (2005) 1–35.
[14] M. Greenblatt: An elementary coordinate-dependent local resolution of singularities and applications, J. Funct. Anal. (2008) 1957–1994.
[15] M. Greenblatt: Resolution of singularities, asymptotic expansions of integrals, and applications, J. Analyse Math. (2010) 221–245.
[16] H. Hironaka: Resolution of singularities of an algebraic variety over a field of characteristic zero I, II, Ann. of Math. (2) (1964) 109–203.
[17] J. A. Howald: Multiplier ideals of monomial ideals, Trans. Amer. Math. Soc. (2001) 2665–2671.
[18] J. Kollár: Singularities of pairs, in Algebraic geometry, Santa Cruz 1995, 221–287, Proc. Symp. Pure Math., Amer. Math. Soc., Providence, 1997.
[19] J. Kollár: Lectures on Resolution of Singularities (AM-166), Princeton University Press, 2009.
[20] R. Lazarsfeld: Positivity in Algebraic Geometry I, II, A Series of Modern Surveys in Mathematics, Springer-Verlag, Berlin, 2004.
[21] S. Lin, B. Sturmfels and Z. Xu: Marginal likelihood integrals for mixtures of independence models, J. Mach. Learn. Res. (2009) 1611–1631.
[22] M. J. Saia: The integral closure of ideals and the Newton filtration, J. Algebraic Geom. (1996) 1–11.
[23] M. Saito: On real log canonical thresholds, arXiv:math.AG/0707.2308.
[24] B. Sturmfels: Gröbner Bases and Convex Polytopes, University Lecture Series, Amer. Math. Soc., Providence, 1996.
[25] A. N. Varchenko: Newton polyhedra and estimation of oscillating integrals, Funct. Anal. Appl. (1977) 175–196.
[26] V. A. Vasil'ev: Asymptotic behavior of exponential integrals in the complex domain, Funktsional. Anal. i Prilozhen. :4 (1979) 1–12.
[27] S. Watanabe: Algebraic analysis for nonidentifiable learning machines, Neural Computation (2001) 899–933.
[28] S. Watanabe: Algebraic Geometry and Statistical Learning Theory, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2009.
[29] K. Yamazaki and S. Watanabe: Singularities in mixture models and upper bounds of stochastic complexity, International Journal of Neural Networks (2003) 1029–1038.
[30] K. Yamazaki and S. Watanabe: Newton diagram and stochastic complexity in mixture of binomial distributions, Algorithmic Learning Theory, 350–364, Lecture Notes in Comput. Sci. 3244