Ideal-Theoretic Strategies for Asymptotic Approximation of Marginal Likelihood Integrals

Abstract

The accurate asymptotic evaluation of marginal likelihood integrals is a fundamental problem in Bayesian statistics. Following the approach introduced by Watanabe, we translate this into a problem of computational algebraic geometry, namely, to determine the real log canonical threshold of a polynomial ideal, and we present effective methods for solving this problem. Our results are based on resolution of singularities. They apply to parametric models where the Kullback-Leibler distance is upper and lower bounded by scalar multiples of some sum of squared real analytic functions. Such models include finite state discrete models.


arXiv [stat.CO]

Shaowei Lin


Keywords: computational algebra, asymptotic approximation, marginal likelihood, learning coefficient, real log canonical threshold

The evaluation of marginal likelihood integrals is essential in model selection and has important applications in areas such as machine learning and computational biology. The exact evaluation of such integrals is a difficult problem [9, 21], and classical approximation formulas usually apply only to smooth models. Recent work by Watanabe and his collaborators [1, 27–30] extended these formulas to a broad class of models with singularities. His work also uncovered interesting connections with resolution of singularities in algebraic geometry. The goal of this paper is to systematically study the algebraic geometry behind Watanabe's formulas, and to develop symbolic algebra tools which allow the user to accurately evaluate the asymptotics of integrals in Bayesian statistics.

Watanabe showed that the key to understanding a singular model is monomializing the Kullback-Leibler function $K(\omega)$ of the model at the true distribution. While general algorithms exist for monomializing any analytic function [4, 7], applying them to non-polynomial functions such as $K(\omega)$ can be computationally expensive. In practice, many singular models are parametrized by polynomials. Therefore, it is natural to ask if this polynomiality can be exploited in the analysis of such models. For simplicity, we explore this question for discrete statistical models. Our point of departure is to describe the asymptotics of the likelihood integral by the real log canonical threshold of an ideal in a polynomial ring. More generally, our results will be proved for rings of analytic functions, and they apply to all parametric models where the Kullback-Leibler distance is upper and lower bounded by scalar multiples of a sum of squared real analytic functions.

Consider a statistical model $\mathcal{M}$ on a finite discrete space $[k] = \{1, 2, \ldots, k\}$ parametrized by a real analytic map $p : \Omega \to \Delta_{k-1}$ where $\Omega$ is a compact subset of $\mathbb{R}^d$ and $\Delta_{k-1}$ is the probability simplex $\{x \in \mathbb{R}^k : x_i \geq 0, \ \sum x_i = 1\}$.
We assume that $\Omega$ is semianalytic, i.e. $\Omega = \{x \in \mathbb{R}^d : g_1(x) \geq 0, \ldots, g_l(x) \geq 0\}$ is defined by real analytic inequalities. Let $q \in \Delta_{k-1}$ be a point in the model with non-zero entries. Suppose a sample of size $N$ is drawn from the true distribution $q$, and let $U = (U_i)$ denote the vector of relative frequencies for this sample. Let $\varphi : \Omega \to \mathbb{R}$ be nearly analytic, i.e. $\varphi$ is a product $\varphi_a \varphi_s$ of functions where $\varphi_a$ is real analytic and $\varphi_s$ is positive and smooth. Consider a Bayesian prior defined by $|\varphi|$. Priors of this form are discussed in Remark 2.7. We are interested in the asymptotics, for large sample sizes $N$, of the marginal likelihood integral
$$Z(N) = \int_\Omega \prod_{i=1}^{k} p_i(\omega)^{N U_i} \, |\varphi(\omega)| \, d\omega. \tag{1}$$
The first few terms of the asymptotics of the log likelihood integral $\log Z(N)$ were derived by Watanabe. To state his result, we first recall that the Kullback-Leibler distance $K(\omega)$ between $q$ and $p(\omega)$ is
$$K(\omega) = \sum_{i=1}^{k} q_i \log \frac{q_i}{p_i(\omega)}.$$
This function satisfies $K(\omega) \geq 0$, with equality if and only if $p(\omega) = q$.

Theorem 1.1 (Watanabe [28, §]). Asymptotically as $N \to \infty$,
$$\log Z(N) = N \sum_{i=1}^{k} U_i \log q_i - \lambda \log N + (\theta - 1) \log \log N + \eta_N \tag{2}$$
where the positive rational number $\lambda$ is the smallest pole of the zeta function
$$\zeta(z) = \int_\Omega K(\omega)^{-z} \, |\varphi(\omega)| \, d\omega, \quad z \in \mathbb{C}, \tag{3}$$
$\theta$ is its multiplicity, and $\eta_N$ is a random variable whose expectation $E[\eta_N]$ converges to a constant.

Here, $\lambda$ is known as the learning coefficient of the model $\mathcal{M}$ at the distribution $q$. Because formula (2) generalizes the Bayesian information criterion [13, 28], the numbers $\lambda$ and $\theta$ are important in model selection. Indeed, the BIC corresponds to the case $(\lambda, \theta) = (d/2, 1)$ for smooth models. In algebraic geometry, $\lambda$ is also known as the real log canonical threshold [23] of $K$, a term that is motivated by the more familiar complex log canonical threshold (see Remark 3.1). We denote this algebraic invariant by $(\lambda, \theta) = \mathrm{RLCT}_\Omega(K; \varphi)$.

These thresholds may be defined for ideals in rings of real-valued analytic functions as well. Given an ideal $I = \langle f_1, \ldots, f_r \rangle$ generated by functions $f_i$ that are real analytic on a compact subset $\Omega \subset \mathbb{R}^d$, and a smooth amplitude function $\varphi : \mathbb{R}^d \to \mathbb{R}$, we consider the zeta function
$$\zeta(z) = \int_\Omega \left( f_1(\omega)^2 + \cdots + f_r(\omega)^2 \right)^{-z/2} |\varphi(\omega)| \, d\omega. \tag{4}$$
We show that if $\varphi$ is nearly analytic, then $\zeta(z)$ has an analytic continuation to the whole complex plane. Its poles are positive rational numbers with a smallest element $\lambda$, which we call the real log canonical threshold of $I$ with respect to $\varphi$ over $\Omega$. Let $\theta$ be the multiplicity of $\lambda$ as a pole of $\zeta(z)$ and define $\mathrm{RLCT}_\Omega(I; \varphi)$ to be the pair $(\lambda, \theta)$. Order these pairs such that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$. We will show that this pair does not depend on the choice of generators $f_1, \ldots, f_r$ for $I$. In the literature, real log canonical thresholds of ideals are not well-investigated [23]. For this reason, we formally prove many of their properties in Section 3.

With these definitions on hand, we now state our first main theorem. This result expresses the learning coefficient and its multiplicity directly in terms of the functions $p_1, \ldots, p_k$ parametrizing the model. Geometrically, it says that the learning coefficient is the real log canonical threshold of the fiber $p^{-1}(q) \subset \Omega$. The theorem is computationally very useful, especially when the $p_i$ are polynomials or rational functions, and certain special cases have been applied by Sumio Watanabe and his collaborators [29, 30]. Our proof in Section 3 was inspired by a discussion with him. Now, recall that $\varphi = \varphi_a \varphi_s$ is nearly analytic.

Theorem 1.2.

Let $(\lambda, \theta)$ be the learning coefficient and multiplicity of the model $\mathcal{M}$ at $q > 0$. Let $I$ denote the ideal $\langle p(\omega) - q \rangle := \langle p_1(\omega) - q_1, \ldots, p_k(\omega) - q_k \rangle$, and let $\mathcal{V}$ be its zero-locus $\{\omega \in \Omega : p(\omega) = q\} = p^{-1}(q)$. Then,
$$(2\lambda, \theta) = \min_{x \in \mathcal{V}} \mathrm{RLCT}_{\Omega_x}(I; \varphi_a)$$
where each $\Omega_x$ is a sufficiently small neighborhood of $x$ in $\Omega$.

More generally, let $K(\omega)$ be any real analytic function on $\Omega$ that is bounded, for some constants $c_1, c_2 > 0$ and some real analytic $f_i(\omega)$ over $\Omega$, by
$$c_1 \sum_{i=1}^{k} f_i(\omega)^2 \leq K(\omega) \leq c_2 \sum_{i=1}^{k} f_i(\omega)^2.$$
Then, the real log canonical threshold $(\lambda, \theta) = \mathrm{RLCT}_\Omega(K; \varphi)$ satisfies
$$(2\lambda, \theta) = \mathrm{RLCT}_\Omega(I; \varphi_a)$$
where $I$ is the ideal $\langle f_1(\omega), \ldots, f_k(\omega) \rangle$.
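To make the role of the pair $(\lambda, \theta)$ in model selection concrete, the following minimal sketch evaluates only the penalty part $\lambda \log N - (\theta - 1) \log \log N$ of formula (2), dropping the data term and the random term $\eta_N$. The function names and the sample values $(\lambda, \theta) = (1, 2)$ are illustrative assumptions, not taken from the paper or its library.

```python
import math

def singular_penalty(lam, theta, n):
    """Penalty lam*log(n) - (theta - 1)*log(log(n)) appearing in formula (2).

    Hypothetical helper: only the deterministic penalty terms are computed.
    """
    if n <= 1:
        raise ValueError("sample size n must exceed 1 so that log(log(n)) is defined")
    return lam * math.log(n) - (theta - 1) * math.log(math.log(n))

def bic_penalty(d, n):
    """BIC penalty, i.e. the smooth case (lam, theta) = (d/2, 1)."""
    return singular_penalty(d / 2, 1, n)

# A made-up singular model with d = 4 parameters and (lam, theta) = (1, 2)
# is penalized less than the BIC would suggest.
n = 1000
print(singular_penalty(1.0, 2, n), bic_penalty(4, n))
```

For a smooth model the two functions agree by construction; for a singular model the penalty can be strictly smaller, which is why $\lambda$ and $\theta$ matter when comparing models.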

To prove this theorem and other properties of real log canonical thresholds, we recall Hironaka's theorem on the resolution of singularities [16] and develop useful lemmas in Section 2. Our treatment differs from that of Watanabe [28] in the following way: we study the local behavior of real log canonical thresholds at points $x$ in the parameter space $\Omega$. In particular, we will be interested in the case where $x$ is on the boundary $\partial\Omega$. Example 2.8 is an illustration of how the threshold is affected by the inequalities $g_i \geq 0$ at $x$. This issue can be critical in singular model selection because the parameter space of one model is often contained in the boundary of another that is more complex.

After studying the local thresholds, we then show that the real log canonical threshold globally over $\Omega$ is the minimum of local thresholds at points $x$ in $\Omega$. Identifying where these minimum thresholds occur is by itself a difficult problem which we discuss in Section 2. As a consequence of our results, we write down explicit formulas for the coefficients in asymptotic expansions of Laplace integrals. Our formulas extend those of Arnol'd–Guseĭn-Zade–Varchenko [2] because they apply also to parameter spaces with boundary. Using this expansion to improve approximations of likelihood integrals will be the subject of future work.

Our next aim is to develop tools for computing or bounding real log canonical thresholds of ideals. Section 3 summarizes useful fundamental properties of real log canonical thresholds. In Section 4, we derive local thresholds in nondegenerate cases using an important tool from toric geometry involving Newton polyhedra. This method was invented by Varchenko [25] and applied to statistical models by Watanabe and Yamazaki [30]. Their formulas were defined for functions, but we develop extensions of these formulas for ideals.
We introduce a new notion of nondegeneracy for ideals, known as sos-nondegeneracy, and give the following bound for the real log canonical threshold of an ideal with respect to a monomial amplitude function $\omega^\tau := \omega_1^{\tau_1} \cdots \omega_d^{\tau_d}$. These monomial functions occur frequently when we apply a change of variables to resolve the singularities in a model. Newton polyhedra and their $\tau$-distances are defined in Section 4.

Theorem 1.3.

Let $I$ be a finitely generated ideal in the ring of functions which are real analytic on $\Omega$, and suppose the origin lies in the interior of $\Omega$. Then, for every sufficiently small neighborhood $\Omega_0$ of the origin,
$$\mathrm{RLCT}_{\Omega_0}(I; \omega^\tau) \leq (1/l_\tau, \theta_\tau)$$
where $l_\tau$ is the $\tau$-distance of the Newton polyhedron $\mathcal{P}(I)$ and $\theta_\tau$ its multiplicity. Equality occurs when $I$ is monomial or, more generally, sos-nondegenerate.

This theorem has two main consequences. Firstly, it tells us that the real log canonical threshold of an ideal can be computed by finding a change of variables which monomializes the ideal. Secondly, due to Theorems 1.1 and 1.2, upper bounds on real log canonical thresholds translate to asymptotic lower bounds on the likelihood integral of a statistical model, which in turn give upper bounds on the stochastic complexity of the model.

Currently, there are no programs for computing real log canonical thresholds. There are applications which compute resolutions of singularities, but our statistical problems are too big for them. We hope that our work is a step in bridging the gap. Some of our tools are implemented in a Singular library at https://w3id.org/people/shaoweilin/public/rlct.html. This library computes the Newton polyhedron of an ideal, computes $\tau$-distances, and checks if an ideal is sos-nondegenerate. Instructions and examples on using the library may be found at the above website.

In summary, the learning coefficient of a statistical model is a useful measure of the model complexity and plays an important role in model selection. Because computing this coefficient often requires careful analysis of the Kullback-Leibler function, we propose an ideal-theoretic approach to make this calculation more tractable. This method has several advantages. Firstly, it directly exploits polynomiality in the model parametrization. Secondly, the real log canonical threshold of an ideal is independent of the choice of generators, and this choice provides flexibility to our computations. Thirdly, it is easier to construct Newton polyhedra for polynomial ideals and to check their nondegeneracy (Proposition 3.2(3)) than for nonpolynomial Kullback-Leibler functions. We demonstrate these ideas in Section 5 by computing the learning coefficients of a discrete mixture model which comes from a study involving 132 schizophrenic patients.

To introduce some notation, given $x \in \mathbb{R}^d$, let $\mathcal{A}_x(\mathbb{R}^d)$ be the ring of real-valued functions $f : \mathbb{R}^d \to \mathbb{R}$ that are analytic at $x$. We sometimes shorten the notation to $\mathcal{A}_x$ when it is clear that we are working with the space $\mathbb{R}^d$. When $x = 0$, it is convenient to think of $\mathcal{A}_0$ as a subring of the formal power series ring $\mathbb{R}[[\omega_1, \ldots, \omega_d]] = \mathbb{R}[[\omega]]$. It consists of power series which are convergent in some neighborhood of the origin. For all $x$, $\mathcal{A}_x$ is isomorphic to $\mathcal{A}_0$ by translation. Given a subset $\Omega \subset \mathbb{R}^d$, let $\mathcal{A}_\Omega$ be the ring of real functions analytic at each point $x \in \Omega$. Locally, each function can be represented as a power series centered at $x$.
Given $f \in \mathcal{A}_\Omega$, define the analytic variety $\mathcal{V}_\Omega(f) = \{\omega \in \Omega : f(\omega) = 0\}$, while for an ideal $I \subset \mathcal{A}_\Omega$, we set $\mathcal{V}_\Omega(I) = \bigcap_{f \in I} \mathcal{V}_\Omega(f)$. Lastly, given a finite multiset $S \subset \mathbb{R}$, let $\#\min S$ denote the number of times the minimum is attained in $S$.

In this section, we introduce Hironaka's theorem on resolutions of singularities. We derive real log canonical thresholds of monomial functions, and demonstrate how such resolutions allow us to find the thresholds of non-monomial functions. We show that the threshold of a function over a compact set is the minimum of local thresholds, and present an example where the threshold at a boundary point depends on the boundary inequalities. We discuss the problem of locating singularities with the smallest threshold, and end this section with formulas for the asymptotic expansion of a Laplace integral.

Before we explore real log canonical thresholds of ideals, let us study those of functions. Given a compact subset $\Omega$ of $\mathbb{R}^d$, a real analytic function $f \in \mathcal{A}_\Omega$ with $f \not\equiv 0$, and a smooth function $\varphi : \mathbb{R}^d \to \mathbb{R}$, consider the zeta function
$$\zeta(z) = \int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega, \quad z \in \mathbb{C}. \tag{5}$$
This function is well-defined for $z \in \mathbb{R}_{\leq 0}$. If $\zeta(z)$ can be continued analytically to the whole complex plane $\mathbb{C}$, then all its poles are isolated points in $\mathbb{C}$. Moreover, if all its poles are real, then there exists a smallest positive pole $\lambda$. Let $\theta$ be the multiplicity of this pole. The pole $\lambda$ is the real log canonical threshold of $f$ with respect to $\varphi$ over $\Omega$. If $\zeta(z)$ has no poles, we set $\lambda = \infty$ and leave $\theta$ undefined. Let $\mathrm{RLCT}_\Omega(f; \varphi)$ be the pair $(\lambda, \theta)$. By abuse of notation, we sometimes refer to this pair as the real log canonical threshold of $f$. We order these pairs such that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$. Intuitively, considering the asymptotics of $\log Z(N)$ in Theorem 1.1, the ordering is defined in this way so that $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if and only if
$$\lambda_1 \log N - (\theta_1 - 1) \log \log N > \lambda_2 \log N - (\theta_2 - 1) \log \log N$$
for sufficiently large $N$. Lastly, let $\mathrm{RLCT}_\Omega f$ denote $\mathrm{RLCT}_\Omega(f; 1)$ where $1$ is the constant unit function.

We start with a simple class of functions for which it is easy to compute the real log canonical threshold. It is the class of monomials $\omega_1^{\kappa_1} \cdots \omega_d^{\kappa_d} = \omega^\kappa$.

Proposition 2.1.

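Let $\kappa = (\kappa_1, \ldots, \kappa_d)$ and $\tau = (\tau_1, \ldots, \tau_d)$ be vectors of non-negative integers. If $\Omega$ is the positive orthant $\mathbb{R}^d_{\geq 0}$ and $\phi : \mathbb{R}^d \to \mathbb{R}$ is compactly supported and smooth with $\phi(0) > 0$, then $\mathrm{RLCT}_\Omega(\omega^\kappa; \omega^\tau \phi) = (\lambda, \theta)$ where
$$\lambda = \min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}, \qquad \theta = \#\min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}.$$

Proof.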

See [2, Lemma 7.3]. The idea is to express $\phi(\omega)$ as $T_s(\omega) + R_s(\omega)$ where $T_s$ is the $s$-th degree Taylor polynomial and $R_s$ the difference. We then integrate the main term $|f|^{-z} T_s$ explicitly and show that the integral of the remaining term $|f|^{-z} R_s$ does not have smaller poles. This process gives the analytic continuation of $\zeta(z)$ to the whole complex plane, so we have the Laurent expansion
$$\zeta(z) = \sum_{\alpha > 0} \sum_{i=1}^{d} \frac{d_{\alpha,i}}{(z - \alpha)^i} + P(z) \tag{6}$$
where the poles $\alpha$ are positive rational numbers and $P(z)$ is a polynomial.

For non-monomial $f(\omega)$, Hironaka's celebrated theorem [16] on the resolution of singularities tells us that we can always reduce to the monomial case. Here, a $d$-dimensional real analytic manifold is a topological space (second countable and Hausdorff) that can be covered by charts which are homeomorphic to open balls in $\mathbb{R}^d$ and where the transition maps between charts are real analytic maps.

Theorem 2.2 (Resolution of Singularities). Let $f$ be a non-constant real analytic function in some neighborhood $\Omega \subset \mathbb{R}^d$ of the origin with $f(0) = 0$. Then, there exists a triple $(M, W, \rho)$ where
a. $W \subset \Omega$ is a neighborhood of the origin,
b. $M$ is a $d$-dimensional real analytic manifold,
c. $\rho : M \to W$ is a real analytic map
satisfying the following properties.
i. $\rho$ is proper, i.e. the inverse image of any compact set is compact.
ii. $\rho$ is a real analytic isomorphism between $M \setminus \mathcal{V}_M(f \circ \rho)$ and $W \setminus \mathcal{V}_W(f)$.
iii. For any $y \in \mathcal{V}_M(f \circ \rho)$, there exists a local chart $M_y$ with coordinates $\mu = (\mu_1, \mu_2, \ldots, \mu_d)$ such that $y$ is the origin and
$$f \circ \rho(\mu) = a(\mu) \, \mu_1^{\kappa_1} \mu_2^{\kappa_2} \cdots \mu_d^{\kappa_d} = a(\mu) \, \mu^\kappa$$
where $\kappa_1, \kappa_2, \ldots, \kappa_d$ are non-negative integers and $a$ is a real analytic function with $a(\mu) \neq 0$ for all $\mu$. Furthermore, the Jacobian determinant equals
$$|\rho'(\mu)| = h(\mu) \, \mu_1^{\tau_1} \mu_2^{\tau_2} \cdots \mu_d^{\tau_d} = h(\mu) \, \mu^\tau$$
where $\tau_1, \tau_2, \ldots, \tau_d$ are non-negative integers and $h$ is a real analytic function with $h(\mu) \neq 0$ for all $\mu$.

We say that $(M, W, \rho)$ is a resolution of singularities or a desingularization of $f$ at the origin. The set of points in $M$ where $\rho$ is not one-to-one is the exceptional divisor. From properties (i) and (ii), it also follows that $\rho$ is surjective: if $x \in \mathcal{V}_W(f)$, we pick a compact neighborhood $V$ of $x$ and a sequence $x_1, x_2, \ldots$ of points in $V \setminus \mathcal{V}_W(f)$ converging to $x$. The sequence can be chosen off the variety because the variety has measure zero. Then, the preimages $\rho^{-1}(x_1), \rho^{-1}(x_2), \ldots$ contain a converging subsequence with limit $y$, and $\rho(y) = x$ by continuity.

Now, let us desingularize a list of functions simultaneously.

Corollary 2.3 (Simultaneous Resolutions). Let $f_1, \ldots, f_l$ be non-constant real analytic functions in some neighborhood $\Omega \subset \mathbb{R}^d$ of the origin with all $f_i(0) = 0$. Then, there exists a triple $(M, W, \rho)$ that desingularizes each $f_i$ at the origin.

Proof. The idea is to desingularize the product $f_1(\omega) \cdots f_l(\omega)$ and to show that such a resolution of singularities is also a resolution for each $f_i$. See [28, Thm 11] and [14, Lemma 2.3] for details.

For the rest of this section, let $\Omega = \{\omega \in \mathbb{R}^d : g_1(\omega) \geq 0, \ldots, g_l(\omega) \geq 0\}$ be compact and semianalytic. We also assume that $f, \varphi \in \mathcal{A}_\Omega$, and that $f, g_1, \ldots, g_l$ are not constant functions.

Lemma 2.4.

For each $x \in \Omega$, there is a neighborhood $\Omega_x$ of $x$ in $\Omega$ such that for all smooth functions $\phi$ on $\Omega_x$ with $\phi(x) > 0$,
$$\mathrm{RLCT}_{\Omega_x}(f; \varphi\phi) = \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof.

Let $x \in \Omega$. If $f(x) \neq 0$, then by the continuity of $f$, there exists a small neighborhood $\Omega_x$ where $0 < c_1 < |f(\omega)| < c_2$ for some constants $c_1, c_2$. Hence, for all smooth functions $\phi$, the zeta functions
$$\int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega \quad \text{and} \quad \int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega$$
do not have any poles, so the lemma follows in this case.

Suppose $f(x) = 0$. By Corollary 2.3, we have a simultaneous local resolution of singularities $(M, W, \rho)$ for the functions $f, \varphi, g_1, \ldots, g_l$ vanishing at $x$. For each point $y$ in the fiber $\rho^{-1}(x)$, we have a local chart $M_y$ satisfying property (iii) of Theorem 2.2. Since $\rho$ is proper, the fiber $\rho^{-1}(x)$ is compact so there is a finite subcover $\{M_y\}$. We claim that the image $\rho(\bigcup M_y)$ contains a neighborhood $W_x$ of $x$ in $\mathbb{R}^d$. Indeed, otherwise, there exists a bounded sequence $\{x_1, x_2, \ldots\}$ of points in $W \setminus \rho(\bigcup M_y)$ whose limit is $x$. We pick a sequence $\{y_1, y_2, \ldots\}$ where $\rho(y_i) = x_i$. Since the $x_i$ are bounded, the $y_i$ lie in a compact set so there is a convergent subsequence $\{\tilde{y}_i\}$ with limit $y^*$. The $\tilde{y}_i$ are not in the open set $\bigcup M_y$ so nor is $y^*$. But $\rho(y^*) = \lim \rho(\tilde{y}_i) = x$ so $y^* \in \rho^{-1}(x) \subset \bigcup M_y$, a contradiction.

Now, define $\Omega_x = W_x \cap \Omega$ and let $\{\mathcal{M}_y\}$ be the collection of all sets $\mathcal{M}_y = M_y \cap \rho^{-1}(\Omega_x)$ which have positive measure. Picking a partition of unity $\{\sigma_y(\mu)\}$ subordinate to $\{\mathcal{M}_y\}$ such that $\sigma_y$ is positive at $y$ for each $y$ [28, Theorem 6.5], we write the zeta function $\zeta(z) = \int_{\Omega_x} |f(\omega)|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega$ as
$$\sum_y \int_{\mathcal{M}_y} \left| f \circ \rho(\mu) \right|^{-z} |\varphi \circ \rho(\mu)| \, |\phi \circ \rho(\mu)| \, |\rho'(\mu)| \, \sigma_y(\mu) \, d\mu.$$
For each $y$, the boundary conditions $g_i \circ \rho(\mu) \geq 0$ become monomial inequalities, so $\mathcal{M}_y$ is the union of closed orthant neighborhoods of $y$. The integral over $\mathcal{M}_y$ is then the sum of integrals of the form
$$\zeta_y(z) = \int_{\mathbb{R}^d_{\geq 0}} \mu^{-\kappa z + \tau} \, \psi(\mu) \, d\mu$$
where $\kappa$ and $\tau$ are non-negative integer vectors while $\psi$ is a compactly supported smooth function with $\psi(0) > 0$. Note that $\kappa$ and $\tau$ do not depend on $\phi$ nor on the choice of orthant at $y$. By Proposition 2.1, the smallest pole of $\zeta_y(z)$ and its multiplicity are
$$\lambda_y = \min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}, \qquad \theta_y = \#\min_{1 \leq j \leq d} \left\{ \frac{\tau_j + 1}{\kappa_j} \right\}.$$
Now, $\mathrm{RLCT}_{\Omega_x}(f; \varphi\phi) = \min_y \{(\lambda_y, \theta_y)\}$. Since this formula is independent of $\phi$, we set $\phi = 1$ and the lemma follows.

Proposition 2.5.

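Let $\phi : \Omega \to \mathbb{R}$ be positive and smooth. Then, for sufficiently small neighborhoods $\Omega_x$, the set $\{\mathrm{RLCT}_{\Omega_x}(f; \varphi) : x \in \Omega\}$ has a minimum and
$$\mathrm{RLCT}_\Omega(f; \varphi\phi) = \min_{x \in \Omega} \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof.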

Lemma 2.4 associates a small neighborhood to each point in the compact set $\Omega$, so there exists a finite subcover $\{\Omega_x : x \in S\}$. Let $\{\sigma_x(\omega)\}$ be a smooth partition of unity subordinate to this subcover where $\sigma_x(x) > 0$ for each $x$. Then,
$$\int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, d\omega = \sum_{x \in S} \int_{\Omega_x} \left| f(\omega) \right|^{-z} |\varphi(\omega)\phi(\omega)| \, \sigma_x(\omega) \, d\omega.$$
Hence,
$$\mathrm{RLCT}_\Omega(f; \varphi\phi) = \min_{x \in S} \mathrm{RLCT}_{\Omega_x}(f; \varphi\phi\sigma_x) = \min_{x \in S} \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$
Now, if $y \in \Omega \setminus S$, let $\Omega_y$ be a neighborhood of $y$ prescribed by Lemma 2.4 and consider the cover $\{\Omega_x : x \in S\} \cup \{\Omega_y\}$ of $\Omega$. After choosing a partition of unity subordinate to this cover and repeating the above argument, we get
$$\mathrm{RLCT}_\Omega(f; \varphi\phi) \leq \mathrm{RLCT}_{\Omega_y}(f; \varphi) \quad \text{for all } y \in \Omega.$$
Combining the two previously displayed equations proves the proposition.

Abusing notation, we now let $\mathrm{RLCT}_{\Omega_x}(f; \varphi)$ represent the real log canonical threshold for a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$. If $x$ is an interior point of $\Omega$, we denote the threshold at $x$ by $\mathrm{RLCT}_x(f; \varphi)$.

Corollary 2.6 (See also [28, §]). Given a compact semianalytic set $\Omega \subset \mathbb{R}^d$, a nearly analytic function $\varphi : \Omega \to \mathbb{R}$, and $f \in \mathcal{A}_\Omega$ satisfying $f(x) = 0$ for some $x \in \Omega$, the zeta function (5) can be continued analytically to $\mathbb{C}$. It has a Laurent expansion (6) whose poles are positive rational numbers with a smallest element.

Proof. The proofs of Lemma 2.4 and Proposition 2.5 outline a way to compute the Laurent expansion of the zeta function (5).
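The monomial formula of Proposition 2.1 and the chart-wise minimum used in the proof of Lemma 2.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical (they are not part of the Singular library mentioned above), exact arithmetic uses the standard `fractions` module, and coordinates with exponent $\kappa_j = 0$ are skipped because they contribute no pole.

```python
from fractions import Fraction

def monomial_rlct(kappa, tau):
    """Pair (lambda, theta) for the monomial w^kappa with amplitude w^tau,
    following the formula of Proposition 2.1.

    Assumption: coordinates with kappa_j == 0 are ignored (ratio is infinite).
    Returns None when no coordinate produces a pole (lambda infinite).
    """
    ratios = [Fraction(t + 1, k) for k, t in zip(kappa, tau) if k > 0]
    if not ratios:
        return None
    lam = min(ratios)
    return lam, ratios.count(lam)  # theta = number of times the minimum is attained

def rlct_key(pair):
    """Sort key for the order on pairs: smaller lambda first,
    and for equal lambda the larger theta counts as smaller."""
    lam, theta = pair
    return (lam, -theta)

def min_rlct(pairs):
    """Minimum of local pairs, e.g. over the charts of a resolution."""
    return min(pairs, key=rlct_key)

# Hypothetical chart data: exponent vectors (kappa, tau) for three charts.
charts = [((1, 2), (0, 0)), ((2, 2), (0, 0)), ((3, 0), (1, 0))]
print(min_rlct([monomial_rlct(k, t) for k, t in charts]))
```

The sort key encodes the convention that, among pairs with the same $\lambda$, the one with the larger multiplicity $\theta$ is the smaller pair.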

Remark 2.7.

In our definition of real log canonical thresholds, we considered integrals with respect to densities $|\varphi(\omega)| \, d\omega$ for some nearly analytic function $\varphi$, while Watanabe only considers the special case where the density is $\varphi(\omega) \, d\omega$ for some smooth positive function $\varphi$. Our general case includes the situation where the density is multiplied by the absolute value of a Jacobian determinant under a change of variables. To prove the basic properties of real log canonical thresholds, we need to resolve the singularities of the variety $\varphi = 0$ together with those cut out by $f, g_1, \ldots, g_l$, as demonstrated in Lemma 2.4.

Example 2.8.

We now show that the threshold at a boundary point depends on the boundary inequalities. Consider the following two small neighborhoods of the origin in some larger compact set.
$$\Omega_1 = \{(x, y) \in \mathbb{R}^2 : 0 \leq x \leq y \leq \varepsilon\}$$
$$\Omega_2 = \{(x, y) \in \mathbb{R}^2 : 0 \leq y \leq x \leq \varepsilon\}$$
To compute the real log canonical threshold of the function $xy^2$ over these sets, we have the corresponding zeta functions below.
$$\zeta_1(z) = \int_0^\varepsilon \int_0^y x^{-z} y^{-2z} \, dx \, dy = \frac{\varepsilon^{-3z+2}}{(-z+1)(-3z+2)}$$
$$\zeta_2(z) = \int_0^\varepsilon \int_0^x x^{-z} y^{-2z} \, dy \, dx = \frac{\varepsilon^{-3z+2}}{(-2z+1)(-3z+2)}$$
This shows that $\mathrm{RLCT}_{\Omega_1}(xy^2) = 2/3$ while $\mathrm{RLCT}_{\Omega_2}(xy^2) = 1/2$.

Because the real log canonical threshold over a compact set $\Omega \subset \mathbb{R}^d$ is the minimum of thresholds at points $x \in \Omega$, we want to know where this minimum is achieved. Let us study this problem topologically. Consider a locally finite collection $\mathcal{S}$ of pairwise disjoint submanifolds $S \subset \Omega$ such that $\Omega = \bigcup_{S \in \mathcal{S}} S$ and each $S$ is locally closed, i.e. the intersection of an open and a closed subset. Let $\overline{S}$ be the closure of $S$. We say $\mathcal{S}$ is a stratification of $\Omega$ if $S \cap \overline{T} \neq \emptyset$ implies $S \subset \overline{T}$ for all $S, T \in \mathcal{S}$. A stratification $\mathcal{S}$ of $\Omega$ is a refinement of another stratification $\mathcal{T}$ if $S \cap T \neq \emptyset$ implies $S \subset T$ for all $S \in \mathcal{S}$ and $T \in \mathcal{T}$.

Let the amplitude $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. Let $S_{(\lambda,\theta),1}, \ldots, S_{(\lambda,\theta),r}$ be the connected components of the set $\{x \in \Omega : \mathrm{RLCT}_{\Omega_x}(f; \varphi) = (\lambda, \theta)\}$, and let $\mathcal{S}$ denote the collection $\{S_{(\lambda,\theta),i}\}$ where we vary over all $\lambda$, $\theta$ and $i$. Now, define the order $\mathrm{ord}_x f$ to be the smallest degree of a monomial appearing in a series expansion of $f$ at $x \in \Omega$ [10, §]. This number does not depend on the choice of local coordinates $\omega_1, \ldots, \omega_d$ because it is the largest integer $k$ such that $f \in \mathfrak{m}_x^k$ where $\mathfrak{m}_x = \{g \in \mathcal{A}_x : g(x) = 0\}$ is the vanishing ideal of $x$. Define $T_{l,1}, \ldots, T_{l,s}$ to be the connected components of the set $\{x \in \Omega : \mathrm{ord}_x f = l\}$ and let $\mathcal{T}$ be the collection $\{T_{l,j}\}$ where we vary over all $l$ and $j$. We conjecture the following relationship between $\mathcal{S}$ and $\mathcal{T}$. It implies that the minimum real log canonical threshold over a set must occur at a point of highest order.

Conjecture 2.9.

The collections $\mathcal{S}$ and $\mathcal{T}$ are stratifications of $\Omega$. Furthermore, if the amplitude $\varphi$ is a positive smooth function, then $\mathcal{S}$ refines $\mathcal{T}$.

Laplace integrals such as (1) occur frequently in physics, statistics and other applications. At first, the relationship between their asymptotic expansions and the zeta function (3) seems strange. The key is to write these integrals as
$$Z(N) = \int_\Omega e^{-N|f(\omega)|} \, |\varphi(\omega)| \, d\omega = \int_0^\infty e^{-Nt} \, v(t) \, dt$$
$$\zeta(z) = \int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_0^\infty t^{-z} \, v(t) \, dt$$
where $v(t)$ is the state density function [28] or Gelfand-Leray function [2]
$$v(t) = \frac{d}{dt} \int_{0 < |f(\omega)| < t} |\varphi(\omega)| \, d\omega.$$
[…] 4] and Greenblatt [15].

Using this strategy, we now give explicit formulas for the asymptotic expansion of an arbitrary Laplace integral. Our formulas generalize those of Arnol'd–Guseĭn-Zade–Varchenko [2, §] […] $c_{\alpha,i}$ and the Laurent coefficients $d_{\alpha,i}$ in terms of derivatives $\Gamma^{(i)}$ of Gamma functions.

Theorem 2.10.

Let $\Omega \subset \mathbb{R}^d$ be a compact semianalytic subset and $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. If $f \in \mathcal{A}_\Omega$ with $f(x) = 0$ for some $x \in \Omega$, the Laplace integral
$$Z(N) = \int_\Omega e^{-N|f(\omega)|} \, |\varphi(\omega)| \, d\omega$$
has the asymptotic expansion
$$\sum_\alpha \sum_{i=1}^{d} c_{\alpha,i} \, N^{-\alpha} (\log N)^{i-1}. \tag{10}$$
The $\alpha$ in this expansion range over positive rational numbers which are poles of
$$\zeta(z) = \int_{\Omega_\delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega \tag{11}$$
for any $\delta > 0$ and $\Omega_\delta = \{\omega \in \Omega : |f(\omega)| < \delta\}$. The coefficients $c_{\alpha,i}$ satisfy
$$c_{\alpha,i} = \frac{(-1)^i}{(i-1)!} \sum_{j=i}^{d} \frac{\Gamma^{(j-i)}(\alpha)}{(j-i)!} \, d_{\alpha,j} \tag{12}$$
where $d_{\alpha,j}$ is the coefficient of $(z - \alpha)^{-j}$ in the Laurent expansion of $\zeta(z)$.

Proof. First, set $\delta = 1$. We split the integral $Z(N)$ into two parts:
$$Z(N) = \int_{|f(\omega)| < 1} e^{-N|f(\omega)|} \, |\varphi(\omega)| \, d\omega + \int_{|f(\omega)| \geq 1} e^{-N|f(\omega)|} \, |\varphi(\omega)| \, d\omega.$$
The second integral is bounded above by $Ce^{-N}$ for some non-negative constant $C$, so asymptotically it goes to zero more quickly than any $N^{-\alpha}$. For the first integral, we write $\zeta(z)$ as the Mellin transform of the state density function $v(t)$.
$$\zeta(z) = \int_{|f(\omega)| < 1} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_0^1 t^{-z} \, v(t) \, dt.$$

By Corollary 2.6, $\zeta(z)$ has a Laurent expansion (6). Since $|f(\omega)| < 1$, by dominated convergence $\zeta(z) \to 0$ as $z \to -\infty$, so the polynomial part $P(z)$ is identically zero. Applying the inverse Mellin transform [3] to $\zeta(z)$, we get a series expansion (8) of the state density function $v(t)$. Applying the Laplace transform to $v(t)$ in turn gives the asymptotic expansion (7) of $Z(N)$. The formulas
$$\int_0^\infty e^{-Nt} \, t^{\alpha-1} (\log t)^i \, dt \approx \sum_{j=0}^{i} \binom{i}{j} (-1)^j \, \Gamma^{(i-j)}(\alpha) \, N^{-\alpha} (\log N)^j$$
$$\int_0^1 t^{-z} \, t^{\alpha-1} (\log t)^i \, dt = -i! \, (z - \alpha)^{-(i+1)}$$
from [2, Thm 7.4] and [28, Ex 4.7] give us the relations
$$c_{\alpha,i} = (-1)^{i-1} \sum_{j=i}^{d} \binom{j-1}{i-1} \Gamma^{(j-i)}(\alpha) \, b_{\alpha-1,j}, \qquad d_{\alpha,j} = -(j-1)! \, b_{\alpha-1,j}.$$
Equation (12) follows immediately. Finally, for all other values of $\delta$, we write
$$\int_\Omega \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega = \int_{\Omega_\delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega + \int_{|f(\omega)| \geq \delta} \left| f(\omega) \right|^{-z} |\varphi(\omega)| \, d\omega.$$
The last integral does not have any poles, so the principal parts of the Laurent expansions of the first two integrals are the same for all $\delta$.

In this section, we prove fundamental properties of real log canonical thresholds (RLCTs) which will allow us to calculate these thresholds more efficiently. The learning coefficient of a statistical model is shown to be the RLCT of the ideal generated by its defining equations.

In this section, let $\Omega \subset \mathbb{R}^d$ be a compact semianalytic subset and let $\varphi : \Omega \to \mathbb{R}$ be nearly analytic. Given functions $f_1, \ldots, f_r \in \mathcal{A}_\Omega$, let $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi)$ be the smallest pole and multiplicity of the zeta function (4). Recall that these pairs are ordered by the rule $(\lambda_1, \theta_1) > (\lambda_2, \theta_2)$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\theta_1 < \theta_2$. For $x \in \Omega$, we define $\mathrm{RLCT}_{\Omega_x}(f_1, \ldots, f_r; \varphi)$ to be the threshold for a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$.

Remark 3.1.

The (complex) log canonical threshold may be defined in a similar fashion. It is the smallest pole of the zeta function
$$\zeta(z) = \int_\Omega \left( |f_1(\omega)|^2 + \cdots + |f_r(\omega)|^2 \right)^{-z} d\omega.$$
Note that the $f_i^2$ have been replaced by $|f_i|^2$ and the exponent $-z/2$ by $-z$. Crudely, this factor of 2 comes from the fact that $\mathbb{C}^d$ is a real vector space of dimension $2d$. The complex threshold is often different from the RLCT [23]. From the algebraic geometry point of view, more is known about complex log canonical thresholds than about real log canonical thresholds. Many results in this paper were motivated by their complex analogs [6, 17, 18, 20].

Now, we give several equivalent definitions of $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi)$ which are helpful in proofs of the fundamental properties.

Proposition 3.2.

Given functions $f_1, \ldots, f_r \in \mathcal{A}_\Omega$ such that each $f_i \not\equiv 0$ and $\mathcal{V}_\Omega(\langle f_1, \ldots, f_r \rangle)$ is nonempty, the pairs $(\lambda, \theta)$ defined below are all equal.
a. The logarithmic Laplace integral
$$\log Z(N) = \log \int_\Omega \exp\Big( -N \sum_{i=1}^{r} f_i(\omega)^2 \Big) |\varphi(\omega)| \, d\omega$$
is asymptotically $-\tfrac{\lambda}{2} \log N + (\theta - 1) \log \log N + O(1)$.
b. The zeta function
$$\zeta(z) = \int_\Omega \Big( \sum_{i=1}^{r} f_i(\omega)^2 \Big)^{-z/2} |\varphi(\omega)| \, d\omega$$
has a smallest pole $\lambda$ of multiplicity $\theta$.
c. The pair $(\lambda, \theta)$ is the minimum
$$\min_{x \in \Omega} \mathrm{RLCT}_{\Omega_x}(f_1, \ldots, f_r; \varphi).$$
In fact, it is enough to vary $x$ over $\mathcal{V}_\Omega(\langle f_1, \ldots, f_r \rangle)$.

Proof. Item (b) is the original definition of the RLCT. The equivalence of (a) and (b) follows from Theorem 2.10, and that of (b) and (c) from Proposition 2.5. The last statement of (c) follows from the fact that the RLCT is $\infty$ for points $x \notin \mathcal{V}_\Omega(\langle f_1, \ldots, f_r \rangle)$. See also [28, Thm 7.1].

Our first property describes the effect of the boundary on the RLCT.

Proposition 3.3.

Let $x$ be a boundary point of $\Omega \subset \mathbb{R}^d$. Then, for every neighborhood $W$ of $x$ in $\mathbb{R}^d$,
$$\mathrm{RLCT}_W(f; \varphi) \leq \mathrm{RLCT}_{\Omega_x}(f; \varphi).$$

Proof.

For a sufficiently small neighborhood $\Omega_x$ of $x$ in $\Omega$, we have $\Omega_x \subset W$, so the corresponding Laplace integrals satisfy $Z_{\Omega_x}(N) \leq Z_W(N)$. By Proposition 3.2, this gives the opposite inequality on the RLCTs.

If the function whose RLCT we are finding is complicated, we may replace it with a simpler function that bounds it. Given $f, g \in \mathcal{A}_\Omega$, we say that $f$ and $g$ are equivalent in $\Omega$ if $c_1 f \leq g \leq c_2 f$ in $\Omega$ for some $c_1, c_2 > 0$.

Proposition 3.4 ([28, Remark 7.2]). Given $f, g \in \mathcal{A}_\Omega$, suppose that $0 \leq cf \leq g$ in $\Omega$ for some $c > 0$. Then, $\mathrm{RLCT}_\Omega(f; \varphi) \leq \mathrm{RLCT}_\Omega(g; \varphi)$.

Corollary 3.5. If $f, g$ are equivalent in $\Omega$, then $\mathrm{RLCT}_\Omega(f; \varphi) = \mathrm{RLCT}_\Omega(g; \varphi)$.

Note that $\mathrm{RLCT}_\Omega(f_1^2 + \cdots + f_r^2; \varphi) = (\lambda, \theta)$ implies $\mathrm{RLCT}_\Omega(f_1, \ldots, f_r; \varphi) = (2\lambda, \theta)$. From this, it seems that we should restrict ourselves to RLCTs of single and not multiple functions. However, as the next proposition shows, multiple functions are important because they allow us to work with ideals for which different generating sets can be chosen. This gives us freedom to switch between single and multiple functions in powerful ways. For instance, special cases of this proposition such as Lemmas 3 and 4 of [1] have been used to simplify computations.

Proposition 3.6.

If two sets $\{f_1, \dots, f_r\}$ and $\{g_1, \dots, g_s\}$ of functions generate the same ideal $I \subset \mathcal{A}_\Omega$, then
\[ \mathrm{RLCT}_\Omega(f_1, \dots, f_r; \varphi) = \mathrm{RLCT}_\Omega(g_1, \dots, g_s; \varphi). \]
Define this pair to be $\mathrm{RLCT}_\Omega(I; \varphi)$.

Proof. Each $g_j$ can be written as a combination $h_1 f_1 + \cdots + h_r f_r$ of the $f_i$ where the $h_i$ are real analytic over $\Omega$. By the Cauchy-Schwarz inequality,
\[ g_j^2 \le (h_1^2 + \cdots + h_r^2)(f_1^2 + \cdots + f_r^2). \]
Because $\Omega$ is compact, the $h_i$ are bounded. Thus, summing over all the $g_j$, there is some constant $c > 0$ such that
\[ \sum_{j=1}^s g_j^2 \le c \sum_{i=1}^r f_i^2. \]
By Proposition 3.4, $\mathrm{RLCT}_\Omega(g_1, \dots, g_s; \varphi) \le \mathrm{RLCT}_\Omega(f_1, \dots, f_r; \varphi)$ and by symmetry, the reverse is also true, so we are done. See also [23, § ].

Now suppose $f_1, \dots, f_r \in \mathcal{A}_X$ and $g_1, \dots, g_s \in \mathcal{A}_Y$ where $X \subset \mathbb{R}^m$ and $Y \subset \mathbb{R}^n$ are compact semianalytic subsets. This occurs, for instance, when the $f_i$ and $g_j$ are polynomials with disjoint sets of indeterminates $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$. Let $\varphi_X : X \to \mathbb{R}$ and $\varphi_Y : Y \to \mathbb{R}$ be nearly analytic. Define $(\lambda_X, \theta_X) = \mathrm{RLCT}_X(f_1, \dots, f_r; \varphi_X)$ and $(\lambda_Y, \theta_Y) = \mathrm{RLCT}_Y(g_1, \dots, g_s; \varphi_Y)$. By composing with the projections $X \times Y \to X$ and $X \times Y \to Y$, we may regard the $f_i$ and $g_j$ as functions analytic over $X \times Y$. Let $I_X$ and $I_Y$ be the ideals in $\mathcal{A}_{X \times Y}$ generated by the $f_i$ and $g_j$ respectively. Recall that the sum $I_X + I_Y$ is generated by all the $f_i$ and $g_j$ while the product $I_X I_Y$ is generated by $f_i g_j$ for all $i, j$.

Proposition 3.7.

The RLCTs for the sum and product of ideals $I_X$ and $I_Y$ are
\[ \mathrm{RLCT}_{X \times Y}(I_X + I_Y; \varphi_X \varphi_Y) = (\lambda_X + \lambda_Y, \ \theta_X + \theta_Y - 1), \]
\[ \mathrm{RLCT}_{X \times Y}(I_X I_Y; \varphi_X \varphi_Y) = \begin{cases} (\lambda_X, \theta_X) & \text{if } \lambda_X < \lambda_Y, \\ (\lambda_Y, \theta_Y) & \text{if } \lambda_X > \lambda_Y, \\ (\lambda_X, \theta_X + \theta_Y) & \text{if } \lambda_X = \lambda_Y. \end{cases} \]

Proof. Define $f(x) = f_1^2 + \cdots + f_r^2$ and $g(y) = g_1^2 + \cdots + g_s^2$, and let $Z_X(N)$ and $Z_Y(N)$ be the corresponding Laplace integrals. By Proposition 3.2,
\[ \log Z_X(N) = -\tfrac{\lambda_X}{2} \log N + (\theta_X - 1) \log\log N + O(1), \]
\[ \log Z_Y(N) = -\tfrac{\lambda_Y}{2} \log N + (\theta_Y - 1) \log\log N + O(1) \]
asymptotically. If $(\lambda, \theta) = \mathrm{RLCT}_{X \times Y}(I_X + I_Y; \varphi_X \varphi_Y)$, then
\begin{align*}
-\tfrac{\lambda}{2} \log N + (\theta - 1) \log\log N + O(1)
&= \log \int_{X \times Y} e^{-N f(x) - N g(y)} |\varphi_X| |\varphi_Y| \, dx \, dy \\
&= \log \Big( \int_X e^{-N f(x)} |\varphi_X| \, dx \Big) \Big( \int_Y e^{-N g(y)} |\varphi_Y| \, dy \Big) \\
&= \log Z_X(N) + \log Z_Y(N) \\
&= -\tfrac{1}{2}(\lambda_X + \lambda_Y) \log N + (\theta_X + \theta_Y - 2) \log\log N + O(1)
\end{align*}
and the first result follows. For the second result, note that
\[ f(x) g(y) = f_1^2 g_1^2 + f_1^2 g_2^2 + \cdots + f_r^2 g_s^2. \]
Let $\zeta_X(z)$ and $\zeta_Y(z)$ be the zeta functions corresponding to $f(x)$ and $g(y)$. By Proposition 3.2, $(\lambda_X, \theta_X)$ and $(\lambda_Y, \theta_Y)$ are the smallest poles of $\zeta_X(z)$ and $\zeta_Y(z)$ while $\mathrm{RLCT}_{X \times Y}(I_X I_Y; \varphi_X \varphi_Y)$ is the smallest pole of
\begin{align*}
\zeta(z) &= \int_{X \times Y} \big( f(x) g(y) \big)^{-z/2} |\varphi_X| |\varphi_Y| \, dx \, dy \\
&= \Big( \int_X f(x)^{-z/2} |\varphi_X| \, dx \Big) \Big( \int_Y g(y)^{-z/2} |\varphi_Y| \, dy \Big) = \zeta_X(z) \zeta_Y(z).
\end{align*}
The second result then follows from the relationship between the poles.

Our last property tells us the behavior of RLCTs under a change of variables. Consider an ideal $I \subset \mathcal{A}_W$ where $W$ is a neighborhood of the origin. Let $M$ be a real analytic manifold and $\rho : M \to W$ be a proper real analytic map. Then, the pullback $\rho^* I$ is locally the ideal of real analytic functions on $M$ that is generated by $f \circ \rho$ for all $f \in I$ (also called the inverse image ideal sheaf [19, § ]). If $\rho$ is an isomorphism between $M \setminus \mathcal{V}(\rho^* I)$ and $W \setminus \mathcal{V}(I)$, we say that $\rho$ is a change of variables away from $\mathcal{V}(I)$. Let $|\rho'|$ denote the Jacobian determinant of $\rho$. We call $(\rho^* I; (\varphi \circ \rho)|\rho'|)$ the pullback pair.

Proposition 3.8.

Let $W$ be a neighborhood of the origin and $I \subset \mathcal{A}_W$ a finitely generated ideal. If $M$ is a real analytic manifold, $\rho : M \to W$ is a change of variables away from $\mathcal{V}(I)$ and $\mathcal{M} = \rho^{-1}(\Omega \cap W)$, then
\[ \mathrm{RLCT}_{\Omega_0}(I; \varphi) = \min_{x \in \rho^{-1}(0)} \mathrm{RLCT}_{\mathcal{M}_x}(\rho^* I; (\varphi \circ \rho)|\rho'|). \]

Proof. Let $f_1, \dots, f_r$ generate $I$ and let $f = f_1^2 + \cdots + f_r^2$. Then, $\mathrm{RLCT}_{\Omega_0}(I; \varphi)$ is the smallest pole and multiplicity of the zeta function
\[ \zeta(z) = \int_{\Omega_0} f(\omega)^{-z/2} |\varphi(\omega)| \, d\omega \]
where $\Omega_0 \subset W$ is a sufficiently small neighborhood of the origin in $\Omega$. Applying the change of variables $\rho$, we have
\[ \zeta(z) = \int_{\rho^{-1}(\Omega_0)} f \circ \rho(\mu)^{-z/2} \, |\varphi \circ \rho(\mu)| \, |\rho'(\mu)| \, d\mu. \]
The proof of Lemma 2.4 shows that if $\Omega_0$ is sufficiently small, there are finitely many points $y \in \rho^{-1}(0)$ and a cover $\{\mathcal{M}_y\}$ of $\mathcal{M} = \rho^{-1}(\Omega_0)$ such that
\[ \zeta(z) = \sum_y \int_{\mathcal{M}_y} f \circ \rho(\mu)^{-z/2} \, |\varphi \circ \rho(\mu)| \, |\rho'(\mu)| \, \sigma_y(\mu) \, d\mu \]
where $\{\sigma_y\}$ is a partition of unity subordinate to $\{\mathcal{M}_y\}$. Furthermore, the $f_i \circ \rho$ generate the pullback $\rho^* I$ and $f \circ \rho = (f_1 \circ \rho)^2 + \cdots + (f_r \circ \rho)^2$. Therefore,
\[ \mathrm{RLCT}_{\mathcal{M}_y}(f \circ \rho; (\varphi \circ \rho)|\rho'|\sigma_y) = \mathrm{RLCT}_{\mathcal{M}_y}(\rho^* I; (\varphi \circ \rho)|\rho'|) \]
and the result follows from the two previously displayed equations.

We are now ready to prove Theorem 1.2 which was inspired by Watanabe.

Proof of Theorem 1.2.

Let $Q(\omega) = \sum_{i=1}^k (p_i(\omega) - q_i)^2$. The learning coefficient is the RLCT of the Kullback-Leibler distance $K(\omega)$, so it is enough to show that $\mathrm{RLCT}_{\Omega_x} K = \mathrm{RLCT}_{\Omega_x} Q$ for each $x \in \mathcal{V}(K) = \mathcal{V}(Q)$. By Corollary 3.5, we only need to show that $K$ and $Q$ are equivalent in a sufficiently small neighborhood of $x$. Now, the Taylor expansion $-\log t = (1 - t) + \tfrac{1}{2}(1 - t)^2 + \cdots$ implies there are constants $c_1, c_2 > 0$ such that for $t$ near 1,
\[ c_1 (t - 1)^2 \le -\log t + t - 1 \le c_2 (t - 1)^2. \qquad (13) \]
Choosing a sufficiently small $W_x$ such that $p_i(\omega)/q_i$ is near 1, we have
\[ c_1 \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \le -\log \frac{p_i(\omega)}{q_i} + \frac{p_i(\omega)}{q_i} - 1 \le c_2 \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \]
for all $\omega \in W_x$. Multiplying by $q_i$, summing from $i = 1$ to $k$ and observing that the $p_i$ and the $q_i$ add up to 1, we get
\[ c_1 \sum_{i=1}^k q_i \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2 \le K(\omega) \le c_2 \sum_{i=1}^k q_i \Big( \frac{p_i(\omega)}{q_i} - 1 \Big)^2. \]
Again, using the fact that the $q_i$ are non-zero, we have
\[ \frac{c_1}{\max_i q_i} \sum_{i=1}^k \big( p_i(\omega) - q_i \big)^2 \le K(\omega) \le \frac{c_2}{\min_i q_i} \sum_{i=1}^k \big( p_i(\omega) - q_i \big)^2 \]
which completes the claim. The more general statement for a real analytic $K(\omega)$ which is bounded by scalar multiples of a sum of squared functions follows from Proposition 2.5, Corollary 3.5 and the definition of $\mathrm{RLCT}_\Omega(I; \varphi)$.

4 Newton Polyhedra and Nondegeneracy

Given an analytic function $f \in \mathcal{A}_0(\mathbb{R}^d)$, we pick local coordinates $\{\omega_1, \dots, \omega_d\}$ in a neighborhood of the origin. This allows us to represent $f$ as a power series $\sum_\alpha c_\alpha \omega^\alpha$ where $\omega = (\omega_1, \dots, \omega_d)$ and each $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}^d$. Let $[\omega^\alpha] f$ denote the coefficient $c_\alpha$ of $\omega^\alpha$ in this expansion. Define its Newton polyhedron $P(f) \subset \mathbb{R}^d$ to be the convex hull
\[ P(f) = \mathrm{conv}\{ \alpha + \alpha' : [\omega^\alpha] f \ne 0, \ \alpha' \in \mathbb{R}^d_{\ge 0} \}. \]
A subset $\gamma \subset P(f)$ is a face if there exists $\beta \in \mathbb{R}^d$ such that
\[ \gamma = \{ \alpha \in P(f) : \langle \alpha, \beta \rangle \le \langle \alpha', \beta \rangle \text{ for all } \alpha' \in P(f) \} \]
where $\langle \cdot, \cdot \rangle$ is the standard dot product. Dually, the normal cone at $\gamma$ is the set of all $\beta \in \mathbb{R}^d$ satisfying the above condition. Each $\beta$ lies in the non-negative orthant $\mathbb{R}^d_{\ge 0}$ because otherwise, the linear function $\langle \cdot, \beta \rangle$ does not have a minimum over the unbounded set $P(f)$. As a result, the union of all the normal cones gives a partition $\mathcal{F}(f)$ of the non-negative orthant called the normal fan. Now, given a compact subset $\gamma \subset \mathbb{R}^d$, define the face polynomial
\[ f_\gamma = \sum_{\alpha \in \gamma \cap \mathbb{N}^d} c_\alpha \omega^\alpha. \]
Recall that $f_\gamma$ is singular at a point $x \in \mathbb{R}^d$ if $\mathrm{ord}_x f_\gamma \ge 2$, i.e.
\[ f_\gamma(x) = \frac{\partial f_\gamma}{\partial \omega_1}(x) = \cdots = \frac{\partial f_\gamma}{\partial \omega_d}(x) = 0. \]
We say that $f$ is nondegenerate if $f_\gamma$ is non-singular at all points in the torus $(\mathbb{R}^*)^d$ for all compact faces $\gamma$ of $P(f)$; otherwise we say $f$ is degenerate. Now, we define the distance $l$ of $P(f)$ to be the smallest $t \ge 0$ such that $(t, t, \dots, t) \in P(f)$. Let the multiplicity $\theta$ of $l$ be the codimension of the lowest-dimensional face of $P(f)$ at this intersection of the diagonal with $P(f)$. However, if $l = 0$, we leave $\theta$ undefined. These notions of nondegeneracy, distance and multiplicity were first coined and studied by Varchenko [25].

We now extend the above notions to ideals. For any ideal $I \subset \mathcal{A}_0$, define
\[ P(I) = \mathrm{conv}\{ \alpha \in \mathbb{R}^d : [\omega^\alpha] f \ne 0 \text{ for some } f \in I \}. \]
Related to this geometric construction is the monomial ideal
\[ \mathrm{mon}(I) = \langle \omega^\alpha : [\omega^\alpha] f \ne 0 \text{ for some } f \in I \rangle. \]
Note that $I$ and $\mathrm{mon}(I)$ have the same Newton polyhedron, and if $I$ is generated by $f_1, \dots, f_r$, then $\mathrm{mon}(I)$ is generated by the monomials $\omega^\alpha$ appearing in the $f_i$. One consequence is that $P(f_1^2 + \cdots + f_r^2)$ is the scaled polyhedron $2P(I)$. More importantly, the threshold of $I$ is bounded by that of $\mathrm{mon}(I)$. To prove this result, we need the following lemma. Recall that by the Hilbert Basis Theorem or by Dickson's Lemma [10], $\mathrm{mon}(I)$ is finitely generated.

Lemma 4.1. Given $f \in \mathcal{A}_0(\mathbb{R}^d)$, let $S$ be a finite set of exponents $\alpha$ of monomials $\omega^\alpha$ which generate $\mathrm{mon}(\langle f \rangle)$. Then, there is a constant $c > 0$ such that
\[ |f(\omega)| \le c \sum_{\alpha \in S} |\omega|^\alpha \]
in a sufficiently small neighborhood of the origin.

Proof. Let $\sum_\alpha c_\alpha \omega^\alpha$ be the power series expansion of $f$. Because $f$ is analytic at the origin, there exists $\varepsilon > 0$ such that
\[ \sum_\alpha |c_\alpha| \, \varepsilon^{\alpha_1 + \cdots + \alpha_d} < \infty. \]
Let $S = \{\alpha(1), \dots, \alpha(s)\}$ where the monomials $\omega^{\alpha(i)}$ generate $\mathrm{mon}(\langle f \rangle)$. Then,
\[ f(\omega) = \omega^{\alpha(1)} g_1(\omega) + \cdots + \omega^{\alpha(s)} g_s(\omega) \]
for some power series $g_i(\omega)$. Each series $g_i(\omega)$ is absolutely convergent in the $\varepsilon$-neighborhood $U$ of the origin because $f$ is absolutely convergent in $U$. Thus, the $g_i(\omega)$ are analytic. Their absolute values are bounded above by some constant $c$ in $U$, and the lemma follows.

Below, $\mathrm{RLCT}_0(I; \varphi)$ denotes the RLCT of $I$ with respect to $\varphi$ at the origin.

Proposition 4.2.

Let $I \subset \mathcal{A}_0$ be a finitely generated ideal and $\varphi : \mathbb{R}^d \to \mathbb{R}$ be nearly analytic at the origin. Then,
\[ \mathrm{RLCT}_0(I; \varphi) \le \mathrm{RLCT}_0(\mathrm{mon}(I); \varphi). \]

Proof. Given $f \in \mathcal{A}_0(\mathbb{R}^d)$, let $S$ be a finite set of generating exponents $\alpha$ for $\mathrm{mon}(\langle f \rangle)$. By Lemma 4.1 and the Cauchy-Schwarz inequality, there exist constants $c, c' > 0$ such that
\[ f^2 \le \Big( c \sum_{\alpha \in S} |\omega|^\alpha \Big)^2 \le c' \sum_{\alpha \in S} \omega^{2\alpha} \]
in a sufficiently small neighborhood of the origin. Therefore, if $f_1, \dots, f_r$ generate $I$, then $f_1^2 + \cdots + f_r^2$ is bounded by a constant multiple of the sum of squares of monomials generating $\mathrm{mon}(I)$. The result now follows from Proposition 3.4.

Given a compact subset $\gamma \subset \mathbb{R}^d$, define the face ideal
\[ I_\gamma = \langle f_\gamma : f \in I \rangle. \]
The next result tells us how to compute $I_\gamma$ for an ideal $I = \langle f_1, \dots, f_r \rangle$.

Proposition 4.3.

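For all compact faces $\gamma \subset P(I)$, $I_\gamma = \langle f_{1\gamma}, \dots, f_{r\gamma} \rangle$.

Proof. By definition, $\langle f_{1\gamma}, \dots, f_{r\gamma} \rangle \subset I_\gamma$. For the other inclusion, it is enough to show that $f_\gamma \in \langle f_{1\gamma}, \dots, f_{r\gamma} \rangle$ for all $f \in I$. First, we claim that if $\omega^\alpha = \omega^{\alpha'} \omega^{\alpha''}$ with $\alpha \in \gamma$ and $\omega^{\alpha'} \in \mathrm{mon}(I)$, then $\omega^{\alpha''} = 1$. Indeed, for all $\beta \in \mathbb{R}^d_{\ge 0}$ in the normal cone dual to $\gamma$, we have $\langle \alpha, \beta \rangle = \langle \alpha', \beta \rangle + \langle \alpha'', \beta \rangle$, but $\langle \alpha, \beta \rangle \le \langle \alpha', \beta \rangle$ so $\langle \alpha'', \beta \rangle = 0$. This implies that $\alpha' + k\alpha'' \in \gamma$ for all integers $k > 0$. Since $\gamma$ is compact, $\alpha''$ must be the zero vector so $\omega^{\alpha''} = 1$.

Now, if $f \in I$, then $f = h_1 f_1 + \cdots + h_r f_r$ for some analytic functions $h_1, \dots, h_r$. Clearly, $f_\gamma = (h_1 f_1)_\gamma + \cdots + (h_r f_r)_\gamma$. By the above claim, $(h_i f_i)_\gamma = h_i(0) f_{i\gamma}$ where $h_i(0)$ is the constant term in $h_i$. Hence, $f_\gamma = h_1(0) f_{1\gamma} + \cdots + h_r(0) f_{r\gamma} \in \langle f_{1\gamma}, \dots, f_{r\gamma} \rangle$ as required.

Remark 4.4.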

Let β be a vector in the normal cone dual to the face γ of P ( I ).Now, consider the weight order associated to β , and let in β f be the sum of allthe terms of f that are maximal with respect to this (partial) order [10, § β I be the initial ideal in β I = h in β f : f ∈ I i . A set of functions f , . . . , f r ∈ I is a Gr¨obner basis for I if and only ifin β I = h in β f , . . . , in β f r i . Comparing this statement with the previous result, one could ask why the gen-erators f , . . . , f r of I need not be a Gr¨obner basis for Proposition 4.3 to hold.This confusion comes from incorrectly equating the face ideal I γ with the initialideal in β I when we only have containment I γ ⊂ in β I . For instance, if I = h f , f , f i = h xy − z , xz − y , yz − x i and γ is the convex hull of { (1 , , , (1 , , , (0 , , } , then I γ = h xy, xz, yz i .Meanwhile, β = (1 , ,

1) and in β I contains y − z = zf − yf but y − z / ∈ I γ .Lastly, we give several equivalent deﬁnitions of nondegeneracy for ideals. Ifan ideal I satisﬁes these conditions, then we say that I is sos-nondegenerate ,where sos stands for sum-of-squares . Proposition 4.5.

Let $I \subset \mathcal{A}_0$ be an ideal. The following are equivalent:

1. For some generating set $\{f_1, \dots, f_r\}$ for $I$, $f_1^2 + \cdots + f_r^2$ is nondegenerate.
2. For all generating sets $\{f_1, \dots, f_r\}$ for $I$, $f_1^2 + \cdots + f_r^2$ is nondegenerate.
3. For all compact faces $\gamma \subset P(I)$, the variety $\mathcal{V}(I_\gamma) \subset \mathbb{R}^d$ does not intersect the torus $(\mathbb{R}^*)^d$.

Proof. Let $f_1, \dots, f_r$ generate $I$ and let $f = f_1^2 + \cdots + f_r^2$. If $\gamma$ is a compact face of $P(I)$, then the set $2\gamma$ is a compact face of $P(f) = 2P(I)$. Furthermore,
\[ f_{2\gamma} = f_{1\gamma}^2 + \cdots + f_{r\gamma}^2 \quad \text{and} \quad \frac{\partial f_{2\gamma}}{\partial \omega_i} = 2 f_{1\gamma} \frac{\partial f_{1\gamma}}{\partial \omega_i} + \cdots + 2 f_{r\gamma} \frac{\partial f_{r\gamma}}{\partial \omega_i}. \]
Now, $f_{1\gamma}^2 + \cdots + f_{r\gamma}^2 = 0$ if and only if $f_{1\gamma} = \cdots = f_{r\gamma} = 0$. It follows that $f$ is nondegenerate if and only if $\mathcal{V}(\langle f_{1\gamma}, \dots, f_{r\gamma} \rangle) \cap (\mathbb{R}^*)^d = \mathcal{V}(I_\gamma) \cap (\mathbb{R}^*)^d = \emptyset$ for all compact faces $\gamma \subset P(I)$. This proves (1) ⇔ (3) and (2) ⇔ (3).

Remark 4.6. The nondegeneracy of a function $f$ need not imply the sos-nondegeneracy of the ideal $\langle f \rangle$, e.g. $f = x + y$.

Remark 4.7.

After finishing this paper, we discovered another notion of nondegeneracy for ideals of complex formal power series due to Saia [22], which was shown to be equivalent to the complex version of Proposition 4.5(3) [5, § ].

A cone $\sigma$ is generated by vectors $v_1, \dots, v_k \in \mathbb{R}^d$ if $\sigma = \{ \sum_i \lambda_i v_i : \lambda_i \ge 0 \}$. If $\sigma$ is generated by lattice vectors $v_i \in \mathbb{Z}^d$, then $\sigma$ is rational. If the origin is a face of $\sigma$, then $\sigma$ is pointed. A ray is a pointed one-dimensional cone. Every rational ray has a lattice generator of minimal length called the minimal generator, and every pointed rational polyhedral cone $\sigma$ is generated by the minimal generators of its one-dimensional faces. If these minimal generators are linearly independent over $\mathbb{R}$, then $\sigma$ is simplicial. A simplicial cone is smooth if its minimal generators also form part of a $\mathbb{Z}$-basis of $\mathbb{Z}^d$. A collection $\mathcal{F}$ of pointed rational polyhedral cones in $\mathbb{R}^d$ is a fan if the faces of every cone in $\mathcal{F}$ are in $\mathcal{F}$ and the intersection of any two cones in $\mathcal{F}$ is again in $\mathcal{F}$. The support of $\mathcal{F}$ is the union of its cones as subsets of $\mathbb{R}^d$. If the support of $\mathcal{F}$ is the non-negative orthant, then $\mathcal{F}$ is locally complete. If every cone of $\mathcal{F}$ is simplicial (resp. smooth), then $\mathcal{F}$ is simplicial (resp. smooth). A fan $\mathcal{F}_1$ is a refinement of another fan $\mathcal{F}_2$ if the cones of $\mathcal{F}_1$ come from partitioning the cones of $\mathcal{F}_2$. See [12] for more details.

Given a smooth simplicial locally complete fan $\mathcal{F}$, we have a smooth toric variety $P(\mathcal{F})$ covered by open charts $U_\sigma \simeq \mathbb{R}^d$, one for each cone $\sigma$ of $\mathcal{F}$ that is maximal under inclusion. Furthermore, the blow-up $\rho_{\mathcal{F}} : P(\mathcal{F}) \to \mathbb{R}^d$ is defined as follows: for each maximal cone $\sigma$ of $\mathcal{F}$ minimally generated by $v_1, \dots, v_d$ with $v_i = (v_{i1}, \dots, v_{id})$, we have monomial maps $\rho_\sigma : U_\sigma \to \mathbb{R}^d$ on the open charts,
\begin{align*}
(\mu_1, \dots, \mu_d) &\mapsto (\omega_1, \dots, \omega_d), \\
\omega_1 &= \mu_1^{v_{11}} \mu_2^{v_{21}} \cdots \mu_d^{v_{d1}}, \\
\omega_2 &= \mu_1^{v_{12}} \mu_2^{v_{22}} \cdots \mu_d^{v_{d2}}, \\
&\ \ \vdots \\
\omega_d &= \mu_1^{v_{1d}} \mu_2^{v_{2d}} \cdots \mu_d^{v_{dd}}.
\end{align*}
Let $v = v_\sigma$ be the matrix $(v_{ij})$ where each minimal generator $v_i$ forms a row of $v$. We represent the above monomial map by $\omega = \mu^v$. If $v_{i+}$ represents the $i$-th row sum of $v$, the Jacobian determinant of this map is
\[ (\det v) \, \mu_1^{v_{1+} - 1} \cdots \mu_d^{v_{d+} - 1}. \]

We are now ready to connect these concepts. The next two theorems are due to Varchenko, see [25] and [2, § f γ = 0, but his proof [2, Lemma 8.9] actually supports the stronger notion. The setup is as follows: suppose $f$ is analytic in a neighborhood $W$ of the origin. Let $\mathcal{F}$ be any smooth simplicial refinement of the normal fan $\mathcal{F}(f)$ and $\rho_{\mathcal{F}}$ be the blow-up associated to $\mathcal{F}$. Set $M = \rho_{\mathcal{F}}^{-1}(W)$. Let $l$ be the distance of $P(f)$ and $\theta$ its multiplicity.

Theorem 4.8. $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$ if $f$ is nondegenerate.

Theorem 4.9.

$\mathrm{RLCT}_0 f = (1/l, \theta)$ if $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$.

We extend Theorem 4.9 to compute $\mathrm{RLCT}_0(f; \omega^\tau)$ for monomials $\omega^\tau$. Given a polyhedron $P(f) \subset \mathbb{R}^d$ and a vector $\tau = (\tau_1, \dots, \tau_d)$ of non-negative integers, let the $\tau$-distance $l_\tau$ be the smallest $t \ge 0$ such that $t(\tau_1 + 1, \dots, \tau_d + 1) \in P(f)$ and let the multiplicity $\theta_\tau$ be the codimension of the face at this intersection.

Theorem 4.10.

$\mathrm{RLCT}_0(f; \omega^\tau) = (1/l_\tau, \theta_\tau)$ if $(M, W, \rho_{\mathcal{F}})$ desingularizes $f$ at $0$.

Proof. We follow roughly the proof in [2, §8] of Theorem 4.9. Let $\sigma$ be a maximal cone of $\mathcal{F}$. Because $\mathcal{F}$ refines $\mathcal{F}(f)$, $\sigma$ is a subset of some maximal cone $\sigma'$ of $\mathcal{F}(f)$. Let $\alpha \in \mathbb{R}^d$ be the vertex of $P(f)$ dual to $\sigma'$. Let $v$ be the matrix whose rows are minimal generators of $\sigma$ and $\rho$ the monomial map $\mu \mapsto \mu^v$. Under this map, the term $c_\alpha \omega^\alpha$ in $f$ becomes the leading monomial, so $f(\rho(\mu)) = g(\mu) \mu^{v\alpha}$ for some function $g$ satisfying $g(\mu) \ne 0$ for all $\mu \in U_\sigma$. Then,
\begin{align*}
|f(\omega)|^{-z} |\omega^\tau| \, d\omega &= |f(\rho(\mu))|^{-z} |\rho(\mu)|^\tau |\rho'(\mu)| \, d\mu \\
&= (\det v) \, |g(\mu)|^{-z} \, |\mu|^{-v\alpha z} \, |\mu^{v\tau} \mu_1^{v_{1+} - 1} \cdots \mu_d^{v_{d+} - 1}| \, d\mu.
\end{align*}
Thus, for the cone $\sigma$,
\[ (\lambda_\sigma, \theta_\sigma) = (\min S, \ \#\min S), \qquad S = \Big\{ \frac{v_i \cdot (\tau + 1)}{v_i \cdot \alpha} : 1 \le i \le d \Big\} \]
where $\tau + 1 = (\tau_1 + 1, \dots, \tau_d + 1)$ and $\#\min S$ counts the indices $i$ at which the minimum is attained. We now give an interpretation for the elements of $S$. Fixing $i$, let $P$ be the affine hyperplane normal to $v_i$ passing through $\alpha$. Then, $(v_i \cdot \alpha)/(v_i \cdot (\tau + 1))$ is the distance of $P$ from the origin along the ray $\{ t(\tau + 1) : t \ge 0 \}$. Since $\mathrm{RLCT}_0(f; \omega^\tau) = \min_\sigma (\lambda_\sigma, \theta_\sigma)$, the result follows.

Remark 4.11.

After finishing this paper, the author discovered that a similar result was proved by Vasil'ev [26] for complex analytic functions.

Monomial ideals play a special role in the theory of real log canonical thresholds of ideals. The proof of this next result is due to Piotr Zwiernik.
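
The $\tau$-distance of Theorem 4.10 can be computed by hand in small cases, and the following is a minimal sketch for two variables only. It assumes the Newton polyhedron is given by a finite list of integer exponent pairs (for a monomial ideal, the exponents of its generators). The function name `tau_distance_2d` is hypothetical and not part of the Singular library mentioned in the paper, and the sketch returns only $l_\tau$, not the multiplicity $\theta_\tau$.

```python
from fractions import Fraction
from itertools import combinations


def tau_distance_2d(exponents, tau=(0, 0)):
    """Smallest t >= 0 with t * (tau + 1) in conv(exponents) + R^2_{>=0}.

    Illustrative helper (an assumption, not from the paper): exponents is a
    list of pairs of non-negative integers, tau a pair of non-negative integers.
    """
    u0, u1 = tau[0] + 1, tau[1] + 1
    # A single exponent (a0, a1) is dominated by t * (u0, u1) as soon as
    # t >= max(a0 / u0, a1 / u1).
    best = min(max(Fraction(a0, u0), Fraction(a1, u1)) for a0, a1 in exponents)
    # On the segment between two exponents, an interior optimum can only occur
    # where the segment crosses the ray through (u0, u1).
    for (a0, a1), (b0, b1) in combinations(exponents, 2):
        da = a0 * u1 - a1 * u0
        db = b0 * u1 - b1 * u0
        if da == db:
            continue  # segment parallel to (or on) the ray: endpoints suffice
        w = Fraction(-db, da - db)
        if 0 <= w <= 1:
            best = min(best, (w * a0 + (1 - w) * b0) / u0)
    return best


# f = x^2 + y^2 has exponents (2, 0) and (0, 2); its distance is l = 1.
l_plain = tau_distance_2d([(2, 0), (0, 2)])
# With the amplitude xy, i.e. tau = (1, 1), the tau-distance drops to 1/2.
l_weighted = tau_distance_2d([(2, 0), (0, 2)], tau=(1, 1))
```

For $f = x^2 + y^2$ this gives $1/l = 1 = d/2$ with $d = 2$, in line with Corollary 4.14, and $1/l_\tau = 2$ for the amplitude $xy$, which agrees with a direct computation in polar coordinates.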

Proposition 4.12.

Monomial ideals are sos-nondegenerate.

Proof. Let $f = f_1^2 + \cdots + f_r^2$ where $f_1, \dots, f_r$ are monomials generating $I$. For each face $\gamma$ of $P(I)$, $f_\gamma$ is also a sum of squares of monomials, so $f_\gamma$ does not have any zeros in $(\mathbb{R}^*)^d$ and the result now follows from Proposition 4.5(3).

Our tools now allow us to prove Theorem 1.3. As a special case, we have a formula for the RLCT of a monomial ideal with respect to a monomial amplitude function. The analogous formula for complex log canonical thresholds of monomial ideals was discovered and proved by Howald [17].

Proof of Theorem 1.3.

If the ideal $I$ is sos-nondegenerate, then the equality follows from Proposition 4.5, Theorem 4.8 and Theorem 4.10. For all other ideals, the inequality is the result of Proposition 4.2 and Proposition 4.12.

Remark 4.13. Define the principal part $f_P$ of $f$ to be $\sum_\alpha c_\alpha \omega^\alpha$ where the sum is over all $\alpha$ lying in some compact face $\gamma$ of $P(f)$. The above theorems imply that if $f$ is nondegenerate, then $\mathrm{RLCT}_0 f = \mathrm{RLCT}_0 f_P$. However, the latter is not true in general. For instance, if $f = (x + y)^2 + y^4$, then $f_P = (x + y)^2$ but $\mathrm{RLCT}_0 f = (3/4, 1)$ and $\mathrm{RLCT}_0 f_P = (1/2, 1)$.

Corollary 4.14. If $f \in \mathcal{A}_0(\mathbb{R}^d)$ has a local minimum at the origin with $f(0) = 0$ and its Hessian $(\partial^2 f / \partial \omega_i \partial \omega_j)$ is full rank, then $\mathrm{RLCT}_0 f = (d/2, 1)$.

Proof. Because its Hessian is full rank, there is a linear change of variables such that $f = \omega_1^2 + \cdots + \omega_d^2 + O(\omega^3)$. Thus, $f$ is nondegenerate and the Newton polyhedron $P(f)$ has distance $l = 2/d$ with $\theta = 1$.

Corollary 4.15.

Let $I$ be the ideal $\langle f_1, \dots, f_s \rangle$, and suppose the Jacobian matrix $(\partial f_i / \partial \omega_j)$ has rank $r$ at $0$. Then, $\mathrm{RLCT}_0 I \le (\tfrac{1}{2}(r + d), 1)$.

Proof. Because the rank of $(\partial f_i / \partial \omega_j)$ is $r$, there is a linear change of variables such that the only linear monomials appearing in $I$ are $\omega_1, \dots, \omega_r$. It follows that $P(I)$ lies in the halfspace $\alpha_1 + \cdots + \alpha_r + \tfrac{1}{2}(\alpha_{r+1} + \cdots + \alpha_d) \ge 1$, so its distance is at least $1/(r + \tfrac{1}{2}(d - r)) = 2/(r + d)$.

In this section, we use our tools to compute the learning coefficients of a naïve Bayesian network $\mathcal{M}$ with two ternary random variables and two hidden states. It was designed by Evans, Gilula and Guttman [11] for investigating connections between the recovery time of 132 schizophrenic patients and the frequency of visits by their relatives. Their data is summarized in the $3 \times 3$ contingency table

                      2 ≤ Y < 10   10 ≤ Y < 20   20 ≤ Y   Totals
  Visited regularly       43            16           3       62
  Visited rarely           6            11          10       27
  Visited never            9            18          16       43
  Totals                  58            45          29      132

which we store as a $3 \times 3$ matrix $\hat{q}$ of relative frequencies. The model is given by
\begin{align*}
p : \Omega = \Delta_1 \times \Delta_2 \times \Delta_2 \times \Delta_2 \times \Delta_2 &\to \Delta_8 \\
\omega = (t, a_1, a_2, b_1, b_2, c_1, c_2, d_1, d_2) &\mapsto (p_{ij}) \\
p_{ij} = t a_i b_j + (1 - t) c_i d_j, \quad & i, j \in \{1, 2, 3\}
\end{align*}
where $a_3 = 1 - a_1 - a_2$, $a = (a_1, a_2, a_3) \in \Delta_2$ and similarly for $b$, $c$ and $d$. Hence, a3 × I = Z Ω p p p p p p p p p dω which was computed exactly by Sturmfels, Xu and the author [21].

We now estimate this integral using Watanabe's asymptotic formula for the log likelihood integral in Theorem 1.1. We assume that the data $\hat{q}$ was generated by some true distribution $q = (q_{ij}) \in \mathbb{R}^{3 \times 3}$ in the model. Ideally, we want $q$ to be equal to the matrix $\hat{q}$ of relative frequencies, but in general, the data $\hat{q}$ rarely lies in the model. In this example, the matrix $\hat{q}$ is not in the model because it is full rank. However, we should be able to find a distribution $q$ in the model that is close to $\hat{q}$, because in practice, we want to study models which describe the data well. A good candidate for $q$ is the maximum likelihood distribution. Using the EM algorithm, this distribution is q = 1132 . . . . . . . . . which comes from the maximum likelihood estimate t = 0 . a , a ) = (0 . , . , ( b , b ) = (0 . , . , ( c , c ) = (0 . , . , ( d , d ) = (0 . , . . Note that the ML distribution $q$ is indeed very close to the data $\hat{q}$.

Our next theorem summarizes how the asymptotics of $\log Z(N)$ depend on $q$. Let $S_i$ denote the set of rank $i$ matrices in $p(\Omega)$, and $S_i^* \subset S_i$ be the matrices with positive entries. Before we prove this theorem, let us apply it to our statistical problem. Using the exact value of $I$ computed by Lin–Sturmfels–Xu [21], we get ( log I ) exact = − . . Meanwhile, if the BIC was erroneously applied with the dimension $d = 9$ of the parameter space, we would have ( log I ) BIC = − . .
On the other hand, by calculating the real log canonical threshold of the polynomial ideal $\langle p(\omega) - q \rangle$, we find that the learning coefficient of the model at the ML distribution $q$ is ( λ, θ ) = (7 / , I ) RLCT ≈ − . I .

Our proposal to use the ML distribution as the true distribution $q$ is admittedly simplistic, given that noise in the data will almost surely bring us to some $q \in S_2^*$. Nonetheless, our next theorem proves that the learning coefficient is always smaller than the $(9/2, 1)$ prescribed by the BIC. For deeper statistical discussions, the reader should turn to Drton and Plummer [8] where they addressed the paradox of circular reasoning in requiring true parameter values for the asymptotic approximation of Bayesian integrals. They also proposed a novel algorithm where the marginal likelihood is estimated as a weighted average of the contributions from all true distributions. We hope that mathematical analyses such as our next theorem will help inform these kinds of discussions, and provide useful estimates and bounds for a variety of statistical computations.
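
To make the comparison with the BIC concrete, the sketch below evaluates the two penalty terms for the sample size $N = 132$. It assumes the learning coefficient $(\lambda, \theta) = (7/2, 1)$ for a rank-two true distribution and the BIC value $d/2 = 9/2$; the function names are illustrative, and the fitted log-likelihood term, which is common to both approximations, is left out.

```python
import math


def rlct_penalty(n, lam, theta):
    """Penalty lam * log n - (theta - 1) * log log n in Watanabe's expansion."""
    return lam * math.log(n) - (theta - 1) * math.log(math.log(n))


def bic_penalty(n, dim):
    """Classical BIC penalty (dim / 2) * log n for a smooth model."""
    return 0.5 * dim * math.log(n)


N = 132  # number of patients in the contingency table
# Assumed values: (7/2, 1) for the singular model, dimension 9 for the BIC.
gap = bic_penalty(N, 9) - rlct_penalty(N, 3.5, 1)
```

Under these assumptions the two approximations of the log marginal likelihood differ by exactly $\log N$, since $\theta = 1$ removes the $\log\log N$ term and the coefficients of $\log N$ differ by one.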

Theorem 5.1.

The learning coefficient $(\lambda, \theta)$ of the model at $q > 0$ is given by
\[ (\lambda, \theta) = \begin{cases} (5/2, 1) & \text{if } q \in S_1^*, \\ (7/2, 1) & \text{if } q \in S_2^*. \end{cases} \]
Therefore, asymptotically as $N \to \infty$,
\[ \log Z(N) = N \sum_{i,j} \hat{q}_{ij} \log q_{ij} - \lambda \log N + (\theta - 1) \log\log N + \eta_N \]
where $\hat{q}$ is the matrix of relative frequencies of the data and $\eta_N$ is a random variable whose expectation $E[\eta_N]$ converges to a constant.

We postpone the proof of this theorem to the end of the section. Let us begin with a few remarks about our approach to this problem. Firstly, Theorem 1.2 states that the learning coefficient $(\lambda, \theta)$ of the statistical model is given by
\[ (2\lambda, \theta) = \min_{\omega^* \in \mathcal{V}} \mathrm{RLCT}_{\Omega_{\omega^*}} \langle p(\omega) - q \rangle \]
where $\mathcal{V}$ is the fiber $p^{-1}(q) = \{ \omega \in \Omega : p(\omega) = q \}$ over $q$. Instead of focusing on a fixed $q$ and its fiber $\mathcal{V}$, let us vary the parameter $\omega^*$ over all of $\Omega$. For each $\omega^* \in \Omega$, we translate $\Omega$ so that $\omega^*$ is the origin and compute the RLCT of the ideal $\langle p(\omega + \omega^*) - p(\omega^*) \rangle$. This is the content of Proposition 5.2. The proof of Theorem 5.1 will then consist of minimizing these RLCTs over the fiber $\mathcal{V}$ for each $q$ in the model.

Secondly, in our computations, we will often be choosing different generators for our ideal and making appropriate changes of variables. Generators with few terms and small total degree are often highly desired. Another useful trick is to multiply or divide the generators by functions $f(\omega)$ satisfying $f(0) \ne 0$. Such functions are units in the ring $\mathcal{A}_0$ of real analytic functions so this multiplication or division will not change the ideal generated.

We will perform many of the computations by hand to demonstrate how the various properties from Section 3 can be applied. At points in the proof where RLCTs of monomial ideals are required, the Singular library from Section 1 comes in useful. We hope that some day the computation of learning coefficients for statistical models will be fully automated.

Thirdly, for the full proof of Proposition 5.2, we will have to analyze interactions between the model singularities and the boundary of the parameter space. Some of these interactions are messy.
To improve the readability of the paper,we moved the detailed proof to the appendix while retaining some interestingcomputations in this section.Finally, we come to our main proposition. Let us deﬁne the following subsetsof Ω. These subsets stratify Ω according to the real log canonical threshold inthe manner described in Conjecture 2.9.Ω u = { ω ∗ ∈ Ω : t ∗ ∈ { , }} Ω m = { ω ∗ ∈ Ω : t ∗ / ∈ { , }} Ω m = { ω ∗ ∈ Ω m : a ∗ = c ∗ , b ∗ = d ∗ } Ω m kl = { ω ∗ ∈ Ω m : { i : a ∗ i = 0 } = k, { i : b ∗ i = 0 } = l } Ω m = { ω ∗ ∈ Ω m : ( b ∗ = d ∗ , a ∗ = c ∗ ) or ( a ∗ = c ∗ , b ∗ = d ∗ ) } Ω m = { ω ∗ ∈ Ω m : ( a ∗ = c ∗ , ∃ i a ∗ i = 0) or ( b ∗ = d ∗ , ∃ i b ∗ i = 0) } Ω m = { ω ∗ ∈ Ω m : a ∗ = c ∗ , b ∗ = d ∗ } Ω m ad = { ω ∗ ∈ Ω m : ∃ i, j a ∗ i = d ∗ j = 0 , c ∗ i = 0 , b ∗ j = 0 } Ω m bc = { ω ∗ ∈ Ω m : ∃ i, j b ∗ i = c ∗ j = 0 , d ∗ i = 0 , a ∗ j = 0 } Ω m = Ω m ad ∪ Ω m bc Ω m = Ω m ad ∩ Ω m bc . Proposition 5.2.

Given ω ∗ ∈ Ω , let I be the ideal h p ( ω + ω ∗ ) − p ( ω ∗ ) i . Then, RLCT I =  (5 , if ω ∗ ∈ Ω u , (6 , if ω ∗ ∈ Ω m , (6 , if ω ∗ ∈ Ω m ∪ Ω m ∪ Ω m ∪ Ω m , (7 , if ω ∗ ∈ Ω m , (7 , if ω ∗ ∈ Ω m ∪ Ω m , (8 , if ω ∗ ∈ Ω m , (6 , if ω ∗ ∈ Ω m \ Ω m , (7 , if ω ∗ ∈ Ω m , (7 , if ω ∗ ∈ Ω m \ Ω m , (8 , if ω ∗ ∈ Ω m \ Ω m , (9 , if ω ∗ ∈ Ω m . Proof Idea.

We give a shortened analysis that ignores the eﬀect of the boundaryof Ω on the RLCTs. The derived RLCTs will be smaller than the actual ones byProposition 3.3. A full proof involving boundary eﬀects is given in the appendix.Our ideal I is generated by g ij = f ij ( ω + ω ∗ ) − f ij ( ω ∗ ) where f ij = ta i b j + (1 − t ) c i d j , i, j ∈ { , , } and a = b = c = d = 1. One can check that I is also generated by g , g ,25 , g , and g ij − ( d j + d ∗ j ) g i − ( a i + a ∗ i ) g j , i, j ∈ { , } which expand to give c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ where t ∗ = t ∗ , t ∗ = 1 − t ∗ , u ∗ i = a ∗ i − c ∗ i , v ∗ i = b ∗ i − d ∗ i . Note that P ( a i + a ∗ i ) = 1and P a ∗ i = 1 so P a i = 0 and similarly for b, c, d . Also, P u ∗ i = P a ∗ i − c ∗ i = 0.The same is true for v ∗ . We now do a case-by-case analysis. Case 1: ω ∗ ∈ Ω m . This implies t ∗ = 0 and t ∗ = 0. Since the indeterminates b , b , c , c appearonly in the ﬁrst four polynomials, this suggests the change of variables c i = ( c ′ i − tu ∗ i − a i ( t ∗ + t )) / ( t ∗ − t ) , i = 1 , b i = ( b ′ i − tv ∗ i − d i ( t ∗ − t )) / ( t ∗ + t ) , i = 1 , t, a , a , b ′ , b ′ , c ′ , c ′ , d , d . In view of Proposition 3.8,the Jacobian determinant of this substitution is a constant, while the pullbackideal can be written as I + I where I = h b ′ , b ′ , c ′ , c ′ i and I is generated by a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ ,a d − a t ∗ v ∗ + d t ∗ u ∗ . The indeterminates in I and I are disjoint, so we may apply Proposition 3.7.The RLCT of I is (4 , I . Case 1.1: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0 or u ∗ = 0 , v ∗ = 0. 
Without loss of generality, weassume v ∗ = 0 , u ∗ = 0 , u ∗ = 0 ( u ∗ + u ∗ + u ∗ = 0 so at most one of them is zero)and substitute d i = ( d ′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) , i = 1 , . The resulting pullback of I is h d ′ , d ′ i . If ω ∗ lies in the interior of Ω, we use eitherNewton polyhedra or Proposition 3.7 to show that the RLCT of this monomialideal is (2 , ω ∗ lies on the boundary of Ω, the situation is more complicatedand we analyze it in detail in the appendix. Case 1.2: ω ∗ ∈ Ω m . u ∗ = 0 , v ∗ = 0. Without loss of generality, suppose that u ∗ = 0.If ω ∗ ∈ Ω m , we further assume that a ∗ = d ∗ j = 0 , u ∗ = 0 , v ∗ j = 0. Substituting d i = ( d ′ i + a t ∗ v ∗ i ) / ( a + t ∗ u ∗ ) , i = 1 , a = ( a ′ + a u ∗ ) /u ∗ , the pullback ideal is h a ′ , d ′ , d ′ i so the RLCT at an interior point is (3 , Case 1.3: ω ∗ ∈ Ω m . This implies u ∗ i = v ∗ i = 0 for all i . The pullback ideal can be written as h a , a ih d , d i whose RLCT over an interior point of Ω is (2 ,

2) by Proposition 3.7.
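
The sum and product rules of Proposition 3.7 that drive these case computations are simple enough to mechanize. The sketch below is illustrative only: the function names are hypothetical, RLCTs are represented as pairs $(\lambda, \theta)$, and the input values $(2, 1)$, $(1, 1)$ and $(4, 1)$ are the interior-point thresholds quoted in the surrounding cases.

```python
from fractions import Fraction


def rlct_sum(x, y):
    """RLCT of I_X + I_Y when I_X and I_Y involve disjoint variables."""
    (lam_x, theta_x), (lam_y, theta_y) = x, y
    return (lam_x + lam_y, theta_x + theta_y - 1)


def rlct_product(x, y):
    """RLCT of I_X * I_Y when I_X and I_Y involve disjoint variables."""
    (lam_x, theta_x), (lam_y, theta_y) = x, y
    if lam_x < lam_y:
        return (lam_x, theta_x)
    if lam_x > lam_y:
        return (lam_y, theta_y)
    return (lam_x, theta_x + theta_y)


# Case 1.3: <a1, a2><d1, d2>, each factor with threshold (2, 1).
two_vars = (Fraction(2), 1)
case_1_3 = rlct_product(two_vars, two_vars)

# Case 2: <c1', c2', d1', d2'> + <t><...><...>, taking (2, 1) for the two
# factors whose thresholds are only known to be at least (2, 1).
triple = rlct_product(rlct_product((Fraction(1), 1), two_vars), two_vars)
case_2 = rlct_sum((Fraction(4), 1), triple)
```

With these inputs the product in Case 1.3 evaluates to $(2, 2)$ and the pullback ideal in Case 2 to $(5, 1)$.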

Case 2: $\omega^* \in \Omega_u$. Without loss of generality, assume $t^* = 0$ and substitute
\[ c_i = (c_i' - t(a_i + u_i^*))/(1 - t), \quad i = 1, 2, \]
\[ d_i = (d_i' - t(b_i + v_i^*))/(1 - t), \quad i = 1, 2. \]
The pullback ideal is the sum of $\langle c_1', c_2', d_1', d_2' \rangle$ and $\langle t \rangle \langle a_1 + u_1^*, a_2 + u_2^* \rangle \langle b_1 + v_1^*, b_2 + v_2^* \rangle$. The RLCT of the first summand is $(4, 1)$. That of $\langle t \rangle$ is $(1, 1)$ while those of $\langle a_1 + u_1^*, a_2 + u_2^* \rangle$ and $\langle b_1 + v_1^*, b_2 + v_2^* \rangle$ are at least $(2, 1)$ each. By Proposition 3.7, the RLCT of their product is $(1, 1)$ and that of the pullback ideal is $(5, 1)$.

Proof of Theorem 5.1.

Given a matrix q = ( q ij ), the learning coeﬃcient ( λ, θ )of the model at q is the minimum of RLCTs at points ω ∗ ∈ Ω where p ( ω ∗ ) = q .The theorem then follows from Proposition 5.2, Theorem 1.1 and the claims p (Ω u ) = S , p (Ω m ) ⊂ S , p (Ω m ) ⊂ S , p (Ω m ) / ∈ S ∗ . These four claims are easy to check from the deﬁnitions of the subsets of Ω.
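
The rank claims can be spot-checked with exact arithmetic. The sketch below builds the matrix $p_{ij} = t a_i b_j + (1 - t) c_i d_j$ and computes its rank by Gaussian elimination. The parameter values are arbitrary illustrations rather than data from the paper, and the helper names are hypothetical.

```python
from fractions import Fraction as F


def model_matrix(t, a, b, c, d):
    """3x3 matrix p_ij = t a_i b_j + (1 - t) c_i d_j of the mixture model."""
    return [[t * a[i] * b[j] + (1 - t) * c[i] * d[j] for j in range(3)]
            for i in range(3)]


def rank(matrix):
    """Rank of a matrix with exact (Fraction) entries via row reduction."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


# Arbitrary points of the simplex, chosen only for illustration.
a = (F(1, 2), F(1, 3), F(1, 6))
b = (F(1, 4), F(1, 4), F(1, 2))
c = (F(1, 5), F(2, 5), F(2, 5))
d = (F(1, 3), F(1, 3), F(1, 3))

rank_t_zero = rank(model_matrix(F(0), a, b, c, d))      # t* in {0, 1}
rank_generic = rank(model_matrix(F(1, 2), a, b, c, d))  # a != c and b != d
rank_b_eq_d = rank(model_matrix(F(1, 2), a, b, c, b))   # b = d
```

In this example the point with $t^* = 0$ and the point with $b^* = d^*$ both map to rank-one matrices, while the generic point maps to a rank-two matrix, which matches the way the strata are sorted into $S_1$ and $S_2$ in the proof.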

In this section, we give a full proof of Proposition 5.2 that considers the eﬀect ofthe boundary of the parameter space Ω on the RLCTs. The next lemma comesin handy in dealing with boundary issues. It helps us in computing the RLCTsof monomial ideals at boundary points where the parameter space contains anice neighborhood Ω × Ω . Here, Ω is an orthant in the coordinates involvedin the monomials of I , while Ω is a small cone in the remaining coordinates.27 emma 6.1. Let Ω ⊂ { ( x , . . . , x d ) ∈ R d } be semianalytic. Let I be a monomialideal and ϕ a monomial function in x , . . . , x r . If there exists a vector ξ ∈ R d − r such that Ω × Ω ⊂ Ω for suﬃciently small ε , Ω = { ( x , . . . , x r ) ∈ [0 , ε ] r } Ω = { ( x r +1 , . . . , x d ) = t ( ξ + ξ ′ ) for t ∈ [0 , ε ] , ξ ′ ∈ [ − ε, ε ] d − r } , then RLCT Ω ( I ; ϕ ) = RLCT ( I ; ϕ ) .Proof. Because I and | ϕ | remain unchanged by the ﬂipping of signs of x , . . . , x r ,their threshold does not depend on the choice of orthant, so RLCT Ω ( I ; ϕ ) =RLCT ( I ; ϕ ). The lemma now follows from Proposition 3.7 and the fact thatthe threshold of the zero ideal over the cone neighborhood Ω is ( ∞ , − ). Detailed Proof of Proposition 5.2.

Recall that the ideal I is generated by c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ c ( t ∗ − t ) + a ( t ∗ + t ) + tu ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ d ( t ∗ − t ) + b ( t ∗ + t ) + tv ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ a d − a t ∗ v ∗ + d t ∗ u ∗ We do a case-by-case analysis of the structure of I and the boundary of Ω. Case 1: ω ∗ ∈ Ω m . This implies t ∗ = 0 and t ∗ = 0. Since the indeterminates b , b , c , c appearonly in the ﬁrst four polynomials, this suggests the change of variables c i = ( c ′ i − tu ∗ i − a i ( t ∗ + t )) / ( t ∗ − t ) , i = 1 , b i = ( b ′ i − tv ∗ i − d i ( t ∗ − t )) / ( t ∗ + t ) , i = 1 , t, a , a , b ′ , b ′ , c ′ , c ′ , d , d . In view of Proposition 3.8,the Jacobian determinant of this substitution is a constant. Case 1.1: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0 or u ∗ = 0 , v ∗ = 0. Without loss of generality, weassume v ∗ = 0 , u ∗ > d i = ( d ′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) , i = 1 , . The resulting pullback ideal is h b ′ , b ′ , c ′ , c ′ , d ′ , d ′ i . If ω ∗ lies in the interior ofΩ, we use either Newton polyhedra or Proposition 3.7 to show that the RLCTof this monomial ideal is (6 , ω ∗ lies on the boundary of Ω, the situationis more complicated. Since we are considering a subset of a neighborhood of ω ∗ , the corresponding Laplace integral from Proposition 3.2a is smaller so the28hreshold is at least (6 , − u ∗ = u ∗ + u ∗ , we cannot have u ∗ = u ∗ = 0. Suppose u ∗ = 0 and u ∗ = 0. We consider a blowup where one of the charts is given by the monomialmap t = s, a i = sa ′ i , c ′ = rs, c ′ = rsc ′′ , b ′ i = rsb ′′ i , d ′ i = rsd ′′ i . Here, the pullbackpair is ( h rs i ; r s ). Now, we study the inequalities which are active at ω ∗ . Forinstance, if b ∗ = 0, then ω ∗ lies on the boundary deﬁned by 0 ≤ b + b ∗ . Afterthe various changes of variables, the inequalities are as shown below, where b ′′ = − b ′′ − b ′′ and similarly for c ′′ , d ′′ and a ′ . 
Note that the inequality for a ∗ = 0 is omitted because a ∗ = 0 implies u ∗ = − c ∗ ≤ 0. Similar conditions on the u ∗ i , v ∗ i hold for the other inequalities.

b ∗ i = 0 : 0 ≤ rs ( b ′′ i − d ′′ i ( t ∗ − s ) / ( t ∗ u ∗ + sa ′ )) / ( t ∗ + s )
d ∗ i = 0 : 0 ≤ rsd ′′ i / ( t ∗ u ∗ + sa ′ )
c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) + r ) / ( t ∗ − s )
c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) + rc ′′ ) / ( t ∗ − s ) u ∗ >
c ∗ = 0 : 0 ≤ s ( − u ∗ + a ′ ( t ∗ + s ) − r − rc ′′ ) / ( t ∗ − s ) u ∗ >
a ∗ = 0 : 0 ≤ sa ′ u ∗ <
a ∗ = 0 : 0 ≤ sa ′ u ∗ <

b ∗ = b ∗ = 0, we choose coordinates b ′′ and b ′′ and set b ′′ = − b ′′ − b ′′ . The same is done for the d ′′ i . The pullback pair is unchanged by these choices. Now, with coordinates ( r, s ) and ( b ′′ i , b ′′ i , d ′′ j , d ′′ j , c ′′ , a ′ , a ′ ), we apply the lemma with the vector ξ = (2 , , u ∗ , u ∗ , , , ( rs ; r s ) = (6 , u ∗ , u ∗ is zero, suppose u ∗ = 0 , u ∗ = 0 without loss of generality. If a ∗ = c ∗ = 0, then the arguments of the previous paragraph show that the RLCT is again (6 , a ∗ = c ∗ = 0, we blow up the origin in R and consider the chart where a = s, c ′ i = sc ′′ i , b ′ i = sb ′′ i , d ′ i = sd ′′ i . The pullback pair is ( ⟨ sb ′′ , sb ′′ , sc ′′ , sc ′′ , sd ′′ , sd ′′ ⟩ ; s ). The active inequalities for a ∗ = c ∗ = 0 are

c ∗ = 0 : 0 ≤ s ( c ′′ − t ∗ + t ) / ( t ∗ − t )
a ∗ = 0 : 0 ≤ s.

Near the origin in ( s, b ′′ , b ′′ , c ′′ , c ′′ , d ′′ , d ′′ ) ∈ R , these inequalities imply s = 0 so the new region M defined by the active inequalities is not full at the origin. Thus, we can ignore the origin in computing the RLCT. All other points on the exceptional divisor of this blowup lie on some other chart of the blowup where the pullback pair is ( s ; s ), so the RLCT is at least (7 , c = s, c = sc ′′ , a = sa ′ , b ′ i = sb ′′ i , d ′ i = sd ′′ i , we have the active inequalities below.
Note that c ∗ = 0 because u ∗ = − u ∗ <

b ∗ i = 0 : 0 ≤ s ( b ′′ i − d ′′ i ( t ∗ − t ) / ( t ∗ u ∗ − ( sa ′ + a ))) / ( t ∗ + t )
d ∗ i = 0 : 0 ≤ sd ′′ i / ( t ∗ u ∗ − ( sa ′ + a ))
c ∗ = 0 : 0 ≤ ( sc ′′ − tu ∗ + ( sa ′ + a )( t ∗ + t )) / ( t ∗ − t )
c ∗ = 0 : 0 ≤ s (1 − a ′ ( t ∗ + t )) / ( t ∗ − t )
a ∗ = 0 : 0 ≤ sa ′
a ∗ = 0 : 0 ≤ a

b ′′ i and d ′′ i , we find that the RLCT is (7 , 1) by using Lemma 6.1 with ξ = (2 , , u ∗ , u ∗ , , , , − 1) in coordinates ( b ′′ i , b ′′ i , d ′′ j , d ′′ j , a ′ , a , c ′′ , t ).

Case 1.2: ω ∗ ∈ Ω m . This implies u ∗ = 0 , v ∗ = 0. Without loss of generality, suppose that u ∗ = 0. If ω ∗ ∈ Ω m , we further assume that a ∗ = d ∗ j = 0 , u ∗ = 0 , v ∗ j = 0. Substituting d i = ( d ′ i + a t ∗ v ∗ i ) / ( a + t ∗ u ∗ ) , i = 1 , a = ( a ′ + a u ∗ ) /u ∗ , the pullback ideal is ⟨ a ′ , b ′ , b ′ , c ′ , c ′ , d ′ , d ′ ⟩ so the RLCT is at least (7 , a i = ( a ′ w ∗ i + a u ∗ i ) /u ∗ for i = 1 , , w ∗ i = 0 , , − ω ∗ is not in Ω m , we consider the blowup chart a ′ = s, b ′ i = sb ′′ i , c ′ i = sc ′′ i , d ′ i = sd ′′ i . The active inequalities are as follows. The symbol v − denotes v ∗ i ≤

b ∗ i = 0 : 0 ≤ [ sb ′′ i − tv ∗ i − ( sd ′′ i + a t ∗ v ∗ i )( t ∗ − t ) / ( t ∗ u ∗ + a )] / ( t ∗ + t ) v −
c ∗ i = 0 : 0 ≤ [ sc ′′ i − tu ∗ i − ( sw ∗ i + a u ∗ i )( t ∗ + t ) /u ∗ ] / ( t ∗ − t ) u +
a ∗ i = 0 : 0 ≤ ( sw ∗ i + a u ∗ i ) /u ∗ u −
d ∗ i = 0 : 0 ≤ ( sd ′′ i + a t ∗ v ∗ i ) / ( t ∗ u ∗ + a ) v +

The crux to understanding the inequalities is this: if a ∗ i = d ∗ j = 0 , u ∗ i = 0 , v ∗ j = 0, the coefficient of a appears with different signs in the inequalities for a ∗ i = 0 and d ∗ j = 0. This makes it difficult to choose a suitable vector ξ for Lemma 6.1. Similarly, if b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0, the coefficient of u ∗ t + t ∗ a appears with different signs. Fortunately, since ω ∗ ∉ Ω m , we do not have such obstructions and it is an easy exercise to find the vector ξ . Thus, the RLCT is (7 , ω ∗ ∈ Ω m \ Ω m , we blow up a = s, a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i . The active inequalities for a ∗ = d ∗ j = 0 imply that the new region M is not full at the origin of this chart.
Thus, we shift our focus to the other chartsof the blowup where the pullback pair is ( s ; s ), so the RLCT is at least (8 , a ′ = s, a = sa ′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i , we do not haveobstructions coming from any b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0 so it is again easy toﬁnd the vector ξ for Lemma 6.1. The threshold is exactly (8 , ω ∗ ∈ Ω m , consider the following two charts out of the nine charts in theblowup of the origin in R .Chart 1: a = s, t = st ′ , a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i Chart 2: t = s, a = sa ′ , a ′ = sa ′′ , b ′ i = sb ′′ i , c i = sc ′′ i , d i = sd ′′ i The inequalities for a ∗ i = d ∗ j = 0 , u ∗ i = 0 , v ∗ j = 0 and b ∗ i = c ∗ j = 0 , v ∗ i = 0 , u ∗ j = 0imply that the new region M is not full at points outside of the other sevencharts, so we may ignore these two charts in computing the RLCT. Indeed, forChart 1, the active inequalities a ∗ i = 0 : 0 ≤ s ( a ′′ w ∗ i + u ∗ i ) /u ∗ u − d ∗ i = 0 : 0 ≤ s ( d ′′ i + t ∗ v ∗ i ) / ( t ∗ u ∗ + s ) v +30ell us that a ′′ or d ′′ must be non-zero for M to be full. In Chart 2, suppose M is full at some point x where a ′′ = b ′′ = b ′′ = c ′′ = c ′′ = d ′′ = d ′′ = 0. Then, a ∗ i = 0 : 0 ≤ s ( a ′′ w ∗ i + a ′ u ∗ i ) /u ∗ u − d ∗ i = 0 : 0 ≤ s ( d ′′ i + a ′ t ∗ v ∗ i ) / ( t ∗ u ∗ + sa ′ ) v +imply that a ′ = 0 at x . However, if this is the case, the inequalities b ∗ i = 0 : 0 ≤ s [ b ′′ i − v ∗ i − ( d ′′ i + a ′ t ∗ v ∗ i )( t ∗ − s ) / ( t ∗ u ∗ + sa ′ )] / ( t ∗ + s ) v − c ∗ i = 0 : 0 ≤ s [ c ′′ i − u ∗ i − ( a ′′ w ∗ i + a ′ u ∗ i )( t ∗ + s ) /u ∗ ] / ( t ∗ − s ) u +forces b ′′ i or c ′′ i to be non-zero for some i , a contradiction. 
Thus, we shift our focus to the other seven charts where the pullback pair is ( s ; s ) and the RLCT is at least (9 , a ′ = s, a = sa ′ , t = st ′ , b ′ i = sb ′′ i , c ′ i = sc ′′ i , d ′ i = sd ′′ i , note that we cannot have both a ∗ = 0 and a ∗ = 0 because we assumed a ∗ = 0. It is now easy to find the vector ξ for Lemma 6.1, so the threshold is (9 ,

Case 1.3: ω ∗ ∈ Ω m . This implies u ∗ i = v ∗ i = 0 for all i . The pullback ideal can be written as ⟨ b ′ , b ′ , c ′ , c ′ ⟩ + ⟨ a , a ⟩⟨ d , d ⟩ whose RLCT over an interior point of Ω is (6 , 2) by Proposition 3.7. This occurs in Ω m where none of the inequalities are active. Now, suppose the only active inequalities come from a ∗ = c ∗ = 0. We blow up the origin in { ( a , c ′ ) ∈ R } . In the chart given by a = a ′ , c ′ = a ′ c ′′ , the new region M is not full at the origin, so we only need to study the chart where c ′ = c ′′ , a = c ′′ a ′ . The pullback pair becomes ( ⟨ c ′′ ⟩ + ⟨ b ′ , b ′ , c ′ ⟩ + ⟨ a ⟩⟨ d , d ⟩ ; c ′′ ), and a simple application of Lemma 6.1 and Proposition 3.7 shows that the threshold is (6 ,

− : ( ⟨ b ′ , b ′ , c ′ , c ′ ⟩ + ⟨ a , a ⟩⟨ d , d ⟩ ; 1) : (6 ,
a ∗ = 0 : ( ⟨ b ′ , b ′ , c ′′ , c ′ ⟩ + ⟨ a ⟩⟨ d , d ⟩ ; c ′′ ) : (6 ,
a ∗ = 0 , b ∗ = 0 : ( ⟨ b ′′ , b ′ , c ′′ , c ′ ⟩ + ⟨ a ⟩⟨ d ⟩ ; b ′′ c ′′ ) : (7 ,
a ∗ = 1 : ( ⟨ b ′ , b ′ , c ′′ , c ′′ ⟩ ; c ′′ c ′′ ) : (6 ,
a ∗ = 1 , b ∗ = 0 : ( ⟨ b ′′ , b ′ , c ′′ , c ′′ ⟩ ; b ′′ c ′′ c ′′ ) : (7 ,
a ∗ = 1 , b ∗ = 1 : ( ⟨ b ′′ , b ′′ , c ′′ , c ′′ ⟩ ; b ′′ b ′′ c ′′ c ′′ ) : (8 ,

a ∗ = c ∗ = 1 corresponds to a ∗ = a ∗ = c ∗ = c ∗ = 0. Here, we blow up the origins in { ( a , c ′ ) ∈ R } and { ( a , c ′ ) ∈ R } . As before, we can ignore the other charts and just consider the one where a = c ′′ a ′ , c ′ = c ′′ , a = c ′′ a ′ , c ′ = c ′′ . The pullback pair is ( ⟨ c ′′ ⟩ + ⟨ c ′′ ⟩ + ⟨ b ′ , b ′ ⟩ , c ′′ c ′′ ). If b ∗ i = 0 for all i , the RLCT is (6 , 1) by Lemma 6.1 and Proposition 3.7.
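The interior-point value quoted in Case 1.3 can be checked mechanically. The sketch below assumes that Proposition 3.7 takes the standard form for ideals in disjoint sets of variables: under a sum of ideals the thresholds (λ, θ) combine as (λ₁ + λ₂, θ₁ + θ₂ − 1), and under a product the pair with the smaller λ is kept, with the θ's added on a tie. It also assumes that an ideal generated by a single coordinate has threshold (1, 1) at an interior point, which is the value used for the ideal generated by t in Case 2. The function names are illustrative and do not come from the paper.

```python
# Thresholds are pairs (lam, theta). The two rules below are the ASSUMED
# form of Proposition 3.7 for ideals in disjoint sets of variables.

COORD = (1, 1)  # assumed threshold of an ideal generated by one coordinate


def rlct_sum(p, q):
    """Threshold of I + J: lambdas add, thetas add minus one."""
    return (p[0] + q[0], p[1] + q[1] - 1)


def rlct_product(p, q):
    """Threshold of I * J: the smaller lambda wins; thetas add on a tie."""
    if p[0] != q[0]:
        return min(p, q, key=lambda pair: pair[0])
    return (p[0], p[1] + q[1])


def coordinate_ideal(n):
    """Threshold of the ideal generated by n distinct coordinates."""
    result = COORD
    for _ in range(n - 1):
        result = rlct_sum(result, COORD)
    return result


# Case 1.3 at an interior point: four coordinate generators, plus the
# product of two ideals with two coordinate generators each.
linear_part = coordinate_ideal(4)
product_part = rlct_product(coordinate_ideal(2), coordinate_ideal(2))
case_1_3 = rlct_sum(linear_part, product_part)
print(case_1_3)  # (6, 2)
```

Under these assumptions the result agrees with the threshold (6, 2) stated for interior points in Case 1.3. Boundary points are not covered by this sketch, since they require Lemma 6.1 and the blowups described above.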

Case 2: ω ∗ ∈ Ω u . t ∗ = 0 and substitute

c i = ( c ′ i − t ( a i + u ∗ i )) / (1 − t ) i = 1 ,
d i = ( d ′ i − t ( b i + v ∗ i )) / (1 − t ) i = 1 , .

The pullback ideal is the sum of ⟨ c ′ , c ′ , d ′ , d ′ ⟩ and ⟨ t ⟩⟨ a + u ∗ , a + u ∗ ⟩⟨ b + v ∗ , b + v ∗ ⟩ . Since c ′ = − c ′ − c ′ and similarly for the d ′ i , a i , b i , u ∗ i and v ∗ i , it is useful to write this ideal more symmetrically as the sum of ⟨ c ′ , c ′ , c ′ ⟩ , ⟨ d ′ , d ′ , d ′ ⟩ and ⟨ t ⟩⟨ a + u ∗ , a + u ∗ , a + u ∗ ⟩⟨ b + v ∗ , b + v ∗ , b + v ∗ ⟩ . Meanwhile, the inequalities are

a ∗ i = 0 : 0 ≤ a i
c ∗ i = 0 : 0 ≤ ( c ′ i − t ( a i + u ∗ i )) / (1 − t ) u ∗ i ≥
b ∗ j = 0 : 0 ≤ b j
d ∗ j = 0 : 0 ≤ ( d ′ j − t ( b j + v ∗ j )) / (1 − t ) v ∗ j ≥ .

We now relabel the indices of the a i and c ′ i , without changing the b j and d ′ j , so that the active inequalities are among those from a ∗ = 0 , a ∗ = 0 , c ∗ i = 0 , c ∗ i = 0. The b j and d ′ j are thereafter also relabeled so that the inequalities come from b ∗ = 0 , b ∗ = 0 , d ∗ j = 0 , d ∗ j = 0. We claim that the new region M contains, for small ε , the orthant neighborhood { ( a , a , b , b , c i , c i , d j , d j , − t ) ∈ [0 , ε ] } . Indeed, the only problematic inequalities are

c ∗ = 0 : 0 ≤ ( c ′ − t ( − a − a + u ∗ i )) / (1 − t ) u ∗ = 0
d ∗ = 0 : 0 ≤ ( d ′ − t ( − b − b + v ∗ j )) / (1 − t ) v ∗ = 0 .

However, these inequalities cannot occur because, for instance, u ∗ = 0 and c ∗ = 0 imply a ∗ = 0, a contradiction since the a i were relabeled to avoid this. Finally, the threshold of ⟨ t ⟩ is (1 , 1) while those of ⟨ a + u ∗ , a + u ∗ ⟩ and ⟨ b + v ∗ , b + v ∗ ⟩ are at least (2 , 1) each. By Proposition 3.7, the RLCT of their product is (1 , ,

Acknowledgements.

The author wishes to thank Christine Berkesch, Mathias Drton, Anton Leykin, Bernd Sturmfels, Zach Teitler, Sumio Watanabe and Piotr Zwiernik, as well as the anonymous reviewers for their many useful suggestions, discussions and corrections.

References

[1] M. Aoyagi and S. Watanabe: Stochastic complexities of reduced rank regression in Bayesian estimation, Neural Networks (2005) 924–933.
[2] V. I. Arnol’d, S. M. Guseĭn-Zade and A. N. Varchenko: Singularities of Differentiable Maps, Vol. II, Birkhäuser, Boston, 1985.
[3] J. Bertrand, P. Bertrand and J. Ovarlez: The Mellin Transform, in The Transforms and Applications Handbook: Second Edition, Chapter 12, Ed. A. D. Poularikas, CRC Press, Boca Raton, 2010.
[4] E. Bierstone and P. D. Milman: Resolution of singularities, Several complex variables, MSRI Publications (1999) 43–78.
[5] C. Bivià-Ausina: Nondegenerate ideals in formal power series rings, Rocky Mountain J. Math. (2004) 495–511.
[6] M. Blickle and R. Lazarsfeld: An informal introduction to multiplier ideals, Trends in Commutative Algebra, MSRI Publications (2004) 87–114.
[7] A. Bravo, S. Encinas and O. Villamayor: A simplified proof of desingularisation and applications, Rev. Math. Iberoamericana (2005) 349–458.
[8] M. Drton and M. Plummer: A Bayesian Information Criterion for Singular Models, J. R. Statist. Soc. B (2017) 1–38.
[9] M. Drton, B. Sturmfels and S. Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars, Birkhäuser, Basel, 2009.
[10] D. Eisenbud: Commutative Algebra with a view towards Algebraic Geometry, Graduate Texts in Mathematics, Springer-Verlag, New York, 1995.
[11] M. Evans, Z. Gilula and I. Guttman: Latent class analysis of two-way contingency tables by Bayesian methods, Biometrika (1989) 557–563.
[12] W. Fulton: Introduction to Toric Varieties, Annals of Mathematics Studies, Princeton University Press, Princeton, 1993.
[13] D. Geiger and D. Rusakov: Asymptotic model selection for naive Bayesian networks, J. Mach. Learn. Res. (2005) 1–35.
[14] M. Greenblatt: An elementary coordinate-dependent local resolution of singularities and applications, J. Funct. Anal. (2008) 1957–1994.
[15] M. Greenblatt: Resolution of singularities, asymptotic expansions of integrals, and applications, J. Analyse Math. (2010) 221–245.
[16] H. Hironaka: Resolution of singularities of an algebraic variety over a field of characteristic zero I, II, Ann. of Math. (2) (1964) 109–203.
[17] J. A. Howald: Multiplier ideals of monomial ideals, Trans. Amer. Math. Soc. (2001) 2665–2671.
[18] J. Kollár: Singularities of pairs, in Algebraic geometry—Santa Cruz 1995, 221–287, Proc. Symp. Pure Math., Amer. Math. Soc., Providence, 1997.
[19] J. Kollár: Lectures on Resolution of Singularities (AM-166), Princeton University Press, 2009.
[20] R. Lazarsfeld: Positivity in Algebraic Geometry I, II, A Series of Modern Surveys in Mathematics, Springer-Verlag, Berlin, 2004.
[21] S. Lin, B. Sturmfels and Z. Xu: Marginal likelihood integrals for mixtures of independence models, J. Mach. Learn. Res. (2009) 1611–1631.
[22] M. J. Saia: The integral closure of ideals and the Newton filtration, J. Algebraic Geom. (1996) 1–11.
[23] M. Saito: On real log canonical thresholds, arXiv:math.AG/0707.2308.
[24] B. Sturmfels: Gröbner Bases and Convex Polytopes, University Lecture Series, Amer. Math. Soc., Providence, 1996.
[25] A. N. Varchenko: Newton polyhedra and estimation of oscillating integrals, Funct. Anal. Appl. (1977) 175–196.
[26] V. A. Vasil’ev: Asymptotic behavior of exponential integrals in the complex domain, Funktsional. Anal. i Prilozhen. :4 (1979) 1–12.
[27] S. Watanabe: Algebraic analysis for nonidentifiable learning machines, Neural Computation (2001) 899–933.
[28] S. Watanabe: Algebraic Geometry and Statistical Learning Theory, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2009.
[29] K. Yamazaki and S. Watanabe: Singularities in mixture models and upper bounds of stochastic complexity, International Journal of Neural Networks (2003) 1029–1038.
[30] K. Yamazaki and S. Watanabe: Newton diagram and stochastic complexity in mixture of binomial distributions, Algorithmic Learning Theory, 350–364, Lecture Notes in Comput. Sci. 3244