Statistical Learning Theory of Quasi-Regular Cases
Koshi Yamada ∗ Sumio Watanabe ∗ October 1, 2018
Abstract
Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values are left unknown. In the present paper, we propose a new concept, a quasi-regular case in statistical learning theory. A quasi-regular case is not a regular case but a singular case; however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, the two birational invariants are equal to each other, with the result that the symmetry of the generalization and training errors holds. Moreover, the concrete values of the two birational invariants are explicitly obtained, so the quasi-regular case is useful for studying statistical learning theory.
∗ Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Mail box: G5-19, 4259 Nagatsuta, Midori-ku, Yokohama, 226-8502, Japan. E-mail: [email protected], [email protected]

1 Introduction

A lot of statistical learning machines which are being applied to pattern recognition, bioinformatics, robotic control, and artificial intelligence have hidden variables, hierarchical layers, and submodules, because they are used to estimate the structure of the true distributions. In such learning machines, the map taking parameters to probability distributions is not one-to-one and the Fisher information matrices are singular; hence they are called singular learning machines. For example, three-layered neural networks, normal mixtures, hidden Markov models, Bayesian networks, and reduced rank regressions are singular learning machines [1, 2, 4, 5, 6, 10]. If a statistical model is singular, then the likelihood function cannot be approximated by any normal distribution, and the conventional asymptotic theory of regular models cannot be applied.

In singular learning theory, it was proved that the expectation values of the generalization and training errors, G_n and T_n, are given by two birational invariants, the real log canonical threshold λ and the singular fluctuation ν, by the formulas

E[G_n] = ( (λ − ν)/β + ν ) (1/n) + o(1/n),   (1)
E[T_n] = ( (λ − ν)/β − ν ) (1/n) + o(1/n),   (2)

where E[ ] denotes the expectation value over all training sets, n is the number of training samples, and β is the inverse temperature of the Bayes posterior distribution. Based on this relation, we can define an information criterion which enables us to estimate the generalization error from the training error [13].

It is well known that, if the true distribution and the statistical model are in a regular case, then λ = ν = d/2, where d is the dimension of the parameter space. In this case, the symmetry of the generalization and training errors holds,

E[G_n] = d/(2n) + o(1/n),   (3)
E[T_n] = −d/(2n) + o(1/n),   (4)

for arbitrary 0 < β ≤ ∞. This case corresponds to the well-known Akaike information criterion for regular statistical models. However, if they are not in a regular case, neither of them is equal to d/2 in general.

In the present paper, we propose a new concept, a quasi-regular case, which satisfies Regular ⊊ Quasi-Regular ⊊ Singular. In other words, a quasi-regular case is not a regular case; however, it has the same properties as the regular case. In fact, we prove that, in quasi-regular cases, both birational invariants are equal to each other, λ = ν, and the symmetry of the generalization and training errors holds. In a quasi-regular case, the two birational invariants are obtained explicitly, hence it is a useful concept in research on statistical learning theory.
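For orientation, here is a minimal sketch (our own helper function, not part of the paper) that evaluates the leading terms of eqs.(1) and (2) and checks that in a regular case, where λ = ν = d/2, the β-dependence cancels as in eqs.(3) and (4).

```python
def expected_errors(lam, nu, beta, n):
    """Leading terms of eq.(1) and eq.(2): E[G_n] and E[T_n] up to o(1/n)."""
    g_err = ((lam - nu) / beta + nu) / n
    t_err = ((lam - nu) / beta - nu) / n
    return g_err, t_err

# Regular case (eqs.(3)-(4)): lambda = nu = d/2, so the beta-dependence cancels.
d, n = 3, 1000
print(expected_errors(d / 2, d / 2, beta=0.5, n=n))  # (0.0015, -0.0015) = (d/(2n), -d/(2n))
```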
2 Framework of Bayes Learning

In this section, we summarize the framework of Bayes learning and introduce well-known results.
2.1 Generalization and Training Errors

Firstly, we define the generalization and training errors. Let N, n, and d be natural numbers. Let X_1, X_2, ..., X_n be random variables on R^N which are independently subject to the same probability density function q(x). Let p(x|w) be a probability density function of x for a parameter w ∈ W ⊂ R^d, where W is the set of parameters. The prior distribution is represented by the probability density function φ(w) on W. For a given training set X^n = {X_1, X_2, ..., X_n}, the posterior distribution is defined by

p(w|X^n) = (1/Z_n) ∏_{i=1}^n p(X_i|w)^β φ(w),

where 0 < β < ∞ is the inverse temperature and Z_n is the normalizing constant. The case β = 1 is the most important because it corresponds to the strict Bayes estimation. The expectation value over the posterior distribution is denoted by

E_w[ ] = ∫ ( ) p(w|X^n) dw.

The predictive distribution is defined by

p(x|X^n) = E_w[ p(x|w) ].

The generalization and training errors, G_n and T_n, are respectively defined by

G_n = ∫ q(x) log( q(x) / p(x|X^n) ) dx,
T_n = (1/n) Σ_{i=1}^n log( q(X_i) / p(X_i|X^n) ).

The generalization error is the Kullback-Leibler distance from the true distribution to the estimated distribution. The smaller the generalization error is, the better the learning result is. However, we cannot know the generalization error directly, because the calculation of G_n needs the expectation value over the unknown true distribution q(x). On the other hand, the training error can be calculated in practice using only the training samples, as the log likelihood function. Hence one of the main purposes of statistical learning theory is to clarify the mathematical relation between them.

2.2 Two Birational Invariants

Secondly, we define two birational invariants. The Kullback-Leibler distance from the true distribution q(x) to a parametric model p(x|w) is defined by

K(w) = ∫ q(x) log( q(x) / p(x|w) ) dx.

Then K(w) = 0 if and only if q(x) = p(x|w). In this paper, we assume that there exists a parameter w_0 which satisfies q(x) = p(x|w_0) and that K(w) is an analytic function of w.
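The following is a minimal numerical sketch of the objects defined above; it is entirely our own illustration, and the model, grid sizes, and function names are assumptions rather than anything from the paper. It uses a one-dimensional normal-mean model (a regular case with d = 1), approximates the posterior on a grid with β = 1, and averages G_n and T_n over repeated training sets; by eqs.(3) and (4) the averages should approach d/(2n) and −d/(2n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance of the framework: q(x) = N(0, 1), model p(x|w) = N(w, 1),
# prior N(0, 10^2), beta = 1.  Posterior and predictive are handled on a grid of w values.
w_grid = np.linspace(-5.0, 5.0, 801)
log_prior = -w_grid**2 / (2 * 10.0**2)

def log_normal(x, mean):
    return -0.5 * (np.asarray(x) - mean) ** 2 - 0.5 * np.log(2 * np.pi)

def errors_one_dataset(n, beta=1.0):
    X = rng.standard_normal(n)                                     # training set from q
    loglik = np.array([log_normal(X, w).sum() for w in w_grid])    # sum_i log p(X_i|w)
    log_post = log_prior + beta * loglik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                             # posterior p(w|X^n) on the grid
    def predictive(x):                                             # p(x|X^n) = E_w[p(x|w)]
        return np.array([(post * np.exp(log_normal(xi, w_grid))).sum() for xi in np.atleast_1d(x)])
    x_grid = np.linspace(-8.0, 8.0, 801)                           # numerical integral for G_n
    dx = x_grid[1] - x_grid[0]
    q = np.exp(log_normal(x_grid, 0.0))
    G = np.sum(q * (log_normal(x_grid, 0.0) - np.log(predictive(x_grid)))) * dx
    T = np.mean(log_normal(X, 0.0) - np.log(predictive(X)))        # training error T_n
    return G, T

n, reps = 50, 200
G_avg, T_avg = np.mean([errors_one_dataset(n) for _ in range(reps)], axis=0)
print(G_avg, T_avg, 1.0 / (2 * n))   # averages approach d/(2n) and -d/(2n) with d = 1
```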
Definition 1 (Real Log Canonical Threshold). The zeta function of statistical learning is defined by

ζ(z) = ∫ K(w)^z φ(w) dw.

Then ζ(z) is a holomorphic function on the region Re(z) > 0, which can be analytically continued to a unique meromorphic function on the entire complex plane [11]. All poles of the zeta function are real, negative, and rational numbers. If its largest pole is (−λ), then the real log canonical threshold is defined by λ. The order of the pole z = −λ is referred to as the multiplicity m.
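As a worked illustration of Definition 1 (our own example, not taken from the paper), consider K(w) = (w_1 w_2)^2 on W = [0, 1]^2 with the uniform prior φ(w) = 1:

```latex
\zeta(z) = \int_0^1 \int_0^1 (w_1 w_2)^{2z} \, dw_1 \, dw_2 = \frac{1}{(2z+1)^2}
```

The largest (and only) pole is z = −1/2 with order 2, so λ = 1/2 and m = 2; this is the value that Lemma 1 below assigns to a single block (g = 1) of two parameters (d = 2).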
Definition 2 (Singular Fluctuation). The functional variance is defined by

V_n = Σ_{i=1}^n { E_w[ (log p(X_i|w))^2 ] − ( E_w[ log p(X_i|w) ] )^2 }.

Then it was proved [13] that the limit

ν = (β/2) lim_{n→∞} E[ V_n ]

exists. The constant ν is called the singular fluctuation.
Theorem 1. The expectation values of the generalization and training errors are given by eq.(1) and eq.(2). Therefore

E[G_n] = E[T_n] + 2ν/n + o(1/n).

(Proof) This theorem was proved in [15, 13]. (Q.E.D.)

Remarks. (1) The real log canonical threshold and the singular fluctuation are invariant under a birational transform

w = g(w'),
p(x|w) → p(x|g(w')),
φ(w) → φ(g(w')) |g'(w')|,

where |g'(w')| is the Jacobian determinant. Such constants are called birational invariants.
(2) The real log canonical thresholds of several learning machines were clarified [1, 2, 16] using resolution of singularities. However, the singular fluctuation has been left unknown. This paper provides the first result which clarifies the concrete value of the singular fluctuation in a singular case.
(3) The real log canonical threshold is a well-known birational invariant in algebraic geometry, where it plays an important role in higher dimensional algebraic geometry. The singular fluctuation was found in statistical learning theory.
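Theorem 1 is what makes the generalization error estimable in practice: V_n can be computed from posterior draws, and adding (β/n)V_n to the computable Bayes training loss gives a WAIC-type estimator (the unknown term (1/n)Σ log q(X_i) has the same expectation in G_n and T_n and drops out of the correction). The sketch below is our own; the function names and the convention logp[s, i] = log p(X_i | w_s) for posterior draws w_1, ..., w_S are assumptions, not notation from the paper.

```python
import numpy as np

def bayes_training_loss(logp):
    """-(1/n) * sum_i log E_w[p(X_i|w)], with E_w replaced by an average over posterior draws.
    Assumed convention: logp[s, i] = log p(X_i | w_s)."""
    m = logp.max(axis=0)
    return -np.mean(m + np.log(np.mean(np.exp(logp - m), axis=0)))

def functional_variance(logp):
    """V_n of Definition 2, with the posterior variance estimated from the same draws."""
    return np.var(logp, axis=0).sum()

def estimated_generalization_loss(logp, beta=1.0):
    """Training loss plus (beta/n) * V_n, the correction suggested by Theorem 1 and [13]."""
    n = logp.shape[1]
    return bayes_training_loss(logp) + beta * functional_variance(logp) / n
```

For β = 1 this is the widely used WAIC-type estimator [13]; it requires only posterior draws, not the unknown true density q(x).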
2.3 Regular and Singular Cases

Thirdly, we define regular and singular cases.

Definition 3.
A pair of a true distribution and a parametric model, (q(x), p(x|w)), is said to be in a regular case if and only if the set {w ; q(x) = p(x|w)} consists of a single element w_0 and the Fisher information matrix

∫ ∇ log p(x|w_0) ( ∇ log p(x|w_0) )^T q(x) dx

is positive definite. Otherwise, it is said to be in a singular case.

For a regular case, the real log canonical threshold and the singular fluctuation have been completely clarified.
Theorem 2
If a pair (q(x), p(x|w)) is in a regular case, then λ = ν = d/2, where d is the dimension of the parameter.

(Proof) This theorem was proved in [15]. (Q.E.D.)

3 Quasi-Regular Case

In this section, we define a quasi-regular case. This concept is first proposed in the present paper. The main theorem is also introduced.
Definition 4 (Quasi-Regular Case). Assume that there exists a parameter w_0 ∈ W such that q(x) = p(x|w_0). Without loss of generality, we can assume that w_0 is the origin, w_0 = 0. The original parameter is denoted by w = (w_1, w_2, ..., w_d). Let g and Δd_1, Δd_2, ..., Δd_g be natural numbers which satisfy Δd_1 + Δd_2 + ··· + Δd_g = d, and set Δd_0 = 0. We define

d_j = Δd_1 + ··· + Δd_j   (j = 0, 1, ..., g)

and a function u = (u_1, u_2, ..., u_g) ∈ R^g of the parameter w ∈ R^d by

u_1 = ∏_{j=1}^{d_1} w_j,
u_2 = ∏_{j=d_1+1}^{d_2} w_j,
···
u_g = ∏_{j=d_{g-1}+1}^{d_g} w_j.

If there exist constants c_1, c_2 > 0 such that, for arbitrary w ∈ W,

c_1 ( u_1^2 + ··· + u_g^2 ) ≤ K(w) ≤ c_2 ( u_1^2 + ··· + u_g^2 ),

then the pair (q(x), p(x|w)) is said to be in a quasi-regular case.

Remark. (1) If g = d, then {w ; q(x) = p(x|w)} = {0} and the quasi-regular case corresponds to the regular case. Hence a quasi-regular case contains a regular case as a special one.
(2) If d > g, then "K(w) = 0 ⟺ w = 0" does not hold, because, for at least one variable w_j, K(0, 0, ..., w_j, 0, ..., 0) = 0. Hence a quasi-regular case with d > g is not a regular case but a singular case.
(3) There are singular cases which are not contained in the quasi-regular cases. Therefore,

Regular ⊊ Quasi-Regular ⊊ Singular

holds. The present paper shows in Theorem 3 that a quasi-regular case is not a regular case; however, it has the same property as a regular case.
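The map w ↦ u of Definition 4 is simply a blockwise product of coordinates. A minimal helper (our own naming; the block sizes are the Δd_j) makes this explicit, using the block structure of Example 1 below.

```python
import numpy as np

def u_from_w(w, block_sizes):
    """Blockwise products of Definition 4: block_sizes = (Delta d_1, ..., Delta d_g)."""
    w = np.asarray(w, dtype=float)
    assert sum(block_sizes) == w.size
    u, start = [], 0
    for size in block_sizes:
        u.append(w[start:start + size].prod())
        start += size
    return np.array(u)

# Example 1 below has blocks {a} and {b, c}, so u = (a, b*c):
print(u_from_w([0.5, 0.2, -1.3], [1, 2]))   # [ 0.5  -0.26]
```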
Example 1. Let a statistical model be

p(x, y|w) = ( r(x) / √(2π) ) exp( −(1/2) ( y − a x − b tanh(c x) )^2 ),

where w = (a, b, c) is the parameter and r(x) is the probability density function of x. If the true distribution is given by q(x, y) = p(x, y|0, 0, 0), then, by setting u_1 = a and u_2 = bc, it follows that

K(w) = (1/2) ∫ ( a x + b tanh(c x) )^2 r(x) dx

satisfies the quasi-regular condition with g = 2, because x and tanh(cx)/c are linearly independent. In fact, there exist c_1, c_2 > 0 such that

c_1 ( a^2 + (bc)^2 ) ≤ K(w) ≤ c_2 ( a^2 + (bc)^2 ).

Hence the set of true parameters consists of the union of two lines,

{ w ; q(x, y) = p(x, y|w) } = { a = 0, bc = 0 }.
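The constants c_1 and c_2 can be seen numerically. The following sketch is our own check, not from the paper: it takes r(x) to be the standard normal density and assumes that the parameter set keeps c away from zero (say 1 ≤ c ≤ 2, which is what separates this example from Example 2 below); the minimum and maximum of the ratio K(w)/(a^2 + (bc)^2) over a grid then give admissible values of c_1 and c_2 on that region.

```python
import numpy as np

# r(x): standard normal density, integrated on a fine grid.
x = np.linspace(-8.0, 8.0, 4001)
r = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]

def K(a, b, c):
    """K(w) = (1/2) * integral of (a*x + b*tanh(c*x))^2 r(x) dx for Example 1."""
    return 0.5 * np.sum((a * x + b * np.tanh(c * x)) ** 2 * r) * dx

ratios = []
for a in np.linspace(-1.0, 1.0, 9):
    for b in np.linspace(-1.0, 1.0, 9):
        for c in np.linspace(1.0, 2.0, 9):          # assumption: c is kept away from 0
            denom = a**2 + (b * c) ** 2
            if denom > 1e-12:
                ratios.append(K(a, b, c) / denom)
print(min(ratios), max(ratios))                     # both strictly positive: candidates for c1, c2
```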
Example 2. Let a statistical model be

p(x, y|w) = ( r(x) / √(2π) ) exp( −(1/2) ( y − a x − b tanh(c x) )^2 ),

where w = (a, b, c) is the parameter and the true distribution is given by q(x, y) = p(x, y|0, 0, 0). If the parameter set W contains a neighborhood of c = 0, then x and tanh(cx)/c are not linearly independent as c → 0 (since tanh(cx)/c → x); hence this case does not satisfy the quasi-regular condition. In this case,

c_1 ( (a + bc)^2 + b^2 c^6 ) ≤ K(w) ≤ c_2 ( (a + bc)^2 + b^2 c^6 ).

Example 2 resembles Example 1; however, from the viewpoint of statistical learning theory, they are different, as the sketch below illustrates.
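A quick way to see the failure numerically (again our own sketch, with r(x) the standard normal density) is to move along the curve a = −bc, on which ax + b tanh(cx) = O(c^3 x^3): the ratio K(w)/(a^2 + (bc)^2) then tends to zero as c → 0, so no positive constant c_1 as in Definition 4 can exist once c = 0 is approachable.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
r = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]

def K(a, b, c):
    return 0.5 * np.sum((a * x + b * np.tanh(c * x)) ** 2 * r) * dx

# Along a = -b*c (with b = 1) the regression function is O(c^3 x^3), so the ratio below
# tends to zero as c -> 0: no constant c1 > 0 as in Definition 4 can hold near c = 0.
for c in [1.0, 0.3, 0.1, 0.03]:
    a, b = -c, 1.0
    print(c, K(a, b, c) / (a**2 + (b * c) ** 2))
```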
Example 3. Let a statistical model be

p(x, y, z|w) = ( r(x, y) / √(2π) ) exp( −(1/2) ( z − f(x, y, w) )^2 ),

where

f(x, y, w) = a_1 sin(b_1 x) + a_2 x sin(b_2 x) + a_3 sin(b_3 y) + a_4 y sin(b_4 y),

w = {(a_i, b_i)} is the parameter, and the true distribution is given by q(x, y, z) = p(x, y, z|0). Then the pair (q(x, y, z), p(x, y, z|w)) is in a quasi-regular case with g = 4.

The following is the main theorem of the present paper.

Theorem 3 (Main Theorem). Assume that the pair (q(x), p(x|w)) is in a quasi-regular case and that φ(w) > 0 on W. Then the real log canonical threshold and the singular fluctuation are given by

λ = ν = g/2 and m = d − g + 1.

Corollary 1. Assume that the pair (q(x), p(x|w)) is in a quasi-regular case and that φ(w) > 0 on W. For arbitrary 0 < β < ∞, the symmetry of the generalization and training errors holds,

E[G_n] = g/(2n) + o(1/n),
E[T_n] = −g/(2n) + o(1/n).

Remarks. (1) The above theorem shows the generalization and training errors for Bayes estimation. In the quasi-regular case, they have the same property as those in regular cases; however, the generalization and training errors of the maximum likelihood estimation are, in general, different from those of a regular case.
(2) In the maximum likelihood method, the training error of a singular case is far smaller than that of a regular case, whereas the generalization error of a singular case is far larger than that of a regular case. From the viewpoint of the maximum likelihood method, the quasi-regular case is contained in the singular case. In the present paper, we prove that the quasi-regular case has the same property as the regular case from the viewpoint of Bayes estimation.
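Since Theorem 3 reduces both invariants to the block structure alone, they can be tabulated mechanically. The helper below is our own illustration (names and the dictionary output are not from the paper); it also returns the leading terms of Corollary 1 when n is given.

```python
def quasi_regular_invariants(block_sizes, n=None):
    """lambda, nu, and m of Theorem 3 for a quasi-regular case with the given
    block sizes (Delta d_1, ..., Delta d_g); optionally the leading terms of Corollary 1."""
    g, d = len(block_sizes), sum(block_sizes)
    out = {"lambda": g / 2.0, "nu": g / 2.0, "m": d - g + 1}
    if n is not None:                    # Corollary 1, valid for any 0 < beta < infinity
        out["E[G_n]"] = g / (2.0 * n)
        out["E[T_n]"] = -g / (2.0 * n)
    return out

# Example 3: four blocks (a_i, b_i), i.e. Delta d_j = 2 for j = 1, ..., 4 (d = 8, g = 4).
print(quasi_regular_invariants([2, 2, 2, 2], n=1000))
# {'lambda': 2.0, 'nu': 2.0, 'm': 5, 'E[G_n]': 0.002, 'E[T_n]': -0.002}
```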
4 Proof of the Main Theorem

In this section, we prove the main theorem. First, we derive the real log canonical threshold of the quasi-regular case.
Lemma 1
The real log canonical threshold and its order are given by λ = g/2 and m = d − g + 1, respectively.

(Proof) Since the functions {u_j ; j = 1, 2, ..., g} have no common variables w_k, the real log canonical threshold is given by the sum of the individual real log canonical thresholds (Remark 7.2 in [15]) defined by

ζ_j(z) = ∫ ∏_{i=d_{j-1}+1}^{d_j} (w_i)^{2z} dw_i = C / ( z + 1/2 )^{d_j − d_{j-1}} + ··· .

Hence λ is equal to g times 1/2, that is, λ = g/2. The multiplicity is also given by

m = d_1 + (d_2 − d_1) + ··· + (d_g − d_{g-1}) − (g − 1) = d − g + 1,

which shows the Lemma. (Q.E.D.)

Definition 5. For a given pair of a true distribution q(x) and a parametric model p(x|w), the log density ratio function is defined by

f(x, w) = log( q(x) / p(x|w) ).

The following lemma shows that the log density ratio function of the quasi-regular case is represented by g linearly independent functions.
Lemma 2. Assume that the pair (q(x), p(x|w)) is in a quasi-regular case. Then there exists a set of functions {e_j(x, u) ; j = 1, 2, ..., g} which are analytic functions of u and

f(x, w) = Σ_{j=1}^g u_j e_j(x, u)

in an open neighborhood of u = 0.

(Proof) Let us define a function

F(t) = t + e^{−t} − 1

for t ∈ R. Then F(0) = 0, F'(0) = 0, and F''(0) = 1, with the result that F(t) ≥ 0 and F(t) = 0 if and only if t = 0. Moreover, F(t) ≅ (1/2) t^2 for small |t|. Since ∫ q(x) e^{−f(x,w)} dx = ∫ p(x|w) dx = 1, we obtain

K(w) = ∫ q(x) F( log( q(x)/p(x|w) ) ) dx = ∫ q(x) F( f(x, w) ) dx ≅ (1/2) ∫ q(x) f(x, w)^2 dx.   (5)

By the assumption of the quasi-regular case, K(w) = 0 if and only if u_1 = u_2 = ··· = u_g = 0, which is equivalent to f(x, w) ≡ 0. That is to say, f(x, w) is contained in the ideal of analytic functions generated by u_1, u_2, ..., u_g. Hence there exists a set {e_j(x, u)} of analytic functions of u which satisfies

f(x, w) = Σ_{j=1}^g u_j e_j(x, u).

Therefore, we obtain the Lemma. (Q.E.D.)

In the following lemma, we show that the quasi-regular case has a generalized Fisher information matrix.
Lemma 3
The g × g matrix I(u) is defined by

I_{ij}(u) ≡ ∫ q(x) e_i(x, u) e_j(x, u) dx.

Then I(u) is positive definite in an open neighborhood of u = 0.

(Proof) By Lemma 2 and eq.(5), in a neighborhood of u = 0,

K(w) ≅ (1/2) ( u · I(u) u ).

By the condition of the quasi-regular case,

c_1 Σ_{j=1}^g u_j^2 ≤ K(w).

Hence the minimum eigenvalue of I(u) is positive, which shows that I(u) is positive definite. (Q.E.D.)

The following definition and lemma show that the empirical loss function of the quasi-regular case has the same decomposition as that of the regular case.
Definition 6. A random process ξ_n(u) ∈ R^g is defined by

ξ_n(u) = (1/√n) Σ_{i=1}^n { (1/2) I(u) u − e(X_i, u) },

where e(x, u) = ( e_1(x, u), e_2(x, u), ..., e_g(x, u) )^T.
Lemma 4. The empirical loss function defined by

K_n(w) = (1/n) Σ_{i=1}^n f(X_i, w)

is represented by

K_n(w) = (1/2) ( u, I(u) u ) − (1/√n) u · ξ_n(u)

in the neighborhood of u = 0. Moreover, the random process ξ_n(u) converges to a Gaussian process ξ(u) that satisfies

E[ ξ(0) · I(0)^{-1} ξ(0) ] = g.

(Proof) The empirical loss function is given by

K_n(w) = K(w) − (1/n) Σ_{i=1}^n { K(w) − f(X_i, w) }.

By combining this equation with the definition of ξ_n(u), the first half of the Lemma is obtained. For the second half, the convergence of ξ_n(u) is derived from the general empirical process theory. Moreover,

E[ ξ_n(0) · I(0)^{-1} ξ_n(0) ] = E[ tr( I(0)^{-1} ξ_n(0) ξ_n(0)^T ) ] = g,

because

E[ ξ_n(0) ξ_n(0)^T ] = ∫ q(x) e(x, 0) e(x, 0)^T dx = I(0),

which completes the Lemma. (Q.E.D.)
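The two moment identities at the end of the proof can be checked numerically in the simplest regular special case g = d (a toy model of our own, not from the paper). Take p(y|w) to be the product of two unit-variance normal densities with mean vector w and w_0 = 0; then e(y, 0) = −y and I(0) is the identity, so E[ξ_n(0) ξ_n(0)^T] should be close to the identity and E[ξ_n(0) · I(0)^{-1} ξ_n(0)] close to g = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, g, reps = 200, 2, 5000
quad, outer = [], np.zeros((g, g))
I0 = np.eye(g)                                  # I(0) for this toy model
for _ in range(reps):
    y = rng.standard_normal((n, g))             # X_1, ..., X_n drawn from q
    e0 = -y                                     # e(X_i, 0) = -X_i here
    xi = -e0.sum(axis=0) / np.sqrt(n)           # xi_n(0) = -(1/sqrt(n)) sum_i e(X_i, 0)
    quad.append(xi @ np.linalg.inv(I0) @ xi)
    outer += np.outer(xi, xi) / reps
print(np.mean(quad))    # ~ g = 2
print(outer)            # ~ I(0) = identity matrix
```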
In the quasi-regular case, the relation between w = (w_1, w_2, ..., w_d) and u = (u_1, u_2, ..., u_g) is important. The following lemma shows this property of the quasi-regular case. This lemma does not hold in general singular cases.

Lemma 5. When n tends to infinity,

∏_{j=1}^g δ( u_j/√n − ∏_{k=d_{j-1}+1}^{d_j} w_k ) ≅ c (log n)^{m−1} ∏_{j=1}^d δ(w_j),

where m = d − g + 1 and c > 0 is a constant.

(Proof) Firstly, we prove that the delta function with variables x = (x_1, x_2, ..., x_d) in M ≡ [0, 1]^d,

D(t, x) = δ( t − x_1 x_2 ··· x_d ),

has the asymptotic expansion, for t → 0,

D(t, x) = ( (−log t)^{d−1} / (d−1)! ) ∏_{k=1}^d δ(x_k) + o( (−log t)^{d−1} ).   (6)

Let φ(x) be an arbitrary C^∞-class function of x whose support is contained in M, and define

D_t(φ) ≡ ∫_M D(t, x) φ(x) dx.

Then its Mellin transform is

∫_0^∞ D_t(φ) t^z dt = ∫_M ∏_{i=1}^d (x_i)^z φ(x) dx.

By using the Taylor expansion φ(x) = φ(0) + x · ∇φ(0) + ···, we have the asymptotic expansion

∫_0^∞ D_t(φ) t^z dt = ( 1/(z+1)^d ) φ(0) + ··· ,

that is,

∫_0^∞ D(t, x) t^z dt = ( 1/(z+1)^d ) ∏_{k=1}^d δ(x_k) + ···

for x ∈ [0, 1]^d. By using the inverse Mellin transform, we obtain eq.(6). Secondly, let us prove the Lemma. By using eq.(6), for each u_j,

δ( u_j/√n − ∏_{k=d_{j-1}+1}^{d_j} w_k ) ∝ (log n)^{d_j − d_{j-1} − 1} ∏_{k=d_{j-1}+1}^{d_j} δ(w_k)

when n → ∞. By combining these relations for j = 1, 2, ..., g, the Lemma is obtained. (Q.E.D.)

Let us return to the proof of the Main Theorem.

(Proof of Main Theorem) It was proved by eq.(6.4) in [15] that the expectation value of K_n(w) is given by the two birational invariants,

E[ E_w[ K_n(w) ] ] = λ/(nβ) − ν/n + o(1/n).

Since we have already obtained the value of λ in Lemma 1, that is to say, λ = g/2, we can derive the value of ν by calculating E[ E_w[ K_n(w) ] ]. The posterior distribution is represented by the empirical loss function as

p(w|X^n) dw ∝ exp( −nβ K_n(w) ) φ(w) dw.

The integral over the outside of a neighborhood of u = 0 with respect to the posterior distribution goes to zero faster than exp(−√n), as in Lemma 6.3 of [15]; hence we can restrict the region of integration to a neighborhood of u = 0. The empirical loss function is rewritten as

K_n(w) = (1/2) ‖ I(u)^{1/2} ( u − I(u)^{-1} ξ_n(u)/√n ) ‖^2 − (1/(2n)) ( ξ_n(u) · I(u)^{-1} ξ_n(u) ).

In the neighborhood of u = 0, we obtain

K_n(w) ≅ (1/2) ‖ I(0)^{1/2} ( u − I(0)^{-1} ξ_n(0)/√n ) ‖^2 − (1/(2n)) ( ξ_n(0) · I(0)^{-1} ξ_n(0) ).

For an arbitrary function F( ), by Lemma 5,

∫ F(√n u) dw = ∫ F(√n u) ∏_{j=1}^g δ( u_j − ∏_{k=d_{j-1}+1}^{d_j} w_k ) dw du
            = ∫ F(u) ∏_{j=1}^g δ( u_j/√n − ∏_{k=d_{j-1}+1}^{d_j} w_k ) dw du / n^{g/2}
            = ( c (log n)^{m−1} / n^{g/2} ) ∫ F(u) du.

On the other hand,

n K_n(w) = (1/2) ‖ I(0)^{1/2} ( √n u − I(0)^{-1} ξ_n(0) ) ‖^2 − (1/2) ( ξ_n(0) · I(0)^{-1} ξ_n(0) ) ≡ K̂_n(√n u).

Therefore,

E_w[ K_n(w) ] = ∫ K_n(w) exp( −nβ K_n(w) ) φ(w) dw / ∫ exp( −nβ K_n(w) ) φ(w) dw
            = (1/n) ∫ K̂_n(√n u) exp( −β K̂_n(√n u) ) φ(w) dw / ∫ exp( −β K̂_n(√n u) ) φ(w) dw
            = (1/n) ∫ K̂_n(u) exp( −β K̂_n(u) ) du / ∫ exp( −β K̂_n(u) ) du
            = (1/(2n)) ∫ ‖ I(0)^{1/2} ( u − ξ*_n ) ‖^2 exp( −β K̂_n(u) ) du / ∫ exp( −β K̂_n(u) ) du
              − (1/(2n)) ( ξ_n(0) · I(0)^{-1} ξ_n(0) ),

where the notation

ξ*_n = I(0)^{-1} ξ_n(0)

is used. Finally, by the integral formula

∫ ‖ I(0)^{1/2} u ‖^2 exp( −(β/2) ‖ I(0)^{1/2} u ‖^2 ) du / ∫ exp( −(β/2) ‖ I(0)^{1/2} u ‖^2 ) du = g/β

and by Lemma 4, we have

E[ E_w[ K_n(w) ] ] = g/(2βn) − g/(2n) + o(1/n).

Then, because λ = g/2 holds by Lemma 1, we obtain ν = g/2, which proves the Theorem. (Q.E.D.)
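The leading term of eq.(6) can be checked directly: with φ ≡ 1 and the x-variables integrated out, it says that the density of the product of d independent uniform variables on [0, 1] behaves like (−log t)^{d−1}/(d−1)! as t → 0 (for uniform variables this is in fact the exact density). The following Monte Carlo comparison is our own sanity check, not part of the paper.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
d, N = 3, 2_000_000
t = rng.random((N, d)).prod(axis=1)       # products x_1 * ... * x_d of uniform variables

for t0 in [1e-2, 1e-3, 1e-4]:
    h = 0.1 * t0                          # small window around t0
    empirical = np.mean((t > t0 - h) & (t < t0 + h)) / (2 * h)
    predicted = (-np.log(t0)) ** (d - 1) / factorial(d - 1)
    print(t0, empirical, predicted)       # the two columns agree
```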
Example 4. By the main theorem of this paper, the real log canonical threshold and the singular fluctuation of Example 1 are λ = ν = 1. Also, those of Example 3 are λ = ν = 2.

5 Discussion

Let us discuss the result of this paper from two different points of view. Firstly, we study the theoretical aspect and then the practical aspect.
In the present paper, we introduced a new concept, a quasi-regular case. A quasi-regular case is not a regular case, but it has the same property as the regular case. Table 1 shows a comparison of the real log canonical threshold (RLCT), the singular fluctuation (SF), the generalization error G_n, and the training error T_n.

                 Regular      Quasi-Regular   Singular
  RLCT λ         d/2          g/2             λ
  SF ν           d/2          g/2             ν
  G_n            d/(2n)       g/(2n)          ((λ − ν)/β + ν)/n
  T_n            −d/(2n)      −g/(2n)         ((λ − ν)/β − ν)/n

Table 1: Regular, Quasi-Regular, and Singular.

Even for general singular cases, real log canonical thresholds have been clarified in several cases. However, this paper is the first in which the singular fluctuation is clarified. In general singular cases, it is conjectured that the real log canonical threshold is not equal to the singular fluctuation. To clarify this conjecture is a topic of future study.

In applications, even if both birational invariants are unknown, the generalization error can be estimated from the training error and the functional variance [13], because

E[G_n] = E[T_n] + (β/n) E[V_n] + o(1/n),

which is asymptotically equivalent to the Bayes cross validation [14]. However, in Bayes estimation, how to approximate the posterior distribution using the Markov chain Monte Carlo (MCMC) method is an important issue. There are many parameters which determine the MCMC process, for example, the length of the burn-in, the number of updates, and so on. If we know the concrete values of the birational invariants, then we can evaluate how accurate the MCMC process is [9]. Therefore, the quasi-regular cases are appropriate for evaluating MCMC processes. It is a future study to evaluate MCMC processes using the quasi-regular cases.

6 Conclusion

In the present paper, a new concept, a quasi-regular case, was proposed for the first time, and its theoretical foundation was constructed. A quasi-regular case is not a regular case but a singular case, whereas it has the same property as a regular case. In a quasi-regular case, it was proved that the real log canonical threshold is equal to the singular fluctuation. This is the first case in which a nontrivial value of the singular fluctuation is clarified.
Acknowledgement
This research was partially supported by the Ministry of Education, Science,Sports and Culture in Japan, Grant-in-Aid for Scientific Research 23500172.