An Equation-By-Equation Method for Solving the Multidimensional Moment Constrained Maximum Entropy Problem
WENRUI HAO AND JOHN HARLIM
Abstract.
An equation-by-equation (EBE) method is proposed to solve a system of nonlinear equations arising from the moment constrained maximum entropy problem of multidimensional variables. The design of the EBE method combines ideas from homotopy continuation and Newton's iterative methods. Theoretically, we establish the local convergence under appropriate conditions and show that the proposed method, geometrically, finds the solution by searching along the surface corresponding to one component of the nonlinear problem. We will demonstrate the robustness of the method on various numerical examples, including: (1) A six-moment one-dimensional entropy problem with an explicit solution that contains components of order $10^0$ to $10^3$ in magnitude; (2) Four-moment multidimensional entropy problems with explicit solutions, where the resulting systems to be solved range from 70 to 310 equations; (3) Four- to eight-moment two-dimensional entropy problems, whose solutions correspond to the densities of the two leading EOFs of the wind stress-driven large-scale oceanic model. In this case, we find that the EBE method is more accurate compared to the classical Newton's method, the MATLAB generic solver, and the previously developed BFGS-based method, which was also tested on this problem. (4) Four-moment constrained entropy problems of up to five dimensions, whose solutions correspond to multidimensional densities of the components of the solutions of the Kuramoto-Sivashinsky equation. For the higher-dimensional cases of this example, the EBE method is superior because it automatically selects a subset of the prescribed moment constraints from which the maximum entropy solution can be estimated within the desired tolerance. This selection feature is particularly important since the moment constrained maximum entropy problems do not necessarily have solutions in general.

1. Introduction
The maximum entropy principle provides a natural criterion for estimating the least biased density function subjected to the given moments [14]. This density estimation approach has a wide range of applications, such as the harmonic solid and quantum spin systems [20], econometrics [26], and geophysical applications [3, 13]. In a nutshell, this moment constrained method is a parametric estimation technique where the resulting density function is in the form of an exponential of polynomials. This is a consequence of maximizing the Shannon entropy subjected to the polynomial moment constraints, which is usually transformed into an unconstrained minimization problem of a Lagrangian function [27]. Standard approaches for solving this unconstrained minimization problem are based on Newton's iterative method [4, 27] or quasi-Newton based methods such as the BFGS method [2, 5].

In the last two papers [2, 5], where the BFGS-based method was introduced and reviewed, the authors considered minimization problems that involve 44-83 equations, resulting from a 2D problem with moment constraints of up to order-eight, a 3D problem with moment constraints of up to order-six, and a 4D problem with moment constraints of up to order-four. In this paper, we introduce a novel equation solver that can be used to find the density function of moderately high-dimensional problems (e.g., systems of 70-310 equations resulting from moments up to order-four of 4-7 dimensional density functions), provided that the solutions exist. The proposed method, which we call the Equation-By-Equation (EBE) method, is an iterative method that solves a one-dimensional problem at the first iterate, a two-dimensional problem at
the second iterate, a three-dimensional problem at the third iterate, and, eventually, solves the full system of nonlinear equations corresponding to the maximum entropy problem at the last iterate. Technically, this method combines Newton's method with ideas from homotopy continuation. We will show that the EBE method is locally convergent under appropriate conditions. Furthermore, we will provide sufficient conditions for global convergence. Through the convergence analysis, we will show that, geometrically, the proposed method finds the solution of the nonlinear system of equations by tracking along the surface corresponding to one component of the system of nonlinear equations. The EBE method automatically selects a subset of the prescribed constraints from which the maximum entropy solution can be estimated within the desired tolerance. This is an important feature since the maximum entropy problems do not necessarily have solutions for a general set of moment constraints.

We shall find that the EBE method produces more accurate solutions (smaller error in the moments) compared to the classical Newton's method, the MATLAB built-in fsolve.m, and the BFGS method on the test problem in [2, 5] and on test problems based on the solutions of the Kuramoto-Sivashinsky equation. Numerically, we will demonstrate that the EBE method is able to solve problems where the true solutions consist of components of order $10^0$ to $10^3$.

Date: February 20, 2018.
2010 Mathematics Subject Classification.
Key words and phrases. Homotopy continuation, moment constrained, maximum entropy, equation-by-equation method.
The research of W.H. was partially supported by the American Heart Association under Grant 17SDG33660722 and the Institute for CyberScience Seed Grant. The research of J.H. was partially supported by the Office of Naval Research Grants N00014-16-1-2888, MURI N00014-12-1-0912 and the National Science Foundation Grants DMS-1317919, DMS-1619661.
We shall also see that the EBE method can solve a system of hundreds of equations in various examples, including those with explicit solutions as well as those with densities estimated based on solutions of complex spatially extended dynamical systems.

The remaining part of the paper is organized as follows. In Section 2, we give a brief overview of the multidimensional maximum entropy problem. In Section 3, we introduce the EBE algorithm. In Section 4, we provide the local convergence analysis. In Section 5, we discuss the practical issues with the proposed method and provide remedies. In Section 6, we demonstrate the robustness of the EBE method on various numerical examples. In Section 7, we conclude the paper with a brief summary and discussion. We include an Appendix to show some computational details that are left out in the main text. Interested readers and users can access the EBE codes (written in MATLAB) at [10].

2. An overview of the maximum entropy problem
We consider the Hausdorff moment-constrained maximum entropy problem [4, 5, 8]. That is, find the optimal probability density $\rho^*(\mathbf{x})$ which maximizes the Shannon entropy,

$$S(\rho) := -\int_\Omega \log(\rho(\mathbf{x}))\,\rho(\mathbf{x})\,d\mathbf{x}, \tag{1}$$

where $\mathbf{x} \in \Omega = [-1,1]^d$, and which satisfies the following linear constraints,

$$F_{\mathbf{j}} := \int_\Omega c_{\mathbf{j}}(\mathbf{x})\,\rho(\mathbf{x})\,d\mathbf{x} = f_{\mathbf{j}}, \qquad |\mathbf{j}| = 0, 1, 2, \cdots, p. \tag{2}$$

In applications, one usually computes the statistics $f_{\mathbf{j}}$ from samples of data. For an arbitrary finite domain, one can rescale the data to the domain $\Omega$.

While $c_{\mathbf{j}}(\mathbf{x})$ can be arbitrary functions in $L^1(\Omega, \rho)$, we will focus in this article on the usual uncentered statistical moments with monomial basis functions, $c_{\mathbf{j}}(\mathbf{x}) = \mathbf{x}^{\mathbf{j}}$, where we have adopted the notations $\mathbf{x} = (x_1, \ldots, x_d) \in \Omega$, $\mathbf{j} = (j_1, \ldots, j_d) \in \mathbb{Z}_+^d$ with $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$, and $\mathbf{x}^{\mathbf{j}} = \prod_{i=1}^d x_i^{j_i}$. In (2), the quantities $f_{\mathbf{j}}$ are the given $\mathbf{j}$-th moments that can be computed from the data. Since the total number of monomials $\mathbf{x}^{\mathbf{j}}$ with $|\mathbf{j}| = j$ is $C_{j+d-1}^{d-1}$, the total number of constraints in (2) for moments up to order-$p$ is

$$n = \sum_{j=1}^{p} C_{j+d-1}^{d-1},$$

excluding the normalization factor corresponding to $c_0(\mathbf{x}) = 1$. For example, in a two-dimensional problem, the total number of moments up to order $p = 4$ is $n = 14$. To simplify the notation below, we will use a single index notation, with the understanding that the total number of constraints to be satisfied is $n$, excluding the zeroth moment. The reason for excluding the zeroth moment will become clear in the discussion below.

By introducing Lagrange multipliers, the above constrained optimization problem can be transformed into the following unconstrained problem:

$$\mathcal{L}(\rho(\mathbf{x}), \lambda_0, \cdots, \lambda_n) = S(\rho) + \sum_{j=0}^{n} \lambda_j (F_j - f_j). \tag{3}$$

In order to find a solution of (3), we set $\partial \mathcal{L}/\partial \rho = 0$, which gives,

$$\rho(\mathbf{x}) = \frac{1}{Z} \exp\Big(\sum_{j=1}^{n} \lambda_j c_j(\mathbf{x})\Big), \tag{4}$$

where we have defined $Z = \exp(1 - \lambda_0)$. Since $\int_\Omega \rho(\mathbf{x})\,d\mathbf{x} = 1$, we have

$$Z(\lambda_1, \ldots, \lambda_n) = \int_\Omega \exp\Big(\sum_{j=1}^{n} \lambda_j c_j(\mathbf{x})\Big)\,d\mathbf{x}, \tag{5}$$

which indicates that $Z$ (or, implicitly, $\lambda_0$) is a function of $\lambda_1, \ldots, \lambda_n$. Therefore, the normalization factor $Z$ can be computed via (5) once $\lambda_1, \ldots, \lambda_n$ are estimated, and we can concentrate on finding the Lagrange multipliers $\lambda_1, \ldots, \lambda_n$ which satisfy the $n$ constraints in (2), excluding the case $c_0(\mathbf{x}) = 1$. In particular, the constrained maximum entropy problem is to solve the following nonlinear system of integral equations,

$$F_j(\lambda_1, \cdots, \lambda_n) := \int_\Omega (c_j(\mathbf{x}) - f_j) \exp\Big(\sum_{k=1}^{n} \lambda_k c_k(\mathbf{x})\Big)\,d\mathbf{x} = 0, \qquad j = 1, \ldots, n, \tag{6}$$

for $\lambda_1, \ldots, \lambda_n$.

In our numerical implementation, the integral in system (6) will be approximated with a nested sparse grid quadrature rule [9],

$$\int_\Omega f(\mathbf{x})\,d\mathbf{x} \approx \sum_i f(\mathbf{x}_i)\,w_i,$$

where $\mathbf{x}_i$ are the nested sparse grid nodes and $w_i$ are the corresponding weights based on the nested Clenshaw-Curtis quadrature rule [25]. The number of nodes depends on the dimension $d$ of the problem and on the number of nested sets (based on the Smolyak construction [23]), which is denoted by the parameter $\ell$ (referred to as the level). In the numerical implementation, we need to specify the parameter $\ell$.

3. An equation-by-equation algorithm
In this section, we describe the new Equation-By-Equation (EBE) technique to solve the system of equations in (6),

$$\mathbf{F}_n(\boldsymbol{\lambda}_n) = \mathbf{0}, \tag{7}$$

where we have defined

$$\mathbf{F}_n(\boldsymbol{\lambda}_n) := \big(F_1(\boldsymbol{\lambda}_n), \ldots, F_n(\boldsymbol{\lambda}_n)\big),$$

and $\boldsymbol{\lambda}_n = (\lambda_1, \ldots, \lambda_n)$. In the following iterative scheme, we start the iteration with an initial condition $(\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^n$. We define $\mu^{(i)} \in \mathbb{R}^i$ as the exact solution of the following $i$-dimensional system,

$$\mathbf{F}_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}, \qquad i = 1, \ldots, n, \tag{8}$$

where we have fixed the last $n - i$ coefficients, $\lambda_{i+1} = \alpha_{i+1}, \ldots, \lambda_n = \alpha_n$. With this notation, the exact solution of (7) is $\mu^{(n)} \in \mathbb{R}^n$. We also define $\hat{\mu}^{(i)}$ to be the numerical estimate of $\mu^{(i)}$. With these notations, we now describe the algorithm.

Generally speaking, at each iteration-$i$, where $i = 1, \ldots, n$, the EBE algorithm solves the $i$-dimensional system in (8). At each step-$i$, given the numerical solution at the previous step, $\hat{\mu}^{(i-1)} \in \mathbb{R}^{i-1}$, and an initial condition $\alpha_i$, we apply ideas from homotopy continuation to find the solution $\mu^{(i)} \in \mathbb{R}^i$ of the $i$-dimensional system of equations (8). Notice that we do not only add a new equation $F_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = 0$, but we also estimate the $i$-th variable in the previous $(i-1)$ equations, $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$. The scheme proceeds by solving the larger systems one-by-one until $i = n$, so that we eventually solve (7).

Now let us describe how to numerically estimate $\mu^{(i)}$ at every step-$i$. For the first step, $i = 1$, we solve the one-dimensional problem,

$$F_1(\lambda_1, \alpha_2, \ldots, \alpha_n) = 0,$$

for $\lambda_1$ with Newton's method. For the steps $i = 2, \ldots, n$, we have $\hat{\mu}^{(i-1)}$, the numerical estimate of the solution of $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \alpha_i, \ldots, \alpha_n) = \mathbf{0}$. To simplify the expressions below, let us use $F_i(\boldsymbol{\lambda}_{i-1}, \lambda_i)$ as a shorthand notation for $F_i(\boldsymbol{\lambda}_{i-1}, \lambda_i, \alpha_{i+1}, \ldots, \alpha_n)$ to emphasize the independent variables.

We proceed to estimate $\lambda_i$ using Newton's method with tolerance $Tol$ on the $i$-th equation. That is, we iterate

$$\lambda_i^{m+1} = \lambda_i^m - \Big(\frac{\partial F_i}{\partial \lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\Big)^{-1} F_i(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m), \qquad m = 0, 1, \ldots, \tag{9}$$

$$\lambda_i^0 = \alpha_i, \qquad \boldsymbol{\lambda}_{i-1}^0 = \hat{\mu}^{(i-1)},$$

assuming that $\frac{\partial F_i}{\partial \lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m) \neq 0$. Here, the partial derivative of $F_i$ with respect to $\lambda_i$ evaluated at $\lambda_i^m$ is defined as,

$$\frac{\partial F_i}{\partial \lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m) = \int_\Omega (c_i(\mathbf{x}) - f_i)\, c_i(\mathbf{x}) \exp\Big(\sum_{j=1}^{i-1} \lambda_j^m c_j(\mathbf{x}) + \lambda_i^m c_i(\mathbf{x})\Big)\,d\mathbf{x}, \tag{10}$$

where we have denoted $\boldsymbol{\lambda}_{i-1}^m = (\lambda_1^m, \ldots, \lambda_{i-1}^m)$. Notice that, to proceed with the iteration in (9), we need to update $\boldsymbol{\lambda}_{i-1}^m$ for $m > 0$. We propose to follow the homotopy continuation method for this update. In particular, we are looking for $\boldsymbol{\lambda}_{i-1}^{m+1}$ that solves $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}^{m+1}, \lambda_i^{m+1}) = \mathbf{0}$, given the current estimate $\lambda_i^{m+1}$ from (9) as well as $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m) = \mathbf{0}$. At $m = 0$, this last constraint is numerically estimated by $\mathbf{F}_{i-1}(\hat{\mu}^{(i-1)}, \alpha_i) \approx \mathbf{0}$.

One way to solve this problem is through the following predictor-corrector step, which is commonly used in homotopy continuation methods [7, 24]. In particular, we apply Taylor's expansion to

$$\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}^{m+1}, \lambda_i^{m+1}) = \mathbf{F}_{i-1}\big(\boldsymbol{\lambda}_{i-1}^m + \Delta\boldsymbol{\lambda},\; \lambda_i^m + (\lambda_i^{m+1} - \lambda_i^m)\big) = \mathbf{0}$$

at $(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)$, which gives,

$$\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m) + \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\,\Delta\boldsymbol{\lambda} + \mathbf{F}_{i-1,\lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\,(\lambda_i^{m+1} - \lambda_i^m) = \mathbf{0},$$

which means that

$$\Delta\boldsymbol{\lambda} = -\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}^{-1}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\, \mathbf{F}_{i-1,\lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\,(\lambda_i^{m+1} - \lambda_i^m),$$

assuming that $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)$ is invertible. Based on this linear prediction, $\boldsymbol{\lambda}_{i-1}^{m+1}$ is approximated by,

$$\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1} = \boldsymbol{\lambda}_{i-1}^m + \Delta\boldsymbol{\lambda} = \boldsymbol{\lambda}_{i-1}^m - \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}^{-1}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\, \mathbf{F}_{i-1,\lambda_i}(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m)\,(\lambda_i^{m+1} - \lambda_i^m). \tag{11}$$

Subsequently, when $\|\mathbf{F}_{i-1}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1})\| \geq Tol$, we apply a correction using Newton's method by expanding,

$$\mathbf{0} = \mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}^{m+1}, \lambda_i^{m+1}) = \mathbf{F}_{i-1}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1}) + \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1})\,\Delta\tilde{\boldsymbol{\lambda}},$$

assuming that $\boldsymbol{\lambda}_{i-1}^{m+1} = \tilde{\boldsymbol{\lambda}}_{i-1}^{m+1} + \Delta\tilde{\boldsymbol{\lambda}}$, to find that,

$$\boldsymbol{\lambda}_{i-1}^{m+1} = \tilde{\boldsymbol{\lambda}}_{i-1}^{m+1} - \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1})^{-1}\, \mathbf{F}_{i-1}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1}). \tag{12}$$

This expression assumes that $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\tilde{\boldsymbol{\lambda}}_{i-1}^{m+1}, \lambda_i^{m+1})$ is invertible.

In summary, at each step-$i$, we iterate (9), (11), (12). The outer loop $i$ corresponds to adding one equation to the system at a time, and, for each $i$, we apply an inner loop, indexed by $m$, to find the solution $\mu^{(i)}$ of $\mathbf{F}_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$. We denote the approximate solution by $\hat{\mu}^{(i)}$. An adaptive tolerance technique is employed to compute the initial guess of $F_i$ by using Newton's method. In particular, when the current tolerance $Tol$ is not satisfied after executing (12), we divide $Tol$ by ten until $Tol$ is met.

Recall that the standard Newton's method assumes that the Jacobian $\mathbf{F}_{n,\boldsymbol{\lambda}_n} \in \mathbb{R}^{n \times n}$ is nonsingular at the root of the full system in (6) in order to guarantee local convergence. In the next section, we will show that the EBE method requires the following conditions for local convergence:

Assumption 1.
Let $\mu^{(i)} \in \mathbb{R}^i$ be a solution of $\mathbf{F}_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$, for each $i = 1, \ldots, n$. The EBE method assumes the following conditions:
(1) $\frac{\partial F_i}{\partial \lambda_i}(\mu^{(i)}, \alpha_{i+1}, \ldots, \alpha_n) \neq 0$.
(2) $\mathbf{F}_{i,\boldsymbol{\lambda}_i}(\mu^{(i)}, \alpha_{i+1}, \ldots, \alpha_n)$ is nonsingular.
(3) Each component of $\mathbf{F}_i$ is twice differentiable in a closed region whose interior contains the solution $\mu^{(i)}$.

These conditions are similar to the standard Newton's assumptions on each system of $i$ equations. The smoothness condition will be used in the proof of the local convergence in the next section. Of course, if one can specify initial conditions that are sufficiently close to the true solution, then one can simply apply Newton's method directly. With the EBE method, we can start with an arbitrary initial condition. Theoretically, this requires an additional condition beyond Assumption 1 for global convergence, as we shall discuss in Section 4. In Section 5, we will provide several remedies for when the initial condition is not close to the solution. In fact, we will always set the initial condition to zero in our numerical implementation in Section 6, $\alpha_i = 0$, $\forall i = 1, \ldots, n$, and demonstrate that the EBE method is numerically accurate on test problems with solutions that are far away from zero.

4. Convergence analysis
In this section, we study the convergence of this method. First, let us concentrate on the convergence of the iteration (9), (11), (12) for solving the $i$-dimensional system, $\mathbf{F}_i(\boldsymbol{\lambda}_{i-1}, \lambda_i, \alpha_{i+1}, \ldots, \alpha_n) := \mathbf{F}_i(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$, for $\boldsymbol{\lambda}_{i-1}$ and $\lambda_i$. In compact form, these three steps can be written as an iterative map,

$$(\boldsymbol{\lambda}_{i-1}^{m+1}, \lambda_i^{m+1}) = \mathbf{H}_i(\boldsymbol{\lambda}_{i-1}^m, \lambda_i^m), \tag{13}$$

where the map $\mathbf{H}_i : \mathbb{R}^i \to \mathbb{R}^i$ is defined as,

$$\mathbf{H}_i(\boldsymbol{\lambda}_{i-1}, \lambda_i) := \begin{pmatrix} g_i - \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(g_i, H_{i,2})^{-1}\, \mathbf{F}_{i-1}(g_i, H_{i,2}) \\[2pt] \lambda_i - \big(\frac{\partial F_i}{\partial \lambda_i}(\boldsymbol{\lambda}_{i-1}, \lambda_i)\big)^{-1} F_i(\boldsymbol{\lambda}_{i-1}, \lambda_i) \end{pmatrix}. \tag{14}$$

In (14), the notation $H_{i,2}$ denotes the second component of (14), and

$$g_i := \boldsymbol{\lambda}_{i-1} - \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}, \lambda_i)^{-1}\, \mathbf{F}_{i-1,\lambda_i}(\boldsymbol{\lambda}_{i-1}, \lambda_i)\,(H_{i,2} - \lambda_i) \tag{15}$$

is defined exactly as in (11). For notational convenience in the discussion below, we let the components of the exact solution of (8) be denoted by $\mu^{(i)} := (\mu_{i-1}^{(i)}, \mu_i^{(i)}) \in \mathbb{R}^i$, where the first $i-1$ components are $\mu_{i-1}^{(i)} = (\mu_1^{(i)}, \ldots, \mu_{i-1}^{(i)}) \in \mathbb{R}^{i-1}$. Similarly, we also denote $\mathbf{H}_i = (\mathbf{H}_{i,1}, H_{i,2})$. First, we can deduce that,

Theorem 4.1.
Let $\mu^{(i)} \in \mathbb{R}^i$ be a fixed point of (13). Assume that $\mathbf{F}^*_{i-1,\boldsymbol{\lambda}_{i-1}} := \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\mu^{(i)})$ is nonsingular and $\frac{\partial F_i^*}{\partial \lambda_i} := \frac{\partial F_i}{\partial \lambda_i}(\mu^{(i)}) \neq 0$; then $\mathbf{F}_i^* := \mathbf{F}_i(\mu^{(i)}) = \mathbf{0}$.

Proof. Evaluating the second equation in (14) at the fixed point, we obtain

$$\mu_i^{(i)} = \mu_i^{(i)} - \Big(\frac{\partial F_i^*}{\partial \lambda_i}\Big)^{-1} F_i^*,$$

which means that $F_i^* := F_i(\mu^{(i)}) = 0$. This also implies that $H_{i,2}^* = \mu_i^{(i)}$, where $H_{i,2}^*$ denotes the second component of (14) evaluated at the fixed point. Subsequently, $g_i^* := g_i(\mu_{i-1}^{(i)}, \mu_i^{(i)}) = \mu_{i-1}^{(i)}$. Substituting $H_{i,2}^* = \mu_i^{(i)}$ and $g_i^* = \mu_{i-1}^{(i)}$ into $\mu_{i-1}^{(i)} = \mathbf{H}_{i,1}^*$, where $\mathbf{H}_{i,1}^*$ denotes the first equation in (14) evaluated at the fixed point $\mu^{(i)}$, we immediately obtain $\mathbf{F}_{i-1}^* := \mathbf{F}_{i-1}(\mu^{(i)}) = \mathbf{0}$, and the proof is completed. □

This theorem says that the fixed points of (13) are indeed the solutions of $\mathbf{F}_i(\boldsymbol{\lambda}_{i-1}, \lambda_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$, which is what we intend to solve at each iteration $i = 2, \ldots, n$. Next, we will establish the condition for the fixed point to be locally attracting. This condition will ensure that, if we iterate the map in (14) with an initial condition that is close to the solution, then we will obtain the solution.

For local convergence, we want to show that the eigenvalues of the Jacobian matrix $D\mathbf{H}_i^* := D\mathbf{H}_i(\mu^{(i)})$ are in the interior of the unit ball of the complex plane. One can verify that the components of the Jacobian matrix $D\mathbf{H}_i^*$ are given by,

$$\frac{\partial \mathbf{H}_{i,1}^*}{\partial \lambda_j} = -\big(\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}^*\big)^{-1} \mathbf{F}_{i-1,\lambda_i}^*\, \frac{\partial H_{i,2}^*}{\partial \lambda_j}, \tag{16}$$

$$\frac{\partial H_{i,2}^*}{\partial \lambda_j} = \delta_{j,i} - \Big(\frac{\partial F_i^*}{\partial \lambda_i}\Big)^{-1} \frac{\partial F_i^*}{\partial \lambda_j}, \tag{17}$$

for $j = 1, \ldots, i$, where we have used all three conditions in Assumption 1 (see Appendix A for the detailed derivation). Here, $\delta_{j,i}$ is one only if $j = i$ and zero otherwise. Note that (17) implies $\frac{\partial H_{i,2}^*}{\partial \lambda_i} = 1 - 1 = 0$, so the $i$-th column of $D\mathbf{H}_i^*$ vanishes. To simplify the discussion below, let us define the following notations,

$$J := \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}^*, \qquad v := \mathbf{F}_{i-1,\lambda_i}^*, \qquad c := \Big(\frac{\partial H_{i,2}^*}{\partial \lambda_1}, \ldots, \frac{\partial H_{i,2}^*}{\partial \lambda_{i-1}}\Big)^\top, \tag{18}$$

such that,

$$D\mathbf{H}_i^* = \begin{pmatrix} -J^{-1} v\, c^\top & \mathbf{0} \\ c^\top & 0 \end{pmatrix} \in \mathbb{R}^{i \times i}. \tag{19}$$

We can now obtain the following result:

Theorem 4.2.
Let $\mu^{(i)} \in \mathbb{R}^i$ be a fixed point of (13) such that the conditions in Assumption 1 are satisfied. Let $\sigma_j(\mathbf{F}^*_{i-1,\boldsymbol{\lambda}_{i-1}})$ be the eigenvalues of $\mathbf{F}^*_{i-1,\boldsymbol{\lambda}_{i-1}}$, and assume that they satisfy the ordering $|\sigma_1| \geq |\sigma_2| \geq \cdots \geq |\sigma_{i-1}|$. If

$$\Big|\Big(\frac{\partial F_i^*}{\partial \lambda_i}\Big)^{-1} \sum_{j=1}^{i-1} \frac{\partial F_j^*}{\partial \lambda_i} \frac{\partial F_i^*}{\partial \lambda_j}\Big| < |\sigma_{i-1}(\mathbf{F}^*_{i-1,\boldsymbol{\lambda}_{i-1}})|, \tag{20}$$

then $\mu^{(i)}$ is locally attracting.

Proof. From (19), we only need to analyze the eigenvalues of $J^{-1} v c^\top$. From basic matrix theory, recall that the magnitude of the largest eigenvalue can be bounded above as follows,

$$|\sigma_1(J^{-1} v c^\top)| \leq \|J^{-1} v c^\top\|_2 \leq \|J^{-1}\|_2\, \|v c^\top\|_2,$$

where $\|\cdot\|_2$ denotes the matrix $\ell_2$-norm. For the fixed point to be locally attracting, all of the eigenvalues of $J^{-1} v c^\top$ have to be in the interior of the unit ball in the complex plane. This means that we only need to show that $\|J^{-1}\|_2 \|v c^\top\|_2 < 1$, that is, $\|v c^\top\|_2 < |\sigma_{i-1}(J)|$, where $\sigma_{i-1}(J)$ denotes the smallest eigenvalue of the $(i-1) \times (i-1)$ matrix $J$, following the ordering in the hypothesis.

Since $\mathrm{Tr}(v c^\top) = \sum_{j=1}^{i-1} \sigma_j(v c^\top)$ and $v c^\top$ is a rank-one matrix, its nontrivial eigenvalue is given by,

$$\sigma_1(v c^\top) = \mathrm{Tr}(v c^\top) = \sum_{j=1}^{i-1} \frac{\partial F_j^*}{\partial \lambda_i} \frac{\partial H_{i,2}^*}{\partial \lambda_j} = -\sum_{j=1}^{i-1} \frac{\partial F_j^*}{\partial \lambda_i} \frac{\partial F_i^*}{\partial \lambda_j} \Big(\frac{\partial F_i^*}{\partial \lambda_i}\Big)^{-1},$$

where we have used the definitions in (18) and the second component in (17). From the assumption in (20), we have

$$\|v c^\top\|_2 = |\sigma_1(v c^\top)| = \Big|\Big(\frac{\partial F_i^*}{\partial \lambda_i}\Big)^{-1} \sum_{j=1}^{i-1} \frac{\partial F_j^*}{\partial \lambda_i} \frac{\partial F_i^*}{\partial \lambda_j}\Big| < |\sigma_{i-1}(J)|,$$

and the proof is completed. □

This theorem provides the conditions for local convergence at each iteration-$i$. In particular, if the hypothesis of Theorem 4.2 is satisfied, we will find the solutions of (8) by iterating (13), provided that we start with a sufficiently close initial condition. Notice also that this condition suggests that, in practice, the local convergence will be difficult to attain if the Jacobian matrix $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}$ is close to singular. With these two theorems, we can now establish

Theorem 4.3.
Let $\mu^{(n)} \in \mathbb{R}^n$ be the solution of the $n$-dimensional system of equations in (7). Assume the hypotheses of Theorem 4.2; then the EBE method is locally convergent.

Proof. Choose an initial condition, $(\alpha_1, \ldots, \alpha_n)$, that is sufficiently close to the solution $\mu^{(n)}$ of $\mathbf{F}_n(\boldsymbol{\lambda}_n) = \mathbf{0}$. First, let us define $M_n$ as the surface $F_1(\lambda_1, \ldots, \lambda_n) = 0$; the dimension of $M_n$ is at most $n - 1$. Similarly, we define $M_{n-1}$ as the subset of $M_n$ on which $F_2(\boldsymbol{\lambda}_n) = 0$, $M_{n-2}$ as the subset of $M_{n-1}$ on which $F_3(\boldsymbol{\lambda}_n) = 0$, and so on. The dimension of $M_j$ is at most $j - 1$. We assume that $\mathbf{F}_n(\boldsymbol{\lambda}_n) = \mathbf{0}$ has at least one solution; then $M_1$ contains the solution $\mu^{(n)}$. It is clear that $M_n \supset M_{n-1} \supset \cdots \supset M_1$.

For $i = 1$, we solve $F_1(\lambda_1, \alpha_2, \ldots, \alpha_n) = 0$ for $\lambda_1$. Geometrically, we look for the first coordinate on the surface $M_n$. From Assumption 1.2, we have the local convergence of the usual Newton's iteration. If $\alpha_1$ is sufficiently close to the solution $\mu^{(1)} = \mu_1^{(1)} \in \mathbb{R}$, then as $m \to \infty$ we obtain the solution $(\mu_1^{(1)}, \alpha_2, \ldots, \alpha_n) \in M_n$. By the smoothness assumption, $(\mu_1^{(1)}, \alpha_2, \ldots, \alpha_n)$ is also close to $\mu^{(n)}$.

Continuing with $i > 1$, we want to solve $\mathbf{F}_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$ for $\boldsymbol{\lambda}_i$. Numerically, we apply the iterative map $\mathbf{H}_i$ in (13) starting from $(\mu^{(i-1)}, \alpha_i, \ldots, \alpha_n) \in M_{n-i+2}$. By Assumption 1.2, the Jacobian $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\mu^{(i-1)}, \alpha_i, \ldots, \alpha_n)$ is nonsingular, so, by the implicit function theorem, for any local neighborhood $V$ of $\mu^{(i-1)}$ there exist a neighborhood $U$ of $\alpha_i$ and a $C^1$ function $h_{i-1} : U \to V$ such that $\mu^{(i-1)} = h_{i-1}(\alpha_i)$ and $\mathbf{F}_{i-1}(h_{i-1}(\lambda_i), \lambda_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$ for all $\lambda_i \in U$. Since the initial condition $\alpha_i$ is close to $\mu_i^{(n)}$, by the smoothness assumption it is also close to the $\mu_i^{(i)}$ that solves $\mathbf{F}_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = \mathbf{0}$. The continuity of $h_{i-1}$ on $U$ means that $(\mu_{i-1}^{(i)}, \mu_i^{(i)}) \in V \times U$. Geometrically, this means that the surface $F_i(\boldsymbol{\lambda}_i, \alpha_{i+1}, \ldots, \alpha_n) = 0$ intersects the curve $\boldsymbol{\lambda}_{i-1} = h_{i-1}(\lambda_i)$ at $\mu^{(i)} = (\mu_{i-1}^{(i)}, \mu_i^{(i)})$. Therefore, we can find the solution of this $i$-dimensional system by tracking along the curve $\boldsymbol{\lambda}_{i-1} = h_{i-1}(\lambda_i)$, where we consider $\lambda_i$ as an independent parameter. The iterative map $\mathbf{H}_i$ in (14) facilitates this tracking, and the conditions in Theorem 4.2 guarantee convergence to the solution. Notice that, during this iteration, the solution remains on $M_{n-i+2}$. The solution of this $i$-dimensional problem is $(\mu^{(i)}, \alpha_{i+1}, \ldots, \alpha_n) \in M_{n-(i+1)+2} \subset M_{n-i+2} \subset \cdots \subset M_n$. Continuing with the same argument, we find that, for $i = n$, $\mu^{(n)} \in M_1 \subset M_n$. □

This iterative procedure finds the solution by searching along the manifold $M_n$ in the direction of the hypersurfaces of a single parameter at a time, whose local existence is guaranteed by Assumption 1. It is clear that, after each step-$i$, the estimated solution may not necessarily be closer to the true solution, since the estimates do not minimize the closest path to the true solution along the manifold $M_n$ (or the geodesic distance).
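To make the tracking procedure above concrete, the following is a minimal Python/NumPy sketch (a hypothetical stand-in, not the authors' MATLAB code from [10]) of the outer loop combined with the inner iteration (9), (11), (12), applied to a small two-moment analogue of the entropy problem on $[-1,1]$. The ground-truth multipliers $\lambda = (1, 1)$ of $\rho \propto \exp(x + x^2)$, the 65-point Gauss-Legendre rule standing in for the nested sparse grid, and the zero initial condition $\alpha = (0, 0)$ are all assumptions of this toy:

```python
import numpy as np

# Toy two-moment problem on Omega = [-1, 1]: recover lambda = (1, 1) for
# rho ~ exp(lam_1 x + lam_2 x^2) from its first two moments, starting from
# alpha = (0, 0).  Gauss-Legendre quadrature stands in for the sparse grid.
x, w = np.polynomial.legendre.leggauss(65)
C = np.vander(x, 3, increasing=True)[:, 1:]        # basis c_1 = x, c_2 = x^2

lam_true = np.array([1.0, 1.0])
wq = w * np.exp(C @ lam_true)
f = (C.T @ wq) / wq.sum()                          # prescribed moments f_1, f_2

def F(lam):                                        # residuals, Eq. (6)
    return (C - f).T @ (w * np.exp(C @ lam))

def Jac(lam):                                      # J_jk = dF_j / dlam_k
    wq = w * np.exp(C @ lam)
    return (C - f).T @ (C * wq[:, None])

tol, lam = 1e-10, np.zeros(2)
for i in range(1, 3):                              # outer loop: one equation at a time
    for m in range(200):                           # inner loop: (9), (11), (12)
        if abs(F(lam)[i - 1]) < tol:
            break
        Jm = Jac(lam)
        dl_i = -F(lam)[i - 1] / Jm[i - 1, i - 1]   # Newton step (9) on lambda_i
        if i > 1:                                  # predictor (11) on lambda_1..lambda_{i-1}
            lam[:i - 1] -= np.linalg.solve(Jm[:i - 1, :i - 1],
                                           Jm[:i - 1, i - 1]) * dl_i
        lam[i - 1] += dl_i
        for _ in range(50):                        # corrector (12): Newton on F_1..F_{i-1}
            if i == 1 or np.linalg.norm(F(lam)[:i - 1]) < tol:
                break
            lam[:i - 1] -= np.linalg.solve(Jac(lam)[:i - 1, :i - 1],
                                           F(lam)[:i - 1])

print(lam)        # approaches the true multipliers (1, 1)
```

At step $i = 1$ this reduces to a scalar Newton solve for $\lambda_1$; at step $i = 2$ the predictor-corrector updates drag $\lambda_1$ along the implicitly defined curve $\lambda_1 = h_1(\lambda_2)$ while $\lambda_2$ is driven to its root, which is exactly the geometric picture described above.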
This means that, locally, $\|(\mu^{(i+1)}, \alpha_{i+2}, \ldots, \alpha_n) - \mu^{(n)}\| \leq \|(\mu^{(i)}, \alpha_{i+1}, \ldots, \alpha_n) - \mu^{(n)}\|$ is not guaranteed for $i < n - 1$. For global convergence, we assume further that, for each $i$, there exists a nonempty connected set that contains $(\mu^{(i)}, \alpha_{i+1})$ and $\mu^{(i+1)}$ such that $\mathbf{F}_{i,\boldsymbol{\lambda}_i}$ evaluated at any point in this set is nonsingular. The existence of this set will allow us to build a path connecting these two points when they are far apart. If this condition is not met, we need an additional treatment to overcome this issue, which will be discussed in the next section.

5. Practical challenges
In this section, we will discuss several practical challenges related to our algorithm, together with remedies. They include the non-locality of the initial condition, mistracking due to multiple solutions, the non-existence of solutions within the desired numerical tolerance, and the computational complexity.

5.1. Adaptive tracking.
As we mentioned in the previous section, the EBE method only converges locally, which means that it requires an adequate initial condition; this is practically challenging. In our numerical simulations below, in fact, we always start from the zero initial condition, $\alpha_i = 0$, $\forall i = 1, \ldots, n$. In this case, notice that even when we obtain an accurate solution at step-$i$, that is, $\mathbf{F}_i(\hat{\mu}^{(i)}) \approx \mathbf{0}$, as we proceed to the next iteration we may have $|F_{i+1}(\hat{\mu}^{(i)}, \alpha_{i+1})| \gg 0$, meaning that $(\hat{\mu}^{(i)}, \alpha_{i+1})$ is not close to the solution $\mu^{(i+1)}$. Even when $\frac{\partial F_{i+1}}{\partial \lambda_{i+1}}(\hat{\mu}^{(i)}, \alpha_{i+1})$ is not singular, according to equation (9), $\lambda_i^{m+1}$ could be very far away from $\lambda_i^m$. In this case, Newton's method could fail in Eq. (12), because the initial guess could be very far from the solution.

As a remedy, we employ an adaptive tracking on $\lambda_i$ to guarantee that each application of Newton's method is within its zone of convergence for each predictor-corrector step. The idea of the adaptive tracking is to cut the tracking step, $\Delta\lambda_i := \lambda_i^{m+1} - \lambda_i^m$, by half until the prediction-correction step in (11)-(12) converges. The detailed algorithm is outlined below.

Algorithm 1:
Summary of the adaptive tracking algorithm

Input: minimum step size $\lambda_{min}$ and threshold value $Tol$.
Compute $\Delta\lambda_i$ by using Newton's method to solve $F_i = 0$.
Set $Final = \Delta\lambda_i$.
while $|Final| > 0$ do
    Solve $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i + \Delta\lambda_i) = \mathbf{0}$ by using Newton's method;
    if Newton's method fails then
        $\Delta\lambda_i = \Delta\lambda_i / 2$
        if $\Delta\lambda_i < \lambda_{min}$ then
            Discard the $i$-th equation
        end
    else
        $Final = Final - \Delta\lambda_i$
        $\Delta\lambda_i = \min\{\Delta\lambda_i, Final\}$
    end
end

5.2. Bifurcation.
In order to solve $\mathbf{F}_i(\lambda_1, \lambda_2, \cdots, \lambda_i) = \mathbf{0}$, we track $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$ with $\lambda_i$ as a parameter. During this parameter tracking, we may encounter bifurcation points $\lambda_i$ of the nonlinear system $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$. This means that the Jacobian $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}, \lambda_i)$ is rank deficient, such that $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$ has multiple solutions $\boldsymbol{\lambda}_{i-1}$ for a given $\lambda_i$. In this situation, $F_i$ has multiple realizations as functions of $\lambda_i$ (see the illustration in Figure 1, where the bifurcation point is the intersection of the two possible realizations of $F_i$). In this illustration, the goal is to track along the red branch to find the root, $F_i(\lambda_i) = 0$. As we get closer to the bifurcation point, the Jacobian $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}, \lambda_i)$ becomes singular, such that we cannot evaluate (11). Intuitively, the existence of multiple solutions near the bifurcation point induces a possibility of mistracking from the red curve to the green curve (as shown by the arrows), which prevents one from finding the solution.

To avoid such mistracking, we apply the deflation technique to compute the bifurcation point directly [12, 16]. Once the bifurcation point is estimated, we approximate the correct branch using a Richardson extrapolation to avoid mistracking. Denote the bifurcation point by $\lambda_i^*$; the nonlinear system $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$ is difficult to solve when $\lambda_i$ is close to $\lambda_i^*$, since the Jacobian of $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i)$ becomes nearly singular. If the last attempt is $(\tilde{\boldsymbol{\lambda}}_{i-1}, \tilde{\lambda}_i)$, we compute $(\boldsymbol{\lambda}_{i-1}^*, \lambda_i^*)$ by solving the following deflated system:

$$G(\boldsymbol{\lambda}_{i-1}^*, \lambda_i^*, v) = \begin{pmatrix} \mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) \\ \mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}, \lambda_i)\, v \\ \xi^\top v - 1 \end{pmatrix} = \mathbf{0},$$

where $v$ is in the kernel of $\mathbf{F}_{i-1,\boldsymbol{\lambda}_{i-1}}(\boldsymbol{\lambda}_{i-1}, \lambda_i)$ and $\xi$ is a random vector to guarantee that $v$ is not a zero eigenvector. In this case, $G(\boldsymbol{\lambda}_{i-1}^*, \lambda_i^*, v)$ is well-conditioned [12, 16].
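The deflated system can be illustrated on a hypothetical scalar fold (this example is not taken from the paper): with $F_{i-1}(\lambda_{i-1}, \lambda_i) = \lambda_{i-1}^2 - \lambda_i$, the Jacobian $2\lambda_{i-1}$ is singular exactly at the turning point $(0, 0)$, and Newton's method on the augmented system locates it directly. Here is a sketch in Python/NumPy, with $\xi = 1$ chosen as the (scalar) random vector:

```python
import numpy as np

# Toy fold: F(l1, l2) = l1^2 - l2 has a singular Jacobian dF/dl1 = 2 l1 at the
# turning point (0, 0).  The deflated system G = (F, (dF/dl1) v, xi v - 1) = 0
# is regular there, so plain Newton converges to the bifurcation point.
xi = 1.0                                   # "random" vector (scalar here), xi != 0

def G(z):
    l1, l2, v = z
    return np.array([l1**2 - l2,           # F = 0
                     2.0 * l1 * v,         # Jacobian applied to the kernel vector v
                     xi * v - 1.0])        # normalization: v is not the zero vector

def JG(z):
    l1, l2, v = z
    return np.array([[2.0 * l1, -1.0, 0.0],
                     [2.0 * v,   0.0, 2.0 * l1],
                     [0.0,       0.0, xi]])

z = np.array([0.3, 0.1, 1.0])              # guess near the last tracked point
for _ in range(30):                        # Newton on the well-conditioned system
    z = z - np.linalg.solve(JG(z), G(z))

print(z[:2])                               # bifurcation point (l1*, l2*) = (0, 0)
```

Note that the augmented Jacobian $JG$ is nonsingular at the solution $(0, 0, 1/\xi)$ even though $\partial F / \partial \lambda_{i-1}$ itself vanishes there, which is the reason the deflated formulation is well-conditioned.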
Once the bifurcation point $(\boldsymbol{\lambda}_{i-1}^*, \lambda_i^*)$ is estimated, we can avoid mistracking by setting $\lambda_i = 2\lambda_i^* - \tilde{\lambda}_i$ and solving $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$ by using Newton's method with the initial guess $2\boldsymbol{\lambda}_{i-1}^* - \tilde{\boldsymbol{\lambda}}_{i-1}$ (which is a Richardson extrapolation).

Figure 1. Plot of $F_i(\lambda_i)$ vs. $\lambda_i$: there are two bifurcation branches for the nonlinear system $\mathbf{F}_{i-1}(\boldsymbol{\lambda}_{i-1}, \lambda_i) = \mathbf{0}$. The left part is a mistracking example; the right part is an illustration of the numerical method used to avoid the bifurcation point.

5.3. Nonexistence of solutions.
In general, the moment constrained maximum entropy problems may not necessarily have solutions. Even when the solutions exist theoretically, they could be difficult to find numerically due to a noisy dataset, errors in the numerical integration, etc. In this case, we simply discard the equation $F_i$ when the minimum residual that can be attained is larger than the desired tolerance. This feature (discarding the constraints that give no solutions) is only feasible in the EBE algorithm. However, some theory is needed to preserve the convexity of the polynomial in the exponential term of Eq. (4) while discarding some of these constraints. In our numerical simulations below, we handle this issue by re-ordering the constraints. In particular, for a problem with moment constraints up to order-4, we include the constraints corresponding to $E[x_i]$ $(i = 1, \cdots, d)$ in the earlier steps of the EBE iterations to avoid these constraints being discarded. Note that this method is sensitive to the ordering; that is, a different ordering of the constraints yields a different path to compute the solution. Therefore, a systematic ordering technique that simultaneously preserves the convexity of the polynomial in the exponential term of Eq. (4) is an important problem to be addressed in the future.

5.4. Computational complexity.
The most expensive computational part of the EBE method is the numerical evaluation of (6). For fast numerical integration, we store the monomial basis $c_j(\mathbf{x})$ as a matrix of size $N_\ell \times n$, where $N_\ell$ is the number of sparse grid points and $n$ is the number of monomial basis functions. In this case, the computational cost of evaluating $F_j$ is $(2j+1)N_\ell$ ($2j$ multiplications and $1$ subtraction for each grid point), excluding the computational cost of the exponential function evaluation, which is on the order of $\log m$ to obtain an error of resolution $2^{-m}$ [6]. For the $i$-th iteration of the EBE algorithm, the computational cost of evaluating the $i$-dimensional system $\mathbf{F}_i$ is $\sum_{j=1}^{i} (2j+1)N_\ell = (i^2 + 2i)N_\ell$, excluding the exponentiation.

6. Numerical results
In this section, we show numerical results of the EBE method on five examples. In all of the simulations below, unless stated otherwise, we set the Newton's tolerance Tol = 10^− and the predictor tolerance Tol = 10^−. In the first test example, we will describe how the EBE method works on each iteration. The goal of the second example is to demonstrate the global convergence with solutions that are far away from the initial condition α_j = 0. In particular, we will test the EBE method on a problem with solutions, λ_j, that have magnitudes ranging from order 10^−. In this example, we will show the robustness of the estimate as a function of the number of integration points (or the sparse grid level ℓ). The third example demonstrates the performance on high-dimensional problems (with 70 ≤ n ≤ 310 equations), induced from order-four moments of four- to seven-dimensional density functions. While these first three examples involve estimating densities of the form (4), in the next two examples we also test the EBE method on estimating densities from a given data set where the maximum entropy solutions may or may not exist. The first data-driven problem is to estimate densities of the first two leading EOFs of the wind stress-driven large-scale oceanic model [2, 5]. The second data-driven problem is to estimate two- to five-dimensional densities arising from solutions of the Kuramoto-Sivashinsky equation. In these two problems, we compare our method with the classical Newton's method, the MATLAB built-in solver fsolve.m, and the previously developed BFGS-based method [2, 5].

Figure 2. Illustration of Example 1: the black curve is λ_1 = h(λ_2); the green points are the iterations when we solved F_2(λ_1, λ_2, 0) = 0; the red curve is (λ_1, λ_2) = h(λ_3); the blue points are the iterations when we solved F_1(λ_1, λ_2, λ_3) = F_2(λ_1, λ_2, λ_3) = 0; the cyan point is the numerical solution.

Example 1.
We consider a simple example ρ(x) ∝ exp(x + x^2 + x^3) for x ∈ [−1, 1], so that the exact solution is λ = (1, 1, 1). Here, the moments f_j can be computed numerically as follows,

f_j = ∫_{−1}^{1} x^j ρ(x) dx / ∫_{−1}^{1} ρ(x) dx,  for j = 1, 2, 3.

In order to numerically integrate both the denominator and the numerator, we used a regular one-dimensional sparse grid of level ℓ = 7 (the number of nodes is 65). Our goal here is to illustrate the method and to show the trajectory of the solutions after each iteration of the inner loop m and the outer loop i. In Figure 2, we show the surface F_1(λ_1, λ_2, λ_3) = 0 (grey). For i = 1, we solve F_1(λ_1, 0, 0) = 0; after three iterations (m = 3) the solution converges to λ_1 ≈ 2 (see Table 1). For i = 2, we start with this solution and introduce the second variable λ_2 for solving the second equation F_2(λ_1, λ_2, 0) = 0 with the constraint F_1(λ_1, λ_2, 0) = 0. Here, the solution follows the path λ_1 = h(λ_2) thanks to the implicit function theorem (black curve). Numerically, a sequence of (green) points following this path converges to a point that satisfies F_1(λ_1, λ_2, 0) = F_2(λ_1, λ_2, 0) = 0 (the green point at the intersection between the black and red curves in Figure 2). In the next iteration, i = 3, we introduce the third variable λ_3 for solving the third equation F_3(λ_1, λ_2, λ_3) = 0 with constraints F_1(λ_1, λ_2, λ_3) = F_2(λ_1, λ_2, λ_3) = 0. By the implicit function theorem, we have (λ_1, λ_2) = h(λ_3) that satisfies F_1(h(λ_3), λ_3) = F_2(h(λ_3), λ_3) = 0, which is shown as the red curve in Figure 2. On this red curve, we have a sequence of (blue) points which converges to the solution of the full system (the cyan point shown in Figure 2). The coordinates of the solution at each iteration are shown in Table 1. Notice that the solutions always lie on the surface F_1(λ_1, λ_2, λ_3) = 0.

Example 2.
We consider a one-dimensional example with up to order-six moment constraints, with explicit solution given by

ρ(x) ∝ exp( x + 16x^2 + 24x^3 + 96x^4 − x^5 − x^6 ),

as shown in Figure 3. This example is a tough test problem since the solution, λ = (2, , , , − , ), has components of order 10^−. Similar to Example 1, we compute the moments f_i by using a one-dimensional sparse grid of level ℓ = 7 (65 nodes). The EBE algorithm converges to the exact solution with error ‖λ − λ*‖ = 5. × 10^−. Since the numerical experiment is performed with an initial condition α_j = 0 that is far from the solution, this result demonstrates the global convergence of the EBE method.

Table 1. The coordinates of the solutions of Example 1 at each iteration, starting from (0, 0, 0). For each i, the EBE takes a few iterates (m) to find the i-dimensional solution, fixing λ_j = α_j = 0 for j > i.
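The iteration-by-iteration behavior summarized in Table 1 can be reproduced in miniature. The sketch below is a pure-Python illustration of the EBE idea, not the paper's MATLAB code: it uses a composite trapezoid rule in place of the sparse grid, finite-difference Jacobians, and a damped Newton inner loop, and it recovers λ = (1, 1, 1) for Example 1's density exp(x + x² + x³).

```python
import math

# trapezoid nodes/weights on [-1, 1] (a stand-in for the sparse grid)
N = 401
xs = [-1.0 + 2.0 * k / (N - 1) for k in range(N)]
ws = [2.0 / (N - 1)] * N
ws[0] *= 0.5
ws[-1] *= 0.5

def moments(lam):
    """Moments E[x^j], j = 1..3, under rho(x) ~ exp(sum_j lam_j x^j)."""
    z, m = 0.0, [0.0, 0.0, 0.0]
    for x, w in zip(xs, ws):
        # one exp per grid point, shared by every component of the system
        r = w * math.exp(lam[0] * x + lam[1] * x**2 + lam[2] * x**3)
        z += r
        for j in range(3):
            m[j] += r * x**(j + 1)
    return [mj / z for mj in m]

f = moments([1.0, 1.0, 1.0])          # target moments of exp(x + x^2 + x^3)

def F(lam, i):                        # first i residuals F_1..F_i
    return [m - t for m, t in zip(moments(lam)[:i], f[:i])]

def jacobian(lam, i, h=1e-6):         # forward-difference i x i Jacobian
    F0 = F(lam, i)
    cols = []
    for k in range(i):
        lp = list(lam)
        lp[k] += h
        cols.append([(a - b) / h for a, b in zip(F(lp, i), F0)])
    return [[cols[k][r] for k in range(i)] for r in range(i)], F0

def solve(A, b):                      # tiny Gaussian elimination with pivoting
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= fac * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x

lam = [0.0, 0.0, 0.0]
for i in (1, 2, 3):                   # outer loop: one new equation/variable at a time
    for _ in range(50):               # inner loop: damped Newton on the i-dim subsystem
        J, F0 = jacobian(lam, i)
        if max(abs(v) for v in F0) < 1e-10:
            break
        step, t = solve(J, F0), 1.0
        while t > 1e-4:               # backtrack until the residual decreases
            trial = lam[:]
            for k in range(i):
                trial[k] -= t * step[k]
            if max(abs(v) for v in F(trial, i)) < max(abs(v) for v in F0):
                lam = trial
                break
            t *= 0.5
        else:
            break
```

Because the same quadrature is used for the target moments and inside the solver, the recovered λ matches (1, 1, 1) to solver tolerance; with moments estimated from data it would only match to the integration error, which is the sensitivity studied next.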
Figure 3. The unnormalized density ρ(x) in Example 2.

Next, we investigate the sensitivity of the estimates to the number of sparse grid points used in approximating the integral. In our numerical experiments, we estimate the true moments f_i using a one-dimensional sparse grid of level ℓ = 20 (524,289 nodes) and feed these moment estimates into the EBE algorithm. In Figure 4, we show the error in λ (in the ℓ_2 metric) for the sparse grid levels from 6 to 15 that are used in the EBE method. Notice that the error decreases as a function of ℓ and the improvement becomes negligible at the larger levels.

Example 3.
In this example, we consider a d-dimensional example with an explicit solution,

ρ(x) ∝ exp(−x + x − x − x − . x),

on the domain Ω = [−1, 1]^d, where we vary d = 4, ..., 7. For these simulations, we consider up to order-four moment constraints and fix the sparse grid level ℓ = 8 to compute the integration. Here, the EBE method is able to estimate λ with ℓ_2-errors of order 10^− (the error in λ is . × 10^− and the moment error is . × 10^−). In this computation, the dimensions of the nonlinear system range from 70 for d = 4 to 310 for d = 7. Here, the EBE method is able to recover the true density even if we prescribe more constraints, corresponding to d larger than four.

Example 4.
Next, we consider estimating a two-dimensional probability density of the two leading empirical orthogonal functions of a geophysical model for the wind stress-driven large-scale oceanic circulation [18, 19]. This is exactly the same test example used in the previously developed BFGS-based method [2, 5]. In fact, the two-dimensional density that we used here was supplied by Rafail Abramov. First, we compare the EBE method with the BFGS algorithm of [2], whose code can be downloaded from [1]. In this comparison, we use the same uniformly distributed grid points, where the total number of nodes is 85 × 85 = 7,225. We set the Newton's tolerance of the EBE algorithm to 10^−. In Table 2, notice that the moment errors of the EBE are much smaller compared to those of the BFGS method.

While the EBE is superior compared to BFGS, we should note that the BFGS method does not use the Hessian of F^i whereas the EBE does. For a fair comparison, we include results using the MATLAB built-in function fsolve.m, whose default algorithm is the trust-region dogleg (see the documentation for details [17]). In our numerical implementation, we apply fsolve.m with a specified Hessian function of F^n. We also include the classical Newton's method with a specified Hessian function of F^n. In this comparison, we use the same sparse grid of level ℓ = 11 (or 7,169 nodes) to compute the two-dimensional integrals. Notice that the EBE method is still superior compared to these two schemes, as reported in Table 2. In fact, Newton's method does not converge for the higher-order moment constraints. The joint two-dimensional PDFs are shown in Figure 5. The first row is the two-dimensional density function provided by R. Abramov. The second row shows the EBE estimates using up to order four-, six-, and eight-moment constraints. The third and fourth rows show the BFGS and MATLAB fsolve.m estimates, respectively.

Figure 4. The solution error ‖λ − λ*‖ as a function of the sparse grid level.

Table 2. Summary of solutions for Example 4: moment errors for different algorithms with different grids.

Methods                          | order 4 | order 6 | order 8
BFGS algorithm with uniform grid | ·×10^−  | ·×10^−  | ·×10^−
EBE algorithm with uniform grid  | ·×10^−  | ·×10^−  | ·×10^−
EBE algorithm with sparse grid   | ·×10^−  | ·×10^−  | ·×10^−
MATLAB fsolve.m with sparse grid | ·×10^−  | ·×10^−  | ·×10^−
Newton with sparse grid          | ·×10^−  | diverge | diverge

Example 5.
In this example, we consider estimating multidimensional densities of the solutions of the Kuramoto-Sivashinsky equation. Here, the solutions are integrated with a fourth-order time-differencing method on equally spaced grid points over a domain of [0, 32π] as in [15]. We use the initial condition u(x, 0) = cos(x/(16ξ))(1 + sin(x/(16ξ))), where ξ ∼ U[0, ·], and an integration time step of 0.··. The data is generated by integrating 10,000 time steps. Based on this data set, we randomly select d components and estimate the d-dimensional joint density associated with these components. For visual comparison, we also show the results from a two-dimensional kernel density estimation method [22, 21] as a reference. Numerically, we use the MATLAB built-in function ksdensity.m. Note that the BFGS algorithm [2] does not work on this data set, while the classical Newton's method only converges for the two-dimensional case. We also show the corresponding results with the MATLAB fsolve.m with a specified Hessian function, as in the previous example. The moment errors of these three schemes are reported in Table 3.

Figure 5. The 2D measured probability density functions supplied by R. Abramov (first row); PDFs computed by the EBE method (second row), the BFGS algorithm (third row), and the MATLAB fsolve.m function (fourth row).

In Figure 6, we show the two-dimensional density estimated by the EBE algorithm compared to those from fsolve.m, the classical Newton's method, and the 2D kernel density estimate. For the two-dimensional case, the resulting densities are visually identical, although the corresponding moment error of the EBE method is still the smallest compared to Newton's and the MATLAB fsolve.m (see Table 3). In Figure 7, we show the contour plots of the two-dimensional marginal densities obtained from solving the three-dimensional problem given four-moment constraints with the EBE method and the MATLAB fsolve.m. For diagnostic purposes, we also provide the corresponding contour plots of the two-dimensional kernel density estimates. Notice that the MATLAB fsolve.m produces a completely inaccurate estimate. The EBE method produces an estimate that qualitatively agrees with the corresponding two-dimensional KDE estimates. The slight disagreement between these estimates is expected since we only provide up to order-four moment information.

Table 3. Summary of solutions for Example 5.

d | EBE method | fsolve | Newton
2 | ·×10^−     | ·×10^− | ·×10^−
3 | ·×10^−     | ·×10^− | diverge
4 | ·×10^−     | ·      | diverge
5 | ·×10^−     | ·      | diverge

In Figure 8, we show the results for the four-dimensional problem.
We do not show the estimate from the MATLAB fsolve.m since it is not accurate at all. Here, we include more than order-four moments. Specifically, the total number of constraints for up to order-four moments is 70, while this result is based on 87 constraints, including 17 additional higher-order moment constraints that include the order-six moments E[x_i^6], i = 1, ..., 4. See the movie of the density estimates for each iteration in the supplementary material [11]. Notice that the marginal densities estimated by the EBE look very similar to those estimated by the two-dimensional kernel density estimation. If more constraints are included, we found that we lose the convexity of the polynomial terms in (4). As we mentioned before, we need a better criterion to preserve the convexity of the solutions.

In Figure 9, we include the result from a five-dimensional simulation. We also do not show the estimate from the MATLAB fsolve.m since it is not accurate at all. In this five-dimensional case, the EBE method automatically discards 34 equations (moment constraints). In this case, we suspect that either the maximum entropy solution that accounts for all of the constraints does not exist or the EBE method cannot find the solution. Here, the EBE method just estimates the best fitted solution within the tolerance of 10^− by solving 91 out of 125 moment constraints.

Summary
In this paper, we introduced a novel equation-by-equation algorithm for solving a system of nonlinear equations arising from the moment constrained maximum entropy problem. Theoretically, we have established the local convergence and provided a sufficient condition for global convergence. Through the convergence analysis, we understood that the method, geometrically, finds the solution by searching along the surface corresponding to one component of the nonlinear equations. Numerically, we have demonstrated its accuracy and efficiency on various examples. In one of the examples, we found that the EBE algorithm produces more accurate solutions compared to the previously developed BFGS-based algorithm, which does not use the Hessian information [2, 5]. In this same example, we also found that the EBE is superior compared to two schemes that use the Hessian information, namely the current MATLAB built-in solver, which uses the trust-region-dogleg algorithm, and the classical Newton's method. We also found that the proposed EBE algorithm is able to solve a system of 70-310 equations when the maximum entropy solution exists, compared to the previously developed BFGS method, which was shown to work for systems of 44-83 equations. On the Kuramoto-Sivashinsky example, the EBE method is able to reconstruct the density of a four-dimensional problem accounting for up to order-four moments (or 70 constraints). In this case, we showed that the estimate is improved by accounting for 17 additional constraints of order-six moments.
For the five-dimensional problem with moments up to order-four, the EBE method reconstructs the solution within the desired precision, 10^−, by automatically selecting a subset of 91 constraints from the total prescribed 125 constraints induced by moments of up to order-four. While the automatic constraint selection is a desirable feature, since the maximum entropy solutions within the tolerance may not be easily estimated (nor theoretically available), further study is required to fully take advantage of this feature. In particular, an important open problem is to develop a mathematical theory for ordering the constraints, since the path of the solution is sensitive to the order of the constraints. Simultaneously, the ordering of the constraints needs to preserve the convexity of the polynomials in the exponential term of (4). We should stress that the EBE method is computationally not the most efficient method, since it is designed to avoid singularities by tracking along the surface corresponding to one component of the nonlinear equations. Therefore, a more efficient EBE method will be one of the future directions.

Figure 6. The comparison of the density functions obtained by the EBE algorithm, the MATLAB fsolve.m function, Newton's method, and the kernel density estimate (denoted as the measured PDF) for the two-dimensional case.
Appendix A. The detailed calculation of the Jacobian of the map H_i

In this appendix, we give the detailed computation of the Jacobian of the map H_i in (14), evaluated at μ^(i), the solution of F^i(λ^i, α_{i+1}, ..., α_n) = 0. Recall that for H_i = (H_{i,1}, H_{i,2}) in (14),

H_{i,1}(λ^i) = g_i − F_{i−1,λ^{i−1}}(g_i, H_{i,2})^{−1} F^{i−1}(g_i, H_{i,2}),
H_{i,2}(λ^i) = λ_i − (∂F_i/∂λ_i(λ^i))^{−1} F_i(λ^i),

where g_i : R^{i−1} → R^{i−1} is defined as in (15). To take another derivative of H_{i,1} with respect to λ_j, we use the fact that if F_{i−1,λ^{i−1}} is a nonsingular matrix, then

∂/∂λ_j (F_{i−1,λ^{i−1}})^{−1} = −(F_{i−1,λ^{i−1}})^{−1} (∂F_{i−1,λ^{i−1}}/∂λ_j) (F_{i−1,λ^{i−1}})^{−1},

and that the Hessian ∂F*_{i−1,λ^{i−1}}/∂λ_j is well-defined, which are Assumptions 1.2 and 1.3. We can deduce that, for j = 1, ..., i,

∂H_{i,1}/∂λ_j = ∂g_i/∂λ_j + (F_{i−1,λ^{i−1}})^{−1} (∂F_{i−1,λ^{i−1}}/∂λ_j) (F_{i−1,λ^{i−1}})^{−1} F^{i−1}
              − (F_{i−1,λ^{i−1}})^{−1} ( F_{i−1,λ^{i−1}} ∂g_i/∂λ_j + (∂F^{i−1}/∂λ_i)(∂H_{i,2}/∂λ_j) ),   (21)

∂H_{i,2}/∂λ_j = ∂λ_i/∂λ_j − ∂/∂λ_j( (∂F_i/∂λ_i)^{−1} ) F_i − (∂F_i/∂λ_i)^{−1} ∂F_i/∂λ_j.   (22)

Evaluating these two equations at μ^(i) and using the fact that F*_i := F_i(μ^(i)) = 0, the second terms on the right-hand side of (21)-(22) vanish and we have

∂H*_{i,1}/∂λ_j = ∂g*_i/∂λ_j − (F*_{i−1,λ^{i−1}})^{−1} ( F*_{i−1,λ^{i−1}} ∂g*_i/∂λ_j + (∂F*^{i−1}/∂λ_i)(∂H*_{i,2}/∂λ_j) )
               = −(F*_{i−1,λ^{i−1}})^{−1} (∂F*^{i−1}/∂λ_i)(∂H*_{i,2}/∂λ_j),

∂H*_{i,2}/∂λ_j = δ_{j,i} − (∂F*_i/∂λ_i)^{−1} ∂F*_i/∂λ_j,

where δ_{j,i} is one only if j = i and zero otherwise.

Figure 7. The comparison of the two-dimensional marginal density functions obtained by the MATLAB fsolve.m function (first column), the EBE algorithm (second column) that solves a three-dimensional problem accounting for up to order-four moment constraints, and the two-dimensional kernel density estimate (third column).

Figure 8. The comparison of the two-dimensional marginal density functions obtained by the EBE algorithm (first column), which solves a four-dimensional problem accounting for more than order-four moment constraints (see text for details), and the two-dimensional kernel density estimate (second column).
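The matrix-calculus identity used in the appendix, ∂(A^{−1})/∂t = −A^{−1} (∂A/∂t) A^{−1}, is easy to sanity-check numerically; the 2×2 family A(t) below is a hypothetical example, not one of the paper's Jacobians.

```python
def A(t):
    return [[2.0 + t, t], [0.0, 1.0 + t]]

def dA(t):                      # elementwise derivative of A with respect to t
    return [[1.0, 1.0], [0.0, 1.0]]

def inv2(M):                    # closed-form 2x2 inverse
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t, h = 0.3, 1e-6
# central finite difference of A(t)^{-1}
Ap, Am = inv2(A(t + h)), inv2(A(t - h))
fd = [[(Ap[i][j] - Am[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]
# identity: -A^{-1} (dA/dt) A^{-1}
Ai = inv2(A(t))
exact = [[-v for v in row] for row in mul2(mul2(Ai, dA(t)), Ai)]
```

The two computations agree to finite-difference accuracy, which is the same identity applied above with λ_j in place of t.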
Acknowledgments
We thank Rafail Abramov for supplying the two-dimensional density data set used in Example 4. The BFGS code that we used for comparison in Example 4 was downloaded from [1].

Figure 9.
The comparison of the two-dimensional marginal density functions obtained by the EBE algorithm (first column), which solves a five-dimensional problem accounting for the automatically selected 91 out of the prescribed 125 moments, and the two-dimensional kernel density estimate (second column).
References

[1] R. Abramov, The multidimensional moment-constrained maximum entropy algorithm, https://github.com/rafail-abramov/MaxEntMC, 2007.
[2] ———, The multidimensional moment-constrained maximum entropy problem: A BFGS algorithm with constraint scaling, Journal of Computational Physics (2009), no. 1, 96–108.
[3] R. Abramov, A. Majda, and R. Kleeman, Information theory and predictability for low-frequency variability, Journal of the Atmospheric Sciences (2005), no. 1, 65–87.
[4] R.V. Abramov, An improved algorithm for the multidimensional moment-constrained maximum entropy problem, Journal of Computational Physics (2007), no. 1, 621–644.
[5] ———, The multidimensional maximum entropy moment problem: A review of numerical methods, Communications in Mathematical Sciences (2010), no. 2, 377–392.
[6] Timm Ahrendt, Fast computations of the exponential function, Annual Symposium on Theoretical Aspects of Computer Science, Springer, 1999, pp. 302–312.
[7] D. Bates, J. Hauenstein, A. Sommese, and C. Wampler, Numerically solving polynomial systems with Bertini, vol. 25, SIAM, 2013.
[8] M. Frontini and A. Tagliani, Maximum entropy in the finite Stieltjes and Hamburger moment problem, Journal of Mathematical Physics (1994), no. 12, 6748–6756.
[9] T. Gerstner and M.L. Griebel, Numerical integration using sparse grids, Numerical Algorithms (1998), no. 3-4, 209–232.
[10] W. Hao and J. Harlim, Supplementary material: MATLAB software for the equation-by-equation method for solving the maximum entropy problem, https://github.com/whao2008/EBE, 2018.
[11] ———, Supplementary movie for the equation-by-equation method for solving the maximum entropy problem, 2018.
[12] W. Hao, J. Hauenstein, B. Hu, Y. Liu, A. Sommese, and Y.-T. Zhang, Continuation along bifurcation branches for a tumor model with a necrotic core, Journal of Scientific Computing (2012), no. 2, 395–413.
[13] K. Haven, A. Majda, and R. Abramov, Quantifying predictability through information theory: small sample estimation in a non-Gaussian framework, Journal of Computational Physics (2005), no. 1, 334–362.
[14] E.T. Jaynes, Information theory and statistical mechanics, Physical Review (1957), no. 4, 620.
[15] A.-K. Kassam and L.N. Trefethen, Fourth-order time-stepping for stiff PDEs, SIAM Journal on Scientific Computing (2005), no. 4, 1214–1233.
[16] A. Leykin, J. Verschelde, and A. Zhao, Newton's method with deflation for isolated singularities of polynomial systems, Theoretical Computer Science (2006), no. 1, 111–122.
[17] MathWorks, MATLAB fsolve.m.
[18] J.D. McCalpin, The statistics and sensitivity of a double-gyre model: The reduced-gravity, quasigeostrophic case, Journal of Physical Oceanography (1995), no. 5, 806–824.
[19] J.D. McCalpin and D.B. Haidvogel, Phenomenology of the low-frequency variability in a reduced-gravity, quasigeostrophic double-gyre model, Journal of Physical Oceanography (1996), no. 5, 739–752.
[20] L.R. Mead and N. Papanicolaou, Maximum entropy in the problem of moments, Journal of Mathematical Physics (1984), no. 8, 2404–2417.
[21] E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics (1962), no. 3, 1065–1076.
[22] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, The Annals of Mathematical Statistics (1956), no. 3, 832–837.
[23] S.A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, Dokl. Akad. Nauk SSSR, vol. 4, 1963, p. 123.
[24] A. Sommese and C. Wampler, The numerical solution of systems of polynomials arising in engineering and science, vol. 99, World Scientific, 2005.
[25] L.N. Trefethen, Is Gauss quadrature better than Clenshaw–Curtis?, SIAM Review (2008), no. 1, 67–87.
[26] X. Wu, Calculation of maximum entropy densities with application to income distribution, Journal of Econometrics (2003), no. 2, 347–354.
[27] Z. Wu, G.N. Phillips Jr, R. Tapia, and Y. Zhang, A fast Newton algorithm for entropy maximization in phase determination, SIAM Review (2001), no. 4, 623–642.

Department of Mathematics, the Pennsylvania State University, University Park, PA
E-mail address: [email protected]

Department of Mathematics, Department of Meteorology and Atmospheric Science, the Pennsylvania State University, University Park, PA
E-mail address: [email protected]