Data-based Automatic Discretization of Nonparametric Distributions
Alexis Akira Toda∗
May 13, 2019
Abstract
Although using non-Gaussian distributions in economic models has become increasingly popular, currently there is no systematic way for calibrating a discrete distribution from the data without imposing parametric assumptions. This paper proposes a simple nonparametric calibration method based on the Golub and Welsch (1969) algorithm for Gaussian quadrature. Application to an optimal portfolio problem suggests that assuming Gaussian instead of nonparametric shocks leads to up to 17% overweighting in the stock portfolio because the investor underestimates the probability of crashes.
Keywords: calibration, discrete approximation, Gaussian quadrature.

JEL codes: C63, C65, G11.
∗ Department of Economics, University of California San Diego. Email: [email protected].

This paper studies the following problem, which applied theorists often encounter. A researcher would like to calibrate the parameters of a stochastic model. One of the model inputs is a probability distribution of shocks, which is to be approximated by a discrete distribution. Due to computational considerations, the researcher would like this distribution to have as few support points (nodes) as possible, say five. Given the data of shocks, how should the researcher calibrate the nodes and probabilities of this five-point distribution?

While there are many established methods for discretizing processes with Gaussian shocks such as Tauchen (1986), Tauchen and Hussey (1991), and Rouwenhorst (1995), discretizing non-Gaussian distributions remains relatively unexplored. (See Farmer and Toda (2017) and the references therein for a detailed literature review.) However, it has become increasingly common in economics to study models with non-Gaussian shocks. For example, the rare disasters model (Rietz, 1988; Barro, 2006; Gabaix, 2012) uses rare but large downward jumps to explain asset pricing puzzles. One issue with discretizing non-Gaussian distributions is how to calibrate them. If we have a parametric density, it is possible to discretize it using the Gaussian quadrature as in Miller and Rice (1983) or the maximum entropy method of Tanaka and Toda (2013). This paper proposes a simple method for constructing an N-point distribution that approximates the data well without imposing parametric assumptions. Although the degrees of freedom of an N-point distribution are large (2N − 1), to construct an N-point Gaussian quadrature with some weighting function using the Golub and Welsch (1969) algorithm, one only needs to know the moments of the weighting function up to order 2N. Therefore a natural way to discretize a nonparametric distribution is simply to feed the 2N sample moments into the Golub-Welsch algorithm.
Since this method does not involve optimization (it is a matter of solving for the eigenvalues/eigenvectors of a sparse matrix), the implementation is easy and fast.

As an application, I discretize the U.S. historical stock returns data and solve an optimal portfolio problem with constant relative risk aversion utility. I consider two cases in which the investor uses the nonparametric and Gaussian densities. I show that when the investor incorrectly believes that the stock returns distribution is lognormal, the stock portfolio is overweighted by up to 17% because he underestimates the probability of crashes. These examples show that the choice of the calibration method may matter quantitatively.

The closest paper to mine is Miller and Rice (1983), who use the Gaussian quadrature to discretize distributions. While they consider only the discretization of parametric distributions, my focus is on the discretization of nonparametric distributions estimated from data. Tanaka and Toda (2013) consider the discretization of distributions on preassigned nodes by matching the moments using the maximum entropy principle, and Tanaka and Toda (2015) prove convergence and obtain an error estimate. Farmer and Toda (2017) consider the discretization of general non-Gaussian Markov processes by applying the Tanaka-Toda method to conditional distributions. In one of the applications, they discretize a nonparametric density on a preassigned grid by approximating it with a Gaussian mixture. Since computing the nodes and weights of Gaussian quadrature does not require optimizing over parameters (unlike the maximum likelihood estimation of Gaussian mixture parameters or solving the maximum entropy problem), my method is easier and faster to implement, and the grid is chosen endogenously. On the other hand, the Farmer and Toda (2017) method can discretize general Markov processes, whereas the proposed method in this paper is designed to discretize a single distribution.
Suppose for the moment that the nonparametric density f(x) is known. Since stochastic models often involve expectations, we would like to find nodes {x_n}_{n=1}^N and weights {w_n}_{n=1}^N such that

    \mathrm{E}[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \,\mathrm{d}x \approx \sum_{n=1}^{N} w_n g(x_n), \tag{2.1}

where g is a general integrand and X is a random variable with density f(x). The right-hand side of (2.1) defines an N-point quadrature formula. When (2.1) is exact (i.e., ≈ becomes =) for all polynomials of degree up to D, we say that the quadrature formula has degree of exactness D. Since the degree of freedom in an N-point quadrature formula is 2N (because there are N nodes and N weights), we cannot expect to integrate more than the 2N monomials 1, x, ..., x^{2N−1} exactly. When the quadrature formula (2.1) is exact for these monomials, or equivalently when it has degree of exactness 2N − 1, we call the formula Gaussian. The following Golub-Welsch algorithm provides an efficient way to compute the nodes and weights of the Gaussian quadrature. (Appendix B provides more theoretical background.)
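To make the notion of degree of exactness concrete, the following sketch checks it numerically for the classical Gauss-Legendre rule, which corresponds to the weighting function w(x) = 1 on [−1, 1]. The rule and the NumPy routine used here are illustrative choices, not part of the paper:

```python
import numpy as np

# N-point Gauss-Legendre rule: weighting function w(x) = 1 on [-1, 1]
N = 5
nodes, weights = np.polynomial.legendre.leggauss(N)

# Exact moments: the integral of x^k over [-1, 1] is 2/(k+1) for even k, 0 for odd k
def exact_moment(k):
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

# The rule is exact for all monomials of degree up to 2N - 1 = 9 ...
for k in range(2 * N):
    assert abs(weights @ nodes**k - exact_moment(k)) < 1e-12

# ... but not for degree 2N = 10, so the degree of exactness is exactly 2N - 1
print(abs(weights @ nodes**(2 * N) - exact_moment(2 * N)))  # noticeably nonzero
```

The same check works for any N: the quadrature error is at machine precision through degree 2N − 1 and jumps to a macroscopic value at degree 2N.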
Algorithm 1 (Golub and Welsch, 1969).
1. Select a number of quadrature nodes N ∈ ℕ.

2. For k = 0, 1, ..., 2N, compute the k-th moment of the density, m_k = \int x^k f(x) \,\mathrm{d}x.

3. Define the matrix of moments M = (M_{ij})_{1 \le i,j \le N+1} by M_{ij} = m_{i+j-2}.

4. Compute the Cholesky factorization M = R'R. Let R = (r_{ij})_{1 \le i,j \le N+1}.

5. Define \alpha_1 = r_{12}/r_{11}, \alpha_n = r_{n,n+1}/r_{nn} - r_{n-1,n}/r_{n-1,n-1} (n = 2, ..., N), and \beta_n = r_{n+1,n+1}/r_{nn} (n = 1, ..., N - 1). Define the N × N symmetric tridiagonal matrix

    T_N = \begin{pmatrix}
    \alpha_1 & \beta_1 & & & 0 \\
    \beta_1 & \alpha_2 & \beta_2 & & \\
    & \beta_2 & \alpha_3 & \ddots & \\
    & & \ddots & \ddots & \beta_{N-1} \\
    0 & & & \beta_{N-1} & \alpha_N
    \end{pmatrix}. \tag{2.2}

6. Compute the eigenvalues {x_n}_{n=1}^N of T_N and the corresponding eigenvectors {v_n}_{n=1}^N. Then {x_n}_{n=1}^N are the nodes of the Gaussian quadrature and the weights {w_n}_{n=1}^N are given by w_n = m_0 v_{n1}^2 / \|v_n\|^2 > 0, where v_n = (v_{n1}, ..., v_{nN})'.

Once we compute the nodes {x_n}_{n=1}^N and weights {w_n}_{n=1}^N, we can use them as the discrete approximation of the density f.

Note that the only inputs to the Golub-Welsch algorithm 1 are the number of nodes N and the moments m_k = \int x^k f(x) \,\mathrm{d}x of the density f, where k = 0, ..., 2N. Therefore a natural idea for discretizing a nonparametric density given the data {x_i}_{i=1}^I is simply to feed the sample moments into the Golub-Welsch algorithm 1. Summarizing the above observations, we obtain the following algorithm for the data-based automatic discretization of nonparametric distributions.

Algorithm 2 (Automatic discretization of nonparametric distributions).
1. Given the data {x_i}_{i=1}^I and the desired number of discrete points N, for k = 0, ..., 2N compute the k-th sample moment

    \hat{m}_k = \frac{1}{I} \sum_{i=1}^{I} x_i^k. \tag{2.3}

2. Feed these moments {\hat{m}_k}_{k=0}^{2N} into the Golub-Welsch algorithm 1 to compute the nodes {\bar{x}_n}_{n=1}^N and weights {w_n}_{n=1}^N. The desired discretization assigns probability w_n to the point \bar{x}_n.

Because the N-point Gaussian quadrature has degree of exactness 2N − 1, the resulting N-point discretization matches the sample moments of the data up to order 2N − 1. Since the sample moment \hat{m}_k has the minimum mean-squared error among all estimates of the form \sum_{i=1}^{I} a_i x_i^k, where {a_i}_{i=1}^I are some weights, Algorithm 2 is in a sense optimal.

Appendix A shows that the accuracy of the proposed method exceeds that of using a parametric distribution when the latter is misspecified.

In this section I illustrate the usefulness of the proposed method using minimal economic examples. Consider a CRRA investor with relative risk aversion γ > 0. Letting R be the gross stock return, R_f > 0 the gross risk-free rate, and θ the fraction of wealth (portfolio share) invested in the stock, the investor's optimal portfolio problem is

    \max_\theta \frac{1}{1-\gamma} \mathrm{E}[(R\theta + R_f(1-\theta))^{1-\gamma}]. \tag{3.1}

I obtain the annual data on U.S. nominal stock returns, risk-free rate, and inflation for the period 1927–2016 from the spreadsheet of Amit Goyal. For the stock returns I use the CRSP volume-weighted index including dividends. (The spreadsheet is at . Using monthly or quarterly data gives qualitatively similar results, though slightly less extreme quantitatively.) I apply Algorithm 2 to the log excess returns log R − log R_f to obtain a discrete distribution with nodes {\bar{x}_n}_{n=1}^N and weights {w_n}_{n=1}^N, where I choose the number of points N = 5 (increasing the number of points further does not change the results). The gross stock return in state n is defined by R_n = R_f e^{\bar{x}_n}, which occurs with probability w_n. Finally, I numerically solve the optimal portfolio problem (3.1). Figure 1 shows the results as the relative risk aversion varies over a range starting at γ = 1.

[Figure 1(a) ("Histogram and densities"): probability density against log consumption growth; series: Histogram, Nonparametric, Gaussian.]
[Figure 1(b) ("Optimal portfolio"): optimal portfolio against relative risk aversion; series: Nonparametric, Gaussian. Figure 1(c) ("Portfolio error"): portfolio error (%) against relative risk aversion.]

Figure 1: Excess returns distribution and numerical solution.

Figure 1a shows the histogram of the log excess returns distribution as well as the nonparametric kernel density estimator and the Gaussian distribution fitted by maximum likelihood. We can see that the histogram and the nonparametric density have a long left tail corresponding to stock market crashes, which the Gaussian distribution misses. Figure 1b shows the optimal portfolio θ for the two models. We can see that when the investor incorrectly believes that the stock returns distribution is lognormal, he overweights the stock portfolio because he underestimates the probability of crashes. Figure 1c shows the percentage of this overweight (portfolio error), θ_G/θ_NP − 1, where G and NP stand for the Gaussian and nonparametric densities. The portfolio error is substantial, in the range of 4–17%.
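A minimal end-to-end sketch of Algorithms 1 and 2 and the portfolio problem (3.1) follows. Synthetic normal draws stand in for the Goyal data, and the gross risk-free rate and risk aversion are illustrative values; the function names are mine, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def discretize(data, N):
    """Algorithm 2: N-point discretization of data via Golub-Welsch (Algorithm 1)."""
    # Step 1 of Algorithm 2: sample moments m_0, ..., m_{2N}
    m = np.array([np.mean(data**k) for k in range(2 * N + 1)])
    # Steps 3-4 of Algorithm 1: Cholesky factorization of the moment matrix
    M = np.array([[m[i + j] for j in range(N + 1)] for i in range(N + 1)])
    R = np.linalg.cholesky(M).T                     # upper triangular, M = R'R
    # Step 5: recurrence coefficients and the symmetric tridiagonal matrix (2.2)
    alpha = np.empty(N)
    alpha[0] = R[0, 1] / R[0, 0]
    for n in range(1, N):
        alpha[n] = R[n, n + 1] / R[n, n] - R[n - 1, n] / R[n - 1, n - 1]
    beta = np.array([R[n + 1, n + 1] / R[n, n] for n in range(N - 1)])
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # Step 6: nodes are eigenvalues; weights use first eigenvector components
    x, V = np.linalg.eigh(T)                        # eigh returns unit eigenvectors
    return x, m[0] * V[0, :]**2

# Synthetic log excess returns (hypothetical stand-in for the 1927-2016 data)
rng = np.random.default_rng(0)
data = rng.normal(0.06, 0.2, size=1000)

Rf = 1.01                  # hypothetical gross risk-free rate
gamma = 4.0                # illustrative relative risk aversion
xbar, w = discretize(data, N=5)
Rn = Rf * np.exp(xbar)     # gross stock return in state n, with probability w_n

def negEU(theta):          # minus the objective of (3.1)
    return -np.sum(w * (Rn * theta + Rf * (1 - theta))**(1 - gamma)) / (1 - gamma)

theta_star = minimize_scalar(negEU, bounds=(0, 1), method="bounded").x
print(round(w.sum(), 6))   # weights sum to 1 (= m_0)
```

By construction the five nodes and weights match the sample moments of the data up to order 2N − 1; swapping in the actual return series only changes the `data` line.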
Concluding remarks
This paper has proposed a simple, automatic method for discretizing a nonparametric distribution, given the data. Using an asset pricing model and an optimal portfolio problem as a laboratory, I showed that the error from using a parametric distribution (such as the Gaussian distribution) can be substantial.

A natural extension is to consider the discretization of Markov processes with nonparametric shocks. For example, one may be tempted to apply the kernel density estimation and Gaussian quadrature in the Tauchen and Hussey (1991) method to discretize the AR(1) process x_t = ρx_{t−1} + ε_t, where |ρ| < 1 and {ε_t}_{t=0}^∞ are independent and identically distributed according to some probability density function f. However, it is well known that the Tauchen-Hussey method is not accurate when the persistence ρ is moderately high (Flodén, 2008). Using the AR(1) asset pricing model in the Online Appendix of Farmer and Toda (2017) to evaluate the solution accuracy, I found that the Gaussian quadrature-based methods for discretizing Markov processes are even less accurate when the shock distribution is nonparametric. Therefore for such processes, it is preferable to use the Farmer and Toda (2017) maximum entropy method with an evenly-spaced grid (see their Section 4.3.3 for an example).

Finally, although I proposed my method as a tool for discretization, it can also be used as a quadrature method. For example, Pohl et al. (2018) solve the Bansal and Yaron (2004) long run risks model using the projection method and Gauss-Hermite quadrature, but that is because the model is assumed to have Gaussian shocks. If instead a researcher wishes to use nonparametric shocks, my method can be directly used to construct a quadrature rule from data.

References
Ravi Bansal and Amir Yaron. Risks for the long run: A potential resolution of asset pricing puzzles. Journal of Finance, 59(4):1481–1509, August 2004. doi:10.1111/j.1540-6261.2004.00670.x.

Robert J. Barro. Rare disasters and asset markets in the twentieth century. Quarterly Journal of Economics, 121(3):823–866, 2006. doi:10.1162/qjec.121.3.823.

Leland E. Farmer and Alexis Akira Toda. Discretizing nonlinear, non-Gaussian Markov processes with exact conditional moments. Quantitative Economics, 8(2):651–683, July 2017. doi:10.3982/QE737.

Martin Flodén. A note on the accuracy of Markov-chain approximations to highly persistent AR(1) processes. Economics Letters, 99(3):516–520, June 2008. doi:10.1016/j.econlet.2007.09.040.

Xavier Gabaix. Variable rare disasters: An exactly solved framework for ten puzzles in macro-finance. Quarterly Journal of Economics, 127(2):645–700, May 2012. doi:10.1093/qje/qjs001.

Gene H. Golub and John H. Welsch. Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106):221–230, May 1969. doi:10.1090/S0025-5718-69-99647-1.

Allen C. Miller, III and Thomas R. Rice. Discrete approximations of probability distributions. Management Science, 29(3):352–362, March 1983. doi:10.1287/mnsc.29.3.352.

Walter Pohl, Karl Schmedders, and Ole Wilms. Higher-order effects in asset pricing models with long-run risks. Journal of Finance, 73(3):1061–1111, June 2018. doi:10.1111/jofi.12615.

Thomas A. Rietz. The equity risk premium: A solution. Journal of Monetary Economics, 22(1):117–131, July 1988. doi:10.1016/0304-3932(88)90172-9.

K. Geert Rouwenhorst. Asset pricing implications of equilibrium business cycle models. In Thomas F. Cooley, editor, Frontiers of Business Cycle Research, chapter 10, pages 294–330. Princeton University Press, 1995.

Ken'ichiro Tanaka and Alexis Akira Toda. Discrete approximations of continuous distributions by maximum entropy. Economics Letters, 118(3):445–450, March 2013. doi:10.1016/j.econlet.2012.12.020.

Ken'ichiro Tanaka and Alexis Akira Toda. Discretizing distributions with exact moments: Error estimate and convergence analysis. SIAM Journal on Numerical Analysis, 53(5):2158–2177, 2015. doi:10.1137/140971269.

George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986. doi:10.1016/0165-1765(86)90168-0.

George Tauchen and Robert Hussey. Quadrature-based methods for obtaining approximate solutions to nonlinear asset pricing models. Econometrica, 59(2):371–396, March 1991. doi:10.2307/2938261.

Online Appendix

A Accuracy
As in any numerical method, evaluating the accuracy is very important. In this section I evaluate the accuracy of the proposed method using the optimal portfolio problem in Section 3 as a laboratory.

I design the numerical experiment as follows. First I fit a Gaussian mixture distribution with two components to the annual log excess returns data; denote the proportion, mean, and standard deviation of each mixture component by p = (p_j), μ = (μ_j), and σ = (σ_j). Next I compute the theoretical optimal portfolio θ* for γ ∈ {2, 4, 6} using the Gaussian quadrature (Golub-Welsch algorithm 1) for Gaussian mixtures with 11 points. Finally, I generate random numbers from this Gaussian mixture with various sample sizes, discretize these distributions with various methods, and compute the optimal portfolio. I repeat this procedure with M = 1,000 Monte Carlo replications and compute the relative bias and mean absolute error (MAE)

    \mathrm{Bias} = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{\theta}_m/\theta^* - 1 \right), \tag{A.1a}
    \mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| \hat{\theta}_m/\theta^* - 1 \right|, \tag{A.1b}

where \hat{\theta}_m is the optimal portfolio from simulation m and θ* is the theoretical optimal portfolio. For the sample size I consider T = 100, 1,000, 10,000, and for the number of nodes N = 3, 5, 7, 9.

For the discretization method I consider three cases. The first is the nonparametric Gaussian quadrature method (Algorithm 2), which I refer to as "NP-GQ". The second is the Gauss-Hermite quadrature, where the mean and standard deviation are estimated by maximum likelihood. This is the most natural method if the returns distribution is lognormal. The third is the maximum entropy method proposed by Tanaka and Toda (2013, 2015) and Farmer and Toda (2017), into which the kernel density estimator (with Gaussian kernel) is fed; I refer to it as "NP-ME". For this method one needs to assign the grid and the number of moments to match. Following Corollary 3.5 of Farmer and Toda (2017), I use an evenly-spaced grid centered at the sample mean that spans √(N − 1) times the sample standard deviation at both sides, where N is the number of grid points. I match 4 sample moments whenever possible for N ≥ 5, and otherwise I match 2 sample moments (mean and variance). For more details on the exact algorithm, please refer to Tanaka and Toda (2013, 2015) and Sections 2 and 3.2 of Farmer and Toda (2017).

Tables 1 and 2 show the relative bias and mean absolute error of the optimal portfolio, respectively. As expected, the optimal portfolio computed using Gauss-Hermite is biased upwards because it uses the Gaussian distribution, which underestimates the probability of crashes. Among the two nonparametric discretization methods, NP-GQ uniformly outperforms NP-ME in terms of bias and mean absolute error, especially when the sample size is small (T = 100). For N = 3 grid points, in which case it is impossible to match 4 moments with NP-ME, the proposed NP-GQ method performs significantly better. Finally, increasing N beyond 5 does not improve the bias or the mean absolute error for NP-GQ, which suggests that using a five-point distribution is enough (at least for solving this portfolio problem).

Table 1: Relative bias of the optimal portfolio.

Method          NP-GQ                   Gauss-Hermite           NP-ME
T       N    γ=2    γ=4    γ=6       γ=2    γ=4    γ=6       γ=2    γ=4    γ=6
100     3    0.054  0.053  0.053     0.168  0.123  0.109     0.140  0.113  0.105
        5    0.051  0.053  0.053     0.159  0.123  0.109     0.096  0.082  0.077
        7    0.051  0.053  0.053     0.158  0.123  0.109     0.082  0.074  0.070
        9    0.051  0.053  0.053     0.157  0.123  0.109     0.076  0.071  0.068
1,000   3    0.005  0.005  0.005     0.105  0.060  0.047     0.089  0.059  0.050
        5    0.004  0.005  0.005     0.103  0.060  0.047     0.031  0.019  0.016
        7    0.004  0.005  0.005     0.103  0.060  0.047     0.022  0.014  0.011
        9    0.004  0.005  0.005     0.103  0.060  0.047     0.018  0.012  0.010
10,000  3    0.001  0.001  0.001     0.098  0.054  0.041     0.084  0.054  0.045
        5    0.001  0.001  0.001     0.098  0.054  0.041     0.020  0.011  0.008
        7    0.001  0.001  0.001     0.098  0.054  0.041     0.012  0.006  0.004
        9    0.001  0.001  0.001     0.098  0.054  0.041     0.008  0.004  0.003

Note: the table reports the relative bias of the optimal portfolio defined by (A.1a). For discretization methods, "NP-GQ" uses Algorithm 2, "Gauss-Hermite" uses the Gauss-Hermite quadrature (with mean and standard deviation estimated by maximum likelihood), and "NP-ME" uses the maximum entropy method with the kernel density estimator. T is the sample size in each simulation. N is the number of nodes in the quadrature formula. γ is the relative risk aversion. All results are based on 1,000 Monte Carlo replications.
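The Monte Carlo design behind (A.1a)-(A.1b) can be sketched as follows. To keep the sketch self-contained, I use illustrative two-component mixture parameters (not the fitted values) and a direct sample-average estimator in place of the discretization methods compared in the tables, and θ* is approximated by a very large sample rather than the 11-point quadrature:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
p, mu, sd = [0.2, 0.8], [-0.15, 0.10], [0.15, 0.12]   # illustrative mixture
gamma, Rf = 4.0, 1.01                                 # illustrative parameters

def sample(T):
    """Draw T log excess returns from the two-component Gaussian mixture."""
    comp = rng.choice(2, size=T, p=p)
    return rng.normal(np.array(mu)[comp], np.array(sd)[comp])

def optimal_theta(x, w):
    """Maximize (3.1) over theta in [0, 1] for a discrete return distribution."""
    R = Rf * np.exp(x)
    f = lambda th: -np.sum(w * (R * th + Rf * (1 - th))**(1 - gamma)) / (1 - gamma)
    return minimize_scalar(f, bounds=(0, 1), method="bounded").x

# "Theoretical" theta*: a large sample stands in for the 11-point quadrature
big = sample(500_000)
theta_true = optimal_theta(big, np.full(big.size, 1 / big.size))

# Monte Carlo replications: small-sample estimates, then Bias (A.1a) and MAE (A.1b)
M, T = 200, 100
th = np.array([optimal_theta(sample(T), np.full(T, 1 / T)) for _ in range(M)])
bias = np.mean(th / theta_true - 1)
mae = np.mean(np.abs(th / theta_true - 1))
```

Replacing `optimal_theta(sample(T), ...)` with a call to one of the discretization methods (NP-GQ, Gauss-Hermite, NP-ME) reproduces the structure of the experiment behind Tables 1 and 2.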
Fordiscretization methods, “NP-GQ” uses Algorithm 2, “Gauss-Hermite” uses the Gauss-Hermitequadrature (with mean and standard deviation estimated by maximum likelihood), and “NP-ME” uses the maximum entropy method with the kernel density estimator. T is the samplesize in each simulation. N is the number of nodes in the quadrature formula. γ is the relativerisk aversion. All results are based on 1,000 Monte Carlo replications. B Gaussian quadrature
In this appendix we prove some properties of the Gaussian quadrature. For notational simplicity let us omit a, b (so \int means \int_a^b) and assume that \int w(x) x^n \,\mathrm{d}x exists for all n ≥ 0. For functions f, g, define the inner product (f, g) by

    (f, g) = \int_a^b w(x) f(x) g(x) \,\mathrm{d}x. \tag{B.1}

As usual, define the norm of f by \|f\| = \sqrt{(f, f)}. The first step is to construct orthogonal polynomials {p_n(x)}_{n=0}^N corresponding to the inner product (B.1).

Definition 1 (Orthogonal polynomial). The polynomials {p_n(x)}_{n=0}^N are called orthogonal if (i) deg p_n = n and the leading coefficient of p_n is 1, and (ii) for all m ≠ n, we have (p_m, p_n) = 0.

Some authors require that the polynomials are orthonormal, so (p_n, p_n) = 1. In this paper we normalize the polynomials by requiring that the leading coefficient is 1, which is useful for computation.

Table 2: Relative mean absolute error of the optimal portfolio.

Method          NP-GQ                   Gauss-Hermite           NP-ME
T       N    γ=2    γ=4    γ=6       γ=2    γ=4    γ=6       γ=2    γ=4    γ=6
100     3    0.239  0.247  0.249     0.323  0.306  0.301     0.296  0.293  0.292
        5    0.236  0.247  0.249     0.314  0.305  0.301     0.267  0.271  0.270
        7    0.236  0.247  0.249     0.313  0.305  0.301     0.256  0.264  0.265
        9    0.236  0.247  0.249     0.312  0.305  0.301     0.251  0.262  0.263
1,000   3    0.068  0.072  0.073     0.125  0.100  0.095     0.112  0.098  0.095
        5    0.067  0.072  0.073     0.124  0.101  0.095     0.078  0.078  0.078
        7    0.067  0.072  0.073     0.124  0.101  0.095     0.074  0.076  0.076
        9    0.067  0.072  0.073     0.124  0.101  0.095     0.072  0.075  0.076
10,000  3    0.021  0.023  0.023     0.098  0.056  0.045     0.084  0.056  0.048
        5    0.021  0.023  0.023     0.098  0.056  0.045     0.029  0.025  0.025
        7    0.021  0.023  0.023     0.098  0.056  0.045     0.024  0.024  0.024
        9    0.021  0.023  0.023     0.098  0.056  0.045     0.023  0.023  0.023

Note: the table reports the relative mean absolute error of the optimal portfolio defined by (A.1b). See Table 1 for the definition of variables.

The following three-term recurrence relation (TTRR) shows the existence of orthogonal polynomials and provides an explicit algorithm for computing them.
Proposition 2 (Three-term recurrence relation, TTRR). Let p_0(x) = 1, p_1(x) = x - (xp_0, p_0)/\|p_0\|^2, and for n ≥ 1 define

    p_{n+1}(x) = \left( x - \frac{(xp_n, p_n)}{\|p_n\|^2} \right) p_n(x) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}(x). \tag{B.2}

Then p_n(x) is the degree n orthogonal polynomial.

Proof. Let us show by induction on n that (i) p_n is a degree n polynomial with leading coefficient 1, and (ii) (p_n, p_m) = 0 for all m < n. The claim is trivial for n = 0. For n = 1, by construction p_1 is a degree 1 polynomial with leading coefficient 1, and since p_0(x) = 1, we obtain

    (p_1, p_0) = \left( \left( x - \frac{(xp_0, p_0)}{\|p_0\|^2} \right) p_0, p_0 \right) = (xp_0, p_0) - (xp_0, p_0) = 0.

Suppose the claim holds up to n. Then for n + 1, by (B.2) the leading coefficient of p_{n+1} is the same as that of xp_n, which is 1. If m = n, then

    (p_{n+1}, p_n) = \left( \left( x - \frac{(xp_n, p_n)}{\|p_n\|^2} \right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_n \right) = (xp_n, p_n) - (xp_n, p_n) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} (p_{n-1}, p_n) = 0.

If m = n − 1, then

    (p_{n+1}, p_{n-1}) = \left( \left( x - \frac{(xp_n, p_n)}{\|p_n\|^2} \right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_{n-1} \right) = (xp_n, p_{n-1}) - \frac{(xp_n, p_n)}{\|p_n\|^2} (p_n, p_{n-1}) - \|p_n\|^2 = (p_n, xp_{n-1}) - \|p_n\|^2.

Since the leading coefficients of p_n, p_{n−1} are 1, we can write xp_{n−1}(x) = p_n(x) + q(x), where q(x) is a polynomial of degree at most n − 1. Clearly q can be expressed as a linear combination of p_0, p_1, ..., p_{n−1}, so (p_n, q) = 0. Therefore

    (p_{n+1}, p_{n-1}) = (p_n, p_n + q) - \|p_n\|^2 = \|p_n\|^2 + (p_n, q) - \|p_n\|^2 = 0.

Finally, if m < n − 1, then

    (p_{n+1}, p_m) = \left( \left( x - \frac{(xp_n, p_n)}{\|p_n\|^2} \right) p_n - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} p_{n-1}, p_m \right) = (xp_n, p_m) - \frac{(xp_n, p_n)}{\|p_n\|^2} (p_n, p_m) - \frac{\|p_n\|^2}{\|p_{n-1}\|^2} (p_{n-1}, p_m) = (p_n, xp_m) = 0

because xp_m is a polynomial of degree 1 + m < n.

The following lemma shows that a degree n orthogonal polynomial has exactly n real roots (so they are all simple).

Lemma 3. p_n(x) has exactly n real roots on (a, b).

Proof. By the fundamental theorem of algebra, p_n(x) has exactly n roots in ℂ. Suppose on the contrary that p_n(x) has fewer than n real roots on (a, b). Let x_1, ..., x_k (k < n) be those roots at which p_n(x) changes its sign, and let q(x) = (x − x_1) ⋯ (x − x_k). Since p_n(x)q(x) > 0 (or < 0) almost everywhere on (a, b), we have

    (p_n, q) = \int w(x) p_n(x) q(x) \,\mathrm{d}x \neq 0.

On the other hand, since deg q = k < n, we have (p_n, q) = 0, which is a contradiction.

The following theorem shows that using the N roots of the degree N orthogonal polynomial p_N(x) as quadrature nodes and choosing specific weights, we can integrate all polynomials of degree up to 2N − 1 exactly.

Theorem 4 (Gaussian quadrature). Let a < x_1 < ⋯ < x_N < b be the N roots of the degree N orthogonal polynomial p_N and define

    w_n = \int w(x) L_n(x) \,\mathrm{d}x

for n = 1, ..., N, where

    L_n(x) = \prod_{m \neq n} \frac{x - x_m}{x_n - x_m}

is the degree N − 1 polynomial that takes value 1 at x_n and 0 at x_m (m ∈ {1, ..., N} \ {n}). Then

    \int w(x) p(x) \,\mathrm{d}x = \sum_{n=1}^{N} w_n p(x_n) \tag{B.3}

for all polynomials p(x) of degree up to 2N − 1.

Proof. Since deg p ≤ 2N − 1 and deg p_N = N, we can write

    p(x) = p_N(x) q(x) + r(x),

where deg q, deg r ≤ N − 1. Since q can be expressed as a linear combination of orthogonal polynomials of degree up to N − 1, we have (p_N, q) = 0. Hence

    \int w(x) p(x) \,\mathrm{d}x = (p_N, q) + \int w(x) r(x) \,\mathrm{d}x = \int w(x) r(x) \,\mathrm{d}x.

On the other hand, since {x_n}_{n=1}^N are roots of p_N, we have

    p(x_n) = p_N(x_n) q(x_n) + r(x_n) = r(x_n)

for all n, so in particular

    \sum_{n=1}^{N} w_n p(x_n) = \sum_{n=1}^{N} w_n r(x_n).

Therefore it suffices to show (B.3) for polynomials r of degree up to N − 1. Let us show that

    r(x) = \sum_{n=1}^{N} r(x_n) L_n(x)

identically. To see this, let \tilde{r} be the right-hand side. Since L_n(x_m) = δ_{mn} (Kronecker's delta), we have

    \tilde{r}(x_m) = \sum_{n=1}^{N} r(x_n) L_n(x_m) = \sum_{n=1}^{N} \delta_{mn} r(x_n) = r(x_m),

so r and \tilde{r} agree on the N distinct points {x_n}_{n=1}^N. Since each L_n(x) is a degree N − 1 polynomial and deg r ≤ N − 1, both have degree at most N − 1, so it must be that r = \tilde{r}.

Since r can be represented as a linear combination of the L_n's, it suffices to show (B.3) for all L_n's. But since by definition

    \int w(x) L_n(x) \,\mathrm{d}x = w_n = \sum_{m=1}^{N} w_m \delta_{mn} = \sum_{m=1}^{N} w_m L_n(x_m),

the claim is true.

In practice, how can we compute the nodes {x_n}_{n=1}^N and weights {w_n}_{n=1}^N of the N-point Gaussian quadrature? The solution is given by the following Golub-Welsch algorithm.

Theorem 5 (Golub and Welsch, 1969). For each n ≥ 1, define α_n, β_n by

    \alpha_n = \frac{(xp_{n-1}, p_{n-1})}{\|p_{n-1}\|^2}, \quad \beta_n = \frac{\|p_n\|}{\|p_{n-1}\|} > 0.

Define the N × N symmetric tridiagonal matrix T_N as in (2.2). Then the Gaussian quadrature nodes {x_n}_{n=1}^N are the eigenvalues of T_N. Letting v_n = (v_{n1}, ..., v_{nN})' be an eigenvector of T_N corresponding to the eigenvalue x_n, the weights {w_n}_{n=1}^N are given by

    w_n = \frac{v_{n1}^2}{\|v_n\|^2} \int w(x) \,\mathrm{d}x > 0. \tag{B.4}

Proof.
By (B.2) and the definition of α_n, β_n, for all n ≥ 0 we have

    p_{n+1}(x) = (x - \alpha_{n+1}) p_n(x) - \beta_n^2 p_{n-1}(x).

Note that this is true for n = 0 by defining p_{−1}(x) = 0 and β_0 = 0. For each n, let p^*_n(x) = p_n(x)/\|p_n\| be the normalized orthogonal polynomial. Then the above equation becomes

    \|p_{n+1}\| p^*_{n+1}(x) = \|p_n\| (x - \alpha_{n+1}) p^*_n(x) - \|p_{n-1}\| \beta_n^2 p^*_{n-1}(x).

Dividing both sides by \|p_n\| > 0, using the definition of β_n, β_{n+1}, and rearranging terms, we obtain

    \beta_n p^*_{n-1}(x) + \alpha_{n+1} p^*_n(x) + \beta_{n+1} p^*_{n+1}(x) = x p^*_n(x).

In particular, setting x = x_k (where x_k is a root of p_N), we obtain

    \beta_n p^*_{n-1}(x_k) + \alpha_{n+1} p^*_n(x_k) + \beta_{n+1} p^*_{n+1}(x_k) = x_k p^*_n(x_k)

for all n = 0, ..., N − 1 and k = 1, ..., N. Since β_0 = 0 by definition and p^*_N(x_k) = 0 (since x_k is a root of p_N and hence of p^*_N = p_N/\|p_N\|), letting P(x) = (p^*_0(x), ..., p^*_{N−1}(x))' and collecting the above equations into a vector, we obtain T_N P(x_k) = x_k P(x_k) for k = 1, ..., N.

Define the N × N matrix P by P = (P(x_1), ..., P(x_N)). Then T_N P = P diag(x_1, ..., x_N), so x_1, ..., x_N are eigenvalues of T_N provided that P is invertible. Now since {p^*_n}_{n=0}^{N−1} are normalized and the Gaussian quadrature integrates all polynomials of degree up to 2N − 1 exactly, we have

    \delta_{mn} = (p^*_m, p^*_n) = \int w(x) p^*_m(x) p^*_n(x) \,\mathrm{d}x = \sum_{k=1}^{N} w_k p^*_m(x_k) p^*_n(x_k)

for m, n ≤ N − 1. Letting W = diag(w_1, ..., w_N), this equation becomes P W P' = I. Therefore P, W are invertible and x_1, ..., x_N are eigenvalues of T_N. Solving for W and taking the inverse, we obtain

    W^{-1} = P'P \iff \frac{1}{w_n} = \sum_{k=0}^{N-1} p^*_k(x_n)^2 > 0

for all n. To show (B.4), let v_n be an eigenvector of T_N corresponding to the eigenvalue x_n. Then v_n = cP(x_n) for some constant c ≠ 0. Taking the norm, we obtain

    \|v_n\|^2 = c^2 \|P(x_n)\|^2 = c^2 \sum_{k=0}^{N-1} p^*_k(x_n)^2 = \frac{c^2}{w_n} \iff w_n = \frac{c^2}{\|v_n\|^2}.

Comparing the first element of v_n = cP(x_n), and noting that p_0(x) = 1 and hence p^*_0 = p_0/\|p_0\| = 1/\|p_0\|, we obtain c = v_{n1} \|p_0\| and hence

    c^2 = v_{n1}^2 \|p_0\|^2 = v_{n1}^2 \int w(x) p_0(x)^2 \,\mathrm{d}x = v_{n1}^2 \int w(x) \,\mathrm{d}x.

Substituting into w_n = c^2/\|v_n\|^2 yields (B.4).
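As a numerical sanity check on Theorem 5 (and Algorithm 1), take the weighting function w(x) = 1 on [−1, 1], whose moments are m_k = 2/(k + 1) for even k and 0 for odd k. The resulting 3-point rule should then reproduce the classical Gauss-Legendre nodes ±√(3/5), 0 and weights 5/9, 8/9, 5/9. This is an illustrative test with a function name of my choosing:

```python
import numpy as np

def golub_welsch(m, N):
    """Nodes and weights of the N-point Gaussian quadrature from moments m_0..m_{2N}."""
    # Moment matrix (steps 3-4 of Algorithm 1) and its Cholesky factor M = R'R
    M = np.array([[m[i + j] for j in range(N + 1)] for i in range(N + 1)])
    R = np.linalg.cholesky(M).T
    # Recurrence coefficients and the symmetric tridiagonal matrix (2.2)
    alpha = np.array([R[0, 1] / R[0, 0]] +
                     [R[n, n + 1] / R[n, n] - R[n - 1, n] / R[n - 1, n - 1]
                      for n in range(1, N)])
    beta = np.array([R[n + 1, n + 1] / R[n, n] for n in range(N - 1)])
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # Nodes are eigenvalues; weights from first eigenvector components, as in (B.4)
    x, V = np.linalg.eigh(T)                # eigh returns unit-norm eigenvectors
    return x, m[0] * V[0, :]**2

# Moments of w(x) = 1 on [-1, 1]: m_k = 2/(k+1) for even k, 0 for odd k
N = 3
m = [2 / (k + 1) if k % 2 == 0 else 0.0 for k in range(2 * N + 1)]
x, w = golub_welsch(m, N)
print(np.sort(x))        # nodes: -sqrt(3/5), 0, sqrt(3/5)
print(w[np.argsort(x)])  # weights: 5/9, 8/9, 5/9
```

By (B.4) the weights sum to \int w(x) dx = m_0 = 2, matching the usual Gauss-Legendre normalization.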