Differential equations, splines and Gaussian processes
Lars Lau Raket

February 8, 2021
Abstract
We explore the connections between Green's functions for certain differential equations, covariance functions for Gaussian processes, and the smoothing spline problem. Conventionally, the smoothing spline problem is considered in a setting of reproducing kernel Hilbert spaces, but here we present a more direct approach. With this approach, some choices that are implicit in the reproducing kernel Hilbert space setting stand out, one example being the choice of boundary conditions and more elaborate shape restrictions.

The paper first explores the Laplace operator and the Poisson equation and studies the corresponding Green's functions under various boundary conditions and constraints. Explicit functional forms are derived in a range of examples. These examples include several novel forms of the Green's function that, to the author's knowledge, have not previously been presented. Next we present a smoothing spline problem where we penalize the integrated squared derivative of the function to be estimated. We then show how the solution can be explicitly computed using the Green's function for the Laplace operator. In the last part of the paper, we explore the connection between Gaussian processes and differential equations, and show how the Laplace operator is related to Brownian processes and how processes that arise due to boundary conditions and shape constraints can be viewed as conditional Gaussian processes. The presented connection between Green's functions for the Laplace operator and covariance functions for Brownian processes allows us to introduce several new Brownian processes with specific behaviors. Finally, we consider the connection between Gaussian process priors and smoothing splines.

This paper was originally developed as part of the teaching material for a graduate course in functional data analysis held in 2015 at the Department of Mathematical Sciences, University of Copenhagen.
1 Introduction

Splines are piecewise polynomials that are required to have certain smoothness properties. The introductory quote, by one of the early pioneers in the development of splines, encapsulates much of the feeling of working with splines. Splines are truly wonderful and are applicable to a large body of problems, but understanding the details of how they arise and how they are connected to various branches of mathematics and statistics requires care.

During the last half century, smoothing splines have become a fundamental tool for estimating continuous functions from discretely observed data. In its typical formulation, the smoothing spline problem involves finding the function $\theta$ that minimizes a discrete Gaussian likelihood term that measures the deviation from observed data plus a penalty term that measures the roughness of the function. The conventional choice of roughness measure is in terms of the integral of its squared second derivative. Let $y_1, \dots, y_m$ denote observed data with associated continuous covariate values $t_1, \dots, t_m$. In this note, we will think of the $t$-variables as representing time and denote them as such, but in practice they can represent any continuous covariate. The classical smoothing spline estimator is defined as the minimizer
\[
\hat\theta = \arg\min_\theta \sum_{i=1}^m (y_i - \theta(t_i))^2 + \lambda \int_T |\theta''(t)|^2 \, \mathrm{d}t. \tag{1}
\]
It was shown by Holladay (1957) that the function that minimizes the roughness term $\int_T |\theta''(t)|^2 \, \mathrm{d}t$ under the constraint of interpolating $y_1, \dots, y_m$ at time points $t_1 < \cdots < t_m$ is a cubic spline with the so-called natural boundary conditions $\theta''(t_1) = \theta''(t_m) = 0$. It easily follows that the solution to (1) must be a natural cubic smoothing spline; in fact, one can replace the quadratic data term with any reasonable pointwise data term and the solution will still be a natural cubic smoothing spline. The argument is simple: given any candidate solution $\tilde\theta$, choose $\hat\theta$ as the natural cubic spline that interpolates $\tilde\theta$ in the points $t_1, \dots, t_m$. The pointwise data term will produce the same measure of deviation for $\tilde\theta$ and $\hat\theta$, but the roughness term for the interpolating cubic spline $\hat\theta$ will always be less than or equal to the roughness term for $\tilde\theta$.

Since Holladay's results, much work has been devoted to studying the smoothing spline problem (1) and its generalizations. The theory and methodology for smoothing splines matured greatly when Grace Wahba and collaborators formulated and solved the problem in the setting of reproducing kernel Hilbert spaces (RKHS) (Kimeldorf and Wahba, 1971; Wahba, 1990). While the RKHS approach is powerful, it is not easily accessible in terms of the required level of mathematical skill, and the generality of the approach may lead to underappreciation of some of the finer details of the considered problems. The avid reader may for example have noticed that the smoothing spline problem (1) did not mention the space over which the problem was considered. The classical RKHS approach will simply choose the function space to minimize over as the least restrictive space, but from the perspective of a statistician that choice may not align with the knowledge of the problem at hand. For simplicity, the rest of this paper will explore a slightly simpler smoothing spline problem, namely
\[
\hat\theta = \arg\min_\theta \sum_{i=1}^m (y_i - \theta(t_i))^2 + \lambda \int_T |\theta'(t)|^2 \, \mathrm{d}t, \tag{2}
\]
where the roughness penalty is replaced by the integrated squared derivative of the function.
The solution to this problem will in general be a piecewise linear function with knots in $t_1, \dots, t_m$, as shown by Schoenberg (1964), but we show here how we may get complex behavior of the solution by imposing different criteria on the space we are minimizing over. We even show that by imposing sufficiently strict criteria, we may get pathological behavior of the splines, for example such that they become infinitely smooth polynomials. Most of the approaches and connections presented here will seamlessly transfer to the classical smoothing spline problem (1) and the general L-spline problem. While requiring some determination, explicit forms of various Green's functions defining the solutions to the classical spline problem can be derived (see, e.g., Rytgaard, 2016).

The approach taken here is related to and influenced by a number of previous works. One of them is the work by Markussen (2013), where the connection between Green's functions and smoothing splines is used to develop a framework for approximate inference for mixed smoothing spline models. Along the way, Markussen presents a result (Markussen, 2013, Theorem 1) that gives an explicit (although quite complex) form for the Green's function associated with constant-coefficient linear differential operators under various boundary conditions. This approach was extended to data on multidimensional domains such as $[0,1]^d$ by Raket and Markussen (2014). In that paper, the connection between Green's functions for various differential operators and covariance functions for known Gaussian processes, such as the Brownian sheet, the tied-down Brownian sheet and the Matérn processes, is explored in examples. Finally, parts of the work presented here have been extended to the original smoothing spline problem (1) in Rytgaard (2016). In this work, Rytgaard presents explicit forms for the Green's functions under all combinations of Dirichlet and Neumann boundary conditions along with a number of other extensions involving shape restrictions and alternative likelihood terms.

2 Green's functions and the Poisson equation
In this note we will consider the Poisson equation as our driving example. The equation is given by
\[
-\partial_t^2 f(t) = h(t) \tag{3}
\]
where $-\partial_t^2$ is the Laplace operator $-\partial_t^2 f = -f''$, $f\colon [0,1] \to \mathbb{R}$ is twice differentiable in a suitable sense, and $h \in L^2([0,1])$. Solutions $f$ can be found as double integrals of $h$,
\[
f(t) = -\int_a^t \int_a^{s_2} h(s_1) \, \mathrm{d}s_1 \, \mathrm{d}s_2.
\]
Here we are however not too interested in the specific solutions to the differential equation, but rather in the general problem of inverting the differential operator $-\partial_t^2$. Suppose an inverse operator $\mathcal{G}$ exists; then it should hold that
\[
-\partial_t^2 \mathcal{G} h(t) = h(t),
\]
and thus this operator provides a general method for solving the differential equation (3).

When a differential operator $\mathcal{L}$ is invertible, its inverse $\mathcal{G}$ will be an integral operator with kernel $G$,
\[
\mathcal{G} f(t) = \int G(s, t) f(s) \, \mathrm{d}s. \tag{4}
\]
The kernel $G$ is referred to as the Green's function for $\mathcal{L}$. Green's functions have two important properties that we will use extensively, but not prove here: firstly, Green's functions are symmetric, $G(s,t) = G(t,s)$; and secondly, the Green's function as a function of the individual coordinates, $G(s,\cdot)$, typically obeys the boundary conditions on the space of functions under consideration. In the cases we will consider here, the kernel will be a well-behaved function, but it will sometimes have a more complex structure. In general the Green's function will be a distribution or generalized function. The most basic distribution is the Dirac delta at $t$, $\delta_t$, for which it holds that
\[
\int \delta_t(s) f(s) \, \mathrm{d}s = f(t). \tag{5}
\]
It is important to note here that the above definition of the Dirac delta function only defines it as the point evaluation functional in combination with an inner product. Thus, it does not always have the standard form of the Dirac delta where one would think of it as a function with $\delta_t(s) = 0$ when $s \neq t$ with an infinitely large value at $t$ corresponding to a point mass.
Example 2.1. Using distributions, it is possible to define derivatives of discontinuous functions in a distributional sense. For example, suppose that the constant function $1(t) = 1$ is in the function space we are considering. Then for $a \in [0,1]$,
\[
\int_0^t \delta_a(s) \, \mathrm{d}s = 1_{[a,1]}(t),
\]
where $1_A(t)$ is the indicator function that is 1 when $t \in A$ and 0 otherwise. Thus, in the distributional sense,
\[
\partial_t 1_{[a,1]}(t) = \delta_a(t).
\]
◦

By applying the differential operator $\mathcal{L}$ on both sides of equation (4), we get the important property of Green's functions that
\[
f(t) = \int \mathcal{L} G(s,t) f(s) \, \mathrm{d}s,
\]
which gives rise to an alternative definition of Green's functions: a Green's function $G$ for $\mathcal{L}$ is a distribution for which
\[
(\mathcal{L} G(s, \cdot))(t) = \delta_t(s) \tag{6}
\]
or equivalently $\mathcal{L} G(s, \cdot) = \delta_s$. Again, we note that this definition is in the distributional sense; thus, depending on the function space we are considering, the representation of $\delta_t$ may vary.
Example 2.2. From Example 2.1 we see that $G(s,t) = 1_{[s,1]}(t)$ is a Green's function for $\partial_t$. ◦

Since $\mathcal{L}$ is a linear operator on a function space $H$, it is invertible if and only if its kernel is trivial,
\[
\ker(\mathcal{L}) = \{ f \in H : \mathcal{L} f = o \} = \{ o \},
\]
where $o(t) = 0$ is the zero function.
Example 2.3. The kernel for the Laplace operator on the space of twice continuously differentiable functions $H = C^2([0,1])$ consists of the functions $f$ for which $-\partial_t^2 f(t) = 0$ for all $t$. That is,
\[
\ker(-\partial_t^2) = \{ f \in H : f(t) = at + b \},
\]
and the Laplace operator is not invertible on $H$. ◦

As the previous example showed, we may need to impose extra conditions on the function space $H$ in order to identify a unique Green's function for $\mathcal{L}$. We may however be able to say something about its general form in certain cases.
Example 2.4. Any Green's function for the Laplace operator on a subspace $H$ of $C^2([0,1])$ must satisfy
\[
-\partial_t^2 G(s,t) = \delta_s(t).
\]
Assuming there exists $f \in H$ such that $-\partial_t^2 f = 1$, we can use Example 2.1 to write
\[
-\partial_t^2 G(s,t) = \partial_t 1_{[s,1]}(t).
\]
This means that
\[
\partial_t G(s,t) = -1_{[s,1]}(t) + a_1(s),
\]
so that
\[
G(s,t) = -(t-s) 1_{[s,1]}(t) + a_1(s) t + a_2(s). \tag{7}
\]
Because of the symmetry of $G$, $a_1$ and $a_2$ will generally be first-order affine functions. Again we note that this form is only in the distributional sense, and the exact functional form may differ from this. ◦

The typical solution to the problem of non-trivial kernels is to consider linear subspaces of functions subject to boundary conditions. The two most important types of boundary conditions are the homogeneous Dirichlet boundary conditions
\[
f(0) = f(1) = 0
\]
and the homogeneous Neumann boundary conditions
\[
f'(0) = f'(1) = 0.
\]
Given a function space, both of these types of boundary conditions will produce linear subspaces of the original space. One can also consider higher-order Neumann boundary conditions
\[
f^{(k)}(0) = f^{(k)}(1) = 0,
\]
and combinations of boundary conditions, both in the form of requiring multiple homogeneous boundary conditions to hold simultaneously, or by specifying linear combinations of boundary conditions that should be zero.

Example 2.5. Consider the Laplace operator $-\partial_t^2$ on the subspace $H_0$ of $C^2([0,1])$ consisting of the functions in $C^2([0,1])$ that obey homogeneous Dirichlet boundary conditions. The kernel of $-\partial_t^2$ on $H_0$ is the affine functions for which the boundary conditions hold. Thus $\ker(-\partial_t^2) = \{o\}$ on $H_0$ and the Laplace operator is invertible.

The Green's function is in a space of distributions associated to $H_0$, which means that $G(s,\cdot)$ must obey the boundary conditions on $H_0$. Since the conditions of Example 2.4 hold (see Exercise 1), the Green's function is given by equation (7). By the boundary conditions we have that
\[
0 = G(s, 0) = a_2(s)
\]
and
\[
0 = G(s, 1) = -(1-s) + a_1(s).
\]
Thus,
\[
G(s,t) = -(t-s) 1_{[s,1]}(t) + (1-s)t = s \wedge t - st, \tag{8}
\]
where $\wedge$ denotes the minimum. ◦
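The closed form (8) is easy to sanity-check numerically. Below is a minimal sketch (assuming NumPy is available; the helper name `G_dirichlet` and the test problem are ours, not from the paper): for $-f'' = h$ with $h(s) = \pi^2 \sin(\pi s)$, whose exact solution under homogeneous Dirichlet conditions is $f(t) = \sin(\pi t)$, we approximate $f(t) = \int_0^1 G(s,t) h(s)\,\mathrm{d}s$ by a midpoint rule.

```python
import numpy as np

def G_dirichlet(s, t):
    # Green's function (8): G(s, t) = min(s, t) - s*t
    return np.minimum(s, t) - s * t

# midpoint grid for the quadrature over s
n = 10000
s = (np.arange(n) + 0.5) / n
h = np.pi**2 * np.sin(np.pi * s)  # -f'' = h has solution f(t) = sin(pi*t)

t = np.linspace(0, 1, 11)
# f(t) = int_0^1 G(s, t) h(s) ds, approximated by the midpoint rule
f = np.array([np.mean(G_dirichlet(s, ti) * h) for ti in t])

print(np.max(np.abs(f - np.sin(np.pi * t))))  # small discretization error
```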
Example 2.6. Consider again the Laplace operator, but this time on the subspace $H$ of $f \in C^2([0,1])$ where $f$ obeys homogeneous Neumann boundary conditions. We see that the kernel of $-\partial_t^2$ consists of all constant functions, and thus we need an extra condition to ensure invertibility. A simple choice is to impose the restriction that $f(0) = 0$, which will make the kernel trivial. ◦
Example 2.7. We continue to consider the Laplace operator, this time on the subspace $H$ of functions $f$ subject to the mixed boundary conditions $f(0) = f'(1) = 0$. Clearly the kernel of the Laplace operator is trivial, so a unique Green's function exists. Since there exists $f \in H$ such that $-\partial_t^2 f(t) = 1$ (see Exercise 1), we can use Example 2.4 to give the form of the Green's function. Imposing the boundary conditions on the general form of the Green's function (7) we get that
\[
0 = G(s, 0) = a_2(s)
\]
and
\[
0 = \partial_t G(s, 1) = -1 + a_1(s).
\]
Thus, the Green's function is
\[
G(s,t) = -(t-s) 1_{[s,1]}(t) + t = s \wedge t. \tag{9}
\]
◦
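The same kind of numerical check works for (9); here the test function must obey the mixed conditions. A sketch under the same assumptions as above, using $f(t) = \sin(\pi t/2)$, which satisfies $f(0) = f'(1) = 0$ and $-f'' = (\pi^2/4)\sin(\pi t/2)$:

```python
import numpy as np

n = 10000
s = (np.arange(n) + 0.5) / n  # midpoint grid on [0, 1]
h = (np.pi**2 / 4) * np.sin(np.pi * s / 2)

t = np.linspace(0, 1, 11)
# Green's function (9): G(s, t) = min(s, t)
f = np.array([np.mean(np.minimum(s, ti) * h) for ti in t])

print(np.max(np.abs(f - np.sin(np.pi * t / 2))))  # again ~0
```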
2.3 Finding Green's functions using Fourier series

Recall that the Fourier series of a function $f\colon [0,1] \to \mathbb{R}$ is given by
\[
s_f(t) = \frac{1}{2} a_0 + \sum_{i=1}^\infty a_i \cos(2i\pi t) + \sum_{i=1}^\infty b_i \sin(2i\pi t) \tag{10}
\]
where
\[
a_i = 2 \int_0^1 f(t) \cos(2i\pi t) \, \mathrm{d}t
\quad \text{and} \quad
b_i = 2 \int_0^1 f(t) \sin(2i\pi t) \, \mathrm{d}t.
\]
One idea for finding Green's functions is to use a Fourier series representation for them, and use the differential operator and boundary conditions to identify the coefficients. We have that
\[
-\partial_t^2 s_f(t) = 4\pi^2 \sum_{i=1}^\infty a_i i^2 \cos(2i\pi t) + 4\pi^2 \sum_{i=1}^\infty b_i i^2 \sin(2i\pi t). \tag{11}
\]
The sine and cosine functions make up an orthogonal basis for $L^2([0,1])$:
\[
\int_0^1 \cos(2i\pi t)\cos(2j\pi t) \, \mathrm{d}t =
\begin{cases}
1 & \text{for } i = j = 0 \\
1/2 & \text{for } i = j \neq 0 \\
0 & \text{otherwise},
\end{cases}
\qquad
\int_0^1 \sin(2i\pi t)\sin(2j\pi t) \, \mathrm{d}t =
\begin{cases}
1/2 & \text{for } i = j \neq 0 \\
0 & \text{otherwise},
\end{cases}
\]
and the inner product between any sine and cosine function is zero. To identify the coefficients for the solution $s_f$ of the differential equation (3),
\[
-\partial_t^2 s_f(t) = h(t),
\]
we can multiply both sides by respectively $\cos(2j\pi t)$ and $\sin(2j\pi t)$ and integrate. In the first case, we get
\[
4\pi^2 a_j j^2 \int_0^1 \cos^2(2j\pi t) \, \mathrm{d}t = \int_0^1 \cos(2j\pi t) h(t) \, \mathrm{d}t.
\]
Solving this in $a_j$ gives
\[
a_j = \frac{1}{2j^2\pi^2} \int_0^1 \cos(2j\pi t) h(t) \, \mathrm{d}t,
\]
and a similar result holds for $b_j$. By rearranging, the solution $s_f$ thus has the form
\[
s_f(t) = \frac{1}{2} a_0 + \int_0^1 \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \big( \cos(2i\pi s)\cos(2i\pi t) + \sin(2i\pi s)\sin(2i\pi t) \big) h(s) \, \mathrm{d}s. \tag{12}
\]
Comparing this to the definition of the Green's function (4), it is not too hard to see why this may be useful for finding Green's functions.

Example 2.8 (Constant term). From the definition of $a_0$, we see that $a_0 = 0$ when
\[
\int_0^1 f(t) \, \mathrm{d}t = 0.
\]
Thus, if we consider the linear subspace $H$ of $f \in C^2([0,1])$ with $\int_0^1 f(t)\,\mathrm{d}t = 0$, the Green's function has the form
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \big( \cos(2i\pi s)\cos(2i\pi t) + \sin(2i\pi s)\sin(2i\pi t) \big). \tag{13}
\]
On the other hand, assume that $a_0 \neq 0$. Using integration by parts twice, the first term can be written on the form
\[
\frac{1}{2} a_0 = \int_0^1 f(s) \, \mathrm{d}s
= \big[ f(s)(s + c_1) \big]_0^1 - \Big[ f'(s)\Big(\tfrac{s^2}{2} + c_1 s + c_2\Big) \Big]_0^1 + \int_0^1 \Big(\tfrac{s^2}{2} + c_1 s + c_2\Big) f''(s) \, \mathrm{d}s
\]
\[
= f(1)(1 + c_1) - f(0) c_1 - f'(1)\Big(\tfrac12 + c_1 + c_2\Big) + f'(0) c_2 - \int_0^1 \Big(\tfrac{s^2}{2} + c_1 s + c_2\Big) h(s) \, \mathrm{d}s,
\]
where it has been used that $f''(s) = -h(s)$. Thus, if we can make the first four terms disappear, the Green's function has the form
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \big( \cos(2i\pi s)\cos(2i\pi t) + \sin(2i\pi s)\sin(2i\pi t) \big) - \Big(\tfrac{s^2}{2} + c_1(t) s + c_2(t)\Big). \qquad ◦
\]

We note once again that the found forms of the Green's functions are only to be understood in the distributional sense, that is, they give the right results under the integration. But as a function of $t$ the given form of a Green's function will typically not belong to $H$. We end this section with an example that demonstrates how to find the closed-form solution of the cosine series in the general form of the Green's function (13).

Example 2.9 (Cosine series). Consider the cosine series
\[
c(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \cos(2i\pi s)\cos(2i\pi t).
\]
From the product-to-sum formula, we get that
\[
\cos(2i\pi s)\cos(2i\pi t) = \frac12 \big( \cos(2i\pi(s-t)) + \cos(2i\pi(s+t)) \big). \tag{14}
\]
We can thus rewrite the representation
\[
c(s,t) = \sum_{i=1}^\infty \frac{1}{4i^2\pi^2} \cos(2i\pi(s-t)) + \sum_{i=1}^\infty \frac{1}{4i^2\pi^2} \cos(2i\pi(s+t)).
\]
Since the cosine function is even, $\cos(2i\pi(s-t)) = \cos(2i\pi|s-t|)$. On the other hand, since the cosine function has period $2\pi$, $\cos(2i\pi(s+t)) = \cos(2i\pi(s + t - 1_{[1,2]}(s+t)))$.

Using some complex analysis and properties of dilogarithms, one can show that for $s \in [0,1]$,
\[
\sum_{i=1}^\infty \frac{1}{4i^2\pi^2} \cos(2i\pi s) = \frac14 \Big( s(s-1) + \frac16 \Big). \tag{15}
\]
Putting it all together, we get that
\[
c(s,t) = \frac{1}{12} + \frac12 \big( s(s-1) + t(t-1) + s \wedge t \big) - \frac12 1_{[1,2]}(s+t)(s+t-1), \tag{16}
\]
which can also be written on the form
\[
c(s,t) =
\begin{cases}
\frac{1}{12} + \frac12 \big( s(s-1) + t(t-1) + s \wedge t \big) & \text{for } 0 \le s \le 1-t \\
\frac{1}{12} + \frac12 \big( s(s-1) + t(t-1) + 1 - s \vee t \big) & \text{for } 1-t < s \le 1,
\end{cases}
\]
where $\vee$ denotes the maximum. ◦
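The dilogarithm identity (15) is also easy to check by truncating the series. A minimal sketch (assuming NumPy; the truncation order is an arbitrary choice of ours):

```python
import numpy as np

s = np.linspace(0, 1, 101)
i = np.arange(1, 10001)[:, None]  # truncate the series at 10^4 terms

series = np.sum(np.cos(2 * i * np.pi * s) / (4 * i**2 * np.pi**2), axis=0)
closed = (s * (s - 1) + 1 / 6) / 4  # right-hand side of (15)

print(np.max(np.abs(series - closed)))  # ~1e-6, the truncation error
```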
The approach for finding Green's functions for the Poisson equation presented in the previous section is very general. To find the Green's function $G$ for $-\partial_t^2$ on some subspace $H$, we may formulate the constraints of the subspace in terms of the Fourier basis functions.

Example 2.10. Consider the space $H$ of $f \in C^2([0,1])$ subject to homogeneous Dirichlet boundary conditions. In terms of the Fourier series (10), the boundary conditions translate to
\[
0 = s_f(0) = \frac12 a_0 + \sum_{i=1}^\infty a_i
\quad \text{and} \quad
0 = s_f(1) = \frac12 a_0 + \sum_{i=1}^\infty a_i.
\]
Thus, the representation of $f$ becomes
\[
s_f(t) = \sum_{i=1}^\infty a_i \big( \cos(2i\pi t) - 1 \big) + \sum_{i=1}^\infty b_i \sin(2i\pi t),
\]
and the Green's function correspondingly has the form
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \Big( \big(\cos(2i\pi s) - 1\big)\big(\cos(2i\pi t) - 1\big) + \sin(2i\pi s)\sin(2i\pi t) \Big).
\]
From Example 2.5 we know that this series must equal $s \wedge t - st$, but we can also show that explicitly. The product-to-sum formulas for the sine functions give that
\[
\sin(2i\pi s)\sin(2i\pi t) = \frac12 \big( \cos(2i\pi(s-t)) - \cos(2i\pi(s+t)) \big). \tag{17}
\]
The cosine term $(\cos(2i\pi s) - 1)(\cos(2i\pi t) - 1)$ equals
\[
\cos(2i\pi s)\cos(2i\pi t) + 1 - \cos(2i\pi s) - \cos(2i\pi t),
\]
so using the cosine product-to-sum formula (14), we get that the Green's function has the form
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \big( \cos(2i\pi(s-t)) + 1 - \cos(2i\pi s) - \cos(2i\pi t) \big),
\]
which, using the cosine-series result (15), gives the explicit form
\[
G(s,t) = \frac12 \big( |s-t|(|s-t|-1) - s(s-1) - t(t-1) \big),
\]
which can easily be manipulated to give the form
\[
G(s,t) = s \wedge t - st. \qquad ◦
\]

Example 2.11 (Zero coefficients). Consider (the closure of) a linear subspace $H$ of $C^k$ spanned by Fourier basis functions subject to the constraints
\[
a_{i_1} = 0, \dots, a_{i_\ell} = 0.
\]
Suppose that $-\partial_t^2$ is invertible on $H$. From (11), we see that any admissible function $h$ on the left-hand side of the differential equation (3) will also have zero coefficients for the corresponding basis functions. Thus, due to the orthogonality, all the corresponding terms will disappear from the series inside the integral in (12). ◦

Example 2.12 (Linear combinations of coefficients). Consider (the closure of) a subspace $H$ of $C^k$ spanned by the Fourier basis functions subject to a linear constraint on the coefficients
\[
\sum_{i=0}^\infty c_i a_i = 0, \quad c_i \in \mathbb{R}.
\]
Suppose that $-\partial_t^2$ is invertible on $H$, and for simplicity assume that $c_0 = 0$, so we have no constraint on $a_0$. If we consider the general form of the Green's function (13), we see that it will generally not be in $H$, since this would require that
\[
\sum_{i=1}^\infty \frac{c_i}{2i^2\pi^2} \cos(2i\pi t) = 0.
\]
To correct for this, consider the compensating term
\[
G_c(s,t) = \sum_{i=1}^\infty \frac{c_i}{4i^2\pi^2} \cos(2i\pi s)\cos(2i\pi t).
\]
We have that
\[
-\partial_t^2 G_c(s,t) = \sum_{i=1}^\infty c_i \cos(2i\pi s)\cos(2i\pi t).
\]
Using the Fourier series representation $s_f$ of a function $f \in H$, we see that
\[
\int_0^1 -\partial_t^2 G_c(s,t) \, s_f(s) \, \mathrm{d}s = \frac12 \sum_{i=1}^\infty c_i a_i = 0.
\]
Thus, we can subtract $G_c$ from the general form of the Green's function without it having any effect in the distributional sense. ◦

Example 2.13 (Balanced, periodic functions). Consider the subspace $H$ of functions $f \in C^2([0,1])$ with $\int_0^1 f(t)\,\mathrm{d}t = 0$. The kernel of the Laplace operator on $H$ is
\[
\ker(-\partial_t^2) = \{ f \in H : f(t) = a(t - 1/2) \},
\]
so we need one additional constraint. The least restricting constraint is that $f(0) = f(1)$, so we restrict ourselves to those functions. The integral constraint is equivalent to $a_0 = 0$ in the Fourier expansion (10), and all the Fourier basis functions obey the constraint of periodicity (that the endpoints are the same). Thus, the Green's function is on the form
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \big( \cos(2i\pi s)\cos(2i\pi t) + \sin(2i\pi s)\sin(2i\pi t) \big), \tag{18}
\]
which we can rewrite using the sine and cosine product-to-sum formulas as we did in Example 2.10. This gives the representation
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{2i^2\pi^2} \cos(2i\pi(s-t)),
\]
and by the cosine-series result (15) the explicit form
\[
G(s,t) = \frac12 |s-t|^2 - \frac12 |s-t| + \frac{1}{12}.
\]
We note that this Green's function is equivalent to the covariance function of the periodic Brownian motion constructed by Mumford and Desolneux (2010, Section 6.6.1). ◦
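Anticipating the connection to Gaussian processes developed in Section 4, this closed form can be used directly as a covariance function. The sketch below (assuming NumPy; the grid size and diagonal jitter are arbitrary choices of ours) draws sample paths of the periodic Brownian motion and checks that each path has $x(0) = x(1)$:

```python
import numpy as np

t = np.linspace(0, 1, 401)
d = np.abs(t[:, None] - t[None, :])
K = 0.5 * d**2 - 0.5 * d + 1 / 12  # the Green's function of Example 2.13

rng = np.random.default_rng(1)
# small diagonal jitter so that the Cholesky factorization is stable
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(t)))
x = L @ rng.standard_normal((len(t), 3))  # three sample paths

print(np.abs(x[0] - x[-1]).max())  # ~0: the paths are periodic
```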
Example 2.14 (Odd functions). Consider the space of functions $H \subset C^2([0,1])$ that are odd in the sense that $f(t) = -f(1-t)$ for $t \in [0, 1/2]$. Under the oddness constraint, the additional constraints $f(0) = f(1)$, $f(0) = 0$ and $f(1) = 0$ are all equivalent. Let $H$ be the restriction to these functions.

The oddness implies that $\int_0^1 f(t)\,\mathrm{d}t = 0$, so the Green's function must have a similar form to the previously found Green's function (18). The terms that are left out are exactly all the cosine functions, since they are even. Using the product-to-sum formula for sine, the Green's function is found to be
\[
G(s,t) = \sum_{i=1}^\infty \frac{1}{4i^2\pi^2} \big( \cos(2i\pi(s-t)) - \cos(2i\pi(s+t)) \big).
\]
We see that this must equal
\[
G(s,t) = \frac14 \Big( |s-t|(|s-t|-1) - \big(s+t-1_{[1,2]}(s+t)\big)\big(s+t-1_{[1,2]}(s+t)-1\big) \Big)
=
\begin{cases}
\frac12 (s \wedge t) - st & \text{for } s+t \le 1 \\
s \wedge t + \frac12 (s \vee t) - st - \frac12 & \text{for } s+t > 1.
\end{cases}
\]
◦
Example 2.15 (Mixed boundary conditions). In this example we will consider the Laplace operator $-\partial_t^2$ on function spaces $H$ consisting of functions $f \in C^2([0,1])$ subject to the mixed boundary conditions $f(0) = f'(1) = 0$ and possibly other conditions. In the case where no other conditions are imposed, we have already seen that the Green's function is
\[
G(s,t) = s \wedge t.
\]
If we add the constraint that
\[
\int_0^1 f(t) \, \mathrm{d}t = 0,
\]
we see that the previous Green's function no longer works, since
\[
\int_0^1 G(s,t) \, \mathrm{d}t = s - \frac{s^2}{2} \neq 0.
\]
On the other hand, we see that any constant multiplied onto $f$ will integrate to zero, and thus, looking at the definition of the Green's function, it seems natural to add a quadratic polynomial to the Green's function (which will become a constant after applying the Laplace operator). In particular, we choose the polynomial $p_s(t) = a_1(s) t^2 + a_2(s) t + a_3(s)$ that obeys the boundary conditions and integrates to $s - s^2/2$. We easily see that $a_3(s) = 0$, and from the boundary condition at 1,
\[
a_2(s) = -2 a_1(s).
\]
We now simply need to determine $a_1(s)$ such that
\[
s - \frac{s^2}{2} = \int_0^1 p_s(t) \, \mathrm{d}t = \int_0^1 a_1(s) t^2 - 2 a_1(s) t \, \mathrm{d}t = -\frac23 a_1(s).
\]
We see that $a_1(s) = -\frac34 (2s - s^2)$, and thus the Green's function becomes
\[
G(s,t) = s \wedge t - \frac34 (2s - s^2)(2t - t^2). \qquad ◦
\]
Example 2.16 (Dirichlet boundary). In this example we will consider the Laplace operator $-\partial_t^2$ on function spaces $H$ consisting of functions $f \in C^2([0,1])$ subject to homogeneous Dirichlet boundary conditions $f(0) = f(1) = 0$ and possibly other conditions. In the simplest case we saw that the Green's function was
\[
G(s,t) = s \wedge t - st. \tag{19}
\]
Consider the additional constraint
\[
\int_0^1 f(t) \, \mathrm{d}t = 0.
\]
As in the previous example, it seems natural to subtract a polynomial that is second order in $s$ and $t$ from the Green's function (19) to find a new Green's function. We see that
\[
\int_0^1 (s \wedge t - st) \, \mathrm{d}t = \frac12 (1-s) s.
\]
From the boundary conditions, the polynomial must be of the form $p_s(t) = a_1(s) t - a_1(s) t^2$. We have that
\[
\int_0^1 a_1(s)(t - t^2) \, \mathrm{d}t = \frac16 a_1(s),
\]
so $a_1(s) = 3(1-s)s$. The Green's function becomes
\[
G(s,t) = s \wedge t - st - 3(1-s)s(1-t)t. \qquad ◦
\]
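Both constrained Green's functions can be checked numerically: they keep their boundary conditions and now integrate to zero in each argument. A small sketch (assuming NumPy; the values of $s$ are arbitrary test points of ours):

```python
import numpy as np

n = 100000
t = (np.arange(n) + 0.5) / n  # midpoint grid on [0, 1]

for s in (0.2, 0.5, 0.8):
    G15 = np.minimum(s, t) - 0.75 * (2*s - s**2) * (2*t - t**2)  # Example 2.15
    G16 = np.minimum(s, t) - s*t - 3*(1 - s)*s*(1 - t)*t         # Example 2.16
    # int_0^1 G(s, t) dt by the midpoint rule; both should vanish
    print(s, G15.mean(), G16.mean())
```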
Example 2.17 (Second order polynomials). Consider the constraint that $f'''(t) = 0$. The subspace of such functions is the second order polynomials. Now let $H$ be the space of second order polynomials subject to the mixed boundary conditions $f(0) = f'(1) = 0$. This means that any function $f \in H$ has the form
\[
f(t) = a(t^2 - 2t), \quad a \in \mathbb{R}.
\]
The Green's function should obey that
\[
\int_0^1 -\partial_t^2 G(s,t) f(t) \, \mathrm{d}t = f(s),
\]
but, using integration by parts twice, we get that the above equation is equivalent to
\[
-2a \int_0^1 G(s,t) \, \mathrm{d}t = a(s^2 - 2s).
\]
Thus,
\[
\int_0^1 G(s,t) \, \mathrm{d}t = -\frac12 (s^2 - 2s),
\]
meaning that the Green's function is
\[
G(s,t) = \frac34 (s^2 - 2s)(t^2 - 2t). \qquad ◦
\]
Example 2.18 (Second order polynomial bridges). Consider the same setup as in Example 2.17, but with the subspace constraint that $H$ is the space of second order polynomials subject to homogeneous Dirichlet boundary conditions. This means that any function $f \in H$ has the form
\[
f(t) = a(t - t^2), \quad a \in \mathbb{R}.
\]
The Green's function should obey that
\[
\int_0^1 -\partial_t^2 G(s,t) \, a(t - t^2) \, \mathrm{d}t = a(s - s^2),
\]
but we have that
\[
\int_0^1 \partial_t^2 G(s,t) \, t \, \mathrm{d}t = 0
\]
and
\[
\int_0^1 \partial_t^2 G(s,t) \, t^2 \, \mathrm{d}t = 2 \int_0^1 G(s,t) \, \mathrm{d}t.
\]
Thus it must hold that
\[
\int_0^1 G(s,t) \, \mathrm{d}t = \frac12 (s - s^2),
\]
and the Green's function is
\[
G(s,t) = 3 s(1-s) t(1-t). \qquad ◦
\]

3 Smoothing splines

Let pairs of observation times and observations $(t_1, y_1), \dots, (t_m, y_m)$ be given. Assume that the observations are generated according to the statistical model
\[
y_i = \theta(t_i) + \varepsilon_i, \tag{20}
\]
where the $\varepsilon_i$ are independent $N(0, \sigma^2)$-variables, and $\theta \in H$ where $H$ is some space of functions from the unit interval into the reals. The negative log likelihood is
\[
\ell(\theta, \sigma^2) = \frac{m}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^m (y_i - \theta(t_i))^2, \tag{21}
\]
but if $H$ is a flexible space of functions, such as $C^2([0,1])$, the maximum likelihood estimate of $\theta$ will interpolate the observations. We typically do not want $\theta$ to be an interpolating function, but rather a function that matches the trend of the observations while filtering out some of the noise. One way of achieving this is to add a term to the likelihood (21) that penalizes roughness. Here we will consider the integrated square derivative magnitude as our choice of roughness measure,
\[
P(\theta) = \int_0^1 |\theta'(t)|^2 \, \mathrm{d}t.
\]
We can define the penalized log likelihood functional in $\theta$ as
\[
\ell_\lambda(\theta) = \sum_{i=1}^m (y_i - \theta(t_i))^2 + \lambda P(\theta), \tag{22}
\]
where $\lambda > 0$ is a smoothing parameter. We call the minimizer
\[
\hat\theta = \arg\min_{\theta \in H} \ell_\lambda(\theta)
\]
a smoothing spline (although it will typically not be smooth). In the following we will go through the necessary theory of identifying the smoothing spline.

3.1 Calculus of variations

Minimization of functionals is the topic of calculus of variations. The functional derivative $\delta F / \delta f$ of a functional $F\colon f \mapsto F(f)$ in the present setting is defined by the equation
\[
\int \frac{\delta F}{\delta f}(t) \, h(t) \, \mathrm{d}t = \frac{\mathrm{d}}{\mathrm{d}\epsilon} F(f + \epsilon h) \Big|_{\epsilon = 0}, \tag{23}
\]
where $h$ is an arbitrary function living in the same function space as $f$. As with conventional derivatives, a minimizer is found by solving for the functional derivative equal to the zero function.
Example 3.1. Let us find the functional derivative of the roughness measure $P$. Taking the right-hand side of (23), we see that
\[
\frac{\mathrm{d}}{\mathrm{d}\epsilon} \int_0^1 |(\theta + \epsilon h)'(t)|^2 \, \mathrm{d}t \Big|_{\epsilon=0}
= \frac{\mathrm{d}}{\mathrm{d}\epsilon} \int_0^1 (\theta'(t) + \epsilon h'(t))^2 \, \mathrm{d}t \Big|_{\epsilon=0}
= 2 \int_0^1 \theta'(t) h'(t) \, \mathrm{d}t.
\]
Using integration by parts, this integral can be rewritten
\[
2 \int_0^1 \theta'(t) h'(t) \, \mathrm{d}t = 2 \big( \theta'(1) h(1) - \theta'(0) h(0) \big) - 2 \int_0^1 \theta''(t) h(t) \, \mathrm{d}t. \tag{24}
\]
Thus, if we for example consider a space of functions satisfying homogeneous Dirichlet boundary conditions $f(0) = f(1) = 0$, the functional derivative is $\delta P / \delta \theta = -2 \partial_t^2 \theta$. To minimize the roughness measure we thus need a function $\theta$ such that $\partial_t^2 \theta(t) = 0$, which in the present case amounts to $\theta(t) = 0$. ◦
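The defining equation (23) can be verified by finite differences: the Gateaux derivative of $P$ at $\theta$ in direction $h$ should match the pairing of $-2\partial_t^2\theta$ with $h$. A sketch (assuming NumPy; the choices $\theta(t) = \sin(\pi t)$ and $h(t) = t^2(1-t)$, which satisfies the Dirichlet conditions, are arbitrary):

```python
import numpy as np

n = 200001
t = (np.arange(n) + 0.5) / n  # midpoint grid on [0, 1]

theta = np.sin(np.pi * t)
h = t**2 * (1 - t)            # h(0) = h(1) = 0

def P(f):
    # int_0^1 |f'(t)|^2 dt with a numerical derivative
    return np.mean(np.gradient(f, t)**2)

eps = 1e-4
gateaux = (P(theta + eps * h) - P(theta - eps * h)) / (2 * eps)
# pairing of -2*theta'' with h; here theta'' = -pi^2 sin(pi*t)
pairing = np.mean(2 * np.pi**2 * np.sin(np.pi * t) * h)

print(gateaux, pairing)  # the two numbers agree to several digits
```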
In the previous example we saw that minimization of functionals may be related to differential equations. There are a number of interesting links between splines, Gaussian processes and stochastic differential equations. In the following we will find the penalized maximum likelihood estimator for $\theta$. First, we need to identify the functional derivative of the data term in the penalized log likelihood (22). Computing the right-hand side of (23) we get
\[
-2 \sum_{i=1}^m (y_i - \theta(t_i)) h(t_i).
\]
As a consequence, the functional derivative must be of the form
\[
-2 \sum_{i=1}^m (y_i - \theta(t_i)) \delta_{t_i}(t).
\]
If we choose boundary conditions such that the boundary terms in (24) disappear, we can write the functional derivative of $\ell_\lambda$ as
\[
-2 \sum_{i=1}^m (y_i - \theta(t_i)) \delta_{t_i}(t) - 2\lambda \theta''(t).
\]
Suppose the Laplace operator $-\partial_t^2$ is invertible on $H$ with Green's function $G$. Then we can write $\delta_{t_i} = -\partial_t^2 G(t_i, \cdot)$. Setting the functional derivative equal to the zero function then leads to
\[
\frac{1}{\lambda} \sum_{i=1}^m (y_i - \theta(t_i)) \big( -\partial_t^2 G(t_i, t) \big) = -\partial_t^2 \theta(t).
\]
Consider a solution on the form
\[
\theta(t) = \sum_{i=1}^m c_i G(t_i, t);
\]
insertion in the differential equation gives
\[
\frac{1}{\lambda} \sum_{i=1}^m \Big( y_i - \sum_{j=1}^m c_j G(t_j, t) \Big) \delta_{t_i}(t) = \sum_{i=1}^m c_i \delta_{t_i}(t),
\]
but since $\delta_{t_i}$ is the point evaluation in $t_i$ we can substitute $G(t_j, t)$ with $G(t_j, t_i)$. The coefficients $c_i$ can thus be found by solving a linear system of equations
\[
\sum_{j=1}^m c_j \big( G(t_j, t_i) + \lambda 1_{\{i\}}(j) \big) = y_i, \quad i = 1, \dots, m. \tag{25}
\]
Define the matrix of the Green's function evaluated at all combinations of observation points
\[
\boldsymbol{G} =
\begin{pmatrix}
G(t_1, t_1) & \cdots & G(t_1, t_m) \\
\vdots & \ddots & \vdots \\
G(t_m, t_1) & \cdots & G(t_m, t_m)
\end{pmatrix}.
\]
The linear system (25) can be written on the form
\[
(\boldsymbol{G} + \lambda I_m) \boldsymbol{c} = \boldsymbol{y},
\]
where $I_m$ is the $m \times m$ identity matrix and bold indicates vectorization. Thus, the coefficients are given by
\[
\boldsymbol{c} = (\boldsymbol{G} + \lambda I_m)^{-1} \boldsymbol{y}.
\]
Define furthermore the vector
\[
\boldsymbol{G}(t) = \big( G(t_1, t), \dots, G(t_m, t) \big)^\top.
\]
Putting everything together, we have shown that
\[
\hat\theta(t) = \boldsymbol{G}(t)^\top (\boldsymbol{G} + \lambda I_m)^{-1} \boldsymbol{y}. \tag{26}
\]
In particular, if we want to evaluate $\hat\theta$ in all observation points, we can compute
\[
\big( \hat\theta(t_1), \dots, \hat\theta(t_m) \big)^\top = \boldsymbol{G} (\boldsymbol{G} + \lambda I_m)^{-1} \boldsymbol{y}.
\]
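In code, (26) amounts to a kernel matrix, one linear solve, and a matrix-vector product. A minimal sketch (assuming NumPy; the data-generating function, noise level and $\lambda$ are arbitrary choices of ours), using the Green's function $G(s,t) = s \wedge t$ from Example 2.7, so that the minimization is over functions with $\theta(0) = 0$ and $\theta'(1) = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated data from the model (20)
m = 50
t_obs = np.sort(rng.uniform(0, 1, m))
y = t_obs * np.sin(2 * np.pi * t_obs) + 0.1 * rng.standard_normal(m)

def G(s, t):
    # Green's function (9), G(s, t) = min(s, t), as a matrix over two grids
    return np.minimum.outer(s, t)

lam = 1e-3
c = np.linalg.solve(G(t_obs, t_obs) + lam * np.eye(m), y)  # system (25)

t_grid = np.linspace(0, 1, 201)
theta_hat = G(t_grid, t_obs) @ c  # the smoothing spline (26) on a grid
# with this kernel, theta_hat is piecewise linear with knots at t_obs
```

Increasing `lam` shrinks the fit toward the minimizer of the roughness term, exactly as the trade-off in (22) suggests.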
4 Gaussian processes

A Gaussian process $x$ is a continuous-time stochastic process for which all finite-dimensional distributions follow multivariate normal distributions. In this note, we will largely ignore the details of the underlying probability space and think of $x$ as a random variable on $L^2([0,1])$, so that realizations $x\colon [0,1] \to \mathbb{R}$ are considered random functions.

The most important of all stochastic processes is the Brownian motion. We will leave its construction to other courses, but give one definition of it. The Brownian motion $x$ is a stochastic process for which $x(0) = 0$ almost surely, and the increments $x(t_1) - x(0), x(t_2) - x(t_1), \dots, x(t_n) - x(t_{n-1})$ are independent, zero-mean normally distributed with variances $t_{i+1} - t_i$ for all choices of $n$ and $0 < t_1 < \cdots < t_n$.

From this definition, we can derive some important properties of Brownian motion. Clearly $x$ is a zero-mean process, that is,
\[
\mathrm{E}[x(t)] = 0 \quad \text{for all } t \in [0,1].
\]
We can also find the covariance function. Let $s \le t$. Then
\[
\mathrm{Cov}[x(s), x(t)] = \mathrm{E}[x(s) x(t)] - \mathrm{E}[x(s)] \mathrm{E}[x(t)] = \mathrm{E}[x(s) x(t)].
\]
Now, write $x(t) = x(t) - x(s) + x(s)$. Then
\[
\mathrm{Cov}[x(s), x(t)] = \mathrm{E}[x(s)(x(t) - x(s))] + \mathrm{E}[x(s)^2] = s.
\]
We thus conclude that for general $s, t \in [0,1]$,
\[
\mathrm{Cov}[x(s), x(t)] = s \wedge t,
\]
and note that we found this expression as the Green's function for the Laplace operator in equation (9).

Example 4.1 (The Brownian bridge). Let $x$ be a Brownian motion and let $0 \le s \le t \le 1$. Since $x$ is a Gaussian process, the distribution of $(x(s), x(t), x(1))$ is a multivariate normal distribution with mean $\mu = (0, 0, 0)$ and covariance matrix $\Sigma$ given by evaluating the covariance function at all combinations of the points $s$, $t$ and 1. That is,
\[
\Sigma =
\begin{pmatrix}
s & s & s \\
s & t & t \\
s & t & 1
\end{pmatrix}.
\]
From the theory of the multivariate normal distribution, we know that if $(x_1, x_2)$ is joint normal with mean $(\mu_1, \mu_2)$ and covariance matrix
\[
\begin{pmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{pmatrix},
\]
then the conditional distribution of $x_1 \mid x_2 = a$ is again multivariate normal with mean
\[
\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2)
\]
and covariance matrix
\[
\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.
\]
Now consider the condition $x(1) = 0$. We can use the above formulas with $x_1 = (x(s), x(t))$ and $x_2 = x(1)$ to get that the conditional distribution of $(x(s), x(t)) \mid x(1) = 0$ is a zero-mean normal distribution with covariance matrix
\[
\tilde\Sigma =
\begin{pmatrix}
s - s^2 & s - st \\
s - st & t - t^2
\end{pmatrix}.
\]
This process is called the Brownian bridge and we see that its covariance function must be $s \wedge t - st$. Again we note the connection to the Green's function for the Laplace operator found in Example 2.5. ◦
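A quick simulation check of Example 4.1 (a sketch assuming NumPy; grid, jitter and number of paths are arbitrary choices of ours): sampling directly from the covariance $s \wedge t - st$ produces paths tied down at both endpoints, with variance $1/4$ at the midpoint.

```python
import numpy as np

t = np.linspace(0, 1, 201)
K = np.minimum.outer(t, t) - np.outer(t, t)  # min(s, t) - s*t

rng = np.random.default_rng(2)
L = np.linalg.cholesky(K + 1e-12 * np.eye(len(t)))
x = L @ rng.standard_normal((len(t), 5000))  # 5000 bridge paths

print(x[0].std(), x[-1].std())  # ~0: tied down at 0 and 1
print(x[len(t) // 2].var())     # ~0.25 = K(1/2, 1/2)
```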
In Bayesian statistics, one can choose prior probability distributions for parameters in the model. The Bayesian equivalent to maximum likelihood estimation is maximum a posteriori estimation, that is, we maximize the posterior probability given the data and the prior. Assume that we are interested in the parameter $\theta$ and have data $y$. The posterior probability (distribution) is given by Bayes' theorem,
\[
p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)},
\]
where $p(y \mid \theta)$ is the likelihood, $p(\theta)$ is the prior and $p(y)$ is known as the evidence. The evidence may be very hard to compute because we need to integrate out $\theta$ in the joint distribution, but typically we can avoid computing this term altogether, since it is just a normalization constant that is independent of $\theta$. The maximum a posteriori estimate $\hat\theta_{\mathrm{MAP}}$ can be found by minimizing
\[
-\log p(\theta \mid y) \propto -\log p(y \mid \theta) - \log p(\theta).
\]
Comparing this to the penalized log likelihood (22), we see that both methods minimize the sum of the negative log likelihood and an extra term which in both cases acts as a regularization term.

Consider again the statistical model (20) where we observe a function $\theta \in H$ with independent, identically distributed Gaussian noise. We may choose a zero-mean Gaussian process with covariance function $\sigma^2 \tau^2 G(\cdot, \cdot)$ as our prior for $\theta$. This means that any finite evaluation of $\theta$ in the vector of points $\boldsymbol{s}$, $\boldsymbol{\theta} = (\theta(s_1), \dots, \theta(s_\ell))$, follows a multivariate normal distribution of the form
\[
p(\boldsymbol{\theta}) = \frac{1}{\sqrt{(2\pi\sigma^2\tau^2)^\ell \det G(\boldsymbol{s}, \boldsymbol{s})}} \exp\Big( -\frac{1}{2\sigma^2\tau^2} \boldsymbol{\theta}^\top G(\boldsymbol{s}, \boldsymbol{s})^{-1} \boldsymbol{\theta} \Big),
\]
where
\[
G(\boldsymbol{s}, \boldsymbol{s}) =
\begin{pmatrix}
G(s_1, s_1) & \cdots & G(s_1, s_\ell) \\
\vdots & \ddots & \vdots \\
G(s_\ell, s_1) & \cdots & G(s_\ell, s_\ell)
\end{pmatrix}.
\]
The joint distribution of $\boldsymbol{y}$ and $\boldsymbol{\theta}$ is zero-mean multivariate normal with covariance matrix
\[
\begin{pmatrix}
\sigma^2\tau^2 G(\boldsymbol{t}, \boldsymbol{t}) + \sigma^2 I_{m \times m} & \sigma^2\tau^2 G(\boldsymbol{t}, \boldsymbol{s}) \\
\sigma^2\tau^2 G(\boldsymbol{s}, \boldsymbol{t}) & \sigma^2\tau^2 G(\boldsymbol{s}, \boldsymbol{s})
\end{pmatrix},
\]
where $\boldsymbol{t} = (t_1, \dots, t_m)$ is the vector of observation points. The maximum a posteriori estimate of $\theta$ in the points $\boldsymbol{s}$ is the conditional expectation of $\boldsymbol{\theta}$ given $\boldsymbol{y}$, which is given by
\[
\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} = G(\boldsymbol{s}, \boldsymbol{t}) \big( G(\boldsymbol{t}, \boldsymbol{t}) + \tau^{-2} I_{m \times m} \big)^{-1} \boldsymbol{y}. \tag{27}
\]
Comparing with the previously found smoothing spline (26), we see that the maximum a posteriori estimate will be identical if we choose $G$ as the Green's function for the Laplace operator on $H$ and set $\lambda = \tau^{-2}$.
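The equivalence can also be seen numerically: the prior scale $\sigma^2$ cancels from the conditional expectation, and what remains is exactly (26) with $\lambda = \tau^{-2}$. A sketch (assuming NumPy; all scales and the test function are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 30
t = np.sort(rng.uniform(0, 1, m))
y = t * np.cos(3 * t) + 0.05 * rng.standard_normal(m)
s = np.linspace(0, 1, 101)     # prediction points

G_tt = np.minimum.outer(t, t)  # again the kernel min(s, t)
G_st = np.minimum.outer(s, t)

sigma2, tau2 = 0.3, 50.0       # arbitrary prior and noise scales
# conditional expectation of theta given y in the joint Gaussian model
post_mean = sigma2 * tau2 * G_st @ np.linalg.solve(
    sigma2 * tau2 * G_tt + sigma2 * np.eye(m), y)

# smoothing spline (26) with lambda = tau^(-2)
spline = G_st @ np.linalg.solve(G_tt + (1 / tau2) * np.eye(m), y)

print(np.max(np.abs(post_mean - spline)))  # ~1e-15: the two coincide
```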
4.3 Conditional Gaussian processes

In this section we will consider a zero-mean Gaussian process $x$ with covariance function $G$.

Example 4.2 (Neumann boundary conditions). Consider the observations of $x$ at the points $s$, $t$, $1-\varepsilon$ and 1, ordered increasingly. We see that $(x(s), x(t), (x(1) - x(1-\varepsilon))/\varepsilon)$ follow a zero-mean normal distribution with covariance matrix
\[
\begin{pmatrix}
G(s,s) & G(s,t) & \frac{G(s,1) - G(s,1-\varepsilon)}{\varepsilon} \\
G(t,s) & G(t,t) & \frac{G(t,1) - G(t,1-\varepsilon)}{\varepsilon} \\
\frac{G(s,1) - G(s,1-\varepsilon)}{\varepsilon} & \frac{G(t,1) - G(t,1-\varepsilon)}{\varepsilon} & \frac{G(1,1) + G(1-\varepsilon,1-\varepsilon) - 2G(1,1-\varepsilon)}{\varepsilon^2}
\end{pmatrix}.
\]
Thus, the conditional distribution of $(x(s), x(t))$ given that $(x(1) - x(1-\varepsilon))/\varepsilon = 0$ is zero-mean normal with covariance matrix
\[
\begin{pmatrix}
G(s,s) & G(s,t) \\
G(t,s) & G(t,t)
\end{pmatrix}
- \frac{\varepsilon^2}{G(1,1) + G(1-\varepsilon,1-\varepsilon) - 2G(1,1-\varepsilon)}
\begin{pmatrix}
G'_\varepsilon(s,1)^2 & G'_\varepsilon(s,1) G'_\varepsilon(t,1) \\
G'_\varepsilon(s,1) G'_\varepsilon(t,1) & G'_\varepsilon(t,1)^2
\end{pmatrix},
\]
where
\[
G'_\varepsilon(s,1) = G'_\varepsilon(1,s) = \frac{G(s,1) - G(s,1-\varepsilon)}{\varepsilon}. \qquad ◦
\]
Example 4.3. Let $x$ be a Brownian motion. The covariance function is $G(s,t) = s \wedge t$. The second term in the formula for the conditional covariance from Example 4.2 is zero, since $G'_\varepsilon(s,1) = 0$ for all $s \le 1-\varepsilon$. Thus the covariance is unchanged, which is also what we would expect. This result easily generalizes (see Exercise 12) to any future increment of any size, which means that the Brownian motion is independent of future increments. ◦
Example 4.4. Let $x$ be a Brownian bridge. The covariance function is $G(s,t) = s \wedge t - st$. The formula for the conditional covariance from Example 4.2 now gives (for $s \le t$)
\[
\begin{pmatrix}
s - s^2 & s - st \\
s - st & t - t^2
\end{pmatrix}
- \frac{\varepsilon}{1 - \varepsilon}
\begin{pmatrix}
s^2 & st \\
st & t^2
\end{pmatrix}.
\]
As $\varepsilon \to 0$, the covariance just remains that of the Brownian bridge. ◦
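The three examples above can be reproduced with a few lines of linear algebra. A sketch (assuming NumPy; the helper `conditional_cov`, the kernel definitions and the test points are ours):

```python
import numpy as np

def conditional_cov(G, s, t, eps=1e-4):
    # covariance of (x(s), x(t)) given (x(1) - x(1 - eps)) / eps = 0,
    # following the formula in Example 4.2
    d = np.array([G(u, 1.0) - G(u, 1.0 - eps) for u in (s, t)]) / eps
    var_z = (G(1, 1) + G(1 - eps, 1 - eps) - 2 * G(1, 1 - eps)) / eps**2
    base = np.array([[G(s, s), G(s, t)], [G(t, s), G(t, t)]])
    return base - np.outer(d, d) / var_z

bm = lambda s, t: min(s, t)              # Brownian motion (Example 4.3)
bridge = lambda s, t: min(s, t) - s * t  # Brownian bridge (Example 4.4)

print(conditional_cov(bm, 0.3, 0.6))      # unchanged: [[0.3, 0.3], [0.3, 0.6]]
print(conditional_cov(bridge, 0.3, 0.6))  # bridge covariance minus an O(eps) term
```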
Exercises

1. Consider the Poisson equation (3) on the following three function spaces:
\[
H_1 = \{ f \in C^2 : f(0) = f(1) = 0 \},
\]
\[
H_2 = \{ f \in C^2 : f(0) = f'(1) = 0 \},
\]
\[
H_3 = \{ f \in C^2 : f'(0) = f'(1) = 0, \ \textstyle\int_0^1 f(t)\,\mathrm{d}t = 0 \}.
\]
Determine whether $-\partial_t^2 f(t)$ can equal the constant function $1(t) = 1$ for $f$ belonging to each of these sets. Explain the relation to Examples 2.1 and 2.7.

2. Show that $H = \{ f \in C^2([0,1]) : \int_0^1 f(t)\,\mathrm{d}t = 0 \}$ is a subspace of $C^2([0,1])$.

3. Show that $H = \{ f \in C^2([0,1]) : f(0) = f(1) \}$ is a subspace of $C^2([0,1])$.

4. Show that $H = \{ f \in C^2([0,1]) : f(t) = f(1-t) \text{ for all } t \in [0,1/2] \}$ and $H = \{ f \in C^2([0,1]) : f(t) = -f(1-t) \text{ for all } t \in [0,1/2] \}$ are subspaces of $C^2([0,1])$.

5. Let $H$ be the linear subspace of functions in $C^4([0,1])$ for which $f(0) = f(1) = 0$ and $f^{(4)}(t) = 0$. Identify the Green's function for the Laplace operator on $H$.

6. Let $H$ be the space of functions in $C^2([0,1])$ …

7. Let $x$ be a Brownian motion. Define a new process $\tilde{x}(t) = x(t) - t x(1)$. Compute its mean and covariance function. What process is $\tilde{x}$?

8. Let $\tilde{x}$ be a backward Brownian motion, that is, $\tilde{x}(t) = x(1-t)$ for $t \in [0,1]$ where $x$ is a standard Brownian motion. Determine the covariance function for $\tilde{x}$.

9. Let $x$ be the sum of independent forward and backward Brownian motions (see the previous exercise). Determine the covariance function for $x$. Determine boundary conditions that this covariance function obeys.

10. Let $\tilde{x}$ be the sum of a forward and a backward Brownian motion (see the previous exercises), but this time assume that the backward Brownian motion arises from the forward Brownian motion, that is, $\tilde{x}(t) = x(t) + x(1-t)$ where $x$ is the standard Brownian motion. Determine the covariance function for $\tilde{x}$. Determine boundary conditions that this covariance function obeys.

11. Argue why it must hold that the conditional expectation $\mathrm{E}[\theta \mid y]$ maximizes the posterior $p(\theta \mid y)$ when we assume that $\theta$ is a Gaussian process.

12. Show the general version of the result about conditioning on future increments for the Brownian motion mentioned in Example 4.3. You can either use the conditional distribution, or use the definition of Brownian motion.

13. Give an interpretation of the result in Example 2.17 in terms of the Brownian motion.

References
Holladay, John C. (1957). "A smoothest curve approximation". In: Mathematical Tables and Other Aids to Computation 11, pp. 233–243.

Kimeldorf, George S. and Grace Wahba (1971). "Some results on Tchebycheffian spline functions". In: Journal of Mathematical Analysis and Applications 33 (1), pp. 82–95.

Markussen, Bo (2013). "Functional data analysis in an operator-based mixed-model framework". In: Bernoulli 19 (1), pp. 1–17.

Mumford, David and Agnès Desolneux (2010). Pattern Theory: The Stochastic Analysis of Real-World Signals. CRC Press.

Raket, Lars Lau and Bo Markussen (2014). "Approximate inference for spatial functional data on massively parallel processors". In: Computational Statistics & Data Analysis 72, pp. 227–240.

Rytgaard, Helene Charlotte (2016). "Statistical models for robust spline smoothing". MA thesis. University of Copenhagen.

Schoenberg, Isaac Jacob (1964). "Spline interpolation and best quadrature formulae". In: Bulletin of the American Mathematical Society 70 (1), pp. 143–148.

Wahba, Grace (1990). Spline Models for Observational Data. SIAM.