Splinets -- splines through the Taylor expansion, their support sets and orthogonal bases

by Krzysztof Podgórski
Abstract
A new representation of splines that targets efficiency in the analysis of functional data is implemented. The efficiency is achieved through two novel features: using the recently introduced orthonormal spline bases, the so-called splinets, and accounting for the spline support sets in the proposed spline object representation. The recently introduced orthogonal splinets are evaluated by dyadic orthogonalization of the B-splines. The package is built around the Splinets-object that represents a collection of splines. It treats splines as mathematical functions and contains information about the support sets and the values of the derivatives at the knots that uniquely define these functions. Algebra and calculus of splines utilize the local Taylor expansions at the knots within the support sets. Several orthonormalization procedures for the B-splines are implemented, including the recommended dyadic method leading to the splinets. The method is based on a dyadic algorithm that can also be viewed as an efficient method of diagonalizing a band matrix. The locality of the B-splines in terms of the support sets is, to a great extent, preserved in the corresponding splinet. This, together with the implemented algorithms utilizing the locality of the supports, provides a valuable computational tool for functional data analysis. The benefits are particularly evident when sparsity in the data plays an important role. Various diagnostic tools are provided that allow maintaining the stability of the computations. Finally, the projection operation to the space of splines is implemented to facilitate functional data analysis. An example of a simple functional analysis of data using the tools in the package is presented. The functionality of the package extends beyond splines to piecewise polynomial functions, although splines remain its focus.

Introduction
In functional analysis, it is often desired that the functions considered are continuous or even differentiable up to a certain order. From this perspective, spline functions given over a set of knots form convenient finite-dimensional functional spaces. There exist many R packages that handle splines, see Perperoglou et al. (2019). However, none of them consistently treats splines as elements of functional spaces with explicitly evaluated orthogonal bases. The proposed package, Splinets, approaches splines in exactly this way, by using an object that represents a set of functional splines and by providing efficient orthogonal bases as such objects. This focus on the functional form of splines and the functional-analytic approach to them makes Splinets particularly suitable for functional data analysis. Care was taken to make this functional treatment efficient, first, by accounting for the support sets of splines, and then by using efficient and novel orthogonal bases.

For a given set of knots and a smoothness order, a spline between two subsequent knots is equal to a polynomial that is smoothly connected at the knots to the polynomials over the neighboring intervals. The order of smoothness at the knots is equal to the number of derivatives that are continuous at these knots, including the zero order derivative, i.e. the function itself. A selection of an order that is higher than zero makes splines a smoother alternative to piecewise constant functions. Given a smoothness order and knots, the splines form a finite-dimensional functional space and one can consider a suitable basis of functions that spans it. The B-splines are the most popular bases of splines, de Boor (1978); Schumaker (2007). Their locality, expressed by their support, i.e. the sets over which they are non-zero, is one of their main features. In order to utilize this feature in numerical implementations, one has to keep track of the support sets of splines. In the presented implementation of splines, the object stores the information about the support sets. More specifically, the spline object contains a sequence of disjoint intervals such that their union is equal to the support set. The object also stores the values of the derivatives at the knots, which allows using the Taylor expansion at the knots to evaluate splines at any point.
Since the Taylor representation is local with respect to the knots, it is characterized by numerical stability in the computations. More specifically, the main object Splinets, implemented through the S4 system for OOP in R, is defined through the setClass function with the following slots

representation(knots="vector", smorder="numeric", equid="logical", supp="list", der="list", taylor="matrix", type="character", epsilon="numeric"),

which represents a collection of splines, all built over the same knots given in knots, of the smoothness order k given in smorder. Further, supp is the list of matrices having row-wise pairs of the endpoints of the intervals, the union of which constitutes the support set of a particular spline, and the flag equid informs about equally spaced knots, for which the computation can be significantly accelerated. The matrices of the derivatives at the knots inside the support sets are given in the list der of matrices, where an element of the list refers to a particular spline in the collection, so the length of the list corresponds to the number of splines in the object. Descriptions of other fields are given in the Splinets package but are not crucial for this presentation.

While the locality of the representation, expressed by the support sets, makes the package suitable for the analysis of sparse data, its most important feature is the efficient use of spline bases. The standard B-spline bases are evaluated and the efficiency is achieved by explicitly using the support in the implemented algebra and calculus of splines. There is, however, one problem with the B-spline bases that adds a computational burden when they are used to decompose a function and for functional data analysis. Namely, the B-splines are not orthogonal.
Since any basis in a Hilbert space can be orthogonalized, it is natural to consider orthonormalization of the B-splines, Nguyen (2015); Goodman (2003); Cho and Lai (2005). In Liu et al. (2019), a new natural orthogonalization method for the B-splines has been introduced. The orthogonalization was argued to be the most appropriate since it, firstly, preserves most of the original structure of the B-splines and, secondly, obtains computational efficiency both in the basis element evaluations and in the spectral decomposition of a functional signal.

Although fundamentally different, the orthogonalization method was inspired by the one-sided and two-sided orthogonalizations discussed in Mason et al. (1993). Since the constructed basis spreads a net of splines rather than a sequence of them, we coin the term splinet when referring to such a basis. It is formally shown that the splinets, similarly to the B-splines, feature locality with a small size of the total support. If the number of knots over which the splines of a given order are considered is n, then the total support size of the B-splines is of the order O(1) with respect to n, while the corresponding splinet has a total support size of the order log n, which is only slightly bigger. On the other hand, the previously discussed orthogonalized bases have a total support size of the order O(n), where n stands for the number of knots, which is also the number of basis functions. Moreover, if one allows for negligible errors in the orthonormalization, then the total support size no longer depends on n, i.e. it becomes constant, thus achieving the rate of the original B-splines.

The functionality of the package extends beyond splines since the Splinets-object can be used to represent any piecewise polynomial function of a given order. All the relevant functions work properly even if the smoothness at the knots is not preserved.
Various methods of correcting a piecewise polynomial function to make it a spline are implemented, including the orthogonal projection to the space of splines. Nevertheless, the focus of both the package and this presentation is on splines.

The organization of the material is as follows. We start with the functional representation of splines used in the package implementation. Then we describe the generic functions used to build spline objects from piecewise polynomial functions. This is accomplished by correcting the values of the derivatives to assure the continuity at the knots. Using these correction methods, a simple random spline generator is obtained. In the next section, we turn to a discussion of algebraic and functional operations on splines, including integrals, derivatives, and inner products. The main section is about the bases of splines, where the orthonormal spline bases are at the center of the discussion. Here, we promote the orthonormal bases that are obtained from the B-splines by efficient dyadic algorithms and referred to as splinets. The presentation of the important implementation of the projection of functions to the space of splines by means of the developed spline bases follows. The paper is concluded with two examples that utilize the package. In the first, we show how the package can be used to obtain an alternative efficient orthonormalization of the B-spline basis in the non-dyadic case. In the second, the tools in the package are used to perform a functional data analysis by means of splines, using a functional dataset that is also a part of the package.

Mathematical foundations of a functional representation of splines
Splines are piecewise polynomials of a given smoothness order with continuous derivatives, up to this order (exclusive), at the points they interconnect, which are called internal knots. The domain of a spline will be referred to as its range and is assumed throughout the paper to be a closed finite interval. Additional knots called the terminal knots are the endpoints of the spline range. For a given set of knots, the space of splines is finite-dimensional with the inner product of the Hilbert space of square-integrable functions.
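The dimension of this finite-dimensional space can be counted by elementary bookkeeping: polynomial coefficients on the between-knots intervals, minus continuity constraints at the internal knots, minus the zero boundary conditions introduced below (the count is formalized in Proposition 1). A small Python check of this arithmetic (illustrative only; the helper name is ours, not part of the package):

```python
def spline_space_dim(n, k):
    """Dimension of the k-th order splines over knots xi_0, ..., xi_{n+1}
    with the 2k zero boundary conditions at the terminal knots."""
    coefficients = (n + 1) * (k + 1)  # k + 1 coefficients on each of n + 1 intervals
    continuity = n * k                # k smoothness conditions at each internal knot
    boundary = 2 * k                  # zero conditions at the two terminal knots
    return coefficients - continuity - boundary

# The count simplifies to n - k + 1; with n = n' + 2k it equals n' + k + 1,
# the dimension of the unrestricted splines discussed in Proposition 1.
for k in range(1, 6):
    for n_prime in range(0, 5):
        n = n_prime + 2 * k
        assert spline_space_dim(n, k) == n - k + 1 == n_prime + k + 1
```

In particular, the space is non-trivial exactly when n is at least k, which is the assumption made below.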
Due to the continuity requirements, the behavior of a spline between two given knots is necessarily affected by the form of the polynomials on the neighboring between-knots subintervals. Since the between-knots intervals adjacent to the terminal knots have only one neighboring between-knots interval, the influence of the values over other intervals is not the same for them. To mitigate this biased terminal knots effect, it is natural and, as it will be seen, also mathematically elegant to introduce the zero boundary conditions at the terminal knots for all the derivatives except the highest order one. It can be shown that to remove the zero boundary effect, one has to consider splines over the knots obtained by extending the complete set of the knots by a certain number of knots at both ends. More specifically, the number of knots that has to be added at each end is equal to the order of the splines. Often the knots are added by replicating the terminal knots, although there are some serious disadvantages of such an approach. All this is elaborated in full detail below in Proposition 1 and in the remark that follows it.

The most natural and computationally stable way of evaluating values of a given spline is through the Taylor expansion at the closest knot. For this, it is convenient to have all derivatives at a knot directly accessible. To achieve this, the proposed implementation of a functional spline uses an object that holds the matrix of the derivatives at the knots. Such a matrix needs to satisfy certain conditions to guarantee that the derivatives at the knots are continuous. In this section, we recap mathematical properties of splines that are fundamental for the implementation.
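Evaluation through the Taylor expansion at the closest knot can be sketched in a few lines. The following Python fragment is an illustration only, not the package's R implementation; the names eval_spline, knots, and D are ours:

```python
import math
import numpy as np

def eval_spline(t, knots, D):
    """Evaluate a piecewise polynomial at t via the Taylor expansion.

    knots -- increasing array of n + 2 knots
    D     -- (n + 2) x (k + 1) matrix; D[i, j] is the j-th derivative
             at knots[i] (one-sided, RHS convention for the last column)
    """
    # anchor the expansion at the closest knot from the left
    i = int(np.searchsorted(knots, t, side="right")) - 1
    i = min(max(i, 0), len(knots) - 2)
    h = t - knots[i]
    # Taylor formula: sum_j D[i, j] * h^j / j!
    return sum(D[i, j] * h**j / math.factorial(j) for j in range(D.shape[1]))
```

For a spline, only the points within the support need to be expanded; outside of it the result is zero by construction.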
Splines with zero-boundary conditions at the endpoints
The splines involve knots at which the polynomials smoothly connect. A set of such knots is represented as a vector ξ of n + 2 ordered knots, of which the first and the last are the terminal knots, i.e. the endpoints of the range. When the set of knots is extended to remove the boundary effect, the k added initial knots and the k added terminal knots can be chosen to converge to the beginning and the end of the range, respectively. Moreover and most importantly, the approach is structurally elegant and because of it easier to implement. For all these reasons, we used it in our package. In what follows, we elaborate some mathematical details and notational conventions.

Through the rest of the paper, we impose on a spline and all its derivatives of order smaller than the spline smoothness order the value of zero at both endpoints of the range. In this case, if we consider knots ξ = (ξ_0, . . . , ξ_{n+1}), it is important to assume that n ≥ k, in order to have enough knots to define at least one non-zero spline with the 2k zero-boundary conditions at the endpoints. Indeed, if n = k − 1, then we have k between-knots intervals. On each such interval a spline is equal to a polynomial of order k. The dimension of the space of such piecewise polynomial functions is k(k + 1). However, at each internal knot there are k equations making the derivatives up to order k (excluding) continuous. This reduces the initial unrestricted polynomial dimension by (k − 1)k dimensions to 2k, but there are 2k equations for the derivatives to be zero at the endpoints. We conclude that the dimension of the spline space is eventually reduced to zero, meaning that there is only the function trivially equal to zero in this space.

From now on, S^ξ_k stands for the (n − k + 1)-dimensional space of the k-smooth splines with the zero boundary conditions at the terminal knots of the ordered knots given in the vector ξ. Whenever showing the dependence on either k or ξ or both is not important, they will be dropped from the notation and thus, for example, S stands for S^ξ_k if both k and ξ are clear from the context. The requirement that n ≥ k allows us to obtain a non-trivial linear space; however, in order to retrieve the flexibility of the unrestricted splines one needs even more knots. This is explained in the next result, whose practical and theoretical consequences are discussed in the remark that follows it.

Proposition 1.
Let ζ = (ζ_0, . . . , ζ_{n′+1}) be the knots considered for the (k + n′ + 1)-dimensional linear space of the splines of the kth order without zero boundary conditions. Further, let n = n′ + 2k and

ξ = (ξ_0, . . . , ξ_{n+1}) = (ξ_0, . . . , ξ_{k−1}, ζ_0, . . . , ζ_{n′+1}, ξ_{k+n′+2}, . . . , ξ_{n′+2k+1}).

The dimension of the space of the splines of the kth order with the zero boundary conditions at the endpoints over ξ is equal to k + n′ + 1, i.e. the same as that of the unrestricted splines over ζ. Moreover, the space of the splines with the boundary conditions, when restricted to ζ, coincides with the space of the unrestricted splines over ζ.

Proof. The result easily follows from the following count of the dimension of the splines with the imposed boundary conditions: there are n + 1 between-knots intervals and thus (n + 1)(k + 1) polynomial coefficients. To count the free coefficients, one subtracts the 2k boundary conditions and the nk continuity conditions at the n internal knots. This yields the dimension (n + 1)(k + 1) − nk − 2k = n − k + 1 = n′ + k + 1.

Remark 1.
From a practical point of view, considering functional analysis using unrestricted splines of the kth order over an interval [a, b] is equivalent to using the splines with the zero boundary conditions at the ends by adding k knots below a and k knots above b. This is because, by the above result, when the splines with the boundary conditions, defined over the extended set of knots, are restricted in the argument to [a, b], they produce the same linear space as the space of unrestricted splines over this interval.

It should be noted, however, that the topological properties of the two spaces differ, because the inner products will differ at the extended boundaries. One can however obtain the topological equivalence either by letting the added lower knots converge to a and the added upper knots converge to b, or by modifying the inner product of the zero-boundary condition splines by taking the integral only over [a, b]. We conclude that using the splines with the zero-boundary conditions is by no means restrictive, and from now on we consider only this case.

The splines can be represented in a variety of ways. In Qin (2000), a general matrix representation was proposed that allows for efficient numerical processing of the operations on the splines. This was utilized in Zhou et al. (2012), Zhou et al. (2008), and Redd (2012) to represent and orthogonalize B-splines, which was implemented in the R package Orthogonal B-Spline Basis Functions. In our approach we propose to represent a spline in a different way. Namely, we focus on the values of the derivatives at the knots and the support of a spline. The goal is to achieve a better numerical stability, as well as to utilize the discussed efficiency of basis splines having support only on a small portion of the considered domain. Our approach constitutes the basis for spline treatment in the package
Splinets that accompanies this work.

The fundamental fact that we use here is that for a given order, say k, and a vector of knot points ξ = (ξ_0, . . . , ξ_{n+1}), a spline S ∈ S is uniquely defined by the values of its derivatives of orders 0, . . . , k at the knots. Here, by a natural convention that we use across the paper, the 0th derivative is the function itself. The values of the derivatives at the knots allow for the Taylor expansions at the knots, but they cannot be taken arbitrarily due to the smoothness conditions at the knots. Since our computational implementation of the spline algebra fundamentally depends on the relation between the matrix of derivative values and the splines, the restrictions on the derivative matrix need to be addressed in full detail.

Matrix representation of splines
The values of the derivatives at the knots are at the center of the Taylor expansion representation of a spline. Therefore the matrix of the derivative values has become the main component of an object belonging to our main class in the package. However, due to the discontinuity at the knots of the highest order derivative (the order of the smoothness of the considered splines), there is some ambiguity in the representation of such a matrix. There are two natural representations: one is symmetric with respect to the endpoints of the spline support, and the other is one-sided, for example through the right-hand-side (RHS) limits. There are some benefits to each of these conventions and therefore we use both, as described below.

From now on, we assume that we consider a spline with the full support range, i.e. not vanishing on any subinterval of the entire range of knots. For any spline function S ∈ S^ξ_k, we consider s_j = (s_{0j}, . . . , s_{n+1,j}), the vector of the values of the jth derivative of S, j = 0, . . . , k, at the knots given in the vector ξ = (ξ_0, . . . , ξ_{n+1}), which, as a general convention for all vectors (also the convention in R), is treated as a column vector. These vectors are collected in the (n + 2) × (k + 1) matrix

S = [s_0 s_1 . . . s_k]. (1)

Since the derivative of the kth order is not continuous at the knots while constant between them, one needs some convention how to define the values in s_k. In the approach symmetric with respect to the endpoints of the support, one considers the RHS limits for the LHS half of the knots and the LHS limits for the RHS half of the knots, where the knots are considered only within the spline support range. This is the main convention adopted in the object and in the package, since the symmetry is more natural for the splines with the zero boundary conditions at both ends of the range. Consequently, we assume that the value of s_{ik} is the RHS kth derivative at the knot ξ_i, i ≤ n/2, and the LHS derivative at the knot ξ_i, i ≥ n/2 + 1. If n is odd, n = 2l + 1, the yet undefined value of s_{l+1,k} is set to zero, while the LHS and RHS values of the kth derivative at ξ_{l+1} coincide with s_{lk} and s_{l+2,k}, respectively. We note also that if n is even, n = 2l, then s_{lk} = s_{l+1,k}.

Remark 2.
We would like to point out that the one-sided convention, another way of treating the highest order derivative, is often more convenient computationally, since it does not depend on the support range of a particular spline. In this convention, the value of the kth derivative at a knot is taken as the right-hand-side limit, except for the last knot ξ_{n+1}, where it is set equal to zero, as there are no values on the right-hand side of ξ_{n+1}. The advantage of the one-sided approach is that the isomorphic relation between the functional space and the space of the derivative matrices can be easily retrieved independently of the spline support range. Therefore, this approach will be applied for defining different algebraic and analytic operations on splines, see the section on the algebra and calculus of splines. The two forms of the matrices, symmetric and one-sided, being equivalent, can be easily transformed back and forth, and a convenient function sym2one() is available in the package. The concepts are illustrated in Example 1.

In general, S lies in the (n + 2)(k + 1)-dimensional linear space of the (n + 2) × (k + 1) matrices. However, the matrices corresponding to legitimate splines occupy only a proper subspace of dimension n − k + 1, determined by the conditions that the entries of S need to satisfy. To obtain them explicitly, we note that S ∈ S over the interval (ξ_i, ξ_{i+1}], i = 0, . . . , n, is given through its Taylor expansions

S(t) = Σ_{j=0}^{k−1} s_{ij} (t − ξ_i)^j / j! + s_{i+δ_i,k} (t − ξ_i)^k / k!
     = Σ_{j=0}^{k−1} s_{i+1,j} (t − ξ_{i+1})^j / j! + s_{i+1−δ̄_i,k} (t − ξ_{i+1})^k / k!,

where δ_i = I_{(n/2, n+1]}(i) and δ̄_i = 1 − δ_i. Here and in what follows, I_A is the indicator function of a set A. Similarly, the derivatives, r = 1, . . . , k, are given through

S^{(r)}(t) = Σ_{j=0}^{k−r−1} s_{i,j+r} (t − ξ_i)^j / j! + s_{i+δ_i,k} (t − ξ_i)^{k−r} / (k − r)!
           = Σ_{j=0}^{k−r−1} s_{i+1,j+r} (t − ξ_{i+1})^j / j! + s_{i+1−δ̄_i,k} (t − ξ_{i+1})^{k−r} / (k − r)!.

To express the symmetry, we divide the knots into the left ones ξ^L = (ξ_0, . . . , ξ_{l+1}) and introduce the following notation for the right half in the reverse order

(ξ^R_0, . . . , ξ^R_{l+1}) = (ξ_{n+1}, . . . , ξ_{n−l}), (2)

where l = [n/2]. We observe that for n even we have ξ_l = ξ^R_{l+1} and ξ_{l+1} = ξ^R_l, and for n odd ξ_{l+1} = ξ^R_{l+1}. Let us also define S^R as the bottom half of the matrix S in the reverse order through

s^R_{ij} = s_{n+1−i,j}, i = 0, . . . , l + 1, j = 0, . . . , k.

The restrictive relations following from the above Taylor expansions can be split into the LHS and the RHS knots: for i = 0, . . . , l, r = 0, . . . , k − 1,

s_{i+1,r} = Σ_{j=0}^{k−r} (ξ_{i+1} − ξ_i)^j / j! · s_{i,j+r},   s^R_{i+1,r} = Σ_{j=0}^{k−r} (ξ^R_{i+1} − ξ^R_i)^j / j! · s^R_{i,j+r}. (3)

We observe that for even n, the relations for the knots ξ_{l+1} and ξ^R_{l+1} are equivalent due to ξ_l = ξ^R_{l+1} and ξ_{l+1} = ξ^R_l. Consequently, the number of the above equations is equal to 2(l + 1)k − k = nk + k, so that the dimension of the space of the matrices satisfying them is equal to (n + 2)(k + 1) − 1 − nk − k = n + k + 1 (the additional constraint being s_{l+1,k} = 0 for odd n and s_{lk} = s_{l+1,k} for even n). From this number, one has to subtract 2k for the zero boundary conditions at both ends, yielding n − k + 1. Namely, from (3), we reduce the full dimension (n + 2)(k + 1) by 2(l + 1)k + 1 = nk + k + 1 (including the central constraint on s_{l+1,k}) to obtain the reduced dimension equal to n + k + 1, and then n − k + 1 due to the zero boundary conditions at the endpoints. From now on, the restricted space of matrices is identified with S, and a matrix S ∈ S is identified with a spline S.

The importance of the derived relations for the numerical implementation of the spline objects is two-fold. Firstly, they can be used to define splines through specifying their derivative values. Secondly, in applications that are computationally intensive, where a lot of algebraic operations are performed on the entries of the matrix S, the matrix entries may, due to round-off errors, cease to satisfy (3). In such a case, it may be required to verify if the equations hold up to a required accuracy. If the accuracy is not achieved, some corrections of the computational results need to be applied, and the derived equations can also be utilized for this purpose.

The equations can be written in the matrix notation using the (k + 1) × (k + 1) lower triangular Toeplitz matrix that for α > 0 is given by

A_α = [α^{i−j} / (i − j)!]_{i,j=0}^{k}, (4)

i.e. the matrix with ones on the diagonal, α on the first subdiagonal and, in general, α^m/m! on the mth subdiagonal, with zeros above the diagonal. The matrix S can be split into S^L and S^R, where S^L is made of the first l + 2 rows of S and thus corresponds to ξ^L, while S^R = [s^R_{ij}], i = 0, . . . , l + 1, j = 0, . . . , k, corresponds to ξ^R, as defined before. Then (3) can be written row-wise: for i = 1, . . . , l + 1,

s_{i·} = s_{i−1,·} A_{ξ_i − ξ_{i−1}} + [0 . . . 0 ∆s^L_{ik}],   s^R_{i·} = s^R_{i−1,·} A_{ξ^R_i − ξ^R_{i−1}} + [0 . . . 0 ∆s^R_{ik}], (5)

where the last-column terms ∆s^L_k and ∆s^R_k correct the kth derivative for its jumps at the knots, as required by the symmetric convention. For equally spaced ξ we have the even simpler relations

S^L_{i·} = S^L_{i−1,·} A_α + [0 . . . 0 ∆s^L_{ik}],   S^R_{i·} = S^R_{i−1,·} A_α + [0 . . . 0 ∆s^R_{ik}], (6)

for α = (ξ_{n+1} − ξ_0)/(n + 1). To understand better the introduced notation and the symmetric and one-sided matrix representations, we present a simple example.

Example 1.
We consider the cubic splines, i.e. k = 3, and two examples of the matrices of the derivatives at equally spaced knots, one with an even and one with an odd number of knots, in the form adopted in the Splinets-object:

[matrices S1 and S2 of the derivative values at the knots]

In the first and the last rows, we observe the zero boundary conditions. In the even number of knots case, the two highest-derivative entries at the center are equal to each other, while for the odd number of knots, the highest-derivative column has zero at the center. The transformation by means of sym2one() leads to the one-sided representations:

[matrices S1′ and S2′, with the RHS limits of the kth derivative in the last column and zero in its last entry]

The left- and right-hand-side halves are in this example as follows:

[matrices S1^L, S2^L, S1^R and S2^R, obtained by splitting S1 and S2 as in (2)]
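The propagation of derivative values between knots by the matrix A_α in (4)-(6) can be illustrated outside the package as well. The Python sketch below uses our own names (the package itself works in R); for a plain polynomial no ∆-correction of the last column is needed, while for a spline the kth entry would additionally be reset at each knot according to (5):

```python
import math
import numpy as np

def taylor_matrix(alpha, k):
    """Lower triangular Toeplitz matrix A_alpha with entries
    A[i, j] = alpha^(i - j) / (i - j)! for i >= j, as in (4)."""
    A = np.zeros((k + 1, k + 1))
    for i in range(k + 1):
        for j in range(i + 1):
            A[i, j] = alpha ** (i - j) / math.factorial(i - j)
    return A

def next_row(s, alpha):
    """Derivatives at the next knot from the row s of derivatives at the
    current knot: the r-th entry is sum_j alpha^j / j! * s[j + r], cf. (3)."""
    return np.asarray(s) @ taylor_matrix(alpha, len(s) - 1)
```

For example, for the cubic polynomial p(t) = t^3 with derivatives (0, 0, 0, 6) at t = 0, next_row([0, 0, 0, 6], 2.0) recovers the derivatives (8, 12, 12, 6) at t = 2.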
Implementation of the class ’Splinets’ in object-oriented programming
The efficiency gains from using our methods can be fully utilized only if a proper approach to the spline calculus is taken. In particular, one must utilize the small supports of the considered spline bases. If two splines have only a small portion of their supports in common, then the inner product between them needs to be evaluated only over this common support. Thus, for splines with small supports, evaluations become computationally less demanding even if the number of knots, and thus of basis splines, is large. For this reason, we have implemented a spline object that contains information about the spline support, and we use this information in the computational procedures.

Outside of the support the values of the derivatives are zero, and to utilize this in efficient computations, any spline function S of order k over the knots in ξ = (ξ_0, . . . , ξ_{n+1}) is identified as

S = {k, ξ, I, s_0, s_1, . . . , s_k},

where I = {(i_1, i_1 + m_1 + 1), . . . , (i_N, i_N + m_N + 1)} is a sequence of ordered pairs of indexes in {1, . . . , n + 2} representing the intervals, the union of which is the support of the spline, i.e. the minimal closed set outside of which the spline vanishes. It implies that the knots outside the support have the corresponding entries in S equal to zero, and thus there is no need to include them in the matrix of derivatives. The matrix S is thus divided row-wise into N blocks S_r of the sizes (m_r + 2) × (k + 1), r = 1, . . . , N. Thus the columns s^(r)_0, s^(r)_1, . . . , s^(r)_k of the rth block are the m_r + 2 values of the derivatives of S at the knots of the support, ξ_{i_r + l}, l = 0, . . . , m_r + 1. It is important to point out that the block S_r is kept in the symmetric-around-the-center format as described in the previous section, except that there n has to be replaced by m_r and (ξ_0, . . . , ξ_{n+1}) by (ξ_{i_r}, . . . , ξ_{i_r + m_r + 1}).

For the so-defined object, the most elementary function is the verification whether the object satisfies the conditions of a spline with zero-boundary conditions, for which (5) holds. It is implemented in the is.spline() method for this class. The two functions gather() and subsample() group two Splinets objects into one and subsample from a Splinets object to obtain another object, respectively. Another basic function is evaluate(), which evaluates the splines in a Splinets-object at a given vector of argument values within the range of knots through the Taylor expansion. Finally, the generic function plot() utilizes the spline evaluation to plot graphs through k · N points in-between knots, where k is the order of a spline and N is a parameter passed to the method.

Support sets
One of the most important features of our implementation of the spline algebra is controlling the locality of a given spline by keeping information about its support sets. We envision that this feature will be particularly useful for the extension of the package to higher dimensions that we plan for the future, but even in the current version it plays its role. The main reason why it is important is that the orthonormalization we implemented to obtain the splinets is optimal with respect to the total support size. Thus our goal is to utilize this feature to improve efficiency in computations: the operations outside of the support set are simply not needed and thus not carried out. In the background of all the analytical and algebraic operations on Splinets-objects, the package carries out the evaluation of the support sets of the outputs. This evaluation is based on the set algebra operations on the support sets, which are kept in the field supp of the object.

All the operations maintaining correct information about the support sets are carried out in the background, but it may happen that the actual support set does not match the evaluated one. A typical example when such a problem may occur is the addition of two splines. Generally, the sum of two splines has its support within the union of the individual supports. However, if we add S to −S, the support is the empty set and thus the general rule does not apply. For this reason, the package has a function exsupp() that extracts the actual support from a Splinets-object. This function should be used whenever accurate support sets are needed and there is a suspicion that the support representation in the object deviates from the actual support set.
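The support bookkeeping described above amounts to elementary interval set algebra. A sketch in Python (the names and the list-of-pairs representation are ours; the package stores supports in the supp slot as matrices of interval endpoints) shows the two operations needed: the union of supports, used e.g. for sums of splines, and the intersection, over which an inner product has to be evaluated:

```python
def union(supp_a, supp_b):
    """Union of two supports, each a sorted list of disjoint (lo, hi) pairs.
    For a sum of two splines this may overstate the actual support,
    which is the role of an extraction function such as exsupp()."""
    merged = []
    for lo, hi in sorted(supp_a + supp_b):
        if merged and lo <= merged[-1][1]:
            # overlapping or touching intervals: extend the last one
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def intersection(supp_a, supp_b):
    """Intersection of supports; an inner product of two splines needs
    to be evaluated only over these intervals."""
    out, i, j = [], 0, 0
    while i < len(supp_a) and j < len(supp_b):
        lo = max(supp_a[i][0], supp_b[j][0])
        hi = min(supp_a[i][1], supp_b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # advance the interval that ends first
        if supp_a[i][1] < supp_b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

When the intersection is empty, the inner product is zero without any further evaluation, which is the source of the efficiency gains for bases with small supports.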
Building functional splines
The Splinets class allows for quite arbitrary general field values, and the method is.spline() checks whether the defined object is indeed a spline. Since it is not trivial to correctly define a proper matrix of the derivatives at the knots, a function that ‘corrects’ a given matrix so that it satisfies (5) is of interest and thus implemented in the package. Using such a function provides the simplest way of defining a proper spline object. It can also be used to correct the values of the derivatives at the knots affected by round-off errors after a large number of spline evaluations. Another way to build a proper spline is to generate a random spline object. In the package, we actually have a flexible method of generating random spline objects. Finally, one can obtain a set of splines by utilizing a basis of splines; this method is presented later in the paper, where a number of different spline bases are discussed. In what follows, the first two methods of obtaining splines are discussed in further detail.
Generating an individual non-random spline
For given knots and an order of smoothness, a spline can be defined by giving a matrix S of the derivatives at the knots such that (5) is satisfied, by, for example, setting some of the entries of S and evaluating the remaining ones from (5). The package implements three special cases of such a completion, described next.

The first row and the last column case (frlc): The first case is obtained by fixing the values of the highest order derivative, i.e. the last column in S. Consider an (m+2) × (k+1) submatrix U = [u_{ij}], i = 0, . . . , m+1, j = 0, . . . , k, of S made of m+2 consecutive rows of S. Assume that the first row u_{0·} and the last column u_{·k} of U are known. Thus, in total, m+k+2 entries are given, and the remaining entries u_{i,0..k−1}, i = 1, . . . , m+1, can be recursively computed using, for i = 1, . . . , m+1, the relations

u_{i,0..k−1} = u_{i−1,·} [A^{ξ_i−ξ_{i−1}}]_{·,0..k−1} = [u_{i−1,·} A^{ξ_i−ξ_{i−1}}]_{0..k−1}. (7)

We note that in these relations u_{m+1,k} is not present, so its value does not affect the other computed values in the matrix U.

The first row and the first column case (frfc):
Another important special case of evaluating the entries of U is when, instead of the last column entries (the values of the kth derivative), the first column entries (the values of the spline) are known, i.e. u_{·0} is available together with the first row u_{0,0..k−1} (except the kth derivative), and we want to evaluate the rest of the entries of U except u_{m+1,k}, which is not involved in the Taylor expansion formula based on the considered knots. We can assume that we also know the value u_{0k}, since u_{10} = u_{0·} [A^{ξ_1−ξ_0}]_{·,0}, from which we can evaluate u_{0k} through the explicit formula

u_{0k} = (u_{10} − u_{0,0..k−1} [A^{ξ_1−ξ_0}]_{0..k−1,0}) / [A^{ξ_1−ξ_0}]_{k,0}.

Having an entry of the last column, one can then evaluate the next row as in the previous case. In other words, to evaluate the remaining entries we follow the recurrent relations

u_{i,0..k−1} = u_{i−1,·} [A^{ξ_i−ξ_{i−1}}]_{·,0..k−1}, i = 1, . . . , m+1,
u_{ik} = (u_{i+1,0} − u_{i,0..k−1} [A^{ξ_{i+1}−ξ_i}]_{0..k−1,0}) / [A^{ξ_{i+1}−ξ_i}]_{k,0}, i = 1, . . . , m. (8)

The first row and the last row case (frlr):
The final case we consider is when the first and the last rows in U = [u_{ij}], i = 0, . . . , m+1, j = 0, . . . , k, are set. We note that u_{m+1,k} corresponds to the kth derivative on the interval [ξ_{m+1}, ξ_{m+2}), and thus one can set it to an arbitrary number without any effect on the behavior of the spline on [ξ_0, ξ_{m+1}). Disregarding this irrelevant entry, the dimension of the space of matrices U that corresponds to the space of splines is k+m+1. It implies that by specifying the first and the last rows (with u_{m+1,k} not contributing to the dimension of the splines) we have an overspecified matrix if m < k, a uniquely specified matrix if m = k, and an underspecified matrix if m > k. To assure that the other entries are well defined, we consider only m = 0, . . . , k, and note that if m < k, then k−m entries in the last (or the first) row do not need to be used. As we will see, it is convenient to assume that the entries u_{m+1,0..k−m−1} are not used; these entries can be evaluated based on the other values.

Consider first m = 0, in which case the entire 2 × (k+1) matrix U is defined, but the entries must satisfy the additional k conditions

u_{1,0..k−1} = u_{0·} [A^{ξ_1−ξ_0}]_{·,0..k−1},

or otherwise U does not correspond to a valid spline. One can correct this by changing the values on the left-hand side (LHS) of the above to the ones obtained from the right-hand side (RHS). This corrects the underlying spline object to match the matrix restrictions.

For m = 1, the middle row u_{1·} needs to be evaluated, which is done in two steps. First, as before, we evaluate

u_{1,0..k−1} = u_{0·} [A^{ξ_1−ξ_0}]_{·,0..k−1}, (9)

then the final value u_{1k} is obtained by solving

u_{2,k−1} = u_{1·} [A^{ξ_2−ξ_1}]_{·,k−1} = u_{1,k−1} [A^{ξ_2−ξ_1}]_{k−1,k−1} + u_{1k} [A^{ξ_2−ξ_1}]_{k,k−1} = u_{1,k−1} + u_{1k} [A^{ξ_2−ξ_1}]_{k,k−1},

where the last equality follows from the fact that A^{ξ_2−ξ_1} has ones on the diagonal. The explicit solution is

u_{1k} = (u_{2,k−1} − u_{0·} [A^{ξ_1−ξ_0}]_{·,k−1}) / [A^{ξ_2−ξ_1}]_{k,k−1}.

Additional k−1 conditions have to be satisfied for U to represent a valid spline, namely

u_{2,0..k−2} = u_{1·} [A^{ξ_2−ξ_1}]_{·,0..k−2}.

Again, if these equations are not satisfied, one can correct them by changing the LHS values to the RHS ones. This is permissible since the values on the LHS have not been used for the evaluation of other entries in the matrix.

For a general m ≤ k−1, we have to evaluate the m middle rows u_{1·}, . . . , u_{m·}, given that the first and the last rows, u_{0·} and u_{m+1,·}, are set. It is sufficient to provide the equations needed to evaluate the column entries u_{1..m,k} and then use the first-row-and-the-last-column case described above to evaluate all the other entries. We use (9) to obtain u_{1,0..k−1}. Using the m × m matrices A(j) and the 1 × m vectors c(j), defined by

A(j) = [A^{ξ_j−ξ_{j−1}}]_{k−m..k−1, k−m..k−1}, c(j) = [A^{ξ_j−ξ_{j−1}}]_{k, k−m..k−1},

we can write in a compact manner

u_{1,k−m..k−1} = u_{0,k−m..k−1} A(1) + u_{0k} c(1),
u_{2,k−m..k−1} = u_{0,k−m..k−1} A(1) A(2) + u_{0k} c(1) A(2) + u_{1k} c(2),

and, similarly, for j = 2, . . . , m+1,

u_{j,k−m..k−1} = u_{j−1,·} [A^{ξ_j−ξ_{j−1}}]_{·,k−m..k−1} = u_{j−1,k−m..k−1} A(j) + u_{j−1,k} c(j).

Using the m × m matrices B(r,j) = ∏_{l=r}^{j} A(l), r = 1, . . . , j, with B(j+1, j) = I_m, we obtain the non-recurrent relations

u_{j,k−m..k−1} = u_{0,k−m..k−1} B(1,j) + Σ_{r=1}^{j} u_{r−1,k} c(r) B(r+1,j). (10)

For j = m+1,

u_{m+1,k−m..k−1} = u_{0,k−m..k−1} B(1,m+1) + Σ_{r=1}^{m+1} u_{r−1,k} c(r) B(r+1,m+1)
= (u_{0,k−m..k−1} A(1) + u_{0k} c(1)) B(2,m+1) + u⊤_{1..m,k} C(m)
= u_{0,k−m..k} D(m) + u⊤_{1..m,k} C(m),

where u⊤_{1..m,k} = (u_{1k}, . . . , u_{mk}), D(m) = [A^{ξ_1−ξ_0}]_{k−m..k, k−m..k−1} B(2,m+1) is an (m+1) × m matrix, and the m × m matrix C(m) is the row-wise concatenation of the 1 × m vectors c(r) B(r+1,m+1), r = 2, . . . , m+1. In these equations, only u_{1..m,k} is unknown; thus the problem reduces to finding the inverse E(m) of C(m) and evaluating

u⊤_{1..m,k} = (u_{m+1,k−m..k−1} − u_{0,k−m..k} D(m)) E(m). (11)

Whenever m ≤ k, the vector u_{1..m,k} is computable from the above. When combined with the first row u_{0·} (or the last row u_{m+1,·}), all the remaining values can be computed as described in the first-row-and-the-last-column case. Since the entries u_{m+1,0..k−m−1} are not used in the above equations, their values have to agree with the values that are evaluated; if they do not, they are corrected by setting them to the computed values.

The special case of equally spaced knots simplifies the computations significantly. Indeed, in this case we can set A to the common value of the A(j), j = 1, . . . , m+1, and c to the common value of the c(j), i.e.

A = [A^{ξ_1−ξ_0}]_{k−m..k−1, k−m..k−1}, c = [A^{ξ_1−ξ_0}]_{k, k−m..k−1}.

Then (10) becomes

u_{j,k−m..k−1} = u_{0,k−m..k−1} A^j + Σ_{r=1}^{j} u_{r−1,k} c A^{j−r},

and the matrices in (11) become

D(m) = [A^{ξ_1−ξ_0}]_{k−m..k, k−m..k−1} A^m, E(m) = [c A^{m−1}; c A^{m−2}; . . . ; c A; c]^{−1}.

The special case of m = k is particularly important, since it leads to the unique matrix specification. We summarize our findings for this case in the result below.

Proposition 2.
Let U = [u_{ij}], i = 0, . . . , k+1, j = 0, . . . , k, be a (k+2) × (k+1) matrix that corresponds to the matrix of derivatives at the knots ξ_0, . . . , ξ_{k+1} of a spline of the kth order defined over these knots, where the right-hand-side derivative is considered as the kth order derivative. If its first row u_{0·} and its last row u_{k+1,·} are both given, then all the remaining entries are uniquely defined. In particular, the last column u_{·k} is given through

u⊤_{1..k,k} = (u_{k+1,0..k−1} − u_{0·} D) E, (12)

where

E = [c(2) ∏_{l=3}^{k+1} A(l); c(3) ∏_{l=4}^{k+1} A(l); . . . ; c(k) A(k+1); c(k+1)]^{−1}, D = [A^{ξ_1−ξ_0}]_{·,0..k−1} ∏_{l=2}^{k+1} A(l),

are k × k and (k+1) × k matrices, respectively, in which, for j = 1, . . . , k+1,

A(j) = [A^{ξ_j−ξ_{j−1}}]_{0..k−1, 0..k−1}, c(j) = [A^{ξ_j−ξ_{j−1}}]_{k, 0..k−1}.

All the remaining entries are recursively defined, for i = 1, . . . , k, through

u_{i,0..k−1} = u_{i−1,·} [A^{ξ_i−ξ_{i−1}}]_{·,0..k−1} = [u_{i−1,·} A^{ξ_i−ξ_{i−1}}]_{0..k−1}.

To illustrate different features of the matrix derivative adjustments, we put all three methods to the task of 'correcting' a randomly chosen piecewise cubic polynomial function shown in Figure 1 (Top). We observe that the first two methods, while providing quite accurate approximations close to zero, destabilize quickly: the method based on matching the highest order derivative biases away due to over-smoothing, while the deviation of the one that matches the values of the function at the knots increases variability. On the other hand, the last method, obtained by matching the derivatives at the two endpoints, produces a quite stable, smoother 'correction' of the initial function. It is clear that in the above considerations the knots do not need to start from ξ_0, and the order can be reversed from right-to-left instead of left-to-right, with all the necessary but simple adjustments. All these properties can be used in many different ways to build splines with a matrix of derivatives satisfying (5). In our package, we implemented three basic approaches, namely the center-row-last-column (CR-LC), the center-row-first-column (CR-FC), and the regular row match (RRM). All three methods are designed for the case when the number of internal knots in the support is at least 2k+2. The cases with a smaller number of knots in the support can be treated through the functional basis approach and the function project() that is discussed later in this work. In what follows, we describe these constructions and the method for the spline objects that is implemented in the package.
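All three completion schemes above rest on propagating a row of derivatives across one inter-knot interval with the Taylor expansion matrix A^α, whose entries are α^{l−j}/(l−j)! for l ≥ j. A minimal sketch of this propagation (the function names are ours, for illustration only):

```python
from math import factorial

def taylor_matrix(alpha, k):
    """(k+1)x(k+1) matrix A^alpha with entries alpha^(l-j)/(l-j)! for l >= j.

    A row vector of derivatives (f, f', ..., f^(k)) at one knot, multiplied
    on the right by A^alpha, gives the derivatives at the knot alpha apart.
    """
    A = [[0.0] * (k + 1) for _ in range(k + 1)]
    for l in range(k + 1):
        for j in range(l + 1):
            A[l][j] = alpha ** (l - j) / factorial(l - j)
    return A

def shift_derivatives(u, alpha):
    """Propagate the derivative row u across an interval of length alpha."""
    k = len(u) - 1
    A = taylor_matrix(alpha, k)
    return [sum(u[l] * A[l][j] for l in range(k + 1)) for j in range(k + 1)]
```

For the cubic p(x) = x^3, the derivative row (0, 0, 0, 6) at 0 propagates to (1, 3, 6, 6) at 1, in agreement with (p, p', p'', p''') evaluated there; note also that A^α has ones on the diagonal, the fact used in the frlr derivation.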
The CR-LC method
In this method, we first set aside the k+1 knots at each end. The reason for excluding these knots is the zero boundary conditions, which uniquely determine the spline there once the values of S at the (k+2)nd knot from each endpoint are determined, together with the value of the highest derivative over the interval between the (k+1)st and the (k+2)nd knots; this is discussed later in further detail. For the remaining knots, there are n−2k−1 values of the kth order derivative over the intervals between the knots. In this approach, we set them to values given in the input. This leaves k more independent values to be set, since the dimension of the spline space is n−k−1. We set them through the derivatives at the center row s^L_{l+k−1,·}, where l = [n/2] as before. This also defines s^R_{l+k−1,·} (we use the Taylor expansion if n is even, which is equivalent to applying 'frlr' with m = 0). It is convenient to split the matrix into the left and right parts S^L and S^R and to consider the 'mirror' matrices S̃^L and S̃^R, in which the order of the rows is reversed and the values of the kth derivatives, i.e. s̃^L_{i,k} and s̃^R_{i,k}, are reassigned so that they refer to the interval corresponding to the pairs of rows (S̃^L_{i·}, S̃^L_{i+1,·}) and (S̃^R_{i·}, S̃^R_{i+1,·}), respectively, i = 0, . . . , l−k−1.

Figure 1: Top-Middle – the methods used in the direct spline building. Top-Left: the spline object that is not a spline but only a piecewise cubic polynomial without continuity at the knots. Top-Right: a valid spline object obtained by matching the highest derivative at the knots. Middle-Left: a spline obtained by matching the values of the function at the knots. Middle-Right: the most stable 'correction' of the derivative matrix, by matching the derivatives at the endpoints (the RHS match uses the values that correspond to a RHS continuation of the function, so they do not match the function on the graph). Bottom – a spline given by a red dashed line is distorted and then reconstructed. In the LHS graph, all values at the knots are distorted by random noise, while in the RHS graph only the values of the derivatives are distorted. The thick broken deep-sky-blue line on both graphs represents the resulting distorted piecewise polynomial function. The thin dotted lines going out of the bounds illustrate the CR-FC method, which tries to follow the values of the function at the knots. The RRM method, represented by the dark-red lines, appears to be the most accurate. The third line, olive-green, represents the CR-LC method.

It is clear that by setting the values at the center knots as described above, we have given the first rows in S̃^L and S̃^R. Moreover, setting the kth derivatives over all the intervals between the remaining knots (including over the interval between the (k+1)st and the (k+2)nd knots from each end) is equivalent to setting the last columns in each of the two matrices S̃^L and S̃^R. By applying the 'frlc' approach to these two matrices, we obtain all the entries so that each matrix corresponds to a spline. We emphasize that we also obtain the kth derivatives at the (k+1)st knot counting from each endpoint. Now, the values at the initially excluded knots at the endpoints, i.e. the first k+1 rows of S^L and of S^R, are handled through the 'frlr' approach. Since our method is symmetric, it is enough to consider only one endpoint, say the LHS. At this end, we already have the k+1 values of the kth derivative (including the one over the interval between the (k+1)st and the (k+2)nd knots) and the k values of the lower order derivatives at the first knot, which are zero because of the zero boundary conditions. Starting from the RHS and continuing to the LHS, one can uniquely obtain all the remaining entries in the first k+1 rows of S^L by using the 'frlr' approach for U set to their mirror with the kth derivatives reassigned. In fact, this guarantees a unique solution to the problem (the case m = k).

The CR-FC method
The method is very similar to the previous one, except that this time we set in the input the values of the spline at the central knots rather than the values of the kth derivative. All the other entries can then be evaluated as described in the 'frfc' approach, with the matrix U set to S̃^L and S̃^R, respectively, followed by evaluating the values at the k+1 excluded knots at each end as in the previous method.

The RRM method
This method uses only the 'frlr' approach, which produces the most stable results when the number of knots is large relative to the order of the spline, as illustrated in Figure 1. The method starts the same way as the two previous ones: the k+1 knots at each end are set aside, S is split into S^L and S^R, and, in the even n case, the values at the two central knots are settled first using 'frlr' with m = 0, as described before. With the central values settled, it uses 'frlr' with m = k, starting from these central knots and moving through the successive groups of k+2 rows of S̃^L and S̃^R, respectively, for as long as complete groups fit, i.e. for j up to the integer part of (l−k)/(k+1). Then the 'frlr' approach is applied once more to the remaining rows, unless j(k+1) = l−k+1, in which case no rows remain. The last step, for the final k+1 knots at each end, is the same as in the previous methods.

All three approaches are implemented in the function construct(). In Figure 1 (Bottom), we present an illustration of Splinets objects resulting from applications of construct(). There, a spline has been distorted by adding random noise to its matrix of derivatives. The three methods described above have then been used to correct the matrix of derivatives and construct valid splines. The methods RRM and CR-LC perform similarly, with RRM doing a better job when only the derivatives are distorted (right). The CR-FC method, which tries to follow the values of the distorted function, performs badly at the boundaries but improves at the center when the values of the function are not distorted (right).
Random spline generator
In many applications, it is convenient to have an effective way of simulating random functions. The package implements a simple approach: simulate noise around the matrix of the derivatives at the knots, and then use one of the construction methods to obtain a valid spline. Formally, the model from which simulations are made can be written in terms of the matrices of the derivatives as

T = S + C(E(Σ, Θ)),

where T is the (n+2) × (k+1) matrix of derivatives for the generated random spline, S is the analogous matrix for its mean-value spline, and E(Σ, Θ) is an (n+2) × (k+1) random matrix generated from the mean-zero matrix-valued normal distribution with the covariances Σ and Θ, which are, respectively, (n+2) × (n+2) and (k+1) × (k+1) non-negatively defined matrices. Namely, for a matrix Z of iid standard normal variables,

E(Σ, Θ) = Σ^{1/2} Z Θ^{1/2}.

Finally, C(·) is a chosen correction method that turns an arbitrary (n+2) × (k+1) matrix into one satisfying the conditions required of a matrix of derivatives of a spline, as described in the previous sections. In the package implementation, this is one of the three methods RRM, CR-LC, or CR-FC. In Figure 2, we see samples of random functions generated using the package implementation rspline(), with different variances and different spline correction methods.

Figure 2: Random samples of splines. The dashed lines in the middle of the packs of random splines represent the mean-value spline, which is the same in all the cases. The vertical dashed lines pass through the knot locations. Left-Top: independent splines with standard normal variables in the matrix of the derivatives, corrected with the RRM method. Left-Bottom: the same but through the CR-LC correction method. Right: using smaller variances for knots closer to zero.
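The matrix-normal noise E(Σ, Θ) = Σ^{1/2} Z Θ^{1/2} can be sketched as follows, using Cholesky factors of the two covariance matrices; this is not the package's rspline(), and the concrete Σ and Θ below are illustrative choices only.

```python
import numpy as np

def matrix_normal_noise(Sigma, Theta, rng):
    """One mean-zero matrix-normal draw with row covariance Sigma and
    column covariance Theta: E = Sigma^(1/2) Z Theta^(1/2)."""
    L = np.linalg.cholesky(Sigma)            # Sigma = L L^T
    R = np.linalg.cholesky(Theta)            # Theta = R R^T
    Z = rng.standard_normal((Sigma.shape[0], Theta.shape[0]))
    return L @ Z @ R.T

rng = np.random.default_rng(0)
Sigma = np.eye(5)                            # illustrative: uncorrelated knots
Theta = np.diag([1.0, 0.5, 0.25, 0.1])       # illustrative: damp higher derivatives
E = matrix_normal_noise(Sigma, Theta, rng)
```

Choosing a Θ with decreasing diagonal, as above, shrinks the noise injected into the higher order derivatives, which is one way of keeping the corrected splines visually close to the mean-value spline.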
Basic operations on splines
Splines are elements of a Hilbert space, and thus their linear combinations and inner products are readily defined. In the representation of the package, splines are kept in the main object as a sequence {S_1, . . . , S_d} of quadruples

S_j = {k, ξ, I_j, S_j},

where k is the common smoothness order and ξ = (ξ_0, ξ_1, . . . , ξ_{n+1}) are the common knots. The derivative matrix S_j, j = 1, . . . , d, for each spline is a sequence of matrices defining the derivatives over the components of the support set given through the pairs of indices in I_j. Here, for simplicity, we assume the one-sided representation of each matrix in S_j, and we identify I_j with a support set in the range of the splines.

Embedding a spline into higher dimension spaces of splines
One of the most interesting aspects of spline spaces is their flexibility, which follows from the different possible choices of knots. However, this is not often utilized, with most of the focus typically on splines over a fixed set of knots. The proposed package provides tools to fully explore the properties of splines under different choices of knots. Any spline of a given order remains a spline of the same order if one considers it on a larger set of knots than the original one. However, this changes the Splinets representation of the so-refined spline. It is thus important to have a function that embeds a given spline into the bigger space of splines residing on a refined set of knots. In the package, the function refine() does this task, allowing one to conveniently represent splines from smaller spaces in bigger, more refined spaces. This function will be utilized in the final section of this work, where projections to spline spaces are discussed.
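The embedding just described can be illustrated on a piecewise polynomial with one-sided derivative rows: inserting a knot leaves the function unchanged, and the derivative row at the new knot follows from the local Taylor expansion at its left neighbour. This is only a sketch of the idea behind refine(), with our own function names.

```python
from math import factorial

def derivs_at(u_left, h):
    """Derivative row at distance h to the right, via the local Taylor expansion."""
    k = len(u_left) - 1
    return [sum(u_left[l] * h ** (l - j) / factorial(l - j)
                for l in range(j, k + 1)) for j in range(k + 1)]

def insert_knot(knots, rows, t):
    """Embed into the refined knot set including t; the function is unchanged,
    only its representation grows by one derivative row."""
    i = max(j for j, x in enumerate(knots) if x <= t)
    new_row = derivs_at(rows[i], t - knots[i])
    return knots[:i + 1] + [t] + knots[i + 1:], rows[:i + 1] + [new_row] + rows[i + 1:]
```

For the piece p(x) = x^2 on [0, 2], represented by the row (0, 0, 2) at the knot 0, inserting the knot 1 produces the row (1, 2, 2) there, i.e. (p(1), p'(1), p''(1)), so the refined representation describes the same function.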
Linear combination
The linear combination of splines is easily implemented when the splines have full supports. In this case, a linear combination of splines corresponds simply to taking the same linear combination of the matrices of the values of the derivatives at the knots. Thus, for two splines S = {k, ξ, (0, n+1), S} and S̃ = {k, ξ, (0, n+1), S̃}, their linear combination with coefficients α and α̃ has the matrix of the derivatives given by

W = α S + α̃ S̃.

One of the important features of the package is the utilization of the support sets. For general supports I and Ĩ, we utilize them as follows. First, we note that the support of the linear combination is at most J = I ∪ Ĩ. Then the matrix of the derivatives over this support for the linear combination is given by W obtained from

W_{I_1·} = α S_{I_1·}, W_{I_2·} = α S_{I_2·} + α̃ S̃_{I_2·}, W_{I_3·} = α̃ S̃_{I_3·},

where I_1 = I ∖ Ĩ, I_2 = I ∩ Ĩ, and I_3 = Ĩ ∖ I, and subindexing the matrices by the support components stands for taking the parts of them that correspond to the respective components of the support.

More generally, denote the operation of a linear transformation by ℓ(S̃, P), where P = {p_ij} is an m × d transformation matrix. The linear transformation operator defines a map from an input Splinets-object of size d to a Splinets-object of the same order and of size m. The derivative matrix of the ith spline in the output can be calculated, in the full support case, as Σ_{j=1}^{d} p_ij S_j. Similarly to the case of a linear combination of two splines, the support sets can be utilized to improve the computational efficiency. The operation of the linear combination of splines is implemented in lincomb().

Derivative and integral

If t ∈ [ξ_i, ξ_{i+1}], i = 0, . . . , n, then

S(t) = Σ_{l=0}^{k} (t − ξ_i)^l / l! · s_{i,l}.

The derivative at this point is

S′(t) = Σ_{l=0}^{k−1} (t − ξ_i)^l / l! · s_{i,l+1}.

Thus the derivative matrix of the first derivative of a spline is given by S(S′) = {k−1, ξ, I, [s_{·1}, . . . , s_{·k}]}. This simple evaluation of the spline derivative is implemented in deriva().

The indefinite integral of a spline within each interval [ξ_i, ξ_{i+1}] is given by

∫ S(t) dt = Σ_{l=1}^{k+1} (t − ξ_i)^l / l! · s_{i,l−1} + c_i, (13)

where i = 0, . . . , n and c_i is an arbitrary real number. There are many ways to determine a specific antiderivative by specifying the c_i's. We choose the one that preserves the spline smoothness but does not necessarily yield the zero boundary condition at the RHS terminal knot. Set c_0 = 0; then, to achieve continuity at the knots, c_i, i = 1, . . . , n, is iteratively calculated by

c_i = c_{i−1} + [S]_{i−1,·} A*^{ξ_i−ξ_{i−1}},

where A*^α is slightly different from eq. (4): it is now a column vector of the Taylor coefficients from order 1 to order k+1,

A*^α = (α, α²/2!, . . . , α^{k+1}/(k+1)!)ᵀ.

One can notice that the resulting spline does not necessarily satisfy the boundary conditions. The boundary condition of the output is guaranteed only when the definite integral of the input spline over the entire range of the knots vanishes. For example, the derivative of a spline that satisfies the boundary conditions has this property. In this sense the operation constitutes the inverse of the derivative. More specifically, if I is the integration operator and D is the differentiation operator, then ID = DI = Id, where Id is the identity operator on S^ξ_k, and

D S^ξ_k = S^ξ_{k−1}, I S^ξ_k ⊂ S^ξ_{k+1}.

This operation is implemented as the function integra(). The definite integral of a spline can be calculated through eq. (13): the integral of the spline S(t) over the interval [ξ_i, ξ_{i+1}] is

∫_{ξ_i}^{ξ_{i+1}} S(t) dt = [S]_{i,·} A*^{ξ_{i+1}−ξ_i}.

The definite integral of a spline is implemented in dintegra().

Inner product
Finally, we demonstrate how the topology induced by the inner product in the space of splines can be expressed in terms of the coefficients of the matrices. The calculation of the mutual inner products of a set of splines is implemented in gramian(). The algorithm for calculating the inner product is established in the following proposition.
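The interval-by-interval formula derived in the proof of the proposition that follows can be transcribed directly. The sketch below (our own naming, full-support splines only) integrates the product of the two local Taylor polynomials in closed form.

```python
from math import factorial

def spline_inner(knots, S, T):
    """<S, T> for two derivative matrices over the same knots (full supports)."""
    k = len(S[0]) - 1
    total = 0.0
    for i in range(len(knots) - 1):          # one closed-form term per interval
        h = knots[i + 1] - knots[i]
        for j in range(k + 1):
            for r in range(k + 1):
                total += (S[i][j] * T[i][r] / (factorial(j) * factorial(r))
                          * h ** (j + r + 1) / (j + r + 1))
    return total
```

As a sanity check, for S(t) = t on [0, 1] (row (0, 1) at the knot 0, k = 1) the formula gives ⟨t, t⟩ = 1/3 and ⟨t, 1⟩ = 1/2, matching the direct integrals.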
Proposition 3.
Let S̃ be the linear space of (n+2) × (k+1) matrices S as in eq. (1) satisfying eq. (3). For S, S̃ ∈ S̃, let us define the inner product

⟨S, S̃⟩ = Σ_{i=0}^{n} (1, 1/2, . . . , 1/(2k+1)) · (v^{(i)} ∗ ṽ^{(i)}),

where v^{(i)} and ṽ^{(i)} are the (k+1) × 1 vectors obtained as the coordinate-wise products of the rows S_{i·} and S̃_{i·}, respectively, with the vector of weighted half powers ((ξ_{i+1}−ξ_i)^{1/2}, (ξ_{i+1}−ξ_i)^{3/2}/1!, . . . , (ξ_{i+1}−ξ_i)^{k+1/2}/k!). Here we use the following notations and conventions: for two r × 1 vectors v and w, their convolution is the (2r−1) × 1 vector defined by

(v ∗ w)_p = Σ_{m=(p−r+1)∨1}^{p∧r} v_{p−m+1} w_m, p = 1, . . . , 2r−1,

while v · w is the coordinate-wise multiplication of vectors. Moreover, for a matrix X, its ith row is denoted by X_{i·}. Then S̃ equipped with this inner product is isomorphic with the space of splines of the kth order spanned over the knots ξ_0, . . . , ξ_{n+1}, equipped with the standard inner product of square integrable functions.

Proof. For a given set of knots ξ, let S and S̃ be the (unique) splines such that S(S) = S and S(S̃) = S̃. Then

⟨S, S̃⟩ = Σ_{i=0}^{n} ∫_{ξ_i}^{ξ_{i+1}} S(t) S̃(t) dt
= Σ_{i=0}^{n} ∫_{ξ_i}^{ξ_{i+1}} Σ_{j=0}^{k} s_{ij} (t−ξ_i)^j / j! · Σ_{r=0}^{k} s̃_{ir} (t−ξ_i)^r / r! dt
= Σ_{i=0}^{n} Σ_{j,r=0}^{k} s_{ij} s̃_{ir} / (j! r!) ∫_{ξ_i}^{ξ_{i+1}} (t−ξ_i)^{j+r} dt
= Σ_{i=0}^{n} Σ_{j,r=0}^{k} s_{ij} s̃_{ir} / (j! r!) · (ξ_{i+1}−ξ_i)^{j+r+1} / (j+r+1)
= Σ_{i=0}^{n} Σ_{l=0}^{2k} (ξ_{i+1}−ξ_i)^{l+1} / (l+1) · Σ_{m=(l−k)∨0}^{l∧k} s_{i,l−m} s̃_{im} / ((l−m)! m!)
= Σ_{i=0}^{n} Σ_{l=0}^{2k} 1/(l+1) · Σ_{m=(l−k)∨0}^{l∧k} [s_{i,l−m} (ξ_{i+1}−ξ_i)^{l−m+1/2} / (l−m)!] [s̃_{im} (ξ_{i+1}−ξ_i)^{m+1/2} / m!],

which shows the isometry property of the mapping S.

Bases of splines and their orthogonalizations
As we have seen in the previous sections, the direct approach to building splines requires a lot of care and can often be cumbersome. A much better way to build splines is through functional bases of splines. There are many possible choices of such bases, but the most popular are the B-splines. Despite their many advantages, the B-splines do not constitute an orthogonal basis. Our main contribution is the implementation of an optimal orthogonalization of the B-splines introduced in Liu et al. (2019). The presentation of this spline basis benefits from organizing the basis elements in the form of a net; in this framework, the derived orthogonal bases of splines are referred to as splinets.

B-splines

This section unifies the notation and provides the most fundamental facts about the B-splines. The most convenient way to define the B-splines on the knots ξ = (ξ_0, . . . , ξ_{n+1}), n = 0, 1, . . ., is through the splines with the boundary conditions and using the recurrence on the order. Namely, once the B-splines of a certain order are defined, the B-splines of the next order are easily expressed by their 'less one' order counterparts. In the process, the number of splines decreases by one, and the number of initial conditions (derivatives equal to zero) increases by one at each endpoint. We keep the notation B^ξ_{l,k} for the lth B-spline of order k, l = 0, . . . , n−k. For the zero order, the B-spline basis is made of the indicator functions

B^ξ_{l,0} = 1_{(ξ_l, ξ_{l+1}]}, l = 0, . . . , n, (14)

for the total of n+1 basis functions. The zero order B-splines constitute an orthogonal basis. The following recursion leads to the B-splines of an arbitrary order k ≤ n. Suppose that we have defined B^ξ_{l,k−1}, l = 0, . . . , n−k+1. The B-splines of order k are defined, for l = 0, . . . , n−k, by

B^ξ_{l,k}(x) = (x−ξ_l)/(ξ_{l+k}−ξ_l) · B^ξ_{l,k−1}(x) + (ξ_{l+1+k}−x)/(ξ_{l+1+k}−ξ_{l+1}) · B^ξ_{l+1,k−1}(x). (15)

It is also important to notice that the above evaluations need to be performed only over the joint support of the splines involved in the recurrence relation. The recurrent structure of the support is as follows. For zero order splines, the support of B^ξ_{l,0} is clearly [ξ_l, ξ_{l+1}], l = 0, . . . , n. If the supports of the B^ξ_{l,k−1}'s are [ξ_l, ξ_{l+k}], l = 0, . . . , n−k+1, then the support of B^ξ_{l,k} is the joint support of B^ξ_{l,k−1} and B^ξ_{l+1,k−1}, which is [ξ_l, ξ_{l+1+k}], l = 0, . . . , n−k. In order to translate these recursive relations to relations between the matrices of the derivatives at the knots, we need the following result on the derivatives of the B-splines. The result follows from (15), and we omit the proof; a graphical illustration of the recurrence is presented in Fig. 3 (Left).

Figure 3: Left: the recursion in the definition of the B-splines; the first order splines (top), the second order spline (middle), and the third order spline (bottom). Right: the third order B-splines on an equidistant grid (top) and the splinet built of them (bottom).

Proposition 4.
For i = 0, . . . , k and l = 0, . . . , n−k:

d^i B^ξ_{l,k}/dx^i (x) = i/(ξ_{l+k}−ξ_l) · d^{i−1} B^ξ_{l,k−1}/dx^{i−1} (x) + (x−ξ_l)/(ξ_{l+k}−ξ_l) · d^i B^ξ_{l,k−1}/dx^i (x)
+ i/(ξ_{l+1}−ξ_{l+1+k}) · d^{i−1} B^ξ_{l+1,k−1}/dx^{i−1} (x) + (ξ_{l+1+k}−x)/(ξ_{l+1+k}−ξ_{l+1}) · d^i B^ξ_{l+1,k−1}/dx^i (x). (16)

The support of d^i B^ξ_{l,k}/dx^i is [ξ_l, ξ_{l+k+1}], and if i = k, then d^i B^ξ_{l+1,k−1}/dx^i ≡ 0.

Let us consider the zeroth and first order B-splines with the boundary conditions, B^ξ_{l,0} and B^ξ_{r,1}, where l = 0, . . . , n and r = 0, . . . , n−1. The corresponding (n+2) × 1 and (n+2) × 2 matrices of derivatives, S(l) and S(r), are zero except for the indicated rows: S(l) has a single entry equal to one in row l+1 (l ≤ n), while the only non-zero rows of S(r) are the rows r and r+1, which involve the slopes 1/(ξ_{r+1}−ξ_r) and −1/(ξ_{r+2}−ξ_{r+1}) of the two linear pieces (r < n).

Using the recurrent relation (16) between the derivatives of the B-splines, one can generalize the recurrence between the matrix representations of the B-splines to an arbitrary order of splines. Using the one-sided representation of the splines, we have

S^{(k,l)}_{·j} = 1/(ξ_{l+k}−ξ_l) · (j · S^{(k−1,l)}_{·,j−1} + Λ_l S^{(k−1,l)}_{·j}) + 1/(ξ_{l+1}−ξ_{l+1+k}) · (j · S^{(k−1,l+1)}_{·,j−1} + Λ_{l+1+k} S^{(k−1,l+1)}_{·j}), (17)

where l = 0, . . . , n−k, j = 0, . . . , k, and the diagonal (n+2) × (n+2) matrices Λ_l have (ξ_0−ξ_l, . . . , ξ_{n+1}−ξ_l) on the diagonal. Here we assume that if j = k, then S^{(k−1,l)}_{·j} is a column made of zeros, as the kth derivative of a (k−1)th order spline is always zero.

The above algebraic relation is implemented in the package. To construct the B-spline basis of order k, one uses

so = splinet(xi, k); Bsplines = so$bs

where splinet() generates three different types of bases and organizes them as a list. The element of the list labeled bs is always the Splinets-object corresponding to the B-spline basis. The string flag type in a Splinets-object indicates whether the object is a basis, with type="bs" indicating a B-spline basis, i.e. in the above example Bsplines@type="bs". An example of the result is shown in Fig. 3 (Right-top). There, the equidistant case is presented; it should be noted that the algorithm used accounts for the additional efficiency that this case yields in the computations.
Orthogonal spline bases
The orthonormalized bases implemented in this package are obtained by one of three orthogonalization procedures applied to the B-splines: the first is simply the Gram-Schmidt orthogonalization performed on the B-splines ordered by their locations; the second is a symmetric (with respect to the knot locations) version of Gram-Schmidt; and the third is the dyadic orthogonalization into a splinet, which is our preferred method. All the methods have been discussed in detail in Liu et al. (2019). In the object representation of collections of splines, i.e. in the Splinets-class, the field type specifies which of the orthonormal bases one deals with. The function splinet() generates the proper basis, with the default form

so = splinet(xi); Bsplines = so$bs; Splinet = so$spnt

returning a list of two Splinets objects, so$bs and so$spnt, built over the ordered knots xi. The first object represents the basis made of the standard cubic B-splines and is thus not orthogonal. The second one represents the recommended orthonormal basis, referred to as a cubic splinet. This is also reflected in the field type, which can be either dspnt or spnt, depending on whether the system is fully dyadic or not, respectively. The explanation of the dyadic structure of a splinet is given below. In Fig. 3 (Right-bottom), the splinet obtained from the equally spaced B-splines is presented.

It is important to point out that the main computational engines of the orthogonalization processes are implemented to work on a generic Gramian matrix H, returning P such that

I = P⊤ H P. (18)

These algorithms do not explicitly reference splines and can be applied in other contexts in which such a diagonalization may be deemed important. They are available inside the package as auxiliary functions, but they are not explicitly referenced in the documentation.

One-sided orthogonalization.
The function splinet() can also generate types of orthogonal spline bases other than the splinet. One of them is the basis of one-sided OS-splines, type="gsob", which are obtained by the classical Gram-Schmidt orthogonalization. If the B-splines have been obtained earlier in so$bs, then the following command produces the desired outcome

so=splinet(xi, Bsplines=so$bs, type='gsob'); GramSchmidt=so$os

The one-sided orthogonalization has two flaws: firstly, it has a fairly large total support set and, secondly, it is asymmetric even in the equally spaced case, as seen in Fig. 4 (Top). The matrix P in (18) has a triangular form and thus 50% of its entries are non-zero.

Symmetric approach.
The symmetric orthogonalization, type="twob", alleviates some of the flaws of the one-sided case. It utilizes a symmetrized version of the Gram-Schmidt orthogonalization and was originally proposed in Redd (2012). The two-sided orthogonalization addresses both flaws of the one-sided basis: it has a smaller total support set, and it is symmetric with respect to the center, as seen in Fig. 4 (Middle). The matrix P in (18) has two equally sized rectangular blocks and thus is more sparse than the one-sided one, having only 25% non-zero entries.

Figure 4: Top-Middle: The one-sided (top) and symmetrized Gram-Schmidt orthogonalization (middle) for the B-splines from Fig. 3; Bottom: The splinet on a dyadic net.

While the symmetrized orthogonalization can be viewed as an improvement over the one-sided orthogonalization, one can do even better as far as the total support size and the sparsity of the matrix P are concerned. Another orthogonalization procedure, which utilizes the previous two approaches in a 'telescopic/dyadic' manner, is far more efficient. This is the orthonormalization of the B-splines that is promoted in the package and in this work. The next subsection is devoted to the description of the functionality of the splinets in their Splinets implementation.
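For intuition about the generic diagonalization (18): if the gramian factors as H = LL^⊤ (Cholesky), then P = (L^⊤)^{-1} is triangular and satisfies P^⊤HP = I, which corresponds to the one-sided, Gram-Schmidt-like orthonormalization; the symmetrized and dyadic methods produce differently structured P from the same relation. A minimal numpy sketch of this idea (not the package's internal routine, which is not exported):

```python
import numpy as np

def diagonalize_gramian(H):
    """Given a symmetric positive-definite gramian H, return P such that
    P.T @ H @ P = I.  Uses the Cholesky factor H = L @ L.T, so that
    P = inv(L.T) is upper triangular -- the one-sided (Gram-Schmidt-like)
    orthonormalization discussed in the text."""
    L = np.linalg.cholesky(H)
    return np.linalg.inv(L.T)

# a small band gramian, as arises for B-splines with overlapping supports
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
P = diagonalize_gramian(H)
```

The triangular structure of this P mirrors the 50% fill of the one-sided basis; the dyadic method trades it for the sparser self-similar pattern shown in Fig. 5.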
Splinets – optimal orthogonalization.
The dyadic algorithm for the orthogonalization of the B-splines produces OB-splines residing over only a slightly bigger total support (by a factor of log n, where n is the number of knots) than the total support size of the original B-splines. The obtained splines are symmetric with respect to the center and extend this symmetry into a dyadic structure based on the support sets. Finally, the sparsity of the matrix P is dramatically improved over the two previous orthogonalization methods, as seen in Fig. 5 (Right), where the self-similar fractal structure of the non-zero matrix entries is presented. For a given set of knots in xi, the splinet is evaluated simply by

so = splinet(xi)

which returns the Splinets-object so$os containing the splinet. The graphical presentation of the splinet on the dyadic grid, as shown in Fig. 4 (Bottom), is obtained simply by plot(so$os). If one prefers a graph without the dyadic structure, it can be obtained by plot(so$os,type="simple"), as seen in Fig. 3 (Right-Bottom).

The algorithm is most naturally described for the dyadic structure of knots, which assumes that for some positive integer N we have k·2^N − 1 = n, where k is the order of the splines and n is the number of internal knots, i.e. we do not count the endpoints. The resulting OB-splines are then located on a dyadic net with N levels featuring increasing support set sizes and a decreasing number of basis elements. The OB-splines are also grouped into k-tuples of neighboring splines. All these features are best seen in Fig. 4 (Bottom). We observe that each OB-spline in the splinet inherits its location from the corresponding B-spline. The location is naturally represented by the middle knot in the support if the number of knots in the support is odd, or by the average of the two middle knots if that number is even. These locations can be used to present any spline basis on a dyadic-net graph. In fact, the algorithms are implemented in such a way that any set of knots leads to a splinet that can be represented on a dyadic net, which may be incomplete if the number of knots does not satisfy the dyadic case restriction. Fig. 5 (Center/Bottom-Left) shows an example of such a situation.

As mentioned, all fundamental algorithms carrying the computational burden of the orthonormalization operate on matrices. Their computational efficiency can be improved if these generic algorithms are implemented in a lower-level language, such as C, and compiled to machine code. This approach has not yet been taken, but it will be considered in future versions of the package. The special case of equally spaced knots corresponds to H being a Toeplitz matrix.
In this case certain parts of the algorithms can be significantly accelerated, since the evaluation has to be performed only for one k-tuple instead of all 2^{N−l} tuples at level l of the dyadic net. In the package, this has been implemented for the fully dyadic case of equidistant knots but not for the non-dyadic case. To utilize this efficiency, it may be advantageous for a large number of equally spaced knots to choose them so that their number satisfies the fully dyadic condition, because the gained efficiency may compensate for the additional computations resulting from the larger number of knots. Alternatively, one can decompose a non-dyadic structure into smaller dyadic structures and perform the orthogonalization on them. The latter approach is illustrated in the next section to demonstrate the functionality of the Splinets package.
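To make the dyadic bookkeeping concrete: under the reading of the dyadic condition used above, n = k·2^N − 1, the net has N levels with 2^{N−l} k-tuples at level l, i.e. k(2^N − 1) = n − k + 1 basis elements in total. The following sketch (assuming this form of the condition) tests whether a knot count is fully dyadic and lists the tuples per level:

```python
def dyadic_levels(n, k):
    """If the number of internal knots n satisfies n = k * 2**N - 1 for some
    positive integer N (the fully dyadic case), return (N, tuples) where
    tuples[l-1] is the number of k-tuples at support level l = 1..N;
    otherwise return None."""
    m = n + 1
    if m % k != 0:
        return None
    q, N = m // k, 0
    while q % 2 == 0:       # extract the power of two in m / k
        q //= 2
        N += 1
    if q != 1 or N == 0:    # m / k must be exactly 2**N
        return None
    tuples = [2 ** (N - l) for l in range(1, N + 1)]
    return N, tuples
```

For example, 31 internal knots with k = 1 (the sub-splinet used in the example below) give a full 5-level net, while 43 internal knots do not satisfy the condition.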
Orthogonal projection to a space of splines
The spaces of splines are finite dimensional subspaces of the Hilbert space of all square integrable functions, and thus any such function can be orthogonally projected onto the linear space of splines spanned over a particular set of knots. Functional data analysis typically begins with projecting the data onto a finite dimensional functional subspace, so the projection becomes a fundamental operation for carrying out statistical data analysis. One can also perform a projection to smooth the data, and there is a plethora of methods targeting this goal. In the package, we have implemented the orthogonal projection in the function project(). Since actual functional data can be represented in a variety of ways, the projection itself depends on the input format and, more specifically, on the way the inner product of the input with a spline is defined. We consider two types of inputs: Splinets-objects, and columns of pairs representing arguments and values of a discretization of functional data.

Figure 5: Left: A not-fully dyadic and not equally spaced case. Both the B-splines (top) and the splinet (bottom) are presented on a dyadic net. Right: A self-similar structure of the non-zero entries of P corresponding to the orthonormalization of the B-splines leading to a splinet. The case of the third order B-splines with equally spaced knots leading to a 1533 dimensional space.

Independently of the input, the output of the function project() is a list, say onsp, made of three components:

- onsp$coeff – the matrix of coefficients of the decomposition in the selected basis,
- onsp$basis – the Splinets-object representing the selected basis,
- onsp$sp – the Splinets-object representing the projection of the input into the spline space.

Additional information, such as the knots, the order, and the type of the basis, can be retrieved easily from the second component. Many algebraic operations on splines are more conveniently performed on the matrix of coefficients of their spline basis representations than on the Splinets-objects themselves. The coefficient matrix onsp$coeff can be utilized for such computations, and the corresponding linear combination of onsp$basis can be used whenever the functional form of the result is needed.
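When the basis returned in onsp$basis is orthonormal, the coefficients in onsp$coeff are just inner products with the basis elements, and the projection is their linear combination; the residual is orthogonal to the projection space. A discrete numpy analogue of this structure (orthonormal columns standing in for basis splines — an illustration, not actual package output):

```python
import numpy as np

rng = np.random.default_rng(0)
# orthonormal columns play the role of an orthonormal spline basis OB_i
Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))

f = rng.standard_normal(8)   # an arbitrary "function", generally outside the span
a = Q.T @ f                  # coefficients a_i = <f, OB_i>   (the 'coeff' part)
Pf = Q @ a                   # projection, sum_i a_i OB_i     (the 'sp' part)
residual = f - Pf            # orthogonal to the projection space, cf. (19)
```

The same two lines — inner products, then a linear combination — are what project() performs with exact spline integrals in place of the matrix products.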
Basis decomposition.
The simplest projection obtained through project() is not, strictly speaking, a projection but rather a decomposition of a Splinets-object into coefficients in a given basis. If sp is a Splinets-object, then

bdsp=project(sp); bdsp2=project(sp,type='bs');
bdsp3=project(sp,type='gsob'); bdsp4=project(sp,type='twob');

have as their main output the matrices of coefficients $a_{ji}$ such that the $j$th input spline in sp has the form

$$
\sum_{i=1}^{n-k+1} a_{ji}\, OB_i,
$$

where $j$ indexes the input splines, $n$ is the number of internal knots, $k$ is the smoothness order, and the $OB_i$ form the selected basis of splines controlled by the input type (the default is the splinet built on the same knots as the input spline). The possible choices of bases are the splinet, the one- and two-sided orthonormal bases, or the B-splines, all built on the same knots as the input spline.

Projecting splines.
The projection of Splinets-objects over a given set of knots to the space of splines over a different set of knots is obtained as the orthogonal projection. Namely, if $S$ is the input Splinets-object, then the output is denoted $PS$, where $P$ is the orthogonal projection to the spline space built over the second set of knots:

$$
PS = \sum_{i=1}^{n-k+1} a_{ji}\, OB_i, \qquad (S - PS) \perp PS. \qquad (19)
$$

Figure 6: Left-Top: The original data $S_j$, $j = 1, \dots, 10$. Right-Top: The splinet $\{OB_i\}_{i=1}^{n-k+1}$ of the space to which the data are projected. Left-Bottom: The projection splines $PS_j$. Right-Bottom: The projection coefficients $(a_{ji})$ for the data, corresponding to the basis elements in the splinet. The vertical lines represent the knot locations in the projection space.

This is an extension of the previous case, since the output functions may belong to a different space than the input functions. The result is obtained by embedding both the input splines and the projection space into a space of splines that contains both, and evaluating the inner products between functions in this space. The common space of splines is built over the union of the two sets of knots, using the package function refine() for that purpose. The following code leads to the result if knots are different from sp@knots:

bdsp=project(sp,knots); bdsp2=project(sp,knots,type='bs');
bdsp3=project(sp,knots,type='gsob'); bdsp4=project(sp,knots,type='twob');

The results are represented in the spline space built over knots, and thus the Splinets-object in the output representing the projection spline satisfies bdsp$bs@knots=knots.

Projecting discretized functional data.
The function project() also works when the input is a discretization of continuous-argument functional data. In this case, the input is a matrix having in the first column a set of arguments and in the remaining columns the corresponding values of a sample of functional data. The input data are treated as piecewise constant functions, with the value between two subsequent arguments equal to the input value at the left-hand-side argument. In this way, the discretized data can be viewed as functions, and their inner products with any spline are well defined. In the package, this specific inner product can be obtained by utilizing the indefinite integral implemented in the function integra(). Consequently, the projection PS of the functional data in S satisfies (19), if S represents the data as a piecewise constant function. In Figure 6, we illustrate the output from project() for a sample of ten functional data.

Example – an alternative orthonormal basis in the non-dyadic case
When the number of internal and equally spaced knots satisfies the relation n = k·2^N − 1, the splinet built upon them is computed faster than for irregularly spaced knots. The reason is that for each support level in the splinet the matrices of the derivatives for the k-tuples are identical and need to be computed only once per level; compare Fig. 4 (Bottom) and Fig. 5 (Left-Bottom). However, in the non-dyadic case these symmetries break, and the efficiency is not implemented in the current algorithm, since it is difficult to track where the savings occur. One can use an alternative approach to build bases in the non-dyadic case by extracting the maximal dyadic sub-splinets and orthogonalizing the remaining B-splines with respect to them. In our first example of utilizing the package, we illustrate how such an orthogonalization can be performed on non-dyadic B-splines of the first order (k = 1) with n = 43 equally spaced internal knots, which means that this is a not fully dyadic case: 2^5 − 1 < 43 < 2^6 − 1. The following code builds the normalized B-splines (and also the splinet implemented in the package)

k=1; n = 43; xi = seq(0, 1, length.out = n+2)
spts=splinet(xi, order = k, norm = T); Bsp=spts$bs; OBsp0=spts$os

The largest dyadic sub-splinet that can be built from a subset of subsequent knots is for n1 = 2^5 − 1 = 31 internal knots. Removing these knots plus the left-hand-side (LHS) end from the original knots leaves us with 43 − 32 = 11 internal knots. We can repeat the process on the remaining knots with a dyadic sub-splinet built over n2 = 2^3 − 1 = 7 internal knots, which leaves 11 − 7 − 1 = 3 = 2^2 − 1 = n3 internal knots. These three splinets are adjacent and cover the whole range of the knots. We group all 45 knots as follows to build the dyadic splinet over each group:

(1 : 33), (33 : 42), (42 : 45).

This is done in the following code, in which the matrices of derivatives are evaluated only for the largest sub-splinet while the remaining sub-splinets inherit the respective matrices. This leaves only two B-splines, labeled 32 and 40, each overlapping two adjacent knot groups, that have not yet been accounted for in the orthogonalization. They are orthogonalized with respect to the dyadic sub-splinets (from the right to the left). We present the code to demonstrate the features of the package.
nt=seq2net(n1,k); nt2=seq2net(n2,k); nt3=seq2net(n3,k)

for(i in 1:length(nt2))
  for(j in 1:length(nt2[[i]])) nt2[[i]][[j]]=nt2[[i]][[j]]+n1+1

for(i in 1:length(nt3))
  for(j in 1:length(nt3[[i]])) nt3[[i]][[j]]=nt3[[i]][[j]]+n1+n2+2

S1=subsample(OBsp,n1+1); S2=subsample(OBsp,n1+n2+2)

S1L=subsample(OBsp,nt[[1]][[1]]); S1R=subsample(OBsp,nt2[[length(nt2)]][[1]])
for(i in 2:length(nt)) S1L=gather(S1L,subsample(OBsp,nt[[i]][[length(nt[[i]])]]))
for(i in (length(nt2)-1):1) S1R=gather(S1R,subsample(OBsp,nt2[[i]][[1]]))

S2R=subsample(OBsp,c(nt3[[2]][[1]],nt3[[1]][[1]]))
indL=vector(); for(i in 1:length(nt2)) indL=c(indL,nt2[[i]][[length(nt2[[i]])]])
S2L=subsample(OBsp,indL)

S2LR=gather(S2L,S2R)

Figure 7: Top: The splinets for the non-dyadic first order case: the implementation in the package (left) and the one built in the example (right).
A=matrix(c(1,-InnPr[3:5],-InnPr[5:4]),ncol=6)
sc=1/sqrt(1-A[1,2:dim(A)[2]]%*%t(A[1,2:dim(A)[2]])); A=sc[1,1]*A

OS2=lincomb(gather(S2,S2LR),A)
OBsp@der[n1+n2+2]=OS2@der; OBsp@supp[n1+n2+2]=OS2@supp

S1LR=gather(S1L,S1R); S1LR=gather(S1LR,OS2)

InP=gramian(S1,Sp2=OS2)

A=matrix(c(1,-InnPr,-InnPr[5:3],InP),ncol=10)
sc=1/sqrt(1-A[1,2:dim(A)[2]]%*%t(A[1,2:dim(A)[2]])); A=sc[1,1]*A

OS1=lincomb(gather(S1,S1LR),A)
OBsp@der[n1+1]=OS1@der; OBsp@supp[n1+1]=OS1@supp
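The listing above boils down to one operation, repeated for the two leftover B-splines: orthogonalize a single element against an already orthonormal family by subtracting its projections (the inner products in InnPr/InP) and renormalizing (the scaling sc). A generic numpy sketch of that single step (illustration only, not the package code):

```python
import numpy as np

def orthogonalize_against(v, U):
    """Orthogonalize the vector v against the orthonormal columns of U and
    normalize the remainder -- one Gram-Schmidt step, as applied to the two
    leftover B-splines in the text."""
    r = v - U @ (U.T @ v)        # subtract the projections <v, u_i> u_i
    return r / np.linalg.norm(r)

# an orthonormal family (columns of U) and a vector outside its span
U, _ = np.linalg.qr(np.array([[1.0, 0.0],
                              [1.0, 1.0],
                              [0.0, 1.0],
                              [0.0, 0.0]]))
v = np.array([1.0, 2.0, 3.0, 4.0])
u = orthogonalize_against(v, U)
```

In the package the inner products are exact spline integrals computed by gramian(), and the linear combination is realized by lincomb(), but the algebra is the same.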
We note that the support level for each of these two splines is one above the largest level of the dyadic sub-splinets that it overlaps, as shown in Figure 7 (Left-Top). The above code yields two different orthogonalizations of the B-splines: the one implemented in the package, given in OBsp0, and the alternative one, given in OBsp. The two bases differ, as seen in Figure 7. The alternative way of orthogonalizing the B-splines in a not fully dyadic case presented here will be considered in full generality in future releases of the package.

Example – simple functional data analysis with
Splinets
We conclude this work with an example of functional data analysis utilizing the tools implemented in the package. The functional data are simulations from a rather complicated stochastic model, the so-called Slepian model, developed in Podgórski et al. (2015). This computationally intense model, involving a high-dimensional Gibbs sampler, was developed for responses of a truck to non-Gaussian loads at a transient (a high impact event on the road profile). A functional data sample of size 1000 is presented in Figure 8 (Top-left). The data are discretized over 4095 equally spaced arguments. Here, instead of the complicated model, we want to fit a linear functional data model based on the Karhunen-Loève decomposition of the process residual to the mean function. In the first step, we represent the data as splines X_i(t), i = 1, …, 1000, of the third order with 201 equally spaced knots, by applying project() to the matrix representation of the data in Truck. The first five functional data together with the mean spline µ(t) are presented in Figure 8 (Top-right).
In the next step, we perform the spectral decomposition of the X_i(t) by estimating the eigenvalues λ_i and the corresponding eigenfunctions f_i(t). This allows for representing the data according to the Karhunen-Loève decomposition

$$
X(t) = \mu(t) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, Z_i\, f_i(t),
$$

where the Z_i are uncorrelated random variables with mean zero and variance one.

Figure 8: Top: the mean truck response (thick line) (left); a single functional datum (solid line) and its spline representation (dotted lines) (right). Middle: The five eigenfunctions corresponding to the five largest eigenvalues (left) and the ordered eigenvalues (right). Bottom-left: A single functional datum (thin solid line), its reconstruction using the full space of splines (dashed line), and three reconstructions using the 10, 50, and 100 dimensional subspaces spanned by the eigenfunctions with the corresponding largest eigenvalues. Bottom-right: The scatter plot of standardized coefficients in the Karhunen-Loève decomposition for the first two eigenvalues (triangles), compared with a scatter plot of a standard normal bivariate distribution (squares).

The following code performs the estimation of the spectral decomposition and reconstructs a single functional data point using 10, 50, and 100 eigenfunctions; in Figure 8 (Middle-left) we present the estimated eigenvalues in decreasing order, and the first five eigenfunctions are shown in Figure 8 (Middle-right).

Finally, the distributional properties of the Z_i variables can be investigated by considering the columns of the 1000 ×
197 matrix C, standardized by the means and the square roots of the eigenvalues, as computed in the code
Z1=(C[,1]-matrix(MeanTruck%*%Spect$vec[,1],nrow=1000))/sqrt(Spect$values[1])
Z2=(C[,2]-matrix(MeanTruck%*%Spect$vec[,2],nrow=1000))/sqrt(Spect$values[2])
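The standardization above follows the empirical Karhunen-Loève recipe: center the coefficient matrix, eigendecompose its sample covariance, and scale the projections on each eigenvector by the square root of the corresponding eigenvalue, which yields scores with zero mean, unit variance, and no correlation. A self-contained numpy sketch on synthetic coefficients (the names and data here are illustrative, not the Truck data):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic coefficient matrix: 1000 observations, 5 basis coefficients
X = rng.standard_normal((1000, 5)) @ np.diag([3.0, 2.0, 1.5, 1.0, 0.5])

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)          # sample covariance of the coefficients
values, vectors = np.linalg.eigh(cov)
order = np.argsort(values)[::-1]       # eigenvalues in decreasing order
values, vectors = values[order], vectors[:, order]

# standardized Karhunen-Loeve scores: centered projections on each
# eigenvector, scaled by the square root of its eigenvalue
Z = (X - mean) @ vectors / np.sqrt(values)
```

By construction the columns of Z have unit sample variance and vanishing sample correlations, which is what makes the scatter plot comparison against an independent standard normal sample meaningful.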
A scatter plot of Z1 and Z2 is shown in Figure 8 (Bottom-right), together with a scatter plot of simulated values from the bivariate standard normal distribution. We observe that the coefficients obtained from the data resemble those for the standard normal, although the former appear to be slightly more leptokurtic.

Acknowledgement

The author would like to thank Xijia Liu, the co-author of the Splinets package, for helpful discussions on some parts of the paper.
Bibliography
O. Cho and M. J. Lai. A class of compactly supported orthonormal B-spline wavelets. Splines and Wavelets, pages 123–151, 2005. [p2]

C. de Boor. A practical guide to splines. In Applied Mathematical Sciences, 1978. [p1]

T. N. Goodman. A class of orthogonal refinable functions and wavelets. Constructive Approximation, 19(4):525–540, 2003. [p2]

X. Liu, H. Nassar, and K. Podgórski. Splinets – efficient orthonormalization of the B-splines. ArXiv, abs/1910.07341, 2019. [p2, 17, 19]

J. Mason, G. Rodriguez, and S. Seatzu. Orthogonal splines based on B-splines – with applications to least squares, smoothing and regularisation problems. Numerical Algorithms, 5(1):25–40, 1993. [p2]

T. Nguyen. Construction of spline type orthogonal scaling functions and wavelets. Honors Project Paper 19, Illinois Wesleyan University, 2015. [p2]

A. Perperoglou, W. Sauerbrei, M. Abrahamowicz, and M. Schmid. A review of spline function procedures in R. BMC Medical Research Methodology, 19(1):46, 2019. doi: 10.1186/s12874-019-0666-3. URL https://doi.org/10.1186/s12874-019-0666-3. [p1]

K. Podgórski, I. Rychlik, and J. Wallin. Slepian noise approach for Gaussian and Laplace moving average processes. Extremes, 18(4):665–695, 2015. doi: 10.1007/s10687-015-0227-z. URL https://doi.org/10.1007/s10687-015-0227-z. [p25]

K. Qin. General matrix representations for B-splines. Vis. Comput., 16:177–186, 2000. [p4]

A. Redd. A comment on the orthogonalization of B-spline basis functions and their derivatives. Stat. Comput., 22:251–257, 2012. [p4, 19]

L. Schumaker. Spline Functions: Basic Theory. Cambridge University Press, 2007. [p1]

L. Zhou, J. Huang, and R. Carroll. Joint modeling of paired sparse functional data using principal components. Biometrika, 95:601–619, 2008. [p4]