Adaptive spline fitting with particle swarm optimization
Soumya D. Mohanty∗
Department of Physics and Astronomy, The University of Texas Rio Grande Valley, One West University Blvd., Brownsville, TX 78520, USA
(∗ Electronic address: [email protected])
Ethan Fahnestock†
Department of Physics and Astronomy, The University of Rochester, 500 Wilson Blvd., Rochester, NY 14627, USA
(† Electronic address: [email protected])
(Dated: March 2019)

In fitting data with a spline, finding the optimal placement of knots can significantly improve the quality of the fit. However, the challenging high-dimensional and non-convex optimization problem associated with completely free knot placement has been a major roadblock in using this approach. We present a method that uses particle swarm optimization (PSO) combined with model selection to address this challenge. The problem of overfitting due to knot clustering that accompanies free knot placement is mitigated in this method by explicit regularization, resulting in a significantly improved performance on highly noisy data. The principal design choices available in the method are delineated and a statistically rigorous study of their effect on performance is carried out using simulated data and a wide variety of benchmark functions. Our results demonstrate that PSO-based free knot placement leads to a viable and flexible adaptive spline fitting approach that allows the fitting of both smooth and non-smooth functions.
I. INTRODUCTION
A spline of order $k$ is a piecewise polynomial function that obeys continuity conditions on its value and its first $k-2$ derivatives at a set of points, called knots, where the pieces join [1]. Splines play an important role in non-parametric regression [2–4], simply called curve fitting when the data is one dimensional, where the outcome is not assumed to have a predetermined form of functional dependence on the predictor.

It has long been recognized [5–8] that the quality of a spline fit depends significantly on the locations of the knots defining the spline. Determining the placement of knots that is best adapted to given data has proven to be a challenging non-linear and non-convex, not to mention high-dimensional, optimization problem that has resisted a satisfactory solution.

A diverse set of methods have been proposed that either attempt this optimization problem head-on or solve an approximation to it in order to get a reasonable solution. In the latter category, methods based on knot insertion and deletion [9–13] have been studied extensively. In these methods, one starts with a fixed set of sites for knots and performs a step-wise addition or removal of knots at these sites. The best number of knots is determined by a model selection criterion such as Generalized Cross Validation (GCV) [8, 14]. Step-wise change in knot placement is not an efficient exploration of the continuous space of possible knot positions and the end result, while computationally inexpensive to obtain and tractable to mathematical analysis, is not necessarily the best possible [15]. Another approach explored in the literature is the two-stage framework in which the first stage identifies a subset of active or dominant knots and the second stage merges them in a data dependent way to obtain a reduced set of knots [16–18]. These methods have shown good performance for low noise applications.

In attempts at solving the optimization challenge directly, general purpose stochastic optimization algorithms (metaheuristics) such as Genetic Algorithm (GA) [19], Artificial Immune System (AIS) [20] or those based on Markov Chain Monte Carlo (MCMC) [21], have been studied [22–25]. These methods have proven quite successful in solving many challenging high-dimensional optimization problems in other fields and it is only natural to employ them for the problem of free knot placement. However, GA and AIS are more suited to discrete optimization problems rather than the inherently continuous one in knot optimization, and MCMC is computationally expensive. Thus, there is plenty of scope for using other metaheuristics to find better solutions.

It was shown in [26], and independently in [27], that Particle Swarm Optimization (PSO) [28], a relatively recent entrant to the field of nature inspired metaheuristics such as GA, is a promising method for the free knot placement problem. PSO is governed by a much smaller set of parameters than GA or MCMC and most of these do not appear to require much tuning from one problem to another. In fact, as discussed later in the paper, essentially two parameters are all that need to be explored to find a robust operating point for PSO.

An advantage of free knot placement is that a subset of knots can move close enough to be considered as a single knot with a higher multiplicity. A knot with multiplicity greater than unity allows the fitted spline to have a jump discontinuity in its value or derivatives. The method presented in this paper, called SHAPES, therefore has the flexibility to fit non-smooth functions as well as smooth ones without any change in algorithm settings.
It uses model selection to determine the best number of knots, and reduces estimation bias arising from the regularization using a least squares derived rescaling. Some of the elements of SHAPES outlined above were explored in [30] in the context of a single example with a simple and smooth function. However, the crucial feature of allowing knots to merge was missing there along with the step of bias reduction. (The bias reduction step does not seem to have been used elsewhere to the best of our knowledge.)

Various design choices involved in SHAPES are identified clearly and their effects are examined using large-scale simulations and a diverse set of benchmark functions. Most importantly, SHAPES is applied to data with a much higher noise level than has traditionally been considered in the field of adaptive spline fitting and found to have promising performance. This sets the stage for further development of the adaptive spline methodology for new application domains.

The rest of the paper is organized as follows. Sec. II provides a brief review of pertinent topics in spline fitting. The PSO metaheuristic and the particular variant used in this paper are reviewed in Sec. III. Details of SHAPES are described in Sec. IV along with the principal design choices. The setup used for our simulations is described in Sec. V. Computational aspects of SHAPES are addressed in Sec. VI. This is followed by the presentation of results in Sec. VII. Our conclusions are summarized in Sec. VIII.
II. FITTING SPLINES TO NOISY DATA
In this paper, we consider the one-dimensional regression problem

$y_i = f(x_i) + \epsilon_i$,  (1)

$i = 0, 1, \ldots, N-1$, with $x_0 = 0$, $x_{N-1} = 1$, $x_{i+1} > x_i$, $f(x)$ unknown, and $\epsilon_i$ drawn independently from $N(0, \sigma^2)$. The goal is to obtain an estimate, $\widehat{f}(x)$, of $f(x)$ given $\{y_i\}$.

To obtain a non-trivial solution, the estimation problem must be regularized by restricting $\widehat{f}(x)$ to some specified class of functions. One reasonable approach is to require that this be the class of "smooth" functions, and obtain the estimate as the solution of the variational problem,

$\widehat{f} = \arg\min_f \left[ \sum_{i=0}^{N-1} \left( y_i - f(x_i) \right)^2 + \lambda \int dx \left( f''(x) \right)^2 \right]$.  (2)

It can be shown that the solution belongs to the space of cubic splines defined by $\{x_i\}$ as the set of knots. Consequently, $\widehat{f}$ is known as the smoothing spline estimate [3, 32]. In Eq. 2, the first term on the right measures the fidelity of the model to the observations and the second term penalizes the "roughness", measured by the average squared curvature, of the model. The trade-off between these competing requirements is controlled by $\lambda \geq 0$, called the regulator gain or smoothing parameter.

The best choice for $\lambda$ is the principal issue in practical applications of smoothing spline. The use of GCV to adaptively determine the value of $\lambda$ was introduced in [33] and is used, for example, in the implementation of smoothing spline in the R [34] stats package. A scalar $\lambda$, adaptively selected or otherwise, is not well suited to handle a function with a heterogeneous roughness distribution across its domain. The use of a spatially adaptive gain function, $\lambda(x)$, has been investigated in different forms [35–38] to address this issue.

A different regularization approach is to eschew an explicit penalty term and regularize the fitting problem by restricting the number of knots to be $\ll N$. This leads to the regression spline [5] estimate in which $\widehat{f}(x)$ is represented as a linear combination of a finite set of basis functions – the so-called B-spline functions [1, 39] being a popular choice – that span the space of splines associated with the chosen knot sequence and polynomial order. Different methods for adaptive selection of the number of knots, which is the main free parameter in regression spline, have been compared in [40]. The asymptotic properties of smoothing and regression spline estimates have been analyzed theoretically in [41].

Smoothing and regression splines are hybridized in the penalized spline [31, 42, 43] approach: the deviation of the spline model from the data is measured by the least squares function as in the first term of Eq. 2, but the penalty becomes a quadratic form in the coefficients of the spline in the chosen basis set. As in the case of smoothing spline, adaptive selection of the scalar regulator gain can be performed using GCV [31] and locally adaptive gain coefficients have been proposed in [44–47]. The performance of alternatives to GCV for selection of a scalar regulator gain have been investigated and compared in [48].

While penalized spline is less sensitive to the number of knots, it is still a free parameter of the algorithm that must be specified. Joint adaptive selection of the number of knots and regulator gain has been investigated in [8, 49] using GCV. Other model selection methods can also be used for adaptive determination of the number of knots (see Sec. II C).

A. B-spline functions
Given a set of $M$ knots $\mathbf{b} = (b_0, b_1, \ldots, b_{M-1})$, $b_i \in [0, 1]$, $b_{i+1} > b_i$, and given the order $k$ of the spline polynomials, the set of splines that interpolates $\{(y_i, b_i)\}$, $y_i \in \mathbb{R}$, forms a linear vector space of dimensionality $M + k - 2$. A convenient basis for this space is provided by B-spline functions defined over an extended knot sequence $\tau = (\tau_0, \tau_1, \ldots, \tau_{P-1})$, $\tau_{i+1} \geq \tau_i$, with $P > M$ knots, in which a knot can appear more than once. The number of repetitions of any knot cannot be greater than $k$. Also, $\tau_j = b_0$ for $0 \leq j \leq k-1$, and $\tau_j = b_{M-1}$ for $P-k \leq j \leq P-1$. The span of B-spline functions defined over a knot sequence with repetitions can contain functions that have jump discontinuities in their values or in their derivatives. (The dimensionality of the span is $P - k$.)

The Cox-de Boor recursion relations [50] given below provide an efficient way to compute the set of B-spline functions, $\{B_{i,k}(x; \tau)\}$, for any given order. The recursions start with B-splines of order 1, which are piecewise constant functions:

$B_{j,1}(x; \tau) = 1$ for $\tau_j \leq x < \tau_{j+1}$, and $0$ otherwise.  (3)

For $2 \leq k' \leq k$,

$B_{j,k'}(x) = \omega_{j,k'}(x)\, B_{j,k'-1}(x) + \gamma_{j+1,k'}(x)\, B_{j+1,k'-1}(x)$,  (4)

$\omega_{j,k'}(x) = (x - \tau_j)/(\tau_{j+k'-1} - \tau_j)$ if $\tau_{j+k'-1} \neq \tau_j$, and $0$ otherwise,  (5)

$\gamma_{j,k'}(x) = 1 - \omega_{j,k'}(x)$ if $\tau_{j+k'-1} \neq \tau_j$, and $0$ otherwise.  (6)

In the recursion above, $0 \leq j \leq P - k' - 1$. Fig. 1 provides an illustration of B-spline functions.

The regression spline method is elegantly formulated in terms of B-spline functions. The estimate is assumed to belong to the parametrized family of linearly combined B-spline functions,

$f(x; \alpha, \tau) = \sum_{j=0}^{P-k-1} \alpha_j B_{j,k}(x; \tau)$,  (7)

where $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_{P-k-1})$. The least-squares estimate is given by $\widehat{f}(x) = f(x; \widehat{\alpha}, \widehat{\tau})$, where $\widehat{\alpha}$ and $\widehat{\tau}$ minimize

$L(\alpha, \tau) = \sum_{i=0}^{N-1} \left( y_i - f(x_i; \alpha, \tau) \right)^2$.  (8)

FIG. 1: Cubic B-spline functions $\{B_{i,4}(x; \tau)\}$, $i = 0, 1, \ldots$, for a knot sequence $\tau$ marked by squares. For visual clarity, alternate B-spline functions are shown in black and gray. Knots with multiplicity greater than unity are shown vertically stacked.
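The recursions in Eqs. 3–6 translate directly into code. The following sketch evaluates all order-$k$ B-splines on a given knot sequence; it is written in Python with NumPy (the paper's own implementation is in Matlab, so the language, the function name bspline_basis, and its interface are illustrative assumptions, not the authors' code):

```python
import numpy as np

def bspline_basis(x, tau, k):
    """All order-k B-splines B_{j,k}(x; tau), j = 0, ..., P-k-1,
    computed with the Cox-de Boor recursions (Eqs. 3-6)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    tau = np.asarray(tau, dtype=float)
    P = len(tau)
    # Order 1: piecewise-constant indicators (Eq. 3). The intervals are
    # half-open, so x equal to the last knot lies outside every B_{j,1}.
    B = np.array([(tau[j] <= x) & (x < tau[j + 1]) for j in range(P - 1)],
                 dtype=float)
    for kp in range(2, k + 1):          # k' = 2, ..., k
        Bnew = np.zeros((P - kp, x.size))
        for j in range(P - kp):
            d1 = tau[j + kp - 1] - tau[j]   # omega denominator (Eq. 5)
            d2 = tau[j + kp] - tau[j + 1]   # gamma denominator (Eq. 6)
            if d1 > 0:  # repeated knots give 0/0, defined here as 0
                Bnew[j] += (x - tau[j]) / d1 * B[j]
            if d2 > 0:
                Bnew[j] += (tau[j + kp] - x) / d2 * B[j + 1]
        B = Bnew
    return B  # shape: (P - k, len(x))

# Example: cubic (k = 4) B-splines with end knots repeated k times.
tau = np.concatenate([np.zeros(4), [0.3, 0.5, 0.7], np.ones(4)])
x = np.linspace(0.0, 1.0, 256, endpoint=False)
print(bspline_basis(x, tau, 4).shape)  # (7, 256), i.e., P - k basis functions
```

Note how the vanishing denominators produced by repeated knots are simply dropped; this is exactly the mechanism by which knot repetition yields basis functions with jump discontinuities in value or derivatives.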
B. Regression and penalized spline with free knot placement

The penalized spline estimate is found by minimizing

$L_\lambda(\alpha, \tau) = L(\alpha, \tau) + \lambda R(\alpha)$,  (9)

over the spline coefficients (c.f. Eq. 7), where $R(\alpha)$ is the penalty, while keeping the number of knots and knot locations fixed. In this paper, we choose

$R(\alpha) = \sum_{j=0}^{P-k-1} \alpha_j^2$,  (10)

for reasons explained below.

Formally, the penalty function can be derived by substituting Eq. 7 in the roughness penalty. This would lead to a quadratic form similar to the penalty in Eq. 10 but with a kernel matrix that is not the identity matrix [51]. The elements of this matrix would be Euclidean inner products of B-spline derivatives. However, using such a penalty adds a substantial computational burden in free knot placement because it has to be recomputed every time the knot placement changes. Computational aspects of this problem are discussed in [42], where a simplified form of the roughness penalty is used that is based on the differences of coefficients of adjacent B-splines. This is a good approximation for the case considered in [42] of a large number of fixed knots and closely spaced B-splines, but not necessarily for free knots that may be small in number and widely spread out. Another, perhaps more important, consideration is that repeated knots in free knot placement result in B-splines with discontinuous derivatives. This makes the kernel matrix particularly challenging for numerical evaluation and increases code complexity. In this paper, we avoid the above issues by using the simple form of the penalty function in Eq. 10 and leave the investigation of more appropriate forms to future work. We note that the exploration of innovative penalty functions is an active topic of research (e.g., [43, 52, 53]).

While the reduction of the number of knots in regression spline coupled with the explicit regularization of penalized spline reduces overfitting, the fit is now sensitized to where the knots are placed. Thus, the complete method involves the minimization of $L_\lambda(\alpha, \tau)$ (c.f., Eq. 9) over both $\alpha$ and $\tau$. (The method of regression spline with knot optimization and explicit regularization will be referred to as adaptive spline in the following.)

Minimization of $L_\lambda$ over $\alpha$ and $\tau$ can be nested as follows:

$\min_{\tau, \alpha} L_\lambda(\alpha, \tau) = \min_\tau \left( \min_\alpha L_\lambda(\alpha, \tau) \right)$.  (11)

The solution, $\widehat{\alpha}(\tau)$, of the inner minimization is expressed in terms of the $(P-k)$-by-$N$ matrix $\mathbf{B}(\tau)$, with

$B_{m,n}(\tau) = B_{m,k}(x_n; \tau)$,  (12)

as

$\widehat{\alpha}(\tau) = \mathbf{y} \mathbf{B}^T \mathbf{G}^{-1}$,  (13)

$\mathbf{G} = \mathbf{B}\mathbf{B}^T + \lambda \mathbf{I}$,  (14)

where $\mathbf{I}$ is the $(P-k)$-by-$(P-k)$ identity matrix. The outer minimization over $\tau$ of

$F_\lambda(\tau) = L_\lambda(\widehat{\alpha}(\tau), \tau)$,  (15)

needs to be performed numerically.

Due to the fact that freely moveable knots can coincide, and that this produces discontinuities in B-spline functions as outlined earlier, curve fitting by adaptive spline can accommodate a broader class of functions – smooth with localized discontinuities – than smoothing or penalized spline.

The main bottleneck in implementing the adaptive spline method is the global minimization of $F_\lambda(\tau)$ since it is a high-dimensional non-convex function having multiple local minima. Trapping by local minima renders greedy methods ineffective and high dimensionality makes a brute force search for the global minimum computationally infeasible. This is where PSO enters the picture and, as shown later, offers a way forward.
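Because the inner minimization in Eq. 11 has the closed form of Eqs. 13–14, the fitness $F_\lambda(\tau)$ reduces to a few lines of linear algebra. A minimal sketch, reusing bspline_basis from the previous section, is given below; the helper name fitness and the convention of end knots fixed at 0 and 1 with full multiplicity are assumptions made for illustration:

```python
import numpy as np

def fitness(y, x, z, k=4, lam=0.1):
    """F_lambda(tau) of Eq. 15: the penalized least-squares function of
    Eq. 9, profiled over the B-spline coefficients via Eqs. 13 and 14.
    z holds the free interior knots; end knots are fixed at 0 and 1 and
    repeated k times (an assumed convention for this sketch)."""
    tau = np.concatenate([np.zeros(k), np.sort(z), np.ones(k)])
    B = bspline_basis(x, tau, k)            # (P - k) x N matrix, Eq. 12
    G = B @ B.T + lam * np.eye(B.shape[0])  # Eq. 14
    alpha = np.linalg.solve(G, B @ y)       # Eq. 13 (G is symmetric)
    r = y - alpha @ B                       # residual of the spline model
    return r @ r + lam * (alpha @ alpha)    # Eq. 9 with the penalty Eq. 10
```

Each fitness evaluation thus costs one basis-matrix construction and one $(P-k)$-dimensional linear solve, which is why knot placement dominates the run-time (see Sec. VI).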
C. Model selection

In addition to the parameters $\alpha$ and $\tau$, adaptive spline has two hyper-parameters, namely the regulator gain $\lambda$ and the number of interior knots. While $\lambda$ is fixed empirically in this paper, the number of knots is determined adaptively using model selection. We use the Akaike Information Criterion (AIC): for a model with $K$ parameters $\theta = (\theta_0, \theta_1, \ldots, \theta_{K-1})$,

$\mathrm{AIC} = 2K - 2\ln \Lambda(\widehat{\theta})$,  (16)

where $\Lambda(\theta)$ is the likelihood function and $\widehat{\theta}$ its maximizer. The specific expression for AIC used in SHAPES is provided in Sec. IV.
III. PARTICLE SWARM OPTIMIZATION
Under the PSO metaheuristic, the function to be optimized (called the fitness function) is sampled at a fixed number of locations (called particles). The set of particles is called a swarm. The particles move in the search space following stochastic iterative rules called dynamical equations. The dynamical equations implement two essential features called cognitive and social forces. They serve to retain "memories" of the best locations found by the particle and the swarm (or a subset thereof) respectively.

Since its introduction by Kennedy and Eberhart [28], the PSO metaheuristic has expanded to include a large diversity of algorithms [55]. In this paper, we consider the variant called local-best (or lbest) PSO [56]. We begin with the notation [57] for describing lbest PSO.

• $F(\mathbf{x})$: the scalar fitness function to be minimized, with $\mathbf{x} = (x^1, x^2, \ldots, x^d) \in \mathbb{R}^d$. In our case, $\mathbf{x}$ is $\tau$, $F$ is $F_\lambda(\tau)$ (c.f., Eq. 15), and $d$ is the number of free knots.

• $\mathbb{S} \subset \mathbb{R}^d$: the search space defined by the hypercube $a_j \leq x^j \leq b_j$, $j = 1, 2, \ldots, d$, in which the global minimum of the fitness function must be found.

• $N_p$: the number of particles in the swarm.

• $\mathbf{x}_i[k] \in \mathbb{R}^d$: the position of the $i$th particle at the $k$th iteration.

• $\mathbf{v}_i[k] \in \mathbb{R}^d$: a vector called the velocity of the $i$th particle that is used for updating the position of a particle.

• $\mathbf{p}_i[k] \in \mathbb{R}^d$: the best location found by the $i$th particle over all iterations up to and including the $k$th, called the personal best position of the $i$th particle:

$F(\mathbf{p}_i[k]) = \min_{1 \leq j \leq k} F(\mathbf{x}_i[j])$.  (17)

• $n_i[k]$: a set of particles, called the neighborhood of particle $i$, $n_i[k] \subseteq \{1, 2, \ldots, N_p\} \setminus \{i\}$. There are many possibilities, called topologies, for the choice of $n_i[k]$. In the simplest, called the global best topology, every particle is the neighbor of every other particle: $n_i[k] = \{1, 2, \ldots, N_p\} \setminus \{i\}$. The topology used for lbest PSO in this paper is described later.

• $\mathbf{l}_i[k] \in \mathbb{R}^d$: the best location among the particles in $n_i[k]$ over all iterations up to and including the $k$th, called the local best for the $i$th particle:

$F(\mathbf{l}_i[k]) = \min_{j \in \{i\} \cup n_i[k]} F(\mathbf{p}_j[k])$.  (18)

• $\mathbf{p}_g[k] \in \mathbb{R}^d$: the best location among all the particles in the swarm, called the global best:

$F(\mathbf{p}_g[k]) = \min_{1 \leq i \leq N_p} F(\mathbf{p}_i[k])$.  (19)

The dynamical equations for lbest PSO are as follows:

$\mathbf{v}_i[k+1] = w[k]\, \mathbf{v}_i[k] + c_1 (\mathbf{p}_i[k] - \mathbf{x}_i[k])\, \mathbf{r}_1 + c_2 (\mathbf{l}_i[k] - \mathbf{x}_i[k])\, \mathbf{r}_2$,  (20)

$\mathbf{x}_i[k+1] = \mathbf{x}_i[k] + \mathbf{z}_i[k+1]$,  (21)

$z_i^j[k] = v_i^j[k]$ if $-v_{\max}^j \leq v_i^j[k] \leq v_{\max}^j$; $-v_{\max}^j$ if $v_i^j[k] < -v_{\max}^j$; and $v_{\max}^j$ if $v_i^j[k] > v_{\max}^j$.  (22)

Here, $w[k]$ is a deterministic function known as the inertia weight, $c_1$ and $c_2$ are constants, and $\mathbf{r}_i$ is a diagonal matrix with iid random variables having a uniform distribution over $[0, 1]$. The component-wise restriction of the velocity in Eq. 22 is known as velocity clamping.

The iterations are initialized at $k = 1$ by independently drawing (i) $x_i^j[1]$ from a uniform distribution over $[a_j, b_j]$, and (ii) $v_i^j[1]$ from a uniform distribution over $[a_j - x_i^j[1], b_j - x_i^j[1]]$. For termination of the iterations, we use the simplest condition: terminate when a prescribed number $N_{\mathrm{iter}}$ of iterations are completed. The solutions found by PSO for the minimizer and the minimum value of the fitness are $\mathbf{p}_g[N_{\mathrm{iter}}]$ and $F(\mathbf{p}_g[N_{\mathrm{iter}}])$ respectively. Other, more sophisticated, termination conditions are available [55], but the simplest one has served well across a variety of regression problems in our experience.

The second and third terms on the RHS of Eq. 20 are the cognitive and social forces respectively. On average they attract a particle towards its personal and local bests, promoting the exploitation of an already good solution to find better ones nearby. The term containing the inertia weight, on the other hand, promotes motion along the same direction and allows a particle to resist the cognitive and social forces. Taken together, the terms control the exploratory and exploitative behaviour of the algorithm. We allow the inertia weight $w[k]$ to decrease linearly with $k$ from an initial value $w_{\max}$ to a final value $w_{\min}$ in order to transition PSO from an initial exploratory to a final exploitative phase.

For the topology, we use the ring topology with 2 neighbors:

$n_i[k] = \{i-1, i+1\}$ for $i \notin \{1, N_p\}$; $\{N_p, i+1\}$ for $i = 1$; $\{i-1, 1\}$ for $i = N_p$.  (23)

The local best, $\mathbf{l}_i[k]$, in the $k$th iteration is updated after evaluating the fitnesses of all the particles. The velocity and position updates given by Eq. 20 and Eq. 21 respectively form the last set of operations in the $k$th iteration.

To handle particles that exit the search space, we use the "let them fly" boundary condition under which a particle outside the search space is assigned a fitness value of $\infty$. Since both $\mathbf{p}_i[k]$ and $\mathbf{l}_i[k]$ are always within the search space, such a particle is eventually pulled back into the search space by the cognitive and social forces.
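A compact sketch of the above update rules (Eqs. 20–23) is given below, including the ring topology, the linearly decaying inertia weight, velocity clamping, and the "let them fly" boundary condition. The unit-hypercube search space, the function name lbest_pso, and the defaults are illustrative assumptions (the parameter values anticipate those quoted in Sec. III A):

```python
import numpy as np

def lbest_pso(F, d, n_iter=100, n_p=40, c1=2.0, c2=2.0,
              w_max=0.9, w_min=0.4, seed=0):
    """Minimize F over the unit hypercube [0,1]^d with lbest PSO
    (ring topology with 2 neighbors, Eqs. 20-23)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, (n_p, d))        # initial positions
    v = rng.uniform(0.0 - x, 1.0 - x)          # initial velocities
    v_max = 0.5                                # clamp: 0.5 * (b_j - a_j)
    p = x.copy()                               # personal bests (Eq. 17)
    fp = np.array([F(xi) for xi in x])
    for it in range(n_iter):
        w = w_max - (w_max - w_min) * it / (n_iter - 1)  # linear decay
        # Local best over the ring neighborhood {i-1, i, i+1} (Eq. 18).
        l = np.empty_like(x)
        for i in range(n_p):
            nbrs = [(i - 1) % n_p, i, (i + 1) % n_p]
            l[i] = p[nbrs[np.argmin(fp[nbrs])]]
        r1, r2 = rng.uniform(size=(2, n_p, d))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (l - x)  # Eq. 20
        v = np.clip(v, -v_max, v_max)                      # Eq. 22
        x = x + v                                          # Eq. 21
        # "Let them fly": infinite fitness outside the search space.
        f = np.array([F(xi) if np.all((xi >= 0) & (xi <= 1)) else np.inf
                      for xi in x])
        better = f < fp
        p[better], fp[better] = x[better], f[better]
    g = np.argmin(fp)                 # global best (Eq. 19)
    return p[g], fp[g]
```

Coupled with the fitness sketch of Sec. II B, a call such as lbest_pso(lambda z: fitness(y, x, z), d=5) would perform free placement of five interior knots.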
A. PSO tuning

Stochastic global optimizers, including PSO, that terminate in a finite number of iterations do not satisfy the conditions laid out in [58] for convergence to the global optimum. Only the probability of convergence can be improved by tuning the parameters of the algorithm for a given optimization problem.

In this sense, most of the parameters involved in PSO are found to have fairly robust values when tested across an extensive suite of benchmark fitness functions [59]. Based on widely prevalent values in the literature, these are: $N_p = 40$, $c_1 = c_2 = 2.0$, $w_{\max} = 0.9$, $w_{\min} = 0.4$, and $v_{\max}^j = 0.5(b_j - a_j)$.

Typically, this leaves the maximum number of iterations, $N_{\mathrm{iter}}$, as the principal parameter that needs to be tuned. However, for a given $N_{\mathrm{iter}}$, the probability of convergence can be increased by the simple strategy of running multiple, independently initialized runs of PSO on the same fitness function and choosing the best fitness value found across the runs. The probability of missing the global optimum decreases exponentially as $(1 - P_{\mathrm{conv}})^{N_{\mathrm{runs}}}$, where $P_{\mathrm{conv}}$ is the probability of successful convergence in any one run and $N_{\mathrm{runs}}$ is the number of independent runs.

Besides $N_{\mathrm{iter}}$, therefore, $N_{\mathrm{runs}}$ is the remaining parameter that should be tuned. If the independent runs can be parallelized, $N_{\mathrm{runs}}$ is essentially fixed by the available number of parallel workers although this should not be stretched to the extreme. If too high a value of $N_{\mathrm{runs}}$ is needed in an application, $P_{\mathrm{conv}}$ should be increased by tuning the other PSO parameters or by exploring a different PSO variant. In this paper, we follow the simpler way of tuning $N_{\mathrm{runs}}$ by setting it to $N_{\mathrm{runs}} = 4$, the typical number of processing cores available in a high-end desktop.

Input:
  • y ← Data
  • N_runs ← Number of PSO runs
  • N_iter ← Maximum number of iterations
  • N_knots ← {M_1, M_2, ..., M_max}: number of knots (not counting repetitions)
  • λ ← Regulator gain
Execute:
  for M ∈ N_knots do                                ⊲ Loop over models
    for r ∈ {1, 2, ..., N_runs} do                  ⊲ (Parallel) loop over PSO runs
      τ̂(r) ← arg min_τ F_λ(τ) using PSO             ⊲ Best location
      α̂(r) ← B-spline coefficients corresponding to τ̂(r)
      F(M, r) ← F_λ(τ̂(r))                           ⊲ Best fitness value
    end for
    r_M ← arg min_r F(M, r)                         ⊲ Best PSO run
    AIC(M) ← AIC for F(M, r_M) (c.f., Eq. 24)
    f̂(M) ← Estimated function corresponding to τ̂(r_M) and α̂(r_M)
  end for
  M_best ← arg min_M AIC(M)                         ⊲ Model with lowest AIC
  f̂ ← f̂(M_best)
  f̂ ← Bias corrected f̂ (c.f., Sec. IV A)
Output:
  • M_best                                          ⊲ Best model
  • Estimated, bias-corrected f̂                     ⊲ Estimated function from best model
  • F(M_best, r_{M_best})                           ⊲ Fitness of best model

FIG. 2: Pseudo-code for the SHAPES algorithm. All quantities with parenthesized integer arguments stand for arrays, with the argument as the array index.
IV. SHAPES ALGORITHM

The SHAPES algorithm is summarized in the pseudo-code given in Fig. 2. The user specified parameters of the algorithm are (i) the number, $N_{\mathrm{runs}}$, of PSO runs to use per data realization; (ii) the number of iterations, $N_{\mathrm{iter}}$, to termination of PSO; (iii) the set of models, $\mathbb{N}_{\mathrm{knots}}$, over which AIC based model selection (see below) is used; and (iv) the regulator gain $\lambda$. Following the standard initialization condition for PSO (c.f., Sec. III), the initial knots for each run of PSO are drawn independently from a uniform distribution over $[0, 1]$.

A model in SHAPES is specified by the number of non-repeating knots. For each model $M \in \mathbb{N}_{\mathrm{knots}}$, $F(M, r_M)$ denotes the fitness value, where $1 \leq r_M \leq N_{\mathrm{runs}}$ is the best PSO run. The AIC value for the model is given by

$\mathrm{AIC} = 4M + F(M, r_M)$,  (24)

which follows from the number of optimized parameters being $2M$ (accounting for both knots and B-spline coefficients) and the log-likelihood being proportional to the least squares function for the noise model used here. (Additive constants that do not affect the minimization of AIC have been dropped.)

The algorithm acts on given data $\mathbf{y}$ to produce (i) the best fit model $M_{\mathrm{best}} \in \mathbb{N}_{\mathrm{knots}}$; (ii) the fitness value associated with the best fit model; and (iii) the estimated function $\widehat{f}$ from the best fit model. The generation of $\widehat{f}$ includes a bias correction step described next.
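Before turning to the bias correction, note that the outer loops of Fig. 2 amount to the following sketch, which glues together the lbest_pso and fitness sketches from the earlier sections (the helper name select_model and the default argument values are assumptions):

```python
import numpy as np

def select_model(y, x, knot_numbers=(4, 5, 6, 7, 8), n_runs=4,
                 n_iter=100, lam=0.1, k=4):
    """AIC-based selection over the number of free knots (Eq. 24),
    with n_runs independent PSO runs per model as in Fig. 2."""
    best = (np.inf, None, None)
    for M in knot_numbers:
        # Best of n_runs independently seeded PSO runs for this model.
        runs = [lbest_pso(lambda z: fitness(y, x, z, k, lam), d=M,
                          n_iter=n_iter, seed=r) for r in range(n_runs)]
        tau, F = min(runs, key=lambda t: t[1])
        aic = 4 * M + F                   # Eq. 24
        if aic < best[0]:
            best = (aic, M, tau)
    return best                           # (AIC, M_best, best knots)
```

Seeding each run with its run number mirrors the reproducibility scheme described in Sec. VI.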
A. Bias correction

The use of a non-zero regulator gain leads to shrinkage in the estimated B-spline coefficients. As a result, the corresponding estimate, $\widehat{f}$, has a systematic point-wise bias towards zero. A bias correction transformation is applied to $\widehat{f}$ as follows. First, the unit norm estimated function $\widehat{u}$ is obtained,

$\widehat{u} = \widehat{f} / \|\widehat{f}\|$,  (25)

where $\|\widehat{f}\| = [\sum_i \widehat{f}_i^2]^{1/2}$ is the $L_2$ norm. Next, a scaling factor $A$ is estimated as

$A = \arg\min_a \sum_{i=0}^{N-1} (y_i - a\, \widehat{u}_i)^2$.  (26)

The final estimate is given by $\widehat{f} = A\, \widehat{u}$.

As discussed earlier in Sec. II B (c.f. Eq. 9 and Eq. 10), the penalty used in this paper is one among several alternatives available in the literature. For some forms of the penalty, there need not be any shrinkage in the B-spline coefficients and the bias correction step above would be unnecessary.
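Since $\widehat{u}$ has unit norm, the quadratic minimization in Eq. 26 has the closed-form solution $A = \sum_i y_i \widehat{u}_i$, so the correction is only a couple of lines (the helper name bias_correct is an assumption):

```python
import numpy as np

def bias_correct(f_hat, y):
    """Least-squares rescaling of the shrunken estimate (Eqs. 25-26).
    With u of unit norm, argmin_a sum (y_i - a*u_i)^2 = y . u."""
    u = f_hat / np.linalg.norm(f_hat)  # Eq. 25
    return (y @ u) * u                 # A * u with A = y . u
```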
B. Knot merging and dispersion

In both of the mappings described in Sec. IV C, it is possible to get knot sequences in which a subset $(\tau_i, \tau_{i+1}, \ldots, \tau_{i+m-1})$, $1 < m \leq M-1$, of knots falls within an interval $(x_j, x_{j+1})$ between two consecutive predictor values. There are two possible options to handle such a situation.

• Heal: Overcrowded knots are dispersed such that there is only one knot between any two consecutive predictor values. This can be done iteratively by moving a knot to the right or left depending on the difference in distance to the corresponding neighbors.

• Merge: All the knots in an overcrowded set are made equal to the rightmost knot $\tau_{i+m-1}$ until its multiplicity saturates at $k$. The remaining knots, $\tau_i$ to $\tau_{i+m-1-k}$, are equalized to the remaining rightmost knot $\tau_{i+m-1-k}$ until its multiplicity saturates at $k$, and so on. (Replacing rightmost by leftmost when merging is an equally valid alternative.) Finally, if more than one set of merged knots remains within an interval $(x_j, x_{j+1})$, they are dispersed by healing.

If only healing is used, SHAPES cannot fit curves that have jump discontinuities in value or derivatives. Therefore, if it is known that the unknown curve in the data is free of jump discontinuities, healing acts as an implicit regularization to enforce this condition. Conversely, merging should be used when jump discontinuities cannot be discounted.

It is important to note that in both healing and merging, the number of knots stays fixed at $M + 2(k-1)$, where $M \in \mathbb{N}_{\mathrm{knots}}$.
C. Mapping particle location to knots

For a given model $M \in \mathbb{N}_{\mathrm{knots}}$, the search space for PSO is $M$ dimensional. Every particle location, $\mathbf{z} = (z_0, z_1, \ldots, z_{M-1})$, in this space has to be mapped to an $M + 2(k-1)$ element knot sequence $\tau$ before evaluating its fitness $F_\lambda(\tau)$. We consider two alternatives for the map from $\mathbf{z}$ to $\tau$.

• Plain: $\mathbf{z}$ is sorted in ascending order. After sorting, $k-1$ copies each of $z_0$ and $z_{M-1}$ are prepended and appended respectively to $\mathbf{z}$. These are the repeated end knots as described in Sec. II A. (A sketch of this map is given at the end of this section.)

• Centered-monotonic: In this scheme [60], the search space is the unit hypercube: $z_i \in [0, 1]$, $\forall i$. First, an initial set of $M$ knots is obtained from

$z_0 = \tau_0$,  (27)

$z_i = (\tau_i - \tau_{i-1})/(\tau_{i+1} - \tau_{i-1})$, for $1 \leq i \leq M-2$,  (28)

$z_{M-1} = (\tau_{M-1} - \tau_{M-2})/(1 - \tau_{M-2})$.  (29)

This is followed by prepending and appending $k-1$ copies of $\tau_0$ and $\tau_{M-1}$ respectively to the initial knot sequence.

In the plain map, any permutation of $\mathbf{z}$ maps into the same knot sequence due to sorting. This creates degeneracy in $F_\lambda$, which may be expected to make the task of global minimization harder for PSO. The centered-monotonic map is designed to overcome this problem: by construction, it assigns a unique $\tau$ to a given $\mathbf{z}$. Moreover, $\tau$ is always a monotonic sequence, removing the need for a sorting operation. This map also has the nice normalization that the center of the search space at $z_i = 0.5$, $1 \leq i \leq M-2$, corresponds to uniform spacing of the interior knots.

It should be noted here that the above two maps are not the only possible ones. The importance of the "lethargy theorem" (degeneracy of the fitness function) and using a good parametrization for the knots in regression spline was pointed out by Jupp [7] back in 1978. A logarithmic map for knots was proposed in [7] that, while not implemented in this paper, should be examined in future work.
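For concreteness, the plain map is sketched below in the same illustrative Python used earlier (the function name plain_map is an assumption):

```python
import numpy as np

def plain_map(z, k=4):
    """Plain map of Sec. IV C: sort the particle location and repeat
    the first and last elements k-1 times each, giving the
    M + 2(k-1) element knot sequence of Sec. IV B."""
    z = np.sort(np.asarray(z, dtype=float))
    return np.concatenate([np.full(k - 1, z[0]), z, np.full(k - 1, z[-1])])
```

Under the fixed end-knot condition of Sec. IV D, $z_0$ and $z_{M-1}$ would simply be pinned to the ends of the predictor range before applying this map.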
D. Optimization of end knots
When fitting curves to noisy one-dimensional data in a signal processing context, a common situation is that the signal is transient and localized well away from the end points $x_0$ and $x_{N-1}$ of the predictor. However, the location of the signal in the data – its time of arrival in other words – may be unknown. In such a case, it makes sense to keep the end knots free and subject to optimization.

On the other hand, if it is known that the curve occupies the entire predictor range, it is best to fix the end knots by keeping $z_0$ and $z_{M-1}$ fixed. (This reduces the dimensionality of the search space for PSO by 2.)

E. Retention of end B-splines
The same signal processing scenario considered above suggests that, for signals that decay smoothly to zero at their start and end, it is best to drop the end B-spline functions because they have a jump discontinuity in value (c.f., Fig. 1). In the contrary case, the end B-splines may be retained so that the estimated signal can start or end at non-zero values.
V. SIMULATION STUDY SETUP
We examine the performance of
SHAPES on simulated data with a wide range of benchmark functions. In this section, we present these functions, the simulation protocol used, the metrics for quantifying performance, and a scheme for labeling test cases that is used in Sec. VII. (In the following, the terms "benchmark function" and "benchmark signal" are used interchangeably.)
A. Benchmark functions
The benchmark functions used in this study are listed in Table I and plotted in Fig. 3. Function $f_1$ has a sharp change but is differentiable everywhere. Functions $f_2$ and $f_6$ have jump discontinuities, and $f_3$ has a jump discontinuity in its slope. Functions $f_4$ and $f_5$ are smooth but sharply peaked. Functions $f_7$ to $f_{10}$ all decay to zero at both ends and serve to model smooth but transient signals; $f_7$ to $f_9$ are designed to require progressively higher numbers of knots for fitting; $f_{10}$ is an oscillatory signal that is typical for signal processing applications and expected to require the highest number of knots. In addition, $f_9$ and $f_{10}$ test the ability of SHAPES to localize time of arrival.
TABLE I: The benchmark functions used in this paper. The sources from which the functions have been obtained are: $f_1$ to $f_3$ [24]; $f_4$ [23]; $f_5$ [61, 62]; $f_6$ [63]; $f_{10}$ [30]. Functions $f_7$ to $f_9$ are introduced here.

Expression | Domain
$f_1(x) = 90\,(1 + e^{-\ldots(x - \ldots)})^{-1}$ | $x \in [0, 1]$
$f_2(x) = (0.01 + (x - \ldots)^2)^{-1}$ for $0 \leq x < \ldots$; $(0.015 + (x - \ldots)^2)^{-1}$ for $\ldots \leq x \leq 1$ | $x \in [0, 1]$
$f_3(x) = 100\, e^{-|10x - 5|} + (10x - 5)^5/500$ | $x \in [0, 1]$
$f_4(x) = \sin(x) + 2 e^{-30 x^2}$ | $x \in [-2, 2]$
$f_5(x) = \sin(2x) + 2 e^{-16 x^2} + 2$ | $x \in [-2, 2]$
$f_6(x) = 4x^2(3 - 4x)$ for $0 \leq x < 0.5$; $(4/3)x(4x^2 - 10x + 7) - 3/2$ for $0.5 \leq x < 0.75$; $(16/3)x(x - 1)^2$ for $0.75 \leq x \leq 1$ | $x \in [0, 1]$
$f_7(x) = B_{0,k}(x; \tau)$ for a fixed knot sequence $\tau = (\tau_0, \tau_1, \ldots)$ | $x \in [0, 1]$
$f_8(x) = B_{0,k}(x; \tau) + B_{0,k}(x - \ldots; \tau)$ | $x \in [0, 1]$
$f_9(x) = B_{0,k}(x - \ldots; \tau) + B_{0,k}(x - \ldots; \tau)$ | $x \in [0, 1]$
$f_{10}(x) = e^{-(x - \ldots)^2/\ldots}\, \sin(10.\ldots\, \pi (x - \ldots))$ | $x \in [0, 1]$

FIG. 3: Benchmark functions normalized to have SNR = 100. The function name is indicated in the upper left corner of each panel. The abscissa is identical across panels.

B. Data simulation
Following the regression model in Eq. 1, a simulated data realization consists of pseudorandom iid noise drawn from $N(0, 1)$ added to a given benchmark function that is sampled uniformly at 256 points in $[0, 1]$. The amplitude of a benchmark function is adjusted to study the performance of SHAPES across a range of signal to noise ratio (SNR) defined as

$\mathrm{SNR} = \|f\| / \sigma$,  (30)

where $f$ is a benchmark function and $\sigma$ is the standard deviation – set to unity in this paper – of the noise.

For each combination of benchmark function and SNR, SHAPES is applied to $N_R = 1000$ independent data realizations. This results in 1000 corresponding estimated functions. Statistical summaries, such as the point-wise mean and standard deviation of the estimate, are computed from this set of estimated functions.
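A sketch of this simulation protocol, continuing the illustrative Python of the earlier sections (the helper name simulate and the seed argument are assumptions):

```python
import numpy as np

def simulate(f, snr=100.0, seed=0):
    """One data realization per Eq. 1: scale the sampled benchmark
    function f to the target SNR of Eq. 30 (sigma = 1) and add
    unit-variance pseudorandom Gaussian noise."""
    rng = np.random.default_rng(seed)
    f_scaled = f * snr / np.linalg.norm(f)  # enforce ||f|| = SNR * sigma
    return f_scaled + rng.standard_normal(f.size)
```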
C. Metrics

The principal performance metric used in this paper is the sample root mean squared error (RMSE):

$\mathrm{RMSE} = \left[ \frac{1}{N_R} \sum_{j=1}^{N_R} \| f - \widehat{f}_j \|^2 \right]^{1/2}$,  (31)

where $f$ is the true function in the data and $\widehat{f}_j$ its estimate from the $j$th data realization. We use bootstrap, with samples drawn independently with replacement from the set $\{\|f - \widehat{f}_j\|\}$, to obtain the sampling error in RMSE.

A secondary metric that is useful is the sample mean of the number of knots in the best fit model. To recall, this is the average of $M_{\mathrm{best}} \in \mathbb{N}_{\mathrm{knots}}$ over the $N_R$ data realizations, where $M_{\mathrm{best}}$ and $\mathbb{N}_{\mathrm{knots}}$ were defined in Sec. IV. The error in $M_{\mathrm{best}}$ is estimated by its sample standard deviation.
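Eq. 31 is a direct average over realizations, as the following sketch makes explicit (the helper name rmse is an assumption):

```python
import numpy as np

def rmse(f_true, estimates):
    """Sample root mean squared error over realizations (Eq. 31).
    estimates: iterable of N_R estimated functions, each sampled on
    the same grid as f_true."""
    err2 = [np.sum((f_true - f_hat) ** 2) for f_hat in estimates]
    return np.sqrt(np.mean(err2))
```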
D. Labeling scheme

Several design choices in SHAPES were described in Sec. IV. A useful bookkeeping device for keeping track of the many possible combinations of these choices is the labeling scheme presented in Table II. Following this labeling scheme, a string such as LP_100_0.1_50_FKM is decoded as follows: lbest PSO; plain knot map; SNR = 100; regulator gain $\lambda = 0.1$; maximum number of PSO iterations set to 50; end knots fixed; end B-splines retained; merging of knots allowed.
VI. COMPUTATIONAL CONSIDERATIONS
The results in this paper were obtained with a code implemented entirely in Matlab [64]. Some salient points about the code are described below.

The evaluation of B-splines uses the efficient algorithm given in [1]. Since our current B-spline code is not vectorized, it suffers a performance penalty in Matlab. (We estimate that it is ≈ 50% slower as a result.) Nonetheless, the code is reasonably fast: a single PSO run on a single data realization, for the more expensive case of SNR = 100, takes about 11 sec on an Intel Xeon (3.0 GHz) class processor. It is important to note that the run-time above is specific to the set, $\mathbb{N}_{\mathrm{knots}}$, of models used. In addition, due to the fact that the number of particles breaching the search space boundary in a given PSO iteration is a random variable and that the fitness of such a particle is not computed, the actual run times vary slightly for different PSO runs and data realizations.

The only parallelization used in the current code is over the independent PSO runs. Profiling shows that a large fraction of the run-time, ≈ 45%, is spent in evaluating B-splines. Further substantial saving in run-time is, therefore, possible if particle fitness evaluations are also parallelized. This dual parallelization is currently not possible in the Matlab code but, given that we use $N_p = 40$ particles, parallelizing all $N_p$ fitness evaluations can be expected to reduce the run-time by about an order of magnitude. However, realizing such a large number of parallel processes needs hardware acceleration using, for example, Graphics Processing Units.

The operations count in the most time-consuming parts of the code (e.g., evaluating B-splines) scales linearly with the length of the data. Hence, the projected ratios above in run-time speed-up are not expected to change much with data length although the overall run-time will grow linearly.

The pseudorandom number streams used for the simulated noise realizations and in the PSO dynamical equations utilized built-in and well-tested default generators. The PSO runs were assigned independent pseudorandom streams that were initialized, at the start of processing any data realization, with the respective run number as the seed. This (a) allows complete reproducibility of results for a given data realization, and (b) does not breach the cycle lengths of the pseudorandom number generators when processing a large number of data realizations.

TABLE II: Labeling scheme for a combination of design choices in SHAPES. The string labeling a combination is formed by going down the rows of the table and (a) picking one letter from the last two columns of each row, or (b) inserting the value of a numerical quantity. Numerical values in the key string are demarcated by underscores on both sides. Thus, a key string looks like Y_1 Y_2 X_3 X_4 X_5 Y_6 Y_7 Y_8, where Y_i and X_i stand for letter and numerical entries respectively, and i is the row number of the table starting from the top. We have left the possibility open for replacing lbest PSO with some other variant in the future. This is indicated by the '∗' symbol in the top row.

PSO algorithm (Sec. III) | L: lbest PSO∗
Knot map (Sec. IV C) | P: Plain | C: Centered-monotonic
SNR (Eq. 30) | (Numerical)
λ (Eq. 9) | (Numerical)
N_iter (Number of PSO iterations) | (Numerical)
End knots (Sec. IV D) | F: Fixed | V: Variable
End B-splines (Sec. IV E) | K: Keep | D: Drop
Knot merging (Sec. IV B) | M: Merge | H: Heal

VII. RESULTS
The presentation of results is organized as follows. Sec. VII A shows single data realizations and estimates for a subset of the benchmark functions. Sec. VII B analyzes the impact of the regulator gain $\lambda$ on estimation. Sec. VII C and Sec. VII D contain results for SNR = 100 and SNR = 10 respectively. Sec. VII E shows the effect of the bias correction step described in Sec. IV A on the performance of SHAPES for both SNR values. In Sec. VII F, we compare the performance of SHAPES with two well-established smoothing methods, namely, wavelet-based thresholding and shrinkage [65], and smoothing spline with adaptive selection of the regulator gain [33]. The former follows an approach that does not use splines at all, while the latter uses splines but avoids free knot placement. As such, they provide a good contrast to the approach followed in SHAPES.

In all applications of SHAPES, the set of models used, $\mathbb{N}_{\mathrm{knots}}$, consisted of ten values of the number of non-repeating knots. The spacing between the models is set wider for higher knot numbers in order to reduce the computational burden involved in processing a large number of data realizations. In an application involving just a few realizations, a denser spacing may be used.

Fig. 4 shows the performance of lbest PSO across the set of benchmark functions as a function of the parameter $N_{\mathrm{iter}}$. Given that the fitness values do not change in a statistically significant way when going from $N_{\mathrm{iter}} = 100$ to $N_{\mathrm{iter}} = 200$ in the SNR = 100 case, we set it to the former as it saves computational cost. A similar plot of fitness values (not shown) for SNR = 10 is used to set $N_{\mathrm{iter}} = 50$ for the SNR = 10 case.

A. Sample estimates
In Fig. 5, we show function estimates obtained with SHAPES for arbitrary single data realizations. While not statistically rigorous, this allows an initial assessment of performance when the SNR is sufficiently high. Also shown with each estimate is the location of the knots found by SHAPES.

For ease of comparison, we have picked only the benchmark functions ($f_1$ to $f_6$) used in [26]. The SNR of each function matches the value one would obtain using the noise standard deviation tabulated in [26]. Finally, the algorithm settings were brought as close as possible by (a) setting the regulator gain $\lambda = 0$, (b) using the plain map (c.f., Sec. IV C), (c) keeping the end knots fixed, and (d) allowing knots to merge. Differences remain in the PSO variant (and associated parameters) used and, possibly, the criterion used for merging knots.

FIG. 4: Performance of lbest PSO as a function of the number of iterations, $N_{\mathrm{iter}}$, to termination. Each curve corresponds to one of the benchmark functions at SNR = 100 and shows the mean fitness value as a function of $N_{\mathrm{iter}}$. The mean fitness value is an average over $N_R = 1000$ data realizations of the fitness value corresponding to the best model (i.e., $F(M_{\mathrm{best}}, r_{M_{\mathrm{best}}})$ defined in Fig. 2). The error bars represent $\pm\sigma$ deviations where $\sigma$ is the sample standard deviation. The other algorithm settings used for this plot can be read off from the key string LP_100_0.1_X_FKM using Table II.
SHAPES has excellent performance at highSNR values: without any change in settings, it can fitbenchmark functions ranging from spatially inhomoge-nous but smooth to ones that have discontinuities. Forthe latter,
SHAPES allows knots to coalesce into repeatedknots in order to improve the fit at the location of the dis-continuities. The sample estimates in Fig. 5 are visuallyindistinguishable from the ones given in [26]. The sameholds for the sample estimates given in [24], which usesbenchmark functions f to f and SNRs similar to [26]. B. Regulator gain
While the aim of restricting the number of knots in regression spline is to promote a smoother estimate, it is an implicit regularization that does not guarantee smoothness. In the absence of an explicit regularization, a fitting method based on free knot placement will exploit this loophole to form spurious clusters of knots that fit outliers arising from noise and overfit the data. This issue becomes increasingly important as the level of noise in the data increases.

Fig. 6 illustrates how adding the penalized spline regulator helps mitigate this problem of knot clustering. Shown in the figure is one data realization and the corresponding estimates obtained with high and low values of the regulator gain $\lambda$. For the latter, sharp spikes appear in the estimate where the function value is not high but the noise values are. The method tries to fit out these values by putting more knots in the model and clustering them to form the spikes. Since knot clustering also needs large B-spline coefficients in order to build a spike, a larger penalty on the coefficients suppresses spurious spikes.

Fig. 7 and Fig. 8 present a statistically more rigorous study of the effect of $\lambda$ by examining the RMSE attained across the whole set of benchmark functions at different SNR values. In both figures, the RMSE is shown for identical algorithm settings except for $\lambda$, and in both we observe that increasing the regulator gain improves the RMSE. (The lone case where this is not true is addressed in more detail in Sec. VII D.) The improvement becomes more pronounced as SNR is lowered. (There is no statistically significant effect of $\lambda$ on the number of knots in the best fit model at either SNR.) The higher values of the regulator gains in Fig. 7 and Fig. 8 – $\lambda = 0.1$ for SNR = 100 and $\lambda = 5.0$ for SNR = 10 – are used as the fiducial settings in the rest of the paper.

C. Results for SNR = 100
We have already selected some of the algorithm settings in the preceding sections, namely, the number of iterations to use and the regulator gain for a given SNR. Before proceeding further, we need to decide on the remaining ones.

For the SNR = 100 case, it is clear that the end knots and end B-splines must be retained because benchmark functions $f_1$ to $f_6$ do not all decay to zero and the noise level is not high enough to mask this behavior. Similarly, knot merging is an obvious choice because discontinuities in some of the benchmark functions are obvious at this SNR and they cannot be modeled under the alternative option of healing. The remaining choice is between the two knot maps: plain or centered-monotonic.

As shown in Fig. 9, the RMSE is distinctly worsened by the centered-monotonic map across all the benchmark functions. This map also leads to a higher number of knots in the best fit model although the difference is not as significant statistically. Thus, the clear winner here is the plain map in which knots are merged.

With all the design choices fixed, the performance of SHAPES can be examined. This is done in Fig. 10 and Fig. 11 where the point-wise sample mean and $\pm\sigma$ deviation, $\sigma$ being the sample standard deviation, are shown for all the benchmark functions. Note that the level of noise now is much higher than the examples studied in Sec. VII A.

FIG. 5: Sample estimated functions for the benchmark functions $f_1$ to $f_6$. In each panel: the solid black curve is the estimated function; triangles show the locations of its knots (vertically stacked triangles denote repeated knots); the dashed black curve is the true function; gray dots represent the data realization. (In most cases, the solid and dashed curves are visually indistinguishable.) The SNRs (rounded to integer values) of the functions in order from $f_1$ to $f_6$ are 1104, 747, 506, 241, 633, and 254, respectively. The algorithm settings were LP_SNR_0_100_FKM (c.f., Table II). Note that under these settings, the end B-splines are retained, which requires end knots to have the maximum allowed multiplicity. But this is a fixed multiplicity in each plot and not shown for clarity.

It is evident from these figures that
SHAPES is able to resolve different types of discontinuities as well as the locations of features such as peaks and sharp changes. In interpreting the error envelope, it should be noted that the errors at different points are strongly correlated, a fact not reflected in the point-wise standard deviation. Thus, a typical single estimate is not an irregular curve bounded by the error envelopes, as would be the case for statistically independent point-wise errors, but a smooth function. Nonetheless, the error envelopes serve to indicate the extent to which an estimate can deviate from the true function.
D. Results for SNR = 10
Here, we examine the case of high noise level at SNR = 10. Fig. 12 and Fig. 13 show the point-wise sample mean and $\pm\sigma$ deviation, $\sigma$ being the sample standard deviation, for all the benchmark functions. The algorithm settings used are the same as in Sec. VII C for the SNR = 100 case except for the regulator gain and the number of PSO iterations: $\lambda = 5.0$ and $N_{\mathrm{iter}} = 50$ respectively.

Unlike the SNR = 100 case, the high noise level masks many of the features of the functions. For example, the discontinuities and the non-zero end values for $f_1$ to $f_6$ are washed out. Thus, the algorithm settings to use are not at all as clear cut as before. In fact, the results presented next show that alternative settings can show substantial improvements in some cases.

First, as shown in Fig. 14, the estimation of one of the benchmark functions actually improves significantly when the regulator gain is turned down to $\lambda = 0.1$.

FIG. 6: Effect of regulator gain, $\lambda$, on estimation. The solid and dashed curves are the estimates obtained with $\lambda = 0.1$ and $\lambda = 5.0$ respectively, where $\lambda$ is the regulator gain for the penalty term in Eq. 9. The true curve – a benchmark function with SNR = 10 – is shown with a dotted line and the gray dots show the data realization. The interior knots in the best models found for $\lambda = 0.1$ and $\lambda = 5.0$ are also shown; note the clustering of knots for $\lambda = 0.1$. Except for $\lambda$, the algorithm settings – LP_10_λ_50_FKM (see Table II) – were identical for the two estimated curves.

FIG. 7: Effect of regulator gain on (top panel) the root mean squared error (RMSE), and (bottom panel) the mean number of knots in the best model for SNR = 100 benchmark functions. In both panels, the solid and dotted curves correspond to $\lambda = 0.1$ (LP_100_0.1_100_FKM) and $\lambda = 0.0$ (LP_100_0.0_100_FKM) respectively. The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.

FIG. 8: Effect of regulator gain on (top panel) the root mean squared error (RMSE), and (bottom panel) the mean number of knots in the best model for SNR = 10 benchmark functions. In both panels, the solid and dotted curves correspond to $\lambda = 5.0$ (LP_10_5.0_50_FKM) and $\lambda = 0.1$ (LP_10_0.1_50_FKM) respectively. The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.

FIG. 9: Effect of the map used for transforming PSO search space coordinates to knots on (top panel) the root mean squared error (RMSE), and (bottom panel) the mean number of knots in the best model for SNR = 100 benchmark functions. The solid and dotted curves correspond to the plain and centered-monotonic maps respectively (key strings LP_100_0.1_100_FKM and LC_100_0.1_100_FKM). The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.

FIG. 10: Mean estimated functions (black curve) for benchmark functions $f_1$ to $f_6$ at SNR = 100. The true functions are shown as dotted curves but they are practically indistinguishable from the mean estimated functions. The gray curves show $\pm\sigma$ deviation from the mean function, where $\sigma$ is the estimated standard deviation. The gray dots show an arbitrary data realization for the purpose of visualizing the noise level. The abscissa has the same range for each panel. The algorithm settings used are given by the key string LP_100_0.1_100_FKM.
While this is the lone outlier in the general trend between regulator gain and RMSE (c.f., Fig. 8), it points to the importance of choosing the regulator gain adaptively rather than empirically as done in this paper.

Next, Fig. 15 examines the effect of the knot map and its interplay with fixing the end knots or allowing them to vary. Allowing the end knots to vary under either knot map leads to a worse RMSE but the number of knots required in the best fit model is reduced, significantly so for $f_7$ to $f_{10}$. A plausible explanation for this is that the high noise level masks the behavior of the functions at their end points, and freeing up the end knots allows SHAPES to ignore those regions and focus more on the ones where the function value is higher relative to noise.

FIG. 11: Mean estimated functions (black curve) for benchmark functions $f_7$ to $f_{10}$ at SNR = 100. The true functions are shown as dotted curves but they are practically indistinguishable from the mean estimated functions. The gray curves show $\pm\sigma$ deviation from the mean function, where $\sigma$ is the estimated standard deviation. The gray dots show an arbitrary data realization for the purpose of visualizing the noise level. The abscissa has the same range for each panel. The algorithm settings used are given by the key string LP_100_0.1_100_FKM.

Under a given end knot condition, the centered-monotonic map always performs worse in Fig. 15 albeit the difference is statistically significant for only a small subset of the benchmark functions. Remarkably, this behavior is reversed for some of the benchmark functions when additional changes are made to the design choices. Fig. 16 shows the RMSE when the centered-monotonic map and variable end knots are coupled with the dropping of end B-splines and healing of knots. Now, the performance is better for functions $f_7$ to $f_{10}$ relative to the best algorithm settings found from Fig. 15: not only is there a statistically significant improvement in the RMSE for these functions but this is achieved with a substantially smaller number of knots. This improvement comes at the cost, however, of significantly worsening the RMSE for the remaining benchmark functions.

E. Effect of bias correction
Fig. 17 shows the effect of using the bias correction step described in Sec. IV A on RMSE. We see that bias correction produces a statistically significant reduction in RMSE for a subset of the benchmark functions, with the reduction being greater at the higher SNR. For the remaining benchmark functions, turning bias correction on or off has no statistically significant effect on RMSE.

FIG. 12: Mean estimated functions (black curve) for benchmark functions $f_1$ to $f_6$ at SNR = 10. The true functions are shown as dotted curves. The gray curves show $\pm\sigma$ deviation from the mean function, where $\sigma$ is the estimated standard deviation. The gray dots show an arbitrary data realization for the purpose of visualizing the noise level. The abscissa has the same range for each panel. The algorithm settings used are given by the key string LP_10_5.0_50_FKM, which can be expanded using Table II.
In this section, we compare the performance of SHAPES with WaveShrink [65] and smoothing spline [32, 33] as implemented in R [34] (called R:sm.spl in this paper). The former is taken from the Matlab-based package WaveLab [66] and it performs smoothing by thresholding the wavelet coefficients of the given data and applying non-linear shrinkage to threshold-crossing coefficients. For both of these methods, we use default values of their parameters with the following exceptions: for WaveShrink we used the "Hybrid" shrinkage method, while for R:sm.spl we use GCV to determine the regulator gain. For reproducibility, we list the exact commands used to call these methods:

• WaveShrink(Y,'Hybrid',L): Y is the data to be smoothed and L is the coarsest resolution level of the discrete wavelet transform of Y.

• smooth.spline(X,Y,cv=FALSE): X is the set of predictor values, Y is the data to be smoothed, and cv=FALSE directs the code to use GCV for regulator gain determination.

FIG. 13: Mean estimated functions (black curve) for benchmark functions $f_7$ to $f_{10}$ at SNR = 10. The true functions are shown as dotted curves. The gray curves show $\pm\sigma$ deviation from the mean function, where $\sigma$ is the estimated standard deviation. The gray dots show an arbitrary data realization for the purpose of visualizing the noise level. The abscissa has the same range for each panel. The algorithm settings used are given by the key string LP_10_5.0_50_FKM, which can be expanded using Table II.

TABLE III: RMSE values for SHAPES, WaveShrink, and R:sm.spl obtained with the same dataset as used for Fig. 10. The benchmark functions used are $f_1$ to $f_6$ at SNR = 100. In the case of WaveShrink, the RMSE is the lowest attained over different values of the parameter L. The best value of L is shown parenthetically.

 | SHAPES | WaveShrink | R:sm.spl
$f_1$ | ….62 | 7.96 (4) | 4.…
$f_2$ | ….86 | 7.47 (4) | 7.…
$f_3$ | ….01 | 5.82 (3) | 5.…
$f_4$ | ….07 | 8.11 (4) | 5.…
$f_5$ | ….92 | 6.21 (3) | 4.…
$f_6$ | ….36 | 6.93 (3) | 7.…

Each method above was applied to the same dataset as used for producing Fig. 10. Statistical summaries, namely, the mean estimated function, the $\pm\sigma$ deviation from the mean, and RMSE were obtained following the same procedure as described for SHAPES. A crucial detail: when applying
WaveShrink, we use a range of values of L and pick the one that gives the lowest RMSE.

The results of the comparison are shown in Table III, Fig. 18, and Fig. 19. Table III shows the RMSE values attained by the methods for the benchmark functions $f_1$ to $f_6$, all normalized to have SNR = 100. Figure 18 shows more details for the benchmark functions that produce the best and worst RMSE values for WaveShrink. Similarly, Fig. 19 corresponds to the best and worst benchmark functions for R:sm.spl.
FIG. 14: The mean estimated function (black) for a benchmark function at SNR = 10 with regulator gain $\lambda = 0.1$ (key string LP_10_0.1_50_FKM). The dotted curve shows the true function. The gray curves show $\pm\sigma$ deviation from the mean function, where $\sigma$ is the estimated standard deviation. The gray dots show an arbitrary data realization for the purpose of visualizing the noise level. The other algorithm settings can be read off from the key string using Table II.

While noted in the caption of Fig. 18, it is worth re-emphasizing here that the error envelopes of SHAPES shown in Fig. 18 and 19 are computed relative to the mean estimated function of the method being compared rather than that of SHAPES itself. This modification eliminates visual confusion caused by the different biases (i.e., mean estimated functions) of the methods. However, the bias curves shown separately in these figures do use the respective mean estimated function for each method. The actual mean estimated functions and corresponding error envelopes of SHAPES can be seen in Fig. 10.

FIG. 15: Comparison of plain and centered-monotonic maps under fixed (F) and variable (V) end knot conditions at SNR = 10 (key strings LP_10_5.0_50_FKM, LC_10_5.0_50_FKM, LP_10_5.0_50_VKM, and LC_10_5.0_50_VKM). The top and bottom panels respectively show RMSE and mean number of knots in the best model. The other algorithm settings used for this plot can be read off from the key strings using Table II. The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.

FIG. 16: Comparison of plain and centered-monotonic maps under the FKM and VDH algorithm settings respectively at SNR = 10 (key strings LP_10_5.0_50_FKM and LC_10_5.0_50_VDH). See Table II for the meaning of these and other algorithm settings. The top and bottom panels respectively show RMSE and mean number of knots in the best model. The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.

FIG. 17: The effect of bias correction on root mean squared error (RMSE) for SNR = 10 (top) and SNR = 100 (bottom) benchmark functions. Solid and dotted curves in each panel correspond to bias correction switched on and off, respectively. The algorithm settings used were LP_SNR_λ_N_iter_FKM in all the cases with (top) $\lambda = 5.0$, $N_{\mathrm{iter}} = 50$, and (bottom) $\lambda = 0.1$, $N_{\mathrm{iter}} = 100$. These are the fiducial settings used for SNR = 10 and SNR = 100 in Sec. VII D and Sec. VII C, respectively. The data points correspond to the benchmark functions shown on the abscissa. The error bars show $\pm\sigma$ deviations, where $\sigma$ is the estimated standard deviation.
SHAPES with alternate methods,
WaveShrink and
R:sm.spl , for benchmark functions f and f atSNR = 100. The left column of figure panels is associated with WaveShrink and the right with
R:sm.spl . In each column,(i) the benchmark function used is indicated in each panel, (ii) black curves correspond to
SHAPES , and (iii) gray curves tothe alternate algorithm. For each benchmark function, there are two panels: (i) The one labeled “std” shows the ± σ errorenvelopes relative to the estimated mean signal from the alternate method; (ii) the one labeled “bias” shows the differencebetween the true function and the estimated mean signal from each method. The abscissa has the same range for each panel.The ordinate values are identical across the panels in a given row. The dataset used and the algorithm settings for SHAPES arethe same as in Fig. 10. f :std.f :biasf :std.f :bias FIG. 19: Same as Fig. 18 except for the change in benchmark functions to f and f . SHAPES itself. This modification elimi-nates visual confusion caused by the different biases (i.e.,mean estimated functions) of the methods. However,the bias curves shown separately in these figures do usethe respective mean estimated function for each method.The actual mean estimated functions and correspondingerror envelopes of
SHAPES can be seen in Fig. 10.

From Table III, we see that SHAPES has the lowest RMSE in all cases. Fig. 18 and Fig. 19 show that this arises from SHAPES generally having both a lower estimation variance, with its error envelope nesting within that of the other methods, and a lower bias. Typically, R:sm.spl has a lower variance than SHAPES around stationary points of the true function, but the difference is marginal. In some cases, such as f and f, the bias in the SHAPES estimate is significantly lower than that of either WaveShrink or R:sm.spl. In general, WaveShrink estimates are less smooth than those from either SHAPES or R:sm.spl. This is manifested, for example, in the rougher behavior of the mean estimated function from WaveShrink. Both WaveShrink and R:sm.spl have a much poorer resolution of the jump discontinuity in f compared to SHAPES (cf. Fig. 10).
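To make the summary procedure concrete, the sketch below shows one way the statistical summaries used above (mean estimated function, pointwise ±σ envelope, and RMSE over independent noise realizations) could be computed. The estimator, test function, and normalization of the RMSE are assumptions for illustration: smooth.spline stands in for the method under study, and RMSE is taken as the root of the squared error pooled over realizations and samples, which may differ from the definition used earlier in the paper.

  # Sketch: Monte Carlo summaries over R independent noise realizations
  set.seed(2)
  N <- 256; R <- 100                    # samples per realization; number of realizations
  X <- seq(0, 1, length.out = N)
  ftrue <- sin(2 * pi * X^2)            # hypothetical stand-in for a benchmark function

  est <- matrix(0, nrow = R, ncol = N)
  for (r in 1:R) {
    Y <- ftrue + 0.3 * rnorm(N)                          # fresh noise realization
    est[r, ] <- predict(smooth.spline(X, Y, cv = FALSE), X)$y
  }

  fbar  <- colMeans(est)                # mean estimated function (the black curves)
  sigma <- apply(est, 2, sd)            # pointwise estimated standard deviation
  lower <- fbar - sigma                 # ±σ envelope (the gray curves)
  upper <- fbar + sigma
  rmse  <- sqrt(mean((est - matrix(ftrue, R, N, byrow = TRUE))^2))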
VIII. CONCLUSIONS
Our results show that the challenge of free knot placement in adaptive spline fitting is solvable. The most important element of the solution is the use of an effective metaheuristic for knot optimization. We have shown that lbest PSO is effective in this task. Considering the f benchmark function, for example, the best model found by SHAPES reaches the vicinity of the highest number (= 18) of non-repeating knots considered in this paper. The good quality of the fit obtained for f shows that PSO was able to handle this high-dimensional optimization well.

Relative to the SNRs used commonly in the literature on adaptive spline fitting, the values of SNR used in this paper, namely SNR = 100 and SNR = 10, can be ranked as moderate and low, respectively. For the former, discontinuities in function values or derivatives were well localized by SHAPES in all the cases. At the same time, the smooth parts of the benchmark functions were also well estimated. The estimates from SHAPES for low SNR (= 10) had, naturally, more error. In particular, the noise level in all the data realizations was high enough to completely mask the presence of discontinuities and, thus, they were not well localized. Nonetheless, even with a conservative error envelope of ±σ around the mean estimated signal, the overall shape of the true function is visually clear in all the examples. This shows that the estimated functions are responding at a statistically significant level to the presence of the true function in the data. Note that, being a non-parametric method, SHAPES can handle functions with qualitatively disparate behaviors, from a simple change between two levels to oscillatory, without requiring any special tuning.

Further characterization of the performance of SHAPES for the low SNR case requires incorporating a hypothesis test, a topic that we are currently investigating. For the case of transient signals represented by f to f, the fitness value returned by SHAPES may itself serve as a powerful detection statistic, since the estimated functions depart quite significantly from the expected behavior under the null hypothesis. (Hypothesis testing for f was explored in this manner in [30] with a method that did not include bias reduction.) However, this expectation may need to be modified for the other benchmark functions, in particular for f. Another issue that comes up in a hypothesis test is that of bias correction: it must be used only if the detection threshold of the test is crossed; otherwise, estimates under both the null and alternative hypotheses will get amplified. This creates an interesting interplay between hypothesis testing and estimation that needs careful scrutiny.

The dependence of design choices on SNR, as elucidated in this paper, does not seem to have been fully appreciated in the literature on adaptive spline fitting, probably because the typical scenario considered is that of high SNR. While the performance of SHAPES for SNR = 100 is found to be fairly robust to the design choices made, they have a non-negligible effect at SNR = 10. The nature of the true function also influences the appropriate algorithm settings for the latter case. Fortunately, the settings were found to depend on only some coarse features of a function, such as its behavior at data boundaries (f to f), whether it is transient (f to f), or whether it is oscillatory (f). Such features are often well known in a real-world application domain: it is unusual to deal with signals that have discontinuities as well as signals that are smooth and transient in the same application. Hence, in most such cases, it should be straightforward to pick the best settings for SHAPES.

The inclusion of a penalized spline regulator was critical in SHAPES for mitigating the problem of knot clustering. For all but one (f) of the benchmark functions considered here, the regulator gain was determined empirically by simply examining a few realizations at each SNR with different values of the regulator gain λ. Ideally, however, λ should be determined adaptively from the given data using a method such as GCV. The case of f at SNR = 10 provides a particularly good test bed in this regard: while λ = 5.0 was used for the other benchmark functions at this SNR, the fit to f improved significantly when the gain was lowered to λ = 0.1. Thus, any method for determining λ adaptively must be able to handle this extreme variation. We leave the additional refinement of using an adaptive regulator gain in SHAPES to future work.
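As a starting point for such an adaptive scheme, the sketch below illustrates what grid-based GCV selection of λ could look like for a generic penalized spline with a fixed B-spline basis and a second-difference penalty in the style of [42], scored with the GCV criterion of [14]. This is an illustration under those assumptions, not the SHAPES penalty or its implementation; the basis size, grid, and names (lams, lam.best) are hypothetical.

  # Sketch: choosing the regulator gain lambda by minimizing GCV on a grid
  library(splines)
  set.seed(3)
  N <- 256
  X <- seq(0, 1, length.out = N)
  Y <- sin(2 * pi * X^2) + 0.3 * rnorm(N)    # hypothetical noisy data

  B <- bs(X, df = 30)                        # fixed B-spline basis (stand-in spline model)
  D <- diff(diag(ncol(B)), differences = 2)  # second-difference penalty operator
  P <- t(D) %*% D

  gcv <- function(lam) {
    # Hat matrix H = B (B'B + lam P)^{-1} B'; GCV(lam) = N RSS / (N - tr H)^2
    H <- B %*% solve(t(B) %*% B + lam * P, t(B))
    rss <- sum((Y - H %*% Y)^2)
    N * rss / (N - sum(diag(H)))^2
  }

  lams <- 10^seq(-4, 2, length.out = 25)     # grid spanning the 0.1 to 5.0 range and beyond
  lam.best <- lams[which.min(sapply(lams, gcv))]

Any such scheme would have to remain reliable across the two-orders-of-magnitude variation in λ noted above, which is why an adaptive gain is left to future work.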
The extension of SHAPES to multi-dimensional splines and longer data lengths is the next logical step in its development. It is likely that extending SHAPES to these higher-complexity problems will require different PSO variants than the one used here.

The codes used in this paper for the FKM option (cf. Sec. VII C) are available in a GitHub repository at the URL: https://github.com/mohanty-sd/SHAPES.git
IX. ACKNOWLEDGEMENTS

[1] C. de Boor, A Practical Guide to Splines (Applied Mathematical Sciences), Springer, 2001.
[2] E. J. Wegman, I. W. Wright, Splines in statistics, Journal of the American Statistical Association 78 (382) (1983) 351–365.
[3] G. Wahba, Spline models for observational data, Vol. 59, SIAM, 1990.
[4] W. Härdle, Applied nonparametric regression, no. 19, Cambridge University Press, 1990.
[5] S. Wold, Spline functions in data analysis, Technometrics 16 (1) (1974) 1–11.
[6] H. G. Burchard, Splines (with optimal knots) are better, Applicable Analysis 3 (4) (1974) 309–319.
[7] D. L. Jupp, Approximation to data by splines with free knots, SIAM Journal on Numerical Analysis 15 (2) (1978) 328–343.
[8] Z. Luo, G. Wahba, Hybrid adaptive splines, Journal of the American Statistical Association 92 (437) (1997) 107–116.
[9] P. L. Smith, Curve fitting and modeling with splines using statistical variable selection techniques, Tech. rep., NASA, Langley Research Center, Hampton, VA, Report NASA 166034 (1982).
[10] T. Lyche, K. Mørken, A data-reduction strategy for splines with applications to the approximation of functions and data, IMA Journal of Numerical Analysis 8 (2) (1988) 185–208.
[11] J. H. Friedman, B. W. Silverman, Flexible parsimonious smoothing and additive modeling, Technometrics 31 (1) (1989) 3–21.
[12] J. H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19 (1) (1991) 1–67.
[13] C. J. Stone, M. H. Hansen, C. Kooperberg, Y. K. Truong, et al., Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture, The Annals of Statistics 25 (4) (1997) 1371–1470.
[14] G. H. Golub, M. Heath, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (2) (1979) 215–223.
[15] S. Zhou, X. Shen, Spatially adaptive regression splines and accurate knot selection schemes, Journal of the American Statistical Association 96 (453) (2001) 247–259.
[16] H. Park, J.-H. Lee, B-spline curve fitting based on adaptive curve refinement using dominant points, Computer-Aided Design 39 (6) (2007) 439–451.
[17] H. Kang, F. Chen, Y. Li, J. Deng, Z. Yang, Knot calculation for spline fitting via sparse optimization, Computer-Aided Design 58 (2015) 179–188.
[18] J. Luo, H. Kang, Z. Yang, Knot calculation for spline fitting based on the unimodality property, Computer Aided Geometric Design 73 (2019) 54–69.
[19] M. Mitchell, An Introduction to Genetic Algorithms, Bradford, 1998.
[20] E. Ülker, A. Arslan, Automatic knot adjustment using an artificial immune system for B-spline curve approximation, Information Sciences 179 (10) (2009) 1483–1494.
[21] P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 (4) (1995) 711–732.
[22] J. Pittman, Adaptive splines and genetic algorithms, Journal of Computational and Graphical Statistics 11 (3) (2002) 615–638.
[23] I. DiMatteo, C. R. Genovese, R. E. Kass, Bayesian curve-fitting with free-knot splines, Biometrika 88 (4) (2001) 1055–1071.
[24] F. Yoshimoto, T. Harada, Y. Yoshimoto, Data fitting with a spline using a real-coded genetic algorithm, Computer-Aided Design 35 (8) (2003) 751–760.
[25] S. Miyata, X. Shen, Adaptive free-knot splines, Journal of Computational and Graphical Statistics 12 (1) (2003) 197–213.
[26] A. Gálvez, A. Iglesias, Efficient particle swarm optimization approach for data fitting with free knot B-splines, Computer-Aided Design 43 (12) (2011) 1683–1692.
[27] S. D. Mohanty, Particle swarm optimization and regression analysis I, Astronomical Review 7 (2) (2012) 29–35.
[28] J. Kennedy, R. C. Eberhart, Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks: Perth, WA, Australia, Vol. 4, IEEE, 1995, p. 1942.
[29] S. D. Mohanty, Spline based search method for unmodeled transient gravitational wave chirps, Physical Review D 96 (2017) 102008. doi:10.1103/PhysRevD.96.102008.
[30] S. D. Mohanty, Swarm Intelligence Methods for Statistical Regression, Chapman and Hall/CRC, 2018.
[31] D. Ruppert, M. P. Wand, R. J. Carroll, Semiparametric regression, Vol. 12, Cambridge University Press, 2003.
[32] C. H. Reinsch, Smoothing by spline functions, Numerische Mathematik 10 (3) (1967) 177–183.
[33] P. Craven, G. Wahba, Smoothing noisy data with spline functions, Numerische Mathematik 31 (4) (1978) 377–403.
[34] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2019).
[35] G. Wahba, Discussion of: Wavelet shrinkage: Asymptopia? (with discussion and a reply by Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D.), Journal of the Royal Statistical Society Series B 57 (2002) 545–564.
[36] C. B. Storlie, H. D. Bondell, B. J. Reich, A locally adaptive penalty for estimation of functions with varying roughness, Journal of Computational and Graphical Statistics 19 (3) (2010) 569–589.
[37] Z. Liu, W. Guo, Data driven adaptive spline smoothing, Statistica Sinica (2010) 1143–1163.
[38] X. Wang, P. Du, J. Shen, Smoothing splines with varying smoothing parameter, Biometrika 100 (4) (2013) 955–970.
[39] H. B. Curry, I. J. Schoenberg, On spline distributions and their limits: the Pólya distribution functions, Bulletin of the American Mathematical Society 53 (11) (1947) 1114.
[40] M. P. Wand, A comparison of regression spline smoothing procedures, Computational Statistics 15 (4) (2000) 443–462.
[41] G. Claeskens, T. Krivobokova, J. D. Opsomer, Asymptotic properties of penalized spline estimators, Biometrika 96 (3) (2009) 529–544.
[42] P. H. Eilers, B. D. Marx, Flexible smoothing with B-splines and penalties, Statistical Science (1996) 89–102.
[43] P. H. Eilers, B. D. Marx, M. Durbán, Twenty years of P-splines, SORT: Statistics and Operations Research Transactions 39 (2) (2015) 149–186.
[44] D. Ruppert, R. J. Carroll, Theory & methods: Spatially-adaptive penalties for spline fitting, Australian & New Zealand Journal of Statistics 42 (2) (2000) 205–223.
[45] T. Krivobokova, C. M. Crainiceanu, G. Kauermann, Fast adaptive penalized splines, Journal of Computational and Graphical Statistics 17 (1) (2008) 1–20.
[46] F. Scheipl, T. Kneib, Locally adaptive Bayesian P-splines with a normal-exponential-gamma prior, Computational Statistics & Data Analysis 53 (10) (2009) 3533–3552.
[47] L. Yang, Y. Hong, Adaptive penalized splines for data smoothing, Computational Statistics & Data Analysis 108 (2017) 70–83.
[48] T. Krivobokova, Smoothing parameter selection in two frameworks for penalized splines, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75 (4) (2013) 725–741.
[49] D. Ruppert, Selecting the number of knots for penalized splines, Journal of Computational and Graphical Statistics 11 (4) (2002) 735–757.
[50] C. de Boor, On calculating with B-splines, Journal of Approximation Theory 6 (1) (1972) 50–62.
[51] J. Ramsay, B. W. Silverman, Functional data analysis, Springer Series in Statistics, Springer-Verlag New York, 1997.
[52] M. J. Lindstrom, Penalized estimation of free-knot splines, Journal of Computational and Graphical Statistics 8 (2) (1999) 333–352.
[53] V. Goepp, O. Bouaziz, G. Nuel, Spline regression with automatic knot selection, arXiv preprint arXiv:1808.01770.
[54] H. Akaike, Information theory and an extension of the maximum likelihood principle, in: Selected Papers of Hirotugu Akaike, Springer, 1998, pp. 199–213.
[55] A. P. Engelbrecht, Fundamentals of computational swarm intelligence, Vol. 1, Wiley, Chichester, 2005.
[56] J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance, in: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), Vol. 3, IEEE, 1999, pp. 1931–1938.
[57] M. E. Normandin, S. D. Mohanty, T. S. Weerathunga, Particle swarm optimization based search for gravitational waves from compact binary coalescences: Performance improvements, Physical Review D 98 (4) (2018) 044029.
[58] F. J. Solis, R. J.-B. Wets, Minimization by random search techniques, Mathematics of Operations Research 6 (1) (1981) 19–30.
[59] D. Bratton, J. Kennedy, Defining a standard for particle swarm optimization, in: Swarm Intelligence Symposium, 2007. SIS 2007, IEEE, 2007, pp. 120–127.
[60] C. Leung, Estimation of unmodeled gravitational wave transients with spline regression and particle swarm optimization, SIAM Undergraduate Research Online (SIURO) 8.
[61] D. Denison, B. Mallick, A. Smith, Automatic Bayesian curve fitting, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 (2) (1998) 333–350.
[62] M. Li, Y. Yan, Bayesian adaptive penalized splines, Journal of Academy of Business and Economics 2 (2006) 129–141.
[63] T. Lee, On algorithms for ordinary least squares regression spline fitting: a comparative study, Journal of Statistical Computation and Simulation 72 (8) (2002) 647–663.
[64] Matlab Release 2018b, The MathWorks, Inc., Natick, Massachusetts, United States.
[65] D. L. Donoho, I. M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association 90 (432) (1995) 1200–1224.
[66] X. Huo, M. Duncan, O. Levi, J. Buckheit, S. Chen, D. Donoho, I. Johnstone, About WaveLab.