A doubly relaxed minimal-norm Gauss-Newton method for underdetermined nonlinear least-squares problems
FEDERICA PES AND GIUSEPPE RODRIGUEZ

Department of Mathematics and Computer Science, via Ospedale 72, 09124 Cagliari, Italy ([email protected], [email protected])

Abstract.
When a physical system is modeled by a nonlinear function, the unknown parameters can be estimated by fitting experimental observations by a least-squares approach. Newton's method and its variants are often used to solve problems of this type. In this paper, we are concerned with the computation of the minimal-norm solution to an underdetermined nonlinear least-squares problem. We present a Gauss–Newton type method, which relies on two relaxation parameters to ensure convergence, and which incorporates a procedure to dynamically estimate the two parameters during the iteration, as well as the rank of the Jacobian matrix. Numerical results are presented.
Key words. nonlinear least-squares problem, minimal-norm solution, Gauss–Newton method, parameter estimation
AMS subject classifications.
1. Introduction.
Let us assume that F(x) = [F_1(x), ..., F_m(x)]^T is a nonlinear, twice continuously Fréchet-differentiable function with values in R^m, for any x ∈ R^n. For a given b ∈ R^m, we consider the nonlinear least-squares data fitting problem

$$\min_{x \in \mathbb{R}^n} \|r(x)\|, \qquad r(x) = F(x) - b, \qquad (1.1)$$

where ‖·‖ denotes the Euclidean norm and r(x) = [r_1(x), ..., r_m(x)]^T is the residual vector function between the model expectation F(x) and the vector b of measured data. The solution to the nonlinear least-squares problem gives the best model fit to the data in the sense of the minimum sum of squared errors. A common choice for solving a nonlinear least-squares problem consists of applying Newton's method and its variants, such as the Gauss–Newton method [2, 12, 13].

The Gauss–Newton method is based on the construction of a sequence of linear approximations to r(x). Chosen an initial point x^(0) and denoted by x^(k) the current approximation, the new approximation is

$$x^{(k+1)} = x^{(k)} + s^{(k)}, \qquad k = 0, 1, 2, \ldots, \qquad (1.2)$$

where the step s^(k) is computed as a solution to the linear least-squares problem

$$\min_{s \in \mathbb{R}^n} \|J(x^{(k)}) s + r(x^{(k)})\|. \qquad (1.3)$$

Here J(x) represents the Jacobian matrix of the function F(x).

The solution to (1.3) may not be unique: this situation happens when the matrix J(x^(k)) does not have full column rank, in particular when m < n. It is important to keep in mind that during the iteration the rank of J(x^(k)) may vary. To make the solution unique, the new iterate x^(k+1) is often obtained by solving the minimal-norm linear least-squares problem

$$\min_{s \in \mathbb{R}^n} \|s\| \quad \text{s.t.} \quad \min_{s \in \mathbb{R}^n} \|J(x^{(k)}) s + r(x^{(k)})\|. \qquad (1.4)$$

In order to select solutions exhibiting different degrees of regularity, the term ‖s‖ in (1.4) is sometimes substituted by ‖Ls‖, where L ∈ R^{p×n} (p ≤ n) is a matrix which incorporates available a priori information on the solution. Typically, L is a diagonal weighting matrix or a p × n discrete approximation of a derivative operator. The case p > n can easily be reduced to the previous assumption by performing a compact L = QR factorization and substituting L by the triangular matrix R. When a regularization matrix is introduced, problem (1.4) becomes

$$\min_{s \in \mathbb{R}^n} \|Ls\| \quad \text{s.t.} \quad \min_{s \in \mathbb{R}^n} \|J(x^{(k)}) s + r(x^{(k)})\|. \qquad (1.5)$$

Both (1.4) and (1.5) impose some kind of regularity on the update vector s for the solution x^(k), and not on the solution itself. The problem of imposing a regularity constraint directly on the solution x of problem (1.1) is studied in [6, 7, 8, 14]. In [14], this approach is extended to the minimization of a suitable seminorm.

Unfortunately, the algorithms developed in the above papers occasionally fail to converge. Indeed, to ensure the computation of the minimal-norm solution, at the kth iteration the Gauss–Newton approximation is projected orthogonally to the null space of the Jacobian J(x^(k)). This projection sometimes causes the residual to increase and the method to diverge.

This problem is dealt with in [3] by considering a convex combination of the Gauss–Newton approximation and its orthogonal projection. In this paper, we improve the convergence of the method presented in [3] by introducing a second relaxation parameter in the MNGN algorithm described in [14], as well as a procedure to automatically tune it.
Furthermore, we adopt a technique to estimate the rank of the matrix J(x^(k)), and consider a model profile x̄ for the solution.

The paper is structured as follows. In Section 2, we reformulate Theorem 3.1 from [14] by introducing a model profile for the solution and a parameter that represents the step length of the iterative method. Then, we discuss why the convergence of the method may not be ensured. In Section 3, we describe an algorithm which introduces a second parameter to control the size of the correction vector that provides the minimal-norm solution, and which estimates such a parameter automatically. Since the rank of the matrix J(x^(k)) can vary at each iteration, Section 4 explains how it can be estimated. In Section 5, we extend the computation of the minimal-norm solution to the minimal-L-norm solution, where L is a regularization matrix. Numerical examples can be found in Section 6.
2. Nonlinear minimal-norm solution.
We begin by recalling the definition of the singular value decomposition (SVD) of a matrix J [10], which will be needed later. The SVD is a matrix decomposition of the form

$$J = U \Sigma V^T,$$

where U = [u_1, ..., u_m] ∈ R^{m×m} and V = [v_1, ..., v_n] ∈ R^{n×n} are matrices with orthonormal columns. The non-zero elements of the diagonal matrix Σ ∈ R^{m×n} are the singular values σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0, with r = rank(J) ≤ min(m, n).

Let us now discuss the computation of the minimal-norm solution to the nonlinear problem (1.1)

$$\min_{x \in \mathbb{R}^n} \|x - \bar{x}\| \quad \text{s.t.} \quad \min_{x \in \mathbb{R}^n} \|F(x) - b\|, \qquad (2.1)$$

where x̄ ∈ R^n is an a priori estimate of the desired solution. We consider an iterative method of the type (1.2) based on the following first-order linearization of the problem

$$\min_{s \in \mathbb{R}^n} \|x^{(k)} - \bar{x} + \alpha s\| \quad \text{s.t.} \quad \min_{s \in \mathbb{R}^n} \|J_k s + r_k\|, \qquad (2.2)$$

where J_k = J(x^(k)) is the Jacobian of F at x^(k) and r_k = r(x^(k)) is the residual vector. Formulations (2.1) and (2.2) are different from the approach discussed in [14], as we introduce a model profile x̄ and we include the step length α ∈ R in the analysis. The damping parameter α is indispensable to ensure convergence of the iterative method. We estimate it by the Armijo–Goldstein principle [1, 9], but it can be chosen by any procedure which guarantees a reduction in the norm of the residual.

We denote (2.2) as the minimal-norm Gauss–Newton (MNGN) method. The corresponding iteration is defined by the following theorem.

Theorem 2.1.
Let x^(k) ∈ R^n and let x̃^(k+1) = x^(k) + α s̃^(k) be the Gauss–Newton iteration for (1.1), where the step s̃^(k) is determined by solving (1.4) and the step length α by any strategy which ensures convergence. Then, the iteration x^(k+1) = x^(k) + α s^(k) defined by (2.2) is given by

$$x^{(k+1)} = \tilde{x}^{(k+1)} - V_2 V_2^T \left( x^{(k)} - \bar{x} \right), \qquad (2.3)$$

where rank(J_k) = r and the columns of the matrix V_2 = [v_{r+1}, ..., v_n] are orthonormal vectors in R^n spanning the null space of J_k.

Proof. The proof follows the pattern of that of Theorem 3.1 in [14]. Let UΣV^T be the singular value decomposition of the matrix J_k. The norm of the difference between x^(k+1) and the model profile can be expressed as

$$\|x^{(k+1)} - \bar{x}\| = \|V^T (x^{(k)} - \bar{x} + \alpha s)\| = \|\alpha y + z^{(k)}\|,$$

with y = V^T s and z^(k) = V^T (x^(k) − x̄). Replacing J_k by its SVD and setting b^(k) = U^T r_k, we can rewrite (2.2) as the following diagonal constrained least-squares problem

$$\min_{y \in \mathbb{R}^n} \|\alpha y + z^{(k)}\| \quad \text{s.t.} \quad \min_{y \in \mathbb{R}^n} \|\Sigma y + b^{(k)}\|.$$

Solving the second minimization problem uniquely determines the components y_i = −σ_i^{−1} b_i^(k), i = 1, ..., r, while the entries y_i, i = r+1, ..., n, are left undetermined. Their value can be found by solving the first problem. From

$$\|\alpha y + z^{(k)}\|^2 = \sum_{i=1}^{r} \left( -\alpha \frac{b_i^{(k)}}{\sigma_i} + z_i^{(k)} \right)^2 + \sum_{i=r+1}^{n} \left( \alpha y_i + z_i^{(k)} \right)^2,$$

we obtain y_i = −z_i^(k)/α = −(1/α) v_i^T (x^(k) − x̄), i = r+1, ..., n. Then, the solution to (2.2), that is, the next approximation to the solution of (2.1), is

$$x^{(k+1)} = x^{(k)} + \alpha V y = x^{(k)} - \alpha \sum_{i=1}^{r} \frac{b_i^{(k)}}{\sigma_i} v_i - \sum_{i=r+1}^{n} \left( v_i^T (x^{(k)} - \bar{x}) \right) v_i,$$

where the last summation can be written in matrix form as V_2 V_2^T (x^(k) − x̄), and the columns of V_2 = [v_{r+1}, ..., v_n] are a basis for N(J_k).

It is immediate (see [14, Theorem 3.1]) to prove that

$$\tilde{x}^{(k+1)} = x^{(k)} + \alpha \tilde{s}^{(k)} = x^{(k)} - \alpha \sum_{i=1}^{r} \frac{b_i^{(k)}}{\sigma_i} v_i,$$

from which (2.3) follows. ∎

Summarizing, the MNGN method consists of the iteration x^(k+1) = x^(k) + α_k s^(k), where the step is s^(k) = s̃^(k) − (1/α_k) t^(k), with

$$\tilde{s}^{(k)} = -\sum_{i=1}^{r} \frac{b_i^{(k)}}{\sigma_i} v_i, \qquad t^{(k)} = V_2 V_2^T \left( x^{(k)} - \bar{x} \right). \qquad (2.4)$$

Since P_{N(J_k)} = V_2 V_2^T is the orthogonal projector onto N(J_k), the above theorem shows that x^(k+1) − x̄, the (k+1)th iterate of the MNGN method shifted by the model profile, is orthogonal to the null space of J_k.

We determine the step length α_k by the Armijo condition [1, 5]

$$f(x^{(k)} + \alpha_k \tilde{s}^{(k)}) \le f(x^{(k)}) + \mu \alpha_k \nabla f(x^{(k)})^T \tilde{s}^{(k)},$$

where μ is a constant in (0, 1). Since f(x) = ½‖r(x)‖² and ∇f(x) = J(x)^T r(x), it reads

$$\|r(x^{(k)} + \alpha_k \tilde{s}^{(k)})\|^2 \le \|r_k\|^2 + 2 \mu \alpha_k r_k^T J_k \tilde{s}^{(k)}.$$

Note that r_k^T J_k s̃^(k) = −‖J_k s̃^(k)‖², since s̃^(k) solves (1.3). The Armijo–Goldstein principle [2, 9] sets μ = ½ and determines the scalar α_k as the largest number in the sequence 2^{−i}, i = 0, 1, 2, ..., for which it holds

$$\|r_k\|^2 - \|r(x^{(k)} + \alpha_k \tilde{s}^{(k)})\|^2 \ge \alpha_k \|J_k \tilde{s}^{(k)}\|^2. \qquad (2.5)$$

Theorem 2.1 shows that the correction vector t^(k), which allows us to compute the normal solution at each step, is independent of the damping parameter α. The result is that in some numerical examples the method fails to converge, because projecting the solution orthogonally to the null space of J_k causes the residual to increase.
To understand how this can happen, a second-order analysis of the objective function is required.

The second-order Taylor approximation to the function f(x) = ½‖r(x)‖² at x^(k+1) = x^(k) + α s is

$$f(x^{(k+1)}) \simeq f(x^{(k)}) + \alpha \nabla f(x^{(k)})^T s + \frac{1}{2} \alpha^2 s^T \nabla^2 f(x^{(k)}) s. \qquad (2.6)$$

The gradient and the Hessian of f(x), written in matrix form, are given by

$$\nabla f(x) = J(x)^T r(x), \qquad \nabla^2 f(x) = J(x)^T J(x) + Q(x), \qquad Q(x) = \sum_{i=1}^{m} r_i(x) \nabla^2 r_i(x),$$

where ∇²r_i(x) is the Hessian matrix of r_i(x). By replacing these expressions and α s = α s̃ − t in (2.6), where s̃ is the Gauss–Newton step and t is in the null space of J_k, the following approximation is obtained

$$\frac{1}{2}\|r_{k+1}\|^2 \simeq \frac{1}{2}\|r_k\|^2 + \alpha r_k^T J_k s + \frac{1}{2} \alpha^2 s^T \left( J_k^T J_k + Q_k \right) s = \frac{1}{2}\|r_k\|^2 + \alpha r_k^T J_k \tilde{s} + \frac{1}{2} \alpha^2 \tilde{s}^T \left( J_k^T J_k + Q_k \right) \tilde{s} - \alpha t^T Q_k \tilde{s} + \frac{1}{2} t^T Q_k t.$$

The first two terms containing second derivatives (the matrix Q_k) are damped by the α parameter. If the function F is mildly nonlinear, the third term, depending upon the vector t, is negligible. In the presence of a strong nonlinearity, its contribution to the residual is significant and may lead to its growth. This shows that it is indispensable to control the step length for the correction vector t.
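For illustration, the following Matlab sketch assembles one MNGN iteration from (2.3), (2.4), and the step-length rule (2.5). It is a sketch of ours, not the authors' software: the function names, the fixed rank threshold, and the lower bound on α are illustrative assumptions, and the rank estimation of Section 4 should replace the crude threshold used here.

    % One MNGN iteration built from (2.3)-(2.5); illustrative sketch.
    function x_new = mngn_step(Fun, Jac, x, b, xbar)
        r  = Fun(x) - b;                 % residual r_k
        Jk = Jac(x);
        [U, S, V] = svd(Jk);             % J_k = U*Sigma*V'
        s  = diag(S);
        rk = sum(s > 1e-8);              % crude numerical rank (see Section 4)
        bk = U'*r;                       % b^(k) = U'*r_k
        stilde = -V(:, 1:rk) * (bk(1:rk) ./ s(1:rk));   % Gauss-Newton step
        alpha = 1;                       % Armijo-Goldstein principle (2.5)
        nJs = norm(Jk*stilde)^2;
        while norm(r)^2 - norm(Fun(x + alpha*stilde) - b)^2 < alpha*nJs ...
                && alpha > 1e-12
            alpha = alpha/2;             % largest alpha = 2^(-i) satisfying (2.5)
        end
        V2 = V(:, rk+1:end);             % orthonormal basis of N(J_k)
        t  = V2 * (V2' * (x - xbar));    % correction t^(k) of (2.4)
        x_new = x + alpha*stilde - t;    % MNGN iterate (2.3)
    end

As discussed above, the correction t is applied undamped here, which is exactly what may cause the residual to grow for strongly nonlinear F.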
3. Choosing the projection step length.
The occasional nonconvergence in the computation of the minimal-norm solution to a nonlinear least-squares problem was discussed in [3], where the authors propose an iterative method based on a convex combination of the Gauss–Newton and the minimal-norm Gauss–Newton iterates. Following our notation, it can be expressed in the form

$$x^{(k+1)} = (1 - \gamma_k) \left[ x^{(k)} + \tilde{s}^{(k)} \right] + \gamma_k \left[ x^{(k)} + \tilde{s}^{(k)} - V_2 V_2^T x^{(k)} \right], \qquad (3.1)$$

where the parameters γ_k ∈ [0, 1], k = 0, 1, ..., form a sequence converging to zero. The standard Gauss–Newton method is obtained by setting γ_k = 0, while γ_k = 1 leads to the minimal-norm Gauss–Newton method. In their numerical examples, the authors adopt two geometrically decaying sequences, of the form γ_k = q_1^{k+1} and γ_k = q_2^k, with fixed q_1, q_2 ∈ (0, 1).

It is immediate to rewrite (3.1) in the form

$$x^{(k+1)} = x^{(k)} + \tilde{s}^{(k)} - \gamma_k V_2 V_2^T x^{(k)},$$

showing that the method proposed in [3] is equivalent to the application of the undamped Gauss–Newton method, whose convergence is not theoretically guaranteed [2], with a damped correction to favour the decrease of the norm of the solution. The numerical experiments reported in the paper show that the minimization of the residual is sped up if γ_k quickly converges to zero, while the norm of the solution decreases faster if γ_k has a slower decay. The choice of the sequence of parameters appears to be critical to tune the performance of the algorithm, and no adaptive choice for γ_k is proposed.

In this paper, we propose to introduce a second relaxation parameter, β_k, to control the step length of the minimal-norm correction t^(k) defined in (2.4). The new iterative method is denoted by MNGN2 and it takes the form

$$x^{(k+1)} = x^{(k)} + \alpha_k \tilde{s}^{(k)} - \beta_k t^{(k)}, \qquad (3.2)$$

where s̃^(k) is the step vector produced by the Gauss–Newton method and t^(k) is the projection vector which makes the norm of x^(k+1) minimal, without changing the value of the linearized residual.

Our numerical tests showed that it is important to choose both α_k and β_k adaptively during the iteration. A simple solution is to let β_k = α_k and apply the Armijo–Goldstein principle (2.5) with s^(k) = s̃^(k) − t^(k) in place of s̃^(k) to estimate α_k. This approach proves to be effective in the computation of the minimal-norm solution, but its convergence is often rather slow. To speed up the iteration, we propose a procedure to adaptively choose the value of β_k.

Algorithm 1
Outline of the MNGN2 method.
Require: nonlinear function F, data vector b
Require: initial solution x^(0), model profile x̄, tolerance η for residual increase
Ensure: approximation x^(k+1) of the minimal-norm least-squares solution
 1: k = 0, β = 1
 2: repeat
 3:   k = k + 1
 4:   compute s̃^(k) by the Gauss–Newton method (1.3)
 5:   compute α_k by the Armijo–Goldstein principle (2.5)
 6:   compute t^(k) by (2.4)
 7:   if β < 1 then
 8:     β = 2β
 9:   end if
10:   x̃^(k+1) = x^(k) + α_k s̃^(k)
11:   ρ̃_{k+1} = ‖F(x̃^(k+1)) − b‖ + ε_M
12:   x^(k+1) = x̃^(k+1) − β t^(k)
13:   ρ_{k+1} = ‖F(x^(k+1)) − b‖
14:   while (ρ_{k+1} > ρ̃_{k+1} + δ(ρ̃_{k+1}, η)) and (β > ε_β) do
15:     β = β/2
16:     x^(k+1) = x̃^(k+1) − β t^(k)
17:     ρ_{k+1} = ‖F(x^(k+1)) − b‖
18:   end while
19:   β_k = β
20: until convergence

This procedure is outlined in Algorithm 1. Initially, we set β = 1. At each iteration, we compute the residual at the Gauss–Newton iterate x̃^(k+1) and at the tentative iterate x^(k+1) = x̃^(k+1) − β t^(k). At line 11 of the algorithm we add the machine epsilon ε_M to the actual residual, to deal with the case where it is zero. Subtracting the vector β t^(k) may cause the residual to increase. We accept such an increase if

$$\|r(x^{(k+1)})\| \le \|r(\tilde{x}^{(k+1)})\| + \delta\left( \|r(\tilde{x}^{(k+1)})\|, \eta \right), \qquad (3.3)$$

where δ(ρ, η) is a function determining the maximal increase allowed in the residual ρ, and η > 0 is a tolerance. If (3.3) is not satisfied, β is halved and the residual is recomputed, until (3.3) is verified or β becomes excessively small, that is, smaller than a threshold ε_β. To allow β to increase, we tentatively double it at each iteration (see line 8 in the algorithm) before applying the above procedure.

Our experiments showed that a reasonable value for the residual increase is δ(ρ, η) = ηρ, with η = 8. Anyway, we noticed that in cases where the residual stagnates, accepting a large increase in the residual may lead to nonconvergence. Another issue is that we are willing to accept a relatively large increase in a small residual, but a small increase in a large one, so that a fixed multiple of the residual is not well suited to model the increase.

To overcome these difficulties, we consider δ(ρ, η) = ρ/η, and choose η at each step by the adaptive procedure described in Algorithm 2. When at least k_res iterations have been performed, we compute the linear polynomial which fits the logarithm of the last k_res residuals in the least-squares sense. To detect if the residual stagnates or increases, we check if the slope M of the regression line exceeds a negative threshold M_min close to zero. If this happens, the value of η is doubled. The effect on the algorithm is to enhance the decrease of the residual and reduce the diminution of the norm. To recover a sensible decrease in the norm, if at a subsequent step the residual reduction accelerates (i.e., M falls below a second threshold M_max < M_min), η is halved. In our experiments, we initialize η to 1/8, so that the initial allowed increase matches the fixed choice δ = 8ρ discussed above, and set k_res = 5.

Algorithm 2
Adaptively determine the residual increase δ(ρ, η).

Require: actual residual ρ, starting tolerance η, iteration index k
Require: residuals θ_j = ρ_{k−k_res+j}, j = 1, ..., k_res
Require: slope thresholds M_max < M_min < 0
Ensure: residual increase δ(ρ, η)
 1: if k ≥ k_res then
 2:   compute the regression line p(t) = Mt + N of the points (j, log θ_j), j = 1, ..., k_res
 3:   if M > M_min then
 4:     η = 2η
 5:   else if M < M_max then
 6:     η = η/2
 7:   end if
 8: end if
 9: δ(ρ, η) = ρ/η

To detect convergence, we interrupt the iteration as soon as

$$\|x^{(k+1)} - x^{(k)}\| < \tau \|x^{(k+1)}\| \qquad \text{or} \qquad \|\alpha_k \tilde{s}^{(k)}\| < \tau, \qquad (3.4)$$

or when a fixed maximum number of iterations N_max is exceeded. The second stop condition in (3.4) detects the slow progress of the relaxed Gauss–Newton iteration, which often happens close to the solution. The stop tolerance is set to a small value τ, which is also used in the rank estimation procedure of Section 4.
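A compact Matlab realization of Algorithm 2 may look as follows. It is a sketch under our own conventions: the paper leaves the numerical values of M_min and M_max to the user, and the natural logarithm is used where any fixed base would do.

    % Sketch of Algorithm 2: adapt eta from the decay of the last kres residuals.
    % rho: current residual; theta: vector of the last kres residual norms;
    % k: iteration index; Mmax < Mmin < 0 are user-chosen thresholds.
    function [delta, eta] = residual_increase(rho, theta, eta, k, kres, Mmin, Mmax)
        if k >= kres
            p = polyfit(1:kres, log(theta(:)'), 1);  % regression line of the
            M = p(1);                                % log-residuals: p(t) = M*t + N
            if M > Mmin               % residual stagnates or increases:
                eta = 2*eta;          % halve the allowed increase rho/eta
            elseif M < Mmax           % residual decreases fast:
                eta = eta/2;          % favour again the decrease of the norm
            end
        end
        delta = rho/eta;              % maximal accepted increase, used in (3.3)
    end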
4. Estimating the rank of the Jacobian.
In order to apply Theorem 2.1 to compute the minimal-norm solution by (2.3), the rank r of the Jacobian matrix J_k = J(x^(k)) should be known in advance. As the rank may vary during the iteration, here we set r_k = rank(J_k). The knowledge of r_k for each k = 0, 1, ..., is not generally available, making it necessary to estimate its value at each iteration step, to avoid nonconvergence or a breakdown of the algorithm.

In such situations, it is common to consider the numerical rank r_{ε,k} of J_k, sometimes denoted as ε-rank, where ε represents a chosen tolerance. The numerical rank is defined in terms of the singular values σ_i^(k) of J_k, as the integer r_{ε,k} such that

$$\sigma^{(k)}_{r_{\epsilon,k}} > \epsilon \ge \sigma^{(k)}_{r_{\epsilon,k}+1}.$$

Theorem 2.1 can be adapted to this setting, by simply replacing at each iteration the rank r with the numerical rank r_{ε,k}.

Determining the numerical rank is a difficult task for discrete ill-posed problems, as the singular values decay monotonically to zero. In this case, the numerical rank plays the role of a regularization parameter and is estimated by suitable methods, which often require information about the noise level and type; see, e.g., [11, 15]. When the problem is locally rank-deficient, meaning that the rank of J(x) depends on the evaluation vector x, the numerical rank r_{ε,k} can be determined, in principle, by choosing a suitable value of ε. Numerical experiments show that a fixed value of ε does not always lead to a correct estimation of r_{ε,k}, and that it is preferable to determine the ε-rank by searching for a sensible gap between σ^(k)_{r_{ε,k}} and σ^(k)_{r_{ε,k}+1}.

To locate such a gap, we adopt a heuristic approach already applied in [4] for the same purpose, in a different setting. At each step, we compute the ratios

$$\rho^{(k)}_i = \frac{\sigma^{(k)}_i}{\sigma^{(k)}_{i+1}}, \qquad i = 1, 2, \ldots, r_k - 1,$$

where either r_k is the (exact) rank of J_k (i.e., such that σ_{r_k+1} = 0), or r_k = min(m, n). Then, we consider the index set

$$\mathcal{I}_k = \left\{ i \in \{1, 2, \ldots, r_k - 1\} : \rho^{(k)}_i > R \ \text{and} \ \sigma^{(k)}_i > \tau \right\}.$$

In our numerical simulations, R = 10 and τ is the same tolerance used in (3.4). An index i belongs to I_k if there is a significant gap between σ^(k)_i and σ^(k)_{i+1}, and σ^(k)_i is numerically nonzero. If the set I_k is empty, we set r_{ε,k} = r_k. Otherwise, we consider

$$\rho^{(k)}_j = \max_{i \in \mathcal{I}_k} \rho^{(k)}_i,$$

and we define r_{ε,k} = j. This amounts to selecting the largest gap between "large" and "small" singular values.
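In Matlab, the gap heuristic reduces to a few vectorized operations on the singular values. The following sketch is ours; sigma is assumed sorted in nonincreasing order, e.g., sigma = svd(Jk).

    % Gap heuristic for the epsilon-rank; R and tau as in the text.
    function rk_eps = numerical_rank(sigma, R, tau)
        rk = numel(sigma);                        % r_k = min(m,n) ...
        last = find(sigma > 0, 1, 'last');        % ... or the exact rank of J_k
        if ~isempty(last), rk = last; end
        ratios = sigma(1:rk-1) ./ sigma(2:rk);    % rho_i = sigma_i/sigma_{i+1}
        I = find(ratios > R & sigma(1:rk-1) > tau);   % index set I_k
        if isempty(I)
            rk_eps = rk;                          % no significant gap detected
        else
            [~, j] = max(ratios(I));              % largest gap between "large"
            rk_eps = I(j);                        % and "small" singular values
        end
    end

For instance, rk_eps = numerical_rank(svd(Jk), 10, tau) reproduces the choice R = 10 adopted in our simulations.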
5. Nonlinear minimal-L-norm solution.

The introduction of a regularization matrix L ∈ R^{p×n}, p ≤ n, in least-squares problems was originally connected to the numerical treatment of linear discrete ill-posed problems, and in particular to Tikhonov regularization. The use of a regularization matrix is also justified in underdetermined least-squares problems, as in (1.5), in order to select a solution with particular features, such as smoothness or sparsity, among the infinitely many possible solutions.

While in (1.5) the seminorm ‖Ls‖ is minimized over all the update vectors s which minimize the linearized residual, here we seek to compute the minimal-L-norm solution to the nonlinear problem (1.1), that is, the vector x which solves the constrained problem

$$\min_{x \in \mathbb{R}^n} \|L(x - \bar{x})\| \quad \text{s.t.} \quad \min_{x \in \mathbb{R}^n} \|F(x) - b\|. \qquad (5.1)$$

Similarly to Section 2, we consider an iterative method of the type (1.2), where the step s^(k) is the solution of the linearized problem

$$\min_{s \in \mathbb{R}^n} \|L(x^{(k)} - \bar{x} + \alpha s)\| \quad \text{s.t.} \quad \min_{s \in \mathbb{R}^n} \|J_k s + r_k\|. \qquad (5.2)$$

We will denote this as the minimal-L-norm Gauss–Newton (MLNGN) method.

We recall the definition of the generalized singular value decomposition (GSVD) of a matrix pair (J, L). Let J ∈ R^{m×n} and L ∈ R^{p×n} be matrices with rank(J) = r and rank(L) = p. Assume that m + p ≥ n and

$$\operatorname{rank}\left( \begin{bmatrix} J \\ L \end{bmatrix} \right) = n,$$

which corresponds to requiring that N(J) ∩ N(L) = {0}. The GSVD of the matrix pair (J, L) is defined as the factorization

$$J = U \Sigma_J W^{-1}, \qquad L = V \Sigma_L W^{-1},$$

where U ∈ R^{m×m} and V ∈ R^{p×p} are matrices with orthonormal columns u_i and v_i, respectively, and W ∈ R^{n×n} is nonsingular. If m ≥ n ≥ r, the matrices Σ_J ∈ R^{m×n} and Σ_L ∈ R^{p×n} have the form

$$\Sigma_J = \begin{bmatrix} O_{n-r} & & \\ & C & \\ & & I_d \\ & O_{(m-n)\times n} & \end{bmatrix}, \qquad \Sigma_L = \begin{bmatrix} I_{p-r+d} & O & O \\ O & S & O \end{bmatrix},$$

where d = n − p, the last d columns of Σ_L form the zero block O_{p×d}, and

$$C = \operatorname{diag}(c_1, \ldots, c_{r-d}), \qquad 0 < c_1 \le c_2 \le \cdots \le c_{r-d} < 1,$$
$$S = \operatorname{diag}(s_1, \ldots, s_{r-d}), \qquad 1 > s_1 \ge s_2 \ge \cdots \ge s_{r-d} > 0, \qquad (5.3)$$

with c_i² + s_i² = 1, for i = 1, ..., r − d. The identity matrix of size k is denoted by I_k, while O_k and O_{k×ℓ} are zero matrices of size k and k × ℓ, respectively; a matrix block has to be omitted when one of its dimensions is zero. The scalars γ_i = c_i/s_i are called generalized singular values, and they appear in nondecreasing order.

If r ≤ m < n, the matrices Σ_J ∈ R^{m×n} and Σ_L ∈ R^{p×n} take the form

$$\Sigma_J = \begin{bmatrix} & O_{m-r} & & \\ O_{m\times(n-m)} & & C & \\ & & & I_d \end{bmatrix}, \qquad \Sigma_L = \begin{bmatrix} I_{p-r+d} & O & O \\ O & S & O \end{bmatrix},$$

where the blocks are defined as above.

Let J_k = U Σ_J W^{-1}, L = V Σ_L W^{-1} be the GSVD of the matrix pair (J_k, L). We indicate by w_i the column vectors of the matrix W, and by ŵ_j the rows of W^{-1}, that is,

$$W = [w_1, \ldots, w_n], \qquad W^{-1} = \begin{bmatrix} \hat{w}_1 \\ \vdots \\ \hat{w}_n \end{bmatrix}.$$

Let us observe that N(J_k) = span(w_1, ..., w_{n−r}), if r = rank(J_k); see [14] for a proof.

Theorem 5.1.
Let x^(k) ∈ R^n and let x̃^(k+1) = x^(k) + α s̃^(k) be the Gauss–Newton iteration for (1.1), where the step s̃^(k) is determined by solving (1.5) and the step length α is chosen by any strategy which ensures convergence. Then, the iteration x^(k+1) = x^(k) + α s^(k) for (5.2) is given by

$$x^{(k+1)} = \tilde{x}^{(k+1)} - \overline{W} \widehat{W} \left( x^{(k)} - \bar{x} \right), \qquad (5.4)$$

where Ŵ ∈ R^{(n−r)×n} contains the first n − r rows of W^{-1}, and W̄ ∈ R^{n×(n−r)} is composed of the first n − r columns of W.

Proof. The proof proceeds analogously to that of Theorem 4.2 in [14]. Replacing J_k and L with their GSVD and setting y = W^{-1} s, z^(k) = W^{-1}(x^(k) − x̄), and b^(k) = U^T r_k, (5.2) can be rewritten as the following diagonal least-squares problem

$$\min_{y \in \mathbb{R}^n} \|\Sigma_L (\alpha y + z^{(k)})\| \quad \text{s.t.} \quad \min_{y \in \mathbb{R}^n} \|\Sigma_J y + b^{(k)}\|.$$

When m ≥ n, the diagonal linear system in the constraint is solved by a vector y with entries

$$y_i = \begin{cases} -\dfrac{b_i^{(k)}}{c_{i-n+r}}, & i = n-r+1, \ldots, p, \\[2mm] -b_i^{(k)}, & i = p+1, \ldots, n. \end{cases}$$

The components y_i, for i = 1, ..., n − r, can be determined by minimizing the norm

$$\|\Sigma_L (\alpha y + z^{(k)})\|^2 = \sum_{i=1}^{n-r} \left( \alpha y_i + z_i^{(k)} \right)^2 + \sum_{i=n-r+1}^{p} \left( -\alpha \frac{b_i^{(k)}}{\gamma_{i-n+r}} + s_{i-n+r} z_i^{(k)} \right)^2, \qquad (5.5)$$

where γ_i = c_i/s_i are the generalized singular values of the matrix pair (J_k, L). The minimum of (5.5) is reached for y_i = −z_i^(k)/α = −(1/α) ŵ_i (x^(k) − x̄), i = 1, ..., n − r, and the solution to (5.2), that is, the next approximation to the solution of (5.1), is

$$x^{(k+1)} = x^{(k)} + \alpha W y = x^{(k)} - \sum_{i=1}^{n-r} z_i^{(k)} w_i - \alpha \sum_{i=n-r+1}^{p} \frac{b_i^{(k)}}{c_{i-n+r}} w_i - \alpha \sum_{i=p+1}^{n} b_i^{(k)} w_i, \qquad (5.6)$$

where the first summation in the right-hand side can be rewritten as W̄ Ŵ (x^(k) − x̄). Applying the same procedure to (1.5), we obtain

$$\tilde{x}^{(k+1)} = x^{(k)} - \alpha \sum_{i=n-r+1}^{p} \frac{b_i^{(k)}}{c_{i-n+r}} w_i - \alpha \sum_{i=p+1}^{n} b_i^{(k)} w_i,$$

from which (5.4) follows. Since solving (5.2) for m < n leads to a formula similar to (5.6), with b^(k)_{i−n+m} in place of b^(k)_i, the validity of (5.4) is confirmed. ∎

As in the computation of the minimal-norm solution, the iteration based on (5.4) fails to converge without a suitable relaxation parameter β_k for the projection vector t^(k) = W̄ Ŵ (x^(k) − x̄). We adopt an iteration similar to (3.2), choosing β_k by adapting Algorithms 1 and 2 to this setting. It is important to note that P̃_{N(J_k)} = W̄ Ŵ is an oblique projector onto N(J_k).

At the same time, the rank of the Jacobian is estimated at each step by applying the procedure described in Section 4 to the diagonal elements c_j^(k), j = 1, ..., r_k − d, of the GSVD factor Σ_J of J_k; see (5.3). In this case, at each step, we compute the ratios

$$\rho^{(k)}_i = \frac{c^{(k)}_{i+1}}{c^{(k)}_i}, \qquad i = 1, 2, \ldots, r_k - d - 1,$$

where r_k is the (exact) rank of J_k.

Actually, the GSVD routine computes the matrix W^{-1}, but the matrix W is needed for the computation of both the vectors s̃^(k) and t^(k). To reduce the computational load, we compute at each iteration the LU factorization P W^{-1} = LU, and we use it to solve the linear system with two right-hand sides

$$W^{-1} \begin{bmatrix} t^{(k)} & \tilde{s}^{(k)} \end{bmatrix} = \begin{bmatrix} \widehat{W} (x^{(k)} - \bar{x}) & 0_{n-r} \\ 0_r & \tilde{y} \end{bmatrix},$$

where ỹ ∈ R^r contains the last r components of the vector y appearing in (5.6), and 0_k denotes the zero vector of size k.
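The LU-based solve at the end of this section can be sketched in Matlab as follows. The snippet assumes that Winv = W^{-1}, the estimated rank r, and the vector ytilde of (5.6) are already available from a GSVD ordered as in this section (Matlab's gsvd uses a different block ordering, so a reordering wrapper is implicitly assumed); the variable names are ours.

    % Solve W^{-1}*[t, stilde] = rhs by one LU factorization of W^{-1},
    % avoiding the explicit inversion of W^{-1}.
    n = numel(x);
    What = Winv(1:n-r, :);                  % first n-r rows of W^{-1}
    rhs  = [What*(x - xbar), zeros(n-r, 1);
            zeros(r, 1),     ytilde];
    [Lf, Uf, P] = lu(Winv);                 % P*W^{-1} = Lf*Uf
    sol = Uf \ (Lf \ (P*rhs));              % two triangular solves per column
    t      = sol(:, 1);                     % oblique projection correction t^(k)
    stilde = sol(:, 2);                     % Gauss-Newton step s~^(k)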
6. Test problems and numerical results.
The MNGN2 method, defined by (3.2), was implemented in the Matlab programming language; the software is available from the authors. To investigate its performance and compare it to the method developed in [3], which will be denoted as CKB, we performed numerical experiments on four test examples, illustrated in the following. The first example was presented in [3]; the other ones were constructed in order to highlight particular difficulties in the computation of the normal solution.

For each example, we repeated the computation 100 times, varying the starting point x^(0) by letting its components be uniformly distributed random numbers in [0, 1]; the model profile x̄ was set to the zero vector.

We consider a numerical test a "success" if the algorithm converges, within the number of iterations allowed, according to the stop condition (3.4), adopting the stop tolerance τ of Section 3 and N_max = 500. We measure over all the tests the average of both the number of iterations performed and the norm of the converged solution. We also report the number of "successes".

In the following, the MNGN2 algorithm (3.2) will be denoted by different names, according to the particular implementation. In the method denoted by MNGN2_α, we let α_k = β_k and determine α_k by the Armijo–Goldstein principle. Algorithm 1 is denoted by MNGN2_αβ when δ(ρ, η) = ηρ, with η = 8. The same algorithm with δ(ρ, η) = ρ/η, and η estimated by Algorithm 2, is labeled as MNGN2_αβδ. The algorithm (3.1) developed in [3] will be denoted by CKB_1 when γ_k = q_1^{k+1}, and by CKB_2 when γ_k = q_2^k.

In all the methods, the rank of the Jacobian is estimated at each step by the procedure described in Section 4.

Example 1.
The first example we consider has been introduced in [3]. Let F: R³ → R be the nonlinear function defined by

$$F(x) = x_3 - (x_1 - 1)^2 - (x_2 - 2)^2 - 3.$$

The equation F(x) = 0 represents an elliptic paraboloid in R³ with vertex V = (1, 2, 3)^T.

Table 6.1. Results for Example 1.

    method       iterations   ‖x̃‖      successes
    MNGN2_α          328      3.6816      100
    MNGN2_αβ         155      3.9818       24
    MNGN2_αβδ         31      3.6842      100
    CKB_1             26      3.7323      100

All the methods are generally successful, with the exception of MNGN2_αβ, but those developed in this paper produce a solution with a slightly smaller norm than the CKB methods. The MNGN2_αβδ method reaches the same solution as MNGN2_α, but it is about 10 times faster.
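For reference, the test problem can be set up in Matlab in a few lines; the handles below mirror the formulas above, while the solver name mngn2 is hypothetical, standing for the authors' implementation.

    % Example 1 as Matlab handles (a sketch, not the authors' driver).
    F   = @(x) x(3) - (x(1) - 1)^2 - (x(2) - 2)^2 - 3;   % F: R^3 -> R
    Jac = @(x) [-2*(x(1) - 1), -2*(x(2) - 2), 1];        % 1 x 3 Jacobian
    x0   = rand(3, 1);        % components uniformly distributed in [0, 1]
    xbar = zeros(3, 1);       % model profile
    % x = mngn2(F, Jac, x0, xbar);   % hypothetical call to the MNGN2 solver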
Example 2.

Let F: R^n → R^m be the nonlinear function

$$F(x) = [F_1(x), F_2(x), \ldots, F_m(x)]^T, \qquad x \in \mathbb{R}^n, \quad m \le n, \qquad (6.1)$$

defined by

$$F_i(x) = \begin{cases} S(x), & i = 1, \\ x_{i-1} (x_i - c_i), & i = 2, \ldots, m, \end{cases} \qquad (6.2)$$

where

$$S(x) = \sum_{j=1}^{n} \left( \frac{x_j - c_j}{a_j} \right)^2 - 1,$$

so that S(x) = 0 is the equation of an n-ellipsoid with center c = (c_1, ..., c_n)^T and whose semiaxes are the components of the vector a = (a_1, ..., a_n)^T.

The first order partial derivatives of F_i(x) are

$$\frac{\partial F_i}{\partial x_k} = \begin{cases} \dfrac{2}{a_k^2} (x_k - c_k), & i = 1, \ k = 1, \ldots, n, \\ x_i - c_i, & i = 2, \ldots, m, \ k = i - 1, \\ x_{i-1}, & i = k = 2, \ldots, m, \\ 0, & \text{otherwise}. \end{cases}$$

Setting z_j = 2(x_j − c_j)/a_j² and y_j = x_j − c_j, for j = 1, ..., n, the Jacobian matrix of F is

$$J(x) = \begin{bmatrix} z_1 & z_2 & z_3 & \cdots & z_{m-1} & z_m & \cdots & z_n \\ y_2 & x_1 & & & & & & \\ & y_3 & x_2 & & & & & \\ & & \ddots & \ddots & & & & \\ & & & y_m & x_{m-1} & & & \end{bmatrix}. \qquad (6.3)$$

The locus of the solutions is the intersection between the hypersurface defined by S(x) = 0 and, for each i = 2, ..., m, the union of the pair of planes x_{i−1} = 0 and x_i − c_i = 0.

Figure 6.1. Convergence of problem (6.2) for m = 2 and n = 3, with a = (1, 1, 1)^T, c = (2, 0, 0)^T, and a fixed starting point x^(0). The solutions are in the intersection between the sphere and the union of the two planes. The blue dots are the iterations of the MNGN2_αβδ method, and the red ones correspond to the CKB method. The black circle is the minimal-norm solution.

If a = e = (1, ..., 1)^T and c = 2e, the minimal-norm solution is

$$x^{\dagger} = \big( \xi_{n,m}, \underbrace{2, \ldots, 2}_{m-1}, \underbrace{\xi_{n,m}, \ldots, \xi_{n,m}}_{n-m} \big)^T, \qquad \text{with} \quad \xi_{n,m} = 2 - (n - m + 1)^{-1/2},$$

while if c = (2, 0, ..., 0)^T it is x† = (1, 0, ..., 0)^T. It is immediate to observe that in the last situation the Jacobian (6.3) is rank-deficient at x†. This case is illustrated in Figure 6.1, where the iterations of the MNGN2_αβδ and the CKB methods are reported too. The iterations performed are 20 and 24, respectively; the computed solutions are substantially coincident.

Table 6.2 displays the results obtained for m = 8, n = 10, and the same parameter vectors as above. In this case, we applied the algorithms both to the solution of the minimal-norm problem and to the computation of the minimal-L-norm solution with L = D_1 and L = D_2, i.e., the discrete approximations of the first and the second derivatives.

In the first case, except for MNGN2_α, all the methods are effective, but the MNGN2 algorithms recover a smaller norm than the CKB methods. The situation is similar when L = D_1, but all the methods experience some failures, many failures in the case of MNGN2_α. This method is almost totally unsuccessful in the third case, where there is a substantial advantage in the use of the MNGN2_αβδ algorithm in terms of number of successes and, in comparison to the CKB methods, of the weighted norm minimization.

Table 6.2. Results for Example 2 with m = 8, n = 10, a = e, and c = (2, 0, ..., 0)^T.

    L = I
    method       iterations   ‖Lx̃‖     successes
    MNGN2_α          119      1.0000       50
    MNGN2_αβ          27      1.0606      100
    MNGN2_αβδ         31      1.0587      100
    CKB_1             37      1.4846      100
    CKB_2             20      1.5362      100

    L = D_1
    MNGN2_α          203      1.0000       60
    MNGN2_αβ         119      1.0013       92
    MNGN2_αβδ         80      1.1622       97
    CKB_1             29      1.5538       99
    CKB_2             82      1.7161       89

    L = D_2
    MNGN2_α          208      0.9234        2
    MNGN2_αβ         176      1.0406       63
    MNGN2_αβδ        128      1.2292       91
    CKB_1             36      2.3604       82
    CKB_2             33      2.2344       17

We remark that all methods exhibit a large number of failures if the rank estimation procedure of Section 4 is not applied, as the Jacobian becomes approximately rank-deficient in a neighborhood of the normal solution.
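As an illustration, the residual and Jacobian (6.2)-(6.3) can be coded as follows; this sketch is ours (column vectors x, a, c are assumed), not part of the authors' package.

    % Residual F and Jacobian J of Example 2, see (6.1)-(6.3).
    function [F, J] = example2(x, a, c, m)
        n = numel(x);
        F = zeros(m, 1);
        F(1) = sum(((x - c) ./ a).^2) - 1;          % S(x)
        F(2:m) = x(1:m-1) .* (x(2:m) - c(2:m));     % F_i = x_{i-1}(x_i - c_i)
        z = 2*(x - c) ./ a.^2;                      % entries of the first row
        J = zeros(m, n);
        J(1, :) = z.';
        for i = 2:m
            J(i, i-1) = x(i) - c(i);                % y_i = x_i - c_i
            J(i, i)   = x(i-1);                     % x_{i-1}
        end
    end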
Example 3.
Let F be a nonlinear function such as (6.1), with

$$F_i(x) = S(x) (x_i - c_i), \qquad i = 1, \ldots, m, \qquad (6.4)$$

and S(x) defined as in the previous example. The first order derivatives of F_i(x) are

$$\frac{\partial F_i}{\partial x_k} = \begin{cases} \dfrac{2}{a_i^2} (x_i - c_i)^2 + S(x), & k = i, \\[2mm] \dfrac{2}{a_k^2} (x_k - c_k)(x_i - c_i), & k \ne i. \end{cases}$$

Setting y_i = x_i − c_i, for i = 1, ..., m, and z_j = (x_j − c_j)/a_j², for j = 1, ..., n, the Jacobian matrix can be represented as

$$J(x) = S(x) I_{m \times n} + 2 y z^T,$$

where I_{m×n} includes the first m rows of an identity matrix of size n. The Jacobian turns out to be a diagonal plus rank-1 matrix. This structure may be useful to reduce complexity when solving large scale problems.

When S(x) = 0, the matrix J(x) has rank 1. Indeed, in this case the compact SVD of the Jacobian is

$$J(x) = \frac{y}{\|y\|} \left( 2 \|y\| \|z\| \right) \frac{z^T}{\|z\|},$$

so that the only non-zero singular value is 2‖y‖‖z‖. We may assume that the Jacobian is approximately rank-deficient in the surroundings of a solution.

Figure 6.2. Convergence of problem (6.4) for m = 2 and n = 3, with a = (1, 1, 1)^T, c = (2, 0, 0)^T, and a fixed starting point x^(0). The locus of the solutions is the union of the sphere and the line obtained intersecting the two planes. The blue dots are the iterations of the MNGN2_αβδ method, and the red ones correspond to the CKB method. The black circle is the minimal-norm solution.

The locus of the solutions is the union of the n-ellipsoid and the intersection between the planes x_i = c_i, i = 1, ..., m.

If a = e and c = 2e, the minimal-norm solution x† depends on the dimensions m and n: if m < n − √n + 1/4, then it is

$$x^{\dagger} = \big( \underbrace{2, \ldots, 2}_{m}, \underbrace{0, \ldots, 0}_{n-m} \big)^T,$$

otherwise, it is

$$x^{\dagger} = \left( 2 - \frac{\sqrt{n}}{n} \right) e.$$

If c = (2, 0, ..., 0)^T, it is x† = (1, 0, ..., 0)^T. The case m = 2, n = 3, is displayed in Figure 6.2, together with the iterations of the algorithms MNGN2_αβδ and CKB. In this test, the last algorithm fails.

Table 6.3 reports the results obtained for m = 8 and n = 10. In this case, the MNGN2_αβδ algorithm is superior to all the other methods in the computation of both the minimal-norm and the minimal-L-weighted-norm solutions.

Table 6.3. Results for Example 3 with m = 8, n = 10, a = e, and c = (2, 0, ..., 0)^T.

    L = I
    method       iterations   ‖Lx̃‖     successes
    MNGN2_α           33      2.0508       17
    MNGN2_αβ          13      2.3382      100
    MNGN2_αβδ         38      1.5438      100
    CKB_1             27      2.3430      100
    CKB_2             13      2.3646      100

    L = D_1
    MNGN2_α           25      2.0062        9
    MNGN2_αβ          14      1.9261      100
    MNGN2_αβδ         27      1.4625      100
    CKB_1             27      2.3179      100
    CKB_2             13      2.4364      100

    L = D_2
    MNGN2_α           18      2.0030       14
    MNGN2_αβ          15      1.8924      100
    MNGN2_αβδ         22      1.3493      100
    CKB_1             27      2.7855      100
    CKB_2             14      3.0272      100

Since this example is interesting in itself as a test problem, we report some further comments on it. If m = n, the locus of the solutions is the union of the n-ellipsoid and the point x = c. The spectrum of J(x) is

$$\sigma(J(x)) = \left\{ S(x) + 2 y^T z, S(x), \ldots, S(x) \right\},$$

where the eigenvalue S(x) has algebraic multiplicity n − 1. The Jacobian matrix is invertible if and only if both S(x) ≠ 0 and S(x) + 2z^T y ≠ 0. If this condition is met, the inverse is obtained by the Sherman–Morrison formula

$$J(x)^{-1} = \frac{1}{S(x)} I_n - \frac{2}{S(x) \left( S(x) + 2 z^T y \right)} y z^T.$$
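Thanks to the diagonal-plus-rank-1 structure, linear systems with J(x) can then be solved in O(n) operations. The following Matlab sketch applies the formula above; it is ours, and it assumes S ≠ 0 and S + 2z^T y ≠ 0.

    % Solve J(x)*u = g with J(x) = S*I_n + 2*y*z' (case m = n) via the
    % Sherman-Morrison formula; requires S ~= 0 and S + 2*z'*y ~= 0.
    function u = solve_diag_rank1(S, y, z, g)
        u = g/S - (2*(z'*g) / (S*(S + 2*(z'*y)))) * y;
    end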
Example 4.

Let F be the nonlinear function (6.1) with components

$$F_i(x) = \frac{1}{2} S(x) \left( x_i^2 + 1 \right), \qquad i = 1, \ldots, m,$$

and S(x) defined as above. The locus of the solutions is the n-ellipsoid, since the factor x_i² + 1 never vanishes.

Setting y_i = x_i² + 1, for i = 1, ..., m, and z_j = (x_j − c_j)/a_j², for j = 1, ..., n, the Jacobian matrix can be expressed as

$$J(x) = S(x) D_{m,n}(x) + y z^T,$$

where D_{m,n}(x) is an m × n diagonal matrix whose main diagonal consists of the vector x. Indeed,

$$\frac{\partial F_i}{\partial x_k} = \begin{cases} x_i S(x) + \dfrac{x_i - c_i}{a_i^2} \left( x_i^2 + 1 \right), & k = i, \\[2mm] \dfrac{x_k - c_k}{a_k^2} \left( x_i^2 + 1 \right), & k \ne i. \end{cases}$$

When S(x) = 0, the Jacobian J(x) is rank-1.

If a = e, the locus of the solutions is the n-sphere centered at c with unitary radius. If c = 2e, the minimal-norm solution is

$$x^{\dagger} = \left( 2 - \frac{\sqrt{n}}{n} \right) e,$$

while if c = (2, 0, ..., 0)^T it is x† = (1, 0, ..., 0)^T.

Table 6.4 displays the results for the last case, when m = 8 and n = 10. We see that the algorithms are roughly equivalent, except the CKB_1 method, which converges only in 20% of the tests and produces a solution with a large norm.

Table 6.4. Results for Example 4 with m = 8, n = 10, a = e, and c = (2, 0, ..., 0)^T.

    method       iterations   ‖x̃‖      successes
    MNGN2_α          202      1.0146       79
    MNGN2_αβ         201      1.0387       95
    MNGN2_αβδ        208      1.0351       96
    CKB_1            224      2.1372       20
    CKB_2            224      1.0324       94
7. Conclusions.
This paper explores the computation of the minimal-(L-)norm solution of nonlinear least-squares problems, and the reasons for the occasional lack of convergence of Gauss–Newton methods. We propose to introduce two different relaxation parameters that improve the efficiency of the iterative method. The first parameter is determined by applying the Armijo–Goldstein principle, while two techniques are developed to estimate the second one. In numerical experiments performed on four test problems, our method proves to be the most effective, compared to other approaches based on a single damping parameter.

Acknowledgements.
The work of the authors was partially supported by the Regione Autonoma della Sardegna research project "Algorithms and Models for Imaging Science [AMIS]" (RASSR57257, intervento finanziato con risorse FSC 2014-2020 - Patto per lo Sviluppo della Regione Sardegna), and the INdAM-GNCS research project "Tecniche numeriche per l'analisi delle reti complesse e lo studio dei problemi inversi". Federica Pes gratefully acknowledges CRS4 (Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna) for the financial support of her Ph.D. scholarship.
REFERENCES

[1] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pac. J. Math., 16 (1966), pp. 1–3.
[2] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
[3] S. L. Campbell, P. Kunkel, and K. Bobinyec, A minimal norm corrected underdetermined Gauß–Newton procedure, Appl. Numer. Math., 62 (2012), pp. 592–605.
[4] A. Concas, S. Noschese, L. Reichel, and G. Rodriguez, A spectral method for bipartizing a network and detecting a large anti-community, J. Comput. Appl. Math., 373 (2020), p. 112306.
[5] J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, SIAM, 1996.
[6] J. Eriksson, Optimization and Regularization of Nonlinear Least Squares Problems, Ph.D. thesis, Umeå University, Sweden, 1996.
[7] J. Eriksson and P. A. Wedin, Regularization methods for nonlinear least squares problems. Part I: Exactly rank-deficient problems, tech. rep., Umeå University, Sweden, 1996.
[8] J. Eriksson, P. A. Wedin, M. E. Gulliksson, and I. Söderkvist, Regularization methods for uniformly rank-deficient nonlinear least-squares problems, J. Optim. Theory Appl., 127 (2005), pp. 1–26.
[9] A. A. Goldstein, Constructive Real Analysis, Harper and Row, 1967.
[10] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, third ed., 1996.
[11] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.
[12] P. C. Hansen, V. Pereyra, and G. Scherer, Least Squares Data Fitting with Applications, Johns Hopkins University Press, 2012.
[13] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[14] F. Pes and G. Rodriguez, The minimal-norm Gauss–Newton method and some of its regularized variants, Electron. Trans. Numer. Anal., 53 (2020), pp. 459–480.
[15]