Sequential adaptive elastic net approach for single-snapshot source localization
Muhammad Naveed Tabassum and Esa Ollila
Aalto University, Dept. of Signal Processing and Acoustics, P.O. Box 15400, FI-00076 Aalto, Finland.
E-mail: (muhammad.tabassum, esa.ollila)@aalto.fi

This paper proposes efficient algorithms for accurate recovery of direction-of-arrivals (DoAs) of sources from single-snapshot measurements using compressive beamforming (CBF). In CBF, the conventional sensor array signal model is cast as an underdetermined complex-valued linear regression model, and sparse signal recovery methods are used for solving the DoA finding problem. We develop a complex-valued pathwise weighted elastic net (c-PW-WEN) algorithm that finds solutions at the knots of the penalty parameter over a path (or grid) of EN tuning parameter values. c-PW-WEN also computes the Lasso or weighted Lasso in its path. We then propose a sequential adaptive EN (SAEN) method that is based on the c-PW-WEN algorithm with adaptive weights that depend on the previous solution. Extensive simulation studies illustrate that SAEN improves the probability of exact recovery of the true support compared to conventional sparse signal recovery approaches, such as the Lasso, the elastic net or orthogonal matching pursuit, in several challenging multiple-target scenarios. The effectiveness of SAEN is more pronounced in the presence of high mutual coherence.
I. INTRODUCTION
Acoustic signal processing problems generally employ a system of linear equations as the data model. For overdetermined linear systems, least squares estimation (LSE) is often applied, but in underdetermined or ill-conditioned problems the LSE is no longer unique and the optimization problem has an infinite number of solutions. In these cases, additional constraints, such as those promoting sparse solutions, are commonly used, e.g., the Lasso (Least Absolute Shrinkage and Selection Operator) or the elastic net (EN) penalty, which is an extension of the Lasso based on a convex combination of the $\ell_1$ penalty of the Lasso and the $\ell_2$ penalty of ridge regression.

It is now common practice in acoustic applications to employ grid-based sparse signal recovery methods for finding source parameters, e.g., direction-of-arrival (DoA) and power. This approach, referred to as compressive beamforming (CBF), has emerged as one of the most useful approaches in problems where only few measurements are available, and its usefulness has been shown in a series of papers. In this paper, we address the problem of estimating the unknown source parameters when only a single snapshot is available.

Existing approaches to the grid-based single-snapshot CBF problem often use the Lasso for sparse recovery. The Lasso, however, often performs poorly when the sources are closely spaced in the angular domain or when there exist large variations in the source powers. The same holds true when the grid used for constructing the array steering matrix is dense, which is the case when one aims for high-resolution DoA finding. The problem is due to the fact that the Lasso performs poorly when the predictors are highly correlated. Another problem is that the Lasso lacks the group selection ability: when two sources are closely spaced in the angular domain, the Lasso tends to choose only one of them in the estimation grid and ignores the other. The EN often performs better in such cases, as it enforces a sparse solution but, unlike the Lasso, has a tendency to pick or reject correlated variables as a group. Furthermore, the EN enjoys the computational advantages of the Lasso, and adaptation using smartly chosen data-dependent weights can further enhance performance.

The main aim of this paper is to improve over conventional sparse signal recovery methods, especially in the presence of high mutual coherence of the basis vectors or when the non-zero coefficients have largely varying amplitudes. The former case occurs in the CBF problem when a dense grid is used for constructing the steering matrix, or when sources arrive at a sensor array from either neighbouring or oblique angles. To this end, we propose a sequential adaptive approach using the weighted elastic net (WEN) framework. To achieve this in a computationally efficient way, we propose a homotopy method that is a complex-valued extension of the least angle regression and shrinkage (LARS) algorithm for the weighted Lasso problem, which we refer to as c-LARS-WLasso. The developed c-LARS-WLasso method is numerically cost effective and avoids an exhaustive grid-search over candidate values of the penalty parameter.
In this paper, we assume that the number of non-zero coefficients (i.e., the number of sources K arriving at a sensor array in the CBF problem) is known, and propose a complex-valued pathwise (c-PW-)WEN algorithm that utilizes c-LARS-WLasso along with the PW-LARS-EN algorithm to compute the WEN path. c-PW-WEN computes the K-sparse WEN solutions over a grid of EN tuning parameter values and then selects the best final WEN solution. We also propose a novel sequential adaptive elastic net (SAEN) approach that applies adaptive c-PW-WEN sequentially, decreasing the sparsity level (order) from 3K to K in three stages. SAEN utilizes smartly chosen adaptive (i.e., data-dependent) weights that are based on the solutions obtained in the previous stage.

Application of the developed algorithms is illustrated in the single-snapshot CBF DoA estimation problem. In CBF, accurate recovery depends heavily on the user-specified angular estimation grid (the look directions of interest), which determines the array steering matrix consisting of array response vectors to the look directions (estimation grid points) of interest. A dense angular grid implies high mutual coherence, which indicates a poor recovery region for most sparse recovery methods. The effectiveness of SAEN compared to state-of-the-art sparse signal recovery algorithms is illustrated via extensive simulation studies.

The paper is structured as follows. Section II discusses the WEN optimization problem and its benefits over the Lasso and adaptive Lasso. We introduce the c-LARS-WLasso method in Section III. In Section IV, we develop the c-PW-WEN algorithm that finds the K-sparse WEN solutions over a grid of EN tuning parameter values. In Section V, the SAEN approach is proposed. Section VI lays out the DoA estimation problem from single-snapshot measurements using CBF. Simulation studies using a large variety of set-ups are provided in Section VII. Finally, Section VIII concludes the paper.

Notations: Lowercase boldface letters are used for vectors and uppercase for matrices. The $\ell_2$-norm and the $\ell_1$-norm are defined as $\|\mathbf{a}\|_2 = \sqrt{\mathbf{a}^{\mathsf H}\mathbf{a}}$ and $\|\mathbf{a}\|_1 = \sum_{i=1}^{n} |a_i|$, respectively, where $|a| = \sqrt{a^* a} = \sqrt{a_R^2 + a_I^2}$ denotes the modulus of a complex number $a = a_R + \imath a_I$. The support $\mathcal{A}$ of a vector $\mathbf{a} \in \mathbb{C}^p$ is the index set of its nonzero elements, i.e., $\mathcal{A} = \mathrm{supp}(\mathbf{a}) = \{ j \in \{1, \ldots, p\} : a_j \neq 0 \}$. The $\ell_0$-(pseudo)norm of $\mathbf{a}$ is defined as $\|\mathbf{a}\|_0 = |\mathrm{supp}(\mathbf{a})|$, which equals the total number of its nonzero elements. For a vector $\boldsymbol\beta \in \mathbb{C}^p$ (resp. matrix $\mathbf{X} \in \mathbb{C}^{n \times p}$) and an index set $\mathcal{A}_K \subseteq \{1, 2, \ldots, p\}$ of cardinality $|\mathcal{A}_K| = K$, we denote by $\boldsymbol\beta_{\mathcal{A}_K}$ (resp. $\mathbf{X}_{\mathcal{A}_K}$) the $K \times 1$ vector (resp. $n \times K$ matrix) restricted to the components of $\boldsymbol\beta$ (resp. columns of $\mathbf{X}$) indexed by the set $\mathcal{A}_K$. We let $\mathbf{a} \otimes \mathbf{b}$ denote the Hadamard (i.e., element-wise) product of $\mathbf{a} \in \mathbb{C}^p$ and $\mathbf{b} \in \mathbb{C}^p$, and by $\mathbf{a} \oslash \mathbf{b}$ we denote the element-wise division of the vectors. We denote by $\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^{\mathsf H}\mathbf{b}$ the usual Hermitian inner product of $\mathbb{C}^p$. Finally, $\mathrm{diag}(\mathbf{a})$ denotes a $p \times p$ matrix with the elements of $\mathbf{a}$ on its diagonal.
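As a concrete reference for the notation above, the following short NumPy sketch (ours, for illustration only) evaluates these quantities for a complex vector:

```python
import numpy as np

a = np.array([1 + 2j, 0, 3 - 1j])   # a in C^3
ell2 = np.sqrt(np.vdot(a, a).real)  # ||a||_2 = sqrt(a^H a)
ell1 = np.sum(np.abs(a))            # ||a||_1 = sum of moduli |a_i|
support = np.nonzero(a)[0]          # supp(a) = {0, 2}
ell0 = support.size                 # ||a||_0 = |supp(a)| = 2

b = np.array([2 + 0j, 1j, 1 - 1j])
hadamard = a * b                    # a (x) b, element-wise product
elemdiv = a / b                     # a (/) b, element-wise division
inner = np.vdot(a, b)               # <a, b> = a^H b (np.vdot conjugates a)
```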
II. WEIGHTED ELASTIC NET FRAMEWORK

We consider the linear model, where the n complex-valued measurements are modeled as
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon, \qquad (1)$$
where $\mathbf{X} \in \mathbb{C}^{n\times p}$ is a known complex-valued design or measurement matrix, $\boldsymbol\beta \in \mathbb{C}^p$ is the unknown vector of complex-valued regression coefficients, and $\boldsymbol\varepsilon \in \mathbb{C}^n$ is the complex noise vector. For ease of exposition, we consider the centered linear model (i.e., we assume that the intercept is equal to zero). In this paper, we deal with the underdetermined or ill-posed linear model, where $p > n$, and the primary interest is to find a sparse estimate of the unknown parameter vector $\boldsymbol\beta$ given $\mathbf{y} \in \mathbb{C}^n$ and $\mathbf{X} \in \mathbb{C}^{n\times p}$. We assume that the sparsity level $K = \|\boldsymbol\beta\|_0$, i.e., the number of non-zero elements of $\boldsymbol\beta$, is known. In the DoA finding problem using compressive beamforming, this is equivalent to assuming that the number of sources arriving at a sensor array is known.

The WEN estimator finds the solution $\hat{\boldsymbol\beta} \in \mathbb{C}^p$ of the constrained optimization problem
$$\underset{\boldsymbol\beta \in \mathbb{C}^p}{\text{minimize}}\ \tfrac12\|\mathbf{y} - \mathbf{X}\boldsymbol\beta\|_2^2 \quad \text{subject to} \quad P_\alpha(\boldsymbol\beta;\mathbf{w}) \le t, \qquad (2)$$
where $t \ge 0$ is a threshold,
$$P_\alpha(\boldsymbol\beta;\mathbf{w}) = \sum_{j=1}^p w_j\Big(\alpha|\beta_j| + \frac{1-\alpha}{2}|\beta_j|^2\Big)$$
is the WEN constraint (or penalty) function, the vector $\mathbf{w} = (w_1,\ldots,w_p)^\top$, $w_i \ge 0$, $i = 1,\ldots,p$, collects the non-negative weights, and $\alpha \in [0,1]$ is an EN tuning parameter. Both the weights $\mathbf{w}$ and $\alpha$ are chosen by the user. When the weights are data-dependent, we refer to the solution as the adaptive EN (AEN); note that AEN is an extension of the adaptive Lasso.

The constrained optimization problem in (2) can also be written in the equivalent penalized form
$$\hat{\boldsymbol\beta}(\lambda,\alpha) = \arg\min_{\boldsymbol\beta\in\mathbb{C}^p}\ \tfrac12\|\mathbf{y}-\mathbf{X}\boldsymbol\beta\|_2^2 + \lambda P_\alpha(\boldsymbol\beta;\mathbf{w}), \qquad (3)$$
where $\lambda \ge 0$ is the penalty (or regularization) parameter. The problems in (2) and (3) are equivalent due to Lagrangian duality, and either one can be solved; herein, we use (3).

Recall that the EN tuning parameter $\alpha \in [0,1]$ offers a blend between the Lasso and ridge regression. The benefit of the EN is its ability to select correlated variables as a group, which is illustrated in Figure 1. The EN penalty has singularities at the vertices like the Lasso, which is a necessary property for sparse estimation. It also has strictly convex edges, which help in selecting variables as a group; this is a useful property when high correlations exist between predictors. Moreover, for $\mathbf{w} = \mathbf{1}$ (a vector of ones) and $\alpha = 1$, (3) results in the Lasso solution, and for $\mathbf{w} = \mathbf{1}$ and $\alpha = 0$ we obtain the ridge regression estimator.
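To make the roles of $\alpha$, $\lambda$ and $\mathbf{w}$ concrete, here is a minimal NumPy sketch (illustrative only; the helper names are our own) that evaluates the WEN penalty and the penalized objective (3) for a candidate $\boldsymbol\beta$:

```python
import numpy as np

def wen_penalty(beta, w, alpha):
    """WEN penalty P_alpha(beta; w) = sum_j w_j (alpha|b_j| + (1-alpha)/2 |b_j|^2)."""
    mod = np.abs(beta)
    return np.sum(w * (alpha * mod + 0.5 * (1 - alpha) * mod**2))

def wen_objective(y, X, beta, w, alpha, lam):
    """Penalized objective of (3): 0.5||y - X beta||_2^2 + lambda P_alpha(beta; w)."""
    resid = y - X @ beta
    return 0.5 * np.real(np.vdot(resid, resid)) + lam * wen_penalty(beta, w, alpha)
```

With $\mathbf{w} = \mathbf{1}$ and $\alpha = 1$ the objective reduces to the Lasso, and with $\alpha = 0$ to ridge regression, matching the special cases noted above.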
FIG. 1. (Color online) The Lasso solution is often at the vertices (corners), but the EN solution can occur on the edges as well, depending on the correlations among the variables. In the uncorrelated case of panel (a), both the Lasso and the EN have a sparse solution. When the predictors are highly correlated, as in panel (b), the EN performs group selection in contrast to the Lasso.
The WEN solution can be computed using any algorithm that can find the non-weighted ($\mathbf{w} = \mathbf{1}$) EN solution. To see this, write $\tilde{\mathbf{X}} = \mathbf{X}\,\mathrm{diag}(\mathbf{w})^{-1}$ and $\tilde{\boldsymbol\beta} = \boldsymbol\beta \otimes \mathbf{w}$. Then the WEN solution is found by applying the following steps.

1. Solve the (non-weighted) EN solution on the transformed data $(\mathbf{y}, \tilde{\mathbf{X}})$:
$$\tilde{\boldsymbol\beta}(\lambda,\alpha) = \arg\min_{\boldsymbol\beta\in\mathbb{C}^p}\ \tfrac12\|\mathbf{y}-\tilde{\mathbf{X}}\boldsymbol\beta\|_2^2 + \lambda P_\alpha(\boldsymbol\beta),$$
where $P_\alpha(\boldsymbol\beta) = P_\alpha(\boldsymbol\beta;\mathbf{1}_p)$ is the EN penalty.

2. The WEN solution for the original data $(\mathbf{y},\mathbf{X})$ is $\hat{\boldsymbol\beta}(\lambda,\alpha) = \tilde{\boldsymbol\beta} \oslash \mathbf{w}$.

Yet the standard (i.e., non-weighted, $w_j \equiv 1$, $j = 1,\ldots,p$) EN estimator may perform inconsistent variable selection. The EN solution depends largely on $\lambda$ (and $\alpha$), and tuning these parameters optimally is a difficult problem. The adaptive Lasso obtains the oracle variable selection property by using cleverly chosen adaptive weights for the regression coefficients in the $\ell_1$-penalty. We extend this idea to the WEN penalty, coupling it with an active set approach, where WEN is applied only to the nonzero (active) coefficients. Our proposed adaptive EN uses data-dependent weights, defined as
$$\hat w_j = \begin{cases} 1/|\hat\beta_{\mathrm{init},j}|, & \hat\beta_{\mathrm{init},j} \neq 0 \\ \infty, & \hat\beta_{\mathrm{init},j} = 0 \end{cases}, \qquad j = 1,\ldots,p, \qquad (4)$$
where $\hat{\boldsymbol\beta}_{\mathrm{init}} \in \mathbb{C}^p$ denotes a sparse initial estimator of $\boldsymbol\beta$. The idea is that only the nonzero coefficients are exploited, i.e., basis vectors with $\hat\beta_{\mathrm{init},j} = 0$ are omitted from the model, and thus the dimensionality of the linear model is reduced from $p$ to $K = \|\hat{\boldsymbol\beta}_{\mathrm{init}}\|_0$. Moreover, a larger weight means that the corresponding variable is penalized more heavily. The vector $\hat{\mathbf{w}} = (\hat w_1,\ldots,\hat w_p)^\top$ can be written compactly as
$$\hat{\mathbf{w}} = \mathbf{1}_p \oslash |\hat{\boldsymbol\beta}_{\mathrm{init}}|, \qquad (5)$$
where the notation $|\boldsymbol\beta|$ means element-wise application of the absolute value operator, i.e., $|\boldsymbol\beta| = (|\beta_1|,\ldots,|\beta_p|)^\top$.
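The two-step reduction above is straightforward to implement. The sketch below is schematic: it assumes some complex-valued EN solver `en_solve(y, X, lam, alpha)` is available (that solver name is our placeholder, not a library routine), restricts the problem to the finite-weight coordinates, rescales the columns, and maps the solution back:

```python
import numpy as np

def wen_via_en(y, X, w, lam, alpha, en_solve):
    """Solve the WEN problem (3) via the transformation X~ = X diag(w)^-1,
    beta = beta~ (/) w, using a plain EN solver `en_solve` on (y, X~).
    Coordinates with infinite weight (w_j = inf) are dropped from the model."""
    active = np.isfinite(w)               # finite-weight (kept) coordinates
    Xt = X[:, active] / w[active]         # X~: divide each kept column by w_j
    beta_t = en_solve(y, Xt, lam, alpha)  # step 1: non-weighted EN on (y, X~)
    beta = np.zeros(X.shape[1], dtype=complex)
    beta[active] = beta_t / w[active]     # step 2: beta = beta~ (/) w
    return beta

def adaptive_weights(beta_init):
    """Adaptive weights of (4)-(5): w_j = 1/|beta_init_j|, inf where beta_init_j = 0."""
    mod = np.abs(beta_init)
    with np.errstate(divide="ignore"):
        return np.where(mod > 0, 1.0 / mod, np.inf)
```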
III. COMPLEX-VALUED LARS METHOD FOR WEIGHTED LASSO

In this section, we develop the c-LARS-WLasso algorithm, which is a complex-valued extension of the LARS algorithm for the weighted Lasso framework. This is then used to construct the complex-valued pathwise (c-PW-)WEN algorithm. These methods compute the solution of (3) at particular penalty parameter values, called knots, at which a new variable enters (or leaves) the active set of nonzero coefficients. Our c-PW-WEN exploits c-LARS-WLasso as its core computational engine.

Let $\hat{\boldsymbol\beta}(\lambda)$ denote a solution of (3) for some fixed value $\lambda$ of the penalty parameter in the case that the Lasso penalty is used ($\alpha = 1$) with unit weights ($\mathbf{w} = \mathbf{1}_p$). Recall also that the predictors are normalized so that $\|\mathbf{x}_j\|_2 = 1$, $j = 1,\ldots,p$. The solution $\hat{\boldsymbol\beta}(\lambda)$ must then verify the generalized Karush-Kuhn-Tucker conditions: $\hat{\boldsymbol\beta}(\lambda)$ is a solution of (3) if and only if it verifies the zero sub-gradient equations
$$\langle\mathbf{x}_j, \mathbf{r}(\lambda)\rangle = \lambda \hat s_j \quad \text{for } j = 1,\ldots,p, \qquad (6)$$
where $\mathbf{r}(\lambda) = \mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}(\lambda)$ and $\hat s_j \in \mathrm{sign}\{\hat\beta_j(\lambda)\}$, meaning that $\hat s_j = e^{\imath\theta_j}$ with $\theta_j = \arg\{\hat\beta_j(\lambda)\}$ if $\hat\beta_j(\lambda) \neq 0$, and some number inside the unit circle, $\hat s_j \in \{x \in \mathbb{C} : |x| \le 1\}$, otherwise. Taking absolute values of both sides of (6), one notices that at the solution the condition $|\langle\mathbf{x}_j,\mathbf{r}(\lambda)\rangle| = \lambda$ holds for the active predictors, whereas $|\langle\mathbf{x}_j,\mathbf{r}(\lambda)\rangle| \le \lambda$ holds for the non-active predictors. Thus, as $\lambda$ decreases and more predictors join the active set, the set of active predictors becomes less correlated with the residual. Moreover, the absolute value of the correlation, or equivalently the angle between any active predictor and the residual, is the same for all active predictors. In the real-valued case, the LARS method exploits this feature and the piecewise linearity of the Lasso path to compute the knot values, i.e., the values of the penalty parameter at which there is a change in the active set of predictors.

Let us briefly recall the main principle of the LARS algorithm. LARS starts with a model having no variables (so $\boldsymbol\beta = \mathbf{0}$) and picks the predictor that has maximal correlation with (i.e., smallest angle to) the residual $\mathbf{r} = \mathbf{y}$. Suppose predictor $\mathbf{x}_1$ is chosen. Then the magnitude of the coefficient of the selected predictor is increased (toward its least-squares value) until one reaches a step size such that another predictor (say, $\mathbf{x}_2$) has the same absolute value of correlation with the evolving residual $\mathbf{r} = \mathbf{y} - (\text{step size})\,\mathbf{x}_1$, i.e., the updated residual makes equal angles with both predictors, as shown in Fig. 2. Thereafter, LARS moves in the new direction, which keeps the evolving residual equally correlated (i.e., equiangular) with the selected predictors, until another predictor becomes equally correlated with the residual. This process is repeated until all predictors are in the model or a specified sparsity level is reached.

FIG. 2. Starting from all zeros, LARS picks the predictor $\mathbf{x}_1$ that makes the least angle (i.e., $\theta_1 < \theta_2$) with the residual $\mathbf{r}$ and moves in its direction until $\theta_1 = \theta_2$, at which point LARS picks $\mathbf{x}_2$ and changes direction. LARS then repeats this procedure, selecting the next equicorrelated predictor, and so on until some stopping criterion is met.

We now consider the weighted Lasso (WLasso) problem (so $\alpha = 1$) by letting $\mathbf{X} \leftarrow \mathbf{X}\,\mathrm{diag}(\mathbf{w})^{-1}$ and $\boldsymbol\beta \leftarrow \boldsymbol\beta \otimes \mathbf{w}$. We write $\hat{\boldsymbol\beta}(\lambda)$ for the solution of the optimization problem (3) in this case.
Let $\lambda_0$ denote the smallest value of $\lambda$ such that all coefficients of the WLasso solution are zero, i.e., $\hat{\boldsymbol\beta}(\lambda_0) = \mathbf{0}$. It is easy to see that $\lambda_0 = \max_j |\langle\mathbf{x}_j,\mathbf{y}\rangle|$, $j = 1,\ldots,p$. Let $\mathcal{A} = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda)\}$ denote the active set at a regularization parameter value $\lambda < \lambda_0$. The knots $\lambda_0 > \lambda_1 > \cdots > \lambda_K$ are defined as the smallest values of the penalty parameter after which there is a change in the set of active predictors, i.e., at which the order of sparsity changes. The active set at a knot $\lambda_k$ is denoted by $\mathcal{A}_k = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda_k)\}$. The active set $\mathcal{A}_1$ thus contains a single index, $\mathcal{A}_1 = \{j_1\}$, where $j_1$ is the predictor that becomes active first, i.e., $j_1 = \arg\max_{j\in\{1,\ldots,p\}} |\langle\mathbf{x}_j,\mathbf{y}\rangle|$. By definition of the knots, one has $\mathcal{A}_k = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda)\}$ for all $\lambda \in [\lambda_k, \lambda_{k-1})$, and $\mathcal{A}_k \neq \mathcal{A}_{k+1}$ for all $k = 1,\ldots,K-1$.

The c-LARS-WLasso outlined in Algorithm 1 is a straightforward generalization of the LARS-Lasso algorithm to the complex-valued and weighted case. It does not have the same theoretical guarantees as its real-valued counterpart to solve the exact values of the knots. Namely, the LARS algorithm uses the property that, in the real-valued case, the Lasso regularization path is continuous and piecewise linear with respect to $\lambda$. In the complex-valued case, however, the solution path between the knots is not necessarily linear. Hence c-LARS-WLasso may not give precise values of the knots in all cases. However, simulations validate that the algorithm finds the knots with reasonable precision. Future work is needed to provide theoretical guarantees of the algorithm's ability to find the knots.

Algorithm 1: c-LARS-WLasso
input: $\mathbf{y} \in \mathbb{C}^n$, $\mathbf{X} \in \mathbb{C}^{n\times p}$, $\mathbf{w} \in \mathbb{R}^p$ and $K$.
output: $\{\mathcal{A}_k, \lambda_k, \hat{\boldsymbol\beta}(\lambda_k)\}_{k=0}^{K}$
1. Initialize $\boldsymbol\beta^{(0)} = \mathbf{0}_{p\times 1}$, $\mathcal{A}_0 = \emptyset$, $\boldsymbol\Delta = \mathbf{0}_{p\times 1}$, the residual $\mathbf{r}_0 = \mathbf{y}$, and set $\mathbf{X} \leftarrow \mathbf{X}\,\mathrm{diag}(\mathbf{w})^{-1}$.
2. Compute $\lambda_0 = \max_j |\langle\mathbf{x}_j,\mathbf{r}_0\rangle|$ and $j_1 = \arg\max_j |\langle\mathbf{x}_j,\mathbf{r}_0\rangle|$, $j = 1,\ldots,p$.
for $k = 1,\ldots,K$ do
3. Find the active set $\mathcal{A}_k = \mathcal{A}_{k-1} \cup \{j_k\}$ and its least-squares direction $\boldsymbol\delta$, setting $[\boldsymbol\Delta]_{\mathcal{A}_k} = \boldsymbol\delta$:
$$\boldsymbol\delta = \frac{1}{\lambda_{k-1}}\big(\mathbf{X}^{\mathsf H}_{\mathcal{A}_k}\mathbf{X}_{\mathcal{A}_k}\big)^{-1}\mathbf{X}^{\mathsf H}_{\mathcal{A}_k}\mathbf{r}_{k-1}.$$
4. Define the vector $\boldsymbol\beta(\lambda) = \boldsymbol\beta^{(k-1)} + (\lambda_{k-1}-\lambda)\boldsymbol\Delta$ for $0 < \lambda \le \lambda_{k-1}$, and the corresponding residual
$$\mathbf{r}(\lambda) = \mathbf{y} - \mathbf{X}\boldsymbol\beta(\lambda) = \mathbf{r}_{k-1} - (\lambda_{k-1}-\lambda)\mathbf{X}_{\mathcal{A}_k}\boldsymbol\delta.$$
5. The knot $\lambda_k$ is the largest $\lambda$-value $0 < \lambda \le \lambda_{k-1}$ such that
$$\langle\mathbf{x}_\ell, \mathbf{r}(\lambda)\rangle = \lambda e^{\imath\theta_\ell}, \quad \ell \notin \mathcal{A}_k, \qquad (7)$$
at which a new predictor (at index $j_{k+1} \notin \mathcal{A}_k$) becomes active, thus verifying $|\langle\mathbf{x}_{j_{k+1}}, \mathbf{r}(\lambda_k)\rangle| = \lambda_k$ from (7).
6. Update the values at the knot $\lambda_k$: $\boldsymbol\beta^{(k)} = \boldsymbol\beta^{(k-1)} + (\lambda_{k-1}-\lambda_k)\boldsymbol\Delta$ and $\mathbf{r}_k = \mathbf{y} - \mathbf{X}\boldsymbol\beta^{(k)}$. The Lasso solution is $\hat{\boldsymbol\beta}(\lambda_k) = \boldsymbol\beta^{(k)}$.
end for
7. Return $\{\hat{\boldsymbol\beta}(\lambda_k) \leftarrow \hat{\boldsymbol\beta}(\lambda_k) \oslash \mathbf{w}\}_{k=0}^{K}$.

Below we discuss how to solve the knot $\lambda_k$ and the index $j_{k+1}$ in step 5 of the c-LARS-WLasso algorithm.

Solving step 5:
First we note that
$$\langle\mathbf{x}_\ell,\mathbf{r}(\lambda)\rangle = \langle\mathbf{x}_\ell,\mathbf{r}_{k-1}-(\lambda_{k-1}-\lambda)\mathbf{X}_{\mathcal{A}_k}\boldsymbol\delta\rangle = \langle\mathbf{x}_\ell,\mathbf{r}_{k-1}\rangle - (\lambda_{k-1}-\lambda)\langle\mathbf{x}_\ell,\mathbf{X}_{\mathcal{A}_k}\boldsymbol\delta\rangle = c_\ell - (\lambda_{k-1}-\lambda)b_\ell, \qquad (8)$$
where we have written $c_\ell = \langle\mathbf{x}_\ell,\mathbf{r}_{k-1}\rangle$ and $b_\ell = \langle\mathbf{x}_\ell,\mathbf{X}_{\mathcal{A}_k}\boldsymbol\delta\rangle$. We need to find, for each $\ell \notin \mathcal{A}_k$, the value $\lambda$ such that $|\langle\mathbf{x}_\ell,\mathbf{r}(\lambda)\rangle| = \lambda$ holds. Due to (7) and (8), this means finding $0 < \lambda \le \lambda_{k-1}$ such that
$$c_\ell - (\lambda_{k-1}-\lambda)b_\ell = \lambda e^{\imath\theta_\ell}. \qquad (9)$$
Let us reparametrize by $\lambda = \lambda_{k-1} - \gamma_\ell$. Then identifying $0 < \lambda \le \lambda_{k-1}$ is equivalent to identifying the auxiliary variable $\gamma_\ell \ge 0$. Now (9) becomes
$$c_\ell - \gamma_\ell b_\ell = (\lambda_{k-1}-\gamma_\ell)e^{\imath\theta_\ell} \;\Leftrightarrow\; |c_\ell - \gamma_\ell b_\ell| = \lambda_{k-1}-\gamma_\ell \;\Leftrightarrow\; |c_\ell|^2 - 2\gamma_\ell\,\mathrm{Re}(c_\ell b_\ell^*) + \gamma_\ell^2|b_\ell|^2 = \lambda_{k-1}^2 - 2\lambda_{k-1}\gamma_\ell + \gamma_\ell^2.$$
The last equation implies that $\gamma_\ell$ can be found by solving the roots of the second-order polynomial equation $A\gamma_\ell^2 + B\gamma_\ell + C = 0$, where $A = |b_\ell|^2 - 1$, $B = 2\lambda_{k-1} - 2\,\mathrm{Re}(c_\ell b_\ell^*)$ and $C = |c_\ell|^2 - \lambda_{k-1}^2$. Denote the two roots by $\gamma_{\ell 1}$ and $\gamma_{\ell 2}$; the final value of $\gamma_\ell$ is
$$\gamma_\ell = \begin{cases} \min(\gamma_{\ell 1},\gamma_{\ell 2}), & \text{if } \gamma_{\ell 1} > 0 \text{ and } \gamma_{\ell 2} > 0, \\ \{\max(\gamma_{\ell 1},\gamma_{\ell 2})\}^+, & \text{otherwise}, \end{cases}$$
where $(t)^+ = \max(0,t)$ for $t \in \mathbb{R}$. Thus finding the largest $\lambda$ that verifies (7) is equivalent to finding the smallest non-negative $\gamma_\ell$. Hence the variable that enters the active set $\mathcal{A}_{k+1}$ is $j_{k+1} = \arg\min_{\ell\notin\mathcal{A}_k} \gamma_\ell$ and the knot is $\lambda_k = \lambda_{k-1} - \gamma_{j_{k+1}}$. Thereafter, the solution $\hat{\boldsymbol\beta}(\lambda_k)$ at the knot $\lambda_k$ is simple to find in step 6.
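The step-5 computation is just a quadratic root-finding per candidate predictor. A minimal NumPy sketch of the derivation above (our own illustration, not the authors' reference implementation):

```python
import numpy as np

def next_knot(X, r_prev, delta, lam_prev, active):
    """For each l outside the active set, solve A g^2 + B g + C = 0 with
    A = |b_l|^2 - 1, B = 2 lam_prev - 2 Re(c_l conj(b_l)), C = |c_l|^2 - lam_prev^2,
    and return the entering index j_{k+1} and the knot lambda_k."""
    p = X.shape[1]
    Xd = X[:, list(active)] @ delta          # X_{A_k} delta
    gammas = np.full(p, np.inf)
    for l in range(p):
        if l in active:
            continue
        c = np.vdot(X[:, l], r_prev)          # c_l = <x_l, r_{k-1}>
        b = np.vdot(X[:, l], Xd)              # b_l = <x_l, X_{A_k} delta>
        A = abs(b) ** 2 - 1.0
        B = 2.0 * lam_prev - 2.0 * (c * np.conj(b)).real
        C = abs(c) ** 2 - lam_prev ** 2
        roots = np.roots([A, B, C])
        roots = roots[np.abs(roots.imag) < 1e-12].real  # keep numerically real roots
        pos = roots[roots > 0]
        if pos.size:                          # both roots positive -> take the min
            gammas[l] = pos.min()
        elif roots.size:                      # otherwise (max root)^+
            gammas[l] = max(roots.max(), 0.0)
    j_next = int(np.argmin(gammas))           # entering predictor j_{k+1}
    lam_next = lam_prev - gammas[j_next]      # knot lambda_k = lambda_{k-1} - gamma
    return j_next, lam_next
```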
IV. COMPLEX-VALUED PATHWISE WEIGHTED ELASTIC NET

Next we develop a complex-valued and weighted version of the PW-LARS-EN algorithm, referred to as the c-PW-WEN algorithm. The generalization to the complex-valued case is straightforward; the essential difference is that c-LARS-WLasso (Algorithm 1) is used instead of the (real-valued) LARS-Lasso algorithm. The algorithm finds the K-th knot $\lambda_K$ and the corresponding WEN solutions on a dense grid of EN tuning parameter values $\alpha$, and then picks the final solution for the best $\alpha$-value.

Let $\lambda_0(\alpha)$ denote the smallest value of $\lambda$ such that all coefficients in the WEN estimate are zero, i.e., $\hat{\boldsymbol\beta}(\lambda_0,\alpha) = \mathbf{0}$. The value of $\lambda_0(\alpha)$ can be expressed in closed form:
$$\lambda_0(\alpha) = \max_j \frac{1}{\alpha}\left|\frac{\langle\mathbf{x}_j,\mathbf{y}\rangle}{w_j}\right|, \quad j = 1,\ldots,p.$$
The c-PW-WEN algorithm computes K-sparse WEN solutions for a set of $\alpha$ values on a dense grid
$$[\alpha] = \{\alpha_i \in (0,1] : 1 = \alpha_1 > \alpha_2 > \cdots > \alpha_m > 0\}. \qquad (10)$$
Let $\mathcal{A}(\lambda) = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda,\alpha)\}$ denote the active set (i.e., the nonzero elements of the WEN solution) for a given fixed regularization parameter value $\lambda \equiv \lambda(\alpha) < \lambda_0(\alpha)$ and a given $\alpha$ value on the grid $[\alpha]$. The knots $\lambda_0(\alpha) > \lambda_1(\alpha) > \cdots > \lambda_K(\alpha)$ are the border values of the regularization parameter after which there is a change in the set of active predictors. Since $\alpha$ is fixed, we drop the dependency of the penalty parameter on $\alpha$ and simply write $\lambda_k$ instead of $\lambda_k(\alpha)$; the reader should, however, keep in mind that the values of the knots are different for each $\alpha$. The active set at a knot $\lambda_k$ is then denoted shortly by $\mathcal{A}_k \equiv \mathcal{A}(\lambda_k) = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda_k,\alpha)\}$. Note that $\mathcal{A}_k \neq \mathcal{A}_{k+1}$ for all $k = 1,\ldots,K-1$, and that it is assumed that a non-zero coefficient does not leave the active set for any $\lambda > \lambda_K$, i.e., the sparsity level increases from 0 (at $\lambda_0$) to K (at $\lambda_K$).

Since Algorithm 1 can be written as c-LARS-WLasso($\mathbf{y},\mathbf{X},\mathbf{w},K$), we write
$$\{\lambda_k, \hat{\boldsymbol\beta}(\lambda_k)\} = \text{c-LARS-WLasso}(\mathbf{y},\mathbf{X},\mathbf{w},K)\big|_k$$
for extracting the k-th knot (and the corresponding solution) from the sequence of knot-solution pairs found by the c-LARS-WLasso algorithm. Next, note that we can write the EN objective function in augmented form as follows:
$$\tfrac12\|\mathbf{y}-\mathbf{X}\boldsymbol\beta\|_2^2 + \lambda P_\alpha(\boldsymbol\beta) = \tfrac12\|\mathbf{y}_a - \mathbf{X}_a(\eta)\boldsymbol\beta\|_2^2 + \gamma\|\boldsymbol\beta\|_1, \qquad (11)$$
where
$$\gamma = \lambda\alpha \quad \text{and} \quad \eta = \lambda(1-\alpha) \qquad (12)$$
are new parameterizations of the tuning and shrinkage parameter pair $(\alpha,\lambda)$, and
$$\mathbf{y}_a = \begin{pmatrix}\mathbf{y}\\ \mathbf{0}\end{pmatrix} \quad \text{and} \quad \mathbf{X}_a(\eta) = \begin{pmatrix}\mathbf{X}\\ \sqrt{\eta}\,\mathbf{I}_p\end{pmatrix}$$
are the augmented forms of the response vector $\mathbf{y}$ and the predictor matrix $\mathbf{X}$, respectively. Note that (11) resembles the Lasso objective function with $\mathbf{y}_a \in \mathbb{C}^{n+p}$ and that $\mathbf{X}_a(\eta)$ is an $(n+p)\times p$ matrix. This means that we can compute the K-sparse WEN solution at the K-th knot for fixed $\alpha$ using the c-LARS-WLasso algorithm.

Our c-PW-WEN method is given in Algorithm 2. It computes the WEN solutions at the knots over the dense grid (10) of $\alpha$ values. After step 6 of the algorithm, we have the solution at the K-th knot for a given $\alpha_i$ value on the grid. Having the solution $\hat{\boldsymbol\beta}(\lambda_K,\alpha_i)$ available, we then, in steps 7 to 9, compute the residual sum of squares (RSS) of the debiased WEN solution at the K-th knot (having K nonzeros):
$$\mathrm{RSS}(\alpha_i) = \|\mathbf{y} - \mathbf{X}_{\mathcal{A}_K}\hat{\boldsymbol\beta}_{\mathrm{LS}}(\lambda_K,\alpha_i)\|_2^2,$$
where $\mathcal{A}_K = \mathrm{supp}(\hat{\boldsymbol\beta}(\lambda_K,\alpha_i))$ is the active set at the K-th knot and $\hat{\boldsymbol\beta}_{\mathrm{LS}}(\lambda_K,\alpha_i)$ is the debiased LSE, defined as
$$\hat{\boldsymbol\beta}_{\mathrm{LS}}(\lambda_K,\alpha_i) = \mathbf{X}^+_{\mathcal{A}_K}\mathbf{y}, \qquad (13)$$
where $\mathbf{X}_{\mathcal{A}_K} \in \mathbb{C}^{n\times K}$ consists of the K active columns of $\mathbf{X}$ associated with the active set $\mathcal{A}_K$ and $\mathbf{X}^+_{\mathcal{A}_K}$ denotes its Moore-Penrose pseudo-inverse.

While sweeping through the grid of $\alpha$ values and computing the WEN solutions $\hat{\boldsymbol\beta}(\lambda_K,\alpha_i)$, we choose as our best candidate the WEN estimate $\hat{\boldsymbol\beta}(\lambda_K,\alpha_\imath)$ that has the smallest RSS value, i.e., $\imath = \arg\min_i \mathrm{RSS}(\alpha_i)$, where $i \in \{1,\ldots,m\}$.

Algorithm 2: c-PW-WEN
input: $\mathbf{y} \in \mathbb{C}^n$, $\mathbf{X} \in \mathbb{C}^{n\times p}$, $\mathbf{w} \in \mathbb{R}^p$, the grid $[\alpha]$ of (10) (recall $\alpha_1 = 1$), $K$ and debias.
output: $\hat{\boldsymbol\beta}_K \in \mathbb{C}^p$ and $\mathcal{A}_K$.
1. $\{\lambda_k(\alpha_1), \hat{\boldsymbol\beta}(\lambda_k,\alpha_1)\}_{k=0}^{K}$ = c-LARS-WLasso($\mathbf{y},\mathbf{X},\mathbf{w},K$)
2. for $i = 2$ to $m$ do
3.   for $k = 1$ to $K$ do
4.     $\tilde\eta_k = \lambda_k(\alpha_{i-1})\cdot(1-\alpha_i)$
5.     $\{\gamma_k, \hat{\boldsymbol\beta}(\lambda_k,\alpha_i)\}$ = c-LARS-WLasso($\mathbf{y}_a, \mathbf{X}_a(\tilde\eta_k), \mathbf{w})\big|_k$
6.     $\lambda_k(\alpha_i) = \gamma_k/\alpha_i$
     end for
7.   $\mathcal{A}_K = \mathrm{supp}\{\hat{\boldsymbol\beta}(\lambda_K,\alpha_i)\}$
8.   $\hat{\boldsymbol\beta}_{\mathrm{LS}}(\lambda_K,\alpha_i) = \mathbf{X}^+_{\mathcal{A}_K}\mathbf{y}$
9.   $\mathrm{RSS}(\alpha_i) = \|\mathbf{y} - \mathbf{X}_{\mathcal{A}_K}\hat{\boldsymbol\beta}_{\mathrm{LS}}(\lambda_K,\alpha_i)\|_2^2$
   end for
10. $\imath = \arg\min_i \mathrm{RSS}(\alpha_i)$
11. $\hat{\boldsymbol\beta}_K = \hat{\boldsymbol\beta}(\lambda_K,\alpha_\imath)$ and $\mathcal{A}_K = \mathrm{supp}\{\hat{\boldsymbol\beta}_K\}$
12. if debias then $\hat{\boldsymbol\beta}_{\mathcal{A}_K} = \mathbf{X}^+_{\mathcal{A}_K}\mathbf{y}$
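The augmented-data identity (11) is what lets a weighted-Lasso solver produce WEN solutions. A small sketch of the construction (illustrative only; the names are our own):

```python
import numpy as np

def augment(y, X, eta):
    """Build y_a = [y; 0] and X_a(eta) = [X; sqrt(eta) I_p] from (11)-(12),
    so that the EN problem with pair (lambda, alpha) becomes a Lasso problem
    with penalty gamma = lambda * alpha on the augmented data."""
    n, p = X.shape
    y_a = np.concatenate([y, np.zeros(p, dtype=complex)])
    X_a = np.vstack([X, np.sqrt(eta) * np.eye(p, dtype=complex)])
    return y_a, X_a

# Parameter mapping (12) for an example EN pair (lambda, alpha):
lam, alpha = 0.8, 0.7
gamma = lam * alpha        # Lasso penalty on the augmented data
eta = lam * (1 - alpha)    # ridge-like augmentation strength
```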
V. SEQUENTIALLY ADAPTIVE ELASTIC NET

Next we turn our attention to how to choose the adaptive (i.e., data-dependent) weights in c-PW-WEN. In the adaptive Lasso, one ideally uses the LSE or, if $p > n$, the Lasso as the initial estimator $\hat{\boldsymbol\beta}_{\mathrm{init}}$ to construct the weights in (4). The problem is that both the LSE and the Lasso estimator have very poor accuracy (high variance) when there exist high correlations between the predictors, which is the condition we are concerned with in this paper. This lowers the probability of exact recovery of the adaptive Lasso significantly.

To overcome this problem, we devise a sequential adaptive elastic net (SAEN) algorithm that obtains the K-sparse solution in a sequential manner, decreasing the sparsity level of the solution at each iteration and using the previous solution to form the adaptive weights for c-PW-WEN. SAEN is described in Algorithm 3 and runs the c-PW-WEN algorithm three times. In the first step, it finds a standard (unit-weight) c-PW-WEN solution with 3K nonzero (active) coefficients, which we refer to as the initial EN solution $\hat{\boldsymbol\beta}_{\mathrm{init}}$. The obtained solution determines the adaptive weights via (4) (and hence the active set of 3K nonzero coefficients), which are used in the second step to compute the c-PW-WEN solution that has 2K nonzero coefficients. This solution again determines the adaptive weights via (4) (and hence the active set of 2K nonzero coefficients), which are used in the third step to compute the c-PW-WEN solution that has the desired K nonzero coefficients. It is important to notice that, since we start from a solution with 3K nonzeros, it is quite likely that the true K non-zero coefficients are included in the active set of $\hat{\boldsymbol\beta}_{\mathrm{init}}$, which is computed in the first step of the SAEN algorithm. Note that the choice of 3K is similar to the CoSaMP algorithm, which also uses 3K as an initial support size. Using 3K also usually guarantees that $\mathbf{X}_{\mathcal{A}_{3K}}$ is well conditioned, which may not be the case if a value larger than 3K is chosen.
Algorithm 3: SAEN
input: $\mathbf{y} \in \mathbb{C}^n$, $\mathbf{X} \in \mathbb{C}^{n\times p}$, $[\alpha]$ and $K$.
output: $\hat{\boldsymbol\beta}_K \in \mathbb{C}^p$
1. $\{\hat{\boldsymbol\beta}_{\mathrm{init}}, \mathcal{A}_{3K}\}$ = c-PW-WEN($\mathbf{y}, \mathbf{X}, \mathbf{1}_p, [\alpha], 3K$)
2. $\{\hat{\boldsymbol\beta}, \mathcal{A}_{2K}\}$ = c-PW-WEN($\mathbf{y}, \mathbf{X}_{\mathcal{A}_{3K}}, \mathbf{1}_{3K} \oslash |\hat{\boldsymbol\beta}_{\mathrm{init},\mathcal{A}_{3K}}|, [\alpha], 2K$)
3. $\{\hat{\boldsymbol\beta}_K, \mathcal{A}_K\}$ = c-PW-WEN($\mathbf{y}, \mathbf{X}_{\mathcal{A}_{2K}}, \mathbf{1}_{2K} \oslash |\hat{\boldsymbol\beta}_{\mathcal{A}_{2K}}|, [\alpha], K$)
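In code, the three-stage reduction 3K → 2K → K is a short loop. Below is a schematic sketch, assuming a solver c_pw_wen(y, X, w, alphas, K) returning (beta, support); both the function and its signature are our placeholders for the method above, not a library API:

```python
import numpy as np

def saen(y, X, alphas, K, c_pw_wen):
    """Sequential adaptive elastic net: run c-PW-WEN three times, shrinking
    the support from 3K to 2K to K, re-weighting each stage by the previous
    solution as in (4)-(5)."""
    p = X.shape[1]
    cols = np.arange(p)                  # original column indices still in play
    w = np.ones(p)                       # stage 1: unit weights
    for m in (3 * K, 2 * K, K):          # sparsity levels of the three stages
        beta, support = c_pw_wen(y, X[:, cols], w, alphas, m)
        cols = cols[support]             # keep only the active columns
        w = 1.0 / np.abs(beta[support])  # adaptive weights for the next stage
    beta_full = np.zeros(p, dtype=complex)
    beta_full[cols] = beta[support]      # embed final K-sparse solution
    return beta_full
```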
VI. SINGLE-SNAPSHOT COMPRESSIVE BEAMFORMING

Estimating the source location, in terms of its DoA, plays an important role in many applications. It has been observed that CS algorithms can be applied for DoA estimation (e.g., of sound sources) using sensor arrays when the array output $\mathbf{y}$ can be expressed via a sparse (underdetermined) linear model by discretizing the DoA parameter space. This approach is referred to as compressive beamforming (CBF), and it has been subsequently used in a series of papers.

In CBF, after finding the sparse regression estimate, its support can be mapped to DoA estimates on the grid. Thus, the DoA estimates in CBF are always selected from the resulting finite set of discretized DoA parameters, and hence the resolution of CBF depends on the density of the grid (spacing $\Delta\theta$ or grid size p). A denser grid implies larger mutual coherence of the basis vectors $\mathbf{x}_j$ (here equal to the steering vectors for the DoAs on the grid) and thus a poor recovery region for most sparse regression techniques.

The proposed SAEN estimator can effectively mitigate the effect of high mutual coherence caused by discretization of the DoA space, with significantly better performance than state-of-the-art compressed sensing algorithms. This is illustrated in Section VII via extensive simulation studies using challenging multi-source set-ups with closely spaced sources and large variation of source powers.

We assume narrowband processing and far-field source waves impinging on an array of sensors with known configuration, i.e., the sources are assumed to be located in the far field of the sensor array (propagation radius ≫ array size). A uniform linear array (ULA) of n sensors (e.g., hydrophones or microphones) is used for estimating the DoA $\theta \in [-90°, 90°)$ of a source with respect to the array axis. The array response (steering or wavefront vector) of the ULA for a source at DoA $\theta \in [-\pi/2, \pi/2)$ (in radians) is given by
$$\mathbf{a}(\theta) \triangleq \frac{1}{\sqrt n}\big[1,\ e^{\imath\pi\sin\theta},\ \ldots,\ e^{\imath\pi(n-1)\sin\theta}\big]^\top,$$
where we assume half a wavelength inter-element spacing between the sensors. We consider the case where $K < n$ sources from distinct DoAs $\boldsymbol\theta = (\theta_1,\ldots,\theta_K)^\top$ arrive at the array at some time instant t. A single snapshot obtained by the ULA can then be modeled as
$$\mathbf{y}(t) = \mathbf{A}(\boldsymbol\theta)\mathbf{s}(t) + \boldsymbol\varepsilon(t), \qquad (14)$$
where $\boldsymbol\theta = (\theta_1,\ldots,\theta_K)^\top$ collects the DoAs, the matrix $\mathbf{A}(\boldsymbol\theta) = [\mathbf{a}(\theta_1)\,\cdots\,\mathbf{a}(\theta_K)] \in \mathbb{C}^{n\times K}$ is the dictionary of replicas, also known as the array steering matrix, $\mathbf{s}(t) \in \mathbb{C}^K$ contains the source waveforms, and $\boldsymbol\varepsilon(t)$ is complex noise at time instant t.

Consider an angular grid of size p (commonly $p \gg n$) of look directions of interest,
$$[\vartheta] = \{\vartheta_i \in [-\pi/2, \pi/2) : \vartheta_1 < \cdots < \vartheta_p\}.$$
Let the i-th column of the measurement matrix $\mathbf{X}$ in the model (1) be the array response for look direction $\vartheta_i$, so $\mathbf{x}_i = \mathbf{a}(\vartheta_i)$. Then, if the true source DoAs are contained in the angular grid, i.e., $\theta_i \in [\vartheta]$ for $i = 1,\ldots,K$, the snapshot $\mathbf{y}$ in (14) (where we drop the time index t) can be equivalently modeled by (1) as $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$, where $\boldsymbol\beta$ is exactly K-sparse ($\|\boldsymbol\beta\|_0 = K$) and the nonzero elements of $\boldsymbol\beta$ carry the source waveforms $\mathbf{s}$. Thus, identifying the true DoAs is equivalent to identifying the nonzero elements of $\boldsymbol\beta$, which we refer to as the CBF principle. Hence, sparse regression and CS methods can be utilised for estimating the DoAs based on a single snapshot only. We assume that the number of sources K is known a priori.

Besides the SNR, the size p or spacing $\Delta\theta$ of the grid also greatly affects the performance of CBF methods. The cross-correlation (coherence) between the true steering vector and the steering vectors on the grid depends both on the grid spacing and on the obliqueness of the target DoAs w.r.t. the array. Moreover, the values of the cross-correlations in the Gram matrix $|\mathbf{X}^{\mathsf H}\mathbf{X}|$ also depend on the distance between the array elements and the configuration of the sensor array. Let us construct a measure, called the maximal basis coherence (MBC), defined as the maximum absolute value of the cross-correlations between the true steering vectors $\mathbf{a}(\theta_j)$ and the basis vectors $\mathbf{a}(\vartheta_i)$, $\vartheta_i \in [\vartheta]\setminus\{\theta_j\}$, $j \in \{1,\ldots,K\}$:
$$\mathrm{MBC} = \max_j \max_{\vartheta\in[\vartheta]\setminus\{\theta_j\}} \big|\mathbf{a}(\theta_j)^{\mathsf H}\mathbf{a}(\vartheta)\big|. \qquad (15)$$
Note that the steering vectors $\mathbf{a}(\theta)$, $\theta \in [-\pi/2,\pi/2)$, have unit norm ($\mathbf{a}(\theta)^{\mathsf H}\mathbf{a}(\theta) = 1$). The MBC value measures the obliqueness of the incoming DoA to the array: the higher the MBC value, the more difficult it is for any CBF method to distinguish the true DoA in the grid. Note that the value of the MBC also depends on the grid spacing $\Delta\theta$.

Fig. 3 shows the geometry of the DoA estimation problem, where the target DoAs have varying basis coherence, which increases with the level of obliqueness (inclination) on either side of the normal to the ULA axis. We say a DoA is straight when the angle of incidence of the target is in the shaded sector of Fig. 3; this is the region where the MBC has lower values. In contrast, an oblique DoA is one whose angle of incidence is oblique with respect to the array axis.
FIG. 3. (Color online) Straight and oblique DoAs exhibit different basis coherence.
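For concreteness, a short NumPy sketch of the ULA steering vector and the MBC measure (15) under the half-wavelength spacing assumed above (our own illustration):

```python
import numpy as np

def steering(theta, n):
    """Unit-norm ULA steering vector a(theta) with half-wavelength spacing;
    theta in radians."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta)) / np.sqrt(n)

def mbc(thetas, grid, n):
    """Maximal basis coherence (15): largest |a(theta_j)^H a(v)| over grid
    angles v, excluding the true DoA itself."""
    val = 0.0
    for th in thetas:
        for v in grid:
            if not np.isclose(v, th):
                val = max(val, abs(np.vdot(steering(th, n), steering(v, n))))
    return val

# Example: two oblique DoAs on a 1-degree grid (as in set-up 3 of Section VII)
grid = np.deg2rad(np.arange(-90, 90, 1.0))
print(mbc(np.deg2rad([44.0, 52.0]), grid, n=40))
```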
Consider a ULA with n = 40 elements receiving two sources at the straight DoAs $\theta_1 = -6°$ and $\theta_2 = 2°$, or at the oblique DoAs $\theta_1 = 44°$ and $\theta_2 = 52°$. These scenarios correspond to set-up 2 and set-up 3 of Section VII; see also Table II. The angular separation between the DoAs is 8° in both scenarios. In Table I we compute the correlation between each true steering vector and a neighboring steering vector $\mathbf{a}(\vartheta)$ on the grid.

TABLE I. Correlation between the true steering vector at DoAs $\theta_1$ and $\theta_2$ and a steering vector at angle $\vartheta$ on the grid, in the two-source scenario set-ups with either straight or oblique DoAs.

                      θ1     θ2     ϑ    correlation
True DoAs  straight  −6°     2°     —    —
           oblique   44°    52°     —    —

These values validate the fact highlighted in Fig. 3: a target with an oblique DoA w.r.t. the array has a larger maximal correlation (coherence) with the basis steering vectors. This makes it difficult for a sparse recovery method to identify the true steering vector $\mathbf{a}(\theta_i)$ from a spurious steering vector $\mathbf{a}(\vartheta)$ that simply has a very large correlation with the true one. Due to this mutual coherence, it may happen that neither $\mathbf{a}(\theta_i)$ nor $\mathbf{a}(\vartheta)$ is assigned a non-zero coefficient value in $\hat{\boldsymbol\beta}$, or perhaps just one of them in a random fashion.

VII. SIMULATION STUDIES
We consider seven simulation set-ups. The first five set-ups use grid spacing $\Delta\theta = 1°$ (leading to p = 180 look directions in the grid $[\vartheta]$) and the last two employ the sparser grid $\Delta\theta = 2°$ (leading to p = 90 look directions in the grid). The number of sensors in the ULA is n = 40 for set-ups 1-5 and n = 30 for set-ups 6-7. Each set-up has $K \in \{2, 3, 4\}$ sources at different (straight or oblique) DoAs $\boldsymbol\theta = (\theta_1,\ldots,\theta_K)^\top$, and the source waveforms are generated as $s_k = |s_k|\cdot e^{\imath\,\mathrm{Arg}(s_k)}$, where the source powers $|s_k| \in (0,1]$ are fixed for each set-up but the source phases are randomly generated for each Monte Carlo trial as $\mathrm{Arg}(s_k) \sim \mathrm{Unif}(0, 2\pi)$, $k = 1,\ldots,K$. Table II specifies the DoAs and powers of the sources used in the set-ups; the MBC values (15) are also reported for each case.

TABLE II. Details of the set-ups tested in this paper. The first five set-ups have grid spacing $\Delta\theta = 1°$ and the last two $\Delta\theta = 2°$.

Set-up   |s_i|              θ [°]               MBC
1        [—, —, 1]          [—, —, 6]           0.814
2        [—, 1]             [−6, 2]             0.814
3        [—, 1]             [44, 52]            0.927
4        [—, —, 1]          [43, —, 52]         0.927
5        [—, —, —, —]       [—, —, —, —]        0.990
6        [—, —, —, —]       [—, —, —, 22]       0.991
7        [—, —, —, —]       [6, —, —, 18]       0.643
The error terms $\varepsilon_i$ are i.i.d. and generated from a $\mathcal{CN}(0,\sigma^2)$ distribution, where the noise variance $\sigma^2$ depends on the signal-to-noise ratio (SNR) level in decibels (dB), given by $\mathrm{SNR(dB)} = 10\log_{10}(\sigma_s^2/\sigma^2)$, where $\sigma_s^2 = \frac{1}{K}\big\{|s_1|^2 + |s_2|^2 + \cdots + |s_K|^2\big\}$ denotes the average source power. An SNR level of 20 dB is used in this paper unless specified otherwise, and the number of Monte Carlo trials is L = 1000.

In each set-up, we evaluate the performance of all methods in recovering exactly the true support and the source powers. Due to the high mutual coherence and the large differences in source powers, DoA estimation is now a challenging task. A key performance measure is the (empirical) probability of exact recovery (PER) of all K sources, defined as
$$\mathrm{PER} = \mathrm{ave}\{I(\mathcal{A}^* = \hat{\mathcal{A}}_K)\},$$
where $\mathcal{A}^*$ denotes the index set of the true source DoAs on the grid and $\hat{\mathcal{A}}_K$ the found support set, with $|\mathcal{A}^*| = |\hat{\mathcal{A}}_K| = K$, and the average is over all Monte Carlo trials; $I(\cdot)$ denotes the indicator function. We also compute the average root mean squared error (RMSE) of the debiased estimate $\hat{\mathbf{s}} = \arg\min_{\mathbf{s}\in\mathbb{C}^K} \|\mathbf{y} - \mathbf{X}_{\hat{\mathcal{A}}_K}\mathbf{s}\|_2^2$ of the source vector, as $\mathrm{RMSE} = \sqrt{\mathrm{ave}\{\|\mathbf{s}-\hat{\mathbf{s}}\|_2^2\}}$.
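These two figures of merit are straightforward to compute per trial. A minimal sketch (our own helper, assuming the true and estimated supports are index arrays):

```python
import numpy as np

def per_and_rmse(trials):
    """trials: list of (A_true, A_hat, s_true, y, X) tuples, one per Monte
    Carlo run. PER is the fraction of exact support recoveries; RMSE is the
    root mean squared error of the debiased source estimate."""
    hits, sq_errs = [], []
    for A_true, A_hat, s_true, y, X in trials:
        hits.append(set(A_true) == set(A_hat))
        # debiased LSE of the source vector on the estimated support
        s_hat = np.linalg.lstsq(X[:, A_hat], y, rcond=None)[0]
        sq_errs.append(np.sum(np.abs(s_true - s_hat) ** 2))
    return np.mean(hits), np.sqrt(np.mean(sq_errs))
```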
A. Compared methods

This paper compares the SAEN approach with existing well-known greedy methods, namely orthogonal matching pursuit (OMP) and compressive sampling matching pursuit (CoSaMP). Moreover, we also draw comparisons with two special cases of the c-PW-WEN algorithm: the Lasso estimate with K non-zeros (i.e., $\hat{\boldsymbol\beta}(\lambda_K)$ with $\mathbf{w} = \mathbf{1}$ and $\alpha = 1$) and the EN estimator that cherry-picks the best $\alpha$ in the grid (10) (i.e., $\hat{\boldsymbol\beta}(\lambda_K, \alpha_{\mathrm{bst}})$).

It is instructive to compare SAEN with simpler adaptive EN (AEN) approaches that simply use adaptive weights to penalize different coefficients differently, in the spirit of the adaptive Lasso. This helps in understanding the effectiveness of the cleverly chosen weights and the usefulness of the three-step procedure used by the SAEN algorithm. Recall that the first step in an AEN approach is to compute the weights using some initial solution; after obtaining the (adaptive) weights, the final K-sparse AEN solution is computed. We devise three AEN approaches, each using a different initial solution to compute the weights:

1. AEN(LSE) uses the weights $\mathbf{w}^{(\mathrm{LSE})} = \mathbf{1} \oslash |\mathbf{X}^+\mathbf{y}|$, where $\mathbf{X}^+$ is the Moore-Penrose pseudo-inverse of $\mathbf{X}$.
2. AEN(n) employs weights from an initial n-sparse EN solution $\hat{\boldsymbol\beta}(\lambda_n, \alpha)$ at the n-th knot, found by the c-PW-WEN algorithm with $\mathbf{w} = \mathbf{1}$.
3. AEN(3K) instead uses weights calculated from an initial EN solution $\hat{\boldsymbol\beta}(\lambda_{3K}, \alpha)$ having 3K nonzeros, as in step 1 of the SAEN algorithm, but the remaining two steps of SAEN are omitted.

The upper bound for the PER rate of SAEN is the (empirical) probability that the initial solution $\hat{\boldsymbol\beta}_{\mathrm{init}}$ computed in step 1 of Algorithm 3 contains the true support $\mathcal{A}^*$, i.e., the value
$$\mathrm{UB} = \mathrm{ave}\big\{I\big(\mathcal{A}^* \subset \mathrm{supp}(\hat{\boldsymbol\beta}_{\mathrm{init}})\big)\big\}, \qquad (16)$$
where the average is over all Monte Carlo trials. We also compute this upper bound to illustrate the ability of SAEN to pick the true K-sparse support from the original 3K-sparse initial value. For set-up 1 (cf. Table II), the average recovery results for all of the above-mentioned methods are provided in Table III.

TABLE III. Recovery results for set-up 1, illustrating the effectiveness of the three-step SAEN approach compared to its competitors. The SNR level is 20 dB and the upper bound (16) for the PER rate of SAEN is given in parentheses.

        SAEN           AEN(3K)  AEN(n)  AEN(LSE)  EN     Lasso  OMP    CoSaMP
PER     0.864 (0.957)  —        —       0         0.332  0.332  0.477  0.140
RMSE    —              —        —       1.870     1.163  1.163  1.060  35.58
It can be noted that the proposed SAEN outperforms all other methods and weighting schemes, and recovers the true support and powers of the sources effectively. Note that SAEN's upper bound for the PER rate was 95.7% and SAEN reached a PER rate of 86.4%. The results of the AEN approaches validate the need for an accurate initial estimate to construct the adaptive weights. For example, AEN(3K) performs better than AEN(n), but much worse than the SAEN method.

B. Straight and oblique DoAs
Set-up 2 and set-up 3 correspond to targets at straight and oblique DoAs, respectively. Performance results of the sparse recovery algorithms are tabulated in Table IV. As can be seen, the upper bound for the PER rate of SAEN is a full 100%, which means that the true support is correctly included in the 3K-sparse solution computed at step 1 of the algorithm. For set-up 2 (straight DoAs), all methods have almost full PER rates except CoSaMP, with a 67.8% rate. The performance of all estimators except SAEN changes drastically in set-up 3 (oblique DoAs), where SAEN still achieves nearly perfect exact recovery. In set-up 4, a third source at $\theta_1 = 43°$ is included and the variation of the source powers is slightly larger. As can be noted from Table IV, the PER rates of the greedy algorithms, OMP and CoSaMP, decline to an outstandingly low 0%. This is very different from the PER rates they had in set-up 3, which contained only two sources: the inclusion of a third source from a DoA similar to the other two completely ruined their accuracy. This is in deep contrast with the SAEN method, which still achieves a PER rate of 75%, more than twice the PER rate achieved by the Lasso. SAEN again has the lowest RMSE values.

TABLE IV. Recovery results for set-ups 2-4. For oblique DoAs (set-ups 3 and 4), the SAEN method outperforms the other methods, and it has perfect recovery results for set-up 2 (straight DoAs). The SNR level is 20 dB. The upper bound (16) for the PER rate of SAEN is given in parentheses.

Set-up 2, two straight DoAs:
        SAEN            EN    Lasso  OMP   CoSaMP
PER     1.000 (1.000)   —     —      —     0.678
RMSE    —               —     —      —     —

Set-up 3, two oblique DoAs:
        SAEN            EN    Lasso  OMP   CoSaMP
PER     — (1.000)       —     —      —     —
RMSE    —               —     —      —     —

Set-up 4, three oblique DoAs:
        SAEN            EN    Lasso  OMP   CoSaMP
PER     0.750 (0.776)   —     —      0     0
RMSE    —               —     —      —     —
In summary, the recovery results for set-ups 1-4 (which express different degrees of basis coherence, proximity of target DoAs, and variation of source powers) clearly illustrate that the proposed SAEN performs very well in identifying the true support and the powers of the sources, and always outperforms the commonly used benchmark sparse recovery methods, namely the Lasso, EN, OMP and CoSaMP, by a significant margin. It is also noteworthy that the EN often achieved better PER rates than the Lasso, mainly due to its group selection ability. As a specific example of this particular feature, Figure 4 shows the solution paths of the Lasso and the EN for one particular Monte Carlo trial, where the EN correctly chooses the true DoAs but the Lasso fails to select all the correct DoAs. In this particular instance, the EN tuning parameter was $\alpha = 0.9$. This is the reason behind the success of our c-PW-WEN algorithm, which is the core computational engine of SAEN.
C. Off-grid sources
Set-ups 5 and 6 explore the case where the target DoAs are off the grid. Note that set-up 5 uses the finer grid spacing $\Delta\theta = 1°$ compared to set-up 6 with $\Delta\theta = 2°$. Both set-ups contain four target sources with largely varying source powers. In the off-grid case, one would like the CBF method to localize the targets to the nearest DoAs in the angular grid $[\vartheta]$ used to construct the array steering matrix. Therefore, in the off-grid case, the PER rate refers to the event that the CBF method selects the K-sparse support corresponding to the DoAs on the grid that are closest in distance to the true DoAs. Table V provides the recovery results. As can be seen, SAEN again performs very well, outperforming the Lasso and EN, whereas OMP and CoSaMP completely fail in selecting the nearest grid points.

FIG. 4. (Color online) The Lasso and EN solution paths (upper panels) and the respective DoA solutions at the knot $\lambda_K$, for sources at 43° and 52°. Observe that the Lasso fails to recover the true support, but the EN successfully picks the true DoAs. The EN tuning parameter was $\alpha = 0.9$.

TABLE V. Performance results of the CBF methods for set-ups 5 and 6, where the target DoAs are off the grid. Here the PER rate refers to the event that the CBF method selects the K-sparse support corresponding to the DoAs on the grid that are closest to the true DoAs. The upper bound (16) for the PER rate of SAEN is given in parentheses. The SNR level is 20 dB.

Set-up 5, four off-grid straight DoAs:
        SAEN          EN    Lasso  OMP   CoSaMP
PER     — (0.999)     —     —      0     0

Set-up 6, four off-grid oblique DoAs:
        SAEN          EN    Lasso  OMP   CoSaMP
PER     — (0.794)     —     —      0     0

FIG. 5. (Color online) PER rates of the CBF methods at different SNR levels for set-up 7.
D. More targets and varying SNR levels
Next we consider set-up 7 (cf. Table II), which contains K = 4 sources. The first three sources are at straight DoAs and the fourth one is at a DoA with modest obliqueness ($\theta_4 = 18°$). We now compute the PER rates of the methods as a function of the SNR. From the PER rates shown in Fig. 5 we again notice that SAEN clearly outperforms all of the other methods; the upper bound (16) of the PER rate of SAEN is also plotted. Both greedy algorithms, OMP and CoSaMP, perform very poorly even at high SNR levels. The Lasso and EN attain better recovery results than the greedy algorithms, with the EN again performing better than the Lasso due to the additional flexibility offered by the EN tuning parameter and its ability to cope with correlated steering (basis) vectors. SAEN recovers the exact true support in most of the cases due to its step-wise adaptation using cleverly chosen weights. Furthermore, the improvement in PER rates offered by SAEN becomes larger as the SNR level increases, and SAEN is close to the theoretical upper bound of the PER rate in the higher SNR regime.

VIII. CONCLUSIONS
We developed the c-PW-WEN algorithm, which computes weighted elastic net solutions at the knots of the penalty parameter over a grid of EN tuning parameter values. c-PW-WEN also computes the weighted Lasso as a special case (i.e., the solution at $\alpha = 1$), and the adaptive EN (AEN) is obtained when adaptive (data-dependent) weights are used. We then proposed a novel SAEN approach that uses the c-PW-WEN method as its core computational engine and a three-step adaptive weighting scheme in which the sparsity level is decreased from 3K to K. Simulations illustrated that SAEN performs better than the simpler adaptive EN approaches. Furthermore, we illustrated that the 3K-sparse initial solution computed at step 1 of SAEN provides smart weights for the further steps and includes the true K-sparse support with high accuracy; the proposed SAEN algorithm then accurately retains the true support at each step.
Using the K-sparse Lasso solution computed directly from the Lasso path at the K-th knot fails to provide exact support recovery in many cases, especially under high basis coherence and lower SNR. Greedy algorithms often fail in the face of high mutual coherence (due to dense grid spacing or oblique target DoAs) or low SNR. This is mainly due to the fact that their performance depends heavily on their ability to accurately detect the maximal correlation between the measurement vector $\mathbf{y}$ and the basis vectors (column vectors of $\mathbf{X}$). Our simulation study also showed that their performance (in terms of PER rate) deteriorates when the number of targets increases; in the off-grid case, the greedy algorithms also failed to find the nearby grid points.

Finally, the SAEN algorithm performed better than all other methods in each set-up, and the improvement was more pronounced in the presence of high mutual coherence. This is due to the ability of SAEN to include the true support correctly at all three steps of the algorithm. Our MATLAB package that implements the proposed algorithms is freely available online. The package also contains a MATLAB live script demo on how to use the method in the CBF problem, along with an example from simulation set-up 4 presented in the paper.

ACKNOWLEDGMENTS
The research was partially supported by the Academy of Finland grant no. 298118, which is gratefully acknowledged.

R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological) 58(1), 267-288 (1996).
H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society, Series B (Statistical Methodology) 67(2), 301-320 (2005).
T. Yardibi, J. Li, P. Stoica, and L. N. Cattafesta III, "Sparsity constrained deconvolution approaches for acoustic source mapping," The Journal of the Acoustical Society of America 123(5), 2631-2642 (2008).
A. Xenaki, E. Fernandez-Grande, and P. Gerstoft, "Block-sparse beamforming for spatially extended sources in a Bayesian formulation," The Journal of the Acoustical Society of America 140(3), 1828-1838 (2016).
D. Malioutov, M. Cetin, and A. S. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Transactions on Signal Processing 53(8), 3010-3022 (2005).
G. F. Edelmann and C. F. Gaumond, "Beamforming using compressive sensing," The Journal of the Acoustical Society of America 130(4), EL232-EL237 (2011).
A. Xenaki, P. Gerstoft, and K. Mosegaard, "Compressive beamforming," The Journal of the Acoustical Society of America 136(1), 260-271 (2014).
P. Gerstoft, A. Xenaki, and C. F. Mecklenbräuker, "Multiple and single snapshot compressive beamforming," The Journal of the Acoustical Society of America 138(4), 2003-2014 (2015).
Y. Choo and W. Seong, "Compressive spherical beamforming for localization of incipient tip vortex cavitation," The Journal of the Acoustical Society of America 140(6), 4085-4090 (2016).
A. Das, W. S. Hodgkiss, and P. Gerstoft, "Coherent multipath direction-of-arrival resolution using compressed sensing," IEEE Journal of Oceanic Engineering 42(2), 494-505 (2017).
S. Fortunati, R. Grasso, F. Gini, M. S. Greco, and K. LePage, "Single-snapshot DOA estimation by using compressed sensing," EURASIP Journal on Advances in Signal Processing 2014(1), 120 (2014).
E. Ollila, "Nonparametric simultaneous sparse recovery: An application to source localization," in Proc. European Signal Processing Conference (EUSIPCO'15), Nice, France (2015), pp. 509-513.
E. Ollila, "Multichannel sparse recovery of complex-valued signals using Huber's criterion," in Proc. Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing (CoSeRa'15), Pisa, Italy (2015), pp. 32-36.
M. N. Tabassum and E. Ollila, "Single-snapshot DoA estimation using adaptive elastic net in the complex domain," in Proc. Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing (CoSeRa'16) (2016), pp. 197-201.
T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, 2015).
M. Osborne, B. Presnell, and B. Turlach, "A new approach to variable selection in least squares problems," IMA Journal of Numerical Analysis 20(3), 389-403 (2000).
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression (with discussion)," The Annals of Statistics 32(2), 407-499 (2004).
M. N. Tabassum and E. Ollila, "Pathwise least angle regression and a significance test for the elastic net," in Proc. European Signal Processing Conference (EUSIPCO'17) (2017), pp. 1309-1313.
S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, 2013).
H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association 101(476), 1418-1429 (2006).
A. E. Hoerl and R. W. Kennard, "Ridge regression: biased estimation for nonorthogonal problems," Technometrics 12(1), 55-67 (1970).
S. Rosset and J. Zhu, "Piecewise linear regularized solution paths," The Annals of Statistics 35(3), 1012-1030 (2007).
R. J. Tibshirani and J. Taylor, "The solution path of the generalized lasso," The Annals of Statistics 39(3), 1335-1371 (2011).
R. J. Tibshirani, "The lasso problem and uniqueness," Electronic Journal of Statistics 7, 1456-1490 (2013).
A. Panahi and M. Viberg, "Fast candidate points selection in the lasso path," IEEE Signal Processing Letters 19(2), 79-82 (2012).
K. L. Gemba, W. S. Hodgkiss, and P. Gerstoft, "Adaptive and compressive matched field processing," The Journal of the Acoustical Society of America 141(1), 92-103 (2017).
A. B. Gershman, C. F. Mecklenbräuker, and J. F. Böhme, "Matrix fitting approach to direction of arrival estimation with imperfect spatial coherence of wavefronts," IEEE Transactions on Signal Processing 45(7), 1894-1899 (1997).
J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory 53(12), 4655-4666 (2007).
D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis 26(3), 301-321 (2009).