Structured LISTA for Multidimensional Harmonic Retrieval
Rong Fu, Yimin Liu∗, Tianyao Huang, and Yonina C. Eldar

Abstract—The learned iterative shrinkage thresholding algorithm (LISTA), which adopts deep learning techniques to learn optimal algorithm parameters from labeled training data, can be successfully applied to small-scale multidimensional harmonic retrieval (MHR) problems. However, LISTA is computationally demanding for large-scale MHR problems because the size of the learned mutual inhibition matrix grows quadratically with the signal length. These large matrices consume costly memory and computation resources and require a huge amount of labeled data for training, restricting the applicability of the LISTA method. In this paper, we show that the mutual inhibition matrix of an MHR problem naturally has a Toeplitz structure, which means that the degrees of freedom (DoF) of the matrix can be reduced from a quadratic order to a linear order. By exploiting this characteristic, we propose a structured LISTA-Toeplitz network, which imposes a Toeplitz structure restriction on the mutual inhibition matrices and applies linear convolution instead of the matrix-vector multiplication involved in the traditional LISTA network. Both simulations and a field test of air target detection with radar are carried out to validate the performance of the proposed network. For small-scale MHR problems, LISTA-Toeplitz exhibits close or even better recovery accuracy than traditional LISTA, while significantly reducing the network complexity and requiring much less training data. For large-scale MHR problems, where LISTA is difficult to implement due to the huge size of the mutual inhibition matrices, the proposed LISTA-Toeplitz still enjoys desirable recovery performance.
Index Terms—Compressed sensing, multidimensional harmonic retrieval, iterative shrinkage thresholding algorithm, learned ISTA, Toeplitz structure.
I. INTRODUCTION
Multidimensional harmonic retrieval has been extensively studied in the signal processing literature. This problem appears in a wide range of applications such as wireless communication channel estimation [2, 3], beampattern synthesis [4], direction-of-arrival (DOA) estimation [5–7] and range-Doppler estimation [8]. Standard methods for MHR include fast Fourier transform (FFT) spectral estimation [9] (periodogram), Welch's method [10] (also called the modified periodogram) and subspace-based methods [11, 12]. According to the Nyquist criterion, harmonic signals are generally uniformly sampled at or above the Nyquist rate to avoid aliasing of the spectrum.
This work is supported by the National Natural Science Foundation of China (Grant No. 61801258). Part of this paper was presented at the IEEE Radar Conference (RadarConf), Boston, MA, USA, April 2019 [1]. R. Fu, Y. Liu and T. Huang are with the Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China (e-mail: [email protected], {yiminliu, huangtianyao}@tsinghua.edu.cn). Yonina C. Eldar is with the Faculty of Math and CS, Weizmann Institute of Science, Rehovot 7610001, Israel (e-mail: [email protected]).

In various applications, it is desirable to minimize the number of Nyquist samples needed for multidimensional harmonic retrieval (MHR) [13]. Especially for MHR problems with large dimensions, as the size of the measurements increases with the dimension, the demand for sample reduction becomes more significant. Take DOA estimation with antenna arrays as an example, where each active antenna transmits and receives signals reflected from one or more moving targets to estimate the direction angles. A smaller number of samples implies fewer active antennas, lowering the hardware cost [14].

Sparse recovery or compressed sensing (CS) [15] has been suggested to reduce the measurement data size when the sinusoids are spectrally sparse, namely when there is a small number of harmonics. Particularly, consider a harmonic retrieval problem, where a time-domain signal, consisting of K distinct complex sinusoids, has M Nyquist samples. When it is assumed that the sinusoid frequencies lie precisely on a set of discrete grids, the signal of interest can be sparsely represented by a discrete basis. CS suggests that the sparse signal can be recovered from a random subset, of reduced size N = O(K log M), out of the M Nyquist samples [13]. The recovery is completed by finding a sparse representation of the time-domain signal over a discrete dictionary matrix Φ ∈ C^{N×M}, of which each column corresponds to a predefined grid point. To be specific, CS formulates MHR as a linear decoding problem

y = Φx + w,   (1)

where y ∈ C^N is the vector of sub-Nyquist samples, x ∈ C^M denotes the sparse spectral representation of the unknown sinusoids, and w ∈ C^N is the additive noise vector.

A myriad of methods, including greedy algorithms and optimization-based approaches [16–20], have been developed to solve such a CS problem. Among these algorithms, a key concern for applications is real-time realization. While greedy algorithms may lead to a non-optimal solution, optimization-based approaches yield near-optimal estimates in both theory and practice [18]. One of the well-known ℓ₁-norm regularization techniques is the iterative shrinkage thresholding algorithm (ISTA) [19], which has been proven to have a desirable global rate of convergence to the optimum solution [20]. An accelerated method, namely fast ISTA (FISTA), speeds up the rate of convergence by adding a momentum term [19]. The success of ISTA and its variants hinges upon the computation of the proximal operator, which is efficient in a wide range of applications [17]. However, it takes many iterations for ISTA or FISTA to reach a sparse representation, which inevitably gives rise to high computational cost, thus restricting the application of CS. The approximation ability of deep learning motivates us to consider the possibility of recovering sparse signals at a small computational cost through a neural network.
By unfolding the iterations of ISTA, learned ISTA (LISTA) was proposed by Gregor and LeCun [21], and has been demonstrated to be superior to ISTA in convergence speed in both theoretical analysis and empirical results [1, 22, 23]. Following this idea, many researchers [22–25] unfold other iterative algorithms, such as approximate message passing (AMP) and alternating direction method of multipliers (ADMM) algorithms, and incorporate them into deep networks, which have already shown their efficiency in many sparse recovery problems. It is common for these reconstruction networks to learn network variables that capture the model information in the dictionary matrix Φ. However, in some cases like large-scale CS problems, the dictionary Φ has so many columns that there is a huge number of network variables to learn. For example, the size of one weight matrix (termed the mutual inhibition matrix) to learn in LISTA, which depends on the Gram matrix Φ^H Φ, is up to M². It is very difficult to train a network with high-dimensional variables: not only does it take much more training time and memory, but we also need to provide a larger training dataset to avoid overfitting.

To reduce the number of trainable parameters, many convolutional extensions of LISTA have been proposed for applications such as optical image super-resolution, denoising or inpainting [26–28]. These networks exploit the fact that the dictionary matrices in these applications are Toeplitz matrices or concatenations of Toeplitz matrices, which allows one to impose a Toeplitz structure constraint on the learned matrices in LISTA and to replace the matrix-multiplication operations in LISTA with convolutions. Since the degrees of freedom (DoF) of an M × M Toeplitz matrix are O(M), much less than the counterpart of a general matrix of the same dimension, such convolutional networks significantly reduce the number of variables to learn. In MHR problems, however, the dictionaries have a Fourier instead of Toeplitz structure, so these convolutional extensions are not directly applicable. When we implement a heuristic convolutional network, termed ConvLISTA, by directly imposing a convolutional prior on the dictionary as in [26–28], the recovery performance of such a convolutional extension turns out to be poor.

In this paper, we put forward a structured network called LISTA-Toeplitz for MHR problems, in order to address the high-dimensional setting. In MHR, the Gram matrix Φ^H Φ (rather than the matrix Φ itself) is a Toeplitz matrix. We recall that this Gram matrix composes the mutual inhibition matrix in LISTA, whose entries are learned from training data. Tailored for MHR problems, we shrink the traditional LISTA by imposing a Toeplitz structure constraint on the mutual inhibition matrix and replacing the corresponding matrix-multiplication operations with convolution filters, while the remaining components of LISTA are inherited. This constructs our LISTA-Toeplitz, and is different from previous approaches (e.g., ConvLISTA) where the Toeplitz structure and convolution operation are related to the dictionary itself.
The proposed network yields fast and accurate reconstruction on both synthetic and real data, and outperforms ConvLISTA. The main contributions of this paper are summarized as follows:

1) We exploit the Toeplitz structure in one-dimensional (1D) harmonic retrieval problems, as well as the doubly-block Toeplitz structure in two-dimensional (2D) harmonic retrieval problems, and thus propose a new structured network called LISTA-Toeplitz, which imposes a Toeplitz structure restriction on the learned mutual inhibition matrices in LISTA. LISTA-Toeplitz significantly shrinks the number of network variables, simplifying the complexity of the network and making the network more trainable.

2) We use linear convolution to calculate the multiplication by the Toeplitz matrices, which further facilitates the realization of the networks. Using convolution instead of matrix-vector multiplication relieves the storage burden, while fast algorithms for linear convolution reduce computation complexity. Besides, off-the-shelf neural network toolboxes provide efficient convolution operators, easing the construction of such a structured network.

3) Both simulated and real data validate the effectiveness of the proposed network and demonstrate its better recovery performance over the traditional LISTA and its previous convolutional extension, i.e., the ConvLISTA network.

The rest of this paper is organized as follows. The signal model in MHR, the conventional CS solutions, and our motivations to use LISTA and LISTA-Toeplitz are introduced in Section II. In Section III we explore the Toeplitz structure in harmonic retrieval problems and develop the corresponding LISTA-Toeplitz network. To show the effectiveness of the proposed networks, in Section IV we apply our network to 1D and 2D harmonic retrieval problems with both synthetic and real data. Section V concludes the paper.

Notation: The symbol C represents the set of complex numbers. Correspondingly, C^M and C^{M×N} are the sets of M-dimensional (M-D) vectors and M × N matrices of complex numbers, respectively. The subscripts [·]_i and [·]_{i,k} are used for the i-th entry of a vector and the i-th row, k-th column entry of a matrix. We let [·] and {·} denote a vector/matrix and a set, respectively. We use a set in the subscript to construct a vector/matrix or set, e.g., for a set N := {0, 1, . . . , N − 1} and vectors x_n ∈ C^M, n ∈ N, [x_n]_{n∈N} and {x_n}_{n∈N} represent the matrix [x_0, x_1, . . . , x_{N−1}] ∈ C^{M×N} and the set {x_0, x_1, . . . , x_{N−1}}, respectively. The transpose and Hermitian transpose are written as the superscripts (·)^T and (·)^H, respectively. For a vector, ‖·‖_0 and ‖·‖_q denote the ℓ_0 "norm" and the ℓ_q norm, q ≥ 1, respectively. Operators ∗, ◦ and ⊗ represent linear convolution, element-wise multiplication and Kronecker product [29], respectively.

II. MOTIVATION AND PROBLEM FORMULATION
Here, we introduce the motivation for proposing the LISTA-Toeplitz algorithm, i.e., accelerating ISTA-based methods for MHR problems. To this aim, we first review preliminaries on the typical ISTA approach and its variations, including FISTA and LISTA, in Subsection II-A. Then, in Subsection II-B, we formulate the MHR signal model and show that the inherent Toeplitz structure in the MHR model can be exploited to improve existing iterative algorithms, yielding the LISTA-Toeplitz algorithm. The flow of LISTA-Toeplitz will be detailed in Section III.
A. Preliminaries
In this subsection, we review the procedures of ISTA, FISTA and LISTA, which are designed to solve the CS problem (1). In the framework of CS, the dictionary matrix Φ in (1) is usually under-determined. ISTA methods harness the sparse prior to estimate x by using the regularized regression

min_x f(Φx, y) + λ‖x‖₁,   (2)

where f(Φx, y) = ½‖y − Φx‖₂² measures the error, and λ is a regularization parameter controlling the sparsity penalty characterized by the ℓ₁ norm.

The standard ISTA iteratively performs Lipschitz gradient descent with respect to the cost function in (2) [19, 30]. Specifically, the sparse solution in the (t+1)-th iteration, denoted x^(t+1), is pursued by the following recursion:

x^(t+1) = S_{λ/L}( (1/L) Φ^H y + (I − (1/L) Φ^H Φ) x^(t) ),   (3)

where λ is the regularization parameter in (2), L is the Lipschitz constant, given by L = λ_max(Φ^H Φ), and λ_max(·) represents the maximum eigenvalue of a Hermitian matrix. The element-wise soft-threshold operator S is defined as

[S_θ(u)]_i = sign([u]_i) (|[u]_i| − θ)_+,   (4)

where sign(·) returns the sign of a scalar, (·)_+ means max(·, 0), and θ is the threshold.

ISTA is accurate in recovering sparse signals. However, it may take thousands of iterations to converge [18], which has motivated the development of many variants of ISTA with a higher convergence rate, including the well-known FISTA and LISTA introduced below. FISTA, described in [19], can be viewed as a Nesterov-accelerated version of ISTA, which takes roughly one order of magnitude fewer iterations than ISTA [18].

To further accelerate ISTA, Gregor and LeCun proposed a neural network containing only several layers, named LISTA [21]. Each layer is an unfolded version of the ISTA iteration (3) that can be rewritten as

x^(t+1) = S_{θ^(t)}( W_e^(t) y + W_g^(t) x^(t) ),   (5)

where the terms λ/L, (1/L)Φ^H and (I − (1/L)Φ^H Φ) in (3) are replaced by θ^(t), W_e^(t) ∈ C^{M×N} and W_g^(t) ∈ C^{M×M}, respectively. The matrices W_e^(t) and W_g^(t) are named the filter matrix and the mutual inhibition matrix, respectively. Fig. 1 illustrates a T-layer LISTA network structure.

As opposed to ISTA and FISTA, where the parameters in each iteration are identical and are calculated analytically or set manually, LISTA treats the tuple (W_e^(t), W_g^(t), θ^(t)) in each layer, t = 0, 1, . . . , T − 1, as variables to learn from some predefined training data using the back-propagation algorithm. The network performs better when the parameter values are allowed to differ across layers. We may omit the superscript (t) in the following notation for simplicity. Numerical results show that LISTA can achieve virtually the same accuracy as the original ISTA using nearly two orders of magnitude fewer iterations [22, 31].

Nevertheless, a challenge in LISTA is that there are many variables to learn. A large neural network leads to a high computational burden. Besides, it requires careful tuning of hyper-parameters such as learning rates and initialization values to avoid training problems such as over-fitting [32] and gradient vanishing [33]. In CS problems with a large-scale sparse signal x ∈ C^M, the mutual inhibition matrix W_g of size M × M is much larger than W_e and thus takes a dominant role among all the variables, especially when W_g is allowed to vary across the layers. This motivates us to tailor the network to a specific problem so that we can impose restrictions on W_g in order to reduce the computational burden.
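To make the recursions concrete, the following NumPy sketch implements one ISTA iteration (3) and one unfolded LISTA layer (5) for complex-valued data. The function names and the numerical guard `1e-12` are our own illustrative choices, not part of the original algorithms; the soft-threshold follows the magnitude-shrinkage form that the paper later gives in (23).

```python
import numpy as np

def soft_threshold(u, theta):
    # Element-wise soft-threshold (4) for complex entries:
    # shrink the magnitude by theta, keep the phase (cf. (23)).
    mag = np.abs(u)
    return u * np.maximum(mag - theta, 0.0) / np.maximum(mag, 1e-12)

def ista_step(x, y, Phi, lam, L):
    # One ISTA iteration (3): gradient step on 0.5*||y - Phi x||^2
    # with step 1/L, followed by shrinkage with threshold lam/L.
    grad_step = x + (Phi.conj().T @ (y - Phi @ x)) / L
    return soft_threshold(grad_step, lam / L)

def lista_layer(x, y, W_e, W_g, theta):
    # One unfolded LISTA layer (5): W_e, W_g and theta are learned per layer.
    return soft_threshold(W_e @ y + W_g @ x, theta)
```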
For certain CS problems such as MHR, the mutual inhibition matrix naturally has a Toeplitz structure, which can be exploited to reduce the dimension of the neural networks, as detailed in the subsequent subsection.

B. Toeplitz Structure in MHR Problems
Here, we introduce the signal model of MHR problems, associated with many practical applications including 1D and 2D DOA estimation and range-Doppler recovery in radar systems. Then, we reveal specific characteristics of the dictionary matrices in MHR, which help us reduce the dimensions of the learned variables in LISTA. The Toeplitz structures of 1D and 2D harmonic retrieval problems are discussed in Subsections II-B1 and II-B2, respectively.
1) Toeplitz structure of 1D harmonic retrieval:
First, we consider a 1D harmonic retrieval problem, following the signal model presented in [13]. Assume that there are K distinct complex-valued sinusoids, with frequencies denoted by f_k, k = 1, 2, . . . , K. The observation taken at the i-th time instant is a superposition of these K complex-valued sinusoids, given by

y_i = Σ_{k=1}^{K} a_k e^{j2π f_k i},   (6)

where a_k denotes the complex amplitude of the k-th sinusoid. We uniformly discretize the frequencies into M grids and assume that the K sinusoids are on the grids. Thus we recast (6) in matrix form as

y⋆ = Ψx,   (7)

where x ∈ C^M contains only K nonzero elements, corresponding to the complex amplitudes of the K sinusoids, and Ψ is the M × M discrete Fourier matrix F_M whose (i, m)-th entry is defined as

[Ψ]_{i,m} = [F_M]_{i,m} = e^{j2π (m/M) i},  i, m ∈ M,   (8)

where M := {0, 1, . . . , M − 1}.

Fig. 1. Block diagram of the LISTA network, in which the red broken-line boxes indicate the whole process of one ISTA iteration.
Furthermore, we consider a compressive measurement, where only N entries of y⋆ are observed, with N ≪ M. To store the indices of the selected entries of y⋆, we define a subset Ω of cardinality N randomly chosen from the set M. Then we use a row-subsampling matrix R that selects the N rows of Ψ corresponding to the elements in Ω, i.e., [R]_{n,m} = 1, where m is the n-th element of Ω, while all other entries in the n-th row are zero. The sub-sampled observations are denoted by y ∈ C^N, i.e.,

y = Ry⋆ = RΨx.   (9)

Here, we use Φ = RΨ to represent the new dictionary matrix, consisting of the N sub-sampled rows of the full dictionary Ψ. The Gram matrix of this dictionary matrix is Φ^H Φ = F_M^H R^H R F_M, which can be formulated as

Φ^H Φ = Σ_{m=0}^{M−1} 1_Ω(m) φ_m φ_m^H,   (10)

where 1_Ω(m) is an indicator function which equals 1 when m ∈ Ω and zero otherwise, and φ_m is the m-th column of the Fourier matrix F_M. It can be verified that φ_m φ_m^H is a Hermitian Toeplitz matrix. As a consequence, the Gram matrix above can be viewed as a sum of N Hermitian Toeplitz matrices, which is still a Hermitian Toeplitz matrix. Note that for a Hermitian Toeplitz matrix, the DoF is M. If we disregard the Hermitian symmetry, the DoF is slightly larger, 2M − 1. In both cases, the DoF are much less than M², the DoF of a general matrix without such structure, which can be exploited to reduce the complexity of the networks in LISTA.
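As a quick numerical sanity check of (10), the following sketch (with arbitrary illustrative sizes M = 64, N = 16) builds the sub-sampled Fourier dictionary Φ = RΨ and verifies that its Gram matrix is constant along every diagonal, i.e., Toeplitz:

```python
import numpy as np

M, N = 64, 16
rng = np.random.default_rng(0)

# Full M x M DFT dictionary (8): [Psi]_{i,m} = exp(j*2*pi*m*i/M).
i, m = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
Psi = np.exp(2j * np.pi * i * m / M)

# Random row subsampling R as in (9): keep N of the M rows.
Omega = rng.choice(M, size=N, replace=False)
Phi = Psi[Omega, :]

# The Gram matrix (10) should be Hermitian Toeplitz: every entry
# depends only on the index difference between row and column.
G = Phi.conj().T @ Phi
for d in range(-(M - 1), M):
    diag = np.diagonal(G, offset=d)
    assert np.allclose(diag, diag[0])   # constant along each diagonal
print("Gram matrix is Toeplitz; DoF:", 2 * M - 1, "vs", M * M)
```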
2) Doubly-block Toeplitz structure of 2D harmonic retrieval:
For 2D harmonic retrieval, the corresponding dictionary matrix is Φ = R(F_{M1} ⊗ F_{M2}). The Gram matrix can be formulated as

Φ^H Φ = (F_{M1} ⊗ F_{M2})^H R^H R (F_{M1} ⊗ F_{M2})
 (a)= Σ_{m1=0}^{M1−1} Σ_{m2=0}^{M2−1} 1_Ω((m1, m2)) (φ_{m1} ⊗ ψ_{m2})(φ_{m1}^H ⊗ ψ_{m2}^H)
 (b)= Σ_{m1=0}^{M1−1} Σ_{m2=0}^{M2−1} 1_Ω((m1, m2)) (φ_{m1} φ_{m1}^H) ⊗ (ψ_{m2} ψ_{m2}^H),

where φ_{m1} and ψ_{m2} denote the m1-th and m2-th columns of the Fourier matrices F_{M1} and F_{M2}, respectively. Equality (a) is a consequence of the fact that taking the complex conjugate or transpose before carrying out the Kronecker product yields the same result as doing so afterward, and (b) stems directly from the mixed-product property of the Kronecker product [29].

Similarly, both A := φ_{m1} φ_{m1}^H ∈ C^{M1×M1} and B := ψ_{m2} ψ_{m2}^H ∈ C^{M2×M2} are Hermitian Toeplitz matrices. Thus, the Toeplitz matrix A can be represented by a (2M1 − 1)-D vector, denoted by [a_l]_{l∈M1*}, where M1* := {−M1 + 1, −M1 + 2, . . . , M1 − 1} and we use the asterisk notation in the superscript to distinguish it from the set M_p. Particularly, the (i, j)-th element of A can be denoted as [A]_{i,j} = a_{i−j}. According to the definition of the Kronecker product, we have

A ⊗ B =
⎡ C_0       C_{−1}  ⋯    C_{−M1+1} ⎤
⎢ C_1       C_0     ⋱    ⋮         ⎥
⎢ ⋮         ⋱       ⋱    C_{−1}    ⎥
⎣ C_{M1−1}  ⋯       C_1  C_0       ⎦,

where C_i = a_i B, i ∈ M1*.

The matrix A ⊗ B has a so-called doubly-block Toeplitz structure [34, 35], i.e., each block of the matrix is itself a Toeplitz matrix and is also repeated down the diagonals of the whole matrix. In the literature, such a matrix is also called a two-fold block Toeplitz matrix [13] or a Toeplitz-block-Toeplitz matrix [36]. After the summation over m1 and m2, the doubly-block Toeplitz structure is preserved, implying that the Gram matrix in the 2D MHR problem is also doubly-block Toeplitz. Observing the structure of Φ^H Φ, we find that it is constructed from two Hermitian Toeplitz matrices, whose DoFs are 2M1 − 1 and 2M2 − 1, respectively. Consequently, the DoF of the Gram matrix Φ^H Φ is (2M1 − 1)(2M2 − 1), much less than M² = M1² M2², the number of elements in Φ^H Φ, indicating the possibility of compressing the networks in the original LISTA.

The discussions in Subsections II-B1 and II-B2 reveal that the Gram matrix Φ^H Φ possesses a Toeplitz or Toeplitz-related structure. Hence, the mutual inhibition matrix W_g to learn, which corresponds to (I − (1/L)Φ^H Φ), is also Toeplitz structured and thus compressible. Inspired by this phenomenon in MHR problems, we propose a heuristically structured network called LISTA-Toeplitz by imposing a Toeplitz structure on W_g in LISTA. Such a structure-imposing approach benefits the neural network through compression and model reduction in space complexity, and gains high sample efficiency in the training process. In the following Section III, we explain how to build the LISTA-Toeplitz network.

III. LISTA-TOEPLITZ NETWORK DESIGN
As shown in Section II-B, 1D/2D harmonic retrieval problems naturally possess a Toeplitz/doubly-block Toeplitz structure in the Gram matrix Φ^H Φ. In this section, we impose this Toeplitz structure restriction on the corresponding variables of the LISTA network, yielding the design of the LISTA-Toeplitz network. The motivations for using the Toeplitz structure in building the network architecture are multi-fold: 1) neural networks with structured weight matrices obtain a model-order reduction in the number of network variables, so that they can potentially be applied to large-scale sparse recovery problems; 2) by taking advantage of the model information, such a model-based network is much more efficient to train and more easily accomplishes the desired result for specific problems, especially when the amount of training data is limited. The rest of this section is devoted to detailed introductions of the LISTA-Toeplitz networks for 1D and 2D harmonic retrieval problems, in Subsections III-A and III-B, respectively.

A. 1D LISTA-Toeplitz Network
Here, we consider 1D harmonic retrieval problems and design the LISTA-Toeplitz network by exploiting the Toeplitz structure of the mutual inhibition matrix W_g ∈ C^{M×M}, where M is the number of 1D grid points.

Since an M-dimensional Toeplitz matrix can be represented by a (2M − 1)-dimensional vector, we denote such a vector by h := [h_m]^T_{m∈M*} ∈ C^{2M−1}. Consequently, the matrix W_g is expressed by [W_g]_{i,k} = h_{i−k}, i, k ∈ M, as shown in the following equation:

W_g =
⎡ h_0      h_{−1}  h_{−2}  ⋯    h_{−M+1} ⎤
⎢ h_1      h_0     h_{−1}  ⋱    ⋮        ⎥
⎢ h_2      h_1     ⋱       ⋱    h_{−2}   ⎥
⎢ ⋮        ⋱       ⋱       ⋱    h_{−1}   ⎥
⎣ h_{M−1}  ⋯       h_2     h_1  h_0      ⎦.   (11)

Here, we disregard the Hermitian structure of W_g to achieve a compromise between the expressive power and the compactness of the network architecture.

Based on the Toeplitz structure of W_g, the multiplication W_g x in (5) can be expressed as a linear convolution between two vectors and realized by off-the-shelf toolboxes [37, 38]. To see this, note that the linear convolution between h and x ∈ C^M, denoted h ∗ x, yields an M-dimensional vector with entries given by

[h ∗ x]_i = Σ_{k=0}^{M−1} h_{i−k} [x]_k,  i ∈ M.   (12)

The right-hand side of (12) also equals [W_g x]_i, according to the definition of W_g in (11) and the rule of matrix multiplication. Hence, we have

W_g x = h ∗ x.   (13)

Substituting (13) into (5) yields a proximal mapping using a linear convolution operation, given by

x^(t+1) = S_{θ^(t)}( W_e y + h ∗ x^(t) ).   (14)

Based on (14), we replace each layer of the standard LISTA with a structured network, which yields the 1D version of the proposed LISTA-Toeplitz. The whole framework of our proposed LISTA-Toeplitz network is given in Fig. 2.

Comparing Figs. 1 and 2, we note that the difference between the conventional LISTA and LISTA-Toeplitz lies in the realization of the multiplication W_g x. The latter uses a dimension-reduced vector h to replace the large matrix W_g, and applies the linear convolution (13). The reductions in space complexity and computational complexity achieved by using structured matrices are significant. The advantages of LISTA-Toeplitz are discussed in detail as follows.

1) Regarding space complexity, the memory demand of the network is reduced, which also contributes to lowering the cost and power consumption of the networks [39]. While 1D LISTA needs to update the weight matrix W_g with M² entries, our proposed 1D LISTA-Toeplitz network only needs to learn 2M − 1 elements for W_g. Thus, the space complexity of our LISTA-Toeplitz network is O(M). This is a significant advantage compared to the original LISTA network, which requires O(M²) parameters. Especially for large-scale harmonic retrieval problems, our proposed LISTA-Toeplitz network decreases the memory requirements by a factor on the order of M, which greatly relieves the storage burden.

2) Resorting to the linear convolution operation also reduces the computational burden. For computational efficiency, we can use the discrete Fourier transform (DFT) to speed up the linear convolution. Particularly, the linear convolution in (14) can be computed in the Fourier domain as

x^(t+1) = S_{θ^(t)}( W_e y + F^{−1}( F(h) ◦ F(x^(t)) ) ),   (15)

where ◦ denotes element-wise multiplication, F denotes the DFT operator and F^{−1} is the inverse DFT (IDFT) operator. The DFT and IDFT can be efficiently computed with fast Fourier transforms [40, 41], whose time complexity is O(M log M) in terms of the required number of complex-valued multiplications; a small sketch of this equivalence is given below.
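The following sketch verifies (13) and the FFT realization (15) numerically. Sizes are illustrative; `scipy.linalg.toeplitz` builds W_g from h, and the zero-padding length 3M − 2 is the full linear-convolution length, of which the middle M samples correspond to the indices in (12).

```python
import numpy as np
from scipy.linalg import toeplitz

M = 128
rng = np.random.default_rng(1)

# A Toeplitz W_g as in (11), parameterized by lags h_{-M+1}, ..., h_{M-1};
# array index j holds lag j - (M - 1).
h = rng.standard_normal(2 * M - 1) + 1j * rng.standard_normal(2 * M - 1)
W_g = toeplitz(c=h[M - 1:], r=h[M - 1::-1])   # [W_g]_{i,k} = h_{i-k}
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# (13): W_g x equals the middle M samples of the full linear convolution.
full = np.convolve(h, x)                      # length 3M - 2
conv = full[M - 1:2 * M - 1]
assert np.allclose(W_g @ x, conv)

# (15): the same result via zero-padded FFTs, O(M log M) instead of O(M^2).
n = 3 * M - 2
fft_conv = np.fft.ifft(np.fft.fft(h, n) * np.fft.fft(x, n))[M - 1:2 * M - 1]
assert np.allclose(W_g @ x, fft_conv)
```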
Therefore, the proposed 1D LISTA-Toeplitz network has time complexity O(M log M) in each layer, more efficient than the O(M²) of the traditional LISTA.

3) Because of the huge reduction in the number of network variables, the proposed structured network is more trainable for large-scale problems and performs better with limited training data. It is commonly considered that the number of training samples needs to be roughly more than ten times the number of network variables [42–44]. Compared with its unstructured counterpart, LISTA-Toeplitz reduces the number of network variables and therefore requires less labeled data for training. Consequently, it takes less training time and makes the process of training neural networks more efficient.

To summarize, different from other network compression approaches, such as Principal Filter Analysis (PFA), which provide a specific or heuristic compression factor [45], the Toeplitz constraint used in the proposed method enables model-order reduction and provides a huge computational complexity reduction for large-scale problems: the storage requirement is reduced from O(M²) to O(M) and the computational complexity is reduced from O(M²) to O(M log M).

Fig. 2. Block diagram of the 1D LISTA-Toeplitz network. The red broken-line boxes indicate each layer of the LISTA-Toeplitz network, corresponding to the whole process of an ISTA iteration. Blue boxes represent convolutional layers (13), which highlight the modification of LISTA-Toeplitz over the standard LISTA.

TABLE I
THE ONE-LAYER COMPLEXITY OF THE LISTA AND LISTA-TOEPLITZ NETWORKS

Network         | Time        | Space
LISTA           | O(M²)       | O(M²)
LISTA-Toeplitz  | O(M log M)  | O(M)

Table I compares the time and space complexity of the LISTA and 1D LISTA-Toeplitz networks. These benefits also apply to higher-dimensional harmonic retrieval problems, though the network architecture is slightly different, as shown in the subsequent subsection.
B. 2D LISTA-Toeplitz Network
In this subsection, we extend the LISTA-Toeplitz network to 2D harmonic retrieval problems. As shown in Subsection II-B2, the mutual inhibition matrix W_g naturally follows a doubly-block Toeplitz structure, which requires a slight change in the network settings of LISTA-Toeplitz. For clarity, we denote such a network by 2D LISTA-Toeplitz. Recall that in the 2D harmonic retrieval problem, we use M1 and M2 to denote the cardinalities of the grid sets in the first and second dimension, respectively, and M = M1 M2. The sparse vector x has a block structure, expressed as x = [x^T_{m1}]^T_{m1∈M1}, where the sub-vector x_{m1} is denoted by x_{m1} = [x_{m1,m2}]^T_{m2∈M2} ∈ C^{M2}.

Following a procedure similar to the 1D case, we first use a dimension-reduced matrix H ∈ C^{(2M2−1)×(2M1−1)} to represent W_g ∈ C^{M1M2×M1M2}. This is possible because the DoF of W_g is (2M1 − 1)(2M2 − 1) due to its Toeplitz structure. For convenience, H is expressed as H := [h_{m2,m1}]_{m2∈M2*, m1∈M1*}. Here we also ignore the Hermitian structure, making some compromise on the representational capability of the LISTA-Toeplitz network.

To link H and W_g, we first define 2M1 − 1 Toeplitz sub-matrices H_{m1} ∈ C^{M2×M2} constructed from the m1-th column of H, i.e., [h_{m2,m1}]^T_{m2∈M2*}, where m1 ∈ M1*, given by [H_{m1}]_{i,j} = h_{i−j,m1}. With these Toeplitz matrices, we construct a doubly-block Toeplitz matrix T(H) ∈ C^{M1M2×M1M2} as

T(H) :=
⎡ H_0       H_{−1}  ⋯    H_{−M1+1} ⎤
⎢ H_1       H_0     ⋱    ⋮         ⎥
⎢ ⋮         ⋱       ⋱    H_{−1}    ⎥
⎣ H_{M1−1}  ⋯       H_1  H_0       ⎦.   (16)

We can verify that W_g = T(H) if the elements of H satisfy

h_{s−i, t−j} = [W_g]_{s+(t−1)M2, i+(j−1)M2},   (17)

where i, s ∈ M2 and j, t ∈ M1.

We then show that W_g x can be calculated by a linear convolution, as follows. Since the vector x has a nested structure, we define X := [x_1 x_2 ··· x_{M1}] ∈ C^{M2×M1}, which satisfies x = vec(X). The linear convolution between two matrices is defined as

[H ∗ X]_{s,t} = Σ_{j=0}^{M1−1} Σ_{i=0}^{M2−1} h_{s−i, t−j} [X]_{i,j},   (18)

where s ∈ M2 and t ∈ M1. According to the definition of matrix multiplication, we have

[W_g x]_l = Σ_{k=1}^{M1M2} [W_g]_{l,k} [x]_k = Σ_{j=1}^{M1} Σ_{i=1}^{M2} [W_g]_{l, i+(j−1)M2} [X]_{i,j}.   (19)

Comparing (18) and (19), we conclude that W_g x = vec(H ∗ X) by setting l = s + (t − 1)M2 in the subscript of (19). Thus, the unfolded ISTA iteration (5) is rewritten as

x^(t+1) = S_{θ^(t)}( W_e y + vec(H ∗ X^(t)) ),   (20)

where X^(t) ∈ C^{M2×M1} is obtained by reshaping x^(t) ∈ C^{M1M2}, i.e., X^(t) = [x^(t)_1 x^(t)_2 ··· x^(t)_{M1}]. Based on (20), we illustrate the 2D LISTA-Toeplitz network in Fig. 3.

As with the 1D LISTA-Toeplitz, the 2D version significantly reduces the number of variables in the weight matrix compared with the counterpart in the standard LISTA, from O(M1² M2²) to O(M1 M2). For a typical case where M* = M1 = M2, the reduction in space complexity, from O(M*⁴) to O(M*²), is more notable than in the 1D case. In higher-dimensional scenarios, with M* = M1 = M2 = ··· = M_P, the LISTA-Toeplitz network can be simply extended, and the benefits of exploiting the Toeplitz structure become more dominant: the memory demand for storing the variables to learn decreases from O(M*^{2P}) to O(M*^P).
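A minimal sketch of the 2D equivalence W_g x = vec(H ∗ X), under the (zero-based) index conventions of the reconstruction above: it builds T(H) entry by entry from (16)-(17) and checks it against a "valid"-mode 2D convolution implementing (18). The sizes are illustrative, and the explicit quadruple loop is for clarity only.

```python
import numpy as np
from scipy.signal import convolve2d

M1, M2 = 4, 6            # grid sizes of the two frequency dimensions
rng = np.random.default_rng(2)

# H parameterizes the doubly-block Toeplitz W_g = T(H) of (16)-(17);
# array offsets M2-1 and M1-1 map lags to nonnegative indices.
H = (rng.standard_normal((2 * M2 - 1, 2 * M1 - 1))
     + 1j * rng.standard_normal((2 * M2 - 1, 2 * M1 - 1)))

# Build T(H) explicitly: block (t, j) is the Toeplitz block H_{t-j},
# whose (s, i) entry is h_{s-i, t-j}.
W_g = np.empty((M1 * M2, M1 * M2), dtype=complex)
for t in range(M1):
    for j in range(M1):
        for s in range(M2):
            for i in range(M2):
                W_g[t * M2 + s, j * M2 + i] = H[s - i + M2 - 1, t - j + M1 - 1]

# (18)-(19): multiplying by T(H) is a "valid"-mode 2D linear convolution.
X = rng.standard_normal((M2, M1)) + 1j * rng.standard_normal((M2, M1))
x = X.flatten(order="F")                      # x = vec(X), column stacking
out = convolve2d(H, X, mode="valid")          # the central M2 x M1 part
assert np.allclose(W_g @ x, out.flatten(order="F"))
```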
As discussed in Subsection III-A, the compressed network also contributes to making the network more trainable, lowering its cost and power consumption. Note that this convolutional prior is imposed on the Gram matrix of the dictionary rather than on the dictionary itself.
Fig. 3. Block diagram of the 2D LISTA-Toeplitz network. The red broken-line boxes indicate each layer of the LISTA-Toeplitz network, corresponding to the whole process of an ISTA iteration. Blue boxes represent the linear convolutional layers in (20), which highlight the modification of LISTA-Toeplitz over the standard LISTA.
That is why we replace matrix multiplication with linear convolution only for the mutual inhibition part W_g x^(t). For comparison, we also try making the linear multiplication by W_e a convolution, although there is no equivalence between W_e y and h_e ∗ y, where W_e ∈ C^{M×N} and h_e ∈ C^{(M+N−1)×1}. We obtain the architecture of this convolutional network (named ConvLISTA) as follows:

x^(t+1) = S_{θ^(t)}( h_e ∗ y + h ∗ x^(t) )  for 1D MHR,
x^(t+1) = S_{θ^(t)}( h_e ∗ y + vec(H ∗ X^(t)) )  for 2D MHR.   (21)

In MHR problems, since the dictionary matrix does not have a shift-invariant structure, the recovery performance of ConvLISTA, where both linear blocks are implemented by convolutions, is much worse than that of LISTA and our LISTA-Toeplitz networks. The numerical results of these networks are shown in the next section.

IV. PRACTICAL APPLICATIONS AND NUMERICAL RESULTS
In this section, we perform numerical experiments involving 1D and 2D harmonic retrieval problems to demonstrate the effectiveness of the proposed LISTA-Toeplitz network in comparison with the original LISTA network, ConvLISTA and conventional iterative algorithms including ISTA and FISTA. As presented in the sequel, we showcase the performance using both synthetic data and real data. (Code for reproducing these experiments is available at https://github.com/....) In Subsections IV-A and IV-B, we generate synthetic data for 1D and 2D harmonic retrieval problems, respectively. The reconstruction quality of these methods is measured in terms of the normalized mean squared error (NMSE) and the hit rate. Here, NMSE is the mean squared error of the recovered signal x̂ normalized by the power of the ground truth x, given by

NMSE = E[ ‖x − x̂‖₂² / ‖x‖₂² ].   (22)

The hit rate is defined as the percentage of correctly recovered components out of the total number of nonzero entries. In Subsection IV-C, we apply the proposed 2D LISTA-Toeplitz network to real radar measurements, which verifies its performance on real data.

The tested algorithms are set as follows. Since ISTA and FISTA usually take about 1000 and 100 iterations to converge, respectively, while LISTA, ConvLISTA and our LISTA-Toeplitz only need 10 layers to converge, in the following experiments we set the numbers of iterations/layers for ISTA,
FISTA, LISTA, ConvLISTA and LISTA-Toeplitz to around 1000, 100, 10, 10 and 10, respectively, which will be specified in the following sections. For the LISTA, ConvLISTA and LISTA-Toeplitz methods, the performance relies not only on the network structure but also on the amount of training data, which will be specified in each experiment individually. Following the practical criterion in [42–44], we generate roughly ten times as many training samples as the number of unknown parameters in the network. For each training pair, we generate sparse signals x as i.i.d. Bernoulli-Gaussian with fixed sparsity K, where the sparse support follows a Bernoulli distribution and the complex amplitudes follow a Gaussian distribution. Using a fixed dictionary based on the harmonic retrieval model, the observations y are computed according to (9) under noise power σ². More details on the network settings and training procedures of LISTA-Toeplitz are given in the Appendix, where we introduce special modifications to handle complex-valued data in Appendix A and training details in Appendix B. For example, we explain how to generate the training dataset, how to compute the initial values of the network parameters, and how to train our LISTA-Toeplitz network.

A. 1D Harmonic Retrieval with Synthetic Data
To show the effectiveness of the proposed 1D LISTA-Toeplitz network, we formulate a 1D harmonic retrieval problem in two different cases: noiseless and noisy synthetic data.

Here, the measurement vector y is generated using the signal model (9), where N = 64 samples are randomly selected from M = 512 measurements. The normalized frequency range [0, 1) is equally divided into M = 512 grids, making ∆ω = 2π/M. The number of signal components is K = 5 unless stated otherwise, and the amplitudes are Gaussian distributed. In the training process, the noise power σ² was fixed. For ConvLISTA and our proposed LISTA-Toeplitz network, we generate 50,000 trials for training and 1000 trials for verification and testing. As a benchmark, we also train a ten-layer LISTA network, which has around M² ≈ 2.6 × 10⁵ parameters to learn. To avoid overfitting, we generate a correspondingly larger number of trials for the traditional LISTA network.

Firstly, we consider noiseless cases and examine the recovery performance using a single trial. The frequencies of the true sinusoids are set on and off the predefined frequency points, respectively, with the recovery results shown in Figs. 4 and 5. In our simulation, we set 5 targets with random amplitudes, marked by triangles. In the on-the-grid case, ConvLISTA can only recover the 2 targets of highest amplitude, while all the other methods (ISTA, FISTA, LISTA and LISTA-Toeplitz) successfully recover all of them. This is because imposing the convolutional structure on the linear block W_e y leads to poor recovery performance in the 1D harmonic retrieval problem. We also find that LISTA-Toeplitz performs close to the other methods, successfully recovering the on-the-grid and off-the-grid sinusoids, while achieving a significant reduction in the number of network parameters compared with the traditional LISTA. We also present simulation results to show that our LISTA-Toeplitz is able to recover off-the-grid frequencies. Compared with Fig. 4, where the frequencies are on the grid, in Fig. 5 we shift the true frequencies one quarter of the grid size away from the on-the-grid values. Particularly, we set the actual frequencies to f_k = (m_k + 1/4)/M, where m_k is the grid index nearest to the k-th sinusoid, k = 1, 2, . . . , K. As revealed by a previous study [46], the off-the-grid case causes a mismatch between the assumed and the actual frequencies, which brings a higher sidelobe pedestal problem. However, the proposed LISTA-Toeplitz method successfully recovers the neighboring grid points.

Secondly, we consider noisy cases, where we use the NMSE and hit rate as performance metrics to evaluate the tested recovery methods versus noise power. Based on the same sparse signal as in the previous experiment, we change the noise power and calculate the NMSE and hit rates by averaging 1000 Monte Carlo trials, yielding Figs. 6 and 7, respectively. As shown in Fig. 6, the NMSE of each method increases monotonically with the noise power. When the noise power is no larger than 10 dB, the ten-layer LISTA and LISTA-Toeplitz achieve higher accuracy than FISTA and ISTA, whose numbers of iterations are set to 110 and 1500, respectively, much larger than the counterparts of the learning-based methods. We also find that the proposed LISTA-Toeplitz has the lowest NMSE when the noise power is less than 0 dB, and performs close to LISTA when the noise power is larger than 0 dB. As shown in Fig. 7, under reasonably small noise levels (less than 5 dB), the LISTA-Toeplitz network leads to the highest hit rates, which shows its advantage in finding the frequencies of harmonics.
B. 2D Harmonic Retrieval with Synthetic Data
Simulations were also conducted to show the performance of the LISTA-Toeplitz network for 2D harmonic retrieval problems. Similarly, we consider both noiseless and noisy cases. In the 2D case, the simulations are configured as follows. The numbers of full observations in the two dimensions are M1 = 8 and M2 = 64, respectively. From these M1 M2 observations, we randomly select N = 64 samples. The number of distinct sinusoids is K = 5. In these simulations, we generate 50,000 samples to train the ConvLISTA and our LISTA-Toeplitz networks, while we use a larger training set for the ten-layer LISTA network. For both learned networks, we prepare 1000 samples for the validation and test sets.

Fig. 4. Recovered results of multiple harmonic components in an on-the-grid case via LISTA and 1D LISTA-Toeplitz network.

Fig. 5. Recovered results of multiple harmonic components in an off-the-grid case via 1D LISTA-Toeplitz network.
Similarly to the 1D case presented in Subsection IV-A, we first provide a single noiseless trial to intuitively demonstrate the recovery output of the tested algorithms. When the frequencies are all on the grid points, the results are shown in Fig. 8. For clarity of presentation, only components with amplitudes larger than 0.5 are shown in the plot, where we see that all the algorithms except ConvLISTA work well in reconstructing the sparse signals.

In our simulation, we set 5 targets with random amplitudes, marked by triangles. In Fig. 8, ConvLISTA can only recover the 4 targets of highest amplitude, while all the other methods succeed in recovering all 5 targets. Thus, to guarantee recovery performance in harmonic retrieval problems, it is suggested to make only the block for the mutual inhibition part convolutional and keep the others as standard linear blocks. We also consider the case when the frequencies are off the grid. Here, the frequency values are one quarter of the grid size away from their nearest grid points along the second frequency dimension, i.e., f_{p,k} = (m_k + 1/4)/M_p, where m_k is the grid index of the k-th sinusoid, p = 2, k = 1, 2, . . . , K. Fig. 9 presents the results of an off-the-grid example, and shows that the four methods successfully find the nearest grid points corresponding to the off-the-grid ground truth. Particularly, the LISTA-Toeplitz network achieves performance comparable to the other methods while learning a smaller number of network parameters.

Fig. 6. The NMSEs of 1D harmonic retrieval for four different methods (ISTA, FISTA, LISTA, LISTA-Toeplitz).

Fig. 7. The hit rates of 1D harmonic retrieval for four different methods (ISTA, FISTA, LISTA, LISTA-Toeplitz).

Fig. 8. Recovered 2D plane of multiple harmonic components in an on-the-grid case via LISTA and 2D LISTA-Toeplitz network.

Fig. 9. Recovered 2D plane of multiple harmonic components in an off-the-grid case via 2D LISTA-Toeplitz network.

Fig. 10. Simulation results of four methods (ISTA, FISTA, LISTA, LISTA-Toeplitz) on the 2D harmonic retrieval problem in terms of NMSE.

We then consider noisy scenarios, and evaluate the performance of the tested methods versus noise power in terms of both NMSE and hit rates, leading to the simulation results shown in
Figs. 10 and 11, respectively. When the noise power is low, the NMSE of the LISTA-Toeplitz network is lower than that of the other methods. From Fig. 11, the hit rate curves of the four methods are close: the hit rates of ISTA and FISTA are almost the same, slightly lower than that of the LISTA-Toeplitz network. In comparison with the original ISTA and FISTA, the LISTA and LISTA-Toeplitz networks achieve a similar result using one to two orders of magnitude fewer iterations, while LISTA-Toeplitz improves upon LISTA with faster training and higher reconstruction quality.

C. Experimental Data Set for Air Target Recovery
To further investigate the performance of the 2D LISTA-Toeplitz network, experiments were carried out in which we used a randomized stepped frequency radar (RSFR) to detect a civil aircraft and estimate the range-Doppler parameters of its dominant scatterers. The configuration of the radar is introduced in [47]. Here, we formulate the involved range-Doppler reconstruction problem as a 2D harmonic retrieval problem and resort to the proposed 2D LISTA-Toeplitz network. The range and Doppler domains correspond to the two frequency domains, with M1 = 64 and M2 = 16. The observations, the radar echoes of N = 64 transmitted pulses, are regarded as samples of the virtually full M1 M2 observations [48].

Fig. 11. Simulation results of four methods (ISTA, FISTA, LISTA, LISTA-Toeplitz) on the 2D harmonic retrieval problem in terms of hit rate.

Under these settings, we compare the 2D LISTA-Toeplitz network with the conventional ISTA method, disregarding the standard LISTA network, because the LISTA network has too many variables to learn and it is hard to train such a huge network.

The recovered range-Doppler parameters of the scatterers are shown in Fig. 12. Both ISTA and LISTA-Toeplitz recover scatterers lying on the grids of the same velocity, which accords with the fact that the aircraft is flying away from the radar with approximately that radial relative velocity. This experiment validates the correctness of the proposed method in real applications.

V. CONCLUSION
This paper considered the compressive MHR problem, which estimates frequency components in multidimensional spectra from compressive measurements. We proposed a structured network to solve MHR problems by revealing and exploiting the Toeplitz structure in the Gram matrix of the measurement matrix. Compared with the traditional LISTA network, the proposed network, namely the LISTA-Toeplitz network, greatly reduces the dimension of the network variables, which consequently reduces the time and space complexity and makes the network easier to train. Other convolutional extensions, such as ConvLISTA, which are not designed based on the specific signal model, suffer major degradation in recovery performance. We presented both simulated and real data for 1D/2D harmonic retrieval problems to verify the performance of the proposed algorithms. In the numerical examples, our network shows excellent recovery performance in terms of both NMSE and hit rate, better than ConvLISTA and comparable to ISTA, FISTA and LISTA.

APPENDIX
In this appendix, we illustrate some additional specifications for constructing and training a LISTA-Toeplitz network for complex-valued applications like harmonic retrieval. In Appendix A, we give a detailed description of the necessary extensions of the proposed network when applied to complex-valued cases. Some details of the training process are provided in Appendix B, including dataset generation, parameter initialization and training strategies.
Fig. 12. The reconstructed air target using (a) ISTA and (b) LISTA-Toeplitz.
A. Necessary Extensions for Proposed Structured Network
In Section III, we redesigned the original LISTA network by converting multiplications by Toeplitz matrices into linear convolutions. Here we further extend it to a complex-valued framework, because most off-the-shelf deep learning toolboxes, e.g., PyTorch and TensorFlow, are only applicable to real-valued networks. Particularly, we transform the complex-valued linear convolution, as well as the matrix multiplication and non-linear operations, into their real-valued counterparts.

Different from some existing complex-valued network extensions [49, 50], which separately feed a double-size real-valued network with the real and imaginary parts of the inputs, discarding the links between real and imaginary parts, we maintain the structure of the complex-valued data and operations, as discussed in the sequel.

The main difficulty is the non-linear operator on the complex numbers in each layer, i.e., the soft-threshold operator defined in (4), which can be rewritten as

S_θ([x]_i) = [x]_i ( 1 − θ / max(|[x]_i|, θ) ),   (23)

which equals 0 when |[x]_i| < θ, and yields S_θ([x]_i) = [x]_i (1 − θ/|[x]_i|) = sign([x]_i)(|[x]_i| − θ) when |[x]_i| ≥ θ. To be compatible with real-valued deep learning toolboxes, we realize (23) with real-valued components. To this end, we first compute the absolute value of x. Then, the real and imaginary parts of (23) are computed as

ℜ{S_θ([x]_i)} = ℜ{[x]_i} ( 1 − θ / max(|[x]_i|, θ) ),
ℑ{S_θ([x]_i)} = ℑ{[x]_i} ( 1 − θ / max(|[x]_i|, θ) ),   (24)

respectively, where ℜ{·} and ℑ{·} denote the real and imaginary parts of a complex argument, respectively.
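A real-valued NumPy sketch of the complex soft-threshold (23)-(24) and of the cross-coupled two-channel convolution described next. The function names are our own, and a deep learning toolbox would apply the same arithmetic to its real-valued tensors.

```python
import numpy as np

def complex_soft_threshold(x_re, x_im, theta):
    # (23)-(24): shrink the magnitude, preserve the phase, using only
    # real-valued operations as a TensorFlow/PyTorch graph would.
    mag = np.sqrt(x_re**2 + x_im**2)
    scale = 1.0 - theta / np.maximum(mag, theta)   # zero when |x| < theta
    return x_re * scale, x_im * scale

def complex_conv(h_re, h_im, x_re, x_im):
    # Complex linear convolution through two cross-coupled real channels:
    # (h_re + j h_im) * (x_re + j x_im).
    out_re = np.convolve(h_re, x_re) - np.convolve(h_im, x_im)
    out_im = np.convolve(h_re, x_im) + np.convolve(h_im, x_re)
    return out_re, out_im
```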
For the linear matrix multiplication and convolution operators, we implement their complex-valued counterparts through two cross-coupled real-valued channels, e.g., for the 2D linear convolution operator,

H ∗ X = (ℜ{H} ∗ ℜ{X} − ℑ{H} ∗ ℑ{X}) + j(ℜ{H} ∗ ℑ{X} + ℑ{H} ∗ ℜ{X}).

With the above preparations, we construct each layer of the complex-valued LISTA-Toeplitz network from real-valued components, as illustrated in Fig. 13, which facilitates the use of off-the-shelf framework toolboxes.

Fig. 13. Block diagram of one layer of the LISTA-Toeplitz network applied to complex-valued data.

B. Training Details
In this appendix, we describe the main details of training our proposed network, comprising three parts: 1) generating the data sets, 2) initializing the network parameters, and 3) training strategies, as introduced in the sequel.
1) Data Preparation:
To evaluate the performance of the proposed method, we generate three data sets: training, validation and test sets. We first prepare N_tr training samples with known labels, i.e., the ground truth of the sparse signal x. In these training samples, we denote by {y_i}_{i=1}^{N_tr} and {x_i}_{i=1}^{N_tr} the observation and label sets, respectively, which obey y_i = Φx_i, i = 1, 2, . . . , N_tr. Define two matrices Y ∈ C^{N×N_tr} and X ∈ C^{M×N_tr} by

Y = [y_1 y_2 ··· y_{N_tr}],  X = [x_1 x_2 ··· x_{N_tr}].   (25)

The choice of N_tr is related to the initialization of the network parameters and will be discussed in the subsequent subsection. After the training process is finished, we generate another N_vl and N_ts samples for validation and testing, respectively. The validation data set is used for determining some hyper-parameters of the network.
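A sketch of this data generation step under the Bernoulli-Gaussian model described in Section IV. The helper name `make_dataset` and the optional noise argument are our own; the noiseless pairs of (25) correspond to `sigma2=0`.

```python
import numpy as np

def make_dataset(Phi, n_samples, K, sigma2=0.0, rng=None):
    # Training pairs as in (25): columns of X are K-sparse Bernoulli-Gaussian
    # signals, columns of Y the (optionally noisy) observations y_i = Phi x_i.
    rng = rng or np.random.default_rng()
    N, M = Phi.shape
    X = np.zeros((M, n_samples), dtype=complex)
    for i in range(n_samples):
        support = rng.choice(M, size=K, replace=False)
        X[support, i] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, n_samples))
                                   + 1j * rng.standard_normal((N, n_samples)))
    Y = Phi @ X + noise
    return Y, X
```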
2) Network Parameter Initialization:
It is important to initialize the network parameters correctly, because a good starting point allows the network to converge rapidly from the very beginning. In a LISTA-Toeplitz network, each layer has a set of parameters to learn, e.g., {W_e, h, θ^(t)} for a 1D LISTA-Toeplitz network or {W_e, H, θ^(t)} for the 2D counterpart.

Following the steps in [23], we initialize the network parameter W_e ∈ C^{M×N} by generating a coarse estimate of the dictionary matrix from the training pairs, computed as

Φ̂ = Y (X^H X)^{−1} X^H.   (26)

Then, the learned matrix W_e is initialized as W_e = (1/L) Φ̂^H, where L = λ_max(Φ̂^H Φ̂).

The initial values of W_g (for LISTA) and h or H (for 1D or 2D LISTA-Toeplitz, respectively) are chosen as zeros in the experiments. They can also be initialized with random values.

The initialization also affects the required amount of training data. It has been widely accepted as a practical criterion in many published prediction modeling studies that the number of samples needs to be roughly more than ten times the degrees of freedom in the model [42–44]. For a T-layer 1D LISTA-Toeplitz network, the numbers of parameters in W_g and W_e are around O(2MT) and O(MNT), respectively. Since W_e has a good choice of initial value, it can be learned efficiently from a small amount of training samples. However, W_g is initialized with zeros or random numbers and must be updated from the training data. Therefore, in our simulations we generate around N_tr = O(20MT) pairs of sparse signals and corresponding measurement signals to obtain a well-trained network.
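A sketch of the initialization (26). For numerical robustness the pseudo-inverse of X is used, which coincides with the least-squares estimate of the dictionary from the training pairs; the scaling by the Lipschitz constant follows (3).

```python
import numpy as np

def init_W_e(Y, X):
    # Coarse dictionary estimate (26): Phi_hat solves min ||Y - Phi X||_F,
    # computed here through the pseudo-inverse of the label matrix X.
    Phi_hat = Y @ np.linalg.pinv(X)
    # Lipschitz constant L = lambda_max(Phi_hat^H Phi_hat).
    L = np.linalg.eigvalsh(Phi_hat.conj().T @ Phi_hat).max()
    return Phi_hat.conj().T / L          # W_e = (1/L) Phi_hat^H
```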
3) Training Protocol:
With these N_tr labeled training data and initialized parameters, we train the network, implemented in TensorFlow, with the strategy described below. We choose the NMSE over the training samples as the quantitative metric, also referred to as the loss function, defined as

NMSE = Σ_{i=1}^{N_tr} ‖x_i − x̂_i‖₂² / Σ_{i=1}^{N_tr} ‖x_i‖₂²,   (27)

where x̂_i is the output signal reconstructed by the proposed neural network from the observation y_i. We then optimize the network parameters towards minimizing the loss function on the training data set using the Adam optimizer.

REFERENCES

[1] R. Fu, T. Huang, Y. Liu, and Y. C. Eldar, “Compressed LISTA exploiting Toeplitz structure,” in Proc. IEEE Radar Conference (RadarConf), 2019, pp. 1–6.
[2] X. Liu, N. D. Sidiropoulos, and T. Jiang,
Multidimensional Harmonic Retrieval with Applications in MIMO Wireless Channel Sounding. John Wiley and Sons Ltd., 2005, ch. 2, pp. 41–75.
[3] C. R. Berger, Z. Wang, J. Huang, and S. Zhou, “Application of compressive sensing to sparse channel estimation,” IEEE Communications Magazine, vol. 48, no. 11, pp. 164–174, November 2010.
[4] J. Xiong and W. Wang, “Sparse reconstruction-based beampattern synthesis for multi-carrier frequency diverse array antenna,” in , March 2017, pp. 3395–3398.
[5] R. D. Balakrishnan and H. M. Kwon, “A new inverse problem based approach for azimuthal DOA estimation,” in Global Telecommunications Conference, 2003. GLOBECOM ’03. IEEE, 2004, pp. 2187–2191, vol. 4.
[6] A. Xenaki, P. Gerstoft, and K. Mosegaard, “Compressive beamforming,” The Journal of the Acoustical Society of America, vol. 136, p. 260, July 2014.
[7] D. Nion and N. D. Sidiropoulos, “Tensor algebra and multidimensional harmonic retrieval in signal processing for MIMO radar,” IEEE Transactions on Signal Processing, vol. 58, no. 11, pp. 5693–5705, 2010.
[8] T. Huang and Y. Liu, “Compressed sensing for a frequency agile radar with performance guarantees,” in , 2015, pp. 1057–1061.
[9] C. Bingham, M. Godfrey, and J. Tukey, “Modern techniques of power spectrum estimation,” IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 56–66, 1967.
[10] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[11] M. Pesavento, “Exploiting multiple shift invariances in harmonic retrieval,” in , 2009, pp. 2101–2104.
[12] L. Ma, Y. Jin, and L. Wang, “A cross four-order cumulants based harmonic retrieval TLS-ESPRIT method in the estimation of harmonic parameters,” in , 2011, pp. 12–15.
[13] Y. Chi and Y. Chen, “Compressive two-dimensional harmonic retrieval via atomic norm minimization,”
IEEE Transactions on Signal Processing, vol. 63, no. 4, pp. 1030–1042, Feb. 2015.
[14] Z. Tan, Y. C. Eldar, and A. Nehorai, “Direction of arrival estimation using co-prime arrays: A super resolution viewpoint,” IEEE Transactions on Signal Processing, vol. 62, no. 21, pp. 5565–5576, 2014.
[15] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[16] H. Fang and H. R. Yang, “Greedy algorithms and compressed sensing,” Acta Automatica Sinica, vol. 37, no. 12, pp. 1413–1421, 2011.
[17] N. Antonello, L. Stella, P. Patrinos, and T. van Waterschoot, “Proximal gradient algorithms: Applications in signal processing,” arXiv preprint arXiv:1803.01621, 2018.
[18] A. Draganic, I. Orovic, and S. Stankovic, “On some common compressive sensing recovery algorithms and applications - review paper,” Facta Universitatis - Series: Electronics and Energetics, vol. 30, pp. 477–510, April 2017.
[19] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
[20] P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Modeling & Simulation, vol. 4, no. 4, pp. 1168–1200, 2006.
[21] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in International Conference on Machine Learning, 2010, pp. 399–406.
[22] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems,” IEEE Transactions on Signal Processing, vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
[23] M. Borgerding and P. Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” in , Dec. 2016, pp. 227–231.
[24] V. Monga, Y. Li, and Y. C. Eldar, “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing,” Dec. 2019.
[25] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in
Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 10–18.
[26] H. Sreter and R. Giryes, “Learned convolutional sparse coding,” in , 2018, pp. 2191–2195.
[27] R. J. G. van Sloun, R. Cohen, and Y. C. Eldar, “Deep learning in ultrasound imaging,” Proceedings of the IEEE, vol. 108, no. 1, pp. 11–29, Jan. 2020.
[28] G. Dardikman-Yoffe and Y. C. Eldar, “LSPARCOM: Deep unfolded super-resolution microscopy,” in Imaging and Applied Optics Congress. Optical Society of America, 2020, p. JW5A.5.
[29] H. Zhang and F. Ding, “On the Kronecker products and their applications,” Journal of Applied Mathematics, vol. 2013, June 2013.
[30] K. Wu, Y. Guo, Z. Li, and C. Zhang, “Sparse coding with gated learned ISTA,” in International Conference on Learning Representations, 2020.
[31] R. Giryes, Y. C. Eldar, A. M. Bronstein, and G. Sapiro, “The learned inexact projected gradient descent algorithm,” in , April 2018, pp. 6767–6771.
[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–1958, June 2014.
[33] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, ser. ICML’15. JMLR.org, 2015, pp. 448–456.
[34] P. A. Roebuck and S. Barnett, “A survey of Toeplitz and related matrices,” International Journal of Systems Science, vol. 9, no. 8, pp. 921–934, 1978.
[35] M. Wax and T. Kailath, “Efficient inversion of doubly block Toeplitz matrix,” in ICASSP ’83. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 8, 1983, pp. 170–173.
[36] M. Oudin and J. P. Delmas, “Asymptotic generalized eigenvalue distribution of Toeplitz-block-Toeplitz matrices,” in , 2008, pp. 3309–3312.
[37] G. Nguyen, S. Dlugolinsky, M. Bobak, V. Tran, A. Lopez Garcia, I. Heredia, P. Malík, and L. Hluchý, “Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey,”
Artificial Intelligence Review, vol. 52, pp. 77–124, Jan. 2019.
[38] T. Hope, Y. S. Resheff, and I. Lieder, Learning TensorFlow: A Guide to Building Deep Learning Systems, 1st ed. O’Reilly Media, Inc., 2017.
[39] T. E. Potok, C. Schuman, S. R. Young, R. M. Patton, and G. Chakma, “A study of complex deep learning networks on high performance, neuromorphic, and quantum computers,” ACM Journal on Emerging Technologies in Computing Systems, vol. 14, no. 2, 2017.
[40] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall Press, 2009.
[41] K. Ye and L.-H. Lim, “Fast structured matrix computations: Tensor rank and Cohn–Umans method,” Foundations of Computational Mathematics, vol. 18, no. 1, pp. 45–95, Feb. 2018.
[42] X. Zhu, C. Vondrick, C. C. Fowlkes, and D. Ramanan, “Do we need more training data?” International Journal of Computer Vision, vol. 119, no. 1, pp. 76–92, Aug. 2016.
[43] M. van Smeden, K. G. Moons, J. A. de Groot, G. S. Collins, D. G. Altman, M. J. Eijkemans, and J. B. Reitsma, “Sample size for binary logistic prediction models: Beyond events per variable criteria,” Statistical Methods in Medical Research, vol. 28, no. 8, pp. 2455–2474, 2019, PMID: 29966490.
[44] S. Lei, H. Zhang, K. Wang, and Z. Su, “How training data affect the accuracy and robustness of neural networks for image classification,” in International Conference on Learning Representations (ICLR), 2019.
[45] X. Suau, L. Zappella, and N. Apostoloff, “Filter distillation for network compression,” in , 2020, pp. 3129–3138.
[46] Y. Chi, L. Scharf, A. Pezeshki, and R. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” IEEE Transactions on Signal Processing, vol. 59, pp. 2182–2195, June 2011.
[47] L. Wang, T. Huang, and Y. Liu, “Theoretical analysis for extended target recovery in randomized stepped frequency radars,” arXiv preprint arXiv:1908.02929, 2019.
[48] T. Huang, Y. Liu, X. Xu, Y. C. Eldar, and X. Wang, “Analysis of frequency agile radar via compressed sensing,” IEEE Transactions on Signal Processing, vol. 66, no. 23, pp. 6228–6240, 2018.
[49] R. Hänsch, “Complex-valued multi-layer perceptrons - an application to polarimetric SAR data,” Photogrammetric Engineering and Remote Sensing, vol. 76, p. 1081, Sept. 2010.
[50] R. Haensch and O. Hellwich, “Complex-valued convolutional neural networks for object detection in PolSAR data,” in 8th European Conference on Synthetic Aperture Radar.