A Unified Approach to Sparse Signal Processing
F. Marvasti, Senior Member, IEEE, A. Amini, F. Haddadi, M. Soltanolkotabi, Student Member, IEEE, B. H. Khalaj, Member, IEEE, A. Aldroubi, S. Holm, Senior Member, IEEE, S. Sanei, Senior Member, IEEE, J. Chambers, Senior Member, IEEE, and other invited contributors
Abstract—A unified view of the area of sparse signal processing is presented in tutorial form by bringing together various fields in which the property of sparsity has been successfully exploited. For each of these fields, various algorithms and techniques, which have been developed to leverage sparsity, are described succinctly. The common potential benefits of significant reduction in sampling rate and processing manipulations through sparse signal processing are revealed. The key application domains of sparse signal processing are sampling, coding, spectral estimation, array processing, component analysis, and multipath channel estimation. In terms of the sampling process and reconstruction algorithms, linkages are made with random sampling, compressed sensing and rate of innovation. The redundancy introduced by channel coding in finite and real Galois fields is then related to over-sampling with similar reconstruction algorithms. The methods of Prony, Pisarenko, and MUltiple SIgnal Classification (MUSIC) are next shown to be targeted at analyzing signals with sparse frequency domain representations. Specifically, the relations of the approach of Prony to an annihilating filter in rate of innovation and Error Locator Polynomials in coding are emphasized; the Pisarenko and MUSIC methods are further improvements of the Prony method. Such narrowband spectral estimation is then related to multi-source location and direction of arrival estimation in array processing. The notions of sparse array beamforming and sparse sensor networks are also introduced. Sparsity in unobservable source signals is also shown to facilitate source separation in Sparse Component Analysis (SCA); the algorithms developed in this area are also widely used in compressed sensing. Finally, the nature of the multipath channel estimation problem is shown to have a sparse formulation; algorithms similar to sampling and coding are used to estimate typical multicarrier communication channels.
I. INTRODUCTION

THERE are many applications in signal processing and communication systems where the discrete signals are sparse in some domain such as time, frequency, or space; i.e., most of the samples are zero, or alternatively, their transform in another domain (normally called "frequency coefficients") is sparse (see Figs. 1 and 2).

Fig. 1. Sparse discrete time signal with its Discrete Fourier Transform (DFT).

Fig. 2. Sparsity is manifested in the frequency domain.

There are trivial sparse transformations where the sparsity is preserved in both "time" and "frequency" domains; the identity transform matrix and its sorted versions are extreme examples. Wavelet transformations that preserve the local characteristics of a sparse signal can be regarded as "almost" sparse in the "frequency" domain; in general, for sparse signals, the more similar the transformation matrix is to an identity matrix, the sparser the signal is in the transform domain. In addition, the transform matrix may itself be sparse; wavelet transformation matrices are such examples. In any of these scenarios, sampling and processing can be optimized using sparse signal processing. In other words, the sampling rate and the processing manipulations can be significantly reduced; hence, a combination of data compression and processing time reduction can be achieved.

Each field has developed its own tools, algorithms, and reconstruction methods for sparse signal processing.

Manuscript received February 05, 2009; revised ??? ??, 2009. F. Marvasti, A. Amini, F. Haddadi, B. Khalaj and M. Soltanolkotabi are affiliated with the Advanced Communication Research Institute (ACRI), Electrical Engineering Department, Sharif University of Technology; {marvasti, khalaj}@sharif.edu, {arashsil, farzanhaddadi, msoltan}@ee.sharif.edu. A. Aldroubi is affiliated with the Math Department, Vanderbilt University; [email protected]. The work of Akram Aldroubi was supported in part by grant NSF-DMS 0807464. S. Sanei is affiliated with the Centre of Digital Signal Processing, School of Engineering, Cardiff University, Cardiff, UK; [email protected]. J. Chambers is affiliated with the Electrical and Electronic Department, Loughborough University; [email protected]. S. Holm is affiliated with the Department of Informatics, University of Oslo; sverre@ifi.uio.no. The invited contributors to specific sections are listed in the acknowledgments.

Sparse Signal Processing, Panel Session organized and chaired by F. Marvasti and lectured by Profs. E. Candes, R. Baraniuk, P. Marziliano, and Dr. A. Cichoki, ICASSP 2008, Las Vegas, May 2008.
TABLE I
COMMON NOTATIONS USED THROUGHOUT THE PAPER

n : Length of original vector
k : Order of sparsity
m : Length of observed vector
x : Original vector
s : Corresponding sparse vector
y : Observed vector
ν : Noise vector
A : Transformation matrix relating s to y
$\|u\|_{\ell_p}$ : $\big(\sum_{i=1}^{n} |u_i|^p\big)^{1/p}$

Very few authors have noticed the similarities of these fields. It is the intention of this tutorial to describe these methods in each field succinctly and show that these methods can be used in other areas and applications, often with appreciable improvements. Among these fields are 1- Sampling: random sampling of bandlimited signals [2], Compressed Sensing (CS) [3], and sampling with finite rate of innovation [4];
2- Coding: Galois [5], [6] and real-field error correction codes [7]; 3- Spectral Estimation [8], [9], [10], [11]; 4- Array Processing: Multi-Source Location (MSL) and Direction Of Arrival (DOA) estimation [12], [13], sparse array processing [14], and sensor networks [15]; 5- Sparse Component Analysis (SCA): blind source separation [16], [17], [18] and dictionary representation [19], [20], [21]; 6- Channel Estimation in Orthogonal Frequency Division Multiplexing (OFDM) [22], [23], [24]. The sparsity properties of these fields are summarized in Table II. The details of each application will be discussed in the following sections, but the common traits are discussed in this introduction.

The columns of Table II consist of 0- category, 1- topics, 2- sparsity domain, 3- type of sparsity, 4- information domain, 5- type of sampling in information domain, 6- minimum sampling rate, 7- reconstruction method, and 8- applications. The first rows (2-7) of column 1 are on sampling techniques. The 8-9th rows are related to channel coding, row 10 is on spectral estimation, and rows 11-13 are related to array processing. Rows 14-15 correspond to SCA and finally, row 16 covers multicarrier channel estimation, which is a rather new topic. As shown in column 2 of the table, depending on the topics, sparsity is defined in the time, space, or "frequency" domains. In some applications, the sparsity is defined as the number of polynomial coefficients (which in a way could be regarded as "frequency"), the number of sources (which may depend on location or time sparsity for the signal sources), or the number of "words" (signal bases) in a dictionary. The type of sparsity is shown in column 3; for sampling schemes, it is usually lowpass, bandpass, or multiband [25], while for compressed sensing and most other applications, it is random. Column 4 represents the information domain, where the number of sparse coefficients, their locations, and their amplitudes can be determined by proper sampling (column 5) of this domain.

"... problems that arise in interpolation, spectrum analysis, error-control coding, and fault-tolerant computing. We believe that the relations between these problems have gone nearly unnoticed so far" [1].

A list of acronyms is given in Table XV at the end of the paper.
The other columns are self-explanatory and will be discussed in more detail in the following sections.

Rows 2-4 of Table II relate to the sampling (uniform or random) of signals that are bandlimited in the Fourier domain. Band-limitedness is a special case of sparsity where the nonzero coefficients in the frequency domain are consecutive. A better assumption in the frequency domain is to have random sparsity [26], [27], [28], as shown in row 5 and column 3. A generalization of sparsity in the frequency domain is sparsity in any transform domain such as the Discrete Cosine Transform (DCT) and wavelets; this concept is further generalized in compressed sensing (row 6), where sampling is performed by taking linear combinations of time domain samples [3], [29], [30], [31]. Sampling of signals with finite rate of innovation (row 7) is related to piecewise smooth (polynomial based) signals. The positions of the discontinuities are determined by annihilating filters, which are equivalent to error locator polynomials in error correction codes and to Prony's method [28], as discussed in Sections III and IV, respectively.

Random errors in a Galois field (row 8) and additive impulsive noise in real-field error correction codes (row 9) are sparse disturbances that need to be detected and removed. For erasure channels, the impulsive noise can be regarded as the negative of the sample value [32]; thus the missing sampling problem, which can also be regarded as a special case of nonuniform sampling, is a special case of the error correction problem. A subclass of impulsive noise for 2-D signals is salt-and-pepper noise [33]. The information domain, where the sampling process occurs, is called the syndrome, which is usually in a transform domain. In addition, for special binary codes such as Low Density Parity Check (LDPC) codes, the parity check matrix is extremely sparse. Sparse matrix inversions and manipulations [34], [35], [36] are utilized in the Discrete Wavelet Transform (DWT) and in LDPC decoding.

Spectral estimation (row 10) is the dual of error correction codes, i.e., the sparsity is in the frequency domain. MSL (row 11) and multi-target detection in radars are similar to spectral estimation since targets act as sparse spatial mono-tones; each target is mapped to a specific spatial frequency determined by its line-of-sight direction relative to the receiver. The techniques developed for this branch of science are quite unique, with examples such as MUSIC [8], Prony [9], and Pisarenko [10]. We shall see that the techniques used in real-field error correction codes and in SCA can also be used in this area.

The array processing category (rows 11-13) consists of three separate topics. The first covers MSL in radars and sonars and DOA estimation, which are similar to spectral estimation. The techniques developed for this field are similar to the spectral estimation methods, with emphasis on the Minimum Description Length (MDL) [37]. The second topic in the array processing category is the design of sparse arrays, where some of the array elements are missing; the remaining nodes form a nonuniform sparse grid. In this case, one of the optimization problems is to find the sparsest array (number, locations and weights of elements) for a given beampattern. This problem has some resemblance to the missing sampling problem (which is a special case of real-field error correction codes) and may be solved by similar techniques.
TABLE II
VARIOUS TOPICS AND APPLICATIONS WITH SPARSITY PROPERTIES: THE SPARSITY, WHICH MAY BE IN TIME/SPACE OR "FREQUENCY" DOMAINS, CONSISTS OF UNKNOWN SAMPLES/COEFFICIENTS THAT NEED TO BE DETERMINED. THE INFORMATION DOMAIN CONSISTS OF KNOWN SAMPLES/COEFFICIENTS IN "FREQUENCY" OR TIME/SPACE DOMAIN (THE COMPLEMENT OF THE SPARSE DOMAIN). A LIST OF ACRONYMS IS GIVEN IN TABLE XV AT THE END OF THE PAPER; ALSO, A LIST OF COMMON NOTATIONS IS PRESENTED IN TABLE I. FOR THE DEFINITION OF ESPRIT ON ROW 11 AND COLUMN 7, SEE THE FOOTNOTE ON THE CORRESPONDING PAGE.

[Table II: for each topic, i.e., sampling of lowpass/multiband signals, random sampling, compressed sensing, finite rate of innovation, Galois- and real-field channel coding, spectral estimation, MSL/DOA estimation, sparse array beamforming, sensor networks, BSS, SDR, and OFDM channel estimation, the table lists the sparsity domain, the type of sparsity, the information domain, the type of sampling, the minimum sampling rate, the reconstruction methods (among them iterative methods and filter banks, spline interpolation, adaptive thresholding (IMAT, RDE/ELP), basis and matching pursuit, the annihilating filter (ELP), Berlekamp-Massey/Viterbi/belief propagation, MUSIC/Pisarenko/Prony with MDL, ESPRIT, LP/SA/GA optimization, ℓ1/SL0, and MIMAT), and typical applications (seismic, MRI/CT, data compression, radar/sonar/ultrasound, digital communication, fault tolerant systems, biomedical, and channel equalization).]

The third topic is on sensor networks (row 13). Distributed sampling and recovery of a physical field using an array of sparse sensors is a problem of increasing interest in environmental and seismic monitoring applications of sensor networks [38]. Sensor fields may be bandlimited or non-bandlimited. Since power consumption is the most restricting issue in sensors, it is vital to use the lowest possible number of sensors (sparse sensor networks) with the minimum processing computation.

In Sparse Component Analysis (SCA), the number of observations is much less than the number of sources (signals). However, if the sources are sparse in the time domain, then the active sources and their amplitudes can be determined; this is equivalent to error correction codes. Sparse Dictionary Representation (SDR) is another new area where signals are represented by the sparsest number of words (signal bases) in a dictionary of a finite number of words; this sparsity may result in a tremendous amount of data compression. When the dictionary is overcomplete, there are many ways to represent the signal; however, we are interested in the sparsest representation. Normally, for extraction of statistically independent sources, Independent Component Analysis (ICA) is used for a complete set of linear mixtures. In the case of a non-complete (underdetermined) set of linear mixtures, ICA can work if the sources are also sparse; for this special case, ICA analysis is synonymous with SCA.

Finally, channel estimation is shown in row 16. In mobile communication systems, multipath reflections create a channel that can be modeled by a sparse FIR filter. For proper decoding of the incoming data, the channel characteristics should be estimated before it can be equalized. For this purpose, a training sequence is inserted within the main data, which enables the receiver to obtain the output of the channel by exploiting this training sequence. The channel estimation problem becomes a deconvolution problem under noisy environments. The sparsity criterion of the channel greatly improves the channel estimation; this is where the algorithms for extraction of a sparse signal could be employed [22], [23], [39].

When sparsity is random, further signal processing is needed. In this case, there are three items that need to be considered: 1- evaluating the number of sparse coefficients (or samples), 2- finding the positions of the sparse coefficients, and 3- determining the values of these coefficients. In some applications, only the first two items are needed, e.g., in spectral estimation.

Also known as the Karhunen-Loeve Transform (KLT).
However, in almost all the other cases mentioned in Table II, all three items should be determined. Various types of Linear Programming (LP) and some iterative algorithms, such as the Iterative Method with Adaptive Thresholding (IMAT), determine the number, positions, and values of the sparse samples at the same time. On the other hand, the Minimum Description Length (MDL) method, used in DOA/MSL and spectral estimation, determines only the number of sparse source locations or frequencies. In the subsequent sections, we shall describe each algorithm in more detail for the various areas and applications based on Table II.

Finally, it should be mentioned that the signal model for each topic or application may be deterministic or stochastic. For example, in the sampling category, for rows 2-4 and 7, the signal model is typically deterministic, although stochastic models could also be envisioned [40]. On the other hand, for random sampling and CS (rows 5-6), the signal model is stochastic, although deterministic models may also be envisioned [41]. In channel coding and estimation (rows 8-9 and 16), the signal model is normally deterministic. For spectral and DOA estimation (rows 10-11), stochastic models are assumed, while for array beamforming (row 12), deterministic models are used. In sensor networks (row 13), both deterministic and stochastic signal models are employed. Finally, in SCA (rows 14-15), statistical independence of the sources may be necessary and thus stochastic models are applied.

II. SAMPLING: UNIFORM, NONUNIFORM, MISSING, RANDOM, COMPRESSED SENSING, RATE OF INNOVATION
Analog signals can be represented by finite rate discrete samples (uniform, nonuniform, or random) if the signal has some sort of redundancy such as band-limitedness, finite polynomial representation (e.g., periodic signals that are represented by a finite number of trigonometric polynomials), or nonlinear functions of such redundant functions [42], [43]. The minimum sampling rate is the Nyquist rate for uniform sampling and its generalizations for nonuniform [2] and multiband signals [44]. When a signal is discrete, the equivalent discrete representation in the "frequency" domain (DFT, DCT, DWT, Discrete Hartley Transform (DHT), Discrete Sine Transform (DST)) may be sparse, which is the discrete version of bandlimited or multiband analog signals. For discrete signals, if the nonzero coefficients ("frequency" sparsity) are consecutive, depending on the location of the zeros, they are called lowpass, bandpass, or multiband discrete signals; otherwise, the "frequency" sparsity is random. The number of discrete time samples needed to represent a frequency-sparse signal follows the law of algebra: the number of time samples should be equal to the number of coefficients in the "frequency" domain. This is equivalent to the Nyquist rate, i.e., twice the bandwidth (for discrete signals with DC components it is twice the bandwidth minus one). The dual of frequency-sparsity is time-sparsity, which can happen in a burst or a random fashion. The number of "frequency" coefficients needed again follows the Nyquist criterion. This will be further discussed in Section III for sparse additive impulsive noise channels.
A. Sampling of Sparse Signals
If the sparsity locations of a signal are known in a transform domain, then the number of samples needed in the time (space) domain should be at least equal to the number of sparse coefficients, i.e., the so-called Nyquist rate. However, depending on the type of sparsity (lowpass, bandpass, or random) and the type of sampling (uniform, periodic nonuniform, or random), the reconstruction may be unstable and the corresponding reconstruction matrix may be ill-conditioned [45], [46]. Thus, in many applications mentioned in Table II, the sampling rate in column 6 is higher than the minimum (Nyquist) rate.

When the locations of the sparse coefficients are not known, by the law of algebra, the number of samples needed to specify the sparsity is at least twice the number of sparse coefficients. Again, for stability reasons, the actual sampling rate is higher than this minimum figure [2], [44]. To guarantee stability, instead of direct sampling of the signal, a linear combination of the samples can be used. Donoho [30] has recently shown that if we take linear combinations of the samples, the minimum stable sampling rate is of the order $O(k \log(n/k))$, where $n$ and $k$ are the frame size and the sparsity number, respectively.
1) Reconstruction Algorithms:
There are many reconstruction algorithms that can be used depending on the sparsity pattern, uniform or random sampling, complexity issues, and sensitivity to quantization and additive noise [47], [48]. Among these methods are: Linear Programming (LP), Lagrange interpolation [49], the time varying method [50], spline interpolation [51], matrix inversion [52], the Error Locator Polynomial (ELP) [53], iterative techniques [1], [46], [54], [55], [56], [57], [58], and the Iterative Method with Adaptive Thresholding (IMAT) [26], [32], [59], [60]. In the following, we concentrate only on the last three methods, which have proven to be effective and practical.
Iterative Methods When the Location of Sparsity is Known:
The reconstruction algorithms have to recover the original sparse signal from the information domain, given the type of sparsity in the transform domain. We know the samples (both positions and amplitudes) and we know the location of the sparsity in the transform domain. An iteration between these two domains (Fig. 3 and Table III), or consecutive Projections Onto Convex Sets (POCS), should yield the original signal [45], [54], [55], [58], [61], [62], [63], [64].
Fig. 3. Block diagram of the iterative reconstruction method. The mask is an appropriate filter with coefficients of 1's and 0's depending on the type of sparsity in the original signal.

TABLE III
THE ITERATIVE ALGORITHM BASED ON THE BLOCK DIAGRAM OF FIG. 3

1) Convert the input of the $i$-th iteration ($x^{(i)}$) into the transform domain (for instance, the Fourier domain); $x^{(0)}$ is normally the initial received signal.
2) Multiply the transformed signal ($X^{(i)}$) by a mask (for instance, a band-limiting filter).
3) Take the inverse transform of the result in step 2 to get $r^{(i)}$.
4) Set the new result as $x^{(i+1)} = x^{(0)} + x^{(i)} - r^{(i)}$.
5) Repeat for a given number of iterations.
6) Stop when $\|x^{(i+1)} - x^{(i)}\|_{\ell_2} < \epsilon$.

In the case of the usual assumption that the sparsity is in the "frequency" domain and for the uniform sampling of lowpass signals, one projection (bandlimiting in the frequency domain) suffices. However, if the frequency sparsity is random, the time samples are nonuniform, or the "frequency" domain is defined in a domain other than the DFT, then we need several iterations to obtain a good replica of the original signal. In general, this iterative method converges if the "Nyquist" rate is satisfied, i.e., the number of samples per block is greater than or equal to the number of coefficients. Figure 4 shows the improvement in dB versus the number of iterations for a random sampling set for a bandpass signal. In this figure, besides the standard iterative method, accelerated iterations such as the Chebyshev and Conjugate Gradient methods are also used (please see Appendix I for the algorithms) [65].

Fig. 4. SNR improvement vs. the number of iterations for a random sampling set at the Nyquist rate (OSR = 1) for a bandpass signal (simple, Chebyshev-accelerated, and Conjugate Gradient iterations).

Fig. 5. The Iterative Method with Adaptive Thresholding (IMAT) for detecting the number, locations, and values of the sparse coefficients.

Iterative methods are quite robust against quantization and additive noise. In fact, we can prove that the iterative methods approach the pseudo-inverse (least squares) solution in a noisy environment, especially when the matrix is ill-conditioned [44].
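To make the loop of Fig. 3 and Table III concrete, the following is a minimal Python sketch of one POCS realization (alternately enforcing the known frequency support and re-inserting the known time samples); the signal length, support, sampling density, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pocs_recover(y, sample_mask, band_mask, iters=200):
    """One POCS realization of the Fig. 3 / Table III loop: alternately
    enforce the known frequency support and the known time samples."""
    x = y.copy()
    for _ in range(iters):
        X = np.fft.fft(x) * band_mask          # project onto the sparsity support
        x = np.real(np.fft.ifft(X))
        x[sample_mask] = y[sample_mask]        # re-insert the known samples
    return x

# toy lowpass signal with roughly half of its samples missing
rng = np.random.default_rng(0)
n = 128
band_mask = np.zeros(n, dtype=bool)
band_mask[:8] = True; band_mask[-7:] = True    # lowpass, Hermitian support
x_true = np.real(np.fft.ifft(band_mask * np.fft.fft(rng.standard_normal(n))))
sample_mask = rng.random(n) < 0.5              # random sampling set
y = np.where(sample_mask, x_true, 0.0)         # lost samples read as zero
x_hat = pocs_recover(y, sample_mask, band_mask)
snr = 10 * np.log10(np.sum(x_true**2) / np.sum((x_true - x_hat)**2))
print(f"SNR after 200 iterations: {snr:.1f} dB")
```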
Iterative Method with Adaptive Threshold (IMAT) for Unknown Location of Sparsity:
As expected, when the sparsity is assumed to be random, further signal processing is needed. We need to evaluate the number of sparse coefficients (or samples), the positions of the sparse coefficients, and their values. The above iterative method cannot work here, since projection (the masking operation in Fig. 3) onto the "frequency" domain is not possible without knowledge of the positions of the sparse coefficients. In this scenario, we need to use the knowledge of sparsity in some other way. The introduction of an adaptive nonlinear threshold in the iterative method can do the trick, hence the name Iterative Method with Adaptive Thresholding (IMAT); the block diagram and the algorithm are depicted in Fig. 5 and Table IV, respectively. The algorithms in [66], [32], [26], [24] are variations of this method. Figure 5 shows that by alternate projections between the information and sparsity domains (adaptively lowering or raising the threshold levels in the sparsity domain), the sparse coefficients are gradually picked up after several iterations. This method can be considered as a modified version of Matching Pursuit, as described in Section VI-D.1; the results are shown in Fig. 6. The sampling rate in the time domain is twice the number of unknown sparse coefficients, which is the full capacity rate; the figure shows that after about 15 iterations, the SNR reaches its peak value. In general, the higher the sampling rate relative to the full capacity, the faster the convergence and the better the final SNR value.
Matrix Solutions:
When the sparse nonzero locations are known, matrix approaches can be utilized to determine the values of the sparse coefficients [52]. Although these approaches are rather straightforward, they may not be robust against quantization or additive noise. There are other approaches such as spline interpolation [51], nonlinear/time varying methods [52], Lagrange interpolation [49], and the Error Locator Polynomial (ELP) [67] that will not be discussed here. These methods work quite well in the absence of additive noise, but they may not be robust in the presence of noise. The ELP approach will, however, be discussed in Sec. III-A; variations of this method are called the annihilating filter in sampling with finite rate of innovation (Sec. II-C) and Prony's method in spectral and DOA estimation (Sec. IV-A).
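As a sketch of such a matrix solution (with the DFT as the sparsity domain), a known support reduces recovery to a small least-squares problem; the sizes and the use of numpy's lstsq below are our illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 64, 4
support = rng.choice(n, k, replace=False)          # known nonzero DFT bins
s = np.zeros(n, dtype=complex)
s[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
x = np.fft.ifft(s)                                 # time-domain signal

# any m >= k time samples give an m x k linear system for the k unknowns
m = 8
rows = rng.choice(n, m, replace=False)             # sampled time positions
F = np.fft.ifft(np.eye(n), axis=0)                 # n x n inverse-DFT matrix
A = F[np.ix_(rows, support)]                       # sampled rows, active columns
coeffs, *_ = np.linalg.lstsq(A, x[rows], rcond=None)
print(np.allclose(coeffs, s[support]))             # True: coefficients recovered
```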
TABLE IV
GENERIC IMAT OF FIG. 5 FOR ANY SPARSITY IN THE DISCRETE TRANSFORM (DT), WHICH IS TYPICALLY THE FAST FOURIER TRANSFORM (FFT)

1) Use the all-zero block as the initial value of the sparse domain signal (0th iteration).
2) Convert the current estimate of the signal from the sparse (e.g., Fourier) domain into the information (e.g., time) domain.
3) Where possible, replace the values with the known samples of the signal in the information domain.
4) Convert the signal back to the sparse domain.
5) Use adaptive hard thresholding to distinguish the original nonzero samples.
6) If neither the maximum number of iterations has passed nor a given stopping condition is fulfilled, return to step 2.

Fig. 6. SNR vs. the number of iterations for sparse signal recovery using the IMAT (Table IV).
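The following is a minimal Python sketch of the Table IV loop for DFT-domain sparsity. The geometric threshold schedule thr·e^(-βi) is one common choice for step 5 (the paper leaves the adaptive schedule to the designer), and all sizes are illustrative assumptions.

```python
import numpy as np

def imat(y, sample_mask, iters=30, beta=0.2):
    """Sketch of the generic IMAT of Table IV with DFT-domain sparsity."""
    n = len(y)
    s = np.zeros(n, dtype=complex)               # step 1: all-zero sparse domain
    thr0 = np.abs(np.fft.fft(y)).max()
    for i in range(iters):
        x = np.real(np.fft.ifft(s))              # step 2: sparse -> time domain
        x[sample_mask] = y[sample_mask]          # step 3: re-insert known samples
        S = np.fft.fft(x)                        # step 4: back to the sparse domain
        thr = thr0 * np.exp(-beta * i)           # step 5: adaptive hard threshold
        s = np.where(np.abs(S) > thr, S, 0.0)
    return np.real(np.fft.ifft(s))

# toy signal: k cosines (2k nonzero DFT bins), 25% random time samples
rng = np.random.default_rng(2)
n, k, t = 256, 4, np.arange(256)
freqs = rng.choice(np.arange(1, n // 2), k, replace=False)
x_true = sum(np.cos(2 * np.pi * f * t / n + rng.uniform(0, 2 * np.pi)) for f in freqs)
mask = rng.random(n) < 0.25
y = x_true * mask
x_hat = imat(y, mask)
print("SNR (dB):", 10 * np.log10(np.sum(x_true**2) / np.sum((x_true - x_hat)**2)))
```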
B. Compressed Sensing (CS)
The relatively new topic of Compressed (Compressive) Sensing (CS), originally introduced in [30] and further extended in [68], [69] and [31], deals with the sampling of signals that are sparse in a general sense. The idea is to introduce sampling schemes with a low number of required samples that uniquely represent the original sparse signal; these methods have lower computational complexities than the traditional techniques that employ oversampling and then apply compression. In other words, compression is achieved exactly at the time of sampling. Unlike the classical sampling theorem [70], the signals are assumed to be sparse in an arbitrary transform domain, not necessarily the Fourier domain. Furthermore, there is no restricting assumption on the locations of the nonzero coefficients in the sparsity domain; i.e., the locations need not follow a specific pattern such as the lowpass or multiband structure. Clearly, this assumption covers a more general class of signals than the ones previously studied.

Since the concept of sparsity in a transform domain is easier to study for discrete signals, most of the research in this field has focused on discrete-type signals [71]; however, recent results [72] show that most of the work can be generalized to continuous signals that have a sparse representation in a Riesz basis [73]. We first study discrete signals and then briefly discuss the extension to the continuous case.
1) CS Mathematical Modeling:
Let the vector $\mathbf{x} \in \mathbb{R}^n$ be a finite length discrete signal in the time domain which has to be under-sampled. We assume that $\mathbf{x}$ has a sparse representation in a transform domain denoted by a unitary matrix $\Psi_{n \times n}$; i.e., we have:

$$\mathbf{x} = \Psi \cdot \mathbf{s} \qquad (2)$$

where $\mathbf{s}$ is an $n \times 1$ vector which has at most $k$ nonzero elements (a $k$-sparse vector). In practical cases, $\mathbf{s}$ has at most $k$ significant elements and the insignificant elements are set to zero, which means $\mathbf{s}$ is an almost $k$-sparse vector. For example, $\mathbf{x}$ can be the pixels of an image and $\Psi$ the corresponding IDCT matrix. In this case, most of the DCT coefficients are insignificant and if they are set to zero, the quality of the image will not degrade significantly. In fact, this is the main concept behind some of the lossy compression methods such as JPEG2000. Since the inverse transform on $\mathbf{x}$ yields $\mathbf{s}$, the vector $\mathbf{s}$ can be used instead of $\mathbf{x}$; it can be succinctly represented by the locations and values of its nonzero elements. Although this method efficiently compresses $\mathbf{x}$, it initially requires all the samples of $\mathbf{x}$ to produce $\mathbf{s}$, which undermines the whole purpose of CS.

Now let us assume that instead of samples of $\mathbf{x}$, we take $m$ linear combinations of the samples (called generalized samples). If we represent these linear combinations by the matrix $\Phi_{m \times n}$ and the resultant vector of samples by $\mathbf{y}_{m \times 1}$, we have:

$$\mathbf{y}_{m \times 1} = \Phi_{m \times n} \cdot \mathbf{x}_{n \times 1} = \Phi_{m \times n} \cdot \Psi_{n \times n} \cdot \mathbf{s}_{n \times 1} \qquad (3)$$

The question is how the matrix $\Phi$ and the size $m$ should be chosen to ensure that these samples uniquely represent the original signal $\mathbf{x}$. Apparently, the case of $\Phi = I_{n \times n}$ for $m = n$ yields a trivial solution (keeping all the samples of $\mathbf{x}$) that does not exploit the sparsity characteristic. We look for $\Phi$ matrices with as few rows as possible which can guarantee the invertibility of the sampling process for the class of sparse inputs.

To solve this problem, we introduce probabilistic measures; i.e., instead of exact recovery of signals, we focus on the probability that a random sparse signal (according to a given probability density function) fails to be reconstructed using its generalized samples. If the probability of failure approaches zero, we can state that the sampling scheme (the joint pair $\Psi, \Phi$) is successful in recovering $\mathbf{x}$ with probability 1.

Let us assume that $\Phi^{(m)}$ represents the first $m$ rows of an invertible matrix $\Phi_{n \times n}$. It is apparent that if we use $\{\Phi^{(m)}\}_{m=0}^{n}$ as the sampling matrices for a given sparsity domain, the failure probabilities for $\Phi^{(0)}$ and $\Phi^{(n)}$ are one and zero, respectively, and as the index $m$ increases, the failure probability decreases. The important point is that the decrease of the failure probability is exponential with respect to $m/k$ [74]. Therefore, we expect to reach an almost zero failure probability much earlier than $m = n$, despite the fact that the exact rate highly depends on the mutual behavior of the two matrices $\Psi, \Phi$.

The sequence of vectors $\{v_n\}$ is called a Riesz basis if there exist scalars $0 < A \leq B < \infty$ such that for every absolutely summable sequence of scalars $\{a_n\}$, we have the following inequalities:

$$A \Big(\sum_n |a_n|^2\Big) \leq \Big\|\sum_n a_n v_n\Big\|_{\ell_2}^2 \leq B \Big(\sum_n |a_n|^2\Big) \qquad (1)$$
More precisely, it is shown in [74] that:

$$P_{\mathrm{failure}} < n \cdot e^{-\frac{c}{\mu^2(\Psi, \Phi^{(m)})} \cdot \frac{m}{k}} \qquad (4)$$

where $c$ is a positive constant and $\mu(\Psi, \Phi^{(m)})$ is the maximum coherence between the rows of $\Psi$ and $\Phi^{(m)}$, defined by [75]:

$$\mu(\Psi, \Phi^{(m)}) = \sqrt{n} \cdot \max_{1 \leq a \leq n,\; 1 \leq b \leq m} \big| \langle \psi_a, \phi_b \rangle \big| \qquad (5)$$

where $\psi_a, \phi_b$ are the $a$-th and $b$-th rows of the matrices $\Psi$ and $\Phi$, respectively. The above result implies that the probability of reconstruction is close to one for:

$$m \geq \underbrace{\mu^2(\Psi, \Phi)}_{\geq\, \mu^2(\Psi, \Phi^{(m)})} k \cdot \frac{\ln n}{c} \qquad (6)$$

This derivation implies that the lower the maximum coherence between the two matrices, the lower the number of required samples. Thus, to decrease the number of samples, we should look for matrices $\Phi$ with low coherence with $\Psi$. For this purpose, we use a random $\Phi$. It is shown that the coherence of a random matrix with i.i.d. Gaussian distribution with any unitary $\Psi$ is considerably small [30], which makes it a proper candidate for the sampling matrix. Investigation of the probability distribution has shown that the Gaussian PDF is not the only solution (for example, a binary distribution is considered in [76]) but it may be the simplest to analyze.

The inequality in (6) can be simplified in terms of $m$, $n$ and $k$; it can be shown [3], [71] that a sparse signal can be reconstructed from its compressed samples with a probability of almost one if:

$$m \geq c \cdot k \log\big(\tfrac{n}{k}\big) \qquad (7)$$
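A small Python sketch of the generalized sampling in (3) and the coherence measure in (5), using a random Gaussian Φ against the unitary DFT basis; the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 256, 64

Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)           # unitary DFT as sparsity basis
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm sampling rows

# maximum coherence of (5): sqrt(n) * max |<psi_a, phi_b>|
mu = np.sqrt(n) * np.abs(Phi @ Psi.conj().T).max()
print(f"coherence: {mu:.2f}  (1 is ideal; sqrt(n) = {np.sqrt(n):.0f} is worst case)")

# generalized samples of a k-sparse signal, as in (3): y = Phi Psi s
k = 5
s = np.zeros(n)
s[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = Phi @ (Psi @ s)       # m = 64 generalized samples represent the length-256 signal
```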
2) Reconstruction from Compressed Measurements:
In this subsection, we consider the reconstruction algorithms and the stability issues. Essentially, there are three classes of methods: A- geometric, B- combinatorial, and C- information theoretic. We briefly discuss each of these below.
Geometric Methods:
The oldest methods for reconstruction from compressed sampling are geometric, i.e., $\ell_1$ minimization techniques for finding a $k$-sparse vector $\mathbf{s} \in \mathbb{R}^n$ from a set of $m = O(k \log(n/k))$ measurements ($y_i$); see e.g., [30], [74], [77], [78], [79]. Let us assume that we have applied a suitable $\Phi$ which guarantees the invertibility of the sampling process. The reconstruction method should be a technique to recover a $k$-sparse vector $\mathbf{s}_{n \times 1}$ from the observed samples $\mathbf{y}_{m \times 1} = \Phi_{m \times n} \cdot \Psi_{n \times n} \cdot \mathbf{s}_{n \times 1}$, or possibly $\mathbf{y}_{m \times 1} = \Phi_{m \times n} \cdot \mathbf{s}_{n \times 1} + \nu_{m \times 1}$, where $\nu$ denotes the noise vector.
Fig. 7. Relation between the number of required measurements $m$ and the number of sparse coefficients $k$ for sparse DFT and DCT signals with a fixed frame size $n$; the curve $c \cdot k \log(n/k)$ is shown for comparison.

Suitability of $\Phi$ implies that $\mathbf{s}_{n \times 1}$ is the only $k$-sparse vector that produces the observed samples; therefore, $\mathbf{s}_{n \times 1}$ is also the sparsest solution for $\mathbf{y} = \Phi \cdot \Psi \cdot \mathbf{s}$. Consequently, $\mathbf{s}$ can be found using:

$$\text{minimize } \|\mathbf{s}\|_{\ell_0} \quad \text{subject to } \mathbf{y} = \Phi \cdot \Psi \cdot \mathbf{s} \qquad (8)$$

Minimization with respect to the $\ell_0$-norm (sparsity) is an NP-complete problem in general. However, it is shown in [80] that minimization with the $\ell_1$-norm results in the same vector $\mathbf{s}$ in many cases. The interesting part is that the number of required samples to replace $\ell_0$- with $\ell_1$-minimization has the same order of magnitude as the one for the invertibility of the sampling scheme. Hence, $\mathbf{s}$ can be derived from (8) using $\ell_1$-minimization. It is worthwhile to mention that replacement of the $\ell_1$-norm with the $\ell_2$-norm, which is faster to implement, does not necessarily produce reasonable solutions. However, there are greedy methods (Matching Pursuit, as discussed in Sec. VI on SCA [81], [82]) which iteratively approach the best solution faster than $\ell_1$-norm optimization (Basis Pursuit, as discussed in Sec. VI on SCA).

The technique of IMAT, discussed under random sampling and simulated in Fig. 6 for sparse DFT signals, can also be used for the recovery of a sparse signal in other transform domains such as the DCT. It should be noted that this sampling process is a special case of CS: instead of a linear combination of samples, the actual random samples are used [27]. The simulation results shown in Fig. 7 confirm the relation represented in (7). In our simulation results, perfect reconstruction corresponds to an SNR value of at least ... dB with a reliability of at least ... %. It is interesting to see that (7) closely tracks the DFT sparse signal for a range of $k$ when $c = 1$. When $k$ increases for a given $n$ (i.e., the signal becomes less sparse), the relative number of needed measurements decreases and the relation (7) is no longer valid.

A sufficient condition for these methods to work is that the matrix $\Phi \cdot \Psi$ satisfies the so-called Restricted Isometry Property (RIP) [83], [84], [76], which is discussed in the following subsection.
RIP:
It is important that, in the presence of noise, the algorithms produce the best approximate solution to within a precision; in other words, small perturbations in the signal caused by noise should result in small distortions in the output solution. This characteristic is usually called stability. In compressed sensing, the stability of the reconstruction is determined by the characteristics of the sampling matrix $\Phi$. We say that the matrix $\Phi$ has RIP of order $k$ when, for all $k$-sparse vectors $\mathbf{s}$, we have [31]:

$$1 - \delta_k \leq \frac{\|\Phi \cdot \mathbf{s}\|_{\ell_2}^2}{\|\mathbf{s}\|_{\ell_2}^2} \leq 1 + \delta_k \qquad (9)$$

where $0 \leq \delta_k < 1$ (the isometry constant). The RIP is a sufficient condition that provides us with the maximum and minimum power of the samples with respect to the input power and ensures that none of the $k$-sparse inputs fall in the null space of the sampling matrix. The RIP essentially states that every $k$ columns of the matrix $\Phi_{m \times n}$ must be almost orthonormal. The explicit construction of a matrix with such a property is difficult for any given $n \gg m$; however, the problem has been studied in some cases [41], [85]. Moreover, given such a matrix $\Phi$, finding $\mathbf{s}$ (or alternatively $\mathbf{x}$) via the minimization problem involves linear programming with $n$ variables and $m$ constraints, which can be computationally expensive.

Among the matrices that satisfy the RIP condition are Gaussian random matrices. If $\Phi$ is a Gaussian random matrix with the number of rows satisfying (6), $\Phi \cdot \Psi$ is also a Gaussian random matrix with the same number of rows and thus it satisfies the RIP, which guarantees a stable recovery. Assume that instead of $\Phi \cdot \mathbf{s}$, we observe $\Phi \cdot \mathbf{s} + \nu$, where $\nu$ represents the additive noise vector. Since $\Phi \cdot \mathbf{s} + \nu$ may not belong to the range space of $\Phi$ over $k$-sparse vectors, the $\ell_1$ minimization of (8) may not produce a solution. Thus, we employ the following minimization instead:

$$\text{minimize } \|\mathbf{s}\|_{\ell_1} \quad \text{subject to } \|\mathbf{y} - \Phi \cdot \mathbf{s}\|_{\ell_2} < \epsilon \qquad (10)$$

where $\epsilon$ is the maximum noise power. Let us denote the result of the above minimization for $\mathbf{y} = \Phi \cdot \mathbf{s} + \nu$ by $\hat{\mathbf{s}}$, which is also a $k$-sparse vector. Now we have:

$$\|\mathbf{y} - \Phi \cdot \hat{\mathbf{s}}\|_{\ell_2} = \|\Phi \cdot (\mathbf{s} - \hat{\mathbf{s}}) + \nu\|_{\ell_2} < \epsilon \;\Rightarrow\; \|\Phi \cdot (\hat{\mathbf{s}} - \mathbf{s})\|_{\ell_2} < \|\nu\|_{\ell_2} + \epsilon \qquad (11)$$

Since both $\mathbf{s}$ and $\hat{\mathbf{s}}$ are $k$-sparse, $\hat{\mathbf{s}} - \mathbf{s}$ is $2k$-sparse, and by using the RIP (of order $2k$), we get:

$$\sqrt{1 - \delta_{2k}}\; \|\hat{\mathbf{s}} - \mathbf{s}\|_{\ell_2} < \|\Phi \cdot (\hat{\mathbf{s}} - \mathbf{s})\|_{\ell_2} < \|\nu\|_{\ell_2} + \epsilon \qquad (12)$$

or equivalently,

$$\|\hat{\mathbf{s}} - \mathbf{s}\|_{\ell_2} < \frac{\|\nu\|_{\ell_2} + \epsilon}{\sqrt{1 - \delta_{2k}}} \qquad (13)$$

This shows that small perturbations in the input cause small perturbations in the output (stability). Moreover, as $\delta_{2k}$ approaches unity, the distortion in the output caused by the input additive noise becomes more significant; the ideal case is $\delta_{2k} = 0$.
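For reference, the $\ell_1$ relaxation of (8) can be posed as a linear program by splitting s into positive and negative parts. The sketch below uses scipy's linprog (HiGHS) with illustrative dimensions; it is not the paper's own solver.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m, k = 64, 28, 4

A = rng.standard_normal((m, n)) / np.sqrt(m)       # sampling matrix (Psi = I here)
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ s_true

# write s = u - v with u, v >= 0, and minimize 1^T (u + v) s.t. [A, -A][u; v] = y
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n), method="highs")
s_hat = res.x[:n] - res.x[n:]
print("max recovery error:", np.abs(s_hat - s_true).max())
```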
Combinatorial:
Another standard approach for reconstruction from compressed sampling is combinatorial. The sampling matrix $\Phi$ is found using bipartite graphs and consists of binary entries, i.e., entries that are either 0 or 1. Binary search methods are then used to find an unknown $k$-sparse vector $\mathbf{s} \in \mathbb{R}^n$; see e.g., [77], [86], [87], [88], [89], [90], [91], [92] and the references therein. Typically, the binary matrix $\Phi$ has $m = O(k \log n)$ rows, and there exist fast algorithms for finding the solution $\mathbf{x}$ from the $m$ measurements (typically a linear combination). However, the construction of $\Phi$ is also difficult.
Information Theoretic:
A more recent approach is adaptive and information theoretic [93]. In this method, the signal $\mathbf{s} \in \mathbb{R}^n$ is assumed to be an instance of a vector random variable $\mathbf{x} = (x_1, \dots, x_n)^t$, and the $i$-th row of $\Phi$ is constructed using the value of the previous sample $y_{i-1}$. Tools from the theory of Huffman coding are used to develop a deterministic construction of a sequence of binary sampling vectors $\phi_\sigma$ (i.e., the components of $\phi_\sigma$ consist of 0s and 1s) in such a way as to minimize the average number of samples (rows of $\Phi$) needed to determine a signal.
3) Almost Sparse Signals and Noisy Measurements:
An important issue in compressed sampling is robustness to noisy measurements. Specifically, the reconstruction of a $k$-sparse vector $\mathbf{s}$ from noisy measurements $y_i = \langle \phi_i, \mathbf{x} \rangle + \eta_i$ should produce a $k$-sparse vector $\hat{\mathbf{s}}$ which is close to $\mathbf{s}$.

Another important aspect is the behavior of compressive sampling algorithms on almost sparse signals, which are more likely to occur in applications than exactly $k$-sparse vectors. For example, a $k$-sparse vector $\mathbf{s}$ may be corrupted by noise $\eta \in \mathbb{R}^n$, producing the vector $\tilde{\mathbf{s}} = \mathbf{s} + \eta$. Another example is the wavelet transform of an image, which consists mostly of small coefficients and a few large coefficients. Obviously, any method for the sampling and reconstruction of sparse signals must also be well adapted to almost sparse signals; i.e., if a sampling and reconstruction method is applied to an almost $k$-sparse signal $\mathbf{s}$, it must produce a $k$-sparse signal $\hat{\mathbf{s}}$ that includes the $k$ most significant coefficients of $\mathbf{s}$ (up to a small error).
4) CS for Analog Signals:
Recently, there have been efforts to extend the concept of CS to analog signals [72]. The sparse signals are assumed to be elements of a Shift-Invariant (SI) space generated by $n$ kernels with period $T$. For instance, assume $\{a_l(t)\}_{l=1}^{n}$ form a Riesz basis (see the footnote on page 6) [73] for $L^2$; the respective generated SI space is

$$SI = \Big\{ \sum_{l=1}^{n} \sum_{k \in \mathbb{Z}} d_l[k] \cdot a_l(t - kT) \;\Big|\; d_l[k] \in \mathbb{R},\; \sum_{l=1}^{n} \sum_{k \in \mathbb{Z}} \big|d_l[k]\big|^2 < \infty \Big\} \qquad (14)$$

For example, the set of all lowpass signals with bandwidth $B$ forms an SI space with a single kernel ($\mathrm{sinc}(2Bt)$). Similarly, the set of multiband signals with frequency support defined as $n$ fixed disjoint intervals of equal length forms an $n$-kernel SI space. The classical sampling theorems suggest that a sampling rate of $n/T$ is sufficient for signal recovery in the space given in (14).

Now, a signal $x(t) \in SI$ is called $k$-sparse if, in its basis representation, at most $k$ generators among the total $n$ are active; i.e., $d_l[k] = 0$ for $l \notin \{l_1, l_2, \dots, l_k\}$, where $\{l_1, l_2, \dots, l_k\}$ is an arbitrary subset of $\{1, 2, \dots, n\}$. Similar to the discrete case, we look for sampling schemes that exploit the inherent sparsity in order to decrease the sampling rate. In [72], it is shown that the rate could be as low as $k/T$ for $k$-sparse signals; this is the theoretical lower bound for invertible sampling rates. Instead of the sampling matrix $\Phi$, a filter bank is used for sampling; the signal is passed through $p$ filters ($k \leq p < n$) followed by samplers, each sampling at rate $1/T$. The analog continuous signal $x(t)$ is thus transformed into an infinite length vector; a generalization of the CS of finite length vectors has also been studied in [72].

C. Sampling with Finite Rate of Innovation
The classical sampling theorem states that:

$$x(t) = \sum_{i \in \mathbb{Z}} x\Big(\frac{i}{2B}\Big) \cdot \mathrm{sinc}(2Bt - i) \qquad (15)$$

where $B$ is the bandwidth of $x(t)$ and the Nyquist interval is $T_s = \frac{1}{2B}$. These uniform samples can be regarded as the degrees of freedom of the signal; i.e., a lowpass signal with bandwidth $B$ has one degree of freedom in each Nyquist interval $T_s$. Replacing the sinc function with other kernels in (15), we can generalize the sparsity (bandlimitedness) in the Fourier domain to a wider class of signals known as the SI spaces:

$$x(t) = \sum_{i \in \mathbb{Z}} c_i \cdot \varphi\Big(\frac{t}{T_s} - i\Big) \qquad (16)$$

Similarly, the above signals have one degree of freedom in each period $T_s$ of time (the coefficients $c_i$). A more general definition of the degrees of freedom is introduced in [4] and is named the Rate of Innovation. For a given signal model, if we denote the degrees of freedom in the time interval $[t_1, t_2]$ by $C_x(t_1, t_2)$, the local rate of innovation is defined by $\frac{1}{t_2 - t_1} C_x(t_1, t_2)$, and the global rate of innovation ($\rho$) is defined as

$$\rho = \lim_{\tau \to \infty} \frac{1}{2\tau}\, C_x(t - \tau, t + \tau) \qquad (17)$$

provided that the limit exists; in this case, we say that the signal has finite rate of innovation [4], [28], [94], [95]. As an example, for lowpass signals with bandwidth $B$ we have $\rho = 2B$, which is the same as the Nyquist rate. In fact, by proper choice of the sampling process, we are extracting the innovations of the signal. The question that now arises is whether the uniform sampling theorems can be generalized to signals with finite rate of innovation. The answer is positive for a class of non-bandlimited signals including the SI spaces. Consider the following signals:

$$x(t) = \sum_{i \in \mathbb{Z}} \sum_{r=1}^{R} c_{i,r} \cdot \varphi_r\Big(\frac{t - t_i}{T_s}\Big) \qquad (18)$$

where $\{\varphi_r(t)\}_{r=1}^{R}$ are arbitrary but known functions and $\{t_i\}_{i \in \mathbb{Z}}$ is a realization of a point process with mean $\mu$. The free parameters of the above signal model are $\{c_{i,r}\}$ and $\{t_i\}$; hence, for this class of signals we have $\rho = \frac{R+1}{\mu}$ ($R$ amplitudes plus one time instant per mean period $\mu$). However, the classical sampling methods cannot reconstruct these kinds of signals with the sampling rate predicted by $\rho$.

Fig. 8. Sampling with the kernel $\varphi(t)$.

There are many variations of the possible choices of the functions $\varphi_r(t)$; nonetheless, we describe only the simplest version here. Let the signal $x(t)$ be a finite mixture of sparse Dirac functions:

$$x(t) = \sum_{i=1}^{k} c_i \cdot \delta(t - t_i) \qquad (19)$$

where $\{t_i\}$ is assumed to be an increasing sequence. We intend to show that the samples generated by proper sampling kernels $\varphi(t)$ (shown in Fig. 8) can be used to reconstruct the sparse Dirac functions. In fact, we choose the kernel $\varphi(t)$ to satisfy the so-called Strang-Fix condition of order $k$:

$$\forall\, 0 \leq r \leq k - 1,\; \exists\, \{\alpha_{r,i}\}_{i \in \mathbb{Z}}: \quad \sum_{i \in \mathbb{Z}} \alpha_{r,i}\, \varphi(t - i) = t^r \qquad (20)$$

In the Fourier domain, the above condition becomes:

$$\Phi(\Omega = 0) \neq 0, \qquad \Phi^{(r)}(\Omega = 2\pi i) = 0 \quad \forall\, i \neq 0 \in \mathbb{Z},\; r = 0, \dots, k - 1 \qquad (21)$$

where $\Phi(\Omega)$ denotes the Fourier transform of $\varphi(t)$, and the superscript $(r)$ represents the $r$-th derivative. It is also shown that such functions are of the form $\varphi(t) = f(t) * \beta^{k}(t)$, where $\beta^{k}(t)$ is the B-spline of order $k$ and $f(t)$ is an arbitrary function with nonzero DC frequency [94].
Therefore, the function $\beta^{k}(t)$ is itself among the possible options for the choice of $\varphi(t)$.

We can show that for sampling kernels which satisfy the Strang-Fix condition (20), the innovations of the signal $x(t)$ in (19) can be extracted from the samples ($y[j]$):

$$y[j] = \Big( x(t) * \varphi\big(-\tfrac{t}{T_s}\big) \Big)\Big|_{t = j \cdot T_s} = \sum_{i=1}^{k} c_i\, \varphi(t_i - j) \qquad (22)$$

Thus,

$$\tau_r \triangleq \sum_{j \in \mathbb{Z}} \alpha_{r,j}\, y[j] = \sum_{i=1}^{k} c_i \sum_{j \in \mathbb{Z}} \alpha_{r,j}\, \varphi(t_i - j) = \sum_{i=1}^{k} c_i\, t_i^r \qquad (23)$$

In other words, we have filtered the discrete samples ($y[j]$) in order to obtain the values $\tau_r$; (23) shows that these values are only a function of the innovation parameters (amplitudes $c_i$ and time instants $t_i$). However, the values $\tau_r$ are nonlinearly related to the time instants and therefore the innovations cannot be extracted from $\tau_r$ using linear algebra alone. These nonlinear equations form a well-known system which was studied by Prony in the field of spectral estimation (see Sec. IV-A) and whose discrete version is also employed in both the real and Galois field versions of Reed-Solomon codes (see Sec. III-A). The resulting method, called the annihilating filter, is as follows.

The sequence $\{\tau_r\}$ can be viewed as the solution of a recursive equation. In fact, if we define $H(z) = \sum_{i=0}^{k} h_i z^i = \prod_{i=1}^{k} (z - t_i)$, we will have (see Sec. III-A and Appendices II, III for the proof of a similar theorem):

$$\forall r: \quad \tau_{r+k} = -\sum_{i=1}^{k} h_{i-1} \cdot \tau_{r+i-1} \qquad (24)$$

In order to find the time instants $t_i$, we find the polynomial $H(z)$ (or the coefficients $h_i$) and then look for its roots. A recursive relation for $\tau_r$ becomes:

$$\begin{bmatrix} \tau_1 & \tau_2 & \dots & \tau_k \\ \tau_2 & \tau_3 & \dots & \tau_{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ \tau_k & \tau_{k+1} & \dots & \tau_{2k-1} \end{bmatrix} \cdot \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{k-1} \end{bmatrix} = - \begin{bmatrix} \tau_{k+1} \\ \tau_{k+2} \\ \vdots \\ \tau_{2k} \end{bmatrix} \qquad (25)$$

By solving the above linear system of equations, we obtain the coefficients $h_i$ (for a discussion of the invertibility of the matrix on the left, see [94], [96]) and consequently, by finding the roots of $H(z)$, the time instants are revealed. It should be mentioned that the choice of $\tau_1, \dots, \tau_{2k}$ in (25) can be replaced with any $2k$ consecutive terms of $\{\tau_i\}$. After determining $\{t_i\}$, (23) becomes a linear system of equations with respect to the values $\{c_i\}$, which can easily be solved.

This reconstruction method can be used for other types of signals satisfying (18), such as signals represented by piecewise polynomials [94]. An important issue in nonlinear reconstruction is noise analysis; for the purpose of denoising and performance under additive noise, the reader is encouraged to see [28].
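The annihilating-filter steps (23)-(25) can be traced numerically in a few lines. In this sketch the instants t_i are encoded directly as the real roots of H(z) for simplicity (in (19) they would enter through the kernel), and all sizes are illustrative; indices are zero-based, so the system uses τ_0,...,τ_{2k-1}.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 3
t = np.sort(rng.uniform(0.1, 0.9, k))              # unknown "instants" (roots of H)
c = rng.uniform(1.0, 2.0, k)                       # unknown amplitudes

# the 2k power sums tau_r = sum_i c_i t_i^r that (23) extracts (r = 0..2k-1)
tau = np.array([np.sum(c * t**r) for r in range(2 * k)])

# annihilating-filter system (25): solve for h_0..h_{k-1}, with h_k = 1
T = np.array([[tau[r + i] for i in range(k)] for r in range(k)])
h = np.linalg.solve(T, -tau[k:])                   # RHS: -[tau_k .. tau_{2k-1}]
H = np.concatenate([h, [1.0]])                     # H(z) = h_0 + h_1 z + ... + z^k
t_hat = np.sort(np.real(np.roots(H[::-1])))        # roots of H reveal the instants

# with the instants known, (23) is a linear (Vandermonde) system for the c_i
V = np.vander(t_hat, k, increasing=True).T         # rows t_hat^r, r = 0..k-1
c_hat = np.linalg.solve(V, tau[:k])
print(np.allclose(t_hat, t), np.allclose(c_hat, c))   # True True
```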
III. ERROR CORRECTION CODES: GALOIS AND REAL/COMPLEX FIELDS
The relation between sampling and channel coding is a result of the fact that over-sampling creates redundancy [97]. This redundancy can be used to correct for "sparse" impulsive noise. Normally, channel encoding is performed in finite Galois fields as opposed to the real/complex fields; the reason is the simplicity of logic circuit implementation and insensitivity to the pattern of errors. On the other hand, the real/complex field implementation of error correction codes has stability problems with respect to the pattern of impulsive, quantization and additive noise [46], [53], [67], [98], [99], [100], [101]. Nevertheless, such implementations have found applications in fault tolerant computer systems [102], [103], [104], [105], [106] and in impulsive noise removal from 1-D and 2-D signals [32], [33]. Similar to finite Galois fields, real/complex field codes can be implemented in both block and convolutional fashions.

A discrete real-field block code is an oversampled signal with $n$ samples such that, in the transform domain (e.g., the DFT), a contiguous number of high frequency components are zero. In general, the zeros do not have to be the high frequency components or contiguous. However, if they are contiguous, the resultant $m$ equations (from the syndrome information domain) and $m$ unknown erasures form a Vandermonde matrix, which ensures invertibility and consequently erasure recovery. The DFT block codes are thus a special case of Reed-Solomon (RS) codes in the field of real/complex numbers [97]. A more general condition for having a Vandermonde matrix is that the indices of the zero frequencies form an arithmetic progression mod $n$; this is equivalent to having contiguous zeros in the Sorted DFT (SDFT) domain [2], [53], [107].

Figure 9 represents convolutional encoders of rate 1/2, finite constraint length [97] and infinite precision per symbol. Fig. 9(a) is a systematic convolutional encoder and resembles an oversampled signal discussed in Section II if the FIR filter acts as an ideal lowpass filter. Fig. 9(b) is a non-systematic encoder used in the simulations to be discussed subsequently. In the case of additive impulsive noise, errors can be detected based on the side information that there are frequency gaps in the original oversampled signal (syndrome). In the following subsections, various algorithms for decoding along with simulation results are given for both block and convolutional codes. Some of these algorithms can be used in other applications such as spectral and channel estimation.

Note that the Strang-Fix condition can also be used for an exponential polynomial, assuming the delta functions are non-uniformly periodic; in that case, $\tau_r$ in equation (23) is similar to $E$, the DFT of the impulses, as defined in Appendices II and III.

A. Decoding of Block Codes - ELP Method
Iterative reconstruction for an erasure channel is identical to the missing sampling problem [108] discussed in Section II-A.1 and therefore will not be discussed here. Let us assume that we have a finite discrete signal $x_{\mathrm{orig}}[i]$, where $i = 1, \dots, l$. The DFT of this sequence yields $l$ complex coefficients in the frequency domain ($X_{\mathrm{orig}}[j],\; j = 1, \dots, l$). If we insert $p$ consecutive zeros to get $n = l + p$ samples ($X[j],\; j = 1, \dots, n$) and take its inverse DFT, we end up with an oversampled version of the original signal with $n$ complex samples ($x[i],\; i = 1, \dots, n$). This oversampled signal is real if Hermitian symmetry (complex conjugate symmetry) is preserved in the frequency domain, e.g., if the set $\Theta$ of $p$ zeros is centered at $\frac{n}{2}$. For erasure channels, the sparse missing samples are denoted by $e[i_m] = -x[i_m]$, where the $i_m$'s denote the positions of the lost samples; consequently, for $i \neq i_m$, $e[i] = 0$. The Fourier transform of $e[i]$ (called $E[j],\; j = 1, \dots, n$) is known for the syndrome positions $\Theta$. The remaining values of $E[j]$ can be found from the following recursion (see Appendix II):

$$E[r] = -\frac{1}{h_k} \sum_{t=1}^{k} E[r + t]\, h_{k-t} \qquad (26)$$

where $r$ is a member of the complement of $\Theta$ and the index additions are mod $n$. After finding the $E[j]$ values, the spectrum of the recovered oversampled signal $X[j]$ can be found by removing $E[j]$ from the received signal (see (97) in Appendix II). Hence, the original signal can be recovered by removing the inserted zeros at the syndrome positions of $X[j]$. The above algorithm, called the Error Locator Polynomial (ELP) algorithm, is capable of correcting any combination of erasures. However, if the erasures are bursty, the above algorithm may become unstable. To combat bursty erasures, we can use the SDFT [2], [53], [109], [110], [107] instead of the DFT. The simulation results for block codes over erasure and impulsive noise channels are given in the following two subsections.

Fig. 9. Convolutional encoders. (a) A real-field systematic convolutional encoder of rate 1/2; the $h[i]$'s are the taps of an FIR filter. (b) A non-systematic convolutional encoder of rate 1/2; the $h_1[i]$'s and $h_2[i]$'s are the taps of two FIR filters.

The kernel of the SDFT is $e^{\frac{2\pi j}{n} iq}$, where $q$ is relatively prime to $n$; this is equivalent to a sorted version of the DFT coefficients according to a mod rule.

We call the set of indices of consecutive zeros the syndrome positions and denote it by $\Theta$; this set includes the complex conjugate part in a block of $n$ samples.
1) Simulation Results for Erasure Channels:
The simulation results for the ELP decoding implementation for $n = 32$, $p = 16$, and $k = 16$ erasures (a burst of 16 consecutive missing samples) are shown in Fig. 10. Since consecutive sample losses represent the worst case [53], [110], the proposed method works better for random erasures. In practice, the error recovery capability of this technique degrades with increasing block and/or burst size due to the accumulation of round-off errors. In order to reduce the round-off error, a transform based on the SDFT or the Sorted DCT (SDCT) can be used instead of the DFT [2], [53], [110]. These transformations act as an interleaver that breaks down the bursty erasures.
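The recursion (26) can be traced numerically as follows. This is a minimal sketch: a complex codeword is used for brevity (a real codeword would need Hermitian-symmetric zero placement), the sizes are illustrative, and with the root convention chosen here the recursion divides by h_0 rather than h_k.

```python
import numpy as np

rng = np.random.default_rng(6)
l, p = 16, 8                      # message length, number of inserted DFT zeros
n, k = l + p, 8                   # block length and number of erasures (k <= p)

X = np.zeros(n, dtype=complex)    # encode: p consecutive zeros at bins l..n-1
X[:l] = np.fft.fft(rng.standard_normal(l) + 1j * rng.standard_normal(l))
x = np.fft.ifft(X)                # oversampled codeword (complex, for brevity)

pos = rng.choice(n, k, replace=False)     # erasure positions (known to the decoder)
y = x.copy(); y[pos] = 0.0                # erased samples read as zero

# locator polynomial sum_t h_t z^t with roots exp(-2j*pi*pos/n): its time-domain
# mask vanishes exactly on the erasures, so sum_t h_t E[r+t] = 0 for every r
h = np.poly(np.exp(-2j * np.pi * pos / n))[::-1]

Y = np.fft.fft(y)
E = np.zeros(n, dtype=complex)
E[l:] = Y[l:]                             # syndrome bins: there X = 0, so E = Y
for r in range(l - 1, -1, -1):            # extend E to all bins via the recursion
    E[r] = -sum(h[t] * E[(r + t) % n] for t in range(1, k + 1)) / h[0]

x_hat = np.fft.ifft(Y - E)                # remove the error spectrum
print("max reconstruction error:", np.abs(x_hat - x).max())
```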
2) Simulation Results for the Random Impulsive Noise Channel:
Fig. 10. Recovery of a burst of 16 sample losses (SNR in dB).

There are several methods to determine the number, locations, and values of the impulsive noise samples, namely: Modified Berlekamp-Massey for real fields [111], [112], ELP, IMAT, and CFAR-RDE. The Berlekamp-Massey method for real numbers is sensitive to noise and will not be discussed here [111]. The ELP method is described in the next subsection, while the other two methods are discussed below.
ELP Method [96]:
When the number and positions of the impulsive noise samples are not known, $h_t$ in (26) is not known for any $t$; therefore, we assume the maximum possible number of impulsive noise samples per block, i.e., $k = \lfloor \frac{n - l}{2} \rfloor$, as given in (94) in Appendix II. To solve for $h_t$, we need to know only the $n - l$ samples of $E$ at the positions where zeros were added in the encoding procedure. Once the values of $h_t$ are determined from the pseudo-inverse [96], the number and positions of the impulsive noise samples can be found from (96) in Appendix II. The actual values of the impulsive noise can then be determined from (26), as in the erasure channel case. For the actual algorithm, please refer to Appendix III. As we are using the above method in the field of real numbers, exact zeros of $\{H_k\}$, which are the DFT of $\{h_i\}$, are rarely observed; consequently, the zeros can be found by thresholding the magnitudes of $H_k$. Alternatively, the magnitudes of $H_k$ can be used as a mask for soft decision; in this case, thresholding is not needed.

CFAR-RDE and IMAT Methods [32]:
The Constant False Alarm Rate with Recursive Detection Estimation (CFAR-RDE) method is similar to the Iteration Method with Adaptive Thresholding (IMAT), with the additional inclusion of a CFAR module to estimate the impulsive noise; CFAR is extensively used in radar to detect and remove clutter noise from data. In CFAR, we compare the noisy signal with its neighbors and determine whether an impulsive (sparse) noise is present or not (using a soft decision). After removing the impulsive noise in a "soft" fashion, we estimate the signal using the iterative method for an erasure channel as described in Section II-A.1 for random sampling, or using the ELP method. The impulsive noise and signal detection and estimation go through several iterations in a recursive fashion, as shown in Fig. 11. As the number of recursions increases, the certainty about the detection of impulsive noise locations also increases; thus, the soft decision is designed to act more like a hard decision at the latter parts of the iteration steps, which yields the error locations. (This has some resemblance to the soft decision iterations of turbo codes [101].) Meanwhile, further iterations are performed to enhance the quality of the original signal, since suppression of the impulsive noise also suppresses the original signal samples at the locations of the impulsive noise. The improvement of CFAR-RDE over a simple soft decision RDE is shown in Fig. 12.

Fig. 11. CFAR-RDE method with the use of adaptive soft thresholding and an iterative method for signal reconstruction.

Fig. 12. Comparison of CFAR-RDE and a simple soft decision RDE for DFT block codes (SNR in dB vs. full error correction capacity; curves: RDE step, CFAR and soft decision, simple thresholding and soft decision).
B. Decoding for Convolutional Codes
The performance of convolutional decoders depends on the coding rate, the number and values of the FIR taps of the encoders, and the type of the decoder. Let us take the convolutional encoder of Fig. 9(b) as our platform for simulations. The taps of the encoder of Fig. 9(b) are

$$h_1 = [1, \dots], \qquad h_2 = [16, \dots] \qquad (27)$$

The input signal is taken from a uniform random distribution, and the simulations are run repeatedly and then averaged. The following subsections describe the simulation results for erasure and impulsive noise channels.
1) Decoding for Erasure Channels:
For the erasure channels, we employ the two methods described below:
Iterations with Averaging:
An iterative method to decode erasures in the convolutional code of Fig. 9(b) is shown in Fig. 13, which is designed for this encoder's rate. At each stage of decoding, the results of the two branches are averaged. For this rate and the specific FIR structure, the SNR improvement versus the relative rate of erasures with respect to the theoretical maximum rate of correction capability (full capacity) is shown in Fig. 14. This figure shows that the SNR values gradually decrease as the erasure rate approaches full capacity.

Decoding Using the Generator Matrix:
The generator matrix of a convolutional encoder of the type depicted in Fig. 9(b), with the taps given in (27), can be shown to be [5]

$$G = \begin{bmatrix}
16 & 0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
16 & 5 & 4 & 3 & 2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}^{T} \qquad (28)$$

An iterative decoding scheme for this matrix representation is similar to that of Fig. 3, except that the operator G consists of the generator matrix, a mask (erasure operation), and the transpose of the generator matrix. If the rate of erasures does not exceed the encoder full capacity, the matrix form of the operator G can be shown to be a nonnegative definite square matrix and therefore its inverse exists [1], [45]. Thus, with a proper choice of the relaxation parameter, the iteration represented in Fig. 13 converges to the actual signal. By using the above operator G in our iterative simulations, better results can be obtained in comparison with the averaging method of Fig. 14. Figure 15 shows that the SNR values gradually decrease as the rate of erasures reaches its maximum (capacity); it also shows that the generator matrix approach of decoding using the iteration matrix performs much better than the averaging method represented in Figs. 13 and 14. However, the complexity of the matrix approach is higher than that of the averaging method.

Fig. 13. Iterative decoding for the convolutional encoder of Fig. 9(b).

Fig. 14. SNR vs. the percentage of erasures for the convolutional decoder of Fig. 13.

Fig. 15. Simulation results of a convolutional decoder using the iterative method with the generator matrix after CG iterations (see Appendix I); SNR vs. the relative rate of erasures (w.r.t. full capacity) in an erasure channel.
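A minimal sketch of this iterative decoder is given below; the composite operator (generator matrix, erasure mask, transposed generator matrix) follows the description above, while the relaxation parameter lam and the iteration count are illustrative assumptions.

```python
import numpy as np

def iterative_generator_decode(y_erased, mask, G, lam=1.0, n_iter=200):
    """Relaxation iteration with the operator G^T * mask * G.

    y_erased: received code stream with erased samples (mask = 0 there);
    when the erasure rate is within capacity, G^T M G is positive definite
    and the iteration converges to the solution of G^T M G x = G^T M y.
    """
    x = np.zeros(G.shape[1])
    rhs = G.T @ (mask * y_erased)              # back-projected observations
    for _ in range(n_iter):
        x = x + lam * (rhs - G.T @ (mask * (G @ x)))
    return x
```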
2) Decoding for Impulsive Noise Channels:
Let us consider x and y as the input and output streams of the encoder, respectively, related to each other through the generator matrix G as y = Gx. Denoting the observation vector at the receiver by $\hat{\mathbf{y}}$, we have $\hat{\mathbf{y}} = \mathbf{y} + \boldsymbol{\nu}$, where $\boldsymbol{\nu}$ is the impulsive noise vector. Multiplying $\hat{\mathbf{y}}$ by the transpose of the parity check matrix H, we get

$$H^T\hat{\mathbf{y}} = H^T(\mathbf{y} + \boldsymbol{\nu}) = \underbrace{H^T G}_{0}\,\mathbf{x} + H^T\boldsymbol{\nu} \;\Rightarrow\; H^T\hat{\mathbf{y}} = H^T\boldsymbol{\nu} \qquad (29)$$

Multiplying the result by the right pseudo-inverse of $H^T$, we derive

$$H(H^T H)^{-1} H^T \hat{\mathbf{y}} = H(H^T H)^{-1} H^T \boldsymbol{\nu} = \tilde{\boldsymbol{\nu}} \qquad (30)$$

Thus, by multiplying the received vector by $H(H^T H)^{-1} H^T$, we obtain an approximation of the impulsive noise. In the IMAT method, we apply the operator $H(H^T H)^{-1} H^T$ in the iteration of Fig. 5, while the threshold level is reduced exponentially at each iteration step; the block diagram of IMAT in Fig. 5 is modified as shown in Fig. 16.

For the simulation results, we use the generator matrix shown in (28); its parity check matrix can be calculated from [5] and is given below:

$$H = \begin{bmatrix}
-.063 & 0 & 0 & 0 & \cdots \\
-.313 & 0 & -.063 & 0 & \cdots \\
-.25 & 0 & -.313 & 0 & \cdots \\
-.188 & 0 & -.25 & 0 & \cdots \\
-.125 & 0 & -.188 & 0 & \cdots \\
-.063 & 1 & -.125 & 0 & \cdots \\
0 & 0 & -.063 & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} \qquad (31)$$

In our simulations, the locations of the impulsive noise samples are generated randomly and their amplitudes have Gaussian distributions with zero mean and variances equal to different multiples of the variance of the encoder output.

Fig. 16. The modified diagram of the IMAT method from Fig. 5.
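A sketch of the modified IMAT loop of Fig. 16 is given below, assuming an exponentially decaying threshold schedule; the parameter names (lam, alpha) and the stopping rule are ours.

```python
import numpy as np

def imat_impulse_decode(y_hat, H, n_iter=50, lam=1.0, alpha=0.1):
    """Recover the sparse impulsive noise nu from its projection, eq. (30).

    P = H (H^T H)^{-1} H^T maps the received vector to an approximation of
    the impulsive noise; IMAT alternates a gradient step enforcing
    P nu = P y_hat with a hard threshold that decays exponentially.
    """
    P = H @ np.linalg.solve(H.T @ H, H.T)
    proj = P @ y_hat                              # = P @ nu by (29)-(30)
    nu = np.zeros_like(y_hat)
    beta0 = np.abs(proj).max()
    for i in range(n_iter):
        nu = nu + lam * (proj - P @ nu)           # consistency step
        thr = beta0 * np.exp(-alpha * i)          # decaying threshold
        nu = np.where(np.abs(nu) > thr, nu, 0.0)  # keep the sparse spikes
    return y_hat - nu                             # cleaned code stream
```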
Fig. 17. Simulation results of using the IMAT method for detecting the location and amplitude of the impulsive noise, λ = 1 (SNR in dB vs. percentage of capacity; curves labeled Rate = 1, 2, 5, and 10).

The results are shown in Fig. 17 after a number of iterations. This figure shows that the higher-variance impulsive noise yields a better performance.

C. LDPC Codes
Low Density Parity Check (LDPC) codes are another example of the application of sparse matrices in reducing complexity [113], [114], [115], [116]. LDPC codes are linear block codes whose parity check matrix H is sparse; the term "low-density" refers to this fact. A code with a sparse H matrix that has an equal number of 1's in each row and an equal number of 1's in each column is called a regular LDPC code. The iterative decoding algorithm that is used to decode a given code, with either a high or low density of ones in the H matrix, is such that the decoding complexity grows almost linearly with the block length; the coefficient of this growth is directly dependent on the number of 1's in the H matrix, as explained below.

For decoding LDPC codes, we use Tanner graphs [117]; these graphs may also be useful in decoding real-field codes when the parity matrix is sparse and possibly binary. Each iteration module of the decoding algorithm consists of two steps. In the first step, for each row, certain operations are performed for each 1 on that row; similarly, in the second step, for each column, certain operations are performed for each 1 on that column. Therefore, the sparser the matrix is, the lower the decoding complexity will be. As an example, consider two regular LDPC codes whose H matrices differ in the number of 1's per row and per column by a constant factor; their decoding complexities then differ by the same factor. Note that, in general (assuming that the H matrix has full rank), for a given code rate R, the ratio of the number of ones in each column to the number of ones in each row is a constant equal to 1 − R. Therefore, if the number of ones in the rows is scaled by a factor c, the number of ones in the columns has to be scaled by the same factor to keep the code rate fixed; consequently, the decoding complexity is scaled by c. (Note that the generator matrix G of an LDPC code is not necessarily sparse, so the above discussion is only valid for decoding and not for encoding.)
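Since each decoding iteration touches every 1 of H once in the row step and once in the column step, even a crude hard-decision variant makes the complexity argument concrete; the bit-flipping sketch below is illustrative only and is not the soft-decision decoder used in practice.

```python
import numpy as np

def bit_flip_decode(H, y, max_iter=100):
    """Hard-decision bit-flipping on the Tanner graph of a binary H."""
    x = np.asarray(y, dtype=int) % 2
    for _ in range(max_iter):
        syndrome = H.dot(x) % 2            # row step: evaluate each check
        if not syndrome.any():
            break                          # all parity checks satisfied
        votes = H.T.dot(syndrome)          # column step: failed checks per bit
        x[votes == votes.max()] ^= 1       # flip the most-suspect bit(s)
    return x
```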
1) LDPC Decoding Using Linear Programming (LP): The linear programming decoding of block codes was initially suggested by Feldman in 2003 [118], [119]. Consider Maximum Likelihood (ML) decoding of block codes; this is an optimization problem where we want to minimize a cost function by finding the codeword that has the minimum Euclidean distance to the received vector from the channel. In the logarithm domain, the cost function can be transformed into a linear function of the coded bits (see [120]). Note that the codeword has to satisfy a linear set of parity check equations. Therefore, the problem of decoding a block code can be considered as an LP optimization, but its computational complexity grows exponentially with the block length. However, a sub-optimal relaxation of this problem significantly reduces the complexity [120]; this modification makes the overall performance and complexity of the method comparable to that of the conventional decoding method of LDPC codes, Maximum A Posteriori (MAP) decoding on a bit level. The performance of the LP decoding method is usually inferior to that of MAP decoding, but it has been shown to be superior to that of MIN-SUM, which is a simplification of the original MAP decoding algorithm [121]. (For further discussion on LP and its relation to Basis Pursuit, please refer to Sec. VI-D.2 and Table IX.)
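A small-scale sketch of this relaxation in Feldman's formulation is shown below (assuming gamma holds the per-bit log-likelihood costs); each check contributes one inequality per odd-cardinality subset of its neighborhood, so the enumeration is tractable only for low-density H.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def lp_decode(H, gamma):
    """LP relaxation of ML decoding: minimize gamma . x over [0,1]^n
    subject to the local codeword (parity polytope) inequalities."""
    m, n = H.shape
    A_ub, b_ub = [], []
    for j in range(m):
        nbrs = np.flatnonzero(H[j])
        for r in range(1, len(nbrs) + 1, 2):          # odd subset sizes
            for S in itertools.combinations(nbrs, r):
                row = np.zeros(n)
                row[list(S)] = 1.0
                row[[i for i in nbrs if i not in S]] = -1.0
                A_ub.append(row)
                b_ub.append(len(S) - 1.0)
    res = linprog(gamma, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * n)
    return res.x    # fractional entries indicate an LP decoding failure
```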
IV. SPECTRAL ESTIMATION
In this section, we review some of the methods used to evaluate the frequency content of data [8], [9], [10], [11]. In the field of signal spectrum estimation, there are several methods that are appropriate for different types of signals: some methods are better at estimating the spectrum of wideband signals, while others are proposed for the extraction of narrow-band components. Since our focus is on sparse signals, it is reasonable to assume sparsity in the frequency domain, i.e., we assume the signal to be a combination of several sinusoids plus white noise.

Conventional methods for spectrum analysis are non-parametric in the sense that they do not assume any model (statistical or deterministic) for the data, except that the data is zero or periodic outside the observation interval. A well known non-parametric method is the periodogram $\hat{P}_{per}(f)$, which can be computed via the FFT:

$$\hat{P}_{per}(f) = \frac{1}{m T_s}\left| T_s \sum_{r=0}^{m-1} x_r e^{-j2\pi f r T_s}\right|^2 \qquad (32)$$

where m is the number of observations, $T_s$ is the sampling interval (usually assumed to be unity), and $x_r$ is the signal. Although non-parametric methods are robust and have low computational complexity, they suffer from fundamental limitations. The most important limitation is resolution: too closely spaced harmonics cannot be distinguished if the spacing is smaller than the inverse of the observation period.

To overcome this resolution problem, parametric methods have been devised. By assuming a statistical model with some unknown parameters, we can obtain more resolution by estimating the parameters from the data, at the cost of more computational complexity. Theoretically, in parametric methods we can resolve two closely spaced harmonics with limited data length if the SNR goes to infinity. (Similarly to array processing, to be discussed in the next section, we can resolve any two closely spaced sources conditioned on 1) limited snapshots and infinite SNR, or 2) limited SNR and an infinite number of observations, while the spatial aperture of the array is kept finite.)

In this section, we shall discuss three parametric approaches for spectral estimation: the Pisarenko, the Prony, and the well known MUSIC algorithms. The first two are mainly used in spectral estimation, while the MUSIC method, first developed for array processing, has been extended to spectral estimation (see the next section). It should be noted that the parametric methods, unlike the non-parametric approaches, require prior knowledge of the model order (the number of tones); this can be decided from the data using the Minimum Description Length (MDL) method discussed in the next section.
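As a baseline for the parametric methods that follow, a minimal implementation of the periodogram (32) (unit sampling interval assumed by default):

```python
import numpy as np

def periodogram(x, Ts=1.0):
    """P(f) = |Ts * sum_r x_r exp(-j 2 pi f r Ts)|^2 / (m Ts), via the FFT."""
    m = len(x)
    P = np.abs(Ts * np.fft.fft(x)) ** 2 / (m * Ts)
    f = np.fft.fftfreq(m, d=Ts)
    return f, P
```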
A. Prony Method

The Prony method was originally proposed for modeling the expansion of gases [122]; however, it is now known as a general spectral estimation method. In fact, Prony tried to fit a weighted mixture of k damped complex exponentials to 2k data measurements. The original approach applies to noiseless measurements; however, it has been extended to produce least squares solutions for noisy measurements. We focus only on the noiseless case here. The signal is modeled as a weighted mixture of k complex exponentials with complex amplitudes and frequencies:

$$x_r = \sum_{i=1}^{k} b_i z_i^r \qquad (33)$$

where $x_r$ is the noiseless discrete sparse signal consisting of k exponentials with parameters

$$b_i = a_i e^{j\theta_i}, \qquad z_i = e^{j2\pi f_i T_s} \qquad (34)$$

where $a_i$, $\theta_i$, and $f_i$ represent the amplitude, phase, and frequency ($f_i$ is a complex number in general), respectively. Let us define the polynomial $H(z)$ such that its roots represent the complex exponential functions related to the sparse tones (see Section II-C on FRI, (26) on ELP, and Appendix II):

$$H(z) = \prod_{i=1}^{k} (z - z_i) = \sum_{i=0}^{k} h_i z^{k-i} \qquad (35)$$

By shifting the index of (33), multiplying by the parameter $h_j$, and summing over $j$, we get

$$\sum_{j=0}^{k} h_j x_{r-j} = \sum_{i=1}^{k} b_i z_i^{r-k} \sum_{j=0}^{k} h_j z_i^{k-j} = 0 \qquad (36)$$
TABLE V: BASIC PRONY ALGORITHM
1) Solve the recursive equation (36) to evaluate the $h_i$'s.
2) Find the roots of the polynomial in (35); these roots are the complex exponentials $z_i$ defined in (33).
3) Solve (33) to obtain the amplitudes of the exponentials (the $b_i$'s).

Here $r$ is indexed in the range $k + 1 \le r \le 2k$, which implies a recursive equation to solve for the $h_i$'s [9]. After the evaluation of the $h_i$'s, the roots of (35) yield the frequency components, and the amplitudes of the exponentials can then be evaluated from the set of linear equations in (33). The basic Prony algorithm is given in Table V. The Prony method is sensitive to noise, as was also observed for the ELP and annihilating filter methods discussed in Sections III-A and II-C. There are extended Prony methods that are better suited for noisy measurements.
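A compact sketch of Table V (function and variable names are ours; m >= 2k noiseless samples assumed):

```python
import numpy as np

def prony(x, k):
    """Steps 1-3 of Table V for x_r = sum_i b_i z_i^r, r = 0..m-1."""
    m = len(x)
    # Step 1: solve the recursion (36), x[r] = -sum_j h_j x[r-j], for h.
    A = np.column_stack([x[k - j : m - j] for j in range(1, k + 1)])
    h, *_ = np.linalg.lstsq(A, -x[k:m], rcond=None)
    # Step 2: the roots of (35) are the complex exponentials z_i.
    z = np.roots(np.concatenate(([1.0 + 0j], h)))
    # Step 3: solve the Vandermonde system (33) for the amplitudes b_i.
    V = z[np.newaxis, :] ** np.arange(m)[:, np.newaxis]
    b, *_ = np.linalg.lstsq(V, x, rcond=None)
    return z, b
```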
B. Pisarenko Harmonic Decomposition (PHD)

The PHD method is based on the polynomial of the Prony method and utilizes the eigen-decomposition of the data covariance matrix [11]. Assume that k complex tones are present in the spectrum of the signal. Then, decompose the covariance matrix of dimension k + 1 into a k-dimensional signal subspace and a 1-dimensional noise subspace that are orthogonal to each other. The noise subspace is spanned by the eigenvector $\mathbf{v}$ that satisfies $R\mathbf{v} = \sigma^2\mathbf{v}$.

Including the additive noise, the observations are given by

$$y_r = x_r + \nu_r \qquad (37)$$

where $y_r$ is the observed sample and $\nu_r$ is a zero-mean noise term that satisfies $E\{\nu_r \nu^*_{r+i}\} = \sigma^2 \delta[i]$. By replacing $x_r = y_r - \nu_r$ in the difference equation (36), we get

$$\sum_{i=0}^{k} h_i y_{r-i} = \sum_{i=0}^{k} h_i \nu_{r-i} \qquad (38)$$

which reveals the Auto-Regressive Moving Average (ARMA) structure (of order (k, k)) of the observations $y_r$ as a random process. To benefit from the tools of linear algebra, let us define the following vectors:

$$\mathbf{y} = [y_r, \dots, y_{r-k}]^T, \quad \mathbf{h} = [1, h_1, \dots, h_k]^T, \quad \boldsymbol{\nu} = [\nu_r, \dots, \nu_{r-k}]^T \qquad (39)$$

Now (38) can be written as

$$\mathbf{y}^H \mathbf{h} = \boldsymbol{\nu}^H \mathbf{h} \qquad (40)$$

Multiplying both sides of (40) by $\mathbf{y}$ and taking the expected value, we get $E\{\mathbf{y}\mathbf{y}^H\}\mathbf{h} = E\{\mathbf{y}\boldsymbol{\nu}^H\}\mathbf{h}$. Note that

$$E\{\mathbf{y}\mathbf{y}^H\} = \mathbf{R}_{yy} = \begin{bmatrix} R_{yy}(0) & \cdots & R_{yy}^*(k) \\ \vdots & \ddots & \vdots \\ R_{yy}(k) & \cdots & R_{yy}(0) \end{bmatrix} \qquad (41)$$
TABLE VI: PHD ALGORITHM
1) Given the model order k (the number of sinusoids), find the autocorrelation matrix of the noisy observations with dimension k + 1 ($\mathbf{R}_{yy}$).
2) Find the smallest eigenvalue ($\sigma^2$) of $\mathbf{R}_{yy}$ and the corresponding eigenvector ($\mathbf{h}$).
3) Set the elements of the obtained vector as the coefficients of the polynomial in (35). The roots of this polynomial are the estimated frequencies.

$$E\{\mathbf{y}\boldsymbol{\nu}^H\} = E\{(\mathbf{x} + \boldsymbol{\nu})\boldsymbol{\nu}^H\} = E\{\boldsymbol{\nu}\boldsymbol{\nu}^H\} = \sigma^2 I \qquad (42)$$

We thus have the eigen-equation

$$\mathbf{R}_{yy}\mathbf{h} = \sigma^2 \mathbf{h} \qquad (43)$$

which is the key equation of the Pisarenko method. The eigen-equation (43) states that the elements of the eigenvector of the covariance matrix corresponding to the smallest eigenvalue ($\sigma^2$) are the same as the coefficients of the recursive equation of $x_r$ (the coefficients of the ARMA model in (38)). Therefore, by evaluating the roots of the polynomial whose coefficients are the elements of this vector, we can find the tones in the spectrum. Although we started with the eigen-decomposition of $\mathbf{R}_{yy}$, we observe that only one of the eigenvectors is required: the one corresponding to the smallest eigenvalue. This eigenvector can be found using simple approaches (in contrast to a full eigen-decomposition). The PHD method is briefly summarized in Table VI.

A different formulation of the PHD method based on linear programming (refer to Sec. VI-D.2 for a description of linear programming) for array processing is studied in [123]. The PHD method is shown to be equivalent to a geometrical projection problem which can be solved using $\ell_1$-norm optimization. Let us convert the autocorrelation matrix $R_{m \times m}$ into an $m^2 \times 1$ vector. If $R_1$ and $R_2$ are the spatial autocorrelation matrices of two uncorrelated sources, the overall autocorrelation matrix is $R_1 + R_2$ when both sources are active, which translates into a similar summation of the corresponding vectors. Although it seems that the vectorized notation generates a subspace for uncorrelated sources, only linear combinations of $m^2 \times 1$ vectors with positive coefficients are acceptable (resulting in a hyper-cone), since $R$ is positive definite. When the number of sources is less than the number of sensors, the respective vector is restricted to lie on the surface of the cone; due to the existence of noise, the observed vector falls inside the cone. It is shown in [123] that the PHD method projects the measured vector onto the surface of the cone; for the purpose of this projection, $\ell_1$-norm optimization can be employed instead of the common Pisarenko method.
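A minimal sketch following Table VI (the autocorrelation estimator and the names are ours):

```python
import numpy as np
from scipy.linalg import toeplitz

def pisarenko(y, k):
    """PHD: estimated frequencies (cycles/sample) of k tones in noise."""
    m = len(y)
    # Step 1: (k+1) x (k+1) Hermitian Toeplitz autocorrelation estimate.
    lags = np.array([np.vdot(y[: m - i], y[i:]) / (m - i) for i in range(k + 1)])
    R = toeplitz(lags)
    # Step 2: eigenvector of the smallest eigenvalue spans the noise subspace.
    _, V = np.linalg.eigh(R)
    h = V[:, 0]
    # Step 3: the roots of the polynomial with coefficients h give the tones.
    return np.angle(np.roots(h)) / (2 * np.pi)
```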
C. MUSIC

MUltiple SIgnal Classification (MUSIC) is a method originally devised for high-resolution source direction estimation in the context of array processing, which will be discussed in the next section [124]. The inherent equivalence of array processing and time series analysis paves the way for the employment of this method in spectral estimation. MUSIC can be understood as a generalization and improvement of the Pisarenko method. It is known that, in the context of array processing, MUSIC attains statistical efficiency (i.e., the estimator is asymptotically unbiased and its variance goes to zero) in the limit of an asymptotically large number of observations [12]. This is a valuable property, since MUSIC involves only a 1-D search while efficient ML methods require k-dimensional searches: MUSIC estimates the frequencies by finding the k local maxima of a 1-D spectrum function, while ML exhaustively searches a k-dimensional parameter space to find the global maximum of a k-variable spectrum function (the error function is non-convex and has multiple local minima; thus gradient descent methods cannot be used to reach the global maximum).

In the PHD method, we construct an autocorrelation matrix of dimension k + 1 under the assumption that its smallest eigenvalue ($\sigma^2$) belongs to the noise subspace; we then use the Hermitian property of the covariance matrix to conclude that the noise eigenvector should be orthogonal to the signal eigenvectors. In MUSIC, we extend this method by using a noise subspace of dimension greater than one to improve the performance; we also use some kind of averaging over the noise eigenvectors to obtain a more reliable signal estimator.

The data model for the sum of exponentials plus noise can be written in matrix form as

$$\mathbf{y} = A\mathbf{b} + \boldsymbol{\nu} \qquad (44)$$

where the length of the data is taken as m > k and

$$A \triangleq \begin{bmatrix} 1 & \cdots & 1 \\ e^{j\omega_1} & \cdots & e^{j\omega_k} \\ \vdots & \ddots & \vdots \\ e^{j(m-1)\omega_1} & \cdots & e^{j(m-1)\omega_k} \end{bmatrix}, \quad \mathbf{b} \triangleq [b_1, \dots, b_k]^T, \quad \boldsymbol{\nu} \triangleq [\nu_1, \dots, \nu_m]^T, \quad \mathbf{y} \triangleq [y_1, \dots, y_m]^T \qquad (45)$$

where $\boldsymbol{\nu}$ represents the noise vector. Since the frequencies are distinct, A is of rank k and the first term of (44) forms a k-dimensional signal subspace, while the second term is randomly distributed in both the signal and noise subspaces; i.e., unlike the first term, it is not confined to a subspace of lower dimension. The correlation matrix of the observations is given by

$$R = A\mathbf{b}\mathbf{b}^H A^H + \sigma^2 I \qquad (46)$$

where the noise is assumed to be white with variance $\sigma^2$. If we decompose R into its eigenvectors, the k eigenvalues corresponding to the k-dimensional subspace of the first term of (46) are essentially greater than the remaining m − k values, $\sigma^2$, corresponding to the noise subspace; thus, by sorting the eigenvalues, the noise and signal subspaces can be determined. Assume $\omega$ is an arbitrary frequency and $\mathbf{e}(\omega) = [1, e^{j\omega}, \dots, e^{j(m-1)\omega}]$. The MUSIC method estimates the spectral content of the signal at frequency $\omega$ by projecting the vector $\mathbf{e}(\omega)$ onto the noise subspace; when the projected vector is zero, $\mathbf{e}(\omega)$ falls in the signal subspace and, most likely, $\omega$ is among the spectral tones.
In fact, the frequency content of the spectrum is inversely proportional to the squared norm of the projection of $\mathbf{e}(\omega)$ onto the noise subspace:

$$P_{MU}(\omega) = \frac{1}{\mathbf{e}^H(\omega)\,\Pi^{\perp}\,\mathbf{e}(\omega)} \qquad (47)$$

$$\Pi^{\perp} = \sum_{i=k+1}^{m} \mathbf{v}_i \mathbf{v}_i^H \qquad (48)$$

where the $\mathbf{v}_i$'s are the eigenvectors of R corresponding to the noise subspace. The k peaks of $P_{MU}(\omega)$ are selected as the frequencies of the sparse signal. The determination of the number of frequencies (the model order) in MUSIC is based on the MDL and Akaike Information Criterion (AIC) methods, to be discussed in the next section.

Figure 18 shows the results of the various spectral line estimation methods discussed above; the top panel shows the original spectral lines, and the other three panels show the results of the estimation methods. We can see that the Prony method (which is similar to the ELP and annihilating filter of Sec. II-C and (26)) does not yield good results, while the PHD method is better.

Fig. 18. Spectral estimation of a sparse mixture of sinusoids using the Prony, Pisarenko, and MUSIC methods (panels: Orig., Prony, PHD, MUSIC; horizontal axis: normalized frequency f/f_s).
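A sketch of the MUSIC pseudo-spectrum (47)-(48) for time series is given below; forming snapshots from overlapping windows and the grid size are assumptions of this sketch.

```python
import numpy as np

def music_spectrum(y, k, m, grid=2048):
    """Evaluate P_MU(w) on a frequency grid from the data series y."""
    snaps = np.array([y[i : i + m] for i in range(len(y) - m + 1)]).T
    R = snaps @ snaps.conj().T / snaps.shape[1]     # m x m covariance
    _, V = np.linalg.eigh(R)                        # ascending eigenvalues
    En = V[:, : m - k]                              # noise-subspace vectors
    Pi = En @ En.conj().T                           # projector of (48)
    w = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    e = np.exp(1j * np.outer(np.arange(m), w))      # vectors e(w)
    denom = np.einsum('im,ij,jm->m', e.conj(), Pi, e).real
    return w, 1.0 / denom                           # pseudo-spectrum (47)
```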
V. SPARSE ARRAY PROCESSING

We address three types of array processing: 1) estimation of Multi-Source Location (MSL) and Direction of Arrival (DOA), 2) sparse array beam-forming and design, and 3) sparse sensor networks. The first topic is related to estimating the direction and/or the location of multiple targets; this problem is very similar to the problem of spectral estimation dealt with in the previous section. The second topic is related to the design of sparse arrays with some missing and/or random array sensors. The last topic, depending on the type of sparsity, is either similar to the second topic or related to CS of sparse
signal fields in a network. Below, each topic will be briefly described.

Fig. 19. Uniform linear array with element distance d, element length I, and a wave arriving from direction ϕ.

A. Array Processing for MSL and DOA Estimation
Among the important fields of active research in array processing are MSL and DOA estimation [124], [125], [126]. In such schemes, a passive or active array of sensors is used to locate the sources of narrow-band signals. Some applications may assume far-field sources (e.g., radar signal processing), where the array is only capable of DOA estimation, while other applications (e.g., biomedical imaging systems) assume near-field sources, where the array is capable of locating the sources of radiation. A closely related field of study is spectral estimation, due to the similar linear statistical models: the stochastic sparse signals pass through a partially known linear transform (e.g., the array response or an inverse Fourier transform) and are observed in a noisy environment.

In the array processing context, the common temporal frequency of the source signals is known. Spatial sampling of the signal is used to extract the direction of the signal (the spatial frequency). As a far-field approximation, the signal wavefronts are assumed to be planar. Consider a signal arriving with angle ϕ, as in Fig. 19. Simultaneous sampling of this wavefront on the array exhibits a phase change of the signal from sensor to sensor. In this way, discrete samples of a complex exponential are obtained, where its frequency can be translated to the direction of the signal source. The response of a Uniform Linear Array (ULA) to a wavefront impinging on the array from direction ϕ is

$$\mathbf{a}(\varphi) = \big[1,\; e^{j2\pi\frac{d}{\lambda}\sin(\varphi)},\; \dots,\; e^{j(n-1)2\pi\frac{d}{\lambda}\sin(\varphi)}\big] \qquad (49)$$

where d is the inter-element spacing of the array, λ is the wavelength, and n is the number of sensors in the array. When multiple sources are present, the observed vector is the sum of the response (sweep) vectors and noise. This resembles the spectral estimation problem, with the difference that the sampling of the array elements is not limited in time; in fact, in array processing an additional degree of freedom (the number of elements) is present, making array processing more general than spectral estimation.

Two main fields in array processing are MSL and DOA, for estimating the source locations and directions, respectively. For both purposes, the angle of arrival (azimuth and elevation) should be estimated, while for MSL an extra parameter of range is also needed. The simplest case is the 1-D ULA (azimuth-only) for DOA estimation.

For the general case of k sources with angles $\varphi_1, \dots, \varphi_k$ with respect to the array, the ULA response is given by the matrix $A(\boldsymbol{\varphi}) = [\mathbf{a}(\varphi_1), \dots, \mathbf{a}(\varphi_k)]$, where the vector $\boldsymbol{\varphi}$ of DOA's is defined as $\boldsymbol{\varphi} = [\varphi_1, \dots, \varphi_k]$. In this notation, A is a matrix of size n × k and the $\mathbf{a}(\varphi_i)$'s are column vectors. Now, the vector of observations at the array elements ($\mathbf{x}[i]$) is given by

$$\mathbf{x}[i] = A\mathbf{s}[i] + \boldsymbol{\nu}[i] \qquad (50)$$

where the vector $\mathbf{s}[i]$ represents the multi-source signals and $\boldsymbol{\nu}[i]$ is white Gaussian noise. The source signals and the additive noise are assumed to be zero-mean, i.i.d. normal processes with covariance matrices P and $\sigma^2 I$, respectively. With these assumptions, the observation vector $\mathbf{x}[i]$ also follows an n-dimensional zero-mean normal distribution with covariance matrix

$$R = E\{\mathbf{x}\mathbf{x}^H\} = APA^H + \sigma^2 I \qquad (51)$$

In the field of DOA estimation, extensive research has been accomplished in 1) source enumeration, and 2) DOA estimation methods.
Both of these subjects correspond to the determination of the parameters k and $\boldsymbol{\varphi}$. Although some methods have been proposed for the simultaneous detection and estimation of the model's statistical characteristics [127], most of the literature is devoted to two-stage approaches: first, the number of active sources is detected, and then their directions are estimated by techniques such as Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [128], [129], [130], [131], [132]. (The array in ESPRIT is composed of sensor doublets with the same displacement; the parameters of the impinging signals can be estimated via a rotational invariance property of the signal subspace. The complexity and storage requirements of ESPRIT are less than those of MUSIC, and it is also less vulnerable to array imperfections. ESPRIT, unlike MUSIC, results in an unbiased DOA estimate; nonetheless, MUSIC outperforms ESPRIT in general.) Usually, the joint detection-estimation methods outperform the two-stage approaches at the cost of higher computational complexity. Source enumeration techniques often rely on the eigenvalues of the covariance matrix obtained from the received data; these eigenvalues are indicators of the energy level in certain principal components of the data. In the MSL estimation step, similar to the MUSIC method, the signal is decomposed into two orthogonal subspaces of signal and noise. The signal subspace corresponds to the column space of the sweep matrix A; thus, proper estimation of the signal subspace leads to estimation of the matrix A and, consequently, of $\boldsymbol{\varphi}$ and the range [12], [13]. In the sequel, we take a closer look at these two steps.
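As a reference for both steps, a small sketch of the data model (49)-(50) is given below; the parameter choices (half-wavelength spacing, unit-power sources) are illustrative assumptions.

```python
import numpy as np

def ula_response(phi, n, d_over_lambda=0.5):
    """Sweep vector a(phi) of (49) for arrival angle phi (radians)."""
    return np.exp(2j * np.pi * d_over_lambda * np.arange(n) * np.sin(phi))

def ula_snapshots(phis, n, m, snr_db=10.0, rng=np.random.default_rng(0)):
    """Draw m snapshots x[i] = A s[i] + nu[i] of the model (50)."""
    A = np.column_stack([ula_response(p, n) for p in phis])
    s = (rng.standard_normal((len(phis), m)) +
         1j * rng.standard_normal((len(phis), m))) / np.sqrt(2)
    sigma = 10.0 ** (-snr_db / 20.0)
    nu = sigma * (rng.standard_normal((n, m)) +
                  1j * rng.standard_normal((n, m))) / np.sqrt(2)
    return A @ s + nu
```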
1) Minimum Description Length (MDL):
One of the most successful methods in array processing for source enumeration is the use of the MDL criterion [133]. This technique is very powerful and outperforms its older versions, including AIC [134], [135], [136]. Hence, we confine our discussion to MDL algorithms.
Preliminaries:
MDL is an optimum method of finding the model order and parameters for the most compressed representation of the observed data. For the purpose of statistical modeling, the MAP probability or the suboptimal ML criterion is used; more precisely, conditioned on the observed data, the maximum probability among the possible options is found (hypothesis testing) [137]. When the model parameters are not known, the MAP and ML criteria result in the most complex approach; consider fitting a finite sequence of data to a polynomial of unknown degree [37]:

$$x(t_i) = P(t_i) + \nu(t_i), \qquad i = 1, \dots, m \qquad (52)$$

where $P(t) = a_0 + a_1 t + \dots + a_k t^k$, $\nu(t)$ is the observed Gaussian noise, and k is the unknown model order (the degree of the polynomial P(t)), which determines the complexity. Clearly, m − 1 is the maximum order required for a unique description of the data (m observed samples), and the ML criterion always selects this maximum value ($\hat{k}_{ML} = m - 1$); i.e., the ML method forces the polynomial P(t) to pass through all the points. MDL, on the other hand, yields a sparser solution ($\hat{k}_{MDL} < m - 1$).

Due to the existence of additive noise, it is quite rational to look for a polynomial with degree less than m that also takes the complexity order into account. In MDL, the idea of how to account for the complexity order is borrowed from information theory: given a specific statistical distribution, we can find an optimum source coding scheme (e.g., Huffman coding) that attains the lowest average code length for the symbols. Furthermore, if $p_x$ is the distribution of the source x and $q_x$ is another distribution, we have [138]

$$H(x) = E_{p_x}(-\log p_x) \le E_{p_x}(-\log q_x) \qquad (53)$$

where H(x) is the entropy of the signal. This implies that the minimum average code length is obtained only for the correct source distribution (model parameters); in other words, the choice of wrong model parameters (distribution function) leads to larger code lengths. When a particular model with the set of parameters θ is assumed for the data a priori, each time a sequence x is received, the parameters first should be estimated. The optimum estimation method is usually the ML estimator, which results in $\hat{\theta}_{ML}$. Now, the probability distribution for a received sequence x becomes $p(x|\hat{\theta}_{ML})$, which, according to information theory, requires an average code length of $-\log\big(p(x|\hat{\theta}_{ML}(x))\big)$ bits. In addition to the data, the model parameters should also be encoded, which in turn requires $\frac{\kappa}{2}\log(m)$ bits, where κ is the number of independent parameters to be encoded in the model and m is the number of data points. Thus, the two-part MDL selects the model that minimizes the whole required code length, which is given by [139]

$$-\log\big(p(x|\hat{\theta}_{ML})\big) + \frac{\kappa}{2}\log(m) \qquad (54)$$

where the first term is the ML term for data encoding and the second term is a penalty function that inhibits the number of free parameters of the model from becoming very large.

MDL Source Enumeration:
In the source enumeration problem, our model is a multivariate Gaussian random process with zero mean and a covariance of the type shown in (51), where the number of active sources is unknown. (For a video introduction to these concepts, please refer to http://videolectures.net/icml08 grunwald mdl.) In some enumeration methods (other than MDL), the exact form of (51) is employed, which results in high computational complexity. In the conventional MDL method, it is assumed that the model is a covariance matrix with a spherical subspace of dimension n − k (i.e., the eigenvalues of the correlation matrix are equal in that subspace). Suppose the sample covariance matrix is

$$\hat{R} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i^H \qquad (55)$$

and assume the ordered eigenvalues of $\hat{R}$ are $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \dots \ge \hat{\lambda}_n$, while the ordered eigenvalues of the exact covariance matrix R are $\lambda_1 \ge \dots \ge \lambda_k \ge \lambda_{k+1} = \dots = \lambda_n = \sigma^2$. The normal distribution function of the received complex data x is [129]

$$p(\mathbf{x}; R) = \frac{1}{\det(\pi R)^m}\, e^{-m\,\mathrm{tr}\{R^{-1}\hat{R}\}} \qquad (56)$$

where tr(·) stands for the trace operator. The ML estimates of the signal eigenvalues in R are $\hat{\lambda}_i$, i = 1, ..., k, with the respective eigenvectors $\{\hat{\mathbf{v}}_i\}_{i=1}^{k}$. Since $\lambda_{k+1} = \dots = \lambda_n = \sigma^2$, the ML estimate of the noise eigenvalue is $\hat{\sigma}^2_{ML} = \frac{1}{n-k}\sum_{i=k+1}^{n}\hat{\lambda}_i$, and $\{\hat{\mathbf{v}}_i\}_{i=k+1}^{n}$ are all noise eigenvectors. Thus, the ML estimate of R given $\hat{R}$ is

$$R_{ML} = \sum_{i=1}^{k}\hat{\lambda}_i \hat{\mathbf{v}}_i\hat{\mathbf{v}}_i^H + \hat{\sigma}^2_{ML}\sum_{i=k+1}^{n}\hat{\mathbf{v}}_i\hat{\mathbf{v}}_i^H \qquad (57)$$

In fact, since we know that R has a spherical subspace of dimension n − k, we correct the observed $\hat{R}$ to obtain $R_{ML}$. Now, we calculate $-\log\big(p(x|R_{ML})\big)$; from Appendix IV, we have

$$\mathrm{tr}\{R_{ML}^{-1}\hat{R}\} = n \qquad (58)$$

which is independent of k and can be omitted in the minimization of (54). Thus, for the first term of (54) we only need the determinant $|R_{ML}|$, which is the product of the eigenvalues, and the MDL criterion becomes (see Appendix IV)

$$m\sum_{i=1}^{k}\log(\hat{\lambda}_i) + m(n-k)\log\Big(\frac{1}{n-k}\sum_{i=k+1}^{n}\hat{\lambda}_i\Big) + \frac{\kappa}{2}\log(m) \qquad (59)$$

where κ is the number of free parameters in the distribution. This expression should be computed for the different values $0 \le k \le n - 1$, and its minimizer is $\hat{k}_{MDL}$. Note that we can subtract the term $m\sum_{i=1}^{n}\log(\hat{\lambda}_i)$, which does not depend on k, from this expression to get the well known MDL criterion [129] (also see Appendix IV):

$$m(n-k)\log\Bigg(\frac{\frac{1}{n-k}\sum_{i=k+1}^{n}\hat{\lambda}_i}{\big(\prod_{i=k+1}^{n}\hat{\lambda}_i\big)^{\frac{1}{n-k}}}\Bigg) + \frac{\kappa}{2}\log(m) \qquad (60)$$

where the first term is the likelihood ratio for the sphericity test of the covariance matrix; this likelihood ratio is a function of the arithmetic and geometric means of the noise subspace eigenvalues [140]. Figure 20 is an example of MDL
performance in determining the number of sources in array processing. It is evident that at low SNRs, MDL has a strong tendency to underestimate the number of sources, while as the SNR increases, it gives a consistent estimate. Also, at high SNRs, underestimation is more probable than overestimation.

Fig. 20. An example of the performance of the MDL criterion in array processing (probability of each estimate k = 0, 1, 2, 3, 4 vs. SNR in dB). The MDL estimates the number of active sources, which is 2.

Now we compute the number of independent parameters (κ) in the model. Since the noise subspace is spherical, the choice of eigenvectors in this subspace can be any arbitrary orthonormal set; i.e., no information is revealed when these vectors are known. Thus, the set of parameters is $\{\lambda_1, \dots, \lambda_k, \sigma^2, \mathbf{v}_1, \dots, \mathbf{v}_k\}$. The eigenvalues of a Hermitian matrix (correlation matrix) are all real, while the eigenvectors are normalized complex vectors. Therefore, the eigenvalues (including $\sigma^2$) introduce k + 1 degrees of freedom. The first eigenvector has 2(n − 1) degrees of freedom (since its first nonzero element can be adjusted to unity), while the second, due to its orthogonality to the first eigenvector, has 2(n − 2) degrees of freedom. By the same argument, it can be shown that there are 2(n − i) free parameters in the i-th eigenvector; hence

$$\kappa = 1 + k + \sum_{i=1}^{k} 2(n-i) = k(2n-k) + 1 \qquad (61)$$

where the last integer can be omitted since it is independent of k.

The two-part MDL, despite its very low computational complexity, is among the most successful methods for source enumeration in array processing. Nonetheless, this method does not reach the best attainable performance for a finite number of measurements [141]. A newer version of MDL, called one-part or Refined MDL, has improved the performance for the case of finite measurements, but it has not yet been applied to the array processing problem [37].
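A sketch of the two-part MDL enumeration rule (60) is given below (input and variable names are ours):

```python
import numpy as np

def mdl_enumerate(R_hat, m):
    """Pick k minimizing (60): sphericity test plus (kappa/2) log m penalty."""
    n = R_hat.shape[0]
    lam = np.sort(np.linalg.eigvalsh(R_hat))[::-1]      # descending
    costs = []
    for k in range(n):
        noise = lam[k:]
        arith = noise.mean()                            # arithmetic mean
        geo = np.exp(np.mean(np.log(noise)))            # geometric mean
        kappa = k * (2 * n - k)                         # the constant 1 omitted
        costs.append(m * (n - k) * np.log(arith / geo)
                     + 0.5 * kappa * np.log(m))
    return int(np.argmin(costs))
```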
B. Sparse Array Beam-forming and Design
Sparsely sampled irregular and random arrays have been used in several fields such as radar, sonar, ultrasound imaging, and seismology. The objective is to improve the economy of the design, e.g., to achieve better resolution for a given maximum number of array elements. The whole idea of sparse arrays depends on having a feature which is better than needed, e.g., sidelobe suppression, which can be traded for resolution.

In array signal processing, the aperture smoothing function plays the same role as the transfer function of an LTI filter. Assume that n elements are spaced as a 1-D uniform linear array with a distance d and are located at $x_i = i \cdot d$ for $i = 0, 1, \dots, n - 1$, as shown in Fig. 19. The aperture smoothing function when each element is weighted by a scalar $w_i$ is

$$W(u) = \sum_{i=0}^{n-1} w_i\, e^{-2\pi j i \frac{u}{\lambda} d} \qquad (62)$$

The variable u is defined by u = sin ϕ, where ϕ is called the azimuth angle, λ is the wavelength, and the weights $w_i$ form a standard window function [14]. The weights $w_i$ could also be angle dependent if the individual elements are directional (dipoles). The aperture smoothing function determines how the wavefield Fourier transform is filtered by a finite aperture, just as the filter transfer function determines how the received signal spectrum is shaped by the filtering operation. The condition for avoiding aliasing is that the argument in the exponent satisfies

$$\frac{2\pi |u|}{\lambda}\, d = |\beta_x| \cdot d \le \pi \qquad (63)$$

where $\beta_x = \frac{2\pi u}{\lambda}$ is the x component of the wave-number. The relationship between the array pattern for a regular 1-D array and a filter transfer function is

$$\omega \leftrightarrow \beta_x = \frac{2\pi u}{\lambda}, \qquad T \leftrightarrow d, \qquad h_i \leftrightarrow w_i \qquad (64)$$

By using these parallels, the time-frequency sampling theorem $T \le \frac{\pi}{\omega_{max}}$ translates into the spatial sampling theorem $d \le \frac{\lambda_{min}}{2}$. The aperture smoothing function at the Nyquist rate of a uniform linear array is depicted in Fig. 21.

Fig. 21. The element response for an array at Nyquist spacing (d = λ/2) with omni-directional elements (element size ≪ λ).

In addition, array processing has some additional degrees of freedom that are not so common in time-frequency processing, such as irregular sampling, and some that are not found at all, such as elements along curved surfaces.

A sparse array has elements removed from the aperture (thinning), and there are basically two approaches to finding the best thinning. The first is through optimization; this can be performed in 1-D and 2-D, and also with elements on an underlying regular grid or with freely chosen positions in the aperture (avoiding overlapping elements). It also makes a difference whether the array is flat or curved. The most popular optimization criterion is to minimize the peak sidelobe in the beampattern with a condition on the maximum mainlobe width.

The second approach is more heuristic, and is based on the experience that it is often not possible to formulate optimality in a sense that is compatible with standard optimization algorithms. Also, in an imaging system, more degrees of freedom can be obtained in the optimization if one allows the layouts of the transmitter and the receiver to be different. We will still assume that the receiver and transmitter arrays are located in the same position (monostatic), but now there is partial or no overlap between the selected elements.

In the optimization approach, the challenge is the combinatorial problem of finding the best layout of sparse elements in one and two dimensions; a sketch of the beampattern evaluation that underlies such searches is given below.
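A minimal sketch of this beampattern evaluation, the objective behind the layout searches (the names and the scoring example are ours):

```python
import numpy as np

def aperture_smoothing(weights, positions, u, lam=1.0):
    """Evaluate W(u) of (62); a zero weight removes (thins) an element."""
    phase = -2j * np.pi * np.outer(np.atleast_1d(u), positions) / lam
    return np.exp(phase) @ np.asarray(weights)

# Example: peak-sidelobe score of one random thinning of a 64-element ULA.
rng = np.random.default_rng(1)
pos = np.arange(64) * 0.5                    # lambda/2 spacing
w = (rng.random(64) < 0.5).astype(float)     # keep roughly half the elements
u = np.linspace(-1.0, 1.0, 4001)
pattern_db = 20 * np.log10(np.abs(aperture_smoothing(w, pos, u)) + 1e-12)
pattern_db -= pattern_db.max()               # normalize mainlobe to 0 dB
```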
The optimization problem is the creation of beampatterns with a narrow mainlobe and low sidelobes. The layout optimization methods include LP, Genetic Algorithms (GA), and Simulated Annealing (SA) [142]. With a sparse optimized random array, we can get an array pattern as shown in Fig. 22. Comparing this figure to Fig. 21, we see that even with a fraction of the elements, we can get comparable results for the peak sidelobe value. It is also evident that thinning has a price, in that the overall energy in the sidelobe region has increased.

Another example is shown in Fig. 23, where enhanced simulated annealing has been used. This approach can handle arbitrary layouts where the elements are not locked to a Cartesian grid and where each individual element can have a different directivity. Also, it can easily be extended to handle wideband excitations and response optimization in the near field (focused arrays) [143]. The simple examples of Fig. 23 illustrate the geometries that the algorithm can handle: a sparse hex-grid optimized array with hexagonally shaped elements, a linear grid design with square elements, and a ring array having increasing element sizes for increasing ring radii.

Fig. 22. Array pattern for an optimized sparse random array (gain in dB vs. φ/π).

Fig. 23. Symmetric non-Cartesian sparse arrays: (a) a thinned sparse hex-grid array, (b) a thinned square-element array, and (c) a thinned ring array. Optimization is by simulated annealing [144].

One of the optimization problems in array processing is to design the sparsest array for a given beam pattern [142], [145]. In this case, as shown in Table II, the random sparsity is in the space domain and the information domain is a desirable aperture array pattern; we conjecture that, for a given beampattern, the IMAT used for random sampling and impulsive noise removal (Section II-A.1) can be used in synthesizing the number, locations, and weights of the sparse array elements. Another optimization issue, for a given number of array elements, is equivalent to designing the weights and locations of sparse random arrays to yield an optimum array/beam-pattern [146]; see also Figs. 24 and 25. This kind of optimization is unique to this application and has no parallel in other applications. The weights and the array locations can be optimized separately or jointly; joint optimization of the thinning pattern and weights has been reported in sonar arrays. Joint optimization of positions and weights is also possible, usually by iterating over a sequence of position optimization followed by weight optimization. Optimization of the element positions of a sparse array is considerably more difficult than weight optimization. The reason is that for an array with n elements, the number of combinations for selecting k array elements is

$$\binom{n}{k} = \frac{n!}{k!(n-k)!} \qquad (65)$$

When n and k are large, or for 2-D arrays, an exhaustive search is out of the question. There are several ways that this number can be reduced.

Fig. 24. One-way array pattern after optimization for a randomly thinned array. The thinning and weights are shown in Fig. 25.

Fig. 25. Weights found after optimization for the beam-pattern of the random layout shown in Fig. 24.
The optimization criteria are the same for the layout problem as for the weight problem, i.e.,
• Minimize the peak sidelobe of the pattern while restricting the width of the mainlobe below a given limit.
• Minimize the total energy of the pattern wasted in the sidelobes while keeping the peak value of the mainlobe above a given threshold.

The second, heuristic approach builds on the properties of some of the basic building blocks of sparse arrays: random arrays, binned arrays, and periodic arrays.

Assume an array with n elements where only k elements are kept after random thinning. The average array pattern is equal to that of an aperture weighted with the probability density function of the thinning; this also determines the shape of the main lobe. For a uniform thinning, it turns out that the ratio of the average sidelobe power to the main lobe power is 1/k, and the variance is about 1/k for |u| = |sin(ϕ)| > λ/L, where L is the aperture. The relative peak sidelobe level of a 1-D random array is about $\sqrt{\ln(k)/k}$ [147], which gives, in our experience, a fairly good estimate of the peak level.

In the binned array, the aperture is divided into k equal-size bins and one element is chosen at random in each bin. This resembles a nearest-neighbor restriction, as there can be no more than two neighboring elements in a 1-D binned array. With a uniform distribution in each bin, it can be shown that the binned array has the same properties as the random array, except that the variance does not reach the value of 1/k until the angle reaches |u| = |sin(ϕ)| > kλ/L. Thus, the binned array has random sidelobes close to the mainlobe that are much lower than those of the random array [147].

Periodic arrays are thinned with a fixed periodicity. This means that grating lobes are formed whose position and size are easily predicted. Periodic arrays have turned out to be particularly useful in imaging systems, where one can use different periodicities for the transmitter and the receiver. As the two-way beampattern is the product of the transmitter and the receiver beampatterns, by proper design, grating lobes formed by the transmitter can be suppressed by the receiver and vice versa. A special case is the Vernier array, which has periodicities p and p − 1 [148].

The simplest way to utilize the above properties is to use the properties of periodic arrays and variants where grating lobes in the transmitter are used to cancel the receiver grating lobes and vice versa. References [149], [150] describe simulations and experiments with a 2-D array for medical ultrasound in which several variations and combinations are utilized for minimizing sidelobes. The different variants investigated with partial overlap between transmitter and receiver elements are
• Periodic 2-D arrays with different periodicity on transmission and reception
• Arrays with diagonal periodicity combined with each other and with periodic arrays; see for example Fig. 26 [149], which yields a low two-way maximum sidelobe both when unsteered and when steered off-axis
• Arrays with periodicity along only a single axis and combinations with diagonal periodic arrays
• Arrays with periodicity along radius vectors rather than the x- and y-axes

Fig. 26. (a) Periodic diagonal array used for transmission and (b) periodic array used for reception [149].
Arrays which have no overlap between the transmitter and receiver elements are
• Binned arrays with different layouts for transmission and reception
• Polar binned arrays where the bins are along rays emanating from the center of the array

The proposed configurations have been tested on real data for an array that has been built, and good correspondence with simulations has been obtained.
Curved Arrays:
More recently, there has been some research on the optimization of curved arrays. The beampattern can be found by projecting the elements onto a flat surface (Fourier projection-slice theorem); therefore, the beampattern is equivalent to that of an array with unequal spacing between the elements, which reduces grating lobes. There is an optimal radius of curvature for the array which minimizes the peak sidelobe in the two-way response [151]; the optimal value varies with the thinning method. Similar work on the optimization of the one-way response has also been applied to cylindrical arrays for sonar applications [152].
C. Sparse Sensor Networks
Wireless sensor networks typically consist of a large number of sensor nodes, spatially distributed over a region of interest, that observe some physical environment, including acoustic, seismic, and thermal fields, with applications in a wide range of areas such as health care, geographical monitoring, homeland security, and hazard detection. The ways sensor networks are used in practical applications can be divided into two general categories:
1) There exists a central node, known as the Fusion Center (FC), that retrieves relevant field information from the sensor nodes; communication from the sensor nodes to the FC generally takes place over a power- and bandwidth-constrained wireless channel.
2) Such a central node does not exist, and the nodes take specific decisions based on the information they obtain and exchange among themselves. Issues such as distributed computing and processing are of high importance in such scenarios.

In general, there are three main tasks that should be implemented efficiently in a wireless sensor network: sensing, communication, and processing. The main challenge in the design of practical sensor networks is to find an efficient way of jointly performing these tasks, while using the minimum amount of system resources (computation, power, bandwidth) and satisfying the required system design parameters (such as distortion levels). For example, one such metric is the so-called energy-distortion tradeoff, which determines how much energy the sensor network consumes in extracting and delivering relevant information up to a given distortion level. Although many theoretical results are already available in the case of point-to-point links, in which separation between source and channel coding can be assumed, the problem of efficiently transmitting or sharing information among a vast number of distributed nodes remains a great challenge. This is due to the fact that well-developed theories and tools for distributed signal processing, communications, and information theory in large-scale networked systems are still under development. However, recent results on distributed estimation or detection indicate that joint optimization through some form of source-channel matching and local node cooperation can result in significant system performance improvement [153], [154], [155], [156], [157].
1) How sparsity can be exploited in a sensor network:
Sparsity appears in many applications for which sensor networks are deployed, e.g., localization of targets in a large region, or estimation of physical phenomena such as temperature fields that are sparse under a suitable transformation. For example, in radar applications, under a far-field assumption, the observation system is linear and can be expressed as a matrix of steering vectors [158], [159]. In general, sparsity can arise in a sensor network from two main perspectives:
1) Sparsity of the node distribution in spatial terms
2) Sparsity of the field to be estimated

Although nodes in a sensor network can be assumed to be regularly deployed in a given environment, such an assumption is not valid in many practical scenarios. Therefore, the non-uniform distribution of nodes can lead to some type of sparsity in the spatial domain that can be exploited to reduce the amount of sensing, processing, and/or communication. This issue is subsequently related to extensions of the nonuniform sampling techniques to two-dimensional domains through proper interpolation and data recovery when the samples are spatially sparse [38], [160]. The second scenario that provides a proper basis for exploiting sparsity concepts arises when the field to be estimated is a sparse multi-dimensional signal. From this point of view, ideas such as those presented earlier in the context of compressed sensing (Sec. II-B) provide the proper framework to address the sparsity in such fields.
Spatial Sparsity and Interpolation in Sensor Networks:
Although general two-dimensional interpolation techniques are well known in various branches of statistics and signal processing, the main issue in a sensor network is finding a proper spatio-temporal interpolation such that communication and processing are also efficiently accomplished. While there is a wide range of interpolation schemes (polynomial, Fourier, and least squares [161]), many of these schemes are not directly applicable for spatial interpolation in sensor networks due to their communication complexity.

Another characteristic of many sensor networks is the non-uniformity of the node distribution in the measurement field. Although non-uniformity has been dealt with extensively in contexts such as signal processing, geo-spatial data processing, and computational geometry [2], the combination of irregular sensor data sampling and intra-network processing is a main challenge in sensor networks. For example, reference [162] addresses the issue of spatio-temporal non-uniformity in sensor networks and how it impacts performance aspects of a sensor network such as compression efficiency and routing overhead. In order to reduce the impact of non-uniformity, the authors in [162] propose using a combination of spatial data interpolation and temporal signal segmentation. A simple interpolation wavelet transform for irregular sampling, which extends the irregular grid transform to spatio-temporal transform grids, is also proposed in [163]. Such a multi-scale transform extends the approach in [164] and removes the dependence on building a distributed mesh within the network. It should be noted that although wavelet compression allows the network to trade reconstruction quality for communication energy and bandwidth usage, such energy savings are naturally offset by the overhead cost of computing the wavelet coefficients.

Distributed wavelet processing within sensor networks is yet another approach to reduce communication energy and wireless bandwidth usage. Such distributed processing makes it possible to trade the long-haul transmission of raw data to the FC for less costly local communication and processing among neighboring nodes [163]. In addition, local collaboration among nodes decorrelates measurements and results in a sparser data set.

Compressive Sensing in Sensor Networks:
Most natural phenomena in SNs are compressible through representation in a natural basis [79]. Some examples of such applications are imaging in a scattering medium [158], MIMO radar [159], and geo-exploration via underground seismic data. In such cases, it is possible to construct a highly compressed version of a given field in a decentralized fashion. If the correlations between the data at different nodes are known a priori, it is possible to use schemes that have very favorable power-distortion-latency tradeoffs [153], [165], [166]. In such cases, distributed source coding techniques, such as Slepian-Wolf coding, can be used to design compression schemes without collaboration between nodes (see [165] and the references therein). Since prior knowledge of such correlations is not available in many applications, collaborative, intra-network processing and compression are used to determine the unknown correlations and dependencies through information exchange between network nodes. In this regard, the concept of compressive wireless sensing has been introduced in [157] for energy-efficient estimation of sensor data at the FC, based on ideas from wireless communications [153], [155], [166], [167], [168] and compressive sampling theory [30], [84], [169]. The main objective in such an approach is to combine processing and communication in a single distributed operation [170], [171], [172].
Methods to obtain the required sparsity in a SN:
While transform-based compression is well developed in traditional signal and image processing domains, the understanding of sparse transforms for networked data is not as mature [173]. There are methods that associate a graph with a given network, where the vertices of the graph represent the nodes of the network, and the edges between vertices represent relationships among the data at adjacent nodes. The structure of the connectivity is the key to obtaining effective sparse transformations for networked data [173]. For example, in the case of uniformly distributed nodes, tools such as the DFT or DCT can be adopted to exploit sparsity in the frequency domain. In more general settings, wavelet techniques can be extended to handle the irregular distribution of the sampling locations [163]. There are also scenarios in which standard signal transforms may not be directly applicable; for example, network monitoring applications rely on the analysis of communication traffic levels at the network nodes, where the network topology affects the nature of node relationships in complex ways. Graph wavelets [174] and diffusion wavelets [175] are two classes of transforms that have been proposed to address such complexities. In the former case, the wavelet coefficients are obtained by computing the digital differences of the data at different scales: the coefficients at the first scale are differences between neighboring data points, and those at subsequent spatial scales are computed by first aggregating data in neighborhoods and then computing the differences between neighboring aggregations; the resulting graph wavelet coefficients are thus defined by aggregated data at different scales [174]. In the latter scheme, diffusion wavelets are based on the construction of an orthonormal basis for functions supported on a graph; a custom-designed basis is obtained by analyzing the eigenvectors of a diffusion matrix derived from the graph adjacency matrix. The resulting basis vectors are generally localized to neighborhoods of varying size and may also lead to sparse representations of data on the graph [175]. One example of such an approach is when the node data correspond to the traffic rates of routers in a computer network.

Implementation of CS in a wireless SN:
Implementation of CS in a wireless SN: Two main approaches to implementing random projections in an SN are discussed in the literature [173]. In the first approach, the CS projections are simultaneously calculated through superposition of radio waves and communicated using amplitude-modulated coherent transmissions of randomly weighted values directly from the nodes in the network to the FC (Fig. 27). This scheme, introduced in [157], [167] and further refined in [176], is based on the notion of so-called matched source-channel communication [166], [167]. Although the need for complex routing, intra-network communications, and processing is alleviated, local phase synchronization among the nodes is an issue that must be addressed properly in this approach. In the second approach, the projections are computed and delivered to every subset of nodes in the network using gossip/consensus techniques, or delivered to a single point using clustering and aggregation. This approach is typically used for networked data storage and retrieval applications. In this method, computation and distribution of each CS sample is accomplished in two simple steps [173]. In the first step, each sensor multiplies its data by the corresponding element of the compressing matrix.
Fig. 27. Computation of CS projections through superposition of radio waves of randomly weighted values directly from the nodes in the network to the FC (from [173]).

Then, in the second step, the resulting local terms are simultaneously aggregated and distributed across the network using randomized gossip [177], a simple iterative decentralized algorithm for computing linear functions. Because each node exchanges information only with its immediate neighbors in the network, gossip algorithms are robust to failures or changes in the network topology and cannot be easily compromised by eliminating a single server or fusion center [178]. Finally, it should be noted that, in addition to the encoding process, the overall system performance is significantly affected by the decoding process [82], [179], [180]; this study and its extensions to sparse SNs remain challenging tasks.
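As a rough illustration of the gossip-based approach, the sketch below (our own construction, not an implementation from [173] or [177]) computes one CS sample y = Σ_v φ_v x_v by letting each node form its local term φ_v x_v and then running randomized pairwise gossip until every node holds approximately the network average; scaling by the number of nodes recovers the sum.

```python
import numpy as np

rng = np.random.default_rng(0)

def gossip_cs_sample(adj, local_terms, iters=2000):
    """Randomized pairwise gossip: at each step a random edge averages its two values.
    All nodes converge to the mean of the initial values (here, phi_v * x_v)."""
    z = local_terms.astype(float).copy()
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for _ in range(iters):
        u, v = edges[rng.integers(len(edges))]
        z[u] = z[v] = (z[u] + z[v]) / 2.0
    return z * len(z)          # every node now estimates the sum, i.e., one CS measurement

# ring of 6 sensors with random +/-1 projection weights
adj = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
x = rng.normal(size=6)                      # sensor readings
phi = rng.choice([-1.0, 1.0], size=6)       # one row of the compressing matrix
print(gossip_cs_sample(adj, phi * x)[0], phi @ x)   # gossip estimate vs. exact value
```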
2) Sensing Capacity:
Despite the widespread development of SN ideas in recent years, the understanding of the fundamental performance limits of sensing and communication between sensors is still evolving. One of the issues that has recently attracted attention in the theoretical analysis of sensor networks is the concept of sensing capacity. Sensing capacity was initially introduced for discrete alphabets in applications such as target detection [181], and later extended in [15], [182], [183] to the continuous case. The questions in this area are related to the problem of sampling of sparse signals [30], [69], [169] and sampling with finite rate of innovation [4], [95]. In the context of CS, sensing capacity provides bounds on the maximum signal dimension or complexity per sensor measurement that can be recovered to a pre-defined degree of accuracy. Alternatively, it can be interpreted as the minimum number of sensors necessary to monitor a given region to a desired degree of fidelity based on noisy sensor measurements. The inverse of sensing capacity is the compression rate, i.e., the ratio of the number of measurements to the number of signal dimensions, which characterizes the minimum rate to which the source can be compressed. As shown in [15], sensing capacity is a function of SNR, the inherent dimensionality of the information space, sensing diversity, and the desired distortion level.

Another issue to be noted with respect to sensing capacity is the inherent difference between sensor network and CS scenarios in the way in which the SNR is handled [15], [183]. In sensor networks composed of many sensors, a fixed SNR can be imposed for each individual sensor. Thus, the sensed SNR per location is spread across the field of view, leading to a row-wise normalization of the observation matrix. On the other hand, in CS, the vector-valued observation corresponding to each signal component is normalized by each column. This difference has led to different regimes of compression rate [183]. In an SN, in contrast to the CS setting, sensing capacity is generally small, and correspondingly the number of sensors required does not scale linearly with the target sparsity. Specifically, the number of measurements is generally proportional to the signal dimension and is weakly dependent on target density sparsity. This issue has raised questions about compressive gains in power-limited SN applications based on sparsity of the underlying source domain.

Fig. 28. The BSS concept; the unobservable sources s_1[i], …, s_n[i] are mixed and corrupted by additive zero-mean noise to generate the observations x_1[i], …, x_m[i]. The target of BSS is to estimate an unmixing system to recover the original sources in y_1[i], …, y_n[i].

VI. SPARSE COMPONENT ANALYSIS: BSS AND SDR
A. Introduction
Recovery of the original source signals from their mixtures, without a priori information about the sources or the way they are mixed, is called Blind Source Separation (BSS). This process is impossible if no assumption about the sources can be made. Such an assumption may be uncorrelatedness, statistical independence, lack of mutual information, or disjointness in some space [19], [20], [43]. The signal mixtures are often decomposed into their constituent principal components or independent components, or are separated based on their disjoint characteristics described in a suitable domain. In the latter case, the original sources should be sparse in that domain. Independent Component Analysis (ICA) is often used for separation of the sources in the former case, whereas Sparse Component Analysis (SCA) is employed for the latter case. These two mathematical tools are described in the following sections, followed by some results and illustrations of their applications.
B. Independent Component Analysis (ICA)
The main assumption in ICA is the statistical independence of the constituent sources. Based on this assumption, ICA can play a crucial role in the separation and denoising of signals (BSS). There has been recent research interest in the field of BSS due to its practicality in a wide range of problems. For example, BSS of acoustic signals measured in a room is often referred to as the Cocktail Party problem: the separation of individual sounds from a number of recordings in an echoic and noisy environment. Figure 28 illustrates the BSS concept, wherein the mixing block represents the multipath propagation model between the original sources and the microphone measurements.

Generally, BSS algorithms make assumptions about the environment in order to make the problem more tractable. There are typically three assumptions about the mixing medium. The simplest but most widely used case is the instantaneous one, where the source signals arrive at the sensors at the same time. This has been considered for the separation of biological signals such as the EEG, where the signals have narrow bandwidths and the sampling frequency is normally low [184]. The generative model for BSS in this case can be formulated as:

\[ x[i] = H \cdot s[i] + \nu[i] \tag{66} \]

where s[i], x[i], and ν[i] denote, respectively, the n × 1 vector of source signals, the m × 1 vector of observed signals, and the m × 1 vector of noise signals, and H is the mixing matrix of size m × n. Generally, the mixing process can be nonlinear (due to inhomogeneity of the environment and the fact that the medium can change with respect to the source signal variations; e.g., stronger vibration of a drum, as a medium, with louder sound). However, in the instantaneous linear case, where these problems can be avoided or ignored, the separation is performed by means of a separating matrix W of size n × m, which uses only the information contained in x[i] to reconstruct the original source signals (or the independent components) as:

\[ y[i] = W \cdot x[i] \tag{67} \]

where y[i] is the estimate of the source signal s[i]. The early approaches to instantaneous BSS started from the work by Herault and Jutten [185] in 1986. They considered non-Gaussian sources with an equal number of independent sources and mixtures, and proposed a solution based on a recurrent artificial neural network for separation of the sources. A minimal sketch of the instantaneous model and its unmixing is given below.

In cases where the number of sources is known, any ambiguity caused by a false estimate of the number of sources can be avoided. If the number of sources is unknown, a criterion may be established to estimate it beforehand; in the context of model identification, this is referred to as Model Order Selection, and methods such as the Final Prediction Error (FPE), AIC, Residual Variance (RV), MDL, and Hannan and Quinn (HNQ) methods [186] may be considered for this problem.

In acoustic applications, however, there are usually time lags between the arrival times of the signals at the sensors, and the signals may also arrive through multiple paths. This type of mixing model is called a convolutive model [187]. The convolutive mixing model can be classified into two subcategories: anechoic and echoic. In both cases, the vector representations of the mixing and separating processes are modified as x[i] = H[i] ∗ s[i] + ν[i] and y[i] = W[i] ∗ x[i], respectively, where ∗ denotes convolution.
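To make (66) and (67) concrete, the following Python sketch (our illustration, not an algorithm from [185]) mixes two independent non-Gaussian sources with an instantaneous matrix H and estimates an unmixing matrix with a minimal FastICA-style iteration; the recovered components match the sources only up to the permutation and scaling ambiguities discussed later in this section.

```python
import numpy as np

rng = np.random.default_rng(1)

def fastica(x, n_iter=200):
    """Minimal FastICA (tanh nonlinearity, symmetric orthogonalization).
    x: m-by-T matrix of observed mixtures; returns the total unmixing matrix."""
    x = x - x.mean(axis=1, keepdims=True)     # center
    d, E = np.linalg.eigh(np.cov(x))
    V = E @ np.diag(d ** -0.5) @ E.T          # whitening matrix
    z = V @ x
    W = rng.normal(size=(z.shape[0], z.shape[0]))
    for _ in range(n_iter):
        wz = W @ z
        g, gp = np.tanh(wz), 1 - np.tanh(wz) ** 2
        W = (g @ z.T) / z.shape[1] - np.diag(gp.mean(axis=1)) @ W   # fixed-point update
        U, _, Vt = np.linalg.svd(W)           # symmetric decorrelation: (WW^T)^(-1/2) W
        W = U @ Vt
    return W @ V                              # so that y[i] = (WV) x[i], cf. (67)

# two independent non-Gaussian sources, noise-free instantaneous mixing as in (66)
T = 5000
s = np.vstack([np.sign(rng.normal(size=T)), rng.laplace(size=T)])
H = np.array([[1.0, 0.6], [0.4, 1.0]])
y = fastica(H @ s) @ (H @ s)                  # sources up to permutation and scaling
print(np.corrcoef(y[0], s[0])[0, 1], np.corrcoef(y[0], s[1])[0, 1])
```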
In an anechoic model, the expansion of the mixing process may be given as:

\[ x_r[i] = \sum_{j=1}^{n} h_{r,j}\, s_j[i - \delta_{r,j}] + \nu_r[i], \qquad r = 1, \dots, m \tag{68} \]

where the attenuation h_{r,j} and the delay δ_{r,j} of source j to sensor r are determined by the physical position of the source relative to the sensors. The unmixing process to estimate the sources is then given as:

\[ y_j[i] = \sum_{r=1}^{m} w_{j,r}\, x_r[i - \delta_{j,r}], \qquad j = 1, \dots, n \tag{69} \]

where the w_{j,r}'s are the elements of W. In an echoic mixing environment, the signals from the same sources are expected to reach the sensors through multiple paths. Therefore, the expansion of the mixing model becomes

\[ x_r[i] = \sum_{j=1}^{n} \sum_{l=1}^{L} h_{r,j}^{l}\, s_j[i - \delta_{r,j}^{l}] + \nu_r[i], \qquad r = 1, \dots, m \tag{70} \]

where L denotes the maximum number of paths for the sources, ν_r[i] is the accumulated noise at sensor r, and (·)^l refers to the l-th path. The unmixing process is formulated similarly to the anechoic one. For a known number of sources, an accurate result may be expected if the number of paths is known; otherwise, the overall number of observations in an echoic case is infinite.

The aim of BSS using ICA is to estimate an unmixing matrix W such that Y = WX best approximates the independent sources S, where Y and X are, respectively, matrices with columns y[i] = [y_1[i], y_2[i], …, y_n[i]]^T and x[i] = [x_1[i], x_2[i], …, x_m[i]]^T. The ICA separation algorithms are thus subject to permutation and scaling ambiguities in the output components, i.e., W = PDH^{−1}, where P and D are the permutation and scaling (diagonal) matrices, respectively. Permutation of the outputs is troublesome in places where either the separated segments of the signals are to be joined together or a frequency-domain BSS is performed.

Mutual information is a measure of independence, and maximizing the non-Gaussianity of the source signals is equivalent to minimizing the mutual information between them [188]. In cases where the number of sources exceeds the number of mixtures (underdetermined systems), the above BSS schemes cannot be applied, simply because the mixing matrix is not invertible, and generally the original sources cannot be extracted. However, when the signals are sparse, methods based on the disjointness of the sources in some domain may be utilized. Separation of mixtures of sparse signals is potentially possible in situations where, at each sample instant, the number of nonzero sources is no more than a fraction of the number of sensors (see Table II, row and column 6). The mixtures of sparse signals can also be instantaneous or convolutive.

C. Sparse Component Analysis (SCA)
While the independence assumption for the sources is widely exploited in the design of BSS algorithms, the possible disjointness of the sources in some domain has received less attention. In SCA, this property is directly employed. Blind source separation by sparse decomposition has been addressed by Zibulevsky and Pearlmutter [189] for both overdetermined/exactly-determined and underdetermined systems using the maximum a posteriori approach. One way of formulating SCA is by representing the sources using a proper signal dictionary:

\[ s_r[i] = \sum_{l=1}^{n} c_{r,l}\, \varphi_l[i] \tag{71} \]

where r = 1, …, m and n is the number of basis functions in the dictionary. The functions φ_l[i] are called atoms or elements of the dictionary. These atoms do not have to be linearly independent and may form an overcomplete dictionary. The sparsity property requires that only a small number of the coefficients c_{r,l} differ significantly from zero. Based on this definition, the mixing and unmixing systems are modeled as follows:

\[ x[i] = A s[i] + \nu[i], \qquad s[i] = C \Phi[i] \tag{72} \]

where ν[i] is an m × 1 vector. A and C can be determined by optimizing a cost function based on an exponential distribution for c_{i,j} [189]. Figure 29 shows three sources separated using this approach.

In cases where the sources are sparse and, at each time instant, at most one of the sources has a significant nonzero value, the columns of the mixing matrix may be calculated individually, which makes the solution of the underdetermined case possible.

The SCA problem can be stated as a clustering problem, since the lines in the scatter plot can be separated based on their directionalities by means of clustering. A number of works on this method have been reported [19], [191], [192]. In the work by Li et al. [192], the separation is performed in two stages: first, the unknown mixing matrix is estimated using the k-means clustering method; then, the source matrix is estimated using a standard linear programming algorithm. The line orientation of a data set may be thought of as the direction of its greatest variance. One way to find it is to perform an eigenvector decomposition of the covariance matrix of the data; the resulting principal eigenvector, i.e., the eigenvector with the largest eigenvalue, indicates the direction of the data, since it has the maximum variance. In [191], the GAP statistic, a metric which measures the distance between the total variance and the cluster variances, is used to estimate the number of sources, followed by a method similar to Li's algorithm explained above. In line with this approach, Bofill and Zibulevsky [16] developed a potential-function method for estimating the mixing matrix, followed by ℓ_1-norm decomposition for the source estimation. Local maxima of the potential function correspond to the estimated directions of the basis vectors. After the mixing matrix is identified, the sources have to be estimated. Even when A is known, the solution is not unique, so a solution is found for which the ℓ_1-norm is minimized.

Fig. 29. Separation of sparse signals using the algorithm given in [190]: (a) sources, (b) mixtures, and (c) reconstructed sources, in both the time-frequency (left) and time (right) domains.

Fig. 30. (a) The scatter plot and (b) the shortest path from the origin to the data point x[i], extracted from [16].
Therefore, for x[i] = Σ_j a_j s_j[i], the sum Σ_j |s_j| is minimized using linear programming. Geometrically, for a given feasible solution, each source component is a segment of length |s_j| in the direction of the corresponding a_j and, by concatenation, their sum defines a path from the origin to x[i]. Minimizing Σ_j |s_j| therefore amounts to finding the shortest path to x[i] over all feasible solutions j = 1, …, n, where n is the dimension of the space of the independent basis vectors [19]. Figure 30 shows the scatter plot and the shortest path from the origin to the data point x[i]. A sketch of the clustering stage of this two-stage approach is given below.

There are many cases in which the sources are disjoint in domains other than the time domain, or in which they can be represented as a sum of the members of a dictionary consisting, for example, of wavelets or wavelet packets. In these cases, sparse component analysis can be performed more efficiently in those domains. Such methods often consist of a transformation to the time-frequency domain followed by binary masking [193], or a BSS followed by binary masking [187]. One such approach, called DUET [193], transforms the anechoic convolutive observations into the time-frequency domain using a short-time Fourier transform, and the relative attenuation and delay values between the two observations are calculated from the ratio of corresponding time-frequency points. The regions of significant amplitude (atoms) are then considered to be the source components in the time-frequency domain. In this method, only two mixtures are considered, and, as a major limitation, only one source is assumed active at each time instant.

For instantaneous separation of sparse sources, the common approach is to attempt to maximize the sparsity of the extracted signals at the output of the separator. The columns of the mixing matrix A assign each observed data point to only one source based on some measure of proximity to those columns [194], i.e., at each instant only one source is considered active. Therefore, the mixing system can be presented as:

\[ x_r[i] = \sum_{j=1}^{n} a_{j,r}\, s_j[i], \qquad r = 1, \dots, m \tag{73} \]

where, in the ideal case, a_{j,r} = 0 for r ≠ j. Minimization of the ℓ_1-norm is one of the most logical methods for estimating the sources as long as the signals can be considered sparse. ℓ_1-norm minimization is a piecewise linear operation that partially assigns the energy of x[i] to the m columns of A around x[i] in R^n space. The remaining n − m columns are assigned zero coefficients; therefore, the ℓ_1-norm minimization can be expressed as:

\[ \min \| s[i] \|_{\ell_1} \quad \text{subject to} \quad A \cdot s[i] = x[i] \tag{74} \]

A detailed discussion of signal recovery using ℓ_1-norm minimization is presented by Takigawa et al. [195] and described below. As mentioned above, it is important to choose a domain that sparsely represents the signals.

In the method developed by Pederson et al., as applied to stereo signals, the binary masks are estimated after BSS of the mixtures and then applied to the microphone signals. The same technique has been used for convolutive sparse mixtures after the signals are transformed to the frequency domain.

In another approach [196], the effect of outlier noise is reduced using median filtering; then hybrid fast-ICA filtering and ℓ_1-norm minimization are used for the separation of temporomandibular joint sounds.
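The following Python sketch (our construction, in the spirit of the two-stage method of [192] but not their code) estimates the mixing directions by k-means clustering of the normalized scatter points; the second stage, recovering each s[i] by the ℓ_1 minimization (74), can then be carried out with the linear program sketched in Section VI-D.2 below.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)

# sparse sources: at most one source active per instant (n = 3 sources, m = 2 mixtures)
n, m, T = 3, 2, 400
S = np.zeros((n, T))
for t in range(T):
    S[rng.integers(n), t] = rng.laplace()
A = np.array([[0.9, 0.0, 0.6],
              [0.44, 1.0, 0.8]])
A /= np.linalg.norm(A, axis=0)             # unit-norm mixing columns
X = A @ S

# cluster the normalized scatter directions: each cluster center ~ one column of A
pts = (X / np.linalg.norm(X, axis=0)).T    # one unit vector per time instant
pts *= np.sign(pts[:, :1])                 # fold antipodal directions together
centers, _ = kmeans2(pts, n, minit="++", seed=3)
A_hat = (centers / np.linalg.norm(centers, axis=1, keepdims=True)).T
print(np.round(A_hat, 2))                  # matches columns of A up to order and sign
```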
It has been shown that, for such sources, this method outperforms both the Degenerate Unmixing Estimation Technique (DUET) and Li's algorithm. The authors of [197] have recently extended the DUET algorithm to the separation of more than two sources in an echoic mixing scenario in the time-frequency domain.

TABLE VII
SCA STEPS

1) Consider the model x = A · s; we need a linear transformation that applies to both sides of the equation to yield a new sparse source vector.
2) Estimate the mixing matrix A. Several approaches are presented for this step, such as natural-gradient ICA approaches and clustering techniques with variants of the k-means algorithm [199], [19].
3) Estimate the source representation based on the sparsity assumption. A majority of the proposed methods are primarily based on minimizing some norm or pseudo-norm of the source representation vector. The most effective approaches are Matching Pursuit [199], [200], Basis Pursuit [189], [201], [78], [202], FOCUSS [203], IDE [66], and the Smoothed ℓ_0-norm [204].

In a very recent approach, it has been observed that brain signal sources are disjoint in the space-time-frequency domain. Therefore, clustering the observation points in the space-time-frequency domain can be effectively used for the separation of brain sources [198].

As can be seen, BSS generally exploits the independence of the source signals, whereas SCA benefits from the disjointness of the source signals in some domain. While BSS algorithms mostly rely on ICA and the statistical properties of the signals, SCA uses their geometrical and behavioral properties. Therefore, in SCA, either a clustering approach or a masking procedure can result in an estimate of the mixing matrix. Often, an ℓ_1-norm is then used to recover the source signals. Generally, where the source signals are sparse, the SCA methods result in more accurate estimation of the signals, with fewer ambiguities in the estimation.

D. SCA Algorithms
There are three main steps in the solution of an SCA problem, as shown in Table VII [199]. The first step of Table VII sets up a linear model for the SCA problem, the second step consists of estimating the mixing matrix A using sparsity information, and the third step estimates the sparse source representation based on the estimate of A. In the following, we present a survey of the major approaches that have been suggested for the third step.
1) Matching Pursuit:
Mallat and Zhang developed a general iterative method for approximating sparse decompositions [200]. When the dictionary is orthogonal and the signal x is composed of k ≪ n atoms, the algorithm recovers the sparse decomposition exactly after k steps. The algorithm is greedy [205]: since it is myopic, in certain cases wrong atoms are chosen in the first few iterations, and the remaining iterations are then spent correcting the first few mistakes. The algorithm is shown in Table VIII, and a minimal sketch in code follows.
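A minimal Matching Pursuit sketch in Python (our illustration of Table VIII; the residual-norm stopping rule is an assumption, since the table leaves the stopping condition generic, and unit-norm atoms are assumed):

```python
import numpy as np

def matching_pursuit(A, x, max_iter=50, tol=1e-6):
    """Greedy sparse decomposition of x over the columns (atoms) of A."""
    s, r = np.zeros(A.shape[1]), x.astype(float).copy()
    for _ in range(max_iter):
        corr = A.T @ r                      # inner products <r, a_j>
        j = np.argmax(np.abs(corr))         # atom most correlated with the residual
        s[j] += corr[j]                     # coefficient update (unit-norm atoms)
        r = r - corr[j] * A[:, j]           # residual update
        if np.linalg.norm(r) < tol:
            break
    return s

# a 3-sparse vector observed through a random dictionary with normalized atoms
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 40))
A /= np.linalg.norm(A, axis=0)
s_true = np.zeros(40); s_true[[3, 17, 31]] = [1.0, -2.0, 0.5]
print(np.nonzero(np.round(matching_pursuit(A, A @ s_true), 2))[0])
```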
2) Basis Pursuit:
The mathematical representation of counting the number of nonzero components is denoted by ℓ_0. However, ℓ_0 is not a proper norm and is not computationally tractable. The closest convex norm to ℓ_0 is ℓ_1. The ℓ_1 optimization over an overcomplete dictionary is called Basis Pursuit.
TABLE VIII
MATCHING PURSUIT ALGORITHM

1) Let x^(0) = 0 (an n × 1 vector), r^(0) = x, and i = 1.
2) Find the index γ_i such that ⟨r^(i−1), a_{γ_i}⟩ is maximum, where the a_j correspond to the columns (atoms) of the mixing matrix A.
3) Let s_i = ⟨r^(i−1), a_{γ_i}⟩, x^(i) = x^(i−1) + s_i a_{γ_i}, and r^(i) = x − x^(i).
4) If the stopping condition (a sufficiently small residual) is not satisfied, increase i and return to step 2.

TABLE IX
RELATION BETWEEN LP AND BASIS PURSUIT (THE NOTATION FOR LINEAR PROGRAMMING IS FROM [207].)

Basis Pursuit | Linear Programming
2m | p
s | x
(1, …, 1)_{1×2m} | C
±A | A
x | b
However, the ℓ_1-norm is non-differentiable, and we cannot use gradient methods to find the optimal solution [206]. On the other hand, the ℓ_1 solution is stable due to its convexity (the global optimum is the same as the local one) [21]. Formally, Basis Pursuit can be formulated as:

\[ \min \| s \|_{\ell_1} \quad \text{s.t.} \quad x = A \cdot s \tag{75} \]

We now explain how Basis Pursuit is related to Linear Programming (LP). The standard form of linear programming is a constrained optimization problem defined in terms of a variable x ∈ R^n by:

\[ \min\, C^T x \quad \text{s.t.} \quad Ax = b, \quad \forall i: x_i \geq 0 \tag{76} \]

where C^T x is the objective function, Ax = b is a set of equality constraints, and ∀i: x_i ≥ 0 is a set of bounds. Table IX shows this relationship; thus, the solution of (75) can be obtained by solving the equivalent LP. There are two major approaches to solving an LP, Interior Point methods and Simplex algorithms, depending on whether we optimize the cost function and then check whether it satisfies the constraint bounds, or vice versa. A sketch of the conversion of (75) to the LP (76) follows.
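As a concrete illustration of Table IX, the sketch below (our code, using SciPy's linprog rather than a solver from the cited references) splits s into non-negative parts, s = u − v, so that (75) becomes the standard-form LP (76) with cost vector (1, …, 1) of length 2m and constraint matrix [A, −A]; exact recovery holds under the conditions discussed in Section II-B.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, x):
    """Solve min ||s||_1 s.t. x = A s via the LP of Table IX: s = u - v, u, v >= 0."""
    n_atoms = A.shape[1]
    c = np.ones(2 * n_atoms)                # objective (1, ..., 1) applied to [u; v]
    A_eq = np.hstack([A, -A])               # [A, -A] [u; v] = x
    res = linprog(c, A_eq=A_eq, b_eq=x)     # default bounds already enforce u, v >= 0
    u, v = np.split(res.x, 2)
    return u - v

rng = np.random.default_rng(0)
A = rng.normal(size=(15, 40))
s_true = np.zeros(40); s_true[[4, 22]] = [1.5, -0.7]
print(np.allclose(basis_pursuit(A, A @ s_true), s_true, atol=1e-6))
```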
3) FOCal Underdetermined System Solver (FOCUSS):
FOCUSS is a non-parametric algorithm that consists of two parts [203]. It starts by finding a low-resolution estimate of the sparse signal, and then prunes this solution to a sparser signal representation through several iterations. The solution at each iteration step is found by taking the pseudo-inverse of a modified weighted matrix, defined by (AW)^+ = (AW)^H (AW · (AW)^H)^{−1}. This iterative algorithm is the solution of the following optimization problem:

Find s = Wq, where:

\[ \min \| q \|_{\ell_2} \quad \text{s.t.} \quad x = AWq \tag{77} \]

A description of this algorithm is given in Table X, and an extended version is discussed in [203].
TABLE X
FOCUSS (BASIC)

• Step 1: W_{p_i} = diag(s_{i−1})
• Step 2: q_i = (A W_{p_i})^+ x
• Step 3: s_i = W_{p_i} · q_i
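A minimal FOCUSS iteration in Python (our sketch of Table X; the minimum ℓ_2-norm initialization, the fixed iteration count, and the small ridge term for numerical stability are our assumptions; the method typically converges to a sparse solution for easy instances such as this one):

```python
import numpy as np

def focuss(A, x, n_iter=30, eps=1e-9):
    """Basic FOCUSS: reweighted minimum-norm iterations that prune a low-resolution
    estimate into a sparse one (Table X). eps regularizes the inner solve as the
    columns of AW shrink toward zero."""
    s = np.linalg.pinv(A) @ x                 # low-resolution initial estimate
    m = len(x)
    for _ in range(n_iter):
        W = np.diag(s)                        # Step 1: W = diag(s_{i-1})
        AW = A @ W
        q = AW.T @ np.linalg.solve(AW @ AW.T + eps * np.eye(m), x)   # Step 2: (AW)^+ x
        s = W @ q                             # Step 3: s_i = W q_i
    return s

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 30))
s_true = np.zeros(30); s_true[[5, 12]] = [1.0, -0.8]
print(np.round(focuss(A, A @ s_true), 3)[[5, 12]])
```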
TABLE XI
IDE STEPS

• Detection Step: Find the indices of the inactive sources:
I = { 1 ≤ i ≤ m : |a_i^T · x − Σ_{j≠i} ŝ_j^l a_i^T · a_j| < ε_l }
• Estimation Step: Find the following projection as the new estimate:
s^{l+1} = argmin_s Σ_{i∈I} s_i² s.t. x(t) = A · s(t)
The solution is derived from the Karush-Kuhn-Tucker system of equations. At the (l+1)-th iteration:
s_i = A_i^T · P (x − A_a · s_a)
s_a = (A_a^T P A_a)^{−1} A_a^T P · x
where the matrices and vectors are partitioned into inactive/active parts as A_i, A_a, s_i, s_a, and P = (A_i A_i^T)^{−1}
• Stop after a fixed number of iterations.
4) Iterative Detection and Estimation (IDE):
The idea behind this method is based on a geometrical interpretation of sparsity. Assume the elements of the vector s are i.i.d. random variables. Plotting a sample distribution of the vector s, obtained by plotting a large number of samples in the S-space, one observes that the points tend to concentrate first around the origin, then along the coordinate axes, and then across the coordinate planes. The algorithm used in IDE is given in Table XI, in which the s_i's are the inactive sources, the s_a's are the active sources, A_i contains the columns of A corresponding to the inactive sources, and A_a contains the columns of A corresponding to the active sources. Notice that IDE bears some resemblance to the RDE and IMAT methods discussed in Sec. III-A.2 and to MIMAT, explained in Sec. VII-A.2.
5) Smoothed ℓ_0-norm (SL0) Method: As discussed earlier, the criterion for sparsity is the ℓ_0-norm; thus, our minimization is

\[ \min \| s \|_{\ell_0} \quad \text{s.t.} \quad A \cdot s = x \tag{78} \]

The ℓ_0-norm has two major drawbacks: the need for a combinatorial search, and its sensitivity to noise. These problems arise from the fact that the ℓ_0-norm is discontinuous. The idea of SL0 is to approximate the ℓ_0-norm with smooth functions of the type [204]:

\[ f_\sigma(s) \triangleq e^{-\frac{s^2}{2\sigma^2}} \tag{79} \]
TABLE XII
SL0 STEPS

• Initialization:
1) Set ŝ_0 equal to the minimum ℓ_2-norm solution of As = x, obtained by the pseudo-inverse of A.
2) Choose a suitable decreasing sequence for σ: [σ_1, …, σ_K].
• For i = 1, …, K:
1) Set σ = σ_i.
2) Maximize the function F_σ on the feasible set S = {s | As = x} using L iterations of the steepest ascent algorithm (followed by projection onto the feasible set):
– Initialization: s = ŝ_{i−1}.
– For j = 1, …, L (loop L times):
a) Let Δs = [s_1 e^{−s_1²/(2σ²)}, …, s_n e^{−s_n²/(2σ²)}]^T.
b) Set s ← s − μ Δs (where μ is a small positive constant).
c) Project s back onto the feasible set S: s ← s − A^T (A A^T)^{−1} (A s − x)
3) Set ŝ_i = s.
• The final answer is ŝ = ŝ_K.

Here σ is a parameter which determines the quality of the approximation. Note that we have

\[ \lim_{\sigma \to 0} f_\sigma(s) = \begin{cases} 1 & \text{if } s = 0 \\ 0 & \text{if } s \neq 0 \end{cases} \tag{80} \]

For the vector s, we have ‖s‖_0 ≈ n − F_σ(s), where F_σ(s) = Σ_{i=1}^{n} f_σ(s_i). Now minimizing ‖s‖_0 is equivalent to maximizing F_σ(s) for some appropriate values of σ. For small values of σ, F_σ(s) is highly non-smooth and contains many local maxima, and therefore its maximization over A · s = x may not be global. On the other hand, for larger values of σ, F_σ(s) is a smoother function and contains fewer local maxima, and its maximization may be possible (in fact, there are no local maxima for large values of σ [204]). Hence we use a decreasing sequence for σ in the steepest ascent algorithm; we may thereby escape local maxima and reach the actual maximum for small values of σ, which gives the minimum ℓ_0-norm solution. The algorithm is summarized in Table XII and sketched in code below.
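A compact SL0 sketch in Python (our rendering of Table XII; the σ schedule, step size μ, and iteration counts are our assumptions, not values from [204]):

```python
import numpy as np

def sl0(A, x, K=8, L=20, mu=1.0, sigma0=1.0, decay=0.5):
    """Smoothed-l0 per Table XII: gradually decrease sigma; for each sigma take L
    shrinkage (steepest ascent) steps, each followed by projection onto {s : As = x}."""
    A_pinv = A.T @ np.linalg.inv(A @ A.T)
    s = A_pinv @ x                                  # minimum l2-norm initialization
    sigma = sigma0
    for _ in range(K):
        for _ in range(L):
            delta = s * np.exp(-s**2 / (2 * sigma**2))
            s = s - mu * delta                      # ascent step on F_sigma
            s = s - A_pinv @ (A @ s - x)            # project back onto the feasible set
        sigma *= decay
    return s

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 40))
s_true = np.zeros(40); s_true[[2, 9, 33]] = [1.0, -0.5, 0.8]
print(np.round(sl0(A, A @ s_true), 2)[[2, 9, 33]])
```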
6) Comparison of Different Techniques:
The above techniques have been simulated, and the results are depicted in Fig. 31. In order to compare the efficiency and computational complexity of these methods, we use a fixed synthetic mixing matrix and source vectors. The elements of the mixing matrix are obtained from zero-mean independent Gaussian random variables with variance σ² = 1. Sparse sources have been artificially generated using a Bernoulli-Gaussian model: s_i = p · N(0, σ_on) + (1 − p) · N(0, σ_off), with σ_on = 1 and with σ_off and the activity probability p set to small values. We then compute the noisy mixture vector x from x = As + ν, where ν is the noise vector; the elements of ν are independent zero-mean Gaussian random variables with variance σ_ν². We use Orthogonal Matching Pursuit (OMP), a variant of Matching Pursuit [200], which has better performance in estimating the source vector than Matching Pursuit.

Fig. 31. Performance (SNR in dB versus the noise standard deviation σ_ν) of the SL0, OMP, FOCUSS, IDE, and LP methods.

Fig. 32 demonstrates the time needed for each algorithm to estimate the vector s with respect to the number of sources; this figure shows that IDE and SL0 have the lowest complexity.

Fig. 32. Computational time (complexity) versus the number of sources m for the OMP, FOCUSS, LP, IDE, and SL0 methods.

E. Sparse Dictionary Representation (SDR) and Signal Modeling
A signal x ∈ R^n may be sparse in a given basis but not in a different basis. For example, an image may be sparse in a wavelet basis (i.e., most of the wavelet coefficients are small) even though the image itself may not be sparse (i.e., many of the gray values of the image are relatively large). Thus, given a class S ⊂ R^n, an important problem is to find a basis or frame in which all signals in S can be represented sparsely. More specifically, given a class of signals S ⊂ R^n, it is important to find a basis (or frame) D = {w_j}_{j=1}^{d} (if it exists) for R^n such that every data vector x ∈ S can be represented by at most k ≪ n linear combinations of elements of D. The dictionary design problem has been addressed in [19], [20], [21], [81], [84], [95], [208]. A related problem is the signal modeling problem, in which the class S is to be modeled by a union of subspaces M = ∪_{i=1}^{l} V_i, where each V_i is a subspace of R^n of dimension at most k, with k ≪ n [43]. If the subspaces V_i are known, then it is possible to pick a basis E_i = {e_j^i}_j for each V_i and construct a dictionary D = ∪_{i=1}^{l} E_i in which every signal of S has sparsity k (or is almost k-sparse). The model M = ∪_{i=1}^{l} V_i can be found from an observed set of data F = {f_1, …, f_m} ⊂ S by solving (if possible) the following non-linear least squares problem: find subspaces V_1, …, V_l of R^n that minimize the expression

\[ e\big(F, \{V_1, \dots, V_l\}\big) = \sum_{i=1}^{m} \min_{1 \le j \le l} d^2(f_i, V_j) \tag{81} \]

over all possible choices of l subspaces with dim V_i ≤ k.

Fig. 33. Linear least squares: e = d²(f_1, V) + d²(f_2, V) + d²(f_3, V).

Fig. 34. Objective function for two configurations of V_1, V_2: the configuration in (a) creates a partition in which one data point is assigned to one subspace and two to the other, while the configuration in (b) causes the opposite assignment.
TABLE XIII
SEARCH ALGORITHM

• Input:
– an initial partition {F_1^1, …, F_l^1}
– the data set F
• Iterations:
1) Use the SVD to find {V_1, …, V_l} by minimizing e(F_i, V_i) for each i, and compute Γ_1 = Σ_i e(F_i, V_i);
2) Set j = 1;
3) While Γ_j = Σ_i e(F_i^j, V_i^j) > e(F, {V_1^j, …, V_l^j}):
4) Choose a new partition {F_1^{j+1}, …, F_l^{j+1}} that satisfies: f ∈ F_k^{j+1} implies that d(f, V_k^j) ≤ d(f, V_h^j), h = 1, …, l;
5) Use the SVD to find and choose {V_1^{j+1}, …, V_l^{j+1}} by minimizing e(F_i^{j+1}, V_i) for each i, and compute Γ_{j+1} = Σ_i e(F_i^{j+1}, V_i^{j+1});
6) Increment j by 1, i.e., j → j + 1;
7) End while
• Output:
– P_j and V_{P_j}.

Fig. 35. Data F belonging to two planes in R³. The algorithm starts from a random initial partition (left) and produces the final partition and optimal subspaces (right).

This problem is solved by the search algorithm of Table XIII. The initialization is a Hough-like transform that partitions the data set F into l classes {F_1, …, F_l}, such that each class F_i consists of vectors that are close to the same (yet unknown) subspace V_i. For each partition F_i, we find the subspace V_i closest to the vectors in F_i among all subspaces V ⊂ R^n of dimension dim V ≤ k. Finding the subspace V_i closest to the vectors in F_i, for a fixed i, is the classical linear least squares problem and can be solved exactly using a Singular Value Decomposition (SVD). This initialization produces a first approximation {V_1^1, …, V_l^1} of the optimal spaces {V_1^o, …, V_l^o} minimizing (81). We then use the subspaces {V_1^1, …, V_l^1} to find a new partition F_1^2, …, F_l^2 of the data, in such a way that F_i^2 consists of all the vectors in F that are closest to V_i^1, i = 1, …, l. Using this new partition, we find (for each i) the subspace V_i^2 closest to the vectors in F_i^2 using the SVD as before. We continue this process until the value of e is unchanged between two consecutive iterations. This is a local minimum, which is also likely to be a global one (see Fig. 35). A small sketch of this alternation is given below.
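The following Python sketch (our illustration, not the authors' code) performs the assign-and-fit alternation of Table XIII: each point is assigned to the nearest current subspace, and each subspace is refit by an SVD of its assigned points; here a random initialization is used instead of the Hough-like transform.

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_subspace(points, k):
    """Best k-dimensional subspace through the origin for the rows of `points` (SVD)."""
    _, _, Vt = np.linalg.svd(points, full_matrices=False)
    return Vt[:k].T                                    # n-by-k orthonormal basis

def dist2(f, B):
    """Squared distance from f to the subspace spanned by the columns of B."""
    return float(f @ f - (B.T @ f) @ (B.T @ f))

def search(F, l, k, iters=20):
    labels = rng.integers(l, size=len(F))              # random initial partition
    for _ in range(iters):
        bases = [fit_subspace(F[labels == i], k) for i in range(l)]
        labels = np.array([np.argmin([dist2(f, B) for B in bases]) for f in F])
    return labels

# data on two 2-D planes in R^3, plus small noise
B1 = np.array([[1, 0], [0, 1], [0, 0.]])
B2 = np.array([[1, 0], [0, 0], [0, 1.]])
F = np.vstack([(B1 @ rng.normal(size=(2, 50))).T, (B2 @ rng.normal(size=(2, 50))).T])
F += 0.01 * rng.normal(size=F.shape)
labels = search(F, l=2, k=2)
print(labels[:50].mean(), labels[50:].mean())          # near 0/1 (or 1/0) when separated
```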
VII. MULTIPATH CHANNEL ESTIMATION

In wireless systems, channel estimation is required for the compensation of channel distortions. The transmitted signal bounces off different objects and arrives at the receiver from multiple paths. This phenomenon causes the received signal to be a mixture of reflected and scattered versions of the transmitted signal. The mobility of the transmitter, receiver, and scattering objects results in rapid changes in the channel response, and thus the channel estimation process becomes more complicated. Due to the sparse distribution of scattering objects, a multipath channel is sparse in the time domain, as shown in Fig. 36. By taking sparsity into consideration, channel estimation can be simplified and/or made more accurate. The sparse time-varying multipath channel is modeled as:

\[ h(t, \tau) = \sum_{l=0}^{k-1} \alpha_l(t)\, \delta(\tau - \tau_l) \tag{82} \]

where k is the number of taps, α_l is the l-th complex path gain, and τ_l is the corresponding path delay. At time t, the transfer function is given by:

\[ H(t, f) = \int_{-\infty}^{+\infty} h(t, \tau)\, e^{-j 2\pi f \tau}\, d\tau \tag{83} \]

The estimation of the multipath channel impulse response is very similar to the determination of the analog epochs and amplitudes of discontinuities in the finite-rate-of-innovation framework, as shown in Section II-B.4, Fig. 8 and (19). Essentially, if a known train of impulses is transmitted and the received signal from the multipath channel is filtered and sampled (the information domain, as shown in Fig. 8 for rate of innovation), the channel impulse response can, in principle, be estimated from these samples using an annihilating filter defined in the Z-transform domain (the Prony method [28], similar to the Pisarenko method for spectral estimation [28]) and a pseudo-inverse matrix inversion. Once the channel impulse response is estimated, its effect is compensated; this process can be repeated according to the dynamics of the time-varying channel.

A special case of multipath channel is an OFDM channel, which is widely used in ADSL, DAB, DVB, WLAN, WMAN, and WiMAX (these acronyms are defined in Table XV at the end of the paper). OFDM is a digital multi-carrier transmission technique in which a single data stream is distributed over several sub-carrier frequencies to achieve, among many other advantages, robustness against multipath channels. Channel estimation in OFDM can be easier than for other modulation schemes: the channel impulse response is now quantized (for simplicity, we assume a time-quantized channel impulse response, which works quite well; for accurate estimation of an OFDM channel with AWGN, fractional time delays can be used), and instead of an annihilating filter defined in the Z-transform domain, we can use the DFT and the ELP of Section III-A. Also, instead of a known train of impulses, some of the available sub-carriers of OFDM in each transmitted block are assigned to predetermined patterns, usually called comb-type pilots. These pilot tones help the receiver to extract some of the DFT samples of the discrete time-varying channel (82) at the respective frequencies in each transmitted block. These characteristics make OFDM channel estimation similar to the unknown sparse signal recovery of Section II-A.1 and the impulsive noise removal of Section III-A.2. Because of these advantages, our main example and simulations are related to OFDM channel estimation.

Fig. 36. The impulse response (amplitude versus delay in μs) of two typical multipath channels: (a) Brazil-D and (b) TU6 channel profiles.
A. OFDM Channel Estimation
For OFDM, the discrete version of the time-varying channel (83) in the frequency domain becomes

\[ H[r, i] \triangleq H(r T_f, i \Delta f) = \sum_{l=0}^{k-1} h[r, l]\, e^{-j \frac{2\pi i l}{n}} \tag{84} \]

where

\[ h[r, l] = h(r T_f, l T_s) \tag{85} \]

and where T_f and n are the symbol length and the number of sub-carriers in each OFDM symbol, respectively, ∆f is the sub-carrier spacing, and T_s = 1/(n∆f) is the sample interval. The above equation shows that, for the r-th OFDM symbol, H[r, i] is the DFT of h[r, l].

Two major methods are used in the equalization process: 1) zero forcing and 2) MMSE. In the zero forcing method, regardless of the noise variance, equalization is obtained by dividing the received OFDM symbol by the estimated channel frequency response; in the MMSE method, the approximation is chosen such that the MSE of the transmitted data vector, E[‖X − X̂‖²], is minimized, which introduces the noise variance into the equations. A per-subcarrier sketch of the two equalizers is given below.
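As a small illustration of (84) and the two equalizers (our toy code; the MMSE expression is the standard one-tap form, not taken from a specific reference):

```python
import numpy as np

rng = np.random.default_rng(5)

def equalize(Y, H_hat, snr=None):
    """Per-subcarrier one-tap equalization of a received OFDM symbol Y.
    snr=None -> zero forcing; otherwise the standard one-tap MMSE equalizer."""
    if snr is None:
        return Y / H_hat                               # zero forcing
    return Y * np.conj(H_hat) / (np.abs(H_hat) ** 2 + 1.0 / snr)

# toy example: n = 8 subcarriers, 3-tap sparse channel, QPSK data
n = 8
h = np.zeros(n, complex); h[[0, 2, 5]] = [1.0, 0.5j, -0.3]
H = np.fft.fft(h)                                      # H[i] is the DFT of h[l], cf. (84)
X = (rng.choice([1, -1], n) + 1j * rng.choice([1, -1], n)) / np.sqrt(2)
Y = H * X + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))
print(np.round(equalize(Y, H), 2))                     # ZF estimate of X
print(np.round(equalize(Y, H, snr=100.0), 2))          # MMSE estimate of X
```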
1) Statement of the Problem:
The goal of the channel estimation process is to obtain the channel impulse response from the noisy values of the channel transfer function at the pilot positions.
Fig. 37. Graphical representation of the sparse problem involved in OFDM channel estimation.

This is equivalent to solving the following equation for h, which is also shown graphically in Fig. 37:

\[ H_{i_p} = F_{i_p}\, h + \nu_{i_p} \tag{86} \]

where i_p is an index vector denoting the pilot positions in the frequency spectrum, H_{i_p} is a vector containing the values of the channel frequency spectrum at these pilot positions, and F_{i_p} denotes the matrix obtained by taking the rows of the DFT matrix pertaining to the pilot positions. ν_{i_p} is the additive noise on the pilot points in the frequency domain. Thus, the channel estimation problem is equivalent to finding the sparse vector h from the above set of equations for a set of pilots, as sketched in code below. Various channel estimation methods [210] have been used, with the usual tradeoffs of optimality and complexity. The Least Squares [210], ML [211], [212], Minimum Mean Square Error (MMSE) [213], [214], [215], and LMMSE [213], [211], [216] techniques are among these methods; however, none of them uses the inherent sparsity of the multipath channel h, and thus they are not as accurate.
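To make (86) concrete, the following sketch (our construction; the parameters — 256 subcarriers, 32 uniform pilots, a 4-tap channel confined to the first 16 samples — are assumptions) builds F_{i_p} from the DFT matrix and shows the sparsity-blind least-squares baseline that the sparse methods below improve upon:

```python
import numpy as np

rng = np.random.default_rng(6)

n, n_pilots, k, L_max = 256, 32, 4, 16        # subcarriers, pilots, taps, max delay spread
i_p = np.arange(0, n, n // n_pilots)          # uniform comb-type pilot positions
F_ip = np.fft.fft(np.eye(n))[i_p, :]          # rows of the DFT matrix at the pilots, cf. (86)

h = np.zeros(L_max, complex)                  # sparse impulse response, delays < L_max
h[rng.choice(L_max, k, replace=False)] = rng.normal(size=k) + 1j * rng.normal(size=k)
nu = 0.01 * (rng.normal(size=n_pilots) + 1j * rng.normal(size=n_pilots))
H_ip = F_ip[:, :L_max] @ h + nu               # noisy pilot observations

h_ls = np.linalg.pinv(F_ip[:, :L_max]) @ H_ip    # sparsity-blind least-squares baseline
print(np.round(np.abs(h - h_ls), 3).max())       # small error, but sparsity is ignored
```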
2) Sparse OFDM Channel Estimation:
In the following, we present two methods that utilize this sparsity to enhance the channel estimation process.
CS Based Channel Estimation:
Recently, the idea of using time-domain sparsity in OFDM channel estimation has been proposed in [217], [218], [219]. The use of sparsity decreases the channel estimation error and hence the number of required pilots (overhead), thus increasing the bandwidth efficiency. In [217], the authors proposed to use CS for OFDM channel estimation and proved that, in the case of uniform pilot insertion, the OFDM channel estimation problem satisfies the Restricted Isometry Property (RIP) described in Section II-B, and thus LP-based algorithms similar to the ones discussed in Section VI-D.2 can be used for channel estimation. Simulation results show that this method works effectively even in fast time-varying channels. Furthermore, from (7), a lower bound on the number of pilots can be obtained for a given number of channel taps (sparsity).

However, the authors of [217] did not consider zero padding at the endpoints of the OFDM bandwidth in their scenario, which is an essential part of the current standards based on OFDM transmission. This assumption causes the matrix F_{i_p} defined in (86) to be ill-conditioned, and thus the RIP condition defined in (9) is not satisfied. Also, there are no pilots in the zero-padded parts, which complicates the channel estimation techniques. In [24], we propose a method that exploits the inherent sparsity and also solves the zero padding problem. This algorithm is briefly discussed in the following.

Fig. 38. Graphical representation of the equation involved in obtaining the tap values above the threshold.
Modified IMAT (MIMAT) for OFDM Channel Estimation:
In this method, the spectrum of the channel is initially estimated using a simple interpolation, such as linear interpolation between pilot sub-carriers. This initial estimate is further improved in a series of iterations between the time (sparse) and frequency (information) domains to find the sparsest channel impulse response by using an adaptive thresholding scheme; in each iteration, after finding the locations of the taps (locations with previously estimated amplitudes higher than the threshold), their respective amplitudes are found using the MMSE criterion. In each iteration, due to thresholding, some of the false taps, i.e., noise samples with amplitudes above the threshold, are discarded; thus, the new iteration starts with a lower number of false taps. Moreover, because of the MMSE estimator, the valid taps approach their actual values in each new iteration. In the last iteration, the actual taps are detected and the MMSE estimator gives their respective values. This method is similar to the RDE and IDE methods discussed in Sections III-A.2 and VI-D.4.

Table XIV summarizes the steps of the MIMAT algorithm. In the threshold of the MIMAT algorithm, α and β are constants which depend on the number of taps and the initial powers of the noise and channel impulses. In the first iteration, the threshold is a small number, and it is gradually increased with each iteration. Intuitively, this gradual increase of the threshold with the iteration number results in a gradual reduction of false taps (taps created by noise). In each iteration, the tap values are obtained from:

\[ \hat{H}^{LS}_{i_p} = H_{i_p} + \nu_{i_p} = \tilde{F} \cdot h_t \tag{87} \]

where t denotes the index set of nonzero impulses obtained from the previous step and F̃ is obtained from F_{i_p} by keeping the columns determined by t. A graphical representation of the above equation is given in Fig. 38.

TABLE XIV
MIMAT ALGORITHM FOR OFDM CHANNEL ESTIMATION

• Initialization:
– Find an initial estimate of the time-domain channel using linear interpolation: ĥ^(0) = ĥ_linear
• Iterations:
1) Set Threshold = β e^{αi}.
2) Using the threshold from the previous step, find the locations of the taps t by thresholding the time-domain channel from the previous iteration (ĥ^(i−1)).
3) Solve for the values of the nonzero impulses using MMSE:

\[ \hat{h}_t = \mathrm{SNR} \cdot \tilde{F}^H \big( \tilde{F} \cdot \mathrm{SNR} \cdot \tilde{F}^H + I \big)^{-1} \hat{H}^{LS}_{i_p} \tag{88} \]

4) Find the new estimate of the channel (ĥ^(i)) by substituting the taps in their detected positions.
5) Stop if the estimated channel is the same as in the previous iteration or when a maximum number of iterations is reached.
The amplitudes of the nonzero impulses can be obtained by simple iterations, by a pseudo-inverse, or by the MMSE equation (88) of Table XIV, which yields better results in additive-noise environments. The equation that has to be solved in (87) is usually over-determined, which helps suppress the noise in each iteration step. Note that the solution presented in (88) is a variant of the MMSE solution for the case when the locations of the discrete impulses are known. If further statistical knowledge is available, this solution can be modified and a better estimate obtained; however, this makes the approximation process more complex. This algorithm does not need many iterations; the positions of the nonzero impulses are perfectly detected within a few iterations for most types of channels. A sketch of the MIMAT iteration is given below.
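A minimal MIMAT sketch in Python (our rendering of Table XIV; the constants α, β, the iteration count, and the least-squares initialization — in place of the linear interpolation used in the paper — are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

def mimat(H_pilot, F_ip, L_max, snr, alpha=0.5, beta=0.02, n_iter=4):
    """MIMAT per Table XIV: adaptive thresholding of the time-domain estimate,
    followed by an MMSE refit (88) of the detected taps."""
    h_hat = np.linalg.pinv(F_ip[:, :L_max]) @ H_pilot    # initial estimate
    t_prev = None
    for i in range(1, n_iter + 1):
        t = np.where(np.abs(h_hat) > beta * np.exp(alpha * i))[0]   # detected taps
        if len(t) == 0 or (t_prev is not None and np.array_equal(t, t_prev)):
            break                                        # estimate unchanged -> stop
        Ft = F_ip[:, t]                                  # F-tilde: columns of detected taps
        h_t = snr * Ft.conj().T @ np.linalg.solve(
            snr * (Ft @ Ft.conj().T) + np.eye(len(H_pilot)), H_pilot)   # MMSE, cf. (88)
        h_hat = np.zeros(L_max, complex)
        h_hat[t] = h_t
        t_prev = t
    return h_hat

# pilot observations of a 3-tap channel (same setup as the sketch after (86))
n, n_pilots, L_max = 256, 32, 16
i_p = np.arange(0, n, n // n_pilots)
F_ip = np.fft.fft(np.eye(n))[i_p, :]
h = np.zeros(L_max, complex); h[[1, 4, 9]] = [1.0, -0.6 + 0.3j, 0.25]
H_pilot = F_ip[:, :L_max] @ h + 0.02 * (rng.normal(size=n_pilots) + 1j * rng.normal(size=n_pilots))
print(np.round(np.abs(mimat(H_pilot, F_ip, L_max, snr=1000.0)), 2))
```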
B. Simulation Results and Discussions

Fig. 39. SER (Symbol Error Rate) vs. CNR (Carrier-to-Noise Ratio) for the ideal channel, linear interpolation, and MIMAT for the Brazil channel.

For the OFDM simulations, the DVB-H standard was used with a QAM constellation; the channel profile was the Brazil channel D. Fig. 39 shows the Symbol Error Rate (SER) versus the Carrier-to-Noise Ratio (CNR) after equalization using the proposed MIMAT algorithm and standard linear interpolation in the frequency domain using the noisy pilot samples. As can be seen in Fig. 39, the SER obtained from the MIMAT algorithm
coincides with that obtained from the hypothetical ideal channel (where the exact channel frequency response is used for equalization). Thus, in this sense, the proposed channel estimation is perfect for time-invariant channels. The figure also shows that the standard linear interpolation method performs poorly compared to MIMAT. This estimation technique is highly robust in rapidly changing channels, showing only minor performance degradation as the Doppler frequency increases, as shown in Fig. 40.

Fig. 40. SER (Symbol Error Rate) vs. CNR (Carrier-to-Noise Ratio) of the MIMAT method for the Brazil channel with various Doppler frequencies (F_d = 30, 50, 75, 100, and no Doppler).

VIII. CONCLUSION
A unified view of sparse signal processing has been presented in tutorial form. The sparsity in the key areas of sampling, coding, spectral estimation, array processing, component analysis, and channel estimation has been carefully exploited. Some form of uniform or random sampling has been shown to underpin the associated sparse processing methods used in each of these fields. The reconstruction methods used in each application domain have been introduced, and the interconnections among them have been highlighted.

This development has revealed, for example, that the iterative methods developed for random sampling can be applied to real-field block and convolutional channel coding for impulsive noise (salt-and-pepper noise in the case of images) removal, sparse component analysis, and channel estimation for orthogonal frequency division multiplexing systems. These iterative reconstruction methods have been shown to be naturally extendable to spectral estimation and sparse array processing due to their similarity to channel coding in terms of mathematical models. Conversely, the minimum description length method developed for spectral estimation and array processing has potential for application in other areas. The error locator polynomial method developed for channel coding has, moreover, been shown to be a discrete version of the annihilating filter used in sampling with a finite rate of innovation and of the Prony method in spectral estimation; the Pisarenko and MUSIC methods are further improvements of the Prony method.

Linkages with emergent areas such as compressive sensing and sensor networks have also been considered. In addition, it has been suggested that the linear programming methods developed for compressive sensing and sparse component analysis can be applied to other applications with a possible reduction of the sampling rate. As such, this tutorial has provided a route for new applications of sparse signal processing to emerge, which can potentially reduce computational complexity and improve performance quality.
APPENDIX I
ACCELERATION METHODS: CHEBYSHEV AND CONJUGATE GRADIENT (CG) [65]

A. Chebyshev Algorithm

• Initialization:

\[ x_0[i] = 0, \quad x_1[i] = \frac{2}{A+B}\, \mathrm{PS}\{x[i]\}, \quad \rho = \frac{B-A}{B+A}, \quad \lambda_1 = 2 \tag{89} \]

• For n = 2, …, N:

\[ \lambda_n = \Big( 1 - \frac{\rho^2 \lambda_{n-1}}{4} \Big)^{-1}, \qquad x_{n+1}[i] = x_{n-1}[i] + \lambda_n \Big( x_n[i] - x_{n-1}[i] + \frac{2}{A+B}\, \mathrm{PS}\{x[i] - x_n[i]\} \Big) \tag{90} \]

where S and P are the sampling and filtering operators, respectively, A and B are the frame bound parameters, and N is the number of iterations.

B. Conjugate Gradient Algorithm

• Initialization:

\[ x_0[i] = 0, \qquad r_1[i] = p_1[i] = \mathrm{PS}\{x[i]\} \tag{91} \]

• For n = 1, …, N:

\[ \lambda_n = \frac{\langle r_n[i], p_n[i] \rangle}{\langle p_n[i], \mathrm{PS}\{p_n[i]\} \rangle}, \quad x_{n+1}[i] = x_n[i] + \lambda_n p_n[i], \quad r_{n+1}[i] = r_n[i] - \lambda_n \mathrm{PS}\{p_n[i]\}, \]
\[ \lambda'_n = \frac{\langle r_{n+1}[i], \mathrm{PS}\{p_n[i]\} \rangle}{\langle p_n[i], \mathrm{PS}\{p_n[i]\} \rangle}, \quad p_{n+1}[i] = r_{n+1}[i] - \lambda'_n p_n[i] \tag{92} \]

where S and P are the sampling and filtering operators, respectively, N is the number of iterations, and ⟨x[i], y[i]⟩ denotes the inner product of the two functions x[i] and y[i].
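A small numerical sketch of the CG iteration (91)-(92) above (our code; the composed filtering/sampling operator PS is represented abstractly by a symmetric positive definite matrix G, which is an assumption for illustration):

```python
import numpy as np

def cg_frame(G, b, n_iter=50):
    """CG iteration of (91)-(92) for solving G x = b, with G playing the role of PS."""
    x = np.zeros_like(b)
    r = p = b.copy()                        # r_1 = p_1 = PS{x} = b
    for _ in range(n_iter):
        Gp = G @ p
        lam = (r @ p) / (p @ Gp)            # step size lambda_n
        x = x + lam * p
        r = r - lam * Gp
        lam2 = (r @ Gp) / (p @ Gp)          # correction lambda'_n (G-orthogonalization)
        p = r - lam2 * p
        if np.linalg.norm(r) < 1e-12:
            break
    return x

rng = np.random.default_rng(8)
M = rng.normal(size=(30, 30))
G = M @ M.T + np.eye(30)                    # symmetric positive definite stand-in for PS
x_true = rng.normal(size=30)
print(np.allclose(cg_frame(G, G @ x_true), x_true, atol=1e-8))
```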
APPENDIX II
ELP DECODING FOR ERASURE CHANNELS [53]

For lost samples, the polynomial locator for the erasure samples is

\[ H(z_i) = \prod_{m=1}^{k} \big( z_i - e^{j \frac{2\pi i_m}{n}} \big) = \sum_{t=0}^{k} h_t\, z_i^{k-t} \tag{93} \]

\[ H(z_{i_m}) = 0, \qquad m = 1, 2, \dots, k \tag{94} \]

where z_i = e^{j 2π i / n}. The polynomial coefficients h_t, t = 0, …, k, can be found from the product in (93); it is easier to find the h_t by taking the inverse FFT of H(z). By multiplying (94) by e[i_m] · (z_{i_m})^r (where r is an integer) and summing over m, we get

\[ \sum_{t=0}^{k} h_t \cdot \sum_{m=1}^{k} \Big[ e[i_m] \cdot \big( z_{i_m} \big)^{k+r-t} \Big] = 0 \tag{95} \]

Since the inner summation is the DFT of the missing samples e[i_m], we get

\[ \sum_{t=0}^{k} h_t \cdot E[k+r-t] = 0 \tag{96} \]

where E[·] is the DFT of e[i]. The received samples d[i] can be thought of as the original over-sampled signal x[i] minus the missing samples e[i_m]. The error signal e[i] is the difference between the corrupted and the original over-sampled signal, and hence is equal to the values of the missing samples for i = i_m and zero otherwise. In the frequency domain, we have

\[ E[j] = X[j] - D[j], \qquad j = 1, \dots, n \tag{97} \]

Since X[j] = 0 for j ∈ Θ (see the footnote on page 10), we have

\[ E[j] = -D[j], \qquad j \in \Theta \tag{98} \]

The remaining values of E[j] can be found from (96) by the following recursion:

\[ E[r] = -\frac{1}{h_k} \sum_{t=1}^{k} h_{k-t}\, E[r+t] \tag{99} \]

where r ∉ Θ and the index additions are mod n.
APPENDIX III
ELP DECODING FOR IMPULSIVE NOISE CHANNELS [32], [96]

For all integer values of r such that r ∈ Θ and r + k ∈ Θ, we obtain a system of k equations with k + 1 unknowns (the h_t coefficients). These equations yield a unique solution for the polynomial, with the additional condition that the first nonzero h_t is equal to one. After finding the coefficients, we need to determine the roots of the polynomial (93). Since the roots of H(z) are of the form e^{j 2π i_m / n}, the inverse DFT (IDFT) of {h_m}_{m=0}^{k} can be used: before performing the IDFT, we pad n − 1 − k zeros at the end of the {h_m}_{m=0}^{k} sequence to obtain an n-point signal. We refer to the new signal (after the IDFT) as {H_i}_{i=0}^{n−1}. Each zero in {H_i} represents an error in r[i] at the same location.
APPENDIX IV
PROOFS FOR MDL FORMULAS

Proof of (58): The eigenvectors of R_ML^{−1} are the same as those of R̂, its eigenvalues are the reciprocals of the eigenvalues of R_ML, and the eigenvectors constitute an orthonormal set. Thus we have:

\[ \mathrm{tr}(R_{ML}^{-1} \hat{R}) = \mathrm{tr}\Big( \big( \sum_{i=1}^{k} \hat{\lambda}_i^{-1} \hat{v}_i \hat{v}_i^H + \hat{\sigma}_{ML}^{-2} \sum_{i=k+1}^{n} \hat{v}_i \hat{v}_i^H \big) \cdot \big( \sum_{i=1}^{n} \hat{\lambda}_i \hat{v}_i \hat{v}_i^H \big) \Big) = \mathrm{tr}\Big( \sum_{i=1}^{k} \hat{v}_i \hat{v}_i^H + \sum_{i=k+1}^{n} \frac{\hat{\lambda}_i}{\hat{\sigma}_{ML}^2}\, \hat{v}_i \hat{v}_i^H \Big) \tag{100} \]

where tr(·) represents the trace operator on matrices. We know that if both AB and BA are defined, then tr(AB) = tr(BA); thus

\[ \mathrm{tr}\big( \hat{v}_i \hat{v}_i^H \big) = \mathrm{tr}\big( \hat{v}_i^H \hat{v}_i \big) \tag{101} \]

and since v̂_i^H v̂_i = 1 for i = 1, …, n, we have

\[ \mathrm{tr}(R_{ML}^{-1} \hat{R}) = \sum_{i=1}^{k} \mathrm{tr}\big( \hat{v}_i^H \hat{v}_i \big) + \sum_{i=k+1}^{n} \frac{\hat{\lambda}_i}{\hat{\sigma}_{ML}^2}\, \mathrm{tr}\big( \hat{v}_i^H \hat{v}_i \big) = k + \hat{\sigma}_{ML}^{-2} \sum_{i=k+1}^{n} \hat{\lambda}_i \tag{102} \]

We also have \( \hat{\sigma}_{ML}^2 = \frac{1}{n-k} \sum_{i=k+1}^{n} \hat{\lambda}_i \), which results in

\[ \mathrm{tr}(R_{ML}^{-1} \hat{R}) = k + (n - k) = n \tag{103} \]

Proof of (60): The term κ log m is the penalty function, and we have:

\[ -\log f(x; R_{ML}) = -\log \Big( \frac{1}{|\pi R_{ML}|^m}\, e^{-m\, \mathrm{tr}(R_{ML}^{-1} \hat{R})} \Big) = m \log(\pi^n) + m \log(|R_{ML}|) + m\, \mathrm{tr}(R_{ML}^{-1} \hat{R}) = m \log |R_{ML}| + \underbrace{mn + mn \log(\pi)}_{C(m)} \tag{104} \]

The first term of the above equation can be written as

\[ m \log |R_{ML}| = m \log \Big( \big( \prod_{i=1}^{k} \hat{\lambda}_i \big) \big( \hat{\sigma}_{ML}^2 \big)^{n-k} \Big) = m \sum_{i=1}^{k} \log(\hat{\lambda}_i) + m (n-k) \log \Big( \frac{\sum_{i=k+1}^{n} \hat{\lambda}_i}{n-k} \Big) \tag{105} \]

Therefore,

\[ -\log f(x; R_{ML}) = m \sum_{i=1}^{k} \log(\hat{\lambda}_i) + m (n-k) \log \Big( \frac{1}{n-k} \sum_{i=k+1}^{n} \hat{\lambda}_i \Big) + C(m) \tag{106} \]

where C(m) depends only on the parameter m and not on k; thus we can ignore this term in the MDL criterion.

APPENDIX V
SPARSE MATRIX MANIPULATIONS
A. Solution of Linear Systems
Consider the linear system

\[ A x = b \tag{107} \]

where A is an m × n matrix, x is the unknown n × 1 vector, and b is an m × 1 vector. For the exact solution of this system, there are three cases:

1) The square matrix A (m = n) is invertible (non-singular): there is a unique solution x = A^{−1} b.
2) The matrix A is singular and b ∈ R(A) (b belongs to the range of A): there are infinitely many solutions x = x_0 + v for all v ∈ Ker(A) (the null space of A), where x_0 is a particular solution of Ax = b.
3) The matrix A is singular and b ∉ R(A): there is no solution.

However, for large n, calculating the inverse of A is a complex task. Further, if approximate solutions are found, it may be difficult to estimate how accurate they are; this depends on the entries of the matrix A. Furthermore, for the case where n ≤ m and rank(A) = n, denoting the pseudo-inverse by A^+ = (A^T A)^{−1} A^T, the problem is converted to finding the vector x = A^+ b (an approximate solution) subject to the following conditions:

• x satisfies the least squares criterion, i.e., x minimizes the norm ‖r‖_2 of the residual vector r = b − Ax.
• x is the solution of the system Ax = b − r, where A^T r = 0.

Note that when A is the product of certain special matrices (diagonal, orthogonal, triangular, and other invertible matrices), the solution can be found by various direct methods [34], [35], [36].
Consider an n × n real coefficient matrix A and a real n-vector b in the linear system (107), and decompose A as

\[ A = L + D + U \tag{108} \]

where D is the diagonal matrix containing the diagonal entries of A (all assumed non-zero), and L and U are the strictly lower and upper triangular parts of A, respectively. The iterative solution vector x^{k+1} is given by

\[ x^{k+1} = -(D+L)^{-1} \big[ U x^k - b \big] \ \text{(Gauss-Seidel)}, \qquad x^{k+1} = -D^{-1} \big[ (L+U) x^k - b \big] \ \text{(Jacobi)} \tag{109} \]

or, in a general form,

\[ x^{k+1} = G x^k + f \tag{110} \]

where for the Jacobi iteration G = G_J(A) = I − D^{−1}A and f = D^{−1}b, and for the Gauss-Seidel iteration G = G_{GS}(A) = I − (D+L)^{−1}A and f = (D+L)^{−1}b. This iteration can be viewed as a technique for solving the linear system

\[ (I - G)\, x = f, \quad \text{or} \quad M^{-1} A x = M^{-1} b \tag{111} \]

where the preconditioning matrix M is given by M_J = D and M_{GS} = D + L. For suitable A (e.g., strictly diagonally dominant matrices), the iterations given above converge, and the limit is a solution of the original linear system (for details see [36], [221], [222]). A small sketch of the two iterations is given below.
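A minimal sketch of the Jacobi and Gauss-Seidel iterations (109) on a strictly diagonally dominant system (our example; in practice the Gauss-Seidel step is a triangular solve rather than an explicit inverse):

```python
import numpy as np

def jacobi(A, b, n_iter=100):
    """Jacobi iteration (109): x^{k+1} = -D^{-1} [(L+U) x^k - b]."""
    D = np.diag(np.diag(A))
    LU = A - D
    Dinv = np.diag(1.0 / np.diag(A))
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = Dinv @ (b - LU @ x)
    return x

def gauss_seidel(A, b, n_iter=100):
    """Gauss-Seidel iteration (109): x^{k+1} = -(D+L)^{-1} [U x^k - b]."""
    DL = np.tril(A)                          # D + L
    U = A - DL
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = np.linalg.solve(DL, b - U @ x)   # lower-triangular solve
    return x

# strictly diagonally dominant system: both iterations converge
A = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b), gauss_seidel(A, b), np.linalg.solve(A, b))
```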
1) Krylov Subspace Methods:
Let x_0 be an initial approximation to the solution of (107), let r_0 = b − A x_0 be the initial residual, and let K_m(A, r_0) be the Krylov subspace of dimension m defined by

\[ K_m(A, r_0) = \mathrm{Span}\{ r_0, A r_0, \dots, A^{m-1} r_0 \} \tag{112} \]

These subspaces are nested, i.e., K_m ⊆ K_{m+1}, m = 1, 2, 3, …. Krylov subspace methods are iterative methods in which x_m, an approximation to the solution of (107) at the m-th step, is found in x_0 + K_m; namely, the approximation is of the form

\[ x_m = x_0 + q_{m-1}(A)\, r_0 \tag{113} \]

where q_{m−1} is a polynomial of degree at most m − 1. If the system is real, the coefficients of q_{m−1} are real. This natural expression implies that the residual r_m = b − A x_m is associated with the so-called residual polynomial p_m of degree at most m with p_m(0) = 1, because

\[ r_m = b - A x_m = r_0 - A\, q_{m-1}(A)\, r_0 = p_m(A)\, r_0 \tag{114} \]

The error satisfies e_m = x_m − x^* = p_m(A)(x_0 − x^*), where x^* is the exact solution of (107). Let us denote by P_m the set of all polynomials p of degree at most m such that p(0) = 1. The approximation x_m ∈ x_0 + K_m is often found by requiring x_m to be the minimizer of some functional. Different methods arise depending on the characteristics of the matrix and the implementation; thus, each method implicitly defines a different polynomial p_m ∈ P_m (for details see [223]).

The extra conditions imposed for convergence and completeness are:

• r_m is orthogonal to K_m (Galerkin condition: r_m ⊥ K_m), or
• the minimum residual condition:

\[ \| r_m \| = \min_{x \in x_0 + K_m} \| b - A x \| \tag{115} \]

We note that the nested property of the Krylov subspaces implies that any method for which one of the conditions in (115) holds will terminate in at most n steps. The desired methods are those which produce a good approximation to the solution of (107) in many fewer than n iterations. An important ingredient that makes Krylov subspace methods work is the use of preconditioners, i.e., a matrix or operator M used to convert equation (107) into (111). There is no single method which is recommended for all problems.

Some applications of sparse matrices, besides the ones discussed in the main body of the paper, are the following:

C. Applications in Photonics and Electromagnetics
Some applications of sparse matrices, besides those discussed in the main body of the paper, are the following:

C. Applications in Photonics and Electromagnetics

Numerical simulation for the development of photonics and electromagnetics CAD software packages has been the subject of intensive research in the past decade. The most widely used simulation method is the Finite-Difference Time-Domain (FDTD) method for the time-domain analysis of Maxwell's equations; the more recently introduced schemes require the solution of large sparse linear systems at each time step [224]. A popular method for time-domain problems is the time-domain beam propagation method [225], which necessitates the multiplication of the input field vector with a very sparse matrix. This sparse multiplication is advantageous in that the method can become very efficient and highly parallel [226]. Another method for the time-domain simulation of Maxwell's equations is the Finite-Element Time-Domain (FETD) technique [224]. FETD leads to an almost sparse linear system; the computational complexity of this method can be considerably reduced by a sparse approximation of the matrix to be inverted [227].
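The efficiency of the sparse field-vector multiplication is easy to see in code. In the toy sketch below, a random sparse matrix merely stands in for the beam-propagation operator (which would in practice come from the discretization of the propagation equations):

```python
import numpy as np
from scipy.sparse import random as sparse_random

n = 5000
# Stand-in for a very sparse propagation operator (0.1% non-zeros)
P = sparse_random(n, n, density=0.001, format="csr", random_state=0)
field = np.random.default_rng(0).standard_normal(n)

out = P @ field          # costs O(nnz) instead of O(n^2)
print(P.nnz, out.shape)  # ~25,000 non-zeros vs 25 million dense entries
```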
D. Applications in Genomic Signal Processing [228]
Micro-arrays (DNA and protein) are parallel biosensors capable of detecting a large number of different genomic particles simultaneously. DNA micro-arrays that use tens of thousands of probe spots detect multiple targets in a single experiment. In comparative DNA micro-array experiments, however, this is a wasteful use of the sensing resources: generally, only a fraction of the total number of genes is differentially expressed with respect to a reference sample, and thus a vast number of probe spots may not provide any useful information. An alternative design is the so-called compressed micro-array, which translates to significantly lower costs, simpler image acquisition and processing, and a smaller amount of genomic material needed for experiments. To recover signals from compressed micro-array measurements, ideas from CS (Sec. II-B) can be employed. For sparse measurement matrices, sparse reconstruction algorithms can be used to achieve lower computational complexity than the widely used linear-programming-based methods [92], [228].
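As a toy illustration of such recovery (a sketch of our own; [92], [228] develop more specialized decoders), orthogonal matching pursuit recovers a differentially-expressed gene profile from compressed measurements taken with a sparse 0/1 measurement matrix; all problem sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 200, 60, 5                  # genes, probe spots, expressed genes

# Sparse 0/1 measurement matrix: each gene pools into a few spots
Phi = (rng.random((m, n)) < 0.1).astype(float)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = 5 * rng.standard_normal(k)
y = Phi @ x_true                      # compressed micro-array readout

# Orthogonal matching pursuit: greedily add the best-correlated column,
# then re-fit the coefficients on the current support by least squares
support, r = [], y.copy()
for _ in range(k):
    support.append(int(np.argmax(np.abs(Phi.T @ r))))
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    r = y - Phi[:, support] @ coef

x_hat = np.zeros(n)
x_hat[support] = coef
print(np.linalg.norm(x_hat - x_true))  # typically ~0 at these sizes
```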
ACKNOWLEDGMENTS
We would like to sincerely thank our colleagues for their specific contributions to various sections of this paper: especially Dr. M. Nouri Moghadam from the Math. Dept. of George Washington University, who wrote Appendix V; Dr. H. Saeedi from the University of Massachusetts, who contributed to LDPC codes; Dr. K. Mehrany from the EE Dept. of Sharif University, who contributed to Section V-C; and M. Valiollahzadeh, who edited and contributed to the SCA section. We are especially indebted to Prof. B. Sankur from Bogazici University in Turkey for his careful review and comments. The contribution of some of the sparse array examples by Dr. A. Austeng from the University of Oslo is also appreciated. We are also thankful to the students of the Multimedia Lab and the members of ACRI at Sharif University for their invaluable help and simulations; we are specifically indebted to V. Montazer Hodjat, S. Jafarzadeh, A. Salemi, M. Soltanalian, E. Azizi and H. Firuzi.

REFERENCES

[1] P. J. S. G. Ferreira, “Mathematics for multimedia signal processing II: Discrete finite frames and signal reconstruction,” Signal Proc. for Multimedia, IOS Press, vol. 174, pp. 35–54, Dec. 1999.
[2] F. Marvasti,
Nonuniform Sampling: Theory and Practice . Springer,formerly Kluwer Academic/Plenum Publishers, 2001.[3] R. G. Baraniuk, “A lecture on compressive sensing,”
IEEE Signal Proc.Magazine , vol. 24, no. 4, pp. 118–121, July 2007.[4] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finiterate of innovation,”
IEEE Trans. on Signal Proc. , vol. 50, no. 6, pp.1417–1428, June 2002.[5] S. Lin and D. J. Costello,
Error control coding . Prentice-Hall,Englewood Cliffs, NJ, 1983.[6] T. Richardson and R. Urbanke,
Modern Coding Theory . CambridgeUniversity Press, 2008.[7] F. Marvasti, M. Hung, and M. R. Nakhai, “The application of walshtransform for forward error correction,”
Proc. of IEEE Int. Conf. onAcoustics, Speech and Signal Proc., ICASSP’99 , vol. 5, pp. 2459–2462,1999.[8] S. L. Marple,
Digital Spectral Analysis . Prentice-Hall, EnglewoodCliffs, NJ, 1987.[9] S. M. Kay and S. L. Marple, “Spectrum analysis-a modern perspective,”
Proc. IEEE, Reprinted by IEEE Press, (Modern Spectrum Analysis II) ,vol. 69, no. 11, pp. 1380–1419, Nov. 1981.[10] S. M. Kay,
Modern Spectral Estimation: Theory and Application .Prentice-Hall, Englewood Cliffs, N.J., Jan. 1988.[11] P. Stoica and R. L. Moses,
Introduction to Spectral Analysis . UpperSaddle River, NJ: Prentice-Hall, 1997.[12] P. Stoica and A. Nehorai, “Music, maximum likelihood, and Cramer-Rao bound,”
IEEE Trans. on ASSP , vol. 37, no. 5, pp. 720–741, May1989.[13] ——, “Performance study of conditional and unconditional direction-of-arrival estimation,”
IEEE Trans. on ASSP , vol. 38, no. 10, pp. 1783–1795, October 1990.[14] S. Holm, A. Austeng, K. Iranpour, and J. F. Hopperstad, “Sparse sam-pling in array processing,” in
Nonuniform Sampling: Theory and Prac-tice , F. Marvasti, Ed. Springer, formerly Kluwer Academic/PlenumPublishers, 2001, pp. 787–833.[15] S. Aeron, M. Zhao, and V. Saligrama, “Fundamental tradeoffs betweensparsity, sensing diversity and sensing capacity,”
Asilomar Conf. onSignals, Systems and Computers, ACSSC ’06, Pacific Grove, CA , pp.295–299, Oct.-Nov. 2006.[16] P. Bofill and M. Zibulevsky, “Underdetermined blind source separationusing sparse representations,”
Signal Proc., Elsevier , vol. 81, no. 11,pp. 2353–2362, Nov. 2001.[17] M. A. Girolami and J. G. Taylor,
Self-Organising Neural Networks: In-dependent Component Analysis and Blind Source Separation . SpringerVerlag, London, 1999.[18] P. Georgiev, F. Theis, and A. Cichocki, “Sparse component analysis andblind source separation of underdetermined mixtures,”
IEEE Trans. onNeural Networks , vol. 16, no. 4, pp. 992–996, July 2005.[19] M. Aharon, M. Elad, and A. M. Bruckstein, “The k-svd: An algorithmfor designing of overcomplete dictionaries for sparse representation,”
IEEE Trans. on Signal Proc. , vol. 54, no. 11, pp. 4311–4322, Nov.2006.[20] ——, “On the uniqueness of overcomplete dictionaries, and a practicalway to retrieve them,”
Linear Algebra and Applications , vol. 416, no. 1,pp. 48–67, July 2006.[21] R. Gribonval and M. Nielsen, “Sparse representations in unions ofbases,”
IEEE Trans. on Info. Theory , vol. 49, no. 12, pp. 3320–3325,Dec. 2003.[22] P. Fertl and G. Matz, “Efficient OFDM channel estimation in mobileenvironments based on irregular sampling,” in Proc. Asilomar Conf.Signals, Systems, Computers, Pacific Grove, CA, Okt , pp. 1777–1781,Nov. 2006. [23] O. Ureten and N. Serinken, “Decision directed iterative equalization ofOFDM symbols using non-uniform interpolation,”
Proc. IEEE VTC'06 (fall), Montreal, Canada. [24] M. Soltanolkotabi, A. Amini, and F. Marvasti, “OFDM channel estimation based on adaptive thresholding for sparse signal detection,” arXiv:0901.3948, Jan. 2009. [25] J. L. Brown, “Sampling extensions for multiband signals,”
IEEE Trans.on Acoust. Speech, Signal Proc. , vol. 33, pp. 312–315, Feb. 1985.[26] O. G. Guleryuz, “Nonlinear approximation based image recovery usingadaptive sparse reconstructions and iterated denoising, parts I and II,”
IEEE Trans. on Image Proc. , vol. 15, no. 3, pp. 539–571, March 2006.[27] H. Rauhut, “On the impossibility of uniform sparse reconstructionusing greedy methods,”
STSIP , vol. 7, no. 2, pp. 197–215, May 2008.[28] T. Blu, P. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot, “Sparsesampling of signal innovations: Theory, algorithms, and performancebounds,”
IEEE Signal Proc. Magazine , vol. 25, no. 2, March 2008.[29] F. Marvasti, “Guest editor’s comments on special issue on nonuniformsampling,”
STSIP , vol. 7, no. 2, pp. 109–112, May 2008.[30] D. Donoho, “Compressed sensing,”
IEEE Trans. on Info. Theory ,vol. 52, no. 4, pp. 1289–1306, Apr. 2006.[31] E. J. Candes, “Compressive sampling,”
Proc. of the Int. Congress ofMathematics, Madrid, Spain , vol. 3, pp. 1433–1452, 2006.[32] S. Zahedpour, S. Feizi, A. Amini, and F. Marvasti, “Impulsive noisecancellation based on soft decision and recursion,”
IEEE Trans. onInstrumentation and Measurement , vol. 67, no. 12, Dec. 2008.[33] S. Feizi, S. Zahedpour, M. Soltanolkotabi, A. Amini, and F. Marvasti,“Salt and pepper noise removal for images,”
IEEE Proc. on Int. Conf.on Telecommunications ICT’08, St. Petersburg, Russia , pp. 1–5, June2008.[34] Y. Saad,
Iterative Methods for Sparse Linear Systems . SIAMpublications, 2nd edition, 2003.[35] T. A. Davis,
Direct Methods for Sparse Linear Systems (Fundamentalsof Algorithms) . SIAM Publications, 2006.[36] J. S. Duff, A. M. Erisman, and J. K. Reid,
Direct Methods for Sparsematrices . Oxford University Press, 1986.[37] P. D. Grunwald, I. J. Myung, and M. A. Pitt,
Advances in MinimumDescription Length: Theory and Applications . MIT Press, April 2005.[38] A. Kumar, P. Ishwar, and K. Ramchandran, “On distributed samplingof bandlimited and non-bandlimited sensor fields,”
IEEE Int. Conf. onAcoustics, Speech and Signal Proc., ICASSP’04, Montreal, Canada ,vol. 3, pp. 925–928, May 2004.[39] P. Fertl and G. Matz, “Multi-user channel estimation in OFDMA uplinksystems based on irregular sampling and reduced pilot overhead,”
Proc.IEEE Int. Conf. on Acoustics, Speech and Signal Proc. ICASSP’07,Hawaii , vol. 3, pp. 297–300, April 2007.[40] F. Marvasti, “Spectral analysis of random sampling and error freerecovery by an iterative method,”
Trans. of IECE of Japan , vol. 69,no. 2, pp. 79–82, Feb. 1986.[41] R. A. DeVore, “Deterministic constructions of compressed sensingmatrices,”
Journal of Complexity , vol. 23, no. 4-6, pp. 918–925, Aug.2007.[42] F. Marvasti and A. K. Jain, “Zero crossings, bandwidth compression,and restoration of nonlinearly distorted band-limited signals,”
Journalof Opt. Soc. of America , vol. 3, no. 5, pp. 651–654, 1986.[43] A. Aldroubi, C. Cabrelli, and U. Molter, “Optimal non-linear modelsfor sparsity and sampling,” to be published in the Journal of FourierAnalysis and Applications, Special Issue on Compressed Sampling ,vol. 14, no. 6, pp. 48–67, July 2008.[44] A. Amini and F. Marvasti, “Convergence analysis of an iterative methodfor the reconstruction of multi-band signals from their uniform andperiodic nonuniform samples,”
STSIP , vol. 7, no. 2, pp. 109–112, May2008.[45] P. J. S. G. Ferreira, “Iterative and non-iterative recovery of missingsamples for 1-D band-limited signals,” in
Nonuniform Sampling:Theory and Practice , F. Marvasti, Ed. Springer, formerly KluwerAcademic/Plenum Publishers, 2001, pp. 235–281.[46] ——, “The stability of a procedure for the recovery of lost samples inband-limited signals,”
IEEE Trans. on Signal Proc. , vol. 40, no. 3, pp.195–205, Dec. 1994.[47] F. Marvasti,
A Unified Approach to Zero-crossings and Nonuni-form Sampling of Single and Multi-dimensional Signals and Systems .Nonuniform Publication, Oak Park, ILL, 1987.[48] ——,
Nonuniform Sampling , R. J. Marks, Ed. New York: Springer-Verlag, 1993.[49] A. I. Zayed and P. L. Butzer, “Lagrange interpolation and samplingtheorems,” in
Nonuniform Sampling: Theory and Practice , F. Marvasti, Ed. Springer, formerly Kluwer Academic/Plenum Publishers, 2001,pp. 123–168.[50] F. Marvasti, P. Clarkson, M. Dokic, U. Goenchanart, and C. Liu,“Reconstruction of speech signals with lost samples,”
IEEE Trans. onSignal Proc. , vol. 40, no. 12, pp. 2897–2903, Dec. 1992.[51] M. Unser, “Sampling-50 years after Shannon,”
Proc. of the IEEE ,vol. 88, no. 4, pp. 569–587, Apr. 2000.[52] F. Marvasti, “Random topics in nonuniform sampling,” in
NonuniformSampling: Theory and Practice , F. Marvasti, Ed. Springer, formerlyKluwer Academic/Plenum Publishers, 2001, pp. 169–234.[53] F. Marvasti, M. Hasan, M. Eckhart, and S. Talebi, “Efficient algorithmsfor burst error recovery using FFT and other transform kernels,”
IEEETrans. on Signal Proc. , vol. 47, no. 4, April 1999.[54] F. Marvasti, M. Analoui, and M. Gamshadzahi, “Recovery of signalsfrom nonuniform samples using iterative methods,”
IEEE Trans. onASSP , vol. 39, no. 4, pp. 872–878, April 1991.[55] H. Feichtinger and K. Gr¨ochenig, “Theory and practice of irregularsampling,” in
Wavelets- Mathematics and Applications , J. J. Benedettoand et. al., Eds. CRC Publications, 1994, pp. 305–363.[56] P. J. S. G. Ferreira, “Noniterative and fast iterative methods forinterpolation and extrapolation,”
IEEE Trans. on Signal Proc , vol. 42,no. 11, pp. 3278–3282, Nov. 1994.[57] A. Aldroubi and K. Gr¨ochenig, “Non-uniform sampling and recon-struction in shift-invariant spaces,”
SIAM Review , vol. 43, no. 4, pp.585–620, 2001.[58] A. Aldroubi, “Non-uniform weighted average sampling and exactreconstruction in shift-invariant and wavelet spaces,”
Applied andComputational Harmonic Analysis , vol. 13, no. 2, pp. 151–161, sept.2002.[59] A. Papoulis and C. Chamzas, “Detection of hidden periodicities byadaptive extrapolation,”
IEEE Trans. on Acoustics, Speech, and SignalProc. , vol. 27, no. 5, pp. 492–500, Oct. 1979.[60] C. Chamzas and W. Y. Xu, “An improved version of Papoulis-Gerchberg algorithm on band-limited extrapolation,”
IEEE Trans. onAcoustics, Speech, and Signal Proc. , vol. 32, no. 2, pp. 437–440, April1984.[61] P. J. S. G. Ferreira, “Interpolation and the discrete Papoulis-Gerchbergalgorithm,”
IEEE Trans. on Signal Proc. , vol. 42, no. 10, Oct. 1994.[62] K. Gr¨ochenig and T. Strohmer, “Numerical and theoretical aspectsof non-uniform sampling of band-limited images,” in
NonuniformSampling: Theory and Practice , F. Marvasti, Ed. Springer, formerlyKluwer Academic/Plenum Publishers, 2001, pp. 283–324.[63] D. C. Youla, “Generalized image restoration by the method of alternat-ing orthogonal projections,”
IEEE Trans. Circuits Syst. , vol. 25, no. 9,pp. 694–702, Sept. 1978.[64] D. C. Youla and H. Webb, “Image restoration by the method of convexprojections: Part 1 -theory,”
IEEE Trans. Med. Imag. , vol. 1, no. 2, pp.81–94, Oct. 1982.[65] K. Gr¨ochenig, “Acceleration of the frame algorithm,”
IEEE Trans. onSignal Proc. , vol. 41, no. 12, pp. 3331–3340, Dec. 1993.[66] A. Ali-Amini, M. Babaie-Zadeh, and C. Jutten, “A new approach forsparse decomposition and sparse source separation,”
EUSIPCO2006,Florence , Sept. 2006.[67] F. Marvasti, “Applications to error correction codes,” in
NonuniformSampling: Theory and Practice , F. Marvasti, Ed. Springer, formerlyKluwer Academic/Plenum Publishers, 2001, pp. 689–738.[68] Y. Tsaig and D. Donoho, “Extensions of compressed sensing,”
SignalProc. , vol. 86, no. 3, pp. 549–571, March 2006.[69] E. Candes and T. Tao, “Near-optimal signal recovery from random pro-jections: Universal encoding strategies,”
IEEE Trans. on Info. Theory ,vol. 52, no. 12, pp. 5406– 5425, Dec. 2006.[70] A. J. Jerri, “The Shannon sampling theorem-its various extension andapplications: a tutorial review,”
Proc. of IEEE , vol. 65, no. 11, pp.1565–1596, April 1977.[71] E. Candes and M. Wakin, “An introduction to compressive sampling,”
IEEE Signal Proc. Magazine , vol. 25, no. 2, pp. 21–30, March 2008.[72] Y. Eldar, “Compressed sensing of analog signals,”
Preprint , 2008.[73] O. Christensen,
An introduction to frames and Riesz basis , ser. Appliedand Numerical Harmonic Analysis. Birkhauser, 2003.[74] E. Candes and J. Romberg, “Sparsity and incoherence in compressivesampling,”
Inverse Problems , vol. 23, pp. 969–985, April 2007.[75] D. Donoho and X. Hou, “Uncertainty principle and ideal atomicdecomposition,”
IEEE Trans. on Info. Theory , vol. 47, no. 7, pp. 2845–2862, Nov. 2001.[76] R. G. Baraniuk, M. Davenport, R. DeVore, and M. B. Wakin, “Asimple proof of the restricted isometry property for random matrices,”
Preprint, 2007. [77] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “Algorithmic linear dimension reduction in the ℓ1 norm for sparse vectors,” preprint, 2006. [78] D. L. Donoho, “For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution,” Comm. Pure Appl. Math, vol. 59, pp. 797–829, 2004. [79] J. A. Tropp, “Recovery of short linear combinations via ℓ1 minimization,” IEEE Trans. on Info. Theory, vol. 51, no. 4, pp. 1568–1570, Apr. 2005. [80] E. Candes and T. Tao, “Decoding by linear programming,”
IEEE Trans.on Info. Theory , vol. 51, pp. 4203–4215, Dec. 2005.[81] J. A. Tropp, “Greed is good: Algorithmic results for sparse approxi-mation,”
IEEE Trans. on Info. Theory , vol. 50, no. 10, pp. 2231–2242,Oct. 2004.[82] J. Tropp and A. Gilbert, “Signal recovery from partial information viaorthogonal matching pursuit,”
IEEE Trans. on Info. Theory , vol. 53,no. 12, pp. 4655–4666, Dec. 2007.[83] E. Candes and J. Romberg, “Quantitative robust uncertainty principlesand optimally sparse decompositions,”
Foundations of Comput. Math. ,vol. 6, no. 2, pp. 227–254, Apr. 2006.[84] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles:Exact signal reconstruction from highly incomplete frequency infor-mation,”
IEEE Trans. on Info. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006. [85] V. Saligrama, “Deterministic designs with deterministic guarantees: Toeplitz compressed sensing matrices, sequence design and system identification,” arXiv:0806.4958, July 2008. [86] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss,
Combining geometry and combinatorics: A unified approach to sparsesignal recovery . preprint 2008.[87] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk, “Fast reconstruction ofpiecewise smooth signals from random projections,”
In Proc. SPARS05,Rennes, France , vol. 1, pp. 1064–1070, Nov. 2005.[88] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss, “One-pass wavelet decompositions of data streams,”
IEEE Trans. Knowl.Data Eng. , vol. 15, no. 3, pp. 541–554, May/June 2003.[89] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “One sketchfor all: fast algorithms for compressed sensing,”
In ACM STOC 2007 ,pp. 237–246, 2007.[90] S. Sarvotham, D. Baron, and R. G. Baraniuk, “Compressed sensingreconstruction via belief propagation,”
Technical Report ECE-0601,ECE Dept., Rice University , July. 2006.[91] ——, “Sudocodes - fast measurement and reconstruction of sparsesignals,”
IEEE ISIT , 2006.[92] W. Xu and B. Hassibi, “Efficient compressive sensing with determin-istic guarantees using expander graphs,”
IEEE Info. Theory Workshop,ITW’07 , pp. 414–419, Sep. 2007.[93] A. Aldroubi, H. Wang, and K. Zaringhalam,
Sequential compressedsampling via Huffman codes . preprint 2008.[94] P. L. Dragotti, M. Vetterli, and T. Blu, “Sampling moments andreconstructing signals of finite rate of innovation: Shannon meetsstrang-fix,”
IEEE Trans. on Signal Proc. , vol. 55, no. 5, pp. 1741–1757, May 2007.[95] I. Maravic and M. Vetterli, “Sampling and reconstruction of signalswith finite rate of innovation in the presence of noise,”
IEEE Trans.on Signal Proc. , vol. 53, no. 8, pp. 2788–2805, Aug. 2005.[96] P. Azmi and F. Marvasti, “Robust decoding of DFT-based error-controlcodes for impulsive and additive white gaussian noise channels,”
IEEProc. Comm. , vol. 152, no. 3, pp. 265–271, June 2005.[97] F. Marvasti and M. Nafie, “Sampling theorem: A unified outlook oninformation theory, block and convolutional codes,”
Special Issue onInfo. Theory and its Applications, IEICE Trans. on Fundamentals ofElectronics, Comm. and Computer Sciences, Section E , vol. 76, no. 9,pp. 1383–1391, Sept. 1993.[98] J. Wolf, “Redundancy, the discrete Fourier transform, and impulse noisecancellation,”
IEEE Trans. on Comm. , vol. 31, no. 3, pp. 458–461, Mar1983.[99] R. E. Blahut, “Transform techniques for error control codes,”
IBMJournal of Research and Development , vol. 23, no. 3, pp. 299–315,May 1979.[100] T. G. M. Jr., “Coding of real-number sequences for error correction:A digital signal processing problem,”
IEEE Trans. on Selected Areasin Comm. , vol. 2, no. 2, March 1984.[101] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limiterror- correcting coding and decoding: Turbo codes,”
Proc. IEEE Int. Conf. on Comm., ICC'93, Geneva, vol. 2, pp. 1064–1070, May 1993. [102] C. N. Hadjicostis and G. C. Verghese, “Coding approaches to fault tolerance in linear dynamic systems,” IEEE Trans. on Info. Theory, vol. 51, no. 1, pp. 210–228, Jan. 2005. [103] C. J. Anfinson and F. T. Luk, “A linear algebraic model of algorithm-based fault tolerance,”
IEEE Trans. on Comput. , vol. 37, no. 12, pp.1599–1604, Dec. 1988.[104] V. S. S. Nair and J. A. Abraham, “Real-number codes for fault-tolerantmatrix operations on processor arrays,”
IEEE Trans. on Computer ,vol. 39, no. 4, pp. 426–435, Apr. 1990.[105] A. L. N. Reddy and P. Banerjee, “Algorithm-based fault detection forsignal processing applications,”
IEEE Trans. on Computer , vol. 39,no. 10, pp. 1304–1308, Oct. 1990.[106] J. M. N. Vieira and P. J. S. G. Ferreira, “Interpolation, spectrumanalysis, error-control coding, and fault tolerant computing,”
In Proc.of ICASSP’97, Munich, Germany , vol. 3, pp. 1831–1834, Apr. 1997.[107] F. Marvasti, “Error concealment of speech, image and video signals,”
United Stated Patents 6,601,206 , July 2003.[108] M. Nafie and F. Marvasti, “Implementation of recovery of speech withmissing samples on a DSP chip,”
Electronics Letters , vol. 30, no. 1,pp. 12–13, Jan. 1994.[109] A. Momenai and S. Talebi, “Improving the stability of the DFT errorrecovery codes by using the vandermonde fast decoding,”
Proc. ofICASSP’06 , 2006.[110] ——, “Improving the stability of DFT error recovery codes by usingsparse oversampling patterns,” , vol. 87, no. 6, pp. 1448–1461, June 2007.[111] C. Wong, F. Marvasti, and W. Chambers, “Implementation of recoveryof speech with impulsive noise on a DSP chip,”
Electronics Letters ,vol. 31, no. 17, pp. 1412–1413, Aug. 1995.[112] D. Mandelbaum, “On decoding of Reed-Solomon codes,”
IEEE Trans.on Info. Theory , vol. 17, no. 6, pp. 707–712, Nov. 1971.[113] R. G. Gallager, “Low-density parity-check codes,”
IRE Trans. on Info.Theory , vol. 8, pp. 21–28, Jan. 1962.[114] D. J. C. MacKay and R. M. Neal, “Near Shannon-limit performanceof low-density parity-check codes,”
Electronics Letters , vol. 32, pp.1645–1646, Aug. 1996.[115] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binaryblock and convolutional codes,”
IEEE Trans. on Info. Theory , vol. 42,pp. 429–445, March 1996.[116] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design ofcapacity approaching irregular low-density parity-check codes,”
IEEETrans. on Info. Theory , vol. 47, no. 2, pp. 619–637, Feb. 2001.[117] R. M. Tanner, “A recursive approach to low complexity codes,”
IEEETrans. on Info. Theory , vol. 27, no. 5, pp. 533–547, Sept. 1981.[118] J. Feldman, “Decoding error-correcting codes via linear programming,”Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge,MA, 2003.[119] J. Feldman, M. J. Wainwright, and D. R. Karger, “Using linearprogramming to decode binary linear codes,”
IEEE Trans. on Info.Theory , vol. 51, no. 3, pp. 954–972, March 2005.[120] P. O. Vontobel and R. Koetter, “On low-complexity linear-programmingdecoding of LDPC codes,”
European Trans. on Telecommunications, vol. 18, no. 5, pp. 509–517, 2007. [121] S. K. Chilappagari and M. Chertkov, “Provably efficient instanton search algorithm for LP decoding of LDPC codes over the BSC,” submitted to IEEE Trans. on Info. Theory, arXiv:0808.2515v2, Sept. 2008. [122] B. G. R. de Prony, “Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'alkool, à différentes températures,” Journal de l'École Polytechnique, vol. 1, pp. 24–76, 1795. [123] J. J. Fuchs, “Extension of the Pisarenko method to sparse linear arrays,”
IEEE Trans. on Signal Proc. , vol. 45, no. 10, pp. 2413–2421, Oct. 1997.[124] R. Schmidt, “Multiple emitter location and signal parameter estima-tion,”
IEEE Trans. on Antennas and Propagation , vol. 34, no. 3, pp.276–280, March 1986.[125] H. Krim and M. Viberg, “Two decades of array signal processingresearch: the parametric approach,”
IEEE Signal Proc. Magazine ,vol. 13, no. 4, pp. 67–94, July 1996.[126] B. D. V. Veen and K. M. Buckley, “Beamforming: a versatile approachto spatial filtering,”
IEEE ASSP Magazine , vol. 5, no. 2, pp. 4–24, Apr.1988.[127] S. Valaee and P. Kabal, “An information theoretic approach to sourceenumeration in array signal processing,”
IEEE Trans. on Signal Proc. ,vol. 52, no. 5, pp. 1171–1178, May 2004. [128] R. Roy and T. Kailath, “Esprit-estimation of signal parameters viarotational invariance techniques,”
IEEE Trans. on ASSP , vol. 37, no. 7,pp. 984–995, July 1989.[129] M. Wax and T. Kailath, “Detection of signals by information theoreticcriteria,”
IEEE Trans. on ASSP , vol. 33, no. 2, pp. 387–392, April1985.[130] I. Ziskind and M. Wax, “Maximum likelihood localization of multiplesources by alternating projection,”
IEEE Trans. on ASSP , vol. 36,no. 10, pp. 1553–1560, October 1988.[131] M. Viberg and B. Ottersten, “Sensor array processing based on sub-space fitting,”
IEEE Trans. on Signal Proc. , vol. 39, no. 5, pp. 1110–1121, May 1991.[132] S. Shahbazpanahi, S. Valaee, and A. B. Gershman, “A covariance fittingapproach to parametric localization of multiple incoherently distributedsources,”
IEEE Trans. on Signal Proc. , vol. 52, no. 3, pp. 592–600,March 2004.[133] J. Rissanen, “Modeling by shortest data description,”
Automatica ,vol. 14, pp. 465–471, 1978.[134] H. Akaike, “A new look on the statistical model identification,”
IEEETrans. on Automatic Control , vol. 19, no. 6, pp. 716–723, Dec. 1974.[135] M. Kaveh, H. Wang, and H. Hung, “On the theoretical performanceof a class of estimators of the number of narrow-band sources,”
IEEETrans. on ASSP , vol. 35, no. 9, pp. 1350–1352, Sept. 1987.[136] Q. T. Zhang, K. M. Wong, P. C. Yip, and J. P. Reilly, “Statisticalanalysis of the performance of information theoretic criteria in thedetection of the number of signals in array processing,”
IEEE Trans.on ASSP , vol. 37, no. 10, pp. 1557–1567, Oct. 1989.[137] H. V. Poor,
An Introduction to Signal Detection Estimation . Springer,1994.[138] T. M. Cover and J. A. Thomas,
Elements of Information Theory, 2ndEd.
John Wiley & Sons, 2006.[139] J. Rissanen, “A universal prior for integers and estimation by minimumdescription length,”
Annals of Statistics , vol. 11, no. 2, pp. 416–431,June 1983.[140] T. W. Anderson and T. Wilbur,
An Introduction to Multivariate Statis-tical Analysis, 2nd Ed.
Wiley, 1958.[141] F. Haddadi, M. R. M. Mohammadi, M. M. Nayebi, and M. R. Aref,“Statistical performance analysis of detection of signals by informationtheoretic criteria,” submitted to IEEE Trans. on Signal Proc. , 2008.[142] R. M. Leahy and B. D. Jeffs, “On the design of maximally sparsebeamforming arrays,”
IEEE Trans. on Antennas and Propagation ,vol. 39, no. 8, pp. 1178–1187, Aug. 1991.[143] A. Austeng, J. E. Kirkebo, and S. Holm, “A flexible algorithm forlayout-optimized sparse cmut arrays,”
Proc. IEEE Ultrasonics Symp.,(Montreal, Canada) , vol. 2, pp. 1266–1269, Aug. 2004.[144] S. Holm, A. Austeng, and J. E. Kirkebo, “Experience with sparse arraysat the university of Oslo,” in Proc. 29th ESA Antenna Workshop onMultiple Beams and Reconfigurable Antennas, Noordwijk, Netherlands ,pp. 215–218, Apr. 2007.[145] H. S. r Jacobsen and K. Madsen, “Synthesis of nonuniformly spacedarrays using a general nonlinear minimax optimization method,”
IEEETrans. on Antennas and Propagation , vol. 24, no. 4, pp. 501–506, July1976.[146] S. Holm, B. Elgetun, and G. Dahl, “Properties of the beampattern ofweight- and layout-optimized sparse arrays,”
IEEE Trans. on Ultrason-ics, Ferroelectrics and Frequency Control , vol. 44, no. 5, pp. 983–991,Sept. 1997.[147] W. J. Hendricks, “The totally random versus the bin approach forrandom arrays,”
IEEE Trans. Antennas and Propagation , vol. 39,no. 12, pp. 1757–1762, Dec. 1991.[148] G. R. Lockwood and F. S. Foster, “Optimizing the radiation pattern ofsparse periodic two-dimensional arrays,”
IEEE Trans. on Ultrasonic,Ferroelectrics and Frequency Control , vol. 43, pp. 15–19, Jan. 1996.[149] A. Austeng and S. Holm, “Sparse 2-D arrays for 3-D phased arrayimaging - design methods,”
IEEE Trans. on Ultrasonic, Ferroelectricsand Frequency Control , vol. 49, no. 8, pp. 1073–1086, Aug. 2002.[150] ——, “Sparse 2-D arrays for 3-D phased array imaging - experimentalvalidation,”
IEEE Trans. on Ultrasonic, Ferroelectrics and FrequencyControl , vol. 49, no. 8, pp. 1087–1093, Aug. 2002.[151] J. E. Kirkebo and A. Austeng, “Improved beamforming using curvedsparse 2-D arrays in ultrasound,”
IEEE Trans. Ultrasonic, Ferro-electrics and Frequency Control , vol. 46, no. 2, pp. 119–128, May2007.[152] J. E. Kirkebo, A. Austeng, and S. Holm, “Layout-optimized cylindricalsonar arrays,”
Proc. IEEE OCEANS, Kobe, Japan , vol. 2, pp. 598–602,Nov. 2004. [153] M. Gastpar and M. Vetterli, “Source-channel communication in sensornetworks,” in Lecture Notes in Computer Science, Springer, New York,NY , pp. 162–177, April 2003.[154] A. M. Sayeed, “A statistical signal modeling framework for wirelesssensor networks,” in Proc. 2nd Int. Workshop on Info. Proc. in SensorNetworks, IPSN’03, UW Tech. Rep. ECE-1-04 , pp. 162–177, Feb. 2004.[155] K. Liu and A. M. Sayeed, “Optimal distributed detection strategiesfor wireless sensor networks,” in Proc. 42nd Annual Allerton Conf. onComm., Control and Comp. , Oct. 2004.[156] A. D’Costa, V. Ramachandran, and A. Sayeed, “Distributed classifica-tion of gaussian space-time sources in wireless sensor networks,” IEEEJournal on Selected Areas in Comm. , vol. 22, no. 6, pp. 1026–1036,Aug. 2004.[157] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressivewireless sensing,” in Proc. Int. Symposium on Info. Proc. in SensorNetworks, IPSN’06, Nashville, TN , pp. 134–142, Apr. 2006.[158] R. Rangarajan, R. Raich, and A. Hero, “Sequential design of exper-iments for a rayleigh inverse scattering problem,” in IEEE Workshopon Statistical Signal Proc., Bordeaux, France , July 2005.[159] Y. Yang and R. S. Blum, “Radar waveform design using minimummean-square error and mutual information,”
Fourth IEEE Workshopon Sensor Array and Multichannel Proc. , vol. 12, no. 14, pp. 234–238,July 2006.[160] A. Kumar, P. Ishwar, and K. Ramchandran, “On distributed samplingof smooth non-bandlimited fields,”
Int. Simp. On Info. Proc. In SensorNetwroks (ISPN2004) , pp. 89–98, April 2004.[161] E. Meijering, “A chronology of interpolation: From ancient astronomyto modern signal and image processing,”
In Proc. of IEE , vol. 90, no. 3,pp. 319–342, March 2002.[162] D. Ganesan, S. Ratnasamy, H. Wang, and D. Estrin, “Coping with irreg-ular spatio-temporal sampling in sensor networks,”
ACM SIGCOMMComputer Communication Review , vol. 34, no. 1, pp. 125–130, Jan.2004.[163] R. Wagner, R. Baraniuk, S. Du, D. Johnson, and A. Cohen, “Anarchitecture for distributed wavelet analysis and processing in sensornetworks,” in Proc. of Int. Workshop Info. Proc. in Sensor Networks,IPSN’06, Nashville, TN , pp. 243–250, Apr. 2006.[164] R. Wagner, H. Choi, R. Baraniuk, and V. Delouille, “Distributedwavelet transform for irregular sensor network grids,”
In Proc. IEEEStat. Signal Proc. Workshop (SSP) , July 2005.[165] S. S. Pradhan, J. Kusuma, and K. Ramchandran, “Distributed compres-sion in a dense microsensor network,”
IEEE Signal Proc. Magazine ,vol. 19, no. 2, pp. 51–60, March 2002.[166] M. Gastpar and M. Vetterli, “Power, spatio-temporal bandwidth, anddistortion in large sensor networks,”
IEEE Journal on Selected Areasin Comm. , vol. 23, no. 4, pp. 745–754, Apr. 2005.[167] W. U. Bajwa, A. M. Sayeed, and R. Nowak, “Matched source-channelcommunication for field estimation in wireless sensor networks,”
InProc. 4th Int. Symposium on Info. Proc. in Sensor Networks, IPSN’05,Los Angeles, CA , no. 44, pp. 332–339, Apr. 2005.[168] R. Mudumbai, J. Hespanha, U. Madhow, and G. Barriac, “Scalablefeedback control for distributed beamforming in sensor networks,”
Proc. of the Int. Symposium on Info. Theory, ISIT’05 , Sept. 2005.[169] J. Haupt and R. Nowak, “Signal reconstruction from noisy randomprojections,”
IEEE Trans. on Info. Theory, vol. 52, no. 9, pp. 4036–4068, Aug. 2006. [170] D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Baraniuk, “Distributed compressed sensing,” preprint, Nov. 2005. [171] W. Wang, M. Garofalakis, and K. Ramchandran, “Distributed sparse random projections for refinable approximation,” in Proc. IPSN'07, Cambridge, MA, pp. 331–339, Apr. 2007. [172] M. F. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk, “Universal distributed sensing via random projections,” in Proc. Int. Workshop on Info. Proc. in Sensor Networks, IPSN'06, pp. 177–185, Apr. 2006. [173] J. Haupt, W. U. Bajwa, M. Rabbat, and R. Nowak, “Compressed sensing for networked data,”
IEEE Signal Proc. Magazine , pp. 92–101,March 2008.[174] M. Crovella and E. Kolaczyk, “Graph wavelets for spatial trafficanalysis,”
INFOCOM 2003. Twenty-Second Annual Joint Conf. of theIEEE Computer and Communications Societies. IEEE , vol. 3, pp.1848–1857, Mar. 2003.[175] R. Coifman and M. Maggioni, “Diffusion wavelets,”
Applied Compu-tational and Harmonic Analysis , vol. 21, no. 1, pp. 53–94, June 2006.[176] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Joint source-channel communication for distributed estimation in sensor networks,”
IEEE Trans. on Info. Theory , vol. 53, no. 10, pp. 3629–3653, Oct.2007.[177] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossipalgorithms,”
IEEE Trans. on Info. Theory and IEEE/ACM Trans. onNetworking , vol. 52, no. 6, pp. 2508–2530, June 2006.[178] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, “Decentralized compres-sion and predistribution via randomized gossiping,” in Proc. IPSN’06,Nashville, TN , pp. 51–59, Apr. 2006.[179] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradientprojection for sparse reconstruction: Application to compressed sensingand other inverse problems,”
IEEE Journal on Selected Topics in SignalProc. , vol. 1, no. 4, pp. 586–597, Dec. 2007.[180] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “A methodfor large-scale ℓ -regularized least squares problems with applicationsin signal processing and statistics,” IEEE Journal on Selected Topicsin Signal Proc. , vol. 1, no. 4, pp. 606–617, Dec. 2007.[181] Y. Rachlin, R. Negi, and P. Khosla, “Sensing capacity for discretesensor network applications,”
Proc. of the Fourth Int. Symposium onInfo. Proc. in Sensor Networks , pp. 126–132, Apr. 2005.[182] S. Aeron, M. Zhao, and V. Saligrama, “Information theoretic boundsto sensing capacity of sensor networks under fixed SNR,”
IEEE Info.Theory Workshop, ITW’07, Lake Tahoe, CA , pp. 84–89, Sept. 2007.[183] ——, “Fundamental limits on sensing capacity for sensor networks andcompressed sensing,” submitted to IEEE Trans. on Info. Theory , no. 2,Apr. 2008.[184] S. Sanei and J. Chambers,
EEG Signal Processing . John Wiley.[185] J. Herault and C. Jutten, “Space or time adaptive signal processing byneural network models,”
Proc. of American Institute of Physics (AIP)Conf.: Neural Networks for Computing , pp. 206–211, 1986.[186] A. M. Aibinu, A. R. Najeeb, M. J. E. Salami, and A. A. Shafie, “Op-timal model order selection for transient error autoregressive movingaverage (TERA) MRI reconstruction method,”
Proc. World Academyof Science, Engineering and Technology , vol. 32, pp. 191–195, Aug.2008.[187] M. S. Pedersen, J. Larsen, U. Kjems, and L. C. Parra,
A Survey ofConvolutive Blind Source Separation Methods , 2007.[188] P. Comon, “Independent component analysis: A new concept,”
SignalProc. , vol. 36, pp. 287–314, 1994.[189] M. Zibulevsky and B. A. Pearlmutter, “Blind source separation bysparse decomposition in a signal dictionary,”
Neural Comp. , vol. 13,no. 4, 2001.[190] ——, “Blind source separation by sparse decomposition,”
Univ. NewMexico Tech. Rep. CS99-1 , July 1999.[191] Y. Luo, J. A. Chambers, S. Lambotharan, and I. Proudler, “Exploitationof source non-stationarity in underdetermined blind source separationwith advanced clustering techniques,”
IEEE Trans. on Signal Proc. ,vol. 54, no. 6, 2006.[192] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdeter-mined blind source separation based on sparse representation,”
IEEETrans. on Signal Proc. , vol. 54, no. 2, pp. 423–437, 2006.[193] A. Jourjine, S. Rickard, and O. Yilmaz, “Blind separation of disjointorthogonal signals: demixing n sources from 2 mixtures,”
Proc. of IEEEConf. on Acoustic, Speech, and Signal Proc. ICASSP’2000 , vol. 5, pp.2985–2988, 2000.[194] L. Vielva, D. Erdogmus, C. Pantaleon, I. Santamaria, J. Pereda, andJ. C. Principe, “Underdetermined blind source separation in a time-varying environment,”
Proc. of IEEE Conf. On Acoustic, Speech, andSignal Proc. ICASSP’02 , vol. 3, pp. 3049–3052, 2002.[195] I. Takigawa and et. al., “On the minimum ℓ -norm signal recoveryin underdetermined source separation,” Proc. of 5th Int. Conf. onIndependent Component Analysis , pp. 22–24, 2004.[196] C. C. Took, S. Sanei, and J. Chambers, “A filtering approach to under-determined BSS with application to temporomandibular disorders,” inProc IEEE ICASSP’06, France , 2006.[197] T. Melia and S. Rickard, “Underdetermined blind source separation inechoic environment using DESPRIT,”
EURASIP Journal on Advancesin Signal Proc., Article No. 86484 , 2007.[198] K. Nazarpour, S. Sanei, L. Shoker, and J. A. Chambers, “Parallelspace-time-frequency decomposition of EEG signals for brain computerinterfacing,”
Proc. of EUSIPCO 2006, Florence, Italy , 2006.[199] R. Gribonval and S. Lesage, “A survey of sparse component analysis forblind source separation: principles, perspectives, and new challenges,” in Proc. of ESANN 2006 , pp. 323–330, Apr. 2006.[200] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequencydictionaries,”
IEEE Trans. on Signal Proc., vol. 41, no. 12, pp. 3397–3415, Dec. 1993. [201] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998. [202] E. Candes, “ℓ1-magic: Recovery of sparse signals.” [203] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Trans. on Signal Proc., vol. 45, no. 3, pp. 600–616, March 1997. [204] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten, “Fast sparse representation using smoothed ℓ0-norm,” to appear in IEEE Trans. on Signal Proc. [205] R. Gribonval and P. Vandergheynst, “On the exponential convergence of matching pursuit in quasi-incoherent dictionaries,” IEEE Trans. on Info. Theory, vol. 52, no. 1, pp. 255–261, Jan. 2006. [206] D. L. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,”
IEEE Trans.on Info. Theory , vol. 52, no. 1, pp. 6–18, Jan. 2006.[207] M. S. Bazaraa, J. J. Jarvis, and H. D. Sherali,
Linear programming andnetwork flows . John Wiley & Sons, Inc. New York, NY, USA, 1990.[208] J. J. Fuchs, “On sparse representations in arbitrary redundant bases,”
IEEE Tran. on Info. Theory , vol. 50, no. 6, pp. 1341–1344, June 2004.[209] A. Aldroubi and K. Zaringhalam,
Non-linear Least square in R N .Acta Applicanda Mathematicae, in press.[210] J. J. V. de Beek, O. Edfors, M. Sandell, S. K. Wilson, and P. Borjesson,“On channel estimation in OFDM systems,” Proc. 45th IEEE VehicularTechnology Conf., Chicago , vol. 2, pp. 815–819, Jul. 1995.[211] M. Morelli and U. Mengali, “A comparison of pilot-aided channelestimation methods for OFDM systems,”
IEEE Trans. on Signal Proc. ,vol. 49, no. 12, pp. 3065–3073, Dec. 2001.[212] E. G. Larsson and et. al., “Joint symbol timing and channel estimationfor OFDM based WLANs,”
IEEE Comm. Letters , vol. 5, no. 8, pp.325–327, Aug. 2001.[213] O. Edfors, M. Sandell, J.-J. V. de Beek, and S. K. Wilson, “Ofdmchannel estimation by singular value decomposition,”
IEEE Trans. onComm. , vol. 46, no. 7, pp. 931–939, July 1998.[214] P. Strobach, “Low-rank adaptive filters,”
IEEE Trans. on Signal Proc. ,vol. 44, no. 2, pp. 2932–2947, Dec. 1996.[215] S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimationtechniques based on pilot arrangement in OFDM systems,”
IEEE Trans.on Broadcasting , vol. 48, no. 3, pp. 223–229, Sept. 2002.[216] S. G. Kang, Y. M. Ha, and E. K. Joo, “A comparative investigation onchannel estimation algorithms for OFDM in mobile communications,”
IEEE Trans. on Broadcasting , vol. 49, no. 2, pp. 142–149, June 2003.[217] G. Taub¨ock and F. Hlawatsch, “A compressed sensing technique forOFDM channel estimation in mobile environments: exploiting channelsparsity for reducing pilots,”
Proc. ICASSP’08, Las Vegas , pp. 2885–2888, March. 2008.[218] M. R. Raghavendra and K. Giridhar, “Improving channel estimationin OFDM systems for sparse multipath channels,”
IEEE Signal Proc.Letters , vol. 12, no. 1, pp. 52–55, Jan. 2005.[219] T. Kang and R. A. Iltis, “Matching pursuits channel estimation foran underwater acoustic ofdm modem,”
IEEE Int. Conf. on Acoustics,Speech and Signal Proc., ICASSP’08, Las Vegas , pp. 5296–5299,March 2008.[220] G. H. Golub and C. V. Loan,
Matrix Computations, 3rd edition . TheJohn Hopkins University Press, Baltimore, 1996.[221] Z. Zelatev,
Computational Methods for General Sparse Matrices , ser.Mathematics and its application series. Springer, 1991.[222] D. J. Evans,
Sparsity and its Applications . Cambridge UniversityPress, 1985.[223] V. Simoncini and D. B. Szyld, “Recent computational developments inKrylov subspace methods for linear systems,”
Numer. Linear AlgebraAppl. , vol. 14, pp. 1–59, Apr. 2007.[224] F. L. Teixeira, “FDTD/FETD methods: a review on some advancesand selected applications,”
Journal of Microwave and Optoelectronics ,vol. 6, no. 1, pp. 83–95, June 2007.[225] H. M. Masoudi, M. A. AlSunaidi, and J. M. Arnold, “Time-domainfinite-difference beam propagation method,”
IEEE Photonic Tech. Lett. ,vol. 11, no. 10, pp. 1274–1276, Oct. 1999.[226] H. M. Masoudi and J. M. Arnold, “Parallel three-dimensional finite-difference beam propagation methods,”
Int. J. Num. Mod , vol. 8, pp.95–107, March 1995.[227] B. He and F. L. Teixeira, “Sparse and explicit FETD via sparse approx-imation of hodge matrix,”
IEEE Microwave and Wireless ComponentsLett. , vol. 16, no. 6, pp. 348–356, April 2006. [228] F. Parvaresh, H. Vikalo, H. Misra, and B. Hassibi, “Recoveringsparse signals using sparse measurement matrices in compressed DNAmicroarrays,”
IEEE J. Selec. Topics in Signal Proc., vol. 2, no. 3, pp. 275–285, June 2008.

TABLE XV
LIST OF ACRONYMS

ADSL : Asymmetric Digital Subscriber Line
AIC : Akaike Information Criterion
AR : Auto-Regressive
ARMA : Auto-Regressive Moving Average
BSS : Blind Source Separation
BW : BandWidth
CAD : Computer Aided Design
CFAR : Constant False Alarm Rate
CG : Conjugate Gradient
CS : Compressed Sensing
CT : Computed Tomography
DAB : Digital Audio Broadcasting
DC : Direct Current (Zero-Frequency Coefficient)
DCT : Discrete Cosine Transform
DFT : Discrete Fourier Transform
DHT : Discrete Hartley Transform
DOA : Direction Of Arrival
DST : Discrete Sine Transform
DT : Discrete Transform
DVB : Digital Video Broadcasting
DWT : Discrete Wavelet Transform
EEG : ElectroEncephaloGraphy
ELP : Error Locator Polynomial
ESPRIT : Estimation of Signal Parameters via Rotational Invariance Techniques
FDTD : Finite-Difference Time-Domain
FETD : Finite-Element Time-Domain
FOCUSS : FOCal Under-determined System Solver
FPE : Final Prediction Error
GA : Genetic Algorithm
HNQ : Hannan and Quinn method
ICA : Independent Component Analysis
IDE : Iterative Detection and Estimation
IDT : Inverse Discrete Transform
IMAT : Iterative Methods with Adaptive Thresholding
KLT : Karhunen-Loève Transform
ℓ1 : Absolutely Summable Discrete Signals
ℓ2 : Finite Energy Discrete Signals
LDPC : Low Density Parity Check
LP : Linear Programming
MA : Moving Average
MAP : Maximum A Posteriori probability
MDL : Minimum Description Length
MIMAT : Modified IMAT
ML : Maximum Likelihood
MMSE : Minimum Mean Squared Error
MSL : Multi-Source Location
MUSIC : MUltiple SIgnal Classification
NP : Non-deterministic Polynomial time
OCT : Optical Coherence Tomography
OFDM : Orthogonal Frequency Division Multiplex
OFDMA : Orthogonal Frequency Division Multiple Access
OMP : Orthogonal Matching Pursuit
OSR : Over Sampling Ratio
PCA : Principal Component Analysis
PDF : Probability Density Function
PHD : Pisarenko Harmonic Decomposition
POCS : Projection Onto Convex Sets
PPM : Pulse-Position Modulation
RDE : Recursive Detection and Estimation
RIP : Restricted Isometry Property
RS : Reed-Solomon
RV : Residual Variance
SA : Simulated Annealing
SCA : Sparse Component Analysis
SDCT : Sorted DCT
SDFT : Sorted DFT
SDR : Sparse Dictionary Representation
SER : Symbol Error Rate
SI : Shift Invariant
SL0 : Smoothed ℓ0-norm
SNR : Signal-to-Noise Ratio
ULA : Uniform Linear Array
UWB : Ultra Wide Band
WIMAX : Worldwide Inter-operability for Microwave Access
WLAN : Wireless Local Area Network