Sparse Multipath Channel Estimation and Decoding for Broadband Vector OFDM Systems

Abstract

Vector orthogonal frequency division multiplexing (V-OFDM) is a general system that builds a bridge between OFDM and single-carrier frequency domain equalization in terms of intersymbol interference and receiver complexity. In this paper, we investigate sparse multipath channel estimation and decoding for broadband V-OFDM systems. Unlike non-sparse channel estimation, sparse channel estimation only needs to recover the nonzero taps, which reduces complexity. We consider pilot signals transmitted through a sparse channel that has only a few nonzero taps, first without and then with additive white Gaussian noise. The exactly and approximately sparse inverse fast Fourier transforms (SIFFT) can be employed for these two cases, respectively. The SIFFT-based algorithm directly recovers the nonzero channel coefficients and their corresponding coordinates, which is significant to the proposed partial intersection sphere (PIS) decoding approach. Unlike maximum likelihood (ML) decoding, which enumerates the symbol constellation and estimates the transmitted symbols with the minimum distance, PIS decoding first generates the set of possible transmitted symbols and then chooses the transmitted symbols with the minimum distance only from this set. The diversity order of PIS decoding is determined not only by the number of nonzero taps but also by their coordinates, and the bit error rate (BER) is influenced by the vector block size to some extent but is roughly independent of the maximum time delay. Simulation results indicate that, with an appropriate sphere radius, the BER performance of PIS decoding outperforms conventional zero-forcing decoding and minimum mean square error decoding, and approaches that of ML decoding as the signal-to-noise ratio increases, while reducing the computational complexity significantly.



Qi Feng, Xiang-Gen Xia, Fellow, IEEE, Zhihui Ye, and Naitong Zhang


Index Terms: Vector orthogonal frequency division multiplexing, sparse multipath channel, sparse inverse fast Fourier transform, partial intersection sphere decoding, diversity order.

This work was supported by the China Scholarship Council (Grant No. 201406190083), in part by the Satellite Communication and Navigation Collaborative Innovation Center (Grant No. SatCN-201408), in part by the Program B for Outstanding PhD Candidate of Nanjing University (Grant No. 201501B013), and in part by the Special Research Foundation of Marine Public Service Sector (Grant No. 201205035).

Q. Feng and N. Zhang are with the School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, P. R. China (e-mail: [email protected], [email protected]).

X.-G. Xia is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA (e-mail: [email protected]).

Z. Ye is with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, P. R. China. She is also with the School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, P. R. China (e-mail: [email protected]).

I. INTRODUCTION

Orthogonal frequency division multiplexing (OFDM) has been widely adopted in both cellular systems, such as Long-Term Evolution (LTE), and Wi-Fi systems [1], [2]. The main advantage of OFDM modulation is that it converts an intersymbol interference (ISI) channel into multiple ISI-free subchannels and thus reduces the demodulation complexity at the receiver [3]. However, since each symbol is only transmitted over a parallel flat fading subchannel, the conventional OFDM technique may not collect multipath diversity and thus performs worse than single-carrier transmission [4]. Furthermore, OFDM has a high peak-to-average power ratio (PAPR) of the transmitted signals, which may affect its applications in broadband wireless communications. Single-carrier frequency domain equalization (SC-FDE) is an alternative approach to deal with ISI with low transmission PAPR [5]. However, because both the fast Fourier transform (FFT) and inverse FFT (IFFT) operations are performed at the receiver, SC-FDE suffers from the drawback that the transmitter and receiver have unbalanced complexities [6].
As a result, OFDM is more suitable for the downlink with high transmission speed, whereas SC-FDE can be applied to the uplink, where it reduces the PAPR and the transmitter complexity, as in LTE [2].

Vector OFDM (V-OFDM) for single transmit antenna systems, first proposed in [7], converts an ISI channel to multiple vector subchannels, where the vector size is a flexible, pre-designed parameter. For each vector subchannel, the information symbols of a vector may be in ISI with one another. Since the vector size is flexible, when it is 1, V-OFDM coincides with conventional OFDM. When the vector size is 2, each vector subchannel may have two information symbols in ISI. When the vector size is large enough, say, the same as the IFFT size, the maximal number of information symbols are in ISI and the system is then equivalent to SC-FDE. Therefore, V-OFDM naturally builds a bridge between OFDM and SC-FDE in terms of both ISI level and receiver complexity [7], [8], and it has attracted recent interest. An adaptive vector channel allocation scheme was proposed for V-OFDM systems in [9]. Some key techniques, such as carrier/sampling-frequency synchronization and guard-band configuration in V-OFDM systems, were designed and compared with those of conventional OFDM systems [10]. Iterative demodulation and decoding under the turbo principle is an efficient approach for the V-OFDM receiver [11]. Constellation-rotated V-OFDM was proposed with improved multipath diversity [12]–[14]. Linear receivers and the corresponding diversity order analyses were recently given in [8], and the influence of phase noise is investigated in [15].

For a very broadband channel, the IFFT size of an OFDM system needs to be very large, which may cause practical implementation problems, such as high PAPR and high complexity. In contrast, for a V-OFDM system, the IFFT size can be fixed and independent of the bandwidth, while the vector size can be increased to accommodate the increased bandwidth.
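To make the role of the vector size concrete, the following is a minimal sketch of the transmitter-side blocking and component-wise IFFT, assuming $N$ symbols are split into $L = N/M$ vector blocks of size $M$ and a normalized $L$-point IFFT is applied across blocks for each of the $M$ components. The function name `vofdm_modulate` is hypothetical and chosen only for illustration; cyclic prefix insertion is omitted.

```python
import cmath


def vofdm_modulate(symbols, M):
    """Block N symbols into L = N / M vectors of size M, then apply a
    normalized L-point IFFT component-wise (hypothetical helper, no CP)."""
    N = len(symbols)
    if M <= 0 or N % M != 0:
        raise ValueError("N must be a positive multiple of the vector size M")
    L = N // M
    # l-th vector block: [X_{lM}, X_{lM+1}, ..., X_{lM+M-1}]
    blocks = [symbols[l * M:(l + 1) * M] for l in range(L)]
    out = []
    for k in range(L):
        for m in range(M):
            # m-th component of x_k = (1/sqrt(L)) * sum_l X_l * exp(j*2*pi*k*l/L)
            acc = sum(blocks[l][m] * cmath.exp(2j * cmath.pi * k * l / L)
                      for l in range(L))
            out.append(acc / L ** 0.5)
    # Parallel-to-serial conversion: x_0, x_1, ..., x_{L-1} concatenated
    return out
```

With `M = 1` the function reduces to an ordinary normalized $N$-point IFFT (conventional OFDM), and with `M = N` it returns the input unchanged (single-carrier transmission); intermediate values of `M` keep the IFFT size `L` small while the block size grows.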
In this paper, we are interested in V-OFDM over a broadband sparse channel, in the sense that it has a large time delay spread but only a few nonzero taps [16]–[18]. For sparse channels, there have been many studies in the literature; see, for example, [19]–[26]. Recently, sparse FFT (SFFT) theory was proposed by the Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology [27]. If a signal has a small number $k$ of nonzero Fourier coefficients, the output of the Fourier transform can be represented succinctly using only $k$ coefficients. For such signals, the runtime is sublinear in the signal size $n$ rather than $O(n \log n)$. Furthermore, several new algorithms for SFFT are presented in [28], i.e., an $O(k \log n)$-time algorithm for the exactly $k$-sparse case and an $O(k \log n \log(n/k))$-time algorithm for the general case. In this paper, a sparse channel estimation and decoding scheme for V-OFDM systems is proposed. Inspired by the idea of SFFT for signals with only a few nonzero Fourier coefficients, we first use pilot symbols to obtain the channel frequency response (CFR), and then estimate the channel impulse response (CIR) by using the sparse IFFT (SIFFT). Based on the estimated nonzero channel coefficients and their corresponding coordinates, an efficient partial intersection sphere (PIS) decoding is investigated, and it achieves the same diversity order as maximum likelihood (ML) decoding. The main contributions of the paper are summarized as follows.

• We find a connection between the exactly and approximately sparse channel models in the estimation of a V-OFDM sparse multipath channel. For a multipath channel with only a few nonzero taps, if there is no noise during the transmission, then the sparse channel can be estimated by the exactly sparse multipath channel algorithm that corresponds to Algorithm 3.1 for the exactly sparse FFT in [28]. When there is additive white Gaussian noise (AWGN) during the transmission, the sparse multipath channel can be estimated by the approximately sparse multipath channel algorithm that corresponds to Algorithms 4.1 −

• By using the SIFFT-based algorithms, one can directly recover the nonzero channel coefficients and their corresponding coordinates, which is significant to the PIS decoding process.

• For the PIS decoding in V-OFDM systems, the bit error rate (BER) depends on the $K$ nonzero taps in a sparse channel and on the vector size $M$ to some extent, but is roughly independent of the maximum delay $D$.

• For any given small sphere radius, the proposed PIS decoding and ML decoding are of the same diversity order, which is equal to the cardinality of the set of remainder coordinates after mod $M$, but the PIS decoding can substantially reduce the computational complexity with probability .

The remainder of the paper is organized as follows. In Section II, the system model of V-OFDM is reviewed. In Section III, SIFFT-based channel estimation schemes for the exactly sparse case and the approximately sparse case are introduced. In Section IV, a PIS decoding for V-OFDM systems is proposed and analyzed. In Section V, simulation results are presented and discussed. In Section VI, this paper is concluded.

Fig. 1. Block diagram of vector OFDM modulation system. (Blocks: mapping, blocking, component-wise $L$-point IFFT, P/S conversion, cyclic prefix insertion, multipath channel $\mathbf{h}$ with AWGN, cyclic prefix removal, blocking, component-wise $L$-point FFT, channel estimation, decoding, demapping.)

II. SYSTEM MODEL

We first briefly recall a V-OFDM system for a single transmit antenna, which is shown in Fig. 1. The description of the system model below follows the notation in [8].

A. Vector OFDM Modulation

In V-OFDM systems, $N$ symbols $\mathbf{X} = \{X_n\}_{n=0}^{N-1}$ are blocked into $L$ vectors (called vector blocks (VBs)) of size $M$. Denote the $l$th transmitted VB in $\mathbf{X}$ as

$$\mathbf{X}_l = [X_{lM}, X_{lM+1}, \ldots, X_{lM+M-1}]^{\mathrm{T}}, \quad l = 0, 1, \ldots, L-1 \tag{1}$$

where $(\cdot)^{\mathrm{T}}$ denotes the transpose. Assume the average power is normalized, i.e., $E\{|X_n|^2\} = 1$, $n = 0, 1, \ldots, N-1$, where $E\{\cdot\}$ denotes the mathematical expectation.

Accordingly, $\mathbf{x}_k$ is defined as the normalized VB-based IFFT of size $L$, i.e.,

$$\mathbf{x}_k = \frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} \mathbf{X}_l \, e^{j\frac{2\pi}{L}kl}, \quad k = 0, 1, \ldots, L-1. \tag{2}$$

Here, $\mathbf{x}_k$ is a column vector of size $M$ represented as $[x_{kM}, x_{kM+1}, \ldots, x_{kM+M-1}]^{\mathrm{T}}$. After parallel-to-serial (P/S) conversion, the transmitted signal sequence $\mathbf{x} = \{x_n\}_{n=0}^{N-1}$ is obtained as $[\mathbf{x}_0^{\mathrm{T}}, \mathbf{x}_1^{\mathrm{T}}, \ldots, \mathbf{x}_{L-1}^{\mathrm{T}}]^{\mathrm{T}}$. In order to avoid interblock interference (IBI), the length of the cyclic prefix (CP), denoted by $\Gamma$, should not be shorter than the maximum time delay of the multipath channel. Note that $\Gamma$ does not need to be divisible by $M$. At the transmitter, the signal sequence $\mathbf{x}$ with the CP inserted is transmitted serially through the channel in the order $[x_{N-\Gamma}, x_{N-\Gamma+1}, \ldots, x_{N-1}, x_0, x_1, \ldots, x_{N-1}]^{\mathrm{T}}$.

At the receiver, the received sequence is modeled as the transmitted signal passed through a frequency selective fading channel with complex AWGN. The inverse of the transmitter process is performed to recover the original symbols. After removing the CP, the received sequence $\mathbf{y} = \{y_n\}_{n=0}^{N-1}$ is equal to the circular convolution of the transmitted signal and the CIR, plus AWGN:

$$y_n = x_n \circledast h_n + \xi_n, \quad n = 0, 1, \ldots, N-1 \tag{3}$$

where $\circledast$ denotes the circular convolution, the CIR is $\mathbf{h} = \{h_d\}_{d=0}^{D}$, and $D$ denotes the maximum time delay spread of the multipath channel. After zero padding of $\mathbf{h}$, the CFR $\mathbf{H} = \{H_n\}_{n=0}^{N-1}$ is the $N$-point FFT (without normalization) of $\mathbf{h}$. Assume the additive noise $\boldsymbol{\xi}$ is an independent and identically distributed (i.i.d.) random sequence whose entries satisfy $\xi_n \sim \mathcal{CN}(0, \sigma^2)$. Accordingly, define the transmitted signal-to-noise ratio (SNR) as $\rho = 1/\sigma^2$.

The sequence $\mathbf{y}$ is then blocked into $L$ column vectors of size $M$. Denote the $k$th vector in $\mathbf{y}$ as

$$\mathbf{y}_k = [y_{kM}, y_{kM+1}, \ldots, y_{kM+M-1}]^{\mathrm{T}}, \quad k = 0, 1, \ldots, L-1. \tag{4}$$

Take the normalized component-wise vector FFT operation of size $L$ as

$$\mathbf{Y}_l = \frac{1}{\sqrt{L}} \sum_{k=0}^{L-1} \mathbf{y}_k \, e^{-j\frac{2\pi}{L}kl}, \quad l = 0, 1, \ldots, L-1. \tag{5}$$

The $l$th received VB in $\mathbf{Y}$ is also a column vector of size $M$ represented as $\mathbf{Y}_l = [Y_{lM}, Y_{lM+1}, \ldots, Y_{lM+M-1}]^{\mathrm{T}}$. It is derived in [7] that the relationship between the $l$th transmitted VB $\mathbf{X}_l$ and the received VB $\mathbf{Y}_l$ is

$$\mathbf{Y}_l = \mathbf{H}_l \mathbf{X}_l + \boldsymbol{\Xi}_l, \quad l = 0, 1, \ldots, L-1 \tag{6}$$

where $\mathbf{H}_l = \mathbf{H}(z)\big|_{z = e^{j\frac{2\pi}{L}l}}$ is a blocked channel matrix of the original ISI channel $H(z)$, with

$$\mathbf{H}(z) = \begin{bmatrix} \widetilde{H}_0(z) & z^{-1}\widetilde{H}_{M-1}(z) & \cdots & z^{-1}\widetilde{H}_1(z) \\ \widetilde{H}_1(z) & \widetilde{H}_0(z) & \cdots & z^{-1}\widetilde{H}_2(z) \\ \vdots & \vdots & \ddots & \vdots \\ \widetilde{H}_{M-2}(z) & \widetilde{H}_{M-3}(z) & \cdots & z^{-1}\widetilde{H}_{M-1}(z) \\ \widetilde{H}_{M-1}(z) & \widetilde{H}_{M-2}(z) & \cdots & \widetilde{H}_0(z) \end{bmatrix} \tag{7}$$

where $\widetilde{H}_m(z) = \sum_k h_{kM+m} z^{-k}$ is the $m$th polyphase component of $H(z)$, $m = 0, 1, \ldots, M-1$. The additive noise $\boldsymbol{\Xi}_l$ in (6) is the blocked version of $\boldsymbol{\Xi}$, whose entries have the same power spectral density as those of $\boldsymbol{\xi}$ and are i.i.d. complex Gaussian random variables.

Note that $\mathbf{H}_l$ can be diagonalized as

$$\mathbf{H}_l = \mathbf{U}_l^{\mathrm{H}} \, \overline{\mathbf{H}}_l \, \mathbf{U}_l \tag{8}$$

where $\mathbf{U}_l$ is a unitary matrix whose entries are $[\mathbf{U}_l]_{r,c} = \frac{1}{\sqrt{M}} e^{-j\frac{2\pi}{N}(l + rL)c}$, $r, c = 0, 1, \ldots, M-1$, $(\cdot)^{\mathrm{H}}$ denotes the conjugate transpose, and $\overline{\mathbf{H}}_l$ is an $M \times M$ diagonal matrix defined as

$$\overline{\mathbf{H}}_l = \mathrm{diag}\{H_l, H_{l+L}, \ldots, H_{l+(M-1)L}\}. \tag{9}$$

Furthermore, the unitary matrix $\mathbf{U}_l$ can be rewritten as

$$\mathbf{U}_l = \mathbf{F}_M \boldsymbol{\Lambda}_l \tag{10}$$

where $\mathbf{F}_M$ denotes the $M \times M$ normalized discrete Fourier transform (DFT) matrix whose entries are $[\mathbf{F}_M]_{r,c} = \frac{1}{\sqrt{M}} e^{-j\frac{2\pi}{M}rc}$, $r, c = 0, 1, \ldots, M-1$, and $\boldsymbol{\Lambda}_l$ is a diagonal matrix defined as

$$\boldsymbol{\Lambda}_l = \mathrm{diag}\left\{1, e^{-j\frac{2\pi}{N}l}, \ldots, e^{-j\frac{2\pi}{N}(M-1)l}\right\}. \tag{11}$$

It can be seen from (6) that the original ISI channel $H(z)$, in which $D + 1$ symbols interfere with one another, is converted to $L$ vector subchannels, each of which may have $M$ symbols interfering with one another. Note that $M$ is the vector size and can be flexibly designed. When $M = 1$, (6) reduces to the original OFDM, i.e., no ISI occurs in each subchannel. When $M = N$, all $D + 1$ symbols interfere with one another and the system reduces to SC-FDE.

B. Pilot Pattern

Now, we rewrite the input-output relationship in (6) to better expose the structure of the channel transmission:

$$\mathbf{U}_l \mathbf{Y}_l = \overline{\mathbf{H}}_l \mathbf{U}_l \mathbf{X}_l + \mathbf{U}_l \boldsymbol{\Xi}_l, \quad l = 0, 1, \ldots, L-1. \tag{12}$$

It is straightforward to show that, after the unitary transformation, the $l$th VB $\mathbf{X}_l$ is transmitted in parallel over the subchannels $H_l, H_{l+L}, \ldots, H_{l+(M-1)L}$. Mathematically, $\mathbf{U}_l$ is a kind of rotation matrix, so $\overline{\mathbf{H}}_l$ can be viewed as the equivalent channel Fourier coefficients.

Denote by $P$ the number of pilot channels and assume $L$ is divisible by $P$. If the $l_p$th VB $\mathbf{X}_{l_p} = [X_{l_pM}, X_{l_pM+1}, \ldots, X_{l_pM+M-1}]^{\mathrm{T}}$ is allocated to transmit pilot symbols, then the pilot symbols are transmitted in parallel over the equivalent channels $H_{l_p}, H_{l_p+L}, \ldots, H_{l_p+(M-1)L}$. Furthermore, if the $P$ pilot channels are evenly distributed over the $L$ subchannels, i.e.,

$$l_p = \frac{pL}{P}, \quad p = 0, 1, \ldots, P-1 \tag{13}$$

then the equivalent channels allocated to transmit pilot symbols are arranged as

$$\mathcal{H}_P = \left[H_{l_0}, H_{l_1}, \ldots, H_{l_{P-1}}, H_{l_0+L}, H_{l_1+L}, \ldots, H_{l_{P-1}+L}, \ldots, H_{l_0+(M-1)L}, H_{l_1+(M-1)L}, \ldots, H_{l_{P-1}+(M-1)L}\right]^{\mathrm{T}} = \left[H_0, H_{\frac{L}{P}}, \ldots, H_{N-\frac{L}{P}}\right]^{\mathrm{T}}. \tag{14}$$

Therefore, the pilot symbols are also evenly distributed over the equivalent channels. Fig. 2 shows the pilot pattern for V-OFDM systems with parameters $L = 8$, $M = 2$, $P = 2$. Due to the even distribution, $\mathcal{H}_P$ can be regarded as a downsampled version of the CFR $\mathbf{H}$. In most cases $D < MP$, so we usually only need to perform an IFFT of size $MP$ rather than $N$ to exactly estimate the CIR $\mathbf{h}$.

III. SPARSE IFFT FOR CHANNEL ESTIMATION

As the communication bandwidth increases, the number of equivalent channels $N$ needs to be increased, and a signal sequence can be transmitted over more parallel channels simultaneously. Accordingly, either the number of VBs $L$ or the VB size $M$ increases proportionally. More parallel channels mean a higher transmission rate, which, however, increases the computational complexity of both channel estimation and decoding. In this section, an SIFFT-based approach is proposed for sparse multipath channel estimation that can directly obtain the nonzero channel coefficients and their corresponding coordinates.

Fig. 2. Pilot pattern for vector OFDM system with $L = 8$, $M = 2$, $P = 2$.

Denote $\widetilde{\mathbf{Y}}_l = \mathbf{U}_l \mathbf{Y}_l$, $\widetilde{\mathbf{X}}_l = \mathbf{U}_l \mathbf{X}_l$, and $\widetilde{\boldsymbol{\Xi}}_l = \mathbf{U}_l \boldsymbol{\Xi}_l$. Then (12) is further rewritten as

$$\widetilde{\mathbf{Y}}_l = \overline{\mathbf{H}}_l \widetilde{\mathbf{X}}_l + \widetilde{\boldsymbol{\Xi}}_l, \quad l = 0, 1, \ldots, L-1. \tag{15}$$

Note that after the unitary transformation, the entries of $\widetilde{\boldsymbol{\Xi}}_l$ are also i.i.d. complex Gaussian random variables. Define $\mathcal{H}_l$ as the column vector $[H_l, H_{l+L}, \ldots, H_{l+(M-1)L}]^{\mathrm{T}}$ and let $\widehat{\mathcal{H}}_l$ be the estimator of $\mathcal{H}_l$. It is convenient to estimate $\mathcal{H}_l$ by the least squares approach such that $\widehat{\mathcal{H}}_l = \left[\mathrm{diag}\{\widetilde{\mathbf{X}}_l\}\right]^{-1} \widetilde{\mathbf{Y}}_l$, $l = 0, 1, \ldots, L-1$.

Consider pilot channels that are evenly distributed over the $L$ channels as in (13). Denote by $\widehat{\mathcal{H}}_P$ and $\widehat{\mathbf{h}}$ the estimators of $\mathcal{H}_P$ and $\mathbf{h}$, respectively. According to (14), $\mathcal{H}_P$ is composed of $\mathcal{H}_{l_p}$, $p = 0, 1, \ldots, P-1$, and can be estimated by $\widehat{\mathcal{H}}_{l_p} = \left[\mathrm{diag}\{\widetilde{\mathbf{X}}_{l_p}\}\right]^{-1} \widetilde{\mathbf{Y}}_{l_p}$, $p = 0, 1, \ldots, P-1$.

With the knowledge of the column vector $\mathcal{H}_P$, we can further obtain the column vector $\mathbf{h}$ by implementing the IFFT operation without normalization. Since the additive noise $\widetilde{\boldsymbol{\Xi}}_{l_p}$ is an $M \times 1$ i.i.d. random sequence whose entries satisfy $[\widetilde{\boldsymbol{\Xi}}_{l_p}]_m \sim \mathcal{CN}(0, \sigma^2)$, $m = 0, 1, \ldots, M-1$, it is not difficult to check that $\widehat{\mathcal{H}}_{l_p}$ is an unbiased estimator of $\mathcal{H}_{l_p}$, i.e., $E\{\widehat{\mathcal{H}}_{l_p}\} = \mathcal{H}_{l_p}$, $p = 0, 1, \ldots, P-1$. Besides, the mean squared error (MSE) of $\widehat{\mathcal{H}}_{l_p}$, denoted by $\boldsymbol{\Sigma}_{l_p}$, can be derived as

$$\boldsymbol{\Sigma}_{l_p} = E\left[\left(\widehat{\mathcal{H}}_{l_p} - \mathcal{H}_{l_p}\right)\left(\widehat{\mathcal{H}}_{l_p} - \mathcal{H}_{l_p}\right)^{\mathrm{H}}\right] = \sigma^2 \left[\mathrm{diag}\left\{\left|\widetilde{\mathbf{X}}_{l_p}\right|^2\right\}\right]^{-1}. \tag{16}$$

Since the pilot signals are transmitted through a multipath channel with AWGN, our goal is to design the pilot symbols to minimize the MSE of the estimator $\widehat{\mathbf{h}}$, i.e.,

$$\min_{\mathbf{X}_{l_p} \in \mathcal{X}^M,\; p = 0, 1, \ldots, P-1} \mathrm{tr}\left\{E\left[\left(\widehat{\mathbf{h}} - \mathbf{h}\right)\left(\widehat{\mathbf{h}} - \mathbf{h}\right)^{\mathrm{H}}\right]\right\} \tag{17}$$

where $\mathcal{X}$ denotes the constellation of the transmitted symbol $X_n$, $n = 0, 1, \ldots, N-1$, and $\mathrm{tr}\{\cdot\}$ is the trace of a square matrix, defined as the sum of its main diagonal entries. Obviously, $\widehat{\mathbf{h}}$ is also an unbiased estimator of $\mathbf{h}$ due to the linear transformation

$$E\{\widehat{\mathbf{h}}\} = \frac{E\left\{\mathbf{F}^{-1}_{MP} \widehat{\mathcal{H}}_P\right\}}{\sqrt{MP}} = \frac{\mathbf{F}^{-1}_{MP} E\left\{\widehat{\mathcal{H}}_P\right\}}{\sqrt{MP}} = \frac{\mathbf{F}^{-1}_{MP} \mathcal{H}_P}{\sqrt{MP}} = \mathbf{h} \tag{18}$$

where $\mathbf{F}^{-1}_{MP}$ denotes the $MP \times MP$ normalized inverse DFT (IDFT) matrix whose entries are $[\mathbf{F}^{-1}_{MP}]_{r,c} = \frac{1}{\sqrt{MP}} e^{j\frac{2\pi}{MP}rc}$, $r, c = 0, 1, \ldots, MP-1$.

Now, we check the MSE of the estimator $\widehat{\mathbf{h}}$. Since $\widehat{\mathbf{h}} - \mathbf{h} = \mathbf{F}^{-1}_{MP}(\widehat{\mathcal{H}}_P - \mathcal{H}_P)/\sqrt{MP}$ and $\mathbf{F}_{MP}$ is unitary,

$$E\left[\left(\widehat{\mathbf{h}} - \mathbf{h}\right)\left(\widehat{\mathbf{h}} - \mathbf{h}\right)^{\mathrm{H}}\right] = \frac{\mathbf{F}^{-1}_{MP} \boldsymbol{\Sigma} \mathbf{F}_{MP}}{MP} \tag{19}$$

where $\boldsymbol{\Sigma}$ is a diagonal matrix whose diagonal entries are $[\boldsymbol{\Sigma}]_{p+mP} = [\boldsymbol{\Sigma}_{l_p}]_m$, $m = 0, 1, \ldots, M-1$, $p = 0, 1, \ldots, P-1$. The diagonal entries in (19) are all the same; hence the entries of $(\widehat{\mathbf{h}} - \mathbf{h})$ are identically distributed complex Gaussian random variables, but they may not be independent of each other.
The trace of (19) can also be simplified as

$$\mathrm{tr}\left\{E\left[\left(\widehat{\mathbf{h}} - \mathbf{h}\right)\left(\widehat{\mathbf{h}} - \mathbf{h}\right)^{\mathrm{H}}\right]\right\} = \frac{\mathrm{tr}\{\boldsymbol{\Sigma}\}}{MP}. \tag{20}$$

Note that (20) depends only on the sum of the MSEs of the individual pilot channels. Therefore, (17) can be optimized by designing the pilot symbols for each channel separately, such that $\min_{\mathbf{X}_{l_p} \in \mathcal{X}^M} \frac{\mathrm{tr}\{\boldsymbol{\Sigma}_{l_p}\}}{MP}$, $p = 0, 1, \ldots, P-1$.

Consider the case where the total number of transmitted signals $N = 1024$ is fixed and the $P$ pilot channels are evenly distributed over the $L$ channels with $l_p = 16p$. Assume the transmitted sequence $\mathbf{X} = \{X_n\}_{n=0}^{N-1}$ consists of binary phase-shift keying (BPSK) signals in the V-OFDM system. For the $l_p$th pilot channel, the normalized expectation of the MSE over $M$ symbols, defined as $\sigma_{l_p} = \frac{\mathrm{tr}\{\boldsymbol{\Sigma}_{l_p}\}}{M\sigma^2}$, should be minimized. According to Parseval's theorem, $\sigma_{l_p} \geq 1$. Note that such a pilot symbol design is not unique. Table I lists one type of pilot symbol design for different $L$ and $M$. It is shown that such design can keep $\sigma_{l_p}$ within ∼ , and it does not increase with $M$. In Section III-B, we will find that with such a pilot symbol design, the SNR and the power ratio of the dominant entries to the remaining entries in $\widehat{\mathbf{h}}$ are of the same order of magnitude, regardless of the parameters $M$ and $P$, which is in accordance with the approximately sparse channel model.

For the conventional OFDM signal, the MSEs of the CFR estimator for all subchannels are the same. For V-OFDM, however, the entries in $\widetilde{\mathbf{X}}_l$ may no longer have constant modulus, since the unitary transformation $\mathbf{U}_l$ is applied to the original pilot symbols, and $\boldsymbol{\Sigma}_{l_p}$ varies from subchannel to subchannel. Similar to the channel spectral nulls in OFDM systems, if symbol spectral nulls exist in $\widetilde{\mathbf{X}}_l$, the overall estimation error in the $l$th subchannel may be very large.
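The pilot design criterion above can be checked numerically. The following minimal sketch evaluates the normalized MSE $\sigma_{l_p} = \frac{1}{M}\sum_m 1/|[\widetilde{\mathbf{X}}_{l_p}]_m|^2$ for a candidate pilot vector, using the entries of $\mathbf{U}_l$ given in (8). The function name `normalized_pilot_mse` and the null threshold `1e-12` are assumptions made for illustration, and the example pilot vectors are not taken from Table I.

```python
import cmath


def normalized_pilot_mse(pilot, l, L):
    """Return tr{Sigma_l} / (M * sigma^2) for the pilot vector sent on VB l.

    Uses [U_l]_{r,c} = exp(-j*2*pi*(l + r*L)*c / N) / sqrt(M) with N = L * M.
    Returns infinity when the rotated pilot has a spectral null.
    """
    M = len(pilot)
    N = L * M
    total = 0.0
    for r in range(M):
        # r-th entry of the rotated pilot vector U_l X_l
        rotated = sum(pilot[c] * cmath.exp(-2j * cmath.pi * (l + r * L) * c / N)
                      for c in range(M)) / M ** 0.5
        power = abs(rotated) ** 2
        if power < 1e-12:  # assumed threshold for declaring a spectral null
            return float("inf")
        total += 1.0 / power
    return total / M
```

For instance, with `L = 256` and `l = 0`, the BPSK vector `[1, 1, 1, -1]` has a flat rotated spectrum and attains the lower bound of 1, whereas `[1, 1, 1, 1]` produces spectral nulls and an unbounded error, which illustrates why the pilot symbols need to be designed per subchannel.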
For a sparse channel whose pilot symbols are not well designed, the noise component may be comparable to the dominant component in the estimator $\widehat{\mathbf{h}}$, and consequently $\widehat{\mathbf{h}}$ may be obtained as a non-sparse channel.

For a very broadband channel, $PM$ may become very large.

TABLE I
A TYPE OF PILOT SYMBOLS DESIGN FOR BPSK MODULATED VECTOR OFDM SYSTEM
(a) $L = 256$, $M = 4$, $N = 1024$, $P = 16$; (b) $L = 128$, $M = 8$, $N = 1024$, $P = 8$; (c) $L = 64$, $M = 16$, $N = 1024$, $P = 4$. Columns in each panel: $p$, $\mathbf{X}_{l_p}$, $\sigma_{l_p}$.

In this case, it becomes expensive to implement the large-size IFFT directly. Inspired by the idea of SFFT proposed in [28], for a sparse multipath channel, we can perform the SIFFT to estimate the CIR from the CFR. In what follows, we focus on such sparse multipath channel estimation without and with AWGN in the transmission, respectively.
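As a baseline for the sparse methods that follow, the direct estimate described in Section II-B can be sketched as below: the $MP$ evenly spaced CFR samples in $\mathcal{H}_P$ are transformed by a full IDFT of size $Q = MP$, which recovers the CIR exactly in the noiseless case provided $D < MP$. The function names `cfr_samples` and `cir_from_pilots` are hypothetical, and a plain $O(Q^2)$ IDFT is used for clarity instead of an FFT.

```python
import cmath


def cfr_samples(h, N, Q):
    """Evenly spaced samples H_{q*N/Q}, q = 0..Q-1, of the unnormalized
    N-point DFT of the zero-padded CIR h (hypothetical helper)."""
    if N % Q != 0:
        raise ValueError("Q must divide N")
    step = N // Q  # spacing L/P between pilot-bearing equivalent channels
    return [sum(hd * cmath.exp(-2j * cmath.pi * (q * step) * d / N)
                for d, hd in enumerate(h))
            for q in range(Q)]


def cir_from_pilots(H_P):
    """Estimate the first Q taps of the CIR from Q evenly spaced CFR samples.

    The 1/Q scaling equals the normalized IDFT (1/sqrt(Q)) followed by the
    extra 1/sqrt(Q) factor in the estimator of h.
    """
    Q = len(H_P)
    return [sum(H_P[q] * cmath.exp(2j * cmath.pi * q * d / Q)
                for q in range(Q)) / Q
            for d in range(Q)]
```

For example, with `N = 64`, `Q = 16`, and a channel whose only nonzero taps sit at delays 0, 5, and 11, `cir_from_pilots(cfr_samples(h, 64, 16))` returns the original taps up to rounding error. The cost of this baseline grows with $MP$ regardless of how few taps are nonzero, which is the motivation for the SIFFT.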

A. Exactly Sparse Multipath Channel

We first consider a simple noiseless sparse multipath channel, i.e., $\boldsymbol{\xi} = \mathbf{0}$, which is called an exactly sparse multipath channel. The pilot signals are transmitted through a sparse channel with only $K$ nonzero taps and without additive noise. If the IFFT is performed to estimate the CIR from the CFR in this case, the estimator $\widehat{\mathbf{h}} = \frac{\mathbf{F}^{-1}_{MP} \widehat{\mathcal{H}}_P}{\sqrt{MP}}$ is also a column vector of size $MP$ with only $K$ nonzero entries.

The SIFFT-based channel estimation for the exactly sparse multipath channel is illustrated in Algorithm 1. The input $\widehat{\mathcal{H}}_P$ is first permuted by $P_{\sigma,0,b}$ and then multiplied by the flat window filter $G_{B,\alpha,\varepsilon}$. After the permutation and filtering, the nonzero channel taps can be sampled at the interval $n/B$. Substituting the permutation operator $P_{\sigma,1,b}$ for $P_{\sigma,0,b}$ and repeating the above process, the nonzero coordinates can be recovered from the phase difference between these two permutations, and their corresponding values are obtained from the permutation $P_{\sigma,0,b}$. After repeating K times, one can eventually recover the exactly $K$-sparse $\widehat{\mathbf{h}}$. The algorithm includes three functions:

• EXACTLYSPARSEIFFT: Iterate COORDINATEVALUE and update $\widehat{\mathbf{h}}$, repeat K times and eventually find the exactly $K$-sparse $\widehat{\mathbf{h}}$.

• COORDINATEVALUE: Call HASHTOBINS and obtain the nonzero coordinates and their corresponding values. It can find more than half of the nonzero entries in $\widehat{\mathbf{h}}$ each time.

• HASHTOBINS: Permute $\widehat{\mathcal{H}}_P$ so that the nonzero entries in $\widehat{\mathbf{h}}$ are separated into different bins, and then compute a $B$-dimensional IFFT in $O(B \log B)$ time, where $B$ denotes the number of bins and is set proportional to $K$.

Similar to [28], the proposed algorithm has two fundamental steps, i.e., permutation and flat window filtering. The purpose of the permutation is to separate the nonzero coefficients into different bins randomly. The design of the filter is a tradeoff between the filter flatness and the support of the window. Whereas the exactly sparse algorithm presented in [28] recovers a sparse signal with only a few nonzero Fourier coefficients, Algorithm 1 aims to estimate the CIR from the CFR by using the pilot symbols. The main differences between them are listed as follows.

• Instead of the permutation operator presented in [28], in this subsection we redefine the permutation operator $P_{\sigma,a,b}$ as

$$P_{\sigma,a,b}(\mathbf{X})_k = X_{\pi_{\sigma,a}(k)} \, e^{j\frac{2\pi}{n}bk} \tag{21}$$

where $\mathbf{X} = \{X_k\}_{k=0}^{n-1}$ is a discrete sequence in the frequency domain with size $n$, and $\pi_{\sigma,a}(k) = (\sigma k - a) \bmod n$. Denote by $p_{\sigma,a,b}(\mathbf{x})$ the IDFT of $P_{\sigma,a,b}(\mathbf{X})$. It is not hard to derive that

$$p_{\sigma,a,b}(\mathbf{x})_{\pi_{\sigma,b}(k)} = x_k \, e^{j\frac{2\pi}{n}ak} \tag{22}$$

where $\pi_{\sigma,b}(k) = (\sigma k - b) \bmod n$, and $\mathbf{x}$ is the IDFT of $\mathbf{X}$. Compared with the definition of the permutation in [28], when computing the coordinates of the nonzero channel coefficients in Algorithm 1, the proposed permutation can be inverted directly without the required dictionary.

• Unlike the exactly sparse algorithm in [28], which is only suitable for integers, we further extend the application to the complex domain, since any nonzero tap in a sparse channel is a complex number. More specifically, denote by the resolution $\delta$ the minimum magnitude at which nonzero entries can be detected. If $\delta$ is set less than or equal to the minimum magnitude of the nonzero channel taps, then all nonzero channel coefficients can be recovered with high probability. For the flat window with Gaussian filtering, the sample sequence should be collected with length at least $O\left(\frac{B}{\alpha} \log \frac{MP}{\varepsilon}\right)$, where ε = δ n ∆ , so the required sample sequence length may increase if the sparse channel has small nonzero taps. Therefore, the proposed algorithm is suitable for complex channels at the expense of a potentially higher complexity.

The exactly sparse case in [28] is the case in which a signal has only a few nonzero Fourier coefficients. Accordingly, Algorithm 1 is suitable for the case in which the estimator $\widehat{\mathbf{h}}$ has only a few nonzero taps. Therefore, it is straightforward to employ Algorithm 1 to recover the $K$ nonzero entries in $\widehat{\mathbf{h}}$.

It has been proved in [28] that the complexity of the exactly sparse algorithm is $O(K \log MP)$. Furthermore, if the length of the symbol sequence $MP$ is sufficiently large such that $MP \geq O\left(\frac{B}{\alpha} \log \frac{MP}{\varepsilon}\right)$ is satisfied, Algorithm 1 can recover the correct coordinates and their corresponding values with high probability.

B. Approximately Sparse Multipath Channel

Now, we consider a more practical scenario that the pilotsignals are transmitted through a sparse channel with only K nonzero taps spread and AWGN, which is called an approx-imately sparse multipath channel. Since AWGN is inducedduring the transmission, the estimator (cid:98) h is no longer with only K nonzero entries. In fact, the estimator (cid:98) h = F − MP (cid:99) H P √ MP has K dominant entries and the rest entries are small, when the SNRis not low. For the approximately sparse vector (cid:98) h , deﬁne theparameter η as the maximum expectation power ratio of the K selected entries to the rest entries such that η = max |J | = K E (cid:40) (cid:13)(cid:13)(cid:98) h J (cid:13)(cid:13) (cid:13)(cid:13)(cid:98) h − (cid:98) h J (cid:13)(cid:13) (cid:41) (23)where (cid:107) · (cid:107) denotes the (cid:96) norm of a vector. η reﬂects howapproximately the sparse multipath channel is and determinesthe root-mean-square error (RMSE) of SIFFT algorithm. Inparticular, exactly sparse is an extreme case for η → ∞ .The SIFFT-based channel estimation for approximatelysparse multipath channel is shown in Algorithm 2 that hasthe following basic idea. To deal with noise, the algorithmestimates the nonzero coordinates and their correspondingvalues separately. For the coordinate estimation, all the co-ordinates are ﬁrst divided into small regions. The input (cid:99) H P is permutated randomly by P σ,a,b and P σ,a + τ,b , respectively,then multiplied by ﬂat window ﬁltering G B,α,ε . The phasedifference between these two permutations determines thecircular distance to each region. Select the appropriate regionswith the nearest circular distance and get one vote. Afterrepeating the above process T R times, choose the ﬁnal regionswith more than T R votes. By narrowing the regions of nonzerocoordinates in each iteration, the algorithm eventually obtains Algorithm 1

Exactly Sparse Multipath Channel

Input: (cid:99) H P , K, M, P Output: (cid:98) h function E XACTLY S PARSE

IFFT( (cid:99) H P , K, M, P ) initialization: (cid:98) h ← , n ← M P for t ← , , . . . , log k do k ← K t , α ∝ t (cid:98) h ← (cid:98) h + C OORDINATE V ALUE ( (cid:99) H P , (cid:98) h , k, n, α ) end for (cid:98) h ← arg max |J | = K (cid:13)(cid:13)(cid:98) h J (cid:13)(cid:13) return (cid:98) h end functionfunction C OORDINATE V ALUE ( (cid:99) H P , (cid:98) h , k, n, α ) B ∝ kε ← δ n ∆ , for ∆ (cid:62) max (cid:12)(cid:12)(cid:98) h (cid:12)(cid:12) , δ (cid:54) min (cid:12)(cid:12)(cid:98) h (cid:12)(cid:12) Choose σ randomly from { , , . . . , n − } Choose b randomly from { , , . . . , n − } w ← H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ, ,b , B, α, ε ) w (cid:48) ← H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ, ,b , B, α, ε ) initialization: (cid:98) h ← J = (cid:8) j (cid:12)(cid:12) | w j | (cid:62) δ (cid:9) for all j ∈ J do i ← round (cid:0) n π ∠ w (cid:48) j w j (cid:1) mod n (cid:98) h i ← w j end forreturn (cid:98) h end functionfunction H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ,a,b , B, α, ε ) U ← G B,α,ε P σ,a,b ( (cid:99) H P ) for i ← , , . . . , B − do V i ← (cid:80) j U i + Bj end for v ← F − ( V ) for j ← , , . . . , B − do w j ← v j − (cid:2) g B,α,ε ∗ p σ,a,b (cid:0)(cid:98) h (cid:1)(cid:3) nB j end forreturn w end function the nonzero coordinates. For the value estimation, after thepermutation and ﬁltering, the nonzero values corresponding tothe coordinates estimated before are obtained by permutation P σ,a,b . Repeating T V times and choose the median as the esti-mations of the values such that the estimation error decreasesexponentially with T V . Repeat the above process T A timesand ultimately recover (cid:98) h with K dominant taps. The algorithmincludes ﬁve functions, in which H ASH T O B INS is deﬁned thesame as in Algorithm 1. • A PPROXIMATELY S PARSE

IFFT: Iterate COORDINATE and VALUE, then update $\hat{\mathbf{h}}$. In each iteration the sparsity level $k$ of the remaining problem is reduced; repeat $T_A$ times and eventually find $\hat{\mathbf{h}}$ with $K$ dominant entries.
• COORDINATE: Call RANGE to narrow the range of dominant coordinates; repeat $T_C$ times until the dominant
Algorithm 2

Approximately Sparse Multipath Channel

Input: (cid:99) H P , K, M, P Output: (cid:98) h function A PPROXIMATELY S PARSE

IFFT( (cid:99) H P , K, M, P ) initialization: (cid:98) h ← , n ← M P, ε ← n T A ∝ log K log log K for t ← , , . . . , T A − do α ∝ t +1) , B ∝ K ( t +1) , k ∝ K (cid:81) i =1 , ,...,t i T V ∝ log( Bkα ) L ← C OORDINATE ( (cid:99) H P , (cid:98) h , n, B, α, ε ) (cid:98) h ← (cid:98) h + V ALUE ( (cid:99) H P , (cid:98) h , k, n, B, ε, L , T V ) end for (cid:98) h ← arg max |J | = K (cid:13)(cid:13)(cid:98) h J (cid:13)(cid:13) return (cid:98) h end functionfunction V ALUE ( (cid:99) H P , (cid:98) h , k, n, B, ε, L , T V ) for t ← , , . . . , T V − do Choose σ randomly from { , , . . . , n − } Choose a, b randomly from { , , . . . , n − } w ( t ) ← H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ,a,b , B, ε, α ) end forinitialization: (cid:98) h ← for all (cid:96) ∈ L do (cid:98) h (cid:96) ← median t ∈{ , ,...,T V } (cid:110) w ( t ) (cid:126) σ,b ( i ) e − j πn σa(cid:96) (cid:111) end for (cid:98) h ← arg max |J | = k (cid:13)(cid:13)(cid:98) h J (cid:13)(cid:13) return (cid:98) h end functionfunction H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ,a,b , B, α, ε ) U ← G B,α,ε P σ,a,b ( (cid:99) H P ) for i ← , , . . . , B − do V i ← (cid:80) j U i + Bj end for v ← F − ( V ) for j ← , , . . . , B − do w j ← v j − (cid:2) g B,α,ε ∗ p σ,a,b (cid:0)(cid:98) h (cid:1)(cid:3) nB j end forreturn w end function coordinates are uniquely determined. • R ANGE : Permute (cid:99) H P randomly with T R times, divide allthe coordinates into several regions, ﬁnd the appropriateregions with the nearest circular distance and then getsone vote. After repeating T R times, choose the ﬁnalregions with more than T R votes. • V ALUE : Access to H

ASH T O B INS and obtain the estima-tions of the values, repeat T V times and take the medianof such values with real and imaginary parts, respectively.Similar to [28], the permutation operator P σ,a,b in thissubsection is deﬁned as P σ,a,b ( X ) k = X π σ,a ( k ) e j πn σbk (24) function C OORDINATE ( (cid:99) H P , (cid:98) h , n, B, α, ε ) initialization: (cid:96) i ← nB i for i ∈ { , , . . . , B − } Choose σ randomly from { , , . . . , n − } Choose b randomly from { , , . . . , n − } λ ← nB , J ← log n, T C ← (cid:6) log J λ (cid:7) , T R ← (cid:6) log log n (cid:7) for t ← , , . . . , T C − do (cid:96) ← R ANGE ( (cid:99) H P , (cid:98) h , n, B, σ, b, α, ε, (cid:96) , λ (cid:0) J (cid:1) t , J, T R ) end for L ← π − σ,b ( (cid:96) ) return L end functionfunction R ANGE ( (cid:99) H P , (cid:98) h , n, B, σ, b, α, ε, (cid:96) , λ, J, T R ) initialization: µ i,j ← for i ∈ { , , . . . , B − } , j ∈{ , , . . . , J − } ν ∝ α for t ← , , . . . , T R − do Choose a randomly from { , , . . . , n − } Choose random variable τ evenly distributed from (cid:8)(cid:6) nJν λ (cid:7) , (cid:6) nJν λ (cid:7) + 1 , . . . , (cid:4) nJν λ (cid:5)(cid:9) w ← H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ,a,b , B, α, ε ) w (cid:48) ← H ASH T O B INS ( (cid:99) H P , (cid:98) h , n, P σ,a + τ,b , B, α, ε ) for i ← , , . . . , B − dofor j ← , , . . . , J − do θ i,j ← πn (cid:0) (cid:96) i + j +12 J λ + σb (cid:1) mod n if min (cid:8) ± (cid:0) τ θ i,j − ∠ w (cid:48) i w i (cid:1) mod 2 π (cid:9) (cid:54) πν then µ i,j ← µ i,j + 1 end ifend forend forend forfor i ← , , . . . , B − do J ← (cid:8) j (cid:12)(cid:12) µ i,j > T R (cid:9) if J (cid:54) = Ø then (cid:96) i ← min j ∈J (cid:8) (cid:96) i + jJ λ (cid:9) else (cid:96) i ← ∅ end ifend forreturn (cid:96) end function where π σ,a ( k ) = σ ( k − a ) mod n . 
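As a small numerical sketch (not part of the paper's algorithms), the permutation in (24), read as $P_{\sigma,a,b}(\mathbf{X})_k = X_{\sigma(k-a) \bmod n}\, e^{j\frac{2\pi}{n}\sigma b k}$, can be checked against its time-domain effect: the inverse DFT of the permuted vector places $x_k e^{j\frac{2\pi}{n}\sigma a k}$ at index $\sigma(k-b) \bmod n$. The function names, the test vector, and the inverse-DFT convention $x_k = \frac{1}{n}\sum_m X_m e^{+j 2\pi mk/n}$ are assumptions made for illustration; $\sigma$ is taken odd so that it is invertible modulo a power-of-two $n$.

```python
import cmath

def idft(X):
    """Naive inverse DFT: x_k = (1/n) * sum_m X_m * exp(+j*2*pi*m*k/n)."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def permute(X, sigma, a, b):
    """Frequency-domain permutation: X_{sigma*(k-a) mod n} * exp(j*2*pi*sigma*b*k/n)."""
    n = len(X)
    return [X[(sigma * (k - a)) % n] * cmath.exp(2j * cmath.pi * sigma * b * k / n)
            for k in range(n)]

def max_identity_error(X, sigma, a, b):
    """Largest deviation from p(x)_{sigma*(k-b) mod n} = x_k * exp(j*2*pi*sigma*a*k/n)."""
    n = len(X)
    x = idft(X)
    p = idft(permute(X, sigma, a, b))
    return max(
        abs(p[(sigma * (k - b)) % n] - x[k] * cmath.exp(2j * cmath.pi * sigma * a * k / n))
        for k in range(n)
    )

# Arbitrary (hypothetical) test vector of length n = 16 and an odd sigma.
X = [complex(k % 3, -k) for k in range(16)]
err = max_identity_error(X, sigma=5, a=3, b=7)
print(err)  # expected to be at floating-point rounding level
```

The check only exercises the index and phase bookkeeping; it does not include the flat window filter or the hashing into bins.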
Accordingly, the IDFT of $P_{\sigma,a,b}$ is derived as

$$p_{\sigma,a,b}(\mathbf{x})_{\pi_{\sigma,b}(k)} = x_k\, e^{j\frac{2\pi}{n}\sigma a k} \tag{25}$$

where $\pi_{\sigma,b}(k) = \sigma(k-b) \bmod n$.

As shown by the proof in Appendix A, Algorithm 2 is suitable when the estimator $\hat{\mathbf{h}}$ has a few dominant taps, and thus it can be applied to recover the $K$ dominant taps in $\hat{\mathbf{h}}$.

For a sparse multipath channel with AWGN, the complexity of the approximately sparse algorithm is $O\!\left(K \log MP \,\log\frac{MP}{K}\right)$, which is higher than in the exactly sparse case but still much lower than that of the IFFT with $O(MP \log MP)$ operations. If the condition $MP \geq O\!\left(\frac{B}{\alpha}\log\frac{MP}{\varepsilon}\right)$ holds, Algorithm 2 can estimate the coordinates and their corresponding values with low RMSE. In Section V, we present simulation results showing that the RMSE of channel estimation is influenced by $\eta$ and ultimately determined by both $\rho$ and the design of the pilot symbols.

As a result, the SIFFT-based channel estimation algorithm not only reduces the computational complexity but also returns the nonzero channel coefficients and their corresponding coordinates directly, which is important for the PIS decoding process that follows.

IV. PARTIAL INTERSECTION SPHERE DECODING

In V-OFDM systems, the performance of several common decoding approaches was analyzed in [7], [8], [11]–[14]. It was proved in [12]–[14] that the diversity order of the ML decoding is $\min\{M, D+1\}$. In [8], it was shown that the diversity order of the minimum mean square error (MMSE) decoding can achieve $\min\{\lfloor M 2^{-R} \rfloor, D\} + 1$, where $R$ represents the spectrum efficiency in bits/symbol, while for the zero-forcing (ZF) decoding the diversity order is 1. The ML, ZF and MMSE demodulations all need $\mathbf{H} = \{H_n\}_{n=0}^{N-1}$, which is computed by the $N$-point FFT of the zero-padded $\mathbf{h}$. If $N$ is very large, the computational load is high. For a sparse channel, most entries in $\mathbf{H}_l$ are zero and the nonzero entries are regularly placed. Based on this observation, it may be better to extract the nonzero entries of each row and search all possible symbol sequences lying in a sphere of radius $r$ around the received signal. The complexity of this search is exponential in the number of nonzero entries in each row of $\mathbf{H}_l$, which is much smaller than $M$ when $M$ is large, as in the case studied in this paper. In this section, a partial intersection sphere (PIS) decoding algorithm is proposed for a sparse multipath channel. Here, partial intersection means the intersection of the existing and the current nonzero coordinate sets. In each iteration, the algorithm only needs to compare the current symbol sequences with the existing ones on the coordinates belonging to the partial intersection.

The proposed PIS decoding algorithm is given as Algorithm 3 and explained in detail below. Assume the sparse channel $\mathbf{h}$ has only $K$ nonzero taps with the maximum delay $D$. Denote $\mathcal{J}$ as the set of coordinates of the nonzero taps of the sparse multipath channel $\mathbf{h}$, i.e.,

$$\mathcal{J} = \big\{ j \,\big|\, j \in \{0, 1, \ldots, D\},\ h_j \neq 0 \big\} \tag{26}$$

and the cardinality of $\mathcal{J}$ is equal to $K$, i.e., $|\mathcal{J}| = K$. Considering the special structure of $\mathbf{H}_l$, denote $\mathcal{I}$ as the set of the remainders of the nonzero channel coefficient coordinates modulo $M$, i.e.,

$$\mathcal{I} = \big\{ i \,\big|\, \forall j \in \mathcal{J},\ i = j \bmod M \big\} \tag{27}$$

Denote $\kappa$ as the cardinality of $\mathcal{I}$, i.e., $\kappa = |\mathcal{I}|$. Suppose $i_0, i_1, \ldots, i_{\kappa-1}$ are the $\kappa$ entries of $\mathcal{I}$ in ascending order, $0 \leq i_0 < i_1 < \cdots < i_{\kappa-1} \leq M-1$. For the case in this paper, we have $\kappa \leq K \ll M$. In what follows, it will be found that the diversity order of the PIS decoding is related only to $\kappa$.

In the $m$th iteration with $0 \leq m \leq M-1$, denote $\mathcal{U}^{(m)}$ and $\mathcal{V}^{(m)}$ as the set of the existing coordinates of the nonzero entries in the first $m-1$ rows and the set of the current coordinates of the nonzero entries in the $m$th row of $\mathbf{H}_l$, respectively. $\mathcal{W}^{(m)}$ (called the partial intersection) is defined as the intersection of $\mathcal{U}^{(m)}$ and $\mathcal{V}^{(m)}$, i.e., $\mathcal{W}^{(m)} = \mathcal{U}^{(m)} \cap \mathcal{V}^{(m)}$. Note that $\mathbf{H}_l$ comes from the pseudo-circulant matrix (7), in which the number of nonzero entries in each row is equal to $\kappa$. Recall that $\mathcal{X}$ is the constellation of the transmitted symbol $X_n$. For the initialization, the set of the existing coordinates of the nonzero entries $\mathcal{U}^{(0)}$ and the set of entire symbol sequences $\mathcal{X}^{(0)}$ are both empty, i.e., $\mathcal{U}^{(0)} = \emptyset$, $\mathcal{X}^{(0)} = \emptyset$. The updating process of the PIS decoding in the $m$th iteration, $0 \leq m \leq M-1$, is as follows.

1) Extract the $\kappa$ entries in the $m$th row and the $(m-i_0) \bmod M$th, $(m-i_1) \bmod M$th, $\ldots$, $(m-i_{\kappa-1}) \bmod M$th columns of $\mathbf{H}_l$, and form the $1 \times \kappa$ vector $\mathbf{H}_l^{(m)} = \big[ [\mathbf{H}_l]_{m,(m-i_0) \bmod M}, [\mathbf{H}_l]_{m,(m-i_1) \bmod M}, \ldots, [\mathbf{H}_l]_{m,(m-i_{\kappa-1}) \bmod M} \big]$.

2) Search all possible symbol sequences $\mathbf{S} = [S_0, S_1, \ldots, S_{\kappa-1}]^T$, $\mathbf{S} \in \mathcal{X}^{\kappa}$, that lie in the sphere of radius $r$ around the received signal $Y_l^{(m)}$, and generate the set of symbol sequences $\mathcal{S}^{(m)}$ as

$$\mathcal{S}^{(m)} = \Big\{ \mathbf{S} \,\Big|\, \mathbf{S} \in \mathcal{X}^{\kappa},\ \big| Y_l^{(m)} - \mathbf{H}_l^{(m)} \mathbf{S} \big| \leq r \Big\} \tag{28}$$

where $Y_l^{(m)}$ is the $m$th entry of the column vector $\mathbf{Y}_l$.

3) For each $\mathbf{S} \in \mathcal{S}^{(m)}$, construct the injective mapping of coordinates $f: k \to (m-i_k) \bmod M$, $k \in \{0, 1, \ldots, \kappa-1\}$. For each symbol sequence $\mathbf{X}^{(m)} \in \mathcal{X}^{(m)}$, where $\mathcal{X}^{(m)}$ is the set of entire symbol sequences generated in the previous iteration, compare the current symbol sequence $\mathbf{S}$ with the existing symbol sequence $\mathbf{X}^{(m)}$ on the coordinates belonging to the partial intersection $\mathcal{W}^{(m)}$. If $X_w^{(m)} = S_{f^{-1}(w)}$ holds for all $w \in \mathcal{W}^{(m)}$, where $X_w^{(m)}$ stands for the $w$th entry of $\mathbf{X}^{(m)}$, then $\mathbf{X}^{(m)}$ is put into the set $\mathcal{X}_{\mathbf{S}}^{(m)}$, which can be expressed as $\mathcal{X}_{\mathbf{S}}^{(m)} = \big\{ \mathbf{X}^{(m)} \,\big|\, \mathbf{X}^{(m)} \in \mathcal{X}^{(m)},\ \forall w \in \mathcal{W}^{(m)},\ X_w^{(m)} \equiv S_{f^{-1}(w)} \big\}$. Then insert the symbols whose coordinates belong to the complement of the partial intersection into each symbol sequence: for all $v \in \complement_{\mathcal{V}^{(m)}} \mathcal{W}^{(m)}$, where $\complement_{\mathcal{V}^{(m)}} \mathcal{W}^{(m)}$ stands for the complement of $\mathcal{W}^{(m)}$ in $\mathcal{V}^{(m)}$, set $X_v^{(m+1)} = S_{f^{-1}(v)}$ and insert $X_v^{(m+1)}$ into each symbol sequence $\mathbf{X}^{(m)}$ in $\mathcal{X}_{\mathbf{S}}^{(m)}$ to generate $\mathbf{X}^{(m+1)}$. The new set of symbol sequences is thus $\mathcal{X}_{\mathbf{S}}^{(m+1)} = \big\{ \mathbf{X}^{(m+1)} \,\big|\, \mathbf{X}^{(m)} \in \mathcal{X}_{\mathbf{S}}^{(m)},\ \forall u \in \mathcal{U}^{(m)},\ X_u^{(m+1)} = X_u^{(m)};\ \forall v \in \complement_{\mathcal{V}^{(m)}} \mathcal{W}^{(m)},\ X_v^{(m+1)} = S_{f^{-1}(v)} \big\}$.

4) Repeat Step 3 for all $\mathbf{S} \in \mathcal{S}^{(m)}$. The set of entire symbol sequences $\mathcal{X}^{(m+1)}$ is then obtained as the union of all $\mathcal{X}_{\mathbf{S}}^{(m+1)}$, i.e., $\mathcal{X}^{(m+1)} = \bigcup_{\mathbf{S} \in \mathcal{S}^{(m)}} \mathcal{X}_{\mathbf{S}}^{(m+1)}$. $\mathcal{U}^{(m+1)}$ is updated to $\mathcal{U}^{(m)} \cup \mathcal{V}^{(m)}$ as the existing coordinates of nonzero entries for the next iteration.

Algorithm 3 Partial Intersection Sphere Decoding

Input: $\mathbf{Y}, \mathbf{h}, D, L, M, r$
Output: $\hat{\mathbf{X}}$
function SPARSEPIS($\mathbf{Y}, \mathbf{h}, D, L, M, r$)
  initialization: $\mathcal{U}^{(0)} \leftarrow \emptyset$, $\mathcal{X}^{(0)} \leftarrow \emptyset$
  Calculate $\mathbf{H}_l$, $l \in \{0, 1, \ldots, L-1\}$ according to (7)
  $\mathcal{J} \leftarrow \{ j \mid j \in \{0, 1, \ldots, D\},\ h_j \neq 0 \}$
  $\mathcal{I} \leftarrow \{ i \mid \forall j \in \mathcal{J},\ i \leftarrow j \bmod M \}$, $\kappa \leftarrow |\mathcal{I}|$
  $i_0, i_1, \ldots, i_{\kappa-1}$ are the $\kappa$ entries of $\mathcal{I}$ in ascending order, $0 \leq i_0 < i_1 < \cdots < i_{\kappa-1} \leq M-1$
  for $l \leftarrow 0, 1, \ldots, L-1$ do
    for $m \leftarrow 0, 1, \ldots, M-1$ do
      $Y_l^{(m)}$ is the $m$th entry of the column vector $\mathbf{Y}_l$
      $\mathbf{H}_l^{(m)}$ is the $1 \times \kappa$ vector formed from the $m$th row and the $(m-i_0) \bmod M$th, $(m-i_1) \bmod M$th, $\ldots$, $(m-i_{\kappa-1}) \bmod M$th columns of $\mathbf{H}_l$
      $\mathcal{S}^{(m)} \leftarrow \{ \mathbf{S} \mid \mathbf{S} \in \mathcal{X}^{\kappa},\ | Y_l^{(m)} - \mathbf{H}_l^{(m)} \mathbf{S} | \leq r \}$
      $\mathcal{V}^{(m)} \leftarrow \{ \iota^{(m)} \mid \forall i \in \mathcal{I},\ \iota^{(m)} \leftarrow (m-i) \bmod M \}$
      $\mathcal{W}^{(m)} \leftarrow \mathcal{U}^{(m)} \cap \mathcal{V}^{(m)}$
      $f: k \to (m-i_k) \bmod M$, $k \in \{0, 1, \ldots, \kappa-1\}$
      for all $\mathbf{S} \in \mathcal{S}^{(m)}$ do
        $\mathcal{X}_{\mathbf{S}}^{(m)} \leftarrow \{ \mathbf{X}^{(m)} \mid \mathbf{X}^{(m)} \in \mathcal{X}^{(m)},\ \forall w \in \mathcal{W}^{(m)},\ X_w^{(m)} = S_{f^{-1}(w)} \}$
        $\mathcal{X}_{\mathbf{S}}^{(m+1)} \leftarrow \{ \mathbf{X}^{(m+1)} \mid \mathbf{X}^{(m)} \in \mathcal{X}_{\mathbf{S}}^{(m)},\ \forall u \in \mathcal{U}^{(m)},\ X_u^{(m+1)} \leftarrow X_u^{(m)};\ \forall v \in \complement_{\mathcal{V}^{(m)}} \mathcal{W}^{(m)},\ X_v^{(m+1)} \leftarrow S_{f^{-1}(v)} \}$
      end for
      $\mathcal{X}^{(m+1)} \leftarrow \bigcup_{\mathbf{S} \in \mathcal{S}^{(m)}} \mathcal{X}_{\mathbf{S}}^{(m+1)}$
      $\mathcal{U}^{(m+1)} \leftarrow \mathcal{U}^{(m)} \cup \mathcal{V}^{(m)}$
    end for
    $\hat{\mathbf{X}}_l \leftarrow \arg\min_{\mathbf{X}^{(M)} \in \mathcal{X}^{(M)}} \| \mathbf{Y}_l - \mathbf{H}_l \mathbf{X}^{(M)} \|_2$
  end for
  return $\hat{\mathbf{X}}$
end function

After $M$ iterations, the set of possible VB sequences $\mathcal{X}^{(M)}$ is obtained, and the symbol sequence $\mathbf{X}^{(M)} \in \mathcal{X}^{(M)}$ with the minimum $\ell_2$ distance $\| \mathbf{Y}_l - \mathbf{H}_l \mathbf{X}^{(M)} \|_2$ is chosen as the estimate of the transmitted VB $\mathbf{X}_l$.

For a V-OFDM system with the PIS decoding, assume the CIR $\mathbf{h}$ and the average power of the complex AWGN $\sigma^2$ are known at the receiver.
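The decoding steps above can be sketched for a single vector block as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `residues` and `pis_decode` are hypothetical names, the matrix `H` is any $M \times M$ matrix whose row $m$ is nonzero only at columns $(m - i_k) \bmod M$ (the exact pseudo-circulant form of (7) is not reproduced), and the search starts from one empty partial sequence so that the first row can seed the candidate set.

```python
import itertools

def residues(tap_coords, M):
    """Sorted set I of nonzero-tap coordinates modulo M, as in (27); kappa = len(result)."""
    return sorted({j % M for j in tap_coords})

def pis_decode(Y, H, offsets, constellation, r):
    """Sketch of PIS decoding for one vector block; returns a list of M symbols or None."""
    M = len(Y)
    partial = [{}]  # assumption: one empty partial sequence seeds the first iteration
    for m in range(M):
        cols = [(m - i) % M for i in offsets]   # current coordinates V^(m) via f
        row = [H[m][c] for c in cols]           # the 1 x kappa vector H_l^(m)
        # Sphere search of (28): all kappa-tuples within radius r of Y[m].
        sphere = [S for S in itertools.product(constellation, repeat=len(cols))
                  if abs(Y[m] - sum(h * s for h, s in zip(row, S))) <= r]
        extended = []
        for S in sphere:
            new_syms = dict(zip(cols, S))
            for X in partial:
                # Keep X only if it agrees with S on the partial intersection W^(m).
                if all(X[c] == s for c, s in new_syms.items() if c in X):
                    extended.append({**X, **new_syms})
        partial = extended
    if not partial:
        return None  # radius too small: no candidate survived all M rows

    def dist(X):
        return sum(abs(Y[m] - sum(H[m][c] * X[c] for c in range(M))) ** 2
                   for m in range(M))

    best = min(partial, key=dist)
    return [best[c] for c in range(M)]

# Hypothetical example: M = 4, taps at delays 0 and 5, BPSK symbols.
M = 4
offsets = residues([0, 5], M)                   # remainders modulo 4
taps = {0: 1 + 0.5j, 1: 0.3 - 0.2j}             # illustrative gains per remainder
H = [[0j] * M for _ in range(M)]
for m in range(M):
    for i in offsets:
        H[m][(m - i) % M] = taps[i]
X_true = [1, -1, -1, 1]
Y = [sum(H[m][c] * X_true[c] for c in range(M)) for m in range(M)]  # noiseless
X_hat = pis_decode(Y, H, offsets, (1, -1), r=0.1)
print(X_hat)
```

With a very large radius every consistent sequence survives, so the final minimum-distance step coincides with an exhaustive ML search; with a small radius only a few candidates are carried from row to row, which is where the complexity saving comes from.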
Denote $\mathbf{S}^{(m)\dagger}$ as the correct symbol sequence corresponding to the transmitted symbols, i.e., $\mathbf{S}^{(m)\dagger}$ is extracted from the $(m-i_0) \bmod M$th, $(m-i_1) \bmod M$th, $\ldots$, $(m-i_{\kappa-1}) \bmod M$th entries of $\mathbf{X}_l$ and is given by $\big[ [\mathbf{X}_l]_{(m-i_0) \bmod M}, [\mathbf{X}_l]_{(m-i_1) \bmod M}, \ldots, [\mathbf{X}_l]_{(m-i_{\kappa-1}) \bmod M} \big]$. Hence, the distance between $Y_l^{(m)}$ and $\mathbf{H}_l^{(m)} \mathbf{S}^{(m)\dagger}$, i.e., $\big| Y_l^{(m)} - \mathbf{H}_l^{(m)} \mathbf{S}^{(m)\dagger} \big|$, is Rayleigh distributed with mean $\frac{\sqrt{\pi}}{2}\sigma$ and variance $\frac{4-\pi}{4}\sigma^2$. According to the cumulative distribution function of the Rayleigh distribution, the probability that the transmitted symbol sequence $\mathbf{S}^{(m)\dagger}$ lies in the sphere of radius $r$ in the $m$th iteration is

$$\Pr\Big\{ \big| Y_l^{(m)} - \mathbf{H}_l^{(m)} \mathbf{S}^{(m)\dagger} \big| \leq r \Big\} = 1 - e^{-\frac{r^2}{\sigma^2}} \tag{29}$$

From the previous analysis, the additive noise $\mathbf{\Xi}_l$ in (6) is an $M \times 1$ vector whose entries are i.i.d. complex Gaussian random variables. Hence, for $m = 0, 1, \ldots, M-1$, the events that the transmitted symbol sequence $\mathbf{S}^{(m)\dagger}$ lies in the sphere are independent and have the same probability. After $M$ iterations, the event $\mathbf{X}_l \in \mathcal{X}^{(M)}$ is equivalent to the occurrence of all $M$ events $\big| Y_l^{(m)} - \mathbf{H}_l^{(m)} \mathbf{S}^{(m)\dagger} \big| \leq r$, $m = 0, 1, \ldots, M-1$, so we have

$$\Pr\big\{ \mathbf{X}_l \in \mathcal{X}^{(M)} \big\} = \Big( 1 - e^{-\frac{r^2}{\sigma^2}} \Big)^{M} \tag{30}$$

Note that $\mathcal{X}^{(M)} \subseteq \mathcal{X}^{M}$ is the set of possible VB sequences whose corresponding $\kappa$-dimensional symbol sequences lie in the sphere such that (28) holds for all $M$ rows in (6). In fact, the choice of the sphere radius $r$ is a tradeoff between the symbol error rate (SER) performance and the computational complexity. As $r$ increases, $\Pr\{ \mathbf{X}_l \in \mathcal{X}^{(M)} \}$ also increases, which improves the SER performance. However, more possible symbol sequences then need to be compared in each iteration.
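To make the tradeoff concrete, the containment probability in (29) and (30) can be evaluated directly. The sketch below is illustrative only; the function names and the numerical values of $\sigma^2$, $M$ and the target probability are assumptions chosen for the example, and `radius_for_probability` simply inverts (30).

```python
import math

def containment_probability(r, sigma2, M):
    """(30): probability that the transmitted VB survives all M iterations."""
    per_row = 1.0 - math.exp(-r * r / sigma2)   # (29), the Rayleigh CDF at radius r
    return per_row ** M

def radius_for_probability(p, sigma2, M):
    """Invert (30): the radius giving containment probability p, with 0 < p < 1."""
    return math.sqrt(-sigma2 * math.log(1.0 - p ** (1.0 / M)))

# Hypothetical operating point: noise power 0.1, vector block size 8.
sigma2, M = 0.1, 8
r99 = radius_for_probability(0.99, sigma2, M)
print(r99, containment_probability(r99, sigma2, M))
```

For a fixed radius the probability decays as the $M$th power of the per-row value, which is why a larger vector block size needs a larger radius to keep the transmitted block inside the candidate set.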
The SER of the proposed PIS decoding, $P_{\mathrm{PIS}}(r)$, is a function of the sphere radius $r$ and is calculated by the law of total probability

$$P_{\mathrm{PIS}}(r) = \Pr\big\{ \hat{\mathbf{X}}_l \neq \mathbf{X}_l \,\big|\, \mathbf{X}_l \in \mathcal{X}^{(M)} \big\} \Pr\big\{ \mathbf{X}_l \in \mathcal{X}^{(M)} \big\} + \Pr\big\{ \hat{\mathbf{X}}_l \neq \mathbf{X}_l \,\big|\, \mathbf{X}_l \notin \mathcal{X}^{(M)} \big\} \Pr\big\{ \mathbf{X}_l \notin \mathcal{X}^{(M)} \big\} \tag{31}$$

Note that for $\mathbf{X}_l \notin \mathcal{X}^{(M)}$, we have $\Pr\{ \hat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{X}_l \notin \mathcal{X}^{(M)} \} = 1$, since $\hat{\mathbf{X}}_l$ is chosen from $\mathcal{X}^{(M)}$. Denote $P_{\mathcal{X}^{(M)}}$ as the probability that a symbol error occurs conditioned on $\mathbf{X}_l \in \mathcal{X}^{(M)}$, i.e.,

$$P_{\mathcal{X}^{(M)}} = \Pr\big\{ \hat{\mathbf{X}}_l \neq \mathbf{X}_l \,\big|\, \mathbf{X}_l \in \mathcal{X}^{(M)} \big\} \tag{32}$$

Substituting (30) and (32) into (31), $P_{\mathrm{PIS}}(r)$ can be further simplified as

$$P_{\mathrm{PIS}}(r) = \Big( 1 - e^{-\frac{r^2}{\sigma^2}} \Big)^{M} P_{\mathcal{X}^{(M)}} + 1 - \Big( 1 - e^{-\frac{r^2}{\sigma^2}} \Big)^{M} \tag{33}$$

For a V-OFDM system, the signal vector $\mathbf{X}_l$ needs to be specifically rotated/transformed to achieve full diversity for the ML decoding, as done in [12]. In Section II-B, it was shown that if $P$ subchannels with the indices $0, \frac{L}{P}, \ldots, L - \frac{L}{P}$ are allocated to transmit pilot symbols, the IFFT/SIFFT-based channel estimation can be applied to recover the CIR $\mathbf{h}$. For the remaining subchannels allocated to transmit data symbols, it is proved in Appendix B that the diversity order of the ML decoding for a sparse multipath channel is $\kappa$. We describe the diversity order by the exponential equality $P_{\mathrm{ML}} \doteq \rho^{-\kappa}$, which is mathematically defined as $\lim_{\rho \to \infty} \frac{\ln P_{\mathrm{ML}}}{\ln \rho} = -\kappa$ [29], where the SER of the ML decoding $P_{\mathrm{ML}}$ can be found in [7], [12], [14].

Instead of enumerating the symbol constellation and estimating the transmitted symbols with the minimum distance as the ML decoding does, the PIS decoding first generates the set of possible transmitted symbols $\mathcal{X}^{(M)}$ and then chooses the transmitted symbols with the minimum distance only from $\mathcal{X}^{(M)}$. It is proved in Appendix C that $P_{\mathcal{X}^{(M)}} \leq P_{\mathrm{ML}}$. Accordingly, for sufficiently large $\rho$, $P_{\mathcal{X}^{(M)}}$ is exponentially less than or equal to $\rho^{-\kappa}$, which can be expressed as $P_{\mathcal{X}^{(M)}} \mathrel{\dot{\leq}} \rho^{-\kappa}$, i.e., $\lim_{\rho \to \infty} \frac{\ln P_{\mathcal{X}^{(M)}}}{\ln \rho} \leq -\kappa$. Therefore, from (33), $P_{\mathrm{PIS}}(r)$ is exponentially less than or equal to $\rho^{-\kappa}$ if and only if $1 - \big( 1 - e^{-r^2/\sigma^2} \big)^{M}$ is exponentially less than or equal to $\rho^{-\kappa}$, i.e.,

$$P_{\mathrm{PIS}}(r) \mathrel{\dot{\leq}} \rho^{-\kappa} \iff 1 - \Big( 1 - e^{-\frac{r^2}{\sigma^2}} \Big)^{M} \mathrel{\dot{\leq}} \rho^{-\kappa} \tag{34}$$

We say the sphere radius $r$ is asymptotically greater than or equal to the sphere radius $r'$, denoted by $r \succeq r'$, when

$$\lim_{\rho \to \infty} \frac{\ln P_{\mathrm{PIS}}(r)}{\ln \rho} \leq \lim_{\rho \to \infty} \frac{\ln P_{\mathrm{PIS}}(r')}{\ln \rho} \tag{35}$$

For a sufficiently large $\rho = \frac{1}{\sigma^2}$, the infinitesimal $1 - \big( 1 - e^{-r^2/\sigma^2} \big)^{M}$ approximates $M e^{-r^2/\sigma^2}$. Substituting (34) into (35) and supposing $\rho$ is sufficiently large, we have $r \geq \sigma \sqrt{\ln M - \kappa \ln \sigma^2}$. Furthermore, for a sufficiently large $\rho$, the term $\ln M$ can be neglected compared with $-\kappa \ln \sigma^2$, so the necessary and sufficient condition for $P_{\mathrm{PIS}}(r) \mathrel{\dot{\leq}} \rho^{-\kappa}$ is

$$r \succeq \sqrt{\frac{\kappa}{\rho} \ln \rho} \tag{36}$$

It is well known that the SER of the proposed PIS decoding cannot be better than that of the ML decoding. According to (36), we have the following lemma, which gives the criterion on the sphere radius for $P_{\mathrm{PIS}} \doteq \rho^{-\kappa}$.

Lemma 1.

The SER of the proposed PIS decoding is expo-nentially equal to that of the ML decoding in the choice ofsphere radius r (cid:60) (cid:113) κρ ln ρ for a sufﬁciently large ρ . For the V-OFDM system, it is known that the complexitieswith respect to complex multiplication operation of MMSEdecoding and ML decoding are O ( LM log M + LM R ) and O ( LM RM ) , respectively. The PIS decoding only needs M Rκ trials with κ complex multiplication operation in eachtrial. Hence, the complexity with respect to complex mul-tiplication operation of the PIS decoding is O ( κLM Rκ ) .Besides, the evaluation and comparison operations shouldbe taken into account in PIS decoding, which in fact, mayvary from O ( κLM ) to O ( κLM RM ) and are related to thecardinality of X ( m ) and ultimately dependent of sphere radius r . As illustrated in Algorithm 3, the evaluation operation isan operator used for assignment where the source S f − ( v ) is a complex number and the destination X ( m ) v is the v thentry in the symbol sequence X ( v ) , i.e., X ( m ) v = S f − ( v ) ,while the comparison operation is one of relational operatorused to check the equality of two complex numbers X ( m ) w and S f − ( w ) , i.e., X ( m ) w ≡ S f − ( w ) , if the equality holdsreturn , otherwise return . In assembly language, an eval-uation operation or a comparison operation usually executes instruction cycle, whereas a real multiplication operationexecutes instruction cycles or slightly more due to hardware.Although different operations may have different execute time,the number of instruction cycles for any operation is ﬁxed and can be seen as a constant. Then, the total complexitydepends ultimately on the number of operations executed inthe program. With the increase of ρ , it is wise to decreasethe sphere radius r such that the computational complexitieswith respect to the evaluation and comparison operations canreach the lower bound O ( κLM ) with probability . 
Note that $\lim_{\rho \to \infty} \sqrt{\frac{\kappa}{\rho} \ln \rho} = 0$; we therefore have the following theorem.

Theorem 1.

For any given small sphere radius, the PIS decoding can achieve the diversity order $\kappa$, which is the same as that of the ML decoding for a sparse multipath channel, while the computational complexity decreases from $O(LM2^{RM})$ to $O(\kappa LM2^{R\kappa})$ with probability 1.

Proof: See Appendix D for the proof.

Therefore, by choosing $r$ asymptotically equal to $\sqrt{\frac{\kappa}{\rho} \ln \rho}$, the proposed PIS decoding algorithm can balance the tradeoff between the SER performance and the computational complexity. Since the diversity order $\kappa$ depends on the set of the remainders of the $K$ nonzero channel coefficient coordinates modulo $M$, in practice, for a given channel model, i.e., for a given set of coordinates of nonzero channel coefficients, one may choose $M$ such that $\kappa$ is maximized.

V. SIMULATION RESULTS

In this section, we provide simulation results to verifythe previous analysis. The BPSK modulation is employed inthe V-OFDM system. Sparse multipath channel h is mod-elled as K i.i.d. complex Gaussian distributed nonzero taps h j ∼ CN (0 , , j ∈ J randomly distributed within themaximum delay D . We ﬁrst employ the RMSE to evaluate theperformances of the SIFFT-based sparse channel estimation.Then, we give an example of different channels withdeterministic nonzero coordinates to make a comparison ofthe diversity order. Besides, we investigate the relationshipbetween the BER performance of PIS decoding and the param-eters D , K , M , respectively. Furthermore, the PIS decodingis compared with the conventional ZF, MMSE, ML decodingschemes in the V-OFDM system. Finally, channel estimationand decoding algorithm are jointly considered to show theBER performances in both OFDM and V-OFDM systems.Figs. 3 and 4 show the RMSE performances of the SIFFT-based sparse channel estimation with and without noise,respectively. In the simulation of the SIFFT-based exactlysparse multipath channel estimation, the parameters α and B are set to K t +4 such that Bα log MPε is a constant regardlessof K . It can be seen from Fig. 3 that the RMSE of thechannel estimation is below . but reduces the complexityto O ( K log M P ) , where P is pilot channel number. Theestimation error is mainly caused by the imperfect permutationthat the nonzero entries are not separated into different bins.For the SIFFT-based approximately sparse multipath channelestimation, suppose the parameters α = t +1) and B = K ( t +1) that can keep the collision at a relatively low level. Fig.4 indicates that with the increase of ρ , K dominant entries areslightly inﬂuenced by the rest entries η , which consequently,reduces the RMSE of channel estimation. For instance, when K = 4 , the RMSE of sparse channel estimation is below . .

Fig. 3. SIFFT-based algorithm for exactly sparse multipath channel (root mean square error versus log(MP) for K = 2, 4, 8, 16).
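To give a feel for the complexity orders quoted for the SIFFT-based estimators over the range of transform sizes plotted in Fig. 3, the leading-order operation counts can be compared numerically. This is a rough sketch only: constant factors are ignored, base-2 logarithms are assumed, and the function names are hypothetical.

```python
import math

def ifft_ops(n):
    """Leading-order cost of a full n-point IFFT, n log n."""
    return n * math.log2(n)

def exact_sifft_ops(K, n):
    """Leading-order cost quoted for the exactly sparse case, K log n."""
    return K * math.log2(n)

def approx_sifft_ops(K, n):
    """Leading-order cost quoted for the approximately sparse case, K log n log(n/K)."""
    return K * math.log2(n) * math.log2(n / K)

# Transform sizes MP = 2^16 ... 2^22 with K = 16 dominant taps.
for log_n in range(16, 23):
    n = 2 ** log_n
    ratio = ifft_ops(n) / approx_sifft_ops(16, n)
    print(log_n, round(ratio))
```

The ratio grows roughly in proportion to $MP / (K \log\frac{MP}{K})$, so the saving becomes more pronounced as the transform size increases while $K$ stays fixed.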

Whereas when K = 16 , however, there is a sharp decreasefor M P (cid:54) since the condition M P (cid:62) O ( Bα log MPε ) does not hold in this case. The complexity for the SIFFT-based approximately sparse multipath channel estimation is O ( K log M P log

MPK ) .In Fig. 5, we give an example of different channelswith deterministic nonzero coordinates and for each channel,the nonzero channel coefﬁcients are i.i.d. complex Gaussiandistribution. Suppose L = 256 , M = 8 , D = 32 , and thenonzero coordinates for Channel A: J A = { } , ChannelB: J B = { , } , Channel C: J C = { , , } , Channel D: J D = { , , } , Channel E: J E = { , , , } , ChannelF: J F = { , , , } . Accordingly, the reminders of thenonzero coordinates modulo M for Channel A: I A = { } ,Channel B: I B = { , } , Channel C: I C = { , } , ChannelD: I D = { , , } , Channel E: I E = { , , } , Channel F: I F = { , , , } . It can be seen from Fig. 5 that the diversityorder of Channel A is , Channel B and Channel C are , Channel D and Channel E are , Channel F is . It ispointed out that although |J C | = |J D | and |J E | = |J F | , theircorresponding diversity orders are different. As a result, wecan verify the previous analysis that the diversity order ofsparse multipath channel is determined by the cardinality ofthe set of reminder coordinates after mod M , rather than thecardinality of coordinate set itself.Figs. 6 − D , K , M inﬂuence onthe BER performance of PIS decoding, respectively. Supposethe transmitted SNR ρ = 10dB . In Fig 6, we compare theBER with respect to the maximum delay D . It can be seen thatthe BER is roughly irrelevant to the variation of D , since K nonzero taps are randomly distributed within D + 1 taps. Fig.7 investigates the relationship between the number of nonzerotaps K and the BER performance. Simulation result indicatesthat with the increase of K , the BER decreases almost linearlyin the logarithmic scale. The reason is that the diversity ordercan be directly determined by κ and increases with K withlarge probability. Fig. 8 shows the BER performance withrespect to VB size M in the V-OFDM system. The BER ﬁrst

16 17 18 19 20 21 2200.0050.010.0150.020.0250.030.0350.040.045 log ( MP ) R oo t m ean s qua r e e rr o r ρ = 10dB, K = 4 ρ = 10dB, K = 16 ρ = 20dB, K = 4 ρ = 20dB, K = 16 Fig. 4. SIFFT-based algorithm for approximately sparse multipath channel. decreases with the increase of M for M (cid:54) , whereas for M (cid:62) , the BER increases with M instead. On one hand,a larger M can avoid K nonzero taps interacting with eachother better after mod M , in this case, κ = K with highprobability which may improve the BER performance. Onthe other hand, for a given sphere radius r , with the increaseof M , the probability that the transmitted symbols lie in thecertain sphere decreases exponentially according to (30), andthus diminishes the advantage of the PIS decoding. Therefore,one can improve the BER performance of the PIS decodingby choosing an appropriate VB size M .Fig. 9 compares different decoding approaches in the V-OFDM system. Suppose the parameters D = 16 , K = 4 , L = 256 , M = 4 , r = (cid:113) κρ ln ρ . Since the condition K (cid:28) M does not hold, κ < K with not a small probabilitywhich may diminish the multipath diversity orders of theMMSE decoding, ML decoding and PIS decoding. In fact,when D is sufﬁciently large such that the reminders of thenonzero coordinates modulo M can be regarded randomlydistributed at the coordinates , , . . . , M − , for the given K and M , the probability mass function of κ is P κ = (cid:0) Mκ (cid:1)(cid:0) Kκ (cid:1)(cid:0) κM (cid:1) K κ − κ κ ! , κ = 1 , , . . . , min { K, M } . After av-eraging over the random nonzero coordinates of channel, thediversity orders of the ZF decoding, MMSE decoding, MLdecoding and PIS decoding are corresponding to the minimumof κ and thus equal to . For the different decoding ap-proaches, denote the BERs of ZF decoding, MMSE decoding,ML decoding, PIS decoding as P ZF , P MMSE , P ML , P PIS ,respectively. 
It is well known that $P_{\mathrm{ML}} < P_{\mathrm{MMSE}} < P_{\mathrm{ZF}}$. The simulation results indicate that the PIS decoding loses some BER performance because (36) does not hold when $\rho$ is not large enough, whereas once $\rho$ is large enough the proposed PIS decoding outperforms the ZF decoding and the MMSE decoding and gradually approaches the ML decoding as $\rho$ increases. Furthermore, the complexity of the PIS decoding decreases with $\rho$ and is much lower than that of the ML decoding.

Fig. 5. Diversity orders for different channels with L = 256 and M = 8 (bit error rate versus signal-to-noise ratio in dB for Channels A to F).

Fig. 6. PIS decoding for different $D$ with $\rho = 10$ dB and $r = \sqrt{\frac{\kappa}{\rho} \ln \rho}$ (bit error rate versus $D$ for K = 2, 4 and M = 8, 16, 32).

Fig. 7. PIS decoding for different $K$ with $\rho = 10$ dB and $r = \sqrt{\frac{\kappa}{\rho} \ln \rho}$ (bit error rate versus $K$ for D = 20, 40 and M = 16, 32).

Fig. 8. PIS decoding for different $M$ with $\rho = 10$ dB and $r = \sqrt{\frac{\kappa}{\rho} \ln \rho}$ (bit error rate versus log(M) for D = 20, 40, 60 and K = 2, 4).

In Fig. 10, we consider the channel estimation and decoding algorithms jointly. The BER performance depends not only on the decoding approach but also on the channel estimation accuracy. Suppose the parameters D = 64, K = 4. For the OFDM system, the receiver employs the FFT-based interpolation for channel estimation and symbol-by-symbol decoding with parameters N = 1048576 and pilot channel number P = 65536. For the V-OFDM system with linear receivers, we estimate the channel by the conventional IFFT-based approach and employ the ZF decoding and the MMSE decoding with parameters L = 131072, M = 8, P = 8192, respectively. For the V-OFDM system with the ML decoding, we also estimate the channel by the conventional IFFT-based approach, with parameters L = 262144, M = 4, P = 16384. For the V-OFDM system with the PIS decoding, the SIFFT-based algorithm is employed for sparse multipath channel estimation with parameters L = 131072, M = 8, P = 8192. If a slight bias is introduced during channel estimation, the sphere radius should not be extremely small, since a robust sphere radius is needed to guarantee that the probability of the transmitted symbols lying in the sphere does not decrease. An empirical method to balance the tradeoff between the estimation error and the complexity is to choose the sphere radius $r = \max\Big\{ \sqrt{\frac{\kappa}{\rho} \ln \rho},\ \sqrt{\frac{\kappa}{\rho_0} \ln \rho_0} \Big\}$ with $\rho_0 = 20$ dB. It can be seen from Fig. 10 that the V-OFDM system outperforms the conventional OFDM system.
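The radius rule (36) and the empirical floor described above can be sketched in a few lines. The sketch assumes that the second term of the maximum is the same expression evaluated at a fixed reference SNR of 20 dB, that $\rho$ is a linear-scale SNR greater than 1, and that the function names are hypothetical.

```python
import math

def db_to_linear(db):
    """Convert an SNR in dB to linear scale."""
    return 10.0 ** (db / 10.0)

def pis_radius(kappa, rho):
    """Radius from (36): sqrt(kappa / rho * ln(rho)), for linear SNR rho > 1."""
    return math.sqrt(kappa / rho * math.log(rho))

def robust_radius(kappa, rho, rho_floor):
    """Empirical rule: do not let the radius shrink below its value at rho_floor."""
    return max(pis_radius(kappa, rho), pis_radius(kappa, rho_floor))

# Hypothetical sweep for kappa = 4 with a 20 dB reference SNR.
rho_ref = db_to_linear(20.0)
for snr_db in (10.0, 20.0, 30.0):
    rho = db_to_linear(snr_db)
    print(snr_db, pis_radius(4, rho), robust_radius(4, rho, rho_ref))
```

Because $\sqrt{\ln\rho / \rho}$ decreases for $\rho > e$, the floor only takes effect above the reference SNR, where it keeps the sphere from becoming too small when the channel estimate is slightly biased.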
Compared with the ZF decoding and the MMSE decoding, the proposed SIFFT-based channel estimation and PIS decoding reduce the BER significantly.

VI. CONCLUSION

In this paper, we investigate sparse multipath channel estimation and decoding for broadband V-OFDM systems. From the system model, if the pilot channels are evenly allocated over multiple subcarriers, the pilot symbols are evenly distributed over the equivalent channels. For the sparse multipath channel estimation, we first design a type of pilot symbols that minimizes the MSE of the estimator. Then, we give SIFFT-based algorithms for exactly and approximately sparse multipath channel estimation, corresponding to the cases without and with AWGN introduced during the transmission, respectively. The key advantage of the SIFFT-based approach is that it estimates the nonzero channel coefficients and their corresponding coordinates directly. For the PIS decoding algorithm, the diversity order is determined not only by the number of nonzero taps but also by the coordinates of the nonzero taps. Simulation results indicate that, with a suitable sphere radius and at sufficiently large SNR, the BER performance of the PIS decoding is comparable to that of the ML decoding while the complexity is reduced substantially.

Fig. 9. Comparison of different decoding schemes with L = 256 and M = 4 (bit error rate versus signal-to-noise ratio in dB for ZF, MMSE, ML and PIS).

APPENDIX A
PROOF OF SPARSE MULTIPATH CHANNEL WITH AWGN

Proof:

For pilot signals transmitted through a multipath channel with AWGN, it was derived in (18) that $\hat{\mathbf{h}}$ is an unbiased estimator of $\mathbf{h}$. The diagonal entries of (19), which correspond to the variances of the entries of $\hat{\mathbf{h}}$, are all equal to $\mathrm{tr}\{\boldsymbol{\Sigma}\}/(M^2P^2)$. Hence, $\hat{\mathbf{h}}$ is a random vector that can be regarded as $\mathbf{h}$ plus an additive noise whose entries are identically distributed complex Gaussian, i.e., $\mathcal{CN}\big(0, \mathrm{tr}\{\boldsymbol{\Sigma}\}/(M^2P^2)\big)$, but may not be white. For the sparse channel with only $K$ nonzero taps, $\eta$ approximates the ratio of the expected power of the $K$ dominant entries of $\hat{\mathbf{h}}$ to that of the remaining entries, such that

$$\eta = \frac{\|\mathbf{h}\|_2^2 + \frac{K}{M^2P^2}\,\mathrm{tr}\{\boldsymbol{\Sigma}\}}{\frac{MP-K}{M^2P^2}\,\mathrm{tr}\{\boldsymbol{\Sigma}\}} \tag{37}$$

Considering a sparse channel with $K \ll MP$, (37) can thus be further simplified as

$$\eta = \frac{MP\,\|\mathbf{h}\|_2^2}{\mathrm{tr}\{\boldsymbol{\Sigma}\}} \tag{38}$$

It was listed in Table I that the designed pilot symbols can reduce the interference efficiently. Suppose the $\ell_2$ norm of the sparse channel is normalized, i.e., $\|\mathbf{h}\|_2 = 1$. Since $\mathrm{tr}\{\boldsymbol{\Sigma}\}$ is proportional to $\sigma^2$, we have

$$\eta = \rho\zeta \tag{39}$$

where the typical value of $\zeta$ is [...] $\sim$ [...] and roughly independent of $M$.

Fig. 10. Comparison of joint channel estimation and decoding algorithms. (Bit error rate versus signal-to-noise ratio in dB; curves: Interpft-OFDM, IFFT-ZF $M = 8$, IFFT-MMSE $M = 8$, IFFT-ML $M = 4$, SIFFT-PIS $M = 8$.)

Therefore, it can be naturally concluded that if the designed pilot symbols are transmitted through the normalized sparse multipath channel with AWGN, then the estimator $\hat{\mathbf{h}}$ is an approximately sparse vector with the parameter $\eta = \rho\zeta$, where $\rho$ denotes the transmitted SNR, and the typical value of $\zeta$ is [...] $\sim$ [...] and roughly independent of $M$.

APPENDIX B
PROOF OF DIVERSITY ORDER OF ML DECODING FOR A SPARSE MULTIPATH CHANNEL

Proof:

Assume that the ideal channel state information is known at the receiver. Denote $\widehat{\mathbf{X}}_l$ as the estimate of $\mathbf{X}_l$ obtained with the ML decoding; then the SER of the ML decoding conditioned on $\mathbf{H}_l$ is written as $\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{H}_l\}$. According to the ML decoding, for the transmitted symbol $\mathbf{X}_l$ and a distinct symbol $\mathbf{X}'_l$ from the symbol constellation, a symbol error occurs if $\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2$ holds. Since the event $\widehat{\mathbf{X}}_l \neq \mathbf{X}_l$ is equivalent to $\bigcup_{\mathbf{X}'_l \in \mathcal{X}^M,\, \mathbf{X}'_l \neq \mathbf{X}_l} \{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2\}$ [31], it is not difficult to derive a lower bound of the SER as

$$\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{H}_l\} \geqslant \max_{\mathbf{X}'_l \in \mathcal{X}^M,\, \mathbf{X}'_l \neq \mathbf{X}_l} \Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} \tag{40}$$

and an upper bound of the SER as

$$\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{H}_l\} \leqslant \sum_{\mathbf{X}'_l \in \mathcal{X}^M,\, \mathbf{X}'_l \neq \mathbf{X}_l} \Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} \tag{41}$$

Furthermore, since the total number of elements in the symbol constellation $\mathcal{X}^M$ is $R^M$, the upper bound (41) can be further simplified as

$$\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{H}_l\} \leqslant (R^M - 1) \max_{\mathbf{X}'_l \in \mathcal{X}^M,\, \mathbf{X}'_l \neq \mathbf{X}_l} \Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} \tag{42}$$

Therefore, the upper and lower bounds of the SER have the same tendency and differ only by a constant multiplier. Denote $\mathbf{e}_l = \mathbf{X}'_l - \mathbf{X}_l$; then we have

$$\Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} = Q\!\left(\frac{\|\mathbf{H}_l\mathbf{e}_l\|_2}{2\sigma}\right) \tag{43}$$

where the $Q$-function is defined as $Q(x) = \frac{1}{\sqrt{\pi}} \int_x^{+\infty} e^{-t^2}\,\mathrm{d}t$. Substituting (8) into (43) and noting that the $\ell_2$ distance does not change after a unitary transformation, (43) can be simplified as

$$\Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} = Q\!\left(\frac{\|\bar{\mathbf{H}}_l\mathbf{U}_l\mathbf{e}_l\|_2}{2\sigma}\right) \tag{44}$$

where $\bar{\mathbf{H}}_l$ denotes the diagonal channel matrix appearing in (8). Consider the sparse channel $\mathbf{h}$ with only $K$ i.i.d. nonzero taps, where each nonzero entry is a complex Gaussian random variable with zero mean and unit variance. Recall that $\mathcal{J}$ is the set of coordinates of the nonzero taps. Suppose $j_0, j_1, \ldots, j_{K-1}$ are the $K$ entries in $\mathcal{J}$ in ascending order, i.e., $0 \leqslant j_0 < j_1 < \ldots < j_{K-1}$, and let $\tilde{\mathbf{h}}$ collect the corresponding $K$ nonzero taps of $\mathbf{h}$. Then (44) can be rewritten as

$$\Pr\{\|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}_l\|_2 > \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}'_l\|_2 \mid \mathbf{H}_l\} = Q\!\left(\frac{\sqrt{\tilde{\mathbf{h}}^H\mathbf{E}_l^H\mathbf{E}_l\tilde{\mathbf{h}}}}{2\sigma}\right) \tag{45}$$

where $\mathbf{E}_l = \mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}\tilde{\mathbf{F}}_l$. Denote $r_l$ as the rank of $\mathbf{E}_l^H\mathbf{E}_l$, i.e., $r_l = \mathrm{rank}(\mathbf{E}_l^H\mathbf{E}_l)$, and let $\lambda_{0,l}, \lambda_{1,l}, \ldots, \lambda_{r_l-1,l}$ be the $r_l$ nonzero eigenvalues of the positive semidefinite Hermitian matrix $\mathbf{E}_l^H\mathbf{E}_l$ in descending order, $\lambda_{0,l} \geqslant \lambda_{1,l} \geqslant \ldots \geqslant \lambda_{r_l-1,l} > 0$.
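The pairwise error probability in (43) can be sanity-checked by simulation. The sketch below is only illustrative: it assumes the noise entries are $\mathcal{CN}(0, \sigma^2)$ and reads the $Q$-function as $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x)$, so that the closed form is $Q(\|\mathbf{H}_l\mathbf{e}_l\|_2/(2\sigma))$. The matrix `H`, the symbols `X` and `X_alt`, and the value of `sigma` are hypothetical and not taken from the paper.

```python
import math
import random


def q_func(x):
    """Q(x) = (1/sqrt(pi)) * integral_x^inf exp(-t^2) dt, i.e. erfc(x) / 2."""
    return 0.5 * math.erfc(x)


def matvec(H, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def sq_dist(a, b):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def pairwise_error_mc(H, X, X_alt, sigma, trials, seed=0):
    """Estimate Pr{||Y - H X|| > ||Y - H X_alt||} when X is transmitted."""
    rng = random.Random(seed)
    HX = matvec(H, X)
    HX_alt = matvec(H, X_alt)
    s = sigma / math.sqrt(2.0)  # per-real-dimension std, so E|n|^2 = sigma^2
    errors = 0
    for _ in range(trials):
        Y = [hx + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for hx in HX]
        if sq_dist(Y, HX) > sq_dist(Y, HX_alt):
            errors += 1
    return errors / trials


# Hypothetical 2x2 channel matrix and BPSK symbol vectors (illustration only).
H = [[1.0 + 0.5j, 0.3 - 0.2j], [-0.4 + 0.1j, 0.8 + 0.6j]]
X = [1 + 0j, -1 + 0j]
X_alt = [1 + 0j, 1 + 0j]
sigma = 1.0

e = [a - b for a, b in zip(X_alt, X)]
norm_He = math.sqrt(sum(abs(v) ** 2 for v in matvec(H, e)))
closed_form = q_func(norm_He / (2.0 * sigma))
estimate = pairwise_error_mc(H, X, X_alt, sigma, trials=100_000)
print(f"closed form: {closed_form:.4f}, Monte Carlo: {estimate:.4f}")
```

Under these assumptions the two numbers should agree to within the Monte Carlo error, which is consistent with the $4\sigma^2$ terms that appear in the bounds derived next.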
Denote the constellation of pairwise errors as $\mathcal{E} = \{\mathbf{e}_l \mid \mathbf{e}_l = \mathbf{X}'_l - \mathbf{X}_l \neq \mathbf{0},\ \mathbf{X}_l \in \mathcal{X}^M,\ \mathbf{X}'_l \in \mathcal{X}^M\}$. After averaging over the complex Gaussian random channel $\tilde{\mathbf{h}}$, an upper bound of the SER (42) can be calculated by the Chernoff bound as [12], [30], [32]

$$\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l\} \leqslant (R^M - 1) \max_{\mathbf{e}_l \in \mathcal{E}} \mathbb{E}\left\{ e^{-\frac{\tilde{\mathbf{h}}^H\mathbf{E}_l^H\mathbf{E}_l\tilde{\mathbf{h}}}{4\sigma^2}} \right\} \leqslant (R^M - 1) \left[ \frac{\left(\prod_{i=0}^{r_l^{\min}-1} \lambda_{i,l}\right)^{1/r_l^{\min}}}{4\sigma^2} \right]^{-r_l^{\min}} \tag{46}$$

where $r_l^{\min} \triangleq \min_{\mathbf{e}_l \in \mathcal{E}} r_l$. Denote $r_{\min} \triangleq \min_{l \in \mathcal{L}_D} r_l^{\min}$ and $\mathcal{L}_D \triangleq \{ l \mid 0 \leqslant l \leqslant L-1,\ l \neq 0, \frac{L}{P}, \ldots, \frac{P-1}{P}L \}$. Hence, the SER of the ML decoding is exponentially less than or equal to $\rho^{-r_{\min}}$, where $r_{\min}$ is the minimum rank of $\mathbf{E}_l^H\mathbf{E}_l$, i.e., $P_{\mathrm{ML}} \mathrel{\dot{\leqslant}} \rho^{-r_{\min}}$.

To derive a lower bound of the SER, (40) can be further calculated by recursive integration by parts:

$$\Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l\} \geqslant \max_{\mathbf{e}_l \in \mathcal{E}} \mathbb{E}\left\{ Q\!\left(\frac{\sqrt{\tilde{\mathbf{h}}^H\mathbf{E}_l^H\mathbf{E}_l\tilde{\mathbf{h}}}}{2\sigma}\right) \right\} \geqslant \max_{\mathbf{e}_l \in \mathcal{E}} \sqrt{\frac{\lambda_{0,l}}{\lambda_{0,l} + 4\sigma^2}} \sum_{k=r_l}^{\infty} \frac{(2k-1)!!}{2^{k+1}\,k!\,\big(1 + \frac{\lambda_{0,l}}{4\sigma^2}\big)^k} = \sqrt{\frac{\lambda_{0,l}}{\lambda_{0,l} + 4\sigma^2}} \sum_{k=r_l^{\min}}^{\infty} \frac{(2k-1)!!}{2^{k+1}\,k!\,\big(1 + \frac{\lambda_{0,l}}{4\sigma^2}\big)^k} \tag{47}$$

where the factorial is $k! = 1 \times 2 \times \cdots \times k$ and the double factorial is $(2k-1)!! = 1 \times 3 \times \cdots \times (2k-1)$. Then, the SER of the ML decoding is exponentially greater than or equal to $\rho^{-r_{\min}}$, i.e., $P_{\mathrm{ML}} \mathrel{\dot{\geqslant}} \rho^{-r_{\min}}$.

According to (46) and (47), we conclude that the diversity order of the ML decoding is equal to the minimum rank of $\mathbf{E}_l^H\mathbf{E}_l$ among all nonzero $\mathbf{e}_l$ and all $l \in \mathcal{L}_D$, which is also denoted by $r_{\min} = \min_{l \in \mathcal{L}_D} \min_{\mathbf{e}_l \in \mathcal{E}} \mathrm{rank}(\mathbf{E}_l^H\mathbf{E}_l)$.
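The way the rank sets the high-SNR slope in (46) can be illustrated numerically. The sketch below assumes $\tilde{\mathbf{h}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$, for which the averaged Chernoff term has the standard closed form $\prod_i (1 + \lambda_{i,l}/(4\sigma^2))^{-1}$, and it assumes $\rho = 1/\sigma^2$. The eigenvalues in `eigs` are hypothetical placeholders, and the constant factor $(R^M - 1)$ is omitted because it does not affect the slope.

```python
import math


def avg_chernoff(eigs, sigma2):
    """E{exp(-h^H E^H E h / (4 sigma^2))} for h ~ CN(0, I): prod 1/(1 + lam/(4 sigma^2))."""
    return math.prod(1.0 / (1.0 + lam / (4.0 * sigma2)) for lam in eigs)


def high_snr_bound(eigs, sigma2):
    """[(prod lam)^(1/r) / (4 sigma^2)]^(-r), the looser bound in (46) without (R^M - 1)."""
    r = len(eigs)
    geo_mean = math.prod(eigs) ** (1.0 / r)
    return (geo_mean / (4.0 * sigma2)) ** (-r)


def loglog_slope(f, eigs, rho_lo, rho_hi):
    """Slope of log10 f versus log10 rho, assuming rho = 1 / sigma^2."""
    y_lo = f(eigs, 1.0 / rho_lo)
    y_hi = f(eigs, 1.0 / rho_hi)
    return (math.log10(y_hi) - math.log10(y_lo)) / (math.log10(rho_hi) - math.log10(rho_lo))


eigs = [2.0, 1.0, 0.5]  # hypothetical nonzero eigenvalues of E^H E, so the rank is 3
slope_exact = loglog_slope(avg_chernoff, eigs, 1e4, 1e5)
slope_bound = loglog_slope(high_snr_bound, eigs, 1e4, 1e5)
print(f"slope of averaged Chernoff term: {slope_exact:.4f}")
print(f"slope of the bound in (46):      {slope_bound:.4f}")
```

Both slopes approach the negative of the number of nonzero eigenvalues, which is why the minimum rank over all error vectors and data subchannels determines the diversity order.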
Note that

$$\mathrm{rank}(\mathbf{E}_l^H\mathbf{E}_l) = \mathrm{rank}(\mathbf{E}_l) = \mathrm{rank}\big(\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}\tilde{\mathbf{F}}_l\big) \tag{48}$$

Now, we consider the two matrices $\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}$ and $\tilde{\mathbf{F}}_l$ separately. Note that $\mathbf{U}_l = \mathbf{F}_M\boldsymbol{\Lambda}_l$, where $\boldsymbol{\Lambda}_l$ is a rotation matrix introduced by the V-OFDM modulation itself. According to Theorem 2 in [14], for pulse-amplitude modulation (PAM) or BPSK modulation, $\min_{\mathbf{e}_l \neq \mathbf{0}} \mathrm{rank}(\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}) = M$ for $l = 1, 2, \ldots, L-1$, while for quadrature amplitude modulation (QAM) or quadrature phase-shift keying (QPSK) modulation, $\min_{\mathbf{e}_l \neq \mathbf{0}} \mathrm{rank}(\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}) = M$ for $l = 1, 2, \ldots, \frac{L}{2}-1, \frac{L}{2}+1, \ldots, L-1$. In other words, for the conventional modulations, $\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}$ has full rank in most subchannels. Furthermore, from the previous analysis in Section II-B, the $P$ subchannels with the indices $0, \frac{L}{P}, \ldots, L - \frac{L}{P}$ are allocated to transmit pilot symbols. Hence, $\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}$ always has full rank for the subchannels allocated to transmit data symbols.

On the other hand, $\tilde{\mathbf{F}}_l$ can be constructed as $[\tilde{\mathbf{F}}_0, \tilde{\mathbf{F}}_1, \ldots, \tilde{\mathbf{F}}_{K-1}]$, where the column vector $\tilde{\mathbf{F}}_q = \big[e^{-j\frac{2\pi}{N}lj_q}, e^{-j\frac{2\pi}{N}(l+L)j_q}, \ldots, e^{-j\frac{2\pi}{N}[l+(M-1)L]j_q}\big]^T$. Recall that $\mathcal{I}$ is the set of coordinates of the nonzero taps modulo $M$ and $i_0, i_1, \ldots, i_{\kappa-1}$ are the $\kappa$ entries in $\mathcal{I}$ in ascending order $0 \leqslant i_0 < i_1 < \ldots < i_{\kappa-1} \leqslant M-1$. According to (27), $\forall q \in \{0, 1, \ldots, K-1\}$, $\exists\, p \in \{0, 1, \ldots, \kappa-1\}$ and an integer $k$ such that $j_q = i_p + kM$. Then, since $N = ML$, $\tilde{\mathbf{F}}_q = e^{-j\frac{2\pi}{L}lk}\hat{\mathbf{F}}_p$, where $\hat{\mathbf{F}}_p = \big[e^{-j\frac{2\pi}{N}li_p}, e^{-j\frac{2\pi}{N}(l+L)i_p}, \ldots, e^{-j\frac{2\pi}{N}[l+(M-1)L]i_p}\big]^T$. If there exist a distinct $q'$ and an integer $k'$ such that $j_{q'} = i_p + k'M$ still holds, then $\tilde{\mathbf{F}}_{q'} = e^{-j\frac{2\pi}{L}lk'}\hat{\mathbf{F}}_p = e^{-j\frac{2\pi}{L}l(k'-k)}\tilde{\mathbf{F}}_q$.
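The column relation above can be checked numerically on a small example. The following sketch builds the columns $\tilde{\mathbf{F}}_q$ for a hypothetical set of tap positions, verifies that two columns whose positions coincide modulo $M$ differ only by the phase $e^{-j\frac{2\pi}{L}lk}$, and computes the rank with a plain Gaussian elimination. The values of `M`, `L`, `l`, and `taps` are illustrative assumptions, not parameters from the paper.

```python
import cmath


def fourier_column(tap, l, L, M):
    """Column [exp(-j*2*pi*(l + m*L)*tap/N)] for m = 0..M-1, with N = M*L."""
    N = M * L
    return [cmath.exp(-2j * cmath.pi * (l + m * L) * tap / N) for m in range(M)]


def matrix_rank(columns, tol=1e-9):
    """Rank of the matrix whose columns are given, via Gaussian elimination."""
    rows = [list(r) for r in zip(*columns)]
    rank = 0
    for c in range(len(columns)):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue  # this column is dependent on the previous pivots
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][c] / rows[rank][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# Hypothetical parameters: M = 4, L = 8, subchannel l = 3, K = 4 nonzero taps.
M, L, l = 4, 8, 3
taps = [1, 5, 6, 13]                   # residues modulo M are 1, 1, 2, 1
kappa = len({tap % M for tap in taps})

columns = [fourier_column(tap, l, L, M) for tap in taps]
rank_F = matrix_rank(columns)

# Taps 1 and 5 share the residue 1 with k = 1, so the columns differ by a phase.
k = (5 - 1) // M
phase = cmath.exp(-2j * cmath.pi * l * k / L)
max_dev = max(abs(a - phase * b) for a, b in zip(columns[1], columns[0]))
print(f"rank = {rank_F}, kappa = {kappa}, max deviation = {max_dev:.2e}")
```

In this example the rank equals the number of distinct residues of the tap positions modulo $M$ rather than the number of taps.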
Hence, the vectors $\tilde{\mathbf{F}}_q$ and $\tilde{\mathbf{F}}_{q'}$ are both scalar multiples of $\hat{\mathbf{F}}_p$ and are therefore linearly dependent. Since there are only $\kappa$ such $p$ in total, letting $\hat{\mathbf{F}}_l = [\hat{\mathbf{F}}_0, \hat{\mathbf{F}}_1, \ldots, \hat{\mathbf{F}}_{\kappa-1}]$, we have $\mathrm{rank}(\tilde{\mathbf{F}}_l) = \mathrm{rank}(\hat{\mathbf{F}}_l)$. Since the columns of a DFT matrix are linearly independent, the column vectors $\hat{\mathbf{F}}_p$, $\forall p \in \{0, 1, \ldots, \kappa-1\}$, which are equal to the $i_p$th columns of the $M$-point FFT matrix without normalization multiplied by the factor $e^{-j\frac{2\pi}{N}li_p}$, are also linearly independent. Then, the maximal number of linearly independent columns of $\hat{\mathbf{F}}_l$ is $\kappa$. Thus, we have $\mathrm{rank}(\tilde{\mathbf{F}}_l) = \mathrm{rank}(\hat{\mathbf{F}}_l) = \kappa$.

Since $\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}$ is an invertible matrix, $\mathrm{rank}\big(\mathrm{diag}\{\mathbf{U}_l\mathbf{e}_l\}\tilde{\mathbf{F}}_l\big) = \mathrm{rank}(\tilde{\mathbf{F}}_l) = \kappa$. As a result, it can be concluded that the diversity order of the ML decoding for a sparse multipath channel is $\kappa$.

APPENDIX C
PROOF OF $P_{\mathcal{X}^{(M)}} \leqslant P_{\mathrm{ML}}$

Proof:

Denote the estimated symbol sequence $\widehat{\mathbf{X}}'_l$ as the conventional ML decoding of $\mathbf{X}_l$. In contrast to the PIS decoding, the SER of the ML decoding can be written as

$$P_{\mathrm{ML}} = \Pr\{\widehat{\mathbf{X}}'_l \neq \mathbf{X}_l \mid \mathbf{X}_l \in \mathcal{X}^M\} \tag{49}$$

Since $\mathcal{X}^{(M)} \subseteq \mathcal{X}^M$, we have

$$\begin{aligned}
P_{\mathcal{X}^{(M)}} &= \Pr\{\widehat{\mathbf{X}}_l \neq \mathbf{X}_l \mid \mathbf{X}_l \in \mathcal{X}^{(M)}\} \\
&= 1 - \Pr\Big\{\mathbf{X}_l = \arg\min_{\mathbf{X}^{(M)} \in \mathcal{X}^{(M)}} \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}^{(M)}\|_2 \,\Big|\, \mathbf{X}_l \in \mathcal{X}^{(M)}\Big\} \\
&\leqslant 1 - \Pr\Big\{\mathbf{X}_l = \arg\min_{\mathbf{X}^{(M)} \in \mathcal{X}^{M}} \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}^{(M)}\|_2 \,\Big|\, \mathbf{X}_l \in \mathcal{X}^{(M)}\Big\} \\
&= 1 - \Pr\Big\{\mathbf{X}_l = \arg\min_{\mathbf{X}^{(M)} \in \mathcal{X}^{M}} \|\mathbf{Y}_l - \mathbf{H}_l\mathbf{X}^{(M)}\|_2 \,\Big|\, \mathbf{X}_l \in \mathcal{X}^{M}\Big\} \\
&= \Pr\{\widehat{\mathbf{X}}'_l \neq \mathbf{X}_l \mid \mathbf{X}_l \in \mathcal{X}^{M}\} = P_{\mathrm{ML}}
\end{aligned} \tag{50}$$

APPENDIX D
PROOF OF THEOREM

Proof:

Assume the sparse channel $\mathbf{h}$ has only $K$ i.i.d. nonzero taps and each channel coefficient follows a complex Gaussian distribution. It is not difficult to find that the nonzero entries in $\mathbf{H}_l$ are also complex Gaussian random variables, but their variances may not be equal. Recall that $\mathcal{S}^{(m)}$ was defined in (28) as the set of all possible symbol sequences lying in the sphere of radius $r$ around the received signal $Y_l^{(m)}$. Let $\mathbf{S}_{\dagger}^{(m)}$ be the correct symbol sequence corresponding to the transmitted symbols, and let $d^{(m)}$ be the distance between $Y_l^{(m)}$ and $\mathbf{H}_l^{(m)}\mathbf{S}_{\dagger}^{(m)}$, i.e., $d^{(m)} = \big|Y_l^{(m)} - \mathbf{H}_l^{(m)}\mathbf{S}_{\dagger}^{(m)}\big|$. Since the noise $\boldsymbol{\Xi}_l$ in (6) is complex AWGN, $d^{(m)}$ is Rayleigh distributed with mean $\frac{\sqrt{\pi}}{2}\sigma$ and variance $\frac{4-\pi}{4}\sigma^2$. According to the triangle inequality, $\forall\, \mathbf{S} \in \mathcal{X}^{\kappa}$, we have

$$\big|Y_l^{(m)} - \mathbf{H}_l^{(m)}\mathbf{S}\big| + \big|Y_l^{(m)} - \mathbf{H}_l^{(m)}\mathbf{S}_{\dagger}^{(m)}\big| \geqslant \big|\mathbf{H}_l^{(m)}\big(\mathbf{S} - \mathbf{S}_{\dagger}^{(m)}\big)\big| \tag{51}$$

Let $\mathcal{S}_{\dagger}^{(m)}$ be the set of symbol sequences $\mathbf{S}$ such that the distance between $\mathbf{H}_l^{(m)}\mathbf{S}$ and $\mathbf{H}_l^{(m)}\mathbf{S}_{\dagger}^{(m)}$ is less than or equal to $r + d^{(m)}$, i.e.,

$$\mathcal{S}_{\dagger}^{(m)} = \Big\{\mathbf{S} \,\Big|\, \big|\mathbf{H}_l^{(m)}\big(\mathbf{S} - \mathbf{S}_{\dagger}^{(m)}\big)\big| \leqslant r + d^{(m)}\Big\} \tag{52}$$

Compared with $\mathcal{S}^{(m)}$ defined in (28), it follows from (51) that $\mathcal{S}^{(m)} \subseteq \mathcal{S}_{\dagger}^{(m)}$. If $r$ is chosen exponentially equal to $\sqrt{\frac{\kappa}{\rho}\ln\rho}$ such that Lemma 1 is satisfied, then $r \to 0$ as $\rho \to \infty$.
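The Rayleigh statistics quoted for $d^{(m)}$ can be checked with a short simulation. The sketch below assumes a scalar noise sample $n \sim \mathcal{CN}(0, \sigma^2)$, so that $d = |n|$, and compares the empirical mean and variance with $\frac{\sqrt{\pi}}{2}\sigma$ and $\frac{4-\pi}{4}\sigma^2$. The value of `sigma` and the sample size are arbitrary choices for illustration.

```python
import math
import random
import statistics


def rayleigh_moments(sigma):
    """Theoretical mean and variance of |n| for n ~ CN(0, sigma^2)."""
    return math.sqrt(math.pi) / 2.0 * sigma, (4.0 - math.pi) / 4.0 * sigma ** 2


def sample_distances(sigma, n, seed=0):
    """Draw n samples of |n| with real and imaginary parts of variance sigma^2 / 2."""
    rng = random.Random(seed)
    s = sigma / math.sqrt(2.0)
    return [abs(complex(rng.gauss(0.0, s), rng.gauss(0.0, s))) for _ in range(n)]


sigma = 0.5  # arbitrary noise standard deviation for the illustration
d = sample_distances(sigma, 200_000)
emp_mean = statistics.fmean(d)
emp_var = sum((x - emp_mean) ** 2 for x in d) / len(d)
th_mean, th_var = rayleigh_moments(sigma)
print(f"mean: empirical {emp_mean:.4f}, theory {th_mean:.4f}")
print(f"var:  empirical {emp_var:.4f}, theory {th_var:.4f}")
```

Because both moments scale with $\sigma$, the distance $d^{(m)}$ concentrates near zero as the noise power decreases.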
Due to the Rayleigh distribution, $d^{(m)} \to 0$ with probability 1 as $\rho \to \infty$. Since the probability density function of each nonzero entry in $\mathbf{H}_l$ is a complex Gaussian function, for any given $\mathbf{S} \in \mathcal{S}_{\dagger}^{(m)}$, if $\mathbf{S} \neq \mathbf{S}_{\dagger}^{(m)}$, we have

$$\Pr\Big\{\big|\mathbf{H}_l^{(m)}\big(\mathbf{S} - \mathbf{S}_{\dagger}^{(m)}\big)\big| \leqslant r + d^{(m)} \,\Big|\, \mathbf{S} \neq \mathbf{S}_{\dagger}^{(m)}\Big\} = 1 - e^{-\frac{(r + d^{(m)})^2}{\|\mathbf{S} - \mathbf{S}_{\dagger}^{(m)}\|_2^2}} \tag{53}$$

For $\rho \to \infty$, $r + d^{(m)} \to 0$, so that (53) approaches 0. Since $\mathcal{S}_{\dagger}^{(m)}$ has only a finite number of elements, we have $\Pr\{\mathbf{S} \neq \mathbf{S}_{\dagger}^{(m)} \mid \mathbf{S} \in \mathcal{S}_{\dagger}^{(m)}\} = 0$; thus, $\mathcal{S}_{\dagger}^{(m)}$ has only one entry $\mathbf{S}_{\dagger}^{(m)}$ with probability 1 when $\rho \to \infty$. Obviously, $\mathbf{S}_{\dagger}^{(m)} \in \mathcal{S}^{(m)}$ with probability 1 as $\rho \to \infty$. Since $\mathcal{S}^{(m)}$ is a subset of $\mathcal{S}_{\dagger}^{(m)}$, $\mathcal{S}^{(m)}$ also has only one entry $\mathbf{S}_{\dagger}^{(m)}$ with probability 1.

For any given small sphere radius $r$, Lemma 1 is satisfied as $\rho \to \infty$, such that the PIS decoding achieves the same multipath diversity order as the ML decoding, which is equal to $\kappa$. Furthermore, the cardinality $\big|\mathcal{X}^{(m)}\big|$ of the set $\mathcal{X}^{(m)}$ of possible symbol sequences in Algorithm 3 remains 1 in each iteration, so that the complexity of updating $\mathcal{X}_S^{(m)}$ can be reduced significantly. For $\big|\mathcal{X}^{(m)}\big| = 1$, the evaluation and comparison operations are performed $O(\kappa)$ times in the $m$th iteration. Considering $M$ iterations and $L$ subchannels, the complexity of the evaluation and comparison operations is $O(\kappa LM)$ with probability 1. In the previous analysis of the PIS decoding, the complexity of the complex multiplication operations is $O(\kappa LMR^{\kappa})$. Since the evaluation and comparison operations for a complex number are faster than a complex multiplication operation, the total complexity of the PIS decoding is $O(\kappa LMR^{\kappa})$ with probability 1.

REFERENCES

[1] L. J.
Cimini, “Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing,” IEEE Trans. Commun., vol. COM-33, no. 7, pp. 665-
IEEE Commun. Mag., vol. 47, no. 4, pp. 44-51, Apr. 2009.
[3] T. Hwang, C. Yang, G. Wu, S. Li, and G. Y. Li, “OFDM and its wireless applications: a survey,” IEEE Trans. Veh. Technol., vol. 58, no. 4, pp. 1673-
IEEE Trans. Commun., vol. 53, no. 3, pp. 391-
IEEE Signal Process. Mag., vol. 25, no. 5, pp. 37-56, Sep. 2008.
[6] D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson, “Frequency domain equalization for single-carrier broadband wireless systems,” IEEE Commun. Mag., vol. 40, no. 4, pp. 58-66, Apr. 2002.
[7] X.-G. Xia, “Precoded and vector OFDM robust to channel spectral nulls and with reduced cyclic prefix length in single transmit antenna systems,” IEEE Trans. Commun., vol. 49, no. 8, pp. 1363-
IEEE Trans. Signal Process., vol. 60, no. 10, pp. 5268-
IEEE Trans. Mobile Comput., vol. 1, no. 2, pp. 132-
[10] H. Zhang, X.-G. Xia, L. J. Cimini, and P. C. Ching, “Synchronization techniques and guard-band-configuration scheme for single-antenna vector-OFDM systems,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2454-
IEEE Trans. Veh. Technol., vol. 55, no. 4, pp. 1447-
IEEE Trans. Commun., vol. 58, no. 3, pp. 828-
IEEE Trans. Commun., vol. 62, no. 6, pp. 1931-
IEEE Trans. Commun., vol. 59, no. 7, pp. 1878-
IEEE Trans. Signal Process., vol. 62, no. 23, pp. 6143-
IEEE J. Select Areas Commun., vol. 20, no. 9, pp. 1613-
Proc. IEEE, vol. 83, no. 6, pp. 958-
IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1708-
IEEE Trans. Commun., vol. 50, no. 3, pp. 374-
IEEE Signal Process. Lett., vol. 12, no. 1, pp. 52-55, Jan. 2005.
[21] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: a new approach to estimating sparse multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058-
IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6619-
IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2806-
IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2819-
IEEE Trans. Signal Process., vol. 62, no. 9, pp. 2212-
IEEE Trans. Signal Process., vol. 62, no. 14, pp. 3591-
Proc. ACM-SIAM Symposium on Discrete Algorithms, pp. 1183-
Proc. ACM Symposium on Theory of Computing, pp. 563-
IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1073-
IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744-
IEEE J. Sel. Areas Commun., vol. 30, no. 9, pp. 1623-
IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2331-