Cross Domain Iterative Detection for Orthogonal Time Frequency Space Modulation

Abstract

Recently proposed orthogonal time frequency space (OTFS) modulation has been considered as a promising candidate for accommodating various emerging communication and sensing applications in high-mobility environments. In this paper, we propose a novel cross domain iterative detection algorithm to enhance the error performance of OTFS modulation. Different from conventional OTFS detection methods, the proposed algorithm applies basic estimation/detection approaches to both the time domain and delay-Doppler (DD) domain and iteratively updates the extrinsic information from two domains with the unitary transformation. In doing so, the proposed algorithm exploits the time domain channel sparsity and the DD domain symbol constellation constraints. We evaluate the estimation/detection error variance in each domain for each iteration and derive the state evolution to investigate the detection error performance. We show that the performance gain due to iterations comes from the non-Gaussian constellation constraint in the DD domain. More importantly, we prove the proposed algorithm can indeed converge and, in the convergence, the proposed algorithm can achieve almost the same error performance as the maximum-likelihood sequence detection even in the presence of fractional Doppler shifts. Furthermore, the computational complexity associated with the domain transformation is low, thanks to the structure of the discrete Fourier transform (DFT) kernel. Simulation results are consistent with our analysis and demonstrate a significant performance improvement compared to conventional OTFS detection methods.


arXiv preprint [cs.IT]

Shuangyang Li, Student Member, IEEE, Weijie Yuan, Member, IEEE, Zhiqiang Wei, Member, IEEE, and Jinhong Yuan, Fellow, IEEE


Index Terms

Orthogonal time frequency space, reduced-complexity detection, cross domain detection, state evolution, performance analysis.

January 12, 2021 DRAFT

I. INTRODUCTION

Orthogonal time frequency space (OTFS) modulation [1] has attracted substantial attention recently, due to its robust performance in high-mobility environments. In particular, beyond the fifth-generation (B5G) wireless communication systems are required to accommodate various emerging applications in high-mobility environments, such as mobile communications on board aircraft (MCA), low-earth-orbit satellites (LEOSs), high speed trains, and unmanned aerial vehicles (UAVs) [2], [3]. In those cases, the performance of currently deployed orthogonal frequency division multiplexing (OFDM) modulation may degrade because the significant Doppler spread induced by the high mobility can severely undermine the orthogonality between subcarriers. Therefore, OTFS modulation has been recognized as a potential solution to supporting heterogeneous requirements of B5G wireless systems in high-mobility scenarios [1].

Different from the conventional OFDM modulation, OTFS modulation represents the signal in the delay-Doppler (DD) domain, where the channel responses are relatively sparse and robust, instead of the time-frequency (TF) domain [1]. The DD domain channel property gives rise to the symbol placement in the DD domain, which allows the information symbols to interact directly with the DD domain channel response, resulting in a much simpler input-output relationship compared to that of the OFDM modulation in high-mobility channels [4]. Furthermore, by invoking the two-dimensional (2D) inverse symplectic finite Fourier transform (ISFFT), each DD domain symbol is spread over the whole TF domain and thus in principle experiences the full fluctuation of the TF channel over an OTFS frame.
Therefore, OTFS modulation offers the potential of achieving the full channel diversity in a high-mobility environment [4], [5].

Although OTFS modulation shows many advantages over the conventional OFDM modulation, it requires complex detection algorithms in order to achieve the potential full channel diversity. With the help of a cyclic prefix (CP), a single-tap frequency domain equalization is usually sufficient for OFDM modulation, where the intersymbol interference (ISI) due to the multipath effect can be largely mitigated at a cost of the signaling overhead [6]. In contrast, the DD domain channel property enables a reduced CP frame structure for OTFS modulation, where the interference due to the channel impairments has to be equalized via advanced algorithms [7]. Many existing studies have focused on low-complexity detection for OTFS modulation. In [8], the authors developed an iterative receiver based on the classic message passing algorithm (MPA), where the interference from other information symbols is treated as Gaussian variables to reduce the detection complexity. However, due to the short cycles in the probabilistic graphical model, MPA may fail to converge and result in performance degradation. To solve this issue, the authors of [9] proposed a convergence guaranteed receiver based on the variational Bayes framework. An approximate message passing (AMP)-based approach was developed in [10], which can be efficiently implemented and can obtain the minimum mean square error (MMSE) detection performance with a reduced complexity. We notice that most of the existing works on OTFS detection take advantage of the DD domain channel sparsity to reduce the detection complexity. However, when an OTFS frame duration is not sufficiently long, the resultant DD domain effective channel can be dense due to the insufficient resolution of the Doppler frequency, i.e., fractional Doppler [8]. In such a case, conventional detection methods may experience a very high detection complexity since the channel sparsity no longer holds. This fact motivates us to consider OTFS detection based on different domains.

In this paper, we propose a novel cross domain iterative detection algorithm for OTFS modulation, where the extrinsic information is passed between the time domain and DD domain via the corresponding unitary transformations. This special detection structure is motivated by the connection between orthogonality and message passing, based on the view from the orthogonal approximate message passing (OAMP) algorithm proposed by Ma and Li [11]. In particular, the rationale behind our work is that the unitary transformation between the time domain and DD domain allows the detection/estimation errors in one domain to be principally orthogonal to the detection/estimation errors in the other domain, which can suppress the correlation between the input errors and the output errors for each domain detection/estimation [11], [12]. Therefore, the detection/estimation instability due to the positive feedback effect, which is usually caused by the correlation between the input and output errors in iterative receivers, can be greatly reduced during iterations, and this stability in turn improves the error performance [11], [12].
Specifically, we apply the conventional linear minimum mean squared error (L-MMSE) estimator for equalization in the time domain, while adopting a simple symbol-by-symbol detection algorithm in the DD domain. Interestingly, by combining these two basic methods, the proposed algorithm shows promising error performance even in very severe and complex fractional Doppler cases. Note that a conventional iterative receiver improves the error performance by iteratively exchanging the extrinsic information between two disjoint components, such as the detector and decoder, via interleaving/de-interleaving. In contrast, our proposed iterative algorithm exchanges the extrinsic information within the same component (detector) but between two orthogonal domains via unitary transformation. In other words, we separate the OTFS detection problem into two parts, corresponding to the time domain (performing de-correlation) and the DD domain (performing de-noising), and iteratively exchange the extrinsic information via unitary transformation. In particular, we provide a detailed proof that explains the advantage of the proposed algorithm and briefly discuss its detection complexity. The main contributions of this paper are summarized as follows.

• Based on the time domain sparsity, we propose a cross domain iterative detection algorithm that works in both the time domain and DD domain. Furthermore, according to the unitary property of the domain transformation, the details of cross domain message passing are discussed. We also prove that the symbol-by-symbol DD domain detection cannot provide any extrinsic information itself and therefore the extrinsic information needs to be calculated in the time domain.

• We provide theoretical analysis for the MMSE performance of the proposed algorithm based on the state evolution [12]. In particular, we derive the average MMSE per iteration and prove that the proposed algorithm can converge after a few iterations. Furthermore, we show that the average MMSEs for the time domain and the DD domain share the same value at convergence and that the error performance improvement brought by the cross domain message passing originates from the non-Gaussian distribution of practical DD domain signal constellations.

• We investigate the effective DD domain signal-to-noise ratio (SNR) in order to study the error performance at convergence. We prove that the corresponding effective SNR can approach the maximum receiver SNR for a given fading channel, i.e., almost all the energy from different paths is collected and coherently combined, which indicates that the proposed algorithm can theoretically approach the error performance of the optimal maximum-likelihood sequence estimation (MLSE) detection. On the other hand, we show that the computational complexity of the proposed algorithm is much lower than that of the MLSE detection thanks to the efficient implementation based on the discrete Fourier transform (DFT) kernel for cross domain message passing between the time domain and DD domain. We also show that the overall detection complexity of the proposed algorithm does not increase in the presence of fractional Doppler.

• The error performance of the proposed algorithm is evaluated by numerical simulations. Simulation results agree with our theoretical analysis and demonstrate a significant performance improvement compared to conventional OTFS detection methods.

The rest of this paper is organized as follows. We provide a brief overview and the system model of OTFS modulation in Section II. In Section III, the proposed cross domain iterative detection algorithm is presented. The performance analysis of the proposed algorithm is given in Section IV and finally a summary is provided in Section V.

Notations: The blackboard bold letter $\mathbb{A}$ denotes an energy normalized constellation set, whose size is $X(\mathbb{A})$; the blackboard bold letter $\mathbb{C}$ denotes the complex number field; boldface capital and lower-case letters denote a matrix and a vector, respectively; the blackboard bold letter $\mathbb{E}[\cdot]$ denotes the expectation operator; the notations $(\cdot)^{\mathrm{T}}$, $(\cdot)^{*}$, $\|\cdot\|$, $(\cdot)^{-1}$, and $(\cdot)^{\mathrm{H}}$ denote the transpose, the conjugate, the Euclidean norm, the inverse, and the Hermitian operations for a matrix, respectively; $\mathbf{F}_N$ and $\mathbf{I}_M$ denote the normalized discrete Fourier transform (DFT) matrix of size $N \times N$ and the identity matrix of size $M \times M$, respectively; "$\otimes$" denotes the Kronecker product operator; $\mathrm{vec}(\cdot)$ denotes the vectorization operation; $\mathrm{Tr}(\cdot)$ denotes the trace operation; $\Pr(\cdot)$ denotes the probability of an event; $\propto$ indicates that both sides of the equation are equal up to a multiplicative constant; $f(x)$ denotes a function of $x$ and its second-order derivative with respect to $x$ is denoted by $f''(x)$; the big-O notation $O(\cdot)$ asymptotically describes the order of computational complexity.

[Fig. 1. An OTFS system model. The block diagram shows OTFS modulation (ISFFT, IFFT, transmit pulse $g_{\mathrm{tx}}(t)$), the channel, and OTFS demodulation (receive pulse $g_{\mathrm{rx}}(t)$, FFT, SFFT, detection), with the DD domain, TF domain, and time domain signals marked.]

II. SYSTEM MODEL

Without loss of generality, the considered OTFS system diagram is given in Fig. 1. Let $N$ be the number of time slots and $M$ be the number of sub-carriers for each OTFS frame. Let $\mathbf{u}$ be the information bit sequence that is modulated into the DD domain transmitted symbol vector $\mathbf{x}$ of length $MN$, i.e., $\mathbf{x} \in \mathbb{A}^{MN}$. Specifically, $\mathbf{x}$ can be arranged as a two-dimensional (2D) matrix $\mathbf{X} \in \mathbb{A}^{M \times N}$, i.e., $\mathbf{x} \triangleq \mathrm{vec}(\mathbf{X})$, and the $(k,l)$-th element $x[k,l]$ of $\mathbf{X}$ is the DD domain transmitted symbol in the $k$-th Doppler and $l$-th delay grid [1], for $0 \le k \le N-1$ and $0 \le l \le M-1$. Then, we can obtain the TF domain transmitted symbol $X[n,m]$, $0 \le n \le N-1$ and $0 \le m \le M-1$, from $\mathbf{X}$ via the ISFFT [1], i.e.,

$$X[n,m] = \frac{1}{\sqrt{NM}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} x[k,l]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{1}$$
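As a rough numerical illustration of (1), the sketch below evaluates the ISFFT with two unitary FFTs and compares it against the literal double sum. The array layout (`X_dd[l, k]`, rows for delay and columns for Doppler), the function names, and the QPSK test grid are assumptions made for this example only.

```python
import numpy as np

def isfft(X_dd):
    """ISFFT of an M x N grid stored as X_dd[l, k] (rows: delay, columns: Doppler)."""
    # Unitary DFT along delay (l -> m) gives the e^{-j2*pi*ml/M} factor in (1).
    along_delay = np.fft.fft(X_dd, axis=0, norm="ortho")
    # Unitary inverse DFT along Doppler (k -> n) gives the e^{+j2*pi*nk/N} factor.
    return np.fft.ifft(along_delay, axis=1, norm="ortho")

def isfft_direct(X_dd):
    """Literal double sum of (1); slow, kept only as a reference for small grids."""
    M, N = X_dd.shape
    X_tf = np.zeros((M, N), dtype=complex)
    for n in range(N):
        for m in range(M):
            for k in range(N):
                for l in range(M):
                    phase = 2j * np.pi * (n * k / N - m * l / M)
                    X_tf[m, n] += X_dd[l, k] * np.exp(phase)
    return X_tf / np.sqrt(N * M)

rng = np.random.default_rng(0)
M, N = 4, 3
# Unit-energy QPSK symbols on the delay-Doppler grid (illustrative choice).
X_dd = (rng.choice([-1, 1], (M, N)) + 1j * rng.choice([-1, 1], (M, N))) / np.sqrt(2)
X_tf = isfft(X_dd)
print(np.allclose(X_tf, isfft_direct(X_dd)))
```

Because both FFTs are unitary, the transform preserves the total symbol energy, which is one way to sanity-check an implementation.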

With the TF domain transmitted symbols, the OTFS signal $s(t)$ can be produced by a conventional OFDM modulator, i.e.,

$$s(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} X[n,m]\, g_{\mathrm{tx}}(t - nT)\, e^{j2\pi m \Delta f (t - nT)}, \tag{2}$$

where $\Delta f$ is the frequency spacing between adjacent sub-carriers and $g_{\mathrm{tx}}(t)$ is the transmitter shaping pulse.

We consider the OTFS signal transmitted over a time-varying channel, whose response is fully characterized by its DD domain representation [1], i.e.,

$$h(\tau, \nu) = \sum_{i=1}^{P} h_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i). \tag{3}$$

In (3), $P$ is the number of paths, and $h_i$, $\tau_i$, and $\nu_i$ are the path gain, delay, and Doppler shift corresponding to the $i$-th path, respectively. Specifically, we denote by $l_i$ and $k_i$ the indices of delay and Doppler, respectively, where we have

$$\tau_i = \frac{l_i}{M \Delta f}, \quad \text{and} \quad \nu_i = \frac{k_i + \kappa_i}{NT}. \tag{4}$$

Note that the term $-\frac{1}{2} \le \kappa_i \le \frac{1}{2}$ denotes the fractional Doppler, which corresponds to the fractional shift from the nearest Doppler grid [8]. On the other hand, since the typical value of the sampling time $\frac{1}{M \Delta f}$ in the delay domain is usually sufficiently small, the impact of fractional delays in typical wide-band systems can be neglected [13].

Let us turn our attention to the receiver side. Let $w(t)$ be the additive white Gaussian noise (AWGN) process with one-sided power spectral density (PSD) $N_0$. The received signal can then be written as

$$r(t) = \int\!\!\int h(\tau, \nu)\, s(t - \tau)\, e^{j2\pi\nu(t - \tau)}\, d\tau\, d\nu + w(t). \tag{5}$$

Let $g_{\mathrm{rx}}(t)$ be the filter adopted at the receiver side. The received symbols $Y[n,m]$ in the TF domain are then obtained by

$$Y[n,m] = \int r(t)\, g_{\mathrm{rx}}^{*}(t - nT)\, e^{-j2\pi m \Delta f (t - nT)}\, dt. \tag{6}$$

By performing the SFFT on $Y[n,m]$, we can obtain the DD domain received symbols as

$$y[k,l] = \frac{1}{\sqrt{NM}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} Y[n,m]\, e^{-j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)} + \tilde{w}[k,l], \tag{7}$$

where $\tilde{w}[k,l]$ denotes the equivalent AWGN samples in the DD domain.
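To make the grid relation in (4) concrete, the following sketch converts physical delays and Doppler shifts into a delay index $l_i$, an integer Doppler index $k_i$, and a fractional part $\kappa_i$. The numerology (15 kHz spacing, $T = 1/\Delta f$, $M = 512$, $N = 128$) and the path values are hypothetical and are not taken from the paper.

```python
import numpy as np

def delay_doppler_indices(tau, nu, M, N, delta_f, T):
    """Split physical delays and Doppler shifts into grid indices following (4)."""
    tau = np.asarray(tau, dtype=float)
    nu = np.asarray(nu, dtype=float)
    l = tau * M * delta_f            # delay index l_i (assumed on-grid in the text)
    k_total = nu * N * T             # k_i + kappa_i
    k = np.rint(k_total)             # nearest Doppler grid point
    kappa = k_total - k              # fractional part, within [-1/2, 1/2]
    return l, k.astype(int), kappa

# Hypothetical numerology, chosen only for illustration.
delta_f = 15e3
T = 1.0 / delta_f
M, N = 512, 128
tau = np.array([0.0, 16 / (M * delta_f)])   # seconds
nu = np.array([-500.0, 1000.0])             # Hz
l, k, kappa = delay_doppler_indices(tau, nu, M, N, delta_f, T)
print(l, k, kappa)
```

With a Doppler resolution of $1/(NT)$, a shift that is not a multiple of the resolution leaves a nonzero $\kappa_i$, which is the fractional Doppler case discussed above.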

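For ease of derivation, we are interested in the vector form representation of the input-output relationship of the OTFS system in the corresponding domains. We use $\mathbf{X} \in \mathbb{A}^{M \times N}$ and $\mathbf{Y} \in \mathbb{C}^{M \times N}$ to denote the DD domain transmitted symbol matrix and received symbol matrix, respectively. Furthermore, we use $\mathbf{X}_{\mathrm{TF}} \in \mathbb{C}^{M \times N}$ and $\mathbf{Y}_{\mathrm{TF}} \in \mathbb{C}^{M \times N}$ to denote the TF domain transmitted symbol matrix and received symbol matrix, respectively. Similarly, the time domain transmitted symbol matrix and received symbol matrix are respectively denoted by $\mathbf{Z} \in \mathbb{C}^{M \times N}$ and $\mathbf{R} \in \mathbb{C}^{M \times N}$. According to [14], two normalized DFT matrices $\mathbf{F}_M$ and $\mathbf{F}_N$ of size $M \times M$ and $N \times N$ can be used to characterize the ISFFT in (1). Thus, we have

$$\mathbf{X}_{\mathrm{TF}} = \mathbf{F}_M \mathbf{X} \mathbf{F}_N^{\mathrm{H}}. \tag{8}$$

Then, by considering the rectangular pulse for the transmitter shaping pulse, the time domain transmitted symbol matrix can be obtained by [14]

$$\mathbf{Z} = \mathbf{I}_M \mathbf{F}_M^{\mathrm{H}} \mathbf{X}_{\mathrm{TF}} = \mathbf{X} \mathbf{F}_N^{\mathrm{H}}. \tag{9}$$

According to (8) and (9), we can obtain the corresponding vector forms of the transmitted symbols in the DD, TF, and time domains, i.e.,

$$\mathbf{x} \triangleq \mathrm{vec}(\mathbf{X}), \tag{10}$$

$$\mathbf{x}_{\mathrm{TF}} \triangleq \mathrm{vec}(\mathbf{X}_{\mathrm{TF}}) = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{F}_M\right) \mathbf{x}, \quad \text{and} \tag{11}$$

$$\mathbf{z} \triangleq \mathrm{vec}(\mathbf{Z}) = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{x}, \tag{12}$$

respectively. In particular, corresponding to the time domain transmitted symbol vector $\mathbf{z}$, the time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ with a reduced CP frame format can be given by [14]

$$\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} = \sum_{i=1}^{P} h_i\, \mathbf{\Pi}^{l_i} \mathbf{\Delta}^{k_i + \kappa_i}, \tag{13}$$

where $\mathbf{\Pi}$ is the permutation matrix (forward cyclic shift), i.e.,

$$\mathbf{\Pi} = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 1 & \ddots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}_{MN \times MN}, \tag{14}$$

and $\mathbf{\Delta} = \mathrm{diag}\{\alpha^{0}, \alpha^{1}, \ldots, \alpha^{MN-1}\}$ is a diagonal matrix with $\alpha \triangleq e^{\frac{j2\pi}{MN}}$ [14]. Thus, the received time domain symbol vector $\mathbf{r}$ is given by

$$\mathbf{r} = \mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{z} + \mathbf{w}, \tag{15}$$

where $\mathbf{w}$ is the corresponding AWGN sample vector in the time domain. Specifically, $\mathbf{r}$ can be rearranged into the 2D time domain received symbol matrix $\mathbf{R}$.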
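The structure of (13) suggests that the time domain channel can be applied as a per-path phase rotation followed by a cyclic shift, without forming the $MN \times MN$ matrix. The sketch below illustrates this and checks it against a literal construction of (13) and (14). The path parameters and function names are illustrative assumptions, and the fractional power $\mathbf{\Delta}^{k_i + \kappa_i}$ is interpreted as the diagonal matrix with entries $e^{j2\pi n (k_i + \kappa_i)/(MN)}$.

```python
import numpy as np

def apply_time_channel(z, gains, delays, dopplers):
    """Compute H_T^eff z from (13) without forming the MN x MN matrix."""
    MN = z.size
    n = np.arange(MN)
    r = np.zeros(MN, dtype=complex)
    for h, l, k in zip(gains, delays, dopplers):
        # Delta^{k}: multiply entry n by exp(j*2*pi*n*k/MN), with k = k_i + kappa_i.
        rotated = np.exp(2j * np.pi * n * k / MN) * z
        # Pi^{l}: forward cyclic shift by l samples.
        r += h * np.roll(rotated, l)
    return r

def dense_time_channel(MN, gains, delays, dopplers):
    """Literal form of (13)-(14), used here only to check the fast version."""
    n = np.arange(MN)
    Pi = np.roll(np.eye(MN), 1, axis=0)          # forward cyclic shift matrix
    H = np.zeros((MN, MN), dtype=complex)
    for h, l, k in zip(gains, delays, dopplers):
        Delta_k = np.diag(np.exp(2j * np.pi * n * k / MN))
        H += h * np.linalg.matrix_power(Pi, l) @ Delta_k
    return H

rng = np.random.default_rng(1)
M, N = 8, 4
gains = np.array([0.8, 0.5 - 0.3j, 0.2j])        # illustrative path gains
delays = [0, 1, 3]                                # l_i
dopplers = [0.3, -1.2, 2.0]                       # k_i + kappa_i
z = (rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)) / np.sqrt(2)
N0 = 0.01                                         # assumed per-sample noise variance
w = np.sqrt(N0 / 2) * (rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N))
r = apply_time_channel(z, gains, delays, dopplers) + w   # received vector as in (15)
H_T = dense_time_channel(M * N, gains, delays, dopplers)
print(np.allclose(H_T @ z, apply_time_channel(z, gains, delays, dopplers)))
```

Each path contributes a single entry per row and column of the dense matrix, so the cost of the matrix-free form grows with $P \cdot MN$ rather than $(MN)^2$.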
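By applying the rectangular pulse as the receiver filtering pulse, we can obtain the corresponding TF domain and DD domain received symbol matrices as

$$\mathbf{Y}_{\mathrm{TF}} = \mathbf{F}_M \mathbf{I}_M \mathbf{R} = \mathbf{F}_M \mathbf{R}, \quad \mathbf{Y} = \mathbf{F}_M^{\mathrm{H}} \mathbf{Y}_{\mathrm{TF}} \mathbf{F}_N = \mathbf{R} \mathbf{F}_N. \tag{16}$$

Therefore, we can derive the corresponding vector forms of the received symbols in the TF domain and DD domain by

$$\mathbf{y}_{\mathrm{TF}} \triangleq \mathrm{vec}(\mathbf{Y}_{\mathrm{TF}}) = \left(\mathbf{I}_N \otimes \mathbf{F}_M\right) \mathbf{r}, \tag{17}$$

$$\mathbf{y} \triangleq \mathrm{vec}(\mathbf{Y}) = \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{r}. \tag{18}$$

Based on the previous analysis, we are ready to present the input-output relationship of OTFS modulation with respect to different domains. Let us denote the effective channel matrices in the TF domain and DD domain by $\mathbf{H}_{\mathrm{TF}}^{\mathrm{eff}}$ and $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$, respectively. Specifically, based on (11), (13), and (17), we have

$$\mathbf{H}_{\mathrm{TF}}^{\mathrm{eff}} = \sum_{i=1}^{P} h_i \left(\mathbf{I}_N \otimes \mathbf{F}_M\right) \mathbf{\Pi}^{l_i} \mathbf{\Delta}^{k_i + \kappa_i} \left(\mathbf{I}_N \otimes \mathbf{F}_M^{\mathrm{H}}\right). \tag{19}$$

Similarly, based on (10), (13), and (18), we have

$$\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}} = \sum_{i=1}^{P} h_i \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{\Pi}^{l_i} \mathbf{\Delta}^{k_i + \kappa_i} \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right). \tag{20}$$

It can be shown that with fractional Doppler shifts, the TF domain and DD domain effective channel matrices $\mathbf{H}_{\mathrm{TF}}^{\mathrm{eff}}$ and $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$ in (19) and (20) can be very dense. However, the time domain effective channel matrix $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ in (13) remains sparse. Specifically, there are at most $P$ non-zero entries in each row and column of $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$. For a better illustration, we show an example of $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$ and $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ in the presence of fractional Doppler in Fig. 2, where the properties of $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$ and $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ are clearly illustrated. Based on the properties of the effective channel matrices, we are motivated to consider the detection based on the effective time domain channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ as given by (15), instead of $\mathbf{H}_{\mathrm{TF}}^{\mathrm{eff}}$ and $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$, in the presence of fractional Doppler. (For reference, the effect of fractional Doppler can be largely mitigated by the window design [15].) For notational brevity, we henceforth use $z[k]$ to denote the $k$-th entry in $\mathbf{z}$, where $0 \le k \le MN - 1$. Our proposed detection method is presented in the next section.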
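The sparsity contrast between (13) and (20) can be checked numerically on a small grid. The sketch below builds the time domain channel entry by entry, maps it to the DD domain with the explicit Kronecker product, and counts the significant entries per row for integer and fractional Doppler shifts. The grid size, path parameters, and the magnitude threshold are arbitrary choices for illustration.

```python
import numpy as np

def time_channel(M, N, gains, delays, dopplers):
    """Time domain effective channel (13), filled in entry by entry."""
    MN = M * N
    n = np.arange(MN)
    H = np.zeros((MN, MN), dtype=complex)
    for h, l, k in zip(gains, delays, dopplers):
        # Pi^l Delta^k maps input sample n to output sample (n + l) mod MN.
        H[(n + l) % MN, n] += h * np.exp(2j * np.pi * n * k / MN)
    return H

def dd_channel(H_T, M, N):
    """DD domain effective channel (20): (F_N kron I_M) H_T (F_N^H kron I_M)."""
    F_N = np.fft.fft(np.eye(N), norm="ortho")     # normalized DFT matrix
    U = np.kron(F_N, np.eye(M))
    return U @ H_T @ U.conj().T

def max_row_support(H, tol=1e-9):
    """Largest number of entries per row whose magnitude exceeds tol."""
    return int((np.abs(H) > tol).sum(axis=1).max())

M, N = 8, 4
gains = [0.8, 0.5 - 0.3j, 0.2j]                   # illustrative values
delays = [0, 1, 3]
integer_dopplers = [0, -1, 2]
fractional_dopplers = [0.3, -1.2, 2.4]

for name, dopplers in [("integer", integer_dopplers),
                       ("fractional", fractional_dopplers)]:
    H_T = time_channel(M, N, gains, delays, dopplers)
    H_DD = dd_channel(H_T, M, N)
    print(name, max_row_support(H_T), max_row_support(H_DD))
```

In this toy setting the time domain matrix keeps at most $P$ entries per row in both cases, while the DD domain matrix only stays that sparse when the Doppler shifts fall on the grid.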

[Fig. 2. Equivalent channel matrices for the DD domain and the time domain, where $P = 5$ (the specific channel coefficients, delay indices, and Doppler indices are garbled in this copy). Panel (a): DD domain effective channel matrix $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$. Panel (b): time domain effective channel matrix $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$. Both axes show the symbol index. The lines in the figures indicate the magnitudes of the corresponding matrix elements.]

[Fig. 3. The diagram of the proposed OTFS detector. Module A performs the time domain estimation of $\mathbf{z}$ from $\mathbf{r}$ and $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$; module B performs the symbol-by-symbol detection of $\mathbf{x}$ in the DD domain. The two modules are connected through the "Ext" blocks and the transformations $\mathbf{F}_N \otimes \mathbf{I}_M$ and $\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$.]

III. CROSS DOMAIN ITERATIVE DETECTION FOR OTFS MODULATION

Notice that information symbols are multiplexed in the DD domain while the channel sparsity holds in the time domain. Therefore, we propose an iterative detector which performs de-correlation in the time domain to eliminate the effects of fading, multi-path, and Doppler, while performing de-noising in the DD domain to reduce the inaccuracy due to the noise. In particular, we assume that the entries in $\mathbf{x}$ independently take values from a normalized constellation set $\mathbb{A}$ with equal probabilities, and thus we have $\mathbb{E}\left[\mathbf{x}\mathbf{x}^{\mathrm{H}}\right] = \mathbf{I}_{MN}$. Then, it can be shown that the entries in the time domain OTFS signal vector $\mathbf{z}$ are also mutually uncorrelated with unit variance, i.e.,

$$\mathbb{E}\left[\mathbf{z}\mathbf{z}^{\mathrm{H}}\right] = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbb{E}\left[\mathbf{x}\mathbf{x}^{\mathrm{H}}\right] \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) = \mathbf{I}_{MN}. \tag{21}$$

Notice that the time domain OTFS signal usually behaves like a Gaussian variable due to the spreading effect of the ISFFT. Therefore, we assume that the entries in $\mathbf{z}$ are independent and identically distributed (i.i.d.) Gaussian variables with unit variance according to (21). We consider a detection structure that consists of two individual modules corresponding to the time domain and the DD domain, namely, module A and module B, as shown in Fig. 3. Specifically, module A estimates the time domain OTFS signal $\mathbf{z}$ and passes the estimates to module B for the detection of the DD domain OTFS symbols $\mathbf{x}$. Module B, in turn, carries out a simple symbol-by-symbol detection of the DD domain transmitted symbol vector $\mathbf{x}$ according to the estimates of the time domain OTFS signal $\mathbf{z}$. In Fig. 3, "Ext" denotes the calculation of extrinsic information, while "$\mathbf{F}_N \otimes \mathbf{I}_M$" and "$\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$" denote the corresponding unitary transformations from the time domain to the DD domain and from the DD domain to the time domain, respectively.
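For notational brevity, we use several notations denoting the a priori, a posteriori, and extrinsic information of the means (denoted by $\mathbf{m}$) and covariance matrices (denoted by $\mathbf{C}$) for the time domain OTFS signal vector $\mathbf{z}$ and the DD domain symbol vector $\mathbf{x}$, respectively, as shown in Table I. The details of the two modules and the cross domain message passing will be made explicit in the coming subsections.

TABLE I
NOTATIONS FOR PROPOSED ALGORITHM PARAMETERS

|   | Time domain: a priori | Time domain: a posteriori | Time domain: extrinsic (from) | DD domain: a priori | DD domain: a posteriori | DD domain: extrinsic (from) |
| $\mathbf{z}$ | $\mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}$, $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}}$, $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}$, $\mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{z}}^{a,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}$ | $\mathbf{m}_{\mathbf{z}}^{p,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{DD}}$ | $\mathbf{m}_{\mathbf{z}}^{e,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{z}}^{e,\mathrm{DD}}$ |
| $\mathbf{x}$ | $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{T}}$, $\mathbf{C}_{\mathbf{x}}^{a,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{x}}^{p,\mathrm{T}}$, $\mathbf{C}_{\mathbf{x}}^{p,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{x}}^{e,\mathrm{T}}$, $\mathbf{C}_{\mathbf{x}}^{e,\mathrm{T}}$ | $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{x}}^{a,\mathrm{DD}}$ | $\mathbf{m}_{\mathbf{x}}^{p,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{x}}^{p,\mathrm{DD}}$ | $\mathbf{m}_{\mathbf{x}}^{e,\mathrm{DD}}$, $\mathbf{C}_{\mathbf{x}}^{e,\mathrm{DD}}$ |

A. Module A: L-MMSE Estimator for Time Domain OTFS Signal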

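To estimate the time domain OTFS signal $\mathbf{z}$, we apply the conventional L-MMSE estimator in module A. Specifically, the estimation is based on the time domain received symbol vector $\mathbf{r}$ and the time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ with the aid of the a priori information, $\mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}$ and $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}$, that is fed back from module B. It should be noted that $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}$ is a diagonal matrix due to the i.i.d. assumption and it is initialized as $\mathbf{I}_{MN}$ for the first iteration. Therefore, it is straightforward to obtain the L-MMSE estimation matrix $\mathbf{W}_{\mathrm{MMSE}}$ as [16]

$$\mathbf{W}_{\mathrm{MMSE}} = \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} + N_0 \mathbf{I}_{MN}\right)^{-1}. \tag{22}$$

Furthermore, it can be shown that the a posteriori estimation output $\mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}}$ of $\mathbf{z}$ is given by

$$\mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}} = \mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}} + \mathbf{W}_{\mathrm{MMSE}} \left(\mathbf{r} - \mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}\right) = \mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}} + \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} + N_0 \mathbf{I}_{MN}\right)^{-1} \left(\mathbf{r} - \mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}\right), \tag{23}$$

and the a posteriori covariance matrix $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}$ associated with $\mathbf{z}$ is given by

$$\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}} = \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} - \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}} \left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} + N_0 \mathbf{I}_{MN}\right)^{-1} \mathbf{H}_{\mathrm{T}}^{\mathrm{eff}} \mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}. \tag{24}$$

It should be noted that the diagonal entries of $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}$ are the a posteriori MSEs of the estimates of $\mathbf{z}$ after the L-MMSE estimation, while the non-diagonal entries can be discarded (treated as zeros), because only the diagonal entries are of interest according to the i.i.d. assumption [12]. The details of the L-MMSE estimation are summarized in Algorithm 1.

Algorithm 1 L-MMSE Estimation for Time Domain OTFS Signal $\mathbf{z}$
Input: $\mathbf{r}$, $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$, $\mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}$, and $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}$
Steps:
  Compute the L-MMSE estimator matrix $\mathbf{W}_{\mathrm{MMSE}}$ by (22).
  Calculate the estimation output $\mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}}$ by (23).
  Calculate the MSE matrix $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}$ by (24).
  return $\mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}}$ and $\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}$.

B. Cross Domain Message Passing: from Time Domain to DD Domain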

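Based on the time domain L-MMSE estimation, we can obtain the a posteriori information of the time domain OTFS signal vector $\mathbf{z}$. However, in order to perform the iterative detection, it is important to pass the extrinsic information rather than the a posteriori information between the two modules. Let us define $\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}$ and $\mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}}$ as the extrinsic mean and covariance matrix from the L-MMSE estimation. Then, we have [16]

$$\mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}} = \left(\left(\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}\right)^{-1} - \left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}\right)^{-1}\right)^{-1}, \tag{25}$$

$$\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} = \mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}} \left(\left(\mathbf{C}_{\mathbf{z}}^{p,\mathrm{T}}\right)^{-1} \mathbf{m}_{\mathbf{z}}^{p,\mathrm{T}} - \left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{T}}\right)^{-1} \mathbf{m}_{\mathbf{z}}^{a,\mathrm{T}}\right). \tag{26}$$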
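The L-MMSE step in (22) to (24) and the extrinsic update in (25) and (26) can be sketched together as follows. This is a minimal illustration, not the authors' implementation: covariances are stored as vectors of diagonal entries (following the i.i.d. assumption), a dense random matrix stands in for $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$, and the function name `module_a` is hypothetical.

```python
import numpy as np

def module_a(r, H, m_a, v_a, N0):
    """L-MMSE estimate of z, (22)-(24), and its extrinsic message, (25)-(26).

    v_a is the diagonal of the a priori covariance C_z^{a,T}.
    """
    C_a = np.diag(v_a).astype(complex)
    A = C_a @ H.conj().T
    G = H @ A + N0 * np.eye(r.size)                  # H C_a H^H + N0 I
    m_p = m_a + A @ np.linalg.solve(G, r - H @ m_a)  # (23)
    C_p = C_a - A @ np.linalg.solve(G, H @ C_a)      # (24)
    v_p = np.real(np.diag(C_p))                      # keep the diagonal only
    v_e = 1.0 / (1.0 / v_p - 1.0 / v_a)              # (25)
    m_e = v_e * (m_p / v_p - m_a / v_a)              # (26)
    return m_p, v_p, m_e, v_e

rng = np.random.default_rng(2)
MN, N0 = 16, 0.1
# Dense random stand-in for the effective channel (assumption for this demo).
H = (rng.standard_normal((MN, MN)) + 1j * rng.standard_normal((MN, MN))) / np.sqrt(2 * MN)
z = (rng.standard_normal(MN) + 1j * rng.standard_normal(MN)) / np.sqrt(2)
w = np.sqrt(N0 / 2) * (rng.standard_normal(MN) + 1j * rng.standard_normal(MN))
r = H @ z + w
# First iteration: zero a priori mean and identity a priori covariance.
m_p, v_p, m_e, v_e = module_a(r, H, np.zeros(MN, dtype=complex), np.ones(MN), N0)
```

A useful sanity check is the scalar case with a unit channel: the extrinsic mean then reduces to the observation itself and the extrinsic variance to $N_0$, since the prior contribution is removed by (25) and (26).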

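According to our DD domain detection formulation, which will be discussed in the coming subsection, we need to obtain the DD domain a priori mean of $\mathbf{x}$, i.e., $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}$, and the DD domain a priori covariance matrix of $\mathbf{z}$, i.e., $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}$. Specifically, based on the time domain extrinsic mean of $\mathbf{z}$, we can obtain the time domain extrinsic mean of $\mathbf{x}$, which will then be forwarded to module B as the a priori information, i.e.,

$$\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}} = \mathbf{m}_{\mathbf{x}}^{e,\mathrm{T}} = \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}. \tag{27}$$

On the other hand, the extrinsic covariance matrix $\mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}}$ will be directly passed to module B as a priori information as well, i.e., $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}} = \mathbf{C}_{\mathbf{z}}^{e,\mathrm{T}}$. With $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}$ and $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}$ in hand, the DD domain detection can now be performed.

C. Module B: Symbol-by-symbol Detection for DD Domain Symbols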

By considering the relationship between the time domain OTFS signals $\mathbf{z}$ and the DD domain OTFS symbols $\mathbf{x}$ in (12), we can formulate the DD domain detection problem by

$$\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{x} + \hat{\mathbf{w}}, \tag{28}$$

where $\hat{\mathbf{w}}$ are white Gaussian noise samples with zero mean and the variance of the $k$-th entry in $\hat{\mathbf{w}}$ is $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}[k,k]$. The justification of (28) is necessary and it is discussed in Appendix A. According to the ML detection rule, the detection output $\hat{\mathbf{x}}$ should satisfy

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, \mathbf{x}\right), \tag{29}$$

where

$$\Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, \mathbf{x}\right) \propto \exp\left(-\left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}\right)^{-1} \left\|\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} - \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{x}\right\|^{2}\right). \tag{30}$$

In particular, the probability factorization based on the form of (30) is referred to as the Forney observation model, which was first introduced by Forney in [17]. The MLSE detection complexity of the Forney observation model based on (30) is exponential in the number of non-zero entries in each row of the matrix $\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$. However, as the DFT matrix $\mathbf{F}_N$ is a dense matrix, the detection complexity of the Forney observation model can be very high. Therefore, we consider a different probability factorization that is equivalent to (30) but only requires a linear detection complexity by taking advantage of the unitary property of the matrix $\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$.
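The unitary transformation $\mathbf{F}_N \otimes \mathbf{I}_M$ used in (27) and (28) does not need to be formed explicitly. As a sketch under the column-stacking convention of $\mathrm{vec}(\cdot)$, it can be applied with $M$ FFTs of size $N$; the function names below are hypothetical, and the explicit Kronecker product is built only to check the result on a small grid.

```python
import numpy as np

def time_to_dd(v, M, N):
    """Apply (F_N kron I_M) to a length-MN vector using M FFTs of size N."""
    # v = vec(V) stacks the columns of an M x N matrix, so v.reshape(N, M) is V^T.
    return np.fft.fft(v.reshape(N, M), axis=0, norm="ortho").reshape(-1)

def dd_to_time(v, M, N):
    """Apply (F_N^H kron I_M), the inverse of time_to_dd."""
    return np.fft.ifft(v.reshape(N, M), axis=0, norm="ortho").reshape(-1)

rng = np.random.default_rng(3)
M, N = 4, 6
m_e_z = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)
m_a_x = time_to_dd(m_e_z, M, N)                   # as in (27)

# Reference: the explicit Kronecker product, only practical for small MN.
F_N = np.fft.fft(np.eye(N), norm="ortho")
U = np.kron(F_N, np.eye(M))
print(np.allclose(m_a_x, U @ m_e_z))
print(np.allclose(dd_to_time(m_a_x, M, N), m_e_z))
```

The FFT-based form costs on the order of $MN \log N$ operations per transform, which is consistent with the low-complexity domain transformation claimed for the DFT kernel.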

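By expanding the norm operation and noticing that $\left(\mathbf{F}_N \otimes \mathbf{I}_M\right)\left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) = \mathbf{I}_{MN}$, (30) can be equivalently expressed by

$$\begin{aligned} \Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, \mathbf{x}\right) &\propto \exp\left(-\left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}\right)^{-1} \left(\left\|\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right\|^{2} - 2\Re\left\{\mathbf{x}^{\mathrm{H}} \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right\} + \mathbf{x}^{\mathrm{H}} \left(\mathbf{F}_N \otimes \mathbf{I}_M\right)\left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right) \mathbf{x}\right)\right) \\ &\propto \exp\left(\left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}\right)^{-1} \left(2\Re\left\{\mathbf{x}^{\mathrm{H}} \left(\mathbf{F}_N \otimes \mathbf{I}_M\right) \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right\} - \mathbf{x}^{\mathrm{H}} \mathbf{x}\right)\right) \\ &\propto \exp\left(\left(\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}\right)^{-1} \left(2\Re\left\{\mathbf{x}^{\mathrm{H}} \mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}\right\} - \mathbf{x}^{\mathrm{H}} \mathbf{x}\right)\right). \end{aligned} \tag{31}$$

In particular, the probability factorization in the form of (31) is referred to as the Ungerboeck observation model, which was first introduced by Ungerboeck in [18]. Both Forney and Ungerboeck observation models have been widely used for data detection, and OTFS detection based on the Forney and Ungerboeck observation models has also been considered in previous works [8], [19]. It should be noted that, since both $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}$ and $\mathbf{x}$ are vectors, (31) can be equivalently expressed in a symbol-by-symbol fashion, i.e.,

$$\Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, \mathbf{x}\right) = \prod_{k=0}^{MN-1} \Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, x[k]\right), \tag{32}$$

where

$$\Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, x[k]\right) = \Pr\left(m_{\mathbf{x}}^{a,\mathrm{DD}}[k] \,\middle|\, x[k]\right) \propto \exp\left(\frac{1}{\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}[k,k]} \left(2\Re\left\{x^{*}[k]\, m_{\mathbf{x}}^{a,\mathrm{DD}}[k]\right\} - \left|x[k]\right|^{2}\right)\right). \tag{33}$$

It can be observed from (33) that the optimal MLSE detection for DD domain symbols can be carried out in a simple symbol-by-symbol form, and the corresponding inputs for detection are the DD domain symbol estimates $\mathbf{m}_{\mathbf{x}}^{a,\mathrm{DD}}$ and the covariance matrix $\mathbf{C}_{\mathbf{z}}^{a,\mathrm{DD}}$ of the time domain OTFS signal $\mathbf{z}$.

With the i.i.d. assumption of $x[k]$, the a posteriori probability $\Pr\left(x[k] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right)$ is essentially the same as $\Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, x[k]\right)$ up to normalization, i.e., $\Pr\left(x[k] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right) \propto \Pr\left(\mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}} \,\middle|\, x[k]\right)$.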
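A minimal sketch of the symbol-by-symbol rule in (33) is given below. It evaluates the per-symbol metric for every constellation point, normalizes it into posterior probabilities, and takes hard decisions; it also forms the posterior mean and variance as constellation-weighted moments of those probabilities. The function name `module_b`, the QPSK constellation, and the noise level are assumptions for illustration.

```python
import numpy as np

def module_b(m_a_x, v_a, constellation):
    """Symbol-by-symbol detection based on (33).

    m_a_x: DD domain a priori means; v_a: diagonal of C_z^{a,DD};
    constellation: 1-D array holding the points of the set A.
    """
    a = constellation[None, :]
    metric = (2 * np.real(np.conj(a) * m_a_x[:, None]) - np.abs(a) ** 2) / v_a[:, None]
    metric -= metric.max(axis=1, keepdims=True)    # avoid overflow in exp
    prob = np.exp(metric)
    prob /= prob.sum(axis=1, keepdims=True)        # normalized posterior per symbol
    x_hat = constellation[prob.argmax(axis=1)]     # hard decisions
    m_p = prob @ constellation                     # posterior mean
    v_p = prob @ np.abs(constellation) ** 2 - np.abs(m_p) ** 2   # posterior variance
    return prob, x_hat, m_p, v_p

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(4)
x = rng.choice(qpsk, 12)
v_a = np.full(x.size, 0.05)
noise = np.sqrt(v_a / 2) * (rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size))
prob, x_hat, m_p, v_p = module_b(x + noise, v_a, qpsk)
```

The cost is linear in $MN$ and in the constellation size, in contrast to the exponential complexity of sequence detection under the Forney model in (30).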
Therefore, we can obtain the a posteriori mean $m_{\mathbf{x}}^{p,\mathrm{DD}}[k]$ of $x[k]$ by

$$m_{\mathbf{x}}^{p,\mathrm{DD}}[k] = \mathbb{E}\left[x[k] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right] = \sum_{i=1}^{X(\mathbb{A})} \Pr\left(x[k] = \mathbb{A}[i] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right) \times \mathbb{A}[i], \tag{34}$$

where $\mathbb{A}[i]$ is the $i$-th DD domain constellation point. Meanwhile, the a posteriori covariance matrix $\mathbf{C}_{\mathbf{x}}^{p,\mathrm{DD}}$ of $\mathbf{x}$ is a diagonal matrix due to the i.i.d. assumption, whose $k$-th element on the main diagonal is the a posteriori variance of $x[k]$ and is given by

$$\mathbf{C}_{\mathbf{x}}^{p,\mathrm{DD}}[k,k] = \mathbb{E}\left[\left|x[k] - \mathbb{E}\left[x[k] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right]\right|^{2}\right] = \sum_{i=1}^{X(\mathbb{A})} \Pr\left(x[k] = \mathbb{A}[i] \,\middle|\, \mathbf{m}_{\mathbf{z}}^{e,\mathrm{T}}\right) \times \left|\mathbb{A}[i]\right|^{2} - \left|m_{\mathbf{x}}^{p,\mathrm{DD}}[k]\right|^{2}. \tag{35}$$

Based on the a priori and a posteriori information, the extrinsic information of the DD domain detection is ready to be computed. However, we notice that the DD domain detection with (33) is a component-wise operation, and therefore, in principle, the DD domain detection cannot provide any extrinsic information. The following Proposition clarifies this problem.

Proposition 1 (Extrinsic information from the DD domain detection): The DD domain detection with (33) is a component-wise operation and thus it cannot provide any extrinsic information.

Proof: The proof is given in Appendix B.

According to Proposition 1, directly computing the extrinsic information based on the DD domain detection is not a good choice. Therefore, we first convert the a posteriori probability of the DD domain symbols $\mathbf{x}$ to the a posteriori probability of the time domain signal $\mathbf{z}$ and then compute the extrinsic information. Specifically, the a posteriori mean $\mathbf{m}_x^{p,\mathrm{DD}}$ and covariance matrix $\mathbf{C}_x^{p,\mathrm{DD}}$ serve as the outputs of module B. The details of how to compute the extrinsic information based on the outputs of module B are discussed in the coming subsection. With the above discussion, we summarize the symbol-by-symbol detection for DD domain symbols in Algorithm 2.

D. Cross Domain Message Passing: from DD Domain to Time Domain

Based on the description in the previous subsection, we now discuss the message passing from the DD domain to the time domain. According to (12), $\mathbf{m}_x^{p,\mathrm{DD}}$ and $\mathbf{C}_x^{p,\mathrm{DD}}$ are converted to the a posteriori mean $\mathbf{m}_z^{p,\mathrm{DD}}$ and covariance matrix $\mathbf{C}_z^{p,\mathrm{DD}}$ of the time domain OTFS signal $\mathbf{z}$ by

$$
\mathbf{m}_z^{p,\mathrm{DD}} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right)\mathbf{m}_x^{p,\mathrm{DD}},
\tag{36}
$$

$$
\mathbf{C}_z^{p,\mathrm{DD}} = \left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right)\mathbf{C}_x^{p,\mathrm{DD}}\left(\mathbf{F}_N \otimes \mathbf{I}_M\right).
\tag{37}
$$

Again, we note that $\mathbf{C}_z^{p,\mathrm{DD}}$ can be a dense matrix if the diagonal entries of $\mathbf{C}_x^{p,\mathrm{DD}}$ are not of the same value. In this case, we can discard the non-diagonal entries (treated as zeros), according to the i.i.d. assumption [12].

Algorithm 2: Symbol-by-symbol Detection for DD Domain Symbols
Input: $\mathbf{m}_x^{a,\mathrm{DD}}$, $\mathbf{C}_z^{a,\mathrm{DD}}$, and $\mathbb{A}$
Steps:
  for $k$ from $0$ to $MN-1$ do
    Calculate the a posteriori probability of $x[k]$ by (33).
    Make a hard decision on $x[k]$, denoted by $\hat{x}[k]$.
    Compute the a posteriori mean $m_x^{p,\mathrm{DD}}[k]$ of $x[k]$ by (34).
    Compute the a posteriori variance of $x[k]$ by (35).
  end for
  Compute $\mathbf{m}_x^{p,\mathrm{DD}}$ of $\mathbf{x}$, based on $m_x^{p,\mathrm{DD}}[k]$, $0 \le k \le MN-1$.
  Compute $\mathbf{C}_x^{p,\mathrm{DD}}$ of $\mathbf{x}$, based on $C_x^{p,\mathrm{DD}}[k,k]$, $0 \le k \le MN-1$.
  return $\mathbf{m}_x^{p,\mathrm{DD}}$, $\mathbf{C}_x^{p,\mathrm{DD}}$, and $\hat{\mathbf{x}}$.

Similar to (25) and (26), we can compute the extrinsic information of $\mathbf{z}$ in terms of the mean and covariance matrix, which are given by

$$
\mathbf{C}_z^{e,\mathrm{DD}} = \left(\left(\mathbf{C}_z^{p,\mathrm{DD}}\right)^{-1} - \left(\mathbf{C}_z^{a,\mathrm{DD}}\right)^{-1}\right)^{-1},
\tag{38}
$$

$$
\mathbf{m}_z^{e,\mathrm{DD}} = \mathbf{C}_z^{e,\mathrm{DD}}\left(\left(\mathbf{C}_z^{p,\mathrm{DD}}\right)^{-1}\mathbf{m}_z^{p,\mathrm{DD}} - \left(\mathbf{C}_z^{a,\mathrm{DD}}\right)^{-1}\mathbf{m}_z^{e,\mathrm{T}}\right).
\tag{39}
$$

Finally, the extrinsic information of $\mathbf{z}$ is fed back to module A as the a priori mean and covariance matrix for the next iteration, i.e., $\mathbf{m}_z^{a,\mathrm{T}} = \mathbf{m}_z^{e,\mathrm{DD}}$ and $\mathbf{C}_z^{a,\mathrm{T}} = \mathbf{C}_z^{e,\mathrm{DD}}$.

Based on the above discussion, we summarize the proposed OTFS detection in Algorithm 3, where $L_{\max}$ denotes the maximum number of iterations. For notational brevity, we drop the iteration index $l$ of the corresponding matrices in Algorithm 3. So far, we have introduced the proposed cross domain iterative detection. In the following section, we investigate its error performance and computational complexity.

IV. PERFORMANCE ANALYSIS

In this section, we investigate the asymptotic error performance of the proposed detection algorithm as $MN \to \infty$, together with its detection complexity. Since the proposed algorithm involves several iterations between two modules, we characterize the error performance by the recursion of two states corresponding to each module. Meanwhile, we also discuss the detection complexity corresponding to each module.


Algorithm 3: Cross Domain Iterative Detection for OTFS Modulation
Input: $\mathbf{r}$, $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$, $L_{\max}$, and $\mathbb{A}$
Initialization: Set $m_z^{a,\mathrm{T}}[k] = 0$ for $0 \le k \le MN-1$ and $\mathbf{C}_z^{a,\mathrm{T}} = \mathbf{I}_{MN}$.
Steps:
  for $l$ from $1$ to $L_{\max}$ do
    Perform the L-MMSE estimation for the time domain OTFS signal according to Algorithm 1.
    Compute $\mathbf{C}_z^{a,\mathrm{DD}}$ based on (25) and $\mathbf{m}_x^{a,\mathrm{DD}}$ based on (26).
    Perform symbol-by-symbol detection for DD domain symbols according to Algorithm 2.
    Compute $\mathbf{m}_z^{p,\mathrm{DD}}$ and $\mathbf{C}_z^{p,\mathrm{DD}}$ based on (36) and (37).
    Compute $\mathbf{C}_z^{a,\mathrm{T}}$ and $\mathbf{m}_z^{a,\mathrm{T}}$ based on (38) and (39).
  end for
  return $\hat{\mathbf{x}}$.

A. MSE Performance Analysis via State Evolution

Without loss of generality, we first investigate the MSE performance with a given time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ and the convergence behaviour of the proposed algorithm. To this end, we investigate the average a priori variance of the inputs to module A and module B during each iteration. In particular, by noticing the i.i.d. assumption of both the DD domain OTFS symbols and the time domain OTFS signal, we define the two states for module A and module B at the $l$-th iteration by

$$
v_z^{a,\mathrm{T}}(l) \triangleq \mathbb{E}\left[C_z^{e,\mathrm{DD}}[k,k]\right] = \lim_{MN\to\infty} \frac{1}{MN}\mathrm{Tr}\left(\mathbf{C}_z^{e,\mathrm{DD}}\right),
\tag{40}
$$

$$
v_z^{a,\mathrm{DD}}(l) \triangleq \mathbb{E}\left[C_z^{e,\mathrm{T}}[k,k]\right] = \lim_{MN\to\infty} \frac{1}{MN}\mathrm{Tr}\left(\mathbf{C}_z^{e,\mathrm{T}}\right),
\tag{41}
$$

where the expectation is with respect to the index $k$. Specifically, these two states can be viewed as the average MSE of the inputs to module A and module B at the $l$-th iteration, respectively. For notational brevity, we further define the ratios between the OTFS signal energy and the average a priori variance of the inputs to each module, i.e., the effective SNRs for the time domain and the DD domain, by

$$
\eta_{\mathrm{T}}(l) \triangleq \frac{1}{v_z^{a,\mathrm{T}}(l)},
\tag{42}
$$

$$
\eta_{\mathrm{DD}}(l) \triangleq \frac{1}{v_z^{a,\mathrm{DD}}(l)}.
\tag{43}
$$

Now, we focus on the evolution between the two states $v_z^{a,\mathrm{T}}(l)$ and $v_z^{a,\mathrm{DD}}(l)$. We will investigate the corresponding average variance of the inputs and outputs for each module. In particular, we consider the following assumption.

Assumption 1: For the $l$-th iteration, the main diagonal entries of the a priori covariance matrices $\mathbf{C}_z^{a,\mathrm{T}}$ and $\mathbf{C}_z^{a,\mathrm{DD}}$ are of the same value as $v_z^{a,\mathrm{T}}(l)$ and $v_z^{a,\mathrm{DD}}(l)$, respectively.

It should be noted that the above assumption is reasonable for a sufficiently large $MN$, due to the strong law of large numbers. With this assumption, we discuss the connection between the two states based on Algorithm 3. Specifically, we can first derive the a posteriori covariance matrix $\mathbf{C}_z^{p,\mathrm{T}}$ according to (24) and further derive the extrinsic covariance matrix $\mathbf{C}_z^{e,\mathrm{T}}$ according to (25), which gives

$$
v_z^{a,\mathrm{DD}}(l) = \left(\frac{1}{v_z^{p,\mathrm{T}}(l)} - \frac{1}{v_z^{a,\mathrm{T}}(l)}\right)^{-1},
\tag{44}
$$

where

$$
v_z^{p,\mathrm{T}}(l) = v_z^{a,\mathrm{T}}(l) - \frac{\left(v_z^{a,\mathrm{T}}(l)\right)^2}{MN}\mathrm{Tr}\left(\left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}}\left(v_z^{a,\mathrm{T}}(l)\,\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} + N_0\mathbf{I}_{MN}\right)^{-1}\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right).
\tag{45}
$$

The above equations demonstrate the connection between the states $v_z^{a,\mathrm{T}}(l)$ and $v_z^{a,\mathrm{DD}}(l)$ at the $l$-th iteration. In the following, we consider the update of the state $v_z^{a,\mathrm{T}}(l+1)$ based on the state $v_z^{a,\mathrm{DD}}(l)$. Let us define the MSE of the DD domain symbol detection, given an AWGN observation with an SNR $\eta$, by [12]

$$
\mathrm{MSE}(\eta) = \mathbb{E}\left[\left|x - \mathbb{E}\left[x \mid x + \xi\right]\right|^2\right],
\tag{46}
$$

where $x$ is an arbitrary DD domain OTFS symbol and $\xi$ is an AWGN sample with zero mean and variance $1/\eta$. According to the law of large numbers, it can be shown that the average $v_x^{p,\mathrm{DD}}(l)$ of the main diagonal entries of the a posteriori covariance matrix $\mathbf{C}_x^{p,\mathrm{DD}}$ satisfies [12]

$$
v_x^{p,\mathrm{DD}}(l) \triangleq \mathbb{E}\left[C_x^{p,\mathrm{DD}}[k,k]\right] = \lim_{MN\to\infty}\frac{1}{MN}\mathrm{Tr}\left(\mathbf{C}_x^{p,\mathrm{DD}}\right) = \mathrm{MSE}\left(\eta_{\mathrm{DD}}(l)\right).
\tag{47}
$$

Since the extrinsic information is calculated in the time domain, we need to convert the DD domain a posteriori covariance matrix $\mathbf{C}_x^{p,\mathrm{DD}}$ to the time domain a posteriori covariance matrix $\mathbf{C}_z^{p,\mathrm{DD}}$, according to (37). Denote by $v_z^{p,\mathrm{DD}}(l)$ the average of the main diagonal entries of $\mathbf{C}_z^{p,\mathrm{DD}}$. According to the law of large numbers, we can show that $v_z^{p,\mathrm{DD}}(l) = v_x^{p,\mathrm{DD}}(l)$, due to the unitary transformation between the DD domain and the time domain. Thus, according to (38), we have

$$
v_z^{a,\mathrm{T}}(l+1) = \left(\frac{1}{v_z^{p,\mathrm{DD}}(l)} - \frac{1}{v_z^{a,\mathrm{DD}}(l)}\right)^{-1}.
\tag{48}
$$
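The recursion formed by (44), (45), (47), and (48) can be sketched numerically as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the MSE in (46) is estimated by Monte Carlo sampling with equiprobable constellation points, the initial state follows the initialization $\mathbf{C}_z^{a,\mathrm{T}} = \mathbf{I}_{MN}$, and a small floor on the posterior variance is added only to avoid division by zero at very high SNR.

```python
import numpy as np

def mse_awgn(eta, constellation, num_samples=50_000, seed=0):
    """Monte Carlo estimate of MSE(eta) in (46), equiprobable symbols assumed."""
    rng = np.random.default_rng(seed)
    a = np.asarray(constellation, dtype=complex)
    x = a[rng.integers(0, a.size, size=num_samples)]
    sigma2 = 1.0 / eta                       # total complex noise variance
    xi = np.sqrt(sigma2 / 2) * (rng.standard_normal(num_samples)
                                + 1j * rng.standard_normal(num_samples))
    y = x + xi
    # Posterior over the constellation given y, proportional to exp(-|y - a|^2 / sigma2)
    metric = -np.abs(y[:, None] - a[None, :]) ** 2 / sigma2
    metric -= metric.max(axis=1, keepdims=True)
    p = np.exp(metric)
    p /= p.sum(axis=1, keepdims=True)
    x_hat = p @ a                            # posterior mean E[x | x + xi]
    return float(np.mean(np.abs(x - x_hat) ** 2))

def time_posterior_variance(v_a_t, H, n0):
    """Average a posteriori variance of module A, trace form of (45)."""
    n = H.shape[0]
    A = v_a_t * (H @ H.conj().T) + n0 * np.eye(n)
    t = np.trace(H.conj().T @ np.linalg.solve(A, H)).real
    return v_a_t - v_a_t ** 2 * t / n

def extrinsic_variance(v_post, v_prior):
    """Scalar extrinsic update shared by (44) and (48)."""
    return 1.0 / (1.0 / v_post - 1.0 / v_prior)

def state_evolution(H, n0, constellation, num_iter=5):
    """Return a list of (v_a_T, v_a_DD, v_p_DD), one tuple per iteration."""
    v_a_t = 1.0                              # matches C_z^{a,T} = I at start
    history = []
    for _ in range(num_iter):
        v_p_t = time_posterior_variance(v_a_t, H, n0)          # (45)
        v_a_dd = extrinsic_variance(v_p_t, v_a_t)              # (44)
        v_p_dd = mse_awgn(1.0 / v_a_dd, constellation)         # (46), (47)
        v_p_dd = max(v_p_dd, 1e-12)                            # numerical floor
        history.append((v_a_t, v_a_dd, v_p_dd))
        v_a_t = extrinsic_variance(v_p_dd, v_a_dd)             # (48)
    return history
```

For an identity channel, the sketch returns $v_z^{a,\mathrm{DD}}(l) = N_0$ at every iteration, since the time domain estimation then reduces to a plain AWGN observation.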

Based on (48), the state evolution from the state $v_z^{a,\mathrm{DD}}(l)$ to the state $v_z^{a,\mathrm{T}}(l+1)$ is now explicit. We notice that the above state evolution requires the calculation of the MSE in (46). In order to calculate the MSE, we need to compute the a posteriori mean of $x$. However, the a posteriori mean is in general a nonlinear function of the observation $x + \xi$ that depends on the specific constellation shape, unless $x$ is Gaussian distributed [20]. Therefore, in order to obtain some general conclusions regarding the MSE characteristics, we consider a Monte Carlo approach to calculate the MSE. In particular, by considering a sufficiently large value of $MN$, we produce $\mathbf{m}_x^{a,\mathrm{DD}}$ by using the Monte Carlo approach with a given constellation set $\mathbb{A}$ according to (28), where the variance of $\hat{\mathbf{w}}$ is set to be $1/\eta_{\mathrm{DD}}(l)$ due to the law of large numbers. Based on the generated $\mathbf{m}_x^{a,\mathrm{DD}}$, we can then obtain the MSE value based on (46).

According to the above analysis, we notice that there exists a fixed point in the state evolution, where the overall MSE performance of the proposed algorithm no longer changes as the number of iterations $l$ increases, i.e., the algorithm has converged. In particular, we consider the converged MSE performance and denote the corresponding average a posteriori variances with respect to the time domain estimates and to the DD domain detection outputs by $v_z^{p,\mathrm{T}}$ and $v_x^{p,\mathrm{DD}}$. Then, we have the following Theorem.

Theorem 1 (Fixed point of state evolution): When the algorithm has converged, the average a posteriori variances with respect to the time domain estimates and the DD domain detection outputs share the same value, i.e.,

$$
v_z^{p,\mathrm{T}} = v_x^{p,\mathrm{DD}}.
\tag{49}
$$

Proof: Note that the values of the states $v_z^{a,\mathrm{T}}(l)$ and $v_z^{a,\mathrm{DD}}(l)$ do not change with the iteration number once the algorithm has converged.
Therefore, we combine (44) and (48) and drop the iteration index $l$. From (44) we have $1/v_z^{a,\mathrm{DD}} = 1/v_z^{p,\mathrm{T}} - 1/v_z^{a,\mathrm{T}}$, and from (48) we have $1/v_z^{a,\mathrm{T}} = 1/v_z^{p,\mathrm{DD}} - 1/v_z^{a,\mathrm{DD}}$, yielding

$$
\frac{1}{v_z^{p,\mathrm{T}}} = \frac{1}{v_z^{a,\mathrm{T}}} + \frac{1}{v_z^{a,\mathrm{DD}}} = \frac{1}{v_z^{p,\mathrm{DD}}}.
\tag{50}
$$

Recalling that $v_z^{p,\mathrm{DD}} = v_x^{p,\mathrm{DD}}$ due to the unitary transformation, we obtain (49). This completes the proof of Theorem 1. ∎

Theorem 1 indicates that, at the fixed point of the state evolution, both the time domain estimation and the DD domain detection provide the same accuracy regarding the data recovery. Beyond the convergence behaviour of the proposed algorithm, it is also important to derive the corresponding error performance when the proposed algorithm has converged. According to Theorem 1, we can evaluate this in either the time domain or the DD domain. This issue is discussed in detail in the coming subsection. In the following, we investigate a special case where the DD domain symbols are assumed to be Gaussian distributed. We note that such a case is not practically important, but it provides some interesting insights for the analysis of the proposed algorithm. In particular, we have the following Proposition.

Proposition 2 (Detection performance with Gaussian constellation in DD domain): For the case where the DD domain symbols are Gaussian distributed, the DD domain detection cannot provide any error performance improvement, i.e., $v_z^{p,\mathrm{DD}}(l) = v_z^{a,\mathrm{DD}}(l)$.

Proof: With the Gaussian assumption, we have

$$
v_z^{p,\mathrm{DD}}(l) = \mathrm{MSE}(\eta) = \mathbb{E}\left[\left|x - \mathbb{E}\left[x \mid x + \xi\right]\right|^2\right] = \mathbb{E}\left[|\xi|^2\right] = \frac{1}{\eta} = v_z^{a,\mathrm{DD}}(l).
\tag{51}
$$

This completes the proof of Proposition 2. ∎

Proposition 2 suggests that, if the DD domain constellation is Gaussian distributed, iteratively updating the extrinsic information between the time domain and the DD domain will not introduce any error performance gain. In other words, the error performance improvement is due to the non-Gaussian constellation constraint in the DD domain. Intuitively speaking, the DD domain detection can be viewed as a de-noising operation. If the constellation is Gaussian distributed, the ML detection will give the detection output as $x + \xi$ because $\mathbb{E}[x \mid x + \xi] = x + \xi$ always holds, and thus it cannot correct any error induced by the noise. On the other hand, when the DD domain constellation is not Gaussian distributed, iterating between the time domain and the DD domain can potentially improve the error performance. This is because the time domain estimation in module A assumes that $\mathbf{z}$ is a Gaussian vector due to the spreading effect of the ISFFT, which does not take advantage of the DD domain constellation constraint. Therefore, by performing DD domain symbol detection, the DD domain constellation constraint is exploited by the proposed algorithm, which can lead to a potential error performance improvement.

Fig. 4 shows the time domain MSE performance of the proposed algorithm, where $N = 32$ and $M = 64$. Without loss of generality, the time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ is generated according to (13), where $P = 4$ and the channel coefficients are [ − . 27 + 0 . i, . 17 + 0 . i, . − . i, . − . i ], the delay indices are [0 , , , , and the Doppler indices are [4 . , − . , . , − . , respectively. Specifically, we consider two constellation mappings at different SNRs, namely quadrature phase shift keying (QPSK) at $E_s/N_0 = 12$ dB and 16-quadrature amplitude modulation (16-QAM) at $E_s/N_0 = 17$ dB, and show the state $v_z^{a,\mathrm{T}}(l)$ and the corresponding MSE values in Fig. 4. As observed from the figure, with an increased number of iterations, the MSE of the proposed algorithm first decreases and then saturates at MSEs around . × − for the QPSK case and around . × − for the 16-QAM case. Meanwhile, the derived state evolution shows a close match to the actual MSE performance. This observation indicates that the derived state evolution is consistent with the simulation results.

[Figure 4: MSE versus number of iterations. Curves: QPSK, $E_s/N_0 = 12$ dB (Simulation, Evolution); 16-QAM, $E_s/N_0 = 17$ dB (Simulation, Evolution).]
Fig. 4. Time domain MSE performance for OTFS modulation with $P = 4$, where the frame contains DD domain QPSK or 16-QAM symbols. Specifically, the SNR for the QPSK case is $E_s/N_0 = 12$ dB, while that for the 16-QAM case is $E_s/N_0 = 17$ dB.

So far we have investigated the MSE performance of the proposed algorithm via state evolution. In the following subsection, we focus on the error performance analysis when the proposed algorithm has converged.

B. Analysis of Effective DD Domain SNR

In this subsection, we investigate the converged error performance. In particular, we are interested in the effective DD domain SNR $\eta_{\mathrm{DD}}(l)$, because the DD domain detection is a simple component-wise operation, where $\eta_{\mathrm{DD}}(l)$ determines the BER performance for a given constellation. Let us define the second-order time domain OTFS channel matrix $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}} \triangleq \mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}}$. Noticing that $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}$ is a Hermitian matrix by definition, there exists a unitary matrix $\mathbf{U}$ such that $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathrm{H}}$, where $\boldsymbol{\Lambda}$ is a diagonal matrix whose $(k,k)$-th element is the $k$-th eigenvalue $\lambda_k$ of $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}$. Thus, we can rewrite the a posteriori variance $v_z^{p,\mathrm{T}}(l)$ in (45) according to the eigenvalues $\lambda_k$, as shown in the following Lemma.

Lemma 1 (Time domain a posteriori variance): The time domain a posteriori variance $v_z^{p,\mathrm{T}}(l)$ in (45) can be simplified to

$$
v_z^{p,\mathrm{T}}(l) = v_z^{a,\mathrm{T}}(l) - \frac{v_z^{a,\mathrm{T}}(l)}{MN}\sum_{k=1}^{MN}\frac{v_z^{a,\mathrm{T}}(l)\,\lambda_k}{v_z^{a,\mathrm{T}}(l)\,\lambda_k + N_0}.
\tag{52}
$$

Proof: The proof is given in Appendix C.

In order to obtain some important insights into the effective DD domain SNR $\eta_{\mathrm{DD}}(l)$, we need to investigate the properties of the eigenvalues $\lambda_k$. For ease of further derivation, let us consider the following assumption.

Assumption 2: The delay indices associated with the resolvable paths are different from each other, i.e., $l_i \neq l_j$, $\forall i \neq j$, $1 \le i, j \le P$.

It should be noted that the value of the delay index depends on the specific reflectors corresponding to each resolvable path. Furthermore, in the case where the maximum delay index $l_{\max}$ is much larger than $P$, it is unlikely to have a channel realization where different paths share the same delay index. However, we will show later that, even without the above assumption, the following derivation can still provide an accurate estimate for the effective DD domain SNR $\eta_{\mathrm{DD}}(l)$. With Assumption 2 in hand, we can derive the following Lemma for $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}$.

Lemma 2 (Main diagonal elements of $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}$): Under Assumption 2, the main diagonal elements of $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}$ are of the same value $\|\mathbf{h}\|^2$, where $\mathbf{h} = [h_1, h_2, \ldots, h_P]^{\mathrm{T}}$ is the path gain vector.

Proof: The Lemma can be straightforwardly derived by noticing that $\boldsymbol{\Pi}^{l_i}\boldsymbol{\Delta}^{k_i+\kappa_i}\boldsymbol{\Delta}^{-k_i-\kappa_i}\boldsymbol{\Pi}^{-l_i} = \mathbf{I}_{MN}$. ∎

According to Lemma 1 and Lemma 2, we can now derive the lower bound of the DD domain a priori variance $v_z^{a,\mathrm{DD}}(l)$. The corresponding results are summarized in the following Theorem.

Theorem 2 (Lower bound of $v_z^{a,\mathrm{DD}}(l)$): Under Assumption 2, the DD domain a priori variance $v_z^{a,\mathrm{DD}}(l)$ is lower-bounded by $N_0/\|\mathbf{h}\|^2$, where the lower bound becomes tighter as the time domain a priori variance $v_z^{a,\mathrm{T}}(l)$ tends to zero, and the lower bound is achieved when $v_z^{a,\mathrm{T}}(l) = 0$.
Proof: The proof is given in Appendix D.

We can immediately derive the upper bound of $\eta_{\mathrm{DD}}(l)$ based on Theorem 2.

Corollary 1 (Upper bound of $\eta_{\mathrm{DD}}(l)$): Under Assumption 2, the DD domain effective SNR $\eta_{\mathrm{DD}}(l)$ is upper-bounded by $\|\mathbf{h}\|^2/N_0$, where the upper bound becomes tighter with an increased number of iterations.

Proof: The proof follows directly from Theorem 2, by noticing that $\eta_{\mathrm{DD}}(l) = 1/v_z^{a,\mathrm{DD}}(l) \le \|\mathbf{h}\|^2/N_0$. ∎
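The bound in Theorem 2 can be checked numerically on a small example. The sketch below is only an illustration under assumptions: the channel is built as $\sum_i h_i \boldsymbol{\Pi}^{l_i}\boldsymbol{\Delta}^{k_i}$ with a cyclic-shift matrix and a diagonal phase matrix, which mirrors the structure used in the proof of Lemma 2 but is not necessarily identical to (13), and the function names are hypothetical.

```python
import numpy as np

def toy_time_domain_channel(n, gains, delays, dopplers):
    """Assumed toy model H = sum_i h_i * Pi^{l_i} * Delta^{k_i} of size n x n.

    Pi is a one-sample cyclic shift and Delta^{k} = diag(exp(j 2 pi k q / n)),
    q = 0..n-1. Delays must be integers; Doppler indices may be fractional.
    """
    pi = np.roll(np.eye(n), 1, axis=0)
    q = np.arange(n)
    H = np.zeros((n, n), dtype=complex)
    for h, l, k in zip(gains, delays, dopplers):
        delta_k = np.diag(np.exp(2j * np.pi * k * q / n))
        H += h * np.linalg.matrix_power(pi, l) @ delta_k
    return H

def dd_prior_variance(v_a_t, H, n0):
    """DD domain a priori variance from (52) followed by (44)."""
    lam = np.linalg.eigvalsh(H @ H.conj().T)          # eigenvalues of G
    v_p_t = v_a_t - v_a_t * np.mean(v_a_t * lam / (v_a_t * lam + n0))   # (52)
    return 1.0 / (1.0 / v_p_t - 1.0 / v_a_t)          # (44)

if __name__ == "__main__":
    gains = np.array([0.6 + 0.2j, -0.3 + 0.5j, 0.4 - 0.1j])   # illustrative values
    H = toy_time_domain_channel(16, gains, [0, 1, 3], [0.3, -1.2, 2.0])
    n0 = 0.1
    bound = n0 / np.sum(np.abs(gains) ** 2)           # N_0 / ||h||^2
    for v in (1.0, 0.1, 1e-3):
        print(v, dd_prior_variance(v, H, n0), bound)
```

With distinct delay indices, the printed variance stays above $N_0/\|\mathbf{h}\|^2$ and moves toward it as the time domain a priori variance shrinks, in line with Theorem 2 and Corollary 1.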

It is interesting to see from Theorem 2 and Corollary 1 that the effective DD domain SNR of the proposed algorithm can theoretically approach the maximum receiver SNR for a given fading channel [13], with a sufficient number of iterations. Equivalently, this observation indicates that the proposed algorithm can theoretically approach the error performance of MLSE given a sufficient number of iterations. It should be noted that MLSE provides the optimal ML error performance, but usually requires a prohibitively high complexity [13]. Therefore, the proposed algorithm can be viewed as a type of reduced-complexity detection algorithm that can potentially approach the optimal error performance. On the other hand, we note that the above analysis is based on Assumption 2. For the case where different resolvable paths share the same delay index, we demonstrate by numerical simulations that the effective DD domain SNR also follows Corollary 1.

Fig. 5 shows the effective DD domain SNRs with respect to the number of iterations. Without loss of generality, the time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ is generated according to (13), where $P = 4$ and the channel coefficients are [ − . − . i, . − . i, − . 43 + 0 . i, . 59 + 0 . i ], the delay indices are [0 , , , , and the Doppler indices are [ − . , . , − . , − . , respectively. Similarly, we consider both QPSK and 16-QAM constellations with $E_s/N_0 = 14$ dB and $E_s/N_0 = 17$ dB, respectively. Meanwhile, we also plot the SNR derived based on the state evolution and the corresponding SNR upper bound, i.e., $\|\mathbf{h}\|^2/N_0$. As observed from the figure, with an increased number of iterations, the effective DD domain SNR increases. Specifically, the derived SNR based on the state evolution shows a close match to the actual SNR performance based on the simulation. More importantly, the derived SNR upper bound agrees with the simulation results and state evolution, and the bound becomes tighter as the number of iterations increases, which is consistent with the above analysis.

[Figure 5: Effective DD domain SNR (dB) versus number of iterations. Curves: QPSK, $E_s/N_0 = 14$ dB (Simulation, Evolution); 16-QAM, $E_s/N_0 = 17$ dB (Simulation, Evolution); SNR bound at $E_s/N_0 = 14$ dB and $E_s/N_0 = 17$ dB.]
Fig. 5. Effective DD domain SNRs for OTFS modulation with different numbers of iterations, where the frame contains DD domain QPSK or 16-QAM symbols. Specifically, the SNR for the QPSK case is $E_s/N_0 = 14$ dB, while that for the 16-QAM case is $E_s/N_0 = 17$ dB. The considered wireless channel contains $P = 4$ paths with different delay indices for each path.

To have a fair comparison, we show the effective DD domain SNRs with respect to the number of iterations for a specific channel realization in Fig. 6, where different resolvable paths share the same delay index. Specifically, the channel contains $P = 4$ paths and the channel coefficients are [ − . 21 + 0 . i, . 17 + 0 . i, . − . i, . − . i ], the delay indices are [0 , , , , and the Doppler indices are [ − . , . , − . , − . , respectively. We observe that both the QPSK case and the 16-QAM case have similar SNR performance to the previous figure. This observation indicates that the above analysis is also valid when different resolvable paths share the same delay index.

[Figure 6: Effective DD domain SNR (dB) versus number of iterations. Curves: QPSK, $E_s/N_0 = 14$ dB (Simulation, Evolution); 16-QAM, $E_s/N_0 = 17$ dB (Simulation, Evolution); SNR bound at $E_s/N_0 = 14$ dB and $E_s/N_0 = 17$ dB.]
Fig. 6. Effective DD domain SNRs for OTFS modulation with different numbers of iterations, where the frame contains DD domain QPSK or 16-QAM symbols. Specifically, the SNR for the QPSK case is $E_s/N_0 = 14$ dB, while that for the 16-QAM case is $E_s/N_0 = 17$ dB. The considered wireless channel contains $P = 4$ paths and the first two paths share the same delay index.

We have provided the analysis of the effective DD domain SNR of the proposed algorithm at convergence. In the following subsection, we discuss the detection complexity in order to demonstrate the advantage of the proposed algorithm.

C. Analysis of Detection Complexity

We first consider the computational complexity of module A. Most of the computation relates to the matrix inverse in (22). Generally, the computational complexity order of the matrix inversion is $\mathcal{O}\left((MN)^3\right)$. However, this complexity can be further reduced by considering the specific sparse structure of the time domain effective channel $\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}$ [21]. As for module B, the computational complexity is of order $\mathcal{O}(MN)$ because of the component-wise operation. On the other hand, the computational complexity of the domain transformation can be low by considering the special structure of the corresponding kernels. In particular, the Kronecker products $\mathbf{F}_N \otimes \mathbf{I}_M$ and $\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M$ in (27) and (36) can be efficiently calculated based on the FFT and IFFT, and the corresponding computational complexity is $\mathcal{O}(MN \log N)$. Therefore, the total detection complexity of the proposed algorithm per iteration is $\mathcal{O}\left((MN)^3 + 2MN\log N + MN\right)$. It should be noted that the detection complexity does not increase in the presence of fractional Doppler indices. In comparison with the proposed algorithm, the detection complexity of the optimal MLSE detection is exponential in the number of non-zero elements per row/column of the corresponding channel matrix, which can be as high as $\mathcal{O}\left(X(\mathbb{A})^{MN}\right)$ when fractional Doppler shifts exist. Therefore, the proposed algorithm can significantly reduce the detection complexity compared to the optimal MLSE detection, especially in the presence of fractional Doppler shifts.

V. NUMERICAL RESULTS

In this section, we evaluate the BER performance of the proposed algorithm. We consider the average BER performance over a sufficient number of channel realizations. Specifically, the corresponding channel matrix is generated according to (13), the channel coefficients are randomly generated based on a uniform power delay profile, and the delay and Doppler indices are randomly generated within the ranges $[0, l_{\max}]$ and $[-k_{\max}, k_{\max}]$, where $l_{\max} = 10$ and $k_{\max} = 5$. We note that, as mentioned before, the delay index can only be an integer, while the Doppler index can be a fractional number [8]. Without loss of generality, we consider the QPSK modulated OTFS system with different numbers of paths. We also provide other detection methods for comparison, which include the MMSE detection based on the DD domain effective channel $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$, DD domain detection based on the sum-product algorithm (SPA) [22], and the DD domain message passing algorithm in [8]. The considered SPA detection is derived based on the graphical model corresponding to the DD domain effective channel, whose computational complexity can be $\mathcal{O}\left(X(\mathbb{A})^{MN}\right)$ in the case of complex fractional Doppler shifts [22]. In particular, the considered SPA detection can theoretically approach the error performance of the optimal MLSE detection and achieves the same performance when the graphical model does not contain any cycle [22]. However, since the DD domain SPA detection requires a very high detection complexity in the fractional Doppler case, we only consider the integer Doppler case for simplicity.

Fig. 7 shows the BER performance with $P = 4$. As observed from the figure, the error performance of the proposed algorithm with one iteration is almost the same as that of the MMSE detection. This observation indicates that applying the same detection method for OTFS modulation over different domains can result in similar error performance. Furthermore, with an increased number of iterations, the proposed algorithm outperforms the MMSE detection and almost achieves the performance of SPA detection with only integer Doppler indices. Specifically, with BER below × − , the proposed algorithm with iterations has only a marginal performance gap (around . dB) compared to the SPA detection. This observation clearly substantiates our theoretical analysis in the previous section.

[Figure 7: BER versus $E_s/N_0$ (dB). Curves: Fractional + DD domain MMSE; Fractional + proposed (1, 2, and 5 iterations); Integer + SPA detection.]
Fig. 7. BER performance for OTFS modulation with $P = 4$, where the error performance of the proposed algorithm is compared with that of DD domain MMSE detection and MLSE detection with integer Doppler shifts.

Fig. 8 shows the BER performance with $P = 10$. Under such a complex channel condition, the DD domain effective channel $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$ can be very dense, and conventional detection methods based on $\mathbf{H}_{\mathrm{DD}}^{\mathrm{eff}}$ may not have a good error performance. As shown in the figure, the proposed algorithm with one iteration performs almost the same as the DD domain MMSE detection, which is consistent with the observation in the previous figure. Meanwhile, the DD domain message passing algorithm [8] also shows a similar error performance to both the DD domain MMSE detection and the proposed algorithm with one iteration. However, with an increased number of iterations, the proposed algorithm significantly outperforms the MMSE detection and the message passing algorithm. Specifically, at BER ≈ × − , with only iterations, the error performance of the proposed algorithm shows an around . dB gain compared to that of the MMSE detection, and the gain over the message passing algorithm is even larger. Furthermore, with iterations, the SNR gain over the MMSE detection is increased to around . dB. This observation shows the advantage of the proposed algorithm over conventional detection algorithms, which also agrees with our previous theoretical analysis.

[Figure 8: BER versus $E_s/N_0$ (dB). Curves: Fractional + DD domain MMSE; Fractional + Message passing algorithm; Fractional + proposed (1, 2, 3, and 5 iterations).]
Fig. 8. BER performance for OTFS modulation with fractional Doppler shifts, where $P = 10$.

VI. CONCLUSION

In this paper, we proposed a novel cross domain iterative detection for OTFS modulation. We derived the state evolution for the proposed algorithm and investigated its detection performance, including both the MSE performance and the DD domain effective SNR. In particular, we showed that the proposed algorithm can approach the error performance of MLSE detection even in the presence of complex fractional Doppler shifts, while requiring a much lower detection complexity. Our analytical results are verified by simulation results, where a significant performance improvement can be observed compared to the conventional detection algorithms for channels with fractional Doppler shifts. The cross domain signal processing may be a new research direction for OTFS modulation and general multicarrier modulation schemes. Our future work will investigate cross domain channel estimation schemes and cross domain precoding schemes.

APPENDIX A
JUSTIFICATION OF PROBLEM FORMULATION (28)

Note that the DD domain detection is carried out based on the estimates of the time domain signal. As a common approach, we model the extrinsic mean of $\mathbf{z}$ by

$$
\mathbf{m}_z^{e,\mathrm{T}} = \mathbf{z} + \hat{\mathbf{w}},
\tag{53}
$$

where $\hat{\mathbf{w}}$ is a white Gaussian noise sample vector with zero mean and a diagonal covariance matrix $\mathbf{C}_z^{a,\mathrm{DD}} = \mathbf{C}_z^{e,\mathrm{T}}$. In particular, the term $\hat{\mathbf{w}}$ stands for the inaccuracy of the MMSE estimation in the time domain. In order to carry out the DD domain symbol detection, a straightforward way is to first convert the extrinsic information $\mathbf{m}_z^{e,\mathrm{T}}$ of the time domain signal $\mathbf{z}$ to the DD domain and then compare it with the DD domain symbol constellation. Specifically, we can directly apply the transformation $\mathbf{F}_N \otimes \mathbf{I}_M$ to (53) and obtain the corresponding DD domain symbol estimates, i.e.,

$$
\mathbf{m}_x^{a,\mathrm{DD}} = (\mathbf{F}_N \otimes \mathbf{I}_M)\,\mathbf{m}_z^{e,\mathrm{T}} = \mathbf{x} + (\mathbf{F}_N \otimes \mathbf{I}_M)\,\hat{\mathbf{w}},
\tag{54}
$$

where the term $(\mathbf{F}_N \otimes \mathbf{I}_M)\hat{\mathbf{w}}$ is the equivalent noise in the DD domain and its covariance matrix is denoted by $\mathbf{C}_x^{a,\mathrm{DD}}$. Recalling (12), we have

$$
\mathbf{C}_x^{a,\mathrm{DD}} = (\mathbf{F}_N \otimes \mathbf{I}_M)\,\mathbf{C}_z^{a,\mathrm{DD}}\left(\mathbf{F}_N^{\mathrm{H}} \otimes \mathbf{I}_M\right).
\tag{55}
$$

It should be noted that in the asymptotic regime, i.e., as $MN$ tends to infinity, the diagonal entries of the diagonal matrix $\mathbf{C}_z^{a,\mathrm{DD}}$ tend to be of the same value owing to the law of large numbers, i.e., $\mathbf{C}_z^{a,\mathrm{DD}} \propto \mathbf{I}_{MN}$. In this case, we have $\mathbf{C}_x^{a,\mathrm{DD}} = \mathbf{C}_z^{a,\mathrm{DD}}$ and the DD domain symbol detection can be done in a straightforward symbol-by-symbol fashion. However, in practice, the diagonal entries of $\mathbf{C}_z^{a,\mathrm{DD}}$ may not be of the same value due to the specific noise values. In this case, $\mathbf{C}_x^{a,\mathrm{DD}}$ can be non-diagonal and dense, which not only conflicts with the i.i.d. assumption of the entries in $\mathbf{x}$, but also potentially undermines the detection performance. In contrast to the above solution, we can consider the DD domain detection problem formulation in (28).
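The transformation in (54) and the covariance mapping in (55) can be illustrated with a short NumPy sketch. It rests on assumptions that are not confirmed by this section: the length-$MN$ vectors are indexed as $nM + m$, and $\mathbf{F}_N$ is the normalized DFT matrix with the usual negative-exponent sign convention. The function names are hypothetical.

```python
import numpy as np

def to_dd(z, N, M):
    """Apply (F_N kron I_M) to a length-MN vector using length-N FFTs."""
    return np.fft.fft(z.reshape(N, M), axis=0, norm="ortho").reshape(-1)

def to_time(x, N, M):
    """Apply (F_N^H kron I_M), the inverse of to_dd, using length-N IFFTs."""
    return np.fft.ifft(x.reshape(N, M), axis=0, norm="ortho").reshape(-1)

def dd_covariance(c_diag, N, M):
    """Dense evaluation of (55) for a diagonal time domain covariance.

    Intended for small sizes only, since it builds the full MN x MN matrices.
    """
    F = np.fft.fft(np.eye(N), axis=0, norm="ortho")   # normalized DFT matrix
    T = np.kron(F, np.eye(M))
    return T @ np.diag(c_diag) @ T.conj().T

if __name__ == "__main__":
    N, M = 4, 3
    rng = np.random.default_rng(1)
    flat = dd_covariance(np.full(N * M, 0.3), N, M)
    uneven = dd_covariance(rng.uniform(0.1, 1.0, N * M), N, M)
    # Largest off-diagonal magnitude in each case
    print(np.abs(flat - np.diag(np.diag(flat))).max())
    print(np.abs(uneven - np.diag(np.diag(uneven))).max())
```

With equal diagonal entries the DD domain covariance stays diagonal, whereas unequal entries produce non-zero off-diagonal terms, which is the situation that motivates the formulation in (28).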
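As can be noticed from Section III-C, the proposed DD domain detection algorithm based on (28) bypasses the problem that the diagonal entries of $\mathbf{C}_z^{a,\mathrm{DD}}$ are of different values and can still be efficiently carried out in a symbol-by-symbol fashion.

APPENDIX B
PROOF OF PROPOSITION 1

The extrinsic mean $m_x^{e,\mathrm{DD}}[k]$ is obtained by excluding the contribution of $x[k]$ in (34). By noticing that $m_x^{p,\mathrm{DD}}[k] = \mathbb{E}\left[x[k] \mid \mathbf{m}_z^{e,\mathrm{T}}\right] = \mathbb{E}\left[x[k] \mid \mathbf{m}_x^{a,\mathrm{DD}}\right]$, we have

$$
m_x^{e,\mathrm{DD}}[k] = \mathbb{E}\left[x[k] \mid \mathbf{m}_x^{a,\mathrm{DD}} \setminus m_x^{a,\mathrm{DD}}[k]\right],
\tag{56}
$$

where $\mathbf{m}_x^{a,\mathrm{DD}} \setminus m_x^{a,\mathrm{DD}}[k]$ denotes $\mathbf{m}_x^{a,\mathrm{DD}}$ excluding the entry $m_x^{a,\mathrm{DD}}[k]$. Furthermore, it can be noticed that the a posteriori probability of $x[k]$ only relates to $m_x^{a,\mathrm{DD}}[k]$ and not to the other entries of $\mathbf{m}_x^{a,\mathrm{DD}}$. Therefore, it can be shown that $m_x^{e,\mathrm{DD}}[k] = 0$ [12]. Similarly, it can be shown that the DD domain detection cannot provide any extrinsic information in terms of the covariance matrix either, due to the component-wise operation [12]. This completes the proof of Proposition 1. ∎

APPENDIX C
PROOF OF LEMMA 1

Starting from (45) and using the cyclic property of the trace, we have

$$
\begin{aligned}
v_z^{p,\mathrm{T}}(l) &= v_z^{a,\mathrm{T}}(l) - \frac{\left(v_z^{a,\mathrm{T}}(l)\right)^2}{MN}\mathrm{Tr}\left(\left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}}\left(v_z^{a,\mathrm{T}}(l)\,\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\left(\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right)^{\mathrm{H}} + N_0\mathbf{I}_{MN}\right)^{-1}\mathbf{H}_{\mathrm{T}}^{\mathrm{eff}}\right) \\
&= v_z^{a,\mathrm{T}}(l) - \frac{\left(v_z^{a,\mathrm{T}}(l)\right)^2}{MN}\mathrm{Tr}\left(\left(v_z^{a,\mathrm{T}}(l)\,\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}} + N_0\mathbf{I}_{MN}\right)^{-1}\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}\right).
\end{aligned}
\tag{57}
$$

Furthermore, by considering $\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathrm{H}}$, (57) can be further simplified to

$$
\begin{aligned}
v_z^{p,\mathrm{T}}(l) &= v_z^{a,\mathrm{T}}(l) - \frac{\left(v_z^{a,\mathrm{T}}(l)\right)^2}{MN}\mathrm{Tr}\left(\left(v_z^{a,\mathrm{T}}(l)\,\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathrm{H}} + N_0\mathbf{I}_{MN}\right)^{-1}\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathrm{H}}\right) \\
&= v_z^{a,\mathrm{T}}(l) - \frac{\left(v_z^{a,\mathrm{T}}(l)\right)^2}{MN}\mathrm{Tr}\left(\mathbf{U}\left(v_z^{a,\mathrm{T}}(l)\,\boldsymbol{\Lambda} + N_0\mathbf{I}_{MN}\right)^{-1}\boldsymbol{\Lambda}\mathbf{U}^{\mathrm{H}}\right) \\
&= v_z^{a,\mathrm{T}}(l) - \frac{v_z^{a,\mathrm{T}}(l)}{MN}\sum_{k=1}^{MN}\frac{v_z^{a,\mathrm{T}}(l)\,\lambda_k}{v_z^{a,\mathrm{T}}(l)\,\lambda_k + N_0}.
\end{aligned}
\tag{58}
$$

This completes the proof of Lemma 1.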
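APPENDIX D
PROOF OF THEOREM 2

Consider the function $f(\lambda) = \frac{v\lambda}{v\lambda + N_0}$, whose second-order derivative with respect to $\lambda$ is of the form $f''(\lambda) = -\frac{2v^2 N_0}{\left(v\lambda + N_0\right)^3}$. It can be shown that with $v$ and $N_0$ strictly above zero, the above function is concave. Furthermore, with $v$ close to zero, $f''(\lambda)$ tends to zero, which indicates that $f(\lambda)$ tends to be a linear function. Therefore, according to Jensen's inequality, it follows that

$$\frac{1}{MN}\sum_{k=1}^{MN} f(\lambda_k) \le f\left(\frac{1}{MN}\sum_{k=1}^{MN}\lambda_k\right) = \frac{\frac{v}{MN}\sum_{k=1}^{MN}\lambda_k}{\frac{v}{MN}\sum_{k=1}^{MN}\lambda_k + N_0}, \tag{59}$$

where the bound becomes tighter with decreasing $v$ and the equality is achieved when $v = 0$. Therefore, we have

$$v_{\mathbf{z}}^{p,\mathrm{T}}(l) = v_{\mathbf{z}}^{a,\mathrm{T}}(l) - \frac{v_{\mathbf{z}}^{a,\mathrm{T}}(l)}{MN}\sum_{k=1}^{MN}\frac{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\lambda_k}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\lambda_k + N_0} \ge v_{\mathbf{z}}^{a,\mathrm{T}}(l) - \frac{\left(v_{\mathbf{z}}^{a,\mathrm{T}}(l)\right)^2\frac{1}{MN}\sum_{k=1}^{MN}\lambda_k}{\frac{v_{\mathbf{z}}^{a,\mathrm{T}}(l)}{MN}\sum_{k=1}^{MN}\lambda_k + N_0}, \tag{60}$$

where the lower bound becomes tighter with the decrease of $v_{\mathbf{z}}^{a,\mathrm{T}}(l)$ and the equality is achieved when $v_{\mathbf{z}}^{a,\mathrm{T}}(l)$ becomes zero. Notice that $\sum_{k=1}^{MN}\lambda_k = \mathrm{Tr}\left(\mathbf{G}_{\mathrm{T}}^{\mathrm{eff}}\right)$. Thus, by considering Lemma 2, (60) becomes

$$v_{\mathbf{z}}^{p,\mathrm{T}}(l) \ge v_{\mathbf{z}}^{a,\mathrm{T}}(l) - \frac{\left(v_{\mathbf{z}}^{a,\mathrm{T}}(l)\right)^2\|\mathbf{h}\|^2}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\|\mathbf{h}\|^2 + N_0}. \tag{61}$$

By substituting (61) into (44), after some manipulations, we arrive at

$$\begin{aligned} v_{\mathbf{z}}^{a,\mathrm{DD}}(l) &= \left(\frac{1}{v_{\mathbf{z}}^{p,\mathrm{T}}(l)} - \frac{1}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)}\right)^{-1} \ge \left(\frac{1}{v_{\mathbf{z}}^{a,\mathrm{T}}(l) - \frac{\left(v_{\mathbf{z}}^{a,\mathrm{T}}(l)\right)^2\|\mathbf{h}\|^2}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\|\mathbf{h}\|^2 + N_0}} - \frac{1}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)}\right)^{-1} \\ &= \frac{v_{\mathbf{z}}^{a,\mathrm{T}}(l) - \frac{\left(v_{\mathbf{z}}^{a,\mathrm{T}}(l)\right)^2\|\mathbf{h}\|^2}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\|\mathbf{h}\|^2 + N_0}}{\frac{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\|\mathbf{h}\|^2}{v_{\mathbf{z}}^{a,\mathrm{T}}(l)\,\|\mathbf{h}\|^2 + N_0}} = \frac{N_0}{\|\mathbf{h}\|^2}. \end{aligned} \tag{62}$$

This completes the proof of Theorem 2. $\blacksquare$

REFERENCES

[1] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, A. J. Goldsmith, A. F. Molisch, and R. Calderbank, “Orthogonal time frequency space modulation,” in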
Proc. 2017 IEEE Wireless Commun. Net. Conf., 2017, pp. 1–6.
[2] G. Meyer and S. Beiker, Road Vehicle Automation. Springer International Publishing, 2019.
[3] Y. Cai, Z. Wei, R. Li, D. W. K. Ng, and J. Yuan, “Joint trajectory and resource allocation design for energy-efficient secure UAV communication systems,” IEEE Trans. Commun., vol. 68, no. 7, pp. 4536–4553, Mar. 2020.
[4] P. Raviteja, Y. Hong, E. Viterbo, and E. Biglieri, “Effective diversity of OTFS modulation,” IEEE Wireless Commun. Lett., vol. 9, no. 2, pp. 249–253, Feb. 2020.
[5] S. Li, J. Yuan, W. Yuan, Z. Wei, B. Bai, and D. W. K. Ng, “Performance analysis of coded OTFS systems over high-mobility channels,” arXiv preprint arXiv:2010.13008, 2020.
[6] T. Hwang, C. Yang, G. Wu, S. Li, and G. Y. Li, “OFDM and its wireless applications: A survey,” IEEE Trans. Veh. Technol., vol. 58, no. 4, pp. 1673–1694, May 2008.
[7] Z. Wei, W. Yuan, S. Li, J. Yuan, G. Bharatula, R. Hadani, and L. Hanzo, “Orthogonal time-frequency space modulation: A full-diversity next generation waveform,” arXiv preprint arXiv:2010.03344, 2020.
[8] P. Raviteja, K. T. Phan, Y. Hong, and E. Viterbo, “Interference cancellation and iterative detection for orthogonal time frequency space modulation,” IEEE Trans. Wireless Commun., vol. 17, no. 10, pp. 6501–6515, Oct. 2018.
[9] W. Yuan, Z. Wei, J. Yuan, and D. W. K. Ng, “A simple variational Bayes detector for orthogonal time frequency space (OTFS) modulation,” IEEE Trans. Veh. Technol., vol. 69, no. 7, pp. 7976–7980, Apr. 2020.
[10] Z. Yuan, F. Liu, W. Yuan, Q. Guo, Z. Wang, and J. Yuan, “Iterative detection for orthogonal time frequency space modulation using approximate message passing with unitary transformation,” arXiv preprint arXiv:2008.06688, 2020.
[11] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017.
[12] J. Ma, X. Yuan, and L. Ping, “Turbo compressed sensing with partial DFT sensing matrix,” IEEE Signal Process. Lett., vol. 22, no. 2, pp. 158–161, 2015.
[13] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[14] P. Raviteja, Y. Hong, E. Viterbo, and E. Biglieri, “Practical pulse-shaping waveforms for reduced-cyclic-prefix OTFS,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 957–961, Jan. 2019.
[15] Z. Wei, W. Yuan, S. Li, J. Yuan, and D. W. K. Ng, “Transmitter and receiver window designs for orthogonal time frequency space modulation,” arXiv preprint arXiv:2010.13005, 2020.
[16] S. M. Kay, Fundamentals of Statistical Signal Processing. Prentice Hall PTR, 1993.
[17] G. Forney, “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Trans. Inf. Theory, vol. 18, no. 3, pp. 363–378, May 1972.
[18] G. Ungerboeck, “Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems,” IEEE Trans. Commun., vol. 22, no. 5, pp. 624–636, May 1974.
[19] L. Gaudio, M. Kobayashi, G. Caire, and G. Colavolpe, “On the effectiveness of OTFS for joint radar parameter estimation and communication,” IEEE Trans. Wireless Commun., vol. 19, no. 9, pp. 5951–5965, Sep. 2020.
[20] A. Lozano, A. M. Tulino, and S. Verdú, “Mercury/waterfilling: Optimum power allocation with arbitrary input constellations,” in IEEE Proc. Int. Sym. Inf. Theory, pp. 1773–1777.
[21] A. George and E. Ng, “On the complexity of sparse QR and LU factorization of finite-element matrices,” SIAM J. Sci. Statist. Comput., vol. 9, no. 5, pp. 849–861, 1988.
[22] S. Li, W. Yuan, Z. Wei, J. Yuan, B. Bai, D. W. K. Ng, and Y. Xie, “Hybrid MAP and PIC detection for OTFS modulation,” arXiv preprint arXiv:2010.13030, 2020.