[PDF] Joint Inter-path and Intra-path Multiplexing for Terahertz Widely-spaced Multi-subarray Hybrid Beamforming Systems

Abstract

Terahertz (THz) communications with multi-GHz bandwidth are envisioned as a key technology for 6G systems. Ultra-massive (UM) MIMO with hybrid beamforming architectures are widely investigated to provide a high array gain to overcome the huge propagation loss. However, most of the existing hybrid beamforming architectures can only utilize the multiplexing offered by the multipath components, i.e., inter-path multiplexing, which is very limited due to the spatially sparse THz channel. In this paper, a widely-spaced multi-subarray (WSMS) hybrid beamforming architecture is proposed, which improves the multiplexing gain by exploiting a new type of intra-path multiplexing provided by the spherical-wave propagation among k widely-spaced subarrays, in addition to the inter-path multiplexing. The resulting multiplexing gain of WSMS architecture is k times of the existing architectures. To harness WSMS hybrid beamforming, a novel design problem is formulated by optimizing the number of subarrays, subarray spacing, and hybrid beamforming matrices to maximize the spectral efficiency, which is decomposed into two subproblems. An optimal closed-form solution is derived for the first hybrid beamforming subproblem, while a dominant-line-of-sight-relaxation algorithm is proposed for the second array configuration subproblem. Extensive simulation results demonstrate that the WSMS architecture and proposed algorithms substantially enhance the spectral efficiency and energy efficiency.

Full PDF

11 Joint Inter-and-intra-multiplexing and HybridBeamforming for Terahertz Widely-spacedMulti-subarray Systems

Longfei Yan, Chong Han,

Member, IEEE , and Jinhong Yuan,

Fellow, IEEE

Abstract

Terahertz (THz) communications with multi-GHz bandwidth are envisioned as a key technologyfor 6G wireless systems. While suffering from huge propagation loss, large arrays of sub-millimeterwavelength antennas can be realized in ultra-massive (UM) MIMO systems to overcome the distancelimitation. However, channel sparsity and low spatial degree-of-freedom of the THz channel limitthe spatial multiplexing gain and hence the achievable rate. In this paper, a widely-spaced multi-subarray (WSMS) hybrid beamforming architecture is proposed for THz UM-MIMO systems to improvethe multiplexing gain. In each subarray, the antennas are critically-spaced to utilize the inter-pathmultiplexing and beamforming gains, while the subarrays are widely-spaced to harvest the novel intra-path multiplexing gain through exploiting the phase differences over the subarrays. By exploring the THzchannel peculiarity, a dominant-LoS-relaxation (DLR) method is proposed to balance the multiplexingand beamforming, and a block-diagonal vectorization-based (BD-VEC) algorithm is developed to solvethe hybrid beamforming problem. Extensive simulation results demonstrate that the multiplexing gain isimproved by a factor of the number of subarrays in the THz WSMS system. The low-complexity DLRmethod maximizes the achievable rate near-optimally. Compared to existing hybrid beamforming algo-rithms, the BD-VEC algorithm achieves higher spectral efﬁciency with substantially lower computationalcomplexity.

Index Terms

This paper was presented in part at the IEEE INFOCOM workshops, Paris, France, April 29-May 2, 2019 [1].Longfei Yan and Chong Han are with the University of Michigan-Shanghai Jiao Tong University Joint Institute, ShanghaiJiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]).Jinhong Yuan is with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney,NSW 2052, Australia (e-mail: [email protected]). a r X i v : . [ c s . I T ] J a n Terahertz band, inter-and-intra-multiplexing, hybrid beamforming.

I. I

NTRODUCTION

With unprecedented multi-GHz bandwidth, the Terahertz (THz) band has drawn increasingattention to support 100+ Gbps wireless data rates [2]. Recently, the Federal CommunicationsCommission (FCC) created a new category of experimental licenses for the use of frequenciesbetween 95 GHz and 3 THz, as preparation for 6G THz wireless systems. Although with ultra-broad bandwidth, the THz band suffers from huge propagation loss, which signiﬁcantly limitsthe wireless communication distance [3]. Thanks to the sub-millimeter wavelength, design of anarray consisting of 512 and even 1024 antennas at transceivers is feasible, which enables THzultra-massive multiple-input multiple-output (UM-MIMO) systems [4], [5]. The multi-antennasystem can generate a high beamforming gain to compensate the path loss and solve the distanceproblem, while supporting multiple data streams to offer a multiplexing gain and further improvethe spectral efﬁciency of the THz communications.Most of the studies of MIMO system assume that the antenna spacing is half of the wavelength,i.e., the antennas are critically-spaced [1], [6]–[9]. Under this critically-spaced MIMO system,the spatial multiplexing gain can be obtained from multipath of the channel, which is referredas inter-path multiplexing [1], [10]. The inter-path multiplexing gain is determined by the rankof the multi-antenna channel, which is upper bounded by the number of multipath. Due to theprohibitively high propagation attenuation and scattering loss, the number of multipath of theTHz channel is very limited, e.g., typically less than 6 [8], [11], [12]. As a result, the inter-path multiplexing gain is unfortunately not enticing, despite of the hundreds of antennas inTHz UM-MIMO systems. Even with high beamforming gain to enhance the received power, thelimited inter-path multiplexing gain strictly suppresses the data rate of the critically-spaced THzUM-MIMO system.An interesting idea is proposed by implementing a widely-spaced antennas array in line-of-sight (LoS) MIMO system, in which intra-path multiplexing could be exploited [10], [13]–[16].This is realized by enlarging the antenna spacing to make the Rayleigh distance of the antennasarray, i.e., S λ , larger than the communication distance, where S and λ denote the array apertureand the wavelength [15], [16]. As a result, the spherical-wave propagation should be consideredin this widely-spaced antennas array. One propagation path from the transmitted array to the ChannelMatrixDigitalPrecoder 𝑁𝑁 𝑠𝑠 … Analog Precoder Analog Combiner … 𝑁𝑁 𝑡𝑡 / 𝑘𝑘 … Widely-spaced Subarray 𝑙𝑙 𝑡𝑡 … DAC

RF Chain

DAC

RF Chain 𝑁𝑁 𝑡𝑡 / 𝑘𝑘 … Widely-spaced Subarray k 𝑙𝑙 𝑡𝑡 … DAC

RF Chain

DAC

RF Chain

Widely-spaced Subarray 𝑁𝑁 𝑟𝑟 / 𝑘𝑘 … 𝑙𝑙 𝑟𝑟 … ADC

RF Chain

ADC

RF Chain

Widely-spaced Subarray k 𝑁𝑁 𝑟𝑟 / 𝑘𝑘 … 𝑙𝑙 𝑟𝑟 … ADC

RF Chain

ADC

RF Chain … 𝑁𝑁 𝑠𝑠 DigitalCombiner … Fig. 1: The architecture of the THz widely-spaced multi-subarray system with hybrid beamforming. received array can be decomposed into multiple sub-paths at the antenna level with distinctphases, which lead to the new type of intra-path multiplexing.Inspired by these, a promising research direction is to jointly harvest the inter-path and intra-path multiplexing gains, in addition to the beamforming gain in the THz UM-MIMO system,which adopts a widely-spaced multi-subarray (WSMS) architecture [1], [10]. On the one hand,the antennas within each subarray are critically-spaced to exploit the inter-path multiplexing andbeamforming gains. On the other hand, the subarrays are widely-spaced to make the Rayleighdistance exceed the communication distance such that the intra-path multiplexing gain couldbe harvested. Although the idea was studied in [1], [10], these studies miss to consider thedominant LoS peculiarity of the THz band. In addition, the number of subarrays and the subarrayspacing need to be carefully designed to fully unleash the potential of the THz WSMS system.Furthermore, the computational complexity of the hybrid beamforming algorithm needs to bereduced for practical implementation.In this work, we investigate the holistic harvesting of the inter-and-intra-multiplexing andbeamforming gains in the THz WSMS system with N t and N r antennas at the transmitterand receiver, as illustrated in Fig. 1. The antennas are divided uniformly into k widely-spacedsubarrays, while N t /k or N r /k antennas in each subarray are critically-spaced at the transmitterand receiver, respectively. Within the subarray, the inter-path multiplexing and beamforminggains are utilized. Among the subarrays, we design the subarray spacing to harvest the intra-pathmultiplexing gain. Additionally, compared to the fully-digital beamforming, the hardware- andenergy-efﬁcient hybrid beamforming is considered for the THz WSMS system. The distinctive contributions of this work are summarized as follows. • By capturing the spherical-wave propagation, we investigate the channel model of theTHz WSMS system and evaluate its accuracy. We prove that the total inter-and-intra-multiplexing gain in the THz WSMS system is min { kN p , N t , N r } , i.e., k times ofthat of the existing critically-spaced system , where N p is the number of multipath. • Inspired by the relationship between the inter-and-intra-multiplexing gain and the number ofsubarrays, we formulate a design problem to determine the number of subarrays and thesubarray spacing, aiming at achieving an optimal multiplexing gain and maximizingthe achievable rate of the THz WSMS system. By leveraging the dominant LoS peculiarityof the THz band, we propose a low-complexity dominant-LoS-relaxation (DLR) method to solve this design problem. • We analyze the implementation of the hybrid beamforming architecture in the THz WSMSsystem to reduce the hardware complexity. Since the subarrays are widely-spaced, theRF chain connected with one subarray can not connect to any other subarrays, whichresults in the block-diagonal structure of the analog beamformer. Based on this practicalhardware limitation, we formulate a WSMS hybrid beamforming problem and develop alow-complexity block-diagonal vectorization-based (BD-VEC) algorithm to solve it. • We carry out extensive simulations to evaluate the achievable rate of the THz WSMSsystem, the DLR method, and the BD-VEC algorithm.

Numerical results verify theinter-and-intra-multiplexing gain and show that the achievable rate of the THz WSMSsystem is substantially higher than the existing THz UM-MIMO systems. The DLR methodcan maximize the achievable rate of the THz WSMS system near-optimally, with a muchlower complexity than the optimal exhaustive search method. Compared to the existinghybrid beamforming algorithms, the proposed BD-VEC algorithm achieves higher spectralefﬁciency with substantially lower computational complexity.The remainder of this paper is organized as follows. The multiplexing and beamforming gainsin the critically-spaced and widely-spaced THz UM-MIMO systems are investigated in Sec.II. We analyze the channel model of the THz WSMS system and evaluate its accuracy inSec. III. Additionally, the inter-and-intra-multiplexing gain is derived. In Sec. IV, the DLRmethod is proposed to design the number of subarrays and the subarray spacing near-optimally.We investigate the BD-VEC hybrid beamforming algorithm in Sec. V. Furthermore, extensive simulation results are analyzed in Sec. VI. Finally, the conclusion is drawn in Sec. VII.

Notations : A is a matrix; a is a vector; a is a scalar; A [ i, l ] is the element of the i th row and l th column of A ; a [ i ] denotes the i th element of a ; I N is an N -dimensional identity matrix; ( · ) T , ( · ) ∗ , and ( · ) H represent transpose, conjugate, and conjugate transpose; (cid:107)·(cid:107) F is the Frobeniusnorm of the matrix; (cid:107)·(cid:107) p is the p -norm of the vector.II. A NTENNA S PACING AND G AINS IN TH Z UM-MIMO S

YSTEMS

In this section, we ﬁrst analyze the inter-path multiplexing and beamforming gains in thecritically-spaced THz UM-MIMO system. Then, we investigate the intra-path multiplexing gainin the widely-spaced THz UM-MIMO system. Inspired by these two systems, we propose tojointly utilize the inter-and-intra-multiplexing and beamforming gains.

A. Inter-path Multiplexing and Beamforming in Critically-spaced THz UM-MIMO System

Most of the existing MIMO studies [6]–[9] consider the critically-spaced MIMO system,where the antenna spacing d a equals to half of the wavelength, i.e., λ , as shown in Fig. 2(a).Multiple propagation paths are considered while we only draw two paths for simplicity. Since theantenna spacing is λ , the array aperture is far less than the communication distance D , for whichthe planar-wave propagation assumption on a constant phase over the plane is appropriate. Byconsidering N p propagation paths, an N r × N t -dimensional channel matrix H cs of the critically-spaced THz UM-MIMO system can be stated as [6], [8], [17] H cs = (cid:88) N p i =1 (cid:101) α i a ri ( φ ri , θ ri ) a ti ( φ ti , θ ti ) H , (1)where (cid:101) α i is the complex path gain of the i th multipath, which can be obtained by the ray-tracingmethod at THz frequencies [18]. The vectors a ri ( φ ri , θ ri ) and a ti ( φ ti , θ ti ) are the received andtransmitted array response vectors of the i th multipath. For an N L × N W -element uniform planararray on the x-z plane, a ti ( φ ti , θ ti ) can be written as a t i ( φ ti ,θ ti )= (cid:104) , ..., e j πλ d a ( n L · sin( θ ti )cos( φ ti )+ n W · cos( θ ti )) , ..., e j πλ d a (( N L − θ ti )cos( φ ti )+( N W − θ ti )) (cid:105) T , (2)where φ ti and θ ti are the azimuth and elevation departure angles of the i th multipath, with ≤ n L ≤ ( N L − and ≤ n W ≤ ( N W − . For the received array response vector a r i ( φ ri , θ ri ) ,the subscript ti in (2) should be replaced by ri . Compared to a single-input single-output (SISO)system, the critically-spaced THz UM-MIMO system can harvest the inter-path multiplexinggain and the beamforming gain to enhance the data rate as follows. 𝑑𝑑 𝑎𝑎 = λ TX Array RX Array (a) (b) D Multipath 1

RX Array 𝑑𝑑 𝑎𝑎 > λ𝐷𝐷 𝑁𝑁 − TX Array

Multipath 2

Fig. 2: (a) Critically-spaced system. (b) Widely-spaced system. Inter-path multiplexing : The channel H cs in (1) can be decomposed as N rank parallel SISOsub-channels via the singular value decomposition (SVD), where N rank = min { N t , N r , N p } isthe rank of H cs [1], [6], [10]. In the THz band, due to the prohibitively high reﬂection andscattering loss, the contribution of the higher-order reﬂection paths and the scattering paths isnegligible. As a result, the number of multipath N p is low, e.g., typically less than 6 [8], [12].Hence, in the critically-spaced THz UM-MIMO system, N rank usually equals to N p . Throughtransmitting different data streams on these N p parallel sub-channels, the spatial multiplexinggain is N p . Beneﬁting from the different N p multipath, this refers to the inter-path multiplexinggain [1], [10]. Beamforming : For illustration, we analyze the beamforming gain when the number ofpropagation path N p = 1 and only one beam is generated, while the general cases are extensible.The beamforming gain originates from focusing the energy of the electromagnetic wave into thetarget direction, through adjusting the transmitted and received steering vectors w t ( φ t , θ t ) and w r ( φ r , θ r ) , which have the same structure as the array response vector in (2). Here φ t ro ) and θ t r are the azimuth and elevation steering angles. The effective channel after beamforming is h e = w Hr ( φ r , θ r ) H cs w t ( φ t , θ t ) , (3)where the resulting beamforming gain is deﬁned as G BF = | h e (cid:101) α | . When the target direction ofthe steering vector is aligned with the propagation path at both the transmitter and receiver, thebeamforming gain reaches the maximum value of G BF = N t N r .By jointly utilizing the inter-path multiplexing and beamforming gains, the critically-spacedTHz UM-MIMO system achieves higher data rate than the THz SISO system. However, in THzband, the value of N p is usually limited, which leads to a fairly poor inter-path multiplexinggain. Although with ultra-massive antennas and high beamforming gain, the poor inter-pathmultiplexing gain N p still restricts the data rate of the critically-spaced THz UM-MIMO system. B. Intra-path Multiplexing in Widely-spaced THz UM-MIMO System

We denote S as the array aperture of the MIMO system, which equals to the length of thediagonal of the uniform planar array. It has been investigated that by increasing the antennaspacing d a to make S larger than (cid:113) λD , the spatial multiplexing gain can exceed N p , where D denotes the communication distance between the transmitter and receiver [13]–[16].Let us consider N t = N r = N and S = √ N − d a . As shown in Fig. 2(b), if the antennaspacing d a is larger than (cid:113) λD N − , we have S > (cid:113) λD . Since the communication distance D is usually far greater than λ , (cid:113) λD N − is much larger than λ , which results in a widely-spacedUM-MIMO system. For instance, when considering a 0.3 THz × UM-MIMO systemcommunicating over 100m, (cid:113) λD N − ≈ λ , which is ten times of λ .To highlight the spatial multiplexing gain in the widely-spaced system from the inter-pathmultiplexing gain in the critically-spaced system, we consider the case N p = 1 where there isonly one LoS path and therefore, no inter-path multiplexing effect. In a widely-spaced systemwhere the array aperture S > (cid:113) λD , the communication distance D is smaller than the Rayleighdistance D ray = S λ . The planar-wave propagation assumption is valid when the communicationdistance is longer than the Rayleigh distance, while for the case of D < D ray , the spherical-wave propagation needs to be invoked instead [15], [16]. As a result, the channel between the n th received antenna and the m th transmitted antenna is denoted as | (cid:101) α mn | e j πλ D mn , where | (cid:101) α mn | denotes the magnitude of the path gain, and D mn represents the distance between these twoantennas. It has been investigated in [1], [10] that although the array aperture S is larger than (cid:113) λD , it is still on the order of (cid:113) λD . Due to the sub-millimeter wavelength at THz band, the arrayaperture of the THz widely-spaced system is usually far less than the communication distance D . Therefore, all | (cid:101) α mn | can be treated identical and approximated as | (cid:101) α | . Hence, the N r × N t -dimensional channel matrix H ws of the widely-spaced THz UM-MIMO system is describedas [14]–[16] H ws = | (cid:101) α | G , (4)where N r × N t -dimensional G denotes the phase matrix such that G [ m, n ] = e j πλ D mn . Intra-path multiplexing:

In (4), the rank of the channel matrix H ws equals to the rank of thephase matrix G , i.e., min { N t , N r } [13], [15]. Therefore, through the SVD procedure, the numberof sub-channels is min { N t , N r } , which thereby suggests that the value of spatial multiplexinggain is equal to min { N t , N r } . Distinguished from the inter-path multiplexing gain which equals … … x z yD … … 𝑑𝑑 𝑠𝑠 𝑑𝑑 𝑠𝑠 𝑑𝑑 𝑠𝑠 𝑑𝑑 𝑠𝑠 Tx Rx o 𝑑𝑑 𝑎𝑎 = λ Multipath 2Multipath 1

Fig. 3: The THz WSMS UM-MIMO system. The numbers of antennas at transmitter and receiver are N t and N r . The numberof multipath is N p . For simplicity, we only draw two multipath components. to the number of multipath N p , this intra-path multiplexing gain does not rely on the number ofmultipath and exists even with only one LoS path.As we analyzed before, in the critically-spaced THz UM-MIMO system, the poor inter-pathmultiplexing gain limits the data rate. The existence of the intra-path multiplexing gain when theantennas are widely-spaced motivates us to propose a new design for THz UM-MIMO system,i.e., the THz widely-spaced multi-subarray (WSMS) UM-MIMO system which can jointly utilizethe inter-path and intra-path multiplexing and beamforming gains.III. TH Z WSMS UM-MIMO S

YSTEM

In this section, we propose a THz WSMS UM-MIMO system, as shown in Fig. 3. At both thetransmitter and receiver, the antennas are uniformly divided into k subarrays. In each subarray,antennas are critically-spaced, i.e., d a = λ , to beneﬁt from the inter-path multiplexing andbeamforming gains. To explore the intra-path multiplexing gain, subarrays are widely-spaced tomake the array aperture S exceed (cid:113) λD , as we analyzed in Sec. II-B. Based on the spherical-wave propagation, we investigate the channel model and evaluate its accuracy for the THz WSMSsystem. Then, we prove that the total inter-and-intra-multiplexing gain in the WSMS system is min { kN p , N t , N r } , which is k times of the multiplexing gain in the critically-spaced system. A. Channel Model

For each subarray, we select its ﬁrst antenna as the reference point, which is colored in redin Fig. 3. We denote the subarray spacing d s as the distance between the reference points of the adjacent subarrays. First, we use all k reference points at the transmitter to compose a virtualtransmitted array. At the receiver, we similarly compose k reference points as a virtual receivedarray. We design the subarray spacing d s to make the array aperture of the virtual transmittedand received arrays exceed (cid:113) λD . Hence, based on the spherical-wave propagation model, thechannel between the virtual arrays share the similar form as the widely-spaced system in (4),expressed as H v = (cid:88) N p i =1 α i G i , (5)where α i denotes the channel gain of the i th multipath. For LoS path, α i is the amplitude of thepath gain. While for reﬂection and scattering paths, α i equals to the product of the amplitudeof the path gain and the phase rotation caused by the reﬂection and scattering. The k × k -dimensional G i denotes the phase matrix of the i th multipath, satisfying G i [ m, n ] = e j πλ D mni ,and D mni refers to the distance between the n th transmitted reference point and the m th receivedreference point along the i th multipath.Without loss of generality, we consider that the transmitted array and the received arrayface to each other, both on the x-z plane. We denote the reference point and subarray whichlocate at the origin as the ﬁrst reference point and subarray. For the n th transmitted referencepoint, the coordinates are ( x n d s , , z n d s ) . Similarly, for the m th received reference point, thecoordinates are ( x m d s , D, z m d s ) . According to the coordinates, D mn can be calculated as D mn = (cid:112) (( x m − x n ) d s ) + D + (( z m − z n ) d s ) . Other D mni can be calculated with the similar methodby considering the reﬂection or scattering. If the received array is not parallel to the transmittedarray, i.e., tilted with an angle, the knowledge about the rotation of coordinate system should beused to adjust the coordinates of the received reference points.Through (5), the channel between the reference points of the n th transmitted subarray andthe m th received subarray is (cid:80) N p i =1 α i e j πλ D mni . Next, we investigate the channel between thesetwo subarrays. Note that the n th transmitted subarray and the m th received subarray compose acritically-spaced MIMO system. As analyzed in Sec. II-A, the planar-wave propagation assump-tion is valid and therefore, the channel model between these two subarrays is H sub = (cid:88) N p i =1 α i e j πλ D mni a ri ( φ mnri , θ mnri ) a ti ( φ mnti , θ mnti ) H , (6)where a ri ( φ mnri , θ mnri ) and a ti ( φ mnti , θ mnti ) are the array response vectors of the m th received and n th transmitted subarrays along the i th multipath, as given in (2). Moreover, φ mnri ( ti ) and θ mnri ( ti ) standfor the azimuth and elevation arrival (departure) angles, respectively. As we analyzed in Sec. II-B, the array aperture S is usually on the order of (cid:113) λD and far less than the communicationdistance D . Hence, the azimuth and elevation angles of different subarrays can be approximatedequally, denoted by φ ri ( ti ) and θ ri ( ti ) . Therefore, the array response vectors of all subarrays canbe approximated identically, i.e., a ri ( φ ri , θ ri ) and a ti ( φ ti , θ ti ) .Based on the channel response between the reference points in (5), the channel model betweenany two subarrays can be obtained via (6). By combining the spherical-wave model for thesubarray-level propagation in (5) and the planar-wave model for antenna-wise propagation withina subarray in (6), the channel model of the THz WSMS system can be stated as [1], [10] H = (cid:88) N p i =1 α i G i ⊗ (cid:16) a ri ( φ ri , θ ri ) a ti ( φ ti , θ ti ) H (cid:17) , (7)where ⊗ is the Kronecker product. The rank of channel H is illustrated as below. Lemma 1 : In the THz WSMS system, the rank of N r × N t -dimensional H is min { kN p , N t , N r } . Proof : The proof is presented in Appendix A.It is challenging to acquire instantaneous channel state information in the THz WSMS system.The parameters of channel model (7) can be divided into two parts, namely, planar-wave andspherical-wave. The phase matrix G i is under the spherical-wave propagation and α i , φ ri , θ ri , φ ti , θ ti are parameters under planar-wave propagation. Since α i , φ ri , θ ri , φ ti , θ ti are identical for anysubarray, we can ﬁrst activate one pair of subarrays at the transmitter and the receiver, to acquirethese parameters according to the existing THz planar-wave channel estimation method [19].Then, by activating the reference antennas in each subarray in (5), the method in [20] is efﬁcientto recover the phase matrix G i , by considering the spherical-wave propagation. B. Accuracy Evaluation of the Channel Model

Accuracy of the developed channel model in (7) is evaluated in an exemplary applicationof THz wireless backhaul. The transmitter and receiver are two base stations with heights of h t = h r = 30 m. The communication distance D is m. Due to the long communicationdistance and high reﬂection and scattering loss, we consider one LoS path and one ground-reﬂection path included in the THz wireless backhaul channel [21], [22]. As a ground truth, theresults from the spherical-wave MIMO channel model H s are regarded as the benchmark, whereeach element of H s is calculated individually by the ray-tracing method under the spherical-wavepropagation [18]. For comparison, the accuracy of the planar-wave MIMO channel model is also Subarray Spacing ds, ( ) A pp r ox i m a ti on E rr o r Planar-wave Channel ModelChannel Model (7)

Fig. 4: The approximation error of the developed channel model in (7). f = 0 . THz, N t = N r = 1024 , k = 4 . evaluated. The planar-wave MIMO channel model has the same form with (1), except that theantenna spacing is adjusted according to the widely-spaced subarrays.The approximation error of the channel model (7), which is deﬁned as (cid:107) H − H s (cid:107) F (cid:107) H s (cid:107) F , is evaluatedin Fig. 4. With the increase of subarray spacing d s , the inaccuracy of channel model (7) increasesslightly. The reason is that in (7), we consider that the array aperture is far less than thecommunication distance. As the subarray spacing increases, the array aperture grows such thatthe approximation error increases. As we analyzed before, the array aperture is usually onthe order of (cid:113) λD , whose corresponding subarray spacing is around λ . When d s = 200 λ ,the approximation error is only 4.3%. By contrast, the approximation error of the planar-wavechannel model is much higher than (7). When d s = 200 λ , the approximation error exceeds 100%since the phases of the elements in planar-wave channel model differ too much and even becomeopposite with those of the benchmark spherical-wave model H s .Therefore, the channel model (7) is accurate for the THz WSMS system. Furthermore, com-pared to the complicated spherical-wave channel model H s , the complexity of the channelmodel (7) is signiﬁcantly reduced. In H s , each element needs to be calculated individually.By contrast, in (7), we only need to calculate G i and extend it to H with the assistance of a ri ( φ ri , θ ri ) a ti ( φ ti , θ ti ) H . C. Inter-and-intra-multiplexing and Beamforming Gains

We analyze the existence of the inter-and-intra-multiplexing and beamforming gains in theTHz WSMS system. To start with, the beamforming gain arises since we can adjust the weightsof the antennas in the subarrays of the THz WSMS system, to concentrate the energy of the THz wave. As we analyzed in

Lemma 1 , in the THz WSMS system, the rank of H equals to min { kN p , N t , N r } . Therefore, the total inter-and-intra-multiplexing gain is min { kN p , N t , N r } ,which is k times or min { N t ,N r } N p times of the inter-path multiplexing gain in the critically-spacedTHz UM-MIMO system.In the THz WSMS system, on the one hand, when kN p ≤ min { N t , N r } , the inter-and-intra-multiplexing gain kN p increases with the number of subarrays k . On the other hand, with moresubarrays, the multiplexing gain increases which reduces the beamforming gain as well as theachievable rate. Therefore, there exists a trade-off on the number of subarrays k in the THzWSMS system. Furthermore, the subarray spacing d s inﬂuences the channel (7) and furtherinﬂuences the achievable rate. Consequently, the number of subarrays and the subarray spacingneed to be carefully designed to maximize the achievable rate of the THz WSMS system.IV. A CHIEVABLE R ATE M AXIMIZATION OF TH Z WSMS S

YSTEM

In this section, we design the number of subarrays k and the subarray spacing d s , which aretwo fundamental parameters of the inter-and-intra-multiplexing gain and the channel model (7),to maximize the achievable rate of the THz WSMS system.A straightforward and optimal method is using exhaustive search to select k and d s . However,this is impractical since the number of candidates of d s is prohibitively large. Instead, to design k and d s efﬁciently, we propose a low-complexity dominant-LoS-relaxation (DLR) method. Thedominant LoS property of the THz band is utilized to relax the intractable design problem intoa tractable problem, which can be solved by the gradient descend method.The joint design of k and d s is difﬁcult since these two parameters are coupled as derivedin (7). Therefore, we ﬁrst shed light on the design of the subarray spacing d s to maximize theachievable rate of the THz WSMS system, with a ﬁxed value of k . After that, we determine thevalue of the number of subarrays k to maximize the achievable rate. A. Design of the Subarray Spacing d s in DLR Method The rank of channel H in (7) is min { kN p , N t , N r } . Using the SVD procedure to decompose H into min { kN p , N t , N r } parallel sub-channels and leveraging the water-ﬁlling power allocation,the achievable rate of the channel H is given by C = (cid:88) min { kN p ,N t ,N r } i =1 log (cid:16) ρ i σ n r i ( H ) (cid:17) , (8) Subarray Spacing ds, ( ) A c h i e v a b l e R a t e ( bp s / H z ) HH LoS

Fig. 5: Achievable rates of H and H LoS versus d s . N t = N r = 1024 , k = 4 , D = 100 m, h t = h r = 30 m, ρ = 20 dBm. where r i ( H ) is the i th largest singular value of H , σ n denotes the noise power. Let x + (cid:44) max { x, } , ρ i = (cid:16) Γ − σ n r i ( H ) (cid:17) + describes the transmitted power allocated to the i th sub-channel,where Γ is chosen to satisfy the transmitted power constraint (cid:80) i ρ i = ρ .To take a closer look, the channel matrix H in (7) can be divided into two parts as H = H LoS + H NLoS = α G ⊗ ( a r a Ht ) + (cid:88) N p i =2 α i G i ⊗ ( a ri a Hti ) , (9)where H LoS and H NLoS refer to the LoS and NLoS parts of H , respectively. Note that φ ri ( ti ) and θ ri ( ti ) are omitted for simplicity. Compared to the microwave and mmWave frequencies, onepeculiarity of the THz channel is the dominant LoS property, i.e., the received power throughthe LoS path is more than times stronger than the other NLoS paths [11], [12]. Inspired bythis property, we propose to maximize the achievable rate of H LoS , instead of the achievablerate of H . To show the feasibility of this substitution, we compare the achievable rates of H LoS and H versus d s in Fig. 5. The setup of the channel follows the analysis in Sec. III-B. Since theLoS path is much stronger than the NLoS path, the achievable rate of channel H LoS contributesto 90% achievable rate of the entire channel H . Furthermore, the peaks of the rates of H LoS and H match very well at the same subarray spacing. Speciﬁcally, the ﬁrst three peaks of theachievable rate of H LoS locate at d s = 227 . λ, . λ, . λ , while the ﬁrst three peaks of H locate at d s = 230 . λ, . λ, . λ . The deviations are less than 2%, which reveals thatit is reasonable to invoke the achievable rate maximization of H LoS to substitute that of H . Therank of H LoS is min { kN p , N t , N r } with N p = 1 , i.e., equals to k . Thus, the achievable rate of H LoS is written as C LoS = (cid:88) ki =1 log (cid:16) (cid:98) ρ i σ n r i ( H LoS ) (cid:17) , (10) where (cid:98) ρ i = (cid:16)(cid:98) Γ − σ n r i ( H LoS ) (cid:17) + is the transmitted power allocated to the i th sub-channel, (cid:98) Γ ischosen to satisfy the transmitted power constraint (cid:80) i (cid:98) ρ i = ρ , and r i ( H LoS ) denotes the i th largest singular value of H LoS . Hence, the problem of designing d s to maximize C LoS can beformulated as max d s (cid:88) ki =1 log (cid:16) (cid:98) ρ i σ n r i ( H LoS ) (cid:17) (11a) s . t . (cid:98) ρ i = (cid:16)(cid:98) Γ − σ n r i ( H LoS ) (cid:17) + , (cid:88) ki =1 (cid:98) ρ i = ρ (11b) (cid:88) ki =1 r i ( H LoS ) = tr( H LoS H HLoS ) , (11c)where (11c) originates from the relationship between the singular values and the trace of H LoS .Directly solving problem (11) is ambitious due to the complicated water-ﬁlling power al-location (cid:98) ρ i . To make the optimization tractable, we substitute the water-ﬁlling allocation withequal-power allocation (cid:98) ρ i = ρ/k , which yields a lower bound of C LoS . As a result, the problem(11) is relaxed to maximize the lower bound of C LoS as max d s (cid:88) ki =1 log (cid:16) ρkσ n r i ( H LoS ) (cid:17) (12a) s . t . (cid:88) ki =1 r i ( H LoS ) = tr (cid:0) H LoS H HLoS (cid:1) . (12b)Following the Jensen’s inequality, (12a) can be rearranged as (cid:88) ki =1 log (cid:16) ρkσ n r i ( H LoS ) (cid:17) ≤ k · log (cid:16) k · ρkσ n (cid:88) ki =1 r i ( H LoS ) (cid:17) (13a) = k · log (cid:16) k · ρkσ n tr (cid:0) H LoS H HLoS (cid:1)(cid:17) , (13b)where the inequality in (13a) holds when all r i ( H LoS ) are identical. In (13b), according to theproperty of the Kronecker product, H LoS H HLoS = | α | ( G G H ) ⊗ ( a r a Ht a t a Hr ) . Following theproperty of matrix trace, tr( H LoS H HLoS ) = | α | · tr( G G H ) · tr( a r a Ht a t a Hr ) , where | α | and tr( a r a Ht a t a Hr ) are constants. As analyzed in (5), the structure of G G H can be expanded as G G H =  k · · · (cid:80) ki =1 e j πλ ( D i − D ki ) ... . . . ... (cid:80) ki =1 e j πλ ( D ki − D i ) · · · k  . (14)As a result, tr( G G H ) = k , which is a constant since k is ﬁxed temporarily. Therefore, tr( H LoS H HLoS ) is ﬁxed, which makes (13b) a constant. Hence, (13b) is the upper bound of (12a), which is achieved when all r i ( H LoS ) are identical. Note that when (13b) is achieved,(12a) is maximized.Now, we investigate how to design d s to make all r i ( H LoS ) identical. According to the propertyof Kronecker product, r ( H LoS ) = r ( H LoS ) = . . . = r k ( H LoS ) is equivalent to r ( G ) = r ( G ) = . . . = r k ( G ) , since a r a Ht only has one singular value. Following the property ofsingular value and the structure of G G H in (14), r ( G ) = r ( G ) = . . . = r k ( G ) is furtherequivalent to G G H = k · I k . Therefore, we propose to design d s to make the difference between G G H and k · I k as small as possible, i.e., to minimize (cid:107) G G H − k · I k (cid:107) F . Let f ( d s ) = (cid:107) G G H − k · I k (cid:107) F , the design problem of d s is formulated as min d s f ( d s )s . t . d min s ≤ d s ≤ d max s , (15)where d max s denotes the maximal subarray spacing, which is decided by the maximal arrayaperture by considering the practical communication system. The minimal subarray spacing d min s is deﬁned to avoid overlap of adjacent subarrays. According to the structure of G G H in(14), f ( d s ) can be rewritten as f ( d s ) = − k + k (cid:88) a =1 k (cid:88) b =1 (cid:12)(cid:12)(cid:12) k (cid:88) i =1 e j πλ ( D ai − D bi ) (cid:12)(cid:12)(cid:12) (cid:124) (cid:123)(cid:122) (cid:125) ( (cid:63) ) . (16)As analyzed in Sec. III, πλ D mn can be represented as πλ D mn = 2 πλ (cid:112) D + ( l mn ) , (17)where ( l mn ) = (( x m − x n ) d s ) + (( z m − z n ) d s ) . The square root form brings difﬁculties tothe calculation. Therefore, we propose to use the Taylor expansion to simplify the expression of πλ D mn . Using the Taylor expansion, we have πλ D mn = 2 πDλ (cid:114) (cid:16) l mn D (cid:17) = 2 πDλ (cid:16) (cid:16) l mn D (cid:17) − (cid:16) l mn D (cid:17) + . . . (cid:17) , (18)where l mn ranges from to the array aperture S , which is on the order of (cid:113) λD as analyzed inSec. III-A. As a result, the maximal value of πDλ ( − ( l mn D ) ) and the higher order terms of theTaylor expansion are on the order of λD . Since the sub-millimeter wavelength of THz wave isusually much smaller than the communication distance D , these terms of the Taylor expansioncould be omitted. Hence, D mn in (17) can be rearranged as D mn ≈ D + d s D (( x m − x n ) + ( z m − z n ) ) . (19) By substituting (19) in the expression ( (cid:63) ) in (16), we can further derive (cid:12)(cid:12)(cid:12) k (cid:88) i =1 e j πλ ( D ai − D bi ) (cid:12)(cid:12)(cid:12) = (cid:16) k (cid:88) i =1 cos (cid:0) πλ ( D ai − D bi ) (cid:1)(cid:17) + (cid:16) k (cid:88) i =1 sin (cid:0) πλ ( D ai − D bi ) (cid:1)(cid:17) (20a) = k + k (cid:88) i =1 k (cid:88) l = i +1 (cid:16) πλ (cid:0) ( D ai − D bi ) − ( D al − D bl ) (cid:1)(cid:17) (20b) ≈ k + k (cid:88) i =1 k (cid:88) l = i +1 (cid:16) πd s λD ψ a,b,i,l (cid:17) , (20c)where (20c) is the result of substituting (19) in (20b) and ψ a,b,i,l = ( x a − x b )( x l − x i ) + ( z a − z b )( z l − z i ) . By replacing the expression ( (cid:63) ) in (16) with (20c), the design problem in (15) canbe reformulated as min d s k (cid:88) a =1 k (cid:88) b =1 k (cid:88) i =1 k (cid:88) l = i +1 (cid:16) πd s λD ψ a,b,i,l (cid:17) s . t . d min s ≤ d s ≤ d max s . (21)The objective function of problem (21) is non-convex, which brings difﬁculty to obtain theoptimal solution. To enable the optimization, we analyze the gradient function f (cid:48) ( d s ) of theobjective function in (21) as follows f (cid:48) ( d s ) = k (cid:88) a =1 k (cid:88) b =1 k (cid:88) i =1 k (cid:88) l = i +1 (cid:16) − πλD ψ a,b,i,l sin (cid:16) πd s λD ψ a,b,i,l (cid:17)(cid:124) (cid:123)(cid:122) (cid:125) ( † ) d s (cid:17) . (22)We observe that the gradient function f (cid:48) ( d s ) is Lipschitz continuous, which suggests that forany two points d s and d s located at [ d min s , d max s ] , there exists a constant (cid:15) which follows that | f (cid:48) ( d s ) − f (cid:48) ( d s ) | ≤ (cid:15) | d s − d s | . The proof of the Lipschitz continuous gradient can be completedwith the following key idea. To start with, ψ a,b,i,l = ( x a − x b )( x l − x i ) + ( z a − z b )( z l − z i ) has amaximal value since x a , x b , x l , x i , z a , z b , z l , and z i are the coordinates of the reference pointsin the transmitted and received arrays, which are limited by the physical size of the array, asanalyzed in Sec. III. Additionally, πλD is a constant and the value of sine function locates at [ − , . Therefore, ( † ) in (22) is upper bounded by a constant such that f (cid:48) ( d s ) is upper boundedby a linear function. As a result, for any d s and d s in [ d min s , d max s ] , we can ﬁnd a constant (cid:15) such that | f (cid:48) ( d s ) − f (cid:48) ( d s ) | ≤ (cid:15) | d s − d s | . (23) For a non-convex optimization problem with the objective function owning a Lipschitz continuousgradient, the gradient descend method can converge to a local optimal point [23]. Consequently,we invoke the gradient descend method to calculate a local optimal solution for d s in (21). B. Design of the Number of Subarrays k in DLR Method Different from the subarray spacing that has a prohibitively large number of candidate values,the possible number of subarrays k is very limited, for the following constraints. First, thenumber of subarrays k is a positive integer. Second, the number of antennas in subarrays N t k and N r k are positive integers. For instance, when N t = N r = 1024 , k can only be selected from , , , , , , , , , , , i.e., k has only possible selections. Consequently, theexhaustive search on k has a reasonably low complexity, which operates as follows. For eachvalue of k , we can obtain the value of d s according to (21) to maximize the achievable rate,based on the gradient descend method. Then, we can calculate the achievable rate of H with allpossible k , among which we determine the combination of k and d s that yields the highest rate.For practical implementation, the number of subarrays k and the subarray spacing d s are ﬁxedand can not be adjusted after the time of manufacture. The proposed design of k and d s is relatedto the communication distance and the position of the transmitter and receiver. Consequently,our design on k and d s in this work is applicable for the ﬁxed wireless communication scenarioswhich have a ﬁxed transmitter and receiver, e.g., THz wireless backhaul. One potential researchtopic is investigating how to extend the design to mobile environments.V. H YBRID B EAMFORMING IN TH Z WSMS S

YSTEM

Due to the use of ultra-massive antennas in the THz WSMS system, the hardware complexityand power consumption of the fully-digital beamforming are unbearable [6], [9], [24], [25].By contrast, hybrid beamforming is a promising technology which achieves comparable spec-tral efﬁciency with the fully-digital beamforming, while the hardware complexity and powerconsumption are substantially reduced [26]–[29]. In this section, we ﬁrst formulate the hybridbeamforming problem in the THz WSMS system. Then, we investigate a low-complexity block-diagonal vectorization-based (BD-VEC) algorithm to solve the hybrid beamforming problem.

A. Problem Formulation

As shown in Fig. 1, we assign l t and l r RF chains to control one subarray at the transmitterand receiver, respectively. The number of RF chains at the transmitter and receiver are L t = kl t and L r = kl r . The number of data streams is N s . The system model of the THz WSMS systemwith hybrid beamforming is described as y = √ ρ C H D C H A HP A P D s + C H D C H A n , (24)where s and y are N s × transmitted and received signals, ρ is the transmitted power. Thedimensions of analog precoding matrix P A and the digital precoding matrix P D for the transmitterare N t × L t and L t × N s . The dimensions of analog combining matrix C A and the digital combiningmatrix C D for the receiver are N r × L r and L r × N s , respectively. Moreover, n represents an N r × noise vector. Since the subarrays are widely-spaced, the RF chain connected with onesubarray can not connect to the other subarrays, which results in the block-diagonal structure of P A as P A =  p ... p l t ... ... ... ... ... p ... p l t ... ... ... ... ... ... ... ... ... ... ... ... ... ... p k ... p kl t  , (25)where p il is an N t k × vector and is an N t k × zero vector. Since p il is implemented by phaseshifters, the elements of p il follow the constant modulus (CM) constraint, i.e., | p il [ m ] | = 1 /N t , ∀ m . The constraints on C A and C D are the same as P A and P D as above. Additionally, thetransmitter’s power constraint is given by (cid:107) P A P D (cid:107) F = N s . Therefore, the achievable spectralefﬁciency of the THz WSMS system with hybrid beamforming is regarded as [1], [10] SE = log (cid:16)(cid:12)(cid:12) I N s + ρN s R − n C H D C H A HP A P D P H D P H A H H C A C D (cid:12)(cid:12)(cid:17) , (26)where R n = σ n C H D C H A C A C D is the noise covariance matrix after combining, and σ n denotes thenoise power.The joint design of P A , P D , C A , and C D is usually intractable [6], [7], [9]. Instead, it is acommon practice to divide into two separate problems, i.e., precoding problem and combiningproblem [6], [7]. We ﬁrst focus on the precoding process. It has been proved in [6] that, ﬁndingthe proper P A and P D to maximize the spectral efﬁciency (26) can be transformed to solve thefollowing problem as min P A , P D (cid:107) P opt − P A P D (cid:107) F s . t . | p il [ m ] | = 1 /N t , ∀ i, l, m (cid:107) P A P D (cid:107) F = N s , (27) where N t × N s -dimensional optimal fully-digital precoder P opt equals to V N s · Γ . Moreover, N t × N s -dimensional V N s refers to the ﬁrst N s columns of the right singular matrix fromthe SVD of the channel matrix H in (7). The N s × N s -dimensional Γ is water-ﬁlling powerallocation matrix. The combining problem has a similar form with the precoding problem (27),by substituting P opt , P A , P D with C opt , C A , C D , and removing the power constraint, where C opt comes from the ﬁrst N s columns of the left singular matrix of the SVD of H . B. Block-diagonal Vectorization-based (BD-VEC) Algorithm

We develop a low-complexity BD-VEC algorithm to solve the precoding and combiningproblems. We take the precoding problem (27) as an illustration and the combining problemcan be solved with a similar way. The problem (27) is intractable to solve due to the followingreasons. First, P A and P D are coupled. Second, the structure of P A is irregular, i.e., block-diagonal. Third, P A is under the non-convex CM constraint. In light of these, the main idea ofthe BD-VEC algorithm is presented as follows. We propose to alternatively solve P A and P D ,i.e., ﬁx one and update another one. Then, we leverage the vectorization principle to reshape P A to a column form, which tackles the irregular and non-convex constraints. Although thevectorization idea has been studied in [8], the algorithm in [8] can only be applied for thespecial case that L t = N s . By contrast, the developed BD-VEC algorithm can work for generalcases that L t ≥ N s . The digital precoder design : To begin with, we design P D , when P A is ﬁxed as below.A semi-unitary digital precoding matrix can mitigate the interference of different data streamsand improve the spectral efﬁciency [7], [29]. Inspired by this, we add a semi-unitary constraintto the digital precoding matrix as P H D P D = I N s . By ﬁxing the analog precoder P A and omittingthe transmitter’s power constraint temporarily, the problem (27) can be reformulated as min P D (cid:107) P opt − P A P D (cid:107) F s . t . P H D P D = I N s . (28)The solution of (28), which is called the orthogonal procrustes problem, is given as [7], [29] P D = (cid:98) V (cid:98) U H , (29)where L t × L t - and N s × N s -dimensional (cid:98) V and (cid:98) U are obtained from the SVD of P H opt P A , yieldingthat P H opt P A = (cid:98) U (cid:98) Σ (cid:98) V H , and (cid:98) V is the ﬁrst N s columns of (cid:98) V . The analog precoder design : By ﬁxing P D , we adopt the vectorization principle totransform the irregular P A in (25) into a column form. To invoke the property of the Kroneckerproduct, we have Vec( P A P D ) = ( P T D ⊗ I N t )Vec( P A ) , (30)where Vec( · ) represents the vectorization operator. Thanks to the vectorization process, we havethe following equation that ( P T D ⊗ I N t )Vec( P A ) = P e q , where N t l t × -dimensional q is the resultof removing the zero elements in Vec( P A ) , and N t N s × N t l t -dimensional P e is the result ofremoving the corresponding columns in ( P T D ⊗ I N t ) . As a result, the zero elements have beenremoved. Based on this, the objective function in (27) can be rearranged as (cid:107) P opt − P A P D (cid:107) F = (cid:107) Vec( P opt ) − ( P T D ⊗ I N t )Vec( P A ) (cid:107) (31a) = (cid:107) Vec( P opt ) (cid:107) − R e ((Vec( P opt )) H ( P T D ⊗ I N t )Vec( P A )) + (cid:107) ( P T D ⊗ I N t )Vec( P A ) (cid:107) , (31b) = (cid:107) Vec( P opt ) (cid:107) − R e ((Vec( P opt )) H P e q ) (cid:124) (cid:123)(cid:122) (cid:125) ( ∗ ) + (cid:107) ( P T D ⊗ I N t )Vec( P A ) (cid:107) (cid:124) (cid:123)(cid:122) (cid:125) ( ‡ ) , (31c)where (31c) comes from the fact that ( P T D ⊗ I N t )Vec( P A ) = P e q . The last term ( ‡ ) can be furtherderived as (cid:107) ( P T D ⊗ I N t )Vec( P A ) (cid:107) = Tr(Vec( P A ) H ( P T D ⊗ I N t ) H ( P T D ⊗ I N t )Vec( P A )) (32a) = Tr(Vec( P A ) H (( P T D ) H ⊗ I N t )( P T D ⊗ I N t )Vec( P A )) (32b) = Tr(Vec( P A ) H (( P D P H D ) ∗ ⊗ I N t )Vec( P A )) (32c) = Tr (cid:18) Vec( P A ) H (cid:18)(cid:18)(cid:101) U (cid:20) I N s (cid:21)(cid:101) U H (cid:19) ∗ ⊗ I N t (cid:19) Vec( P A ) (cid:19) (32d) = Tr (cid:18)(cid:18)(cid:20) I N s (cid:21) ⊗ I N t (cid:19) ( (cid:101) U T ⊗ I N t )Vec( P A )Vec( P A ) H ( (cid:101) U ∗ ⊗ I N t ) (cid:19) (32e) ≤ Tr(( (cid:101) U T ⊗ I N t )Vec( P A )Vec( P A ) H ( (cid:101) U ∗ ⊗ I N t )) (32f) = Tr(( (cid:101) U ∗ ⊗ I N t )( (cid:101) U T ⊗ I N t )Vec( P A )Vec( P A ) H ) (32g) = (cid:107) Vec( P A ) (cid:107) , (32h)where (32b) follows the property of the Kronecker product that ( X ⊗ Y ) H = X H ⊗ Y H . Inaddition, (32c) is the result of the mixed-product property of the Kronecker product such that Algorithm 1: BD-VEC algorithmInput: P opt , L t , and k

01: Initial P A randomly with the constraint in (25)02: Initial P D as (29)03: Repeat

04: Calculate P A through (33)05: Update P D through (29)06: Until convergence P D = ( P H A P A ) − P H A P opt

08: Normalize P D as √ N s (cid:107) P A P D (cid:107) F P D Output: P A and P D ( X ⊗ Y )( X ⊗ Y ) = ( X X ) ⊗ ( Y Y ) . In (32d), (cid:18)(cid:101) U (cid:20) I N s (cid:21)(cid:101) U H (cid:19) is the SVD of P D P H D since P H D P D = I N s , where (cid:101) U is an L t × L t -dimensional unitary matrix. Next, (32e) follows from theproperty that Tr(

XYZ ) = Tr(

YZX ) . The inequality (32f) follows that all diagonal elements ofthe Hermitian matrix ( (cid:101) U T ⊗ I N t )Vec( P A )Vec( P A ) H ( (cid:101) U ∗ ⊗ I N t ) are non-zero, and the equalityholds when L t = N s . Owing to the mixed-product property of the Kronecker product and theunitary property of (cid:101) U , ( (cid:101) U ∗ ⊗ I N t )( (cid:101) U T ⊗ I N t ) equals to an identity matrix, which results in (32h).Due to the CM constraint of P A , the modulus of each non-zero element in P A is √ N t . Sincethe number of non-zero elements in P A is N t l t , (cid:107) Vec( P A ) (cid:107) equals to l t . Therefore, the upperbound of ( ‡ ) in (31c) is a constant, as derived in (32h). Additionally, (cid:107) Vec( P opt ) (cid:107) is a constantsince P opt is ﬁxed. Consequently, by substituting ( ‡ ) with its upper bound l t , the minimizationof (31c) is transformed to the maximization of the term ( ∗ ) . Under the CM constraint of q , theoptimal solution of q to maximize R e ((Vec( P opt )) H P e q ) is known as q = 1 √ N t e j ∗ angle( P He Vec( P opt )) , (33)where angle( · ) denotes the phase vector of the vector ( · ) . With the calculated q , Vec( P A ) and P A can be recovered, which complete the design of the analog precoding matrix.To sum up, we can alternatively design P A and P D according to (33) and (29), until conver-gence. Then we apply the least square solution P D = ( P H A P A ) − P H A P opt to further reduce theobjective function of (27). The power constraint is applied to P D such that P D = √ N s (cid:107) P A P D (cid:107) F P D .The convergence property is numerically analyzed in the simulations of Sec. VI-C. The pseudocodes to implement the BD-VEC algorithm are described in Algorithm 1 . Computational complexity analysis : The computational complexity of (33) is O ( N t L t N s ) .The computational complexity of (29) is O ( N t L t N s ) . The computational complexity of step07 and step 08 is O ( N t L t ) . Hence, the overall complexity of the BD-VEC algorithm can berepresented as O ( K ( N t L t N s ) + N t L t ) , where K is the number of iterations.VI. P ERFORMANCE E VALUATION

In this section, we ﬁrst evaluate the achievable rate of the THz WSMS system as a function ofthe subarray spacing d s and the number of subarrays k . The performance of the proposed DLRmethod is compared to the optimal exhaustive search method. Then, we analyze the spectralefﬁciency of the THz WSMS system with the hybrid beamforming architecture, based on thedeveloped BD-VEC algorithm and the existing hybrid beamforming algorithms [9], [10].In the simulations, we consider the THz wireless backhaul scenario [21], [22]. The height ofthe transmitter and receiver is h t = h r = 30 m. The communication distance D ranges from m, m, m, m, m, to m. We consider a LoS path and a ground-reﬂection path, whosepath gains can be obtained by the ray-tracing method [11], [18]. The number of transmitted andreceived antennas is N t = N r = 1024 . The subarrays are planar, as shown in Fig. 3. The numberof subarrays k and the subarray spacing d s are set equally at the transmitter and receiver. Thecentral frequency is f = 0 . THz. The bandwidth B is 5 GHz and the corresponding noisepower is -76.2 dBm [30]. A. Achievable Rate of the THz WSMS System1)

Achievable rate versus d s : We evaluate the achievable rate of the THz WSMS system,which is calculated by (8), versus the subarray spacing d s , as shown in Fig. 6(a). For THzWSMS system with k = 2 , , , the achievable rate varies rapidly with d s . For instance, when d s = 192 . λ , the achievable rate of the THz WSMS system with k = 16 achieves the maximalvalue of 40.2 bps/Hz. By contrast, when d s = 386 . λ , the achievable rate of the THz WSMSsystem with k = 16 reaches the minimal value of 14 bps/Hz. Similar phenomenon can beobserved when k = 2 and , which reveals that, for THz WSMS system, the subarray spacing d s needs to be carefully designed to maximize the achievable rate. Achievable rate versus k : Next, we analyze the achievable rate of the THz WSMSsystem versus the number of subarrays k , with a constant 1024 antennas at the transceiver, asdemonstrated in Fig. 6(b). We use exhaustive search method to ﬁnd the optimal d s to maximize Subarray Spacing ds, ( ) A c h i e v a b l e R a t e ( bp s / H z ) k =2 k =4 k =16 (a) Achievable rate versus d s . Number of Subarrays k A c h i e v a b l e R a t e ( bp s / H z ) (b) Achievable rate versus k .Fig. 6: The achievable rate of the THz WSMS system versus d s and k . The transmitted power is ρ = 20 dBm. D = 150 m. the achievable rate of the THz WSMS system for each k . When k = 1 , all antennas form onearray. The THz WSMS system is equivalent to the critically-spaced UM-MIMO system, whichowns the inter-path multiplexing and beamforming gains without the intra-path multiplexing gain.Due to the limited inter-path multiplexing gain which equals to N p = 2 , the achievable rate ofthe critically-spaced UM-MIMO system is 13.7 bps/Hz. The other extreme case is k = 1024 such that all antennas are widely-spaced, by which the THz WSMS system is equivalent tothe widely-spaced UM-MIMO system. The total multiplexing gain is 1024, i.e., the number ofthe effective sub-channels is 1024. The beamforming gain is diluted to such a large number ofsub-channels, which causes a low SNR of each sub-channel. As a result, the achievable rate ofthe THz WSMS system when k = 1024 is poor, i.e., 2.5 bps/Hz.With ≤ k ≤ , the inter-and-intra-multiplexing and beamforming gains co-exist. As weanalyzed in Sec. III-C, with the increase of k , the inter-and-intra-multiplexing gain kN p growswhile the beamforming gain decreases. Hence, there exists a trade-off about k to balance theinter-and-intra-multiplexing gain and the beamforming gain. When k < , the increase ofmultiplexing gain dominates the achievable rate, which grows with k . While when k > , thedecrease of beamforming gain makes the achievable rate reduce rapidly. In particular at k = 16 ,the achievable rate reaches the maximum value of 40.2 bps/Hz, which is around 3 times and16 times of those of the critically-spaced and widely-spaced UM-MIMO systems. Therefore,through the joint utilization of the inter-and-intra-multiplexing gain and beamforming gain, theachievable rate of the THz WSMS system is substantially higher than the critically-spaced and

50 100 150 200 250 300

Distance D ( m ) A c h i e v a b l e R a t e ( bp s / H z ) ExhaustiveDLR

Fig. 7: The achievable rate of the THz WSMS system with the DLR method. ρ = 20 dBm. widely-spaced THz UM-MIMO systems. Furthermore, the selection of the number of subarrays k is critical to the maximization of the achievable rate of the THz WSMS system. B. Performance Evaluation of the DLR Method

As we analyzed above, the design of k and d s is critical to the maximization of the achievablerate of the THz WSMS system. The DLR method proposed in Sec. IV can design k and d s tomaximize the achievable rate with a low complexity. As illustrated in Fig. 7, we evaluate theachievable rate of the THz WSMS system, whose d s and k are determined by the DLR methodand the optimal exhaustive search method. It can be observed that the rate gap between the DLRmethod and the exhaustive search method is negligible. For instance, when D = 100 m, the gapis only 1.6%. Therefore, using the DLR method to design d s and k , the achievable rate of theTHz WSMS system can be maximized near-optimally.Furthermore, we present the designed k and d s of the DLR method compared to the exhaustivesearch method in Fig. 8(a) and (b). We observe that, except for the case of D = 100 m, the selected k and d s of these two methods are almost the same, which result in the similar achievable ratein Fig 7. When D = 100 m, the designed k and d s of the DLR method differ from those ofthe exhaustive search method. The reason is that, when D = 100 m, the achievable rates of theTHz WSMS system with k = 64 and k = 32 are similar. The DLR method designs k as 32while the exhaustive search method selects k as 64. With different k , the designed d s of thesetwo methods differs. Note that although with different k and d s , the rate gap between these twomethods is only 1.6%. To sum up, for all D , the DLR method can design near-optimal k and d s tomaximize the achievable rate. Additionally, since the DLR method transforms the complicated

50 100 150 200 250 300

Distance D ( m ) N u m b e r o f S ub a rr a y s k ExhaustiveDLR (a) The number of suabrrays k versus D.

50 100 150 200 250 300

Distance D ( m ) S ub a rr a y S p ac i ng d s , () ExhaustiveDLR (b) The subarray spacing d s versus D.Fig. 8: The designed k and d s of the DLR method. ρ = 20 dBm. problem (11) to a tractable problem (21), which can be efﬁciently solved, the computationalcomplexity of the DLR method is substantially lower than the exhaustive search method. C. Performance Evaluation of the BD-VEC Algorithm1)

Spectral efﬁciency : We evaluate the spectral efﬁciency of the THz WSMS system withBD-VEC hybrid beamforming algorithm as follows. Most of the existing algorithms can nottackle the block-diagonal challenge in WSMS system. We select the algorithm in [9] and thealgorithm in [10] which can overcome this challenge as the benchmark. The design of the analogbeamforming matrix of the algorithm in [10] is based on the codebook and we set its codebooksize as N t .As illustrated in Fig. 9(a), we evaluate the spectral efﬁciency of the BD-VEC algorithm byvarying the transmitted power ρ . The spectral efﬁciency of the BD-VEC algorithm approachesthe capacity. For instance, when ρ = 20 dBm, the spectral efﬁciency of the BD-VEC algorithm is22.7 bps/Hz, which is 3.6 bps/Hz lower than the capacity. Compared to the algorithm in [9], thespectral efﬁciency of the BD-VEC algorithm is 3.3 bps/Hz, i.e., 17% higher. The reason is thatthe algorithm in [9] utilizes the approximation that P H A P A ≈ I . However, in the WSMS hybridbeamforming problem (27), the block-diagonal structure of P A as analyzed in (25) enlargesthe approximation error and degrades the spectral efﬁciency. When ρ = 20 dBm, the BD-VECalgorithm outperforms the algorithm in [10] by 6.8 bps/Hz, i.e., 43%. The huge performance Transmitted Power (dBm) S p ec t r a l E ff i c i e n c y ( bp s / H z ) CapacityBD-VECAlgorithm in [9]Algorithm in [10] (a) The spectral efﬁciency versus ρ . L t = 12 . Lt S p ec t r a l E ff i c i e n c y ( bp s / H z ) CapacityBD-VECAlgorithm in [9]Algorithm in [10] (b) The spectral efﬁciency versus L t . ρ = 20 dBm.Fig. 9: The spectral efﬁciency of the BD-VEC algorithm. D = 150 m, k = 4 , N s = 8 , L r = L t . gap comes from that the algorithm in [10] enforces an additional constraint that all [ p i , . . . , p il t ] with i = 1 , , . . . , k in (25) are identical, which reduces the spectral efﬁciency.As shown in Fig. 9(b), we analyze the spectral efﬁciency versus the number of RF chains L t .The spectral efﬁciencies of all these three algorithms grow with L t . The spectral efﬁciency gapbetween the BD-VEC algorithm and the capacity is 6.3 bps/Hz when L t = 8 . As L t grows to , the gap reduces to 0.8 bps/Hz, i.e., the spectral efﬁciency of the BD-VEC algorithm is only3% lower than the capacity. Compared to the algorithm in [9] and the algorithm in [10], thespectral efﬁciency of the BD-VEC algorithm is higher by at least 3.3 bps/Hz for all L t . Computational complexity and convergence analysis : Furthermore, we analyze the com-putational complexity of the proposed BD-VEC algorithm, compared to the algorithms in [9],[10]. As studied in Sec. V-B, the computational complexity of the proposed BD-VEC algorithmis O ( K ( N t L t N s ) + N t L t ) , where K is the number of iterations. To obtain the overall complexityof the BD-VEC algorithm, we evaluate its convergence, as presented in Fig. 10. The proposedBD-VEC algorithm needs around 20 iterations to converge, which is on the order of L t . Hence,the overall computational complexity of the BD-VEC algorithm is O ( N t L t N s ) . By contrast,the computational complexity of the algorithm in [9] is O ( N t ) . For each search candidate, thecomputational complexity of the algorithm in [10] is O ( N t L t N s ) . In our simulations, the numberof search candidates in the codebook is set as N t such that the computational complexity ofthe algorithm in [10] is O ( N t L t N s ) . Since the number of antennas N t is far greater than thenumber of RF chains L t and data streams N s in THz WSMS hybrid beamforming system, the Number of Iterations O b j ec ti v e F un c ti on Fig. 10: The convergence of the BD-VEC algorithm. The objective function is (cid:107) P opt − P A P D (cid:107) F . k = 4 , L t = L r = 8 , N s = 8 , ρ = 20 dBm, D = 150 m. computational complexity of the BD-VEC algorithm is much lower than the algorithms in [9],[10]. To sum up, compared to the existing algorithms in [9], [10], the BD-VEC algorithm achievessigniﬁcantly higher spectral efﬁciency with a substantially lower computational complexity.VII. C ONCLUSION

In this paper, a WSMS hybrid beamforming architecture is investigated for THz UM-MIMOsystems. In each subarray, the antennas are critically-spaced to utilize the inter-path multiplexingand beamforming gains. Among the widely-spaced subarrays, the intra-path multiplexing gainis harvested. We derive that the inter-and-intra-multiplexing gain is min { kN p , N t , N r } , whichis further veriﬁed in numerical results. Inspired by this, we propose a low-complexity DLRmethod to maximize the achievable rate of the THz WSMS system. We formulate the WSMShybrid beamforming problem, which is solved by the investigated low-complexity BD-VECalgorithm. Extensive simulation results show that the achievable rate of the THz WSMS systemis 3 times and 16 times of those of the existing critically-spaced and widely-spaced systems. Thelow-complexity DLR method can maximize the achievable rate of the THz WSMS system near-optimally. Compared to the existing hybrid beamforming algorithms, the BD-VEC algorithmachieves 17% higher spectral efﬁciency with much lower computational complexity. A PPENDIX AP ROOF OF

Lemma 1

The rank of the N r × N t -dimensional channel matrix H is limited by min { N t , N r } . We ﬁrstprove that when kN p ≤ min { N t , N r } , the rank of H equals to kN p . We present the proof when k = 2 , while for general cases, the derivations are similar and extensible. With k = 2 , thechannel matrix H in (7) can be stated as H = (cid:88) N p i =1 α i G i ⊗ (cid:0) a ri ( φ ri , θ ri ) a ti ( φ ti , θ ti ) H (cid:1) (34a) = (cid:88) N p i =1 (cid:20) x i x i x i x i (cid:21) ⊗ ( a ri a Hti ) (34b) = (cid:88) N p i =1 (cid:18)(cid:20) (cid:21)(cid:2) x i , x i (cid:3) + (cid:20) (cid:21)(cid:2) x i , x i (cid:3)(cid:19) ⊗ ( a ri a Hti ) (34c) = (cid:88) N p i =1 (cid:26)(cid:18)(cid:20) (cid:21) ⊗ a ri (cid:19)(cid:18)(cid:2) x i , x i (cid:3) ⊗ a Hti (cid:19) + (cid:18)(cid:20) (cid:21) ⊗ a ri (cid:19)(cid:18)(cid:2) x i , x i (cid:3) ⊗ a Hti (cid:19)(cid:27) , (34d)where x mni = α i · G i [ m, n ] in (34b) and φ ri ( ti ) and θ ri ( ti ) are omitted for simplicity. Step (34d)is the result from adopting the property of Kronecker product that ( X X ) ⊗ ( Y Y ) = ( X ⊗ Y )( X ⊗ Y ) . Note that step (34d) can be represented as a matrix form as H = D · B , where D isan N r × N p -dimensional matrix, whose (2 i − th column and the (2 i ) th column are [1 , T ⊗ a ri and [0 , T ⊗ a ri , respectively. The (2 i − th row and the (2 i ) th row of the N p × N t -dimensional B are [ x i , x i ] ⊗ a Hti and [ x i , x i ] ⊗ a Hti , respectively. We deﬁne the set { [ x i , x i ] ⊗ a Hti , [ x i , x i ] ⊗ a Hti | i = 1 , , ..., N p } as B , and { [1 , T ⊗ a ri , [0 , T ⊗ a ri | i = 1 , , ..., N p } as D . Next, we provethat D and B are two linearly independent (LI) sets such that the N r × N p -dimensional D is acolumn-full-rank matrix and the N p × N t -dimensional B is a row-full-rank matrix. Accordingly,the rank of H = D · B is N p .The structure of a r , ..., a rN p is analyzed in (2). We observe that these N p column vectorscan compose a Vandermonde matrix. One property of the Vandermonde matrix is that when N p ≤ N r k , a r , ..., a rN p are LI. Since we have kN p ≤ min { N t , N r } , a r , ..., a rN p are LI, whichcan further results that D is an LI set. Similarly, a t , ..., a tN p are LI. To prove the LI propertyof B , we ﬁrst assume that B is NOT LI which means that (cid:88) N p i =1 (cid:0) w i [ x i , x i ] ⊗ a Hti + γ i [ x i , x i ] ⊗ a Hti (cid:1) = 0 , (35) where w i and γ i denote the coefﬁcients of the linear combination and there is at least onenon-zero w i or γ i . (35) can be further expressed as (cid:88) N p i =1 [ w i x i + γ i x i , w i x i + γ i x i ] ⊗ a Hti = 0 . (36)Due to the LI property of a t , ..., a tN p , (36) is equivalent to [ w i x i + γ i x i , w i x i + γ i x i ] = 0 , ∀ i . Since not all w i and γ i can be set as zero, for the non-zero w i and γ i , there has x i x i − x i x i = 0 . (37)Note that x mni = α i · G i [ m, n ] and G i is the phase matrix under the spherical-wave propagation. G i is full-rank when the subarray spacing is designed to avoid the key-hole effect [13], [15],which is a common practice for widely-spaced systems. Therefore, the determinant of α i G i isnon-zero, i.e., x i x i − x i x i (cid:54) = 0 , which is contradictory with (37). Hence, the assumptionthat B is not LI is invalid, i.e., B is LI. Until now, we have proved that both B and D are LIsets. As a result, the rank of H = D · B is N p as we analyzed before. When N p exceeds min { N t , N r } , owing to the dimension limitation of the N r × N t -dimensional H , the rank of H equals to min { N t , N r } . The proof for a general k is similar and extensible. Therefore, the rankof H equals to min { kN p , N t , N r } , which completes the proof.R EFERENCES [1] L. Yan, C. Han, and J. Yuan, “Joint two-level spatial multiplexing and beamforming in terahertz ultra-massive MIMOsystems,” in

Proc. IEEE INFOCOM workshops. , Apr. 2019, pp. 1–6.[2] T. S. Rappaport et al. , “Wireless communications and applications above 100 GHz: Opportunities and challenges for 6Gand beyond,”

IEEE Access , vol. 7, pp. 78 729–78 757, Jun. 2019.[3] I. F. Akyildiz, C. Han, and S. Nie, “Combating the distance problem in the millimeter wave and terahertz frequency bands,”

IEEE Commun. Mag. , vol. 56, no. 6, pp. 102–108, Jun. 2018.[4] I. F. Akyildiz and J. M. Jornet, “Realizing ultra-massive MIMO (1024 × Nano Communication Networks , vol. 8, pp. 46 – 54, Jun. 2016.[5] H. Sarieddeen, M.-S. Alouini, and T. Y. Al-Naffouri, “Terahertz-band ultra-massive spatial modulation MIMO,”

IEEE J.Sel. Areas Commun. , vol. 37, no. 9, pp. 2040–2052, Sep. 2019.[6] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMOsystems,”

IEEE Trans. Wireless Commun. , vol. 13, no. 3, pp. 1499–1513, Mar. 2014.[7] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter waveMIMO systems,”

IEEE J. Sel. Topics Signal Process. , vol. 10, no. 3, pp. 485–500, Apr. 2016.[8] L. Yan, C. Han, and J. Yuan, “A dynamic array-of-subarrays architecture and hybrid precoding algorithms for terahertzwireless communications,”

IEEE J. Sel. Areas Commun. , vol. 38, no. 9, pp. 2041–2056, Sep. 2020.[9] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,”

IEEE J. Sel. TopicsSignal Process. , vol. 10, no. 3, pp. 501–513, Apr. 2016. [10] X. Song, W. Rave, N. Babu, S. Majhi, and G. Fettweis, “Two-level spatial multiplexing using hybrid beamforming formillimeter-wave backhaul,” IEEE Trans. Wireless Commun. , vol. 17, no. 7, pp. 4830–4844, Jul. 2018.[11] C. Han and Y. Chen, “Propagation modeling for wireless communications in the terahertz band,”

IEEE Commun. Mag. ,vol. 56, no. 6, pp. 96–101, Jun. 2018.[12] C. Lin and G. Y. Li, “Terahertz communications: An array-of-subarrays solution,”

IEEE Commun. Mag. , vol. 54, no. 12,pp. 124–131, Dec. 2016.[13] X. Song, D. Cvetkovski, T. H¨alsig, W. Rave, G. Fettweis, E. Grass, and B. Lankl, “High throughput line-of-sight MIMOsystems for next generation backhaul applications,”

Journal of RF-Engineering and Telecommunication (Frqunz) , vol. 71,pp. 389–398, Aug. 2017.[14] L. Zhu and J. Zhu, “Optimal design of uniform circular antenna array in mmwave LOS MIMO channel,”

IEEE Access ,vol. 6, pp. 61 022–61 029, Sep. 2018.[15] F. Bohagen, P. Orten, and G. E. Oien, “Design of optimal high-rank line-of-sight MIMO channels,”

IEEE Trans. WirelessCommun. , vol. 6, no. 4, pp. 1420–1425, Apr. 2007.[16] J. Chen, S. Wang, and X. Yin, “A spherical-wavefront-based scatterer localization algorithm using large-scale antennaarrays,”

IEEE Commun. Lett. , vol. 20, no. 9, pp. 1796–1799, Sep. 2016.[17] C. Lin, G. Y. Li, and L. Wang, “Subarray-based coordinated beamforming training for mmwave and sub-THz communi-cations,”

IEEE J. Sel. Areas Commun. , vol. 35, no. 9, pp. 2115–2126, Sep. 2017.[18] C. Han, A. O. Bicen, and I. F. Akyildiz, “Multi-ray channel modeling and wideband characterization for wirelesscommunications in the terahertz band,”

IEEE Trans. Wireless Commun. , vol. 14, no. 5, pp. 2402–2412, May. 2015.[19] S. Nie and I. F. Akyildiz, “Deep kernel learning-based channel estimation in ultra-massive MIMO communications at0.06-10 THz,” in

Proc. IEEE Globecom workshops. , Dec. 2019, pp. 1–6.[20] Y. Ji, W. Fan, and G. F. Pedersen, “Channel characterization for wideband large-scale antenna systems based on a low-complexity maximum likelihood estimator,”

IEEE Trans. Wireless Commun. , vol. 17, no. 9, pp. 6018–6028, Sep. 2018.[21] I. F. Akyildiz, J. M. Jornet, and C. Han, “Terahertz band: Next frontier for wireless communications,”

PhysicalCommunication , vol. 12, pp. 16–32, Sep. 2014.[22] J. Ma, R. Shrestha, L. Moeller, and D. M. Mittleman, “Invited article: Channel performance for indoor and outdoor terahertzwireless links,”

APL Photonics , vol. 3, no. 5, Feb. 2018.[23] Y. Nesterov,

Introductory Lectures on Convex Optimization: A Basic Course . New York, NY, USA: Springer, 2013.[24] L. Zhao, D. W. K. Ng, and J. Yuan, “Multi-user precoding and channel estimation for hybrid millimeter wave systems,”

IEEE J. Sel. Areas Commun. , vol. 35, no. 7, pp. 1576–1590, Jul. 2017.[25] X. Gao, L. Dai, S. Han, C. I, and R. W. Heath, “Energy-efﬁcient hybrid analog and digital precoding for mmwave MIMOsystems with large antenna arrays,”

IEEE J. Sel. Areas Commun. , vol. 34, no. 4, pp. 998–1009, Apr. 2016.[26] H. Yuan, J. An, N. Yang, K. Yang, and T. Q. Duong, “Low complexity hybrid precoding for multiuser millimeter wavesystems over frequency selective channels,”

IEEE Trans. Veh. Technol. , vol. 68, no. 1, pp. 983–987, Jan. 2019.[27] D. Fan, F. Gao, and Y. L. et al , “Angle domain channel estimation in hybrid millimeter wave massive MIMO systems,”

IEEE Trans. Wireless Commun. , vol. 17, no. 12, pp. 8165–8179, Dec. 2018.[28] Y. Lin, S. Jin, and M. M. et al , “Transceiver design with UCD-based hybrid beamforming for millimeter wave massiveMIMO,”

IEEE Trans. Commun. , vol. 67, no. 6, pp. 4047–4061, Jun. 2019.[29] C. Rusu, R. M`endez-Rial, N. Gonz´alez-Prelcic, and R. W. Heath, “Low complexity hybrid precoding strategies formillimeter wave communication systems,”

IEEE Trans. Wireless Commun. , vol. 15, no. 12, pp. 8380–8393, Dec. 2016.[30] K. Katayama and K. T. et al , “A 300 GHz CMOS transmitter with 32-QAM 17.5 Gb/s/ch capability over six channels,”