Quasi-Distributed Antenna Selection for Spectral Efficiency Maximization in Subarray Switching XL-MIMO Systems

Abstract

In this paper, we consider the downlink (DL) of a zero-forcing (ZF) precoded extra-large scale massive MIMO (XL-MIMO) system. The base-station (BS) operates with a limited number of radio-frequency (RF) transceivers due to the high cost, power consumption and interconnection bandwidth associated with a fully digital implementation. The BS, which is implemented with a subarray switching architecture, selects groups of active antennas inside each subarray to transmit the DL signal. This work proposes efficient resource allocation (RA) procedures to perform joint antenna selection (AS) and power allocation (PA) to maximize the DL spectral efficiency (SE) of an XL-MIMO system operating under different loading settings. Two metaheuristic RA procedures based on the genetic algorithm (GA) are assessed and compared in terms of performance, coordination data size and computational complexity. One algorithm is based on a quasi-distributed methodology, while the other is based on conventional centralized processing. Numerical results demonstrate that the quasi-distributed GA-based procedure offers a suitable trade-off between performance, complexity and exchanged coordination data. At the same time, it outperforms the centralized procedures under appropriate system operation settings.


arXiv [cs.IT], Feb

João Henrique Inacio de Souza, Abolfazl Amiri, Taufik Abrão, Elisabeth de Carvalho, and Petar Popovski


Index Terms

Extra-large scale massive MIMO (XL-MIMO), antenna selection (AS), resource allocation (RA), genetic algorithm (GA), distributed signal processing.

J. H. I. de Souza and T. Abrão are with the Electrical Engineering Department, State University of Londrina, PR, Brazil. A. Amiri, E. de Carvalho and P. Popovski are with the Department of Electronic Systems, Technical Faculty of IT and Design, Aalborg University, Denmark.

I. INTRODUCTION

The benefits of adopting a high number of antennas at the base-station (BS) have attracted interest in massive MIMO transceiver design for multi-antenna wireless communication systems beyond the fifth generation (B5G) and of the sixth generation (6G). The main advantages are the large array gain, inter-channel orthogonality and channel hardening. Also, increasing the number of antenna elements can enhance the cell coverage, improving the quality-of-service (QoS) of the cell-edge users [1].

When the BS array attains extreme physical dimensions to support crowded scenarios, such as airports and large shopping malls, the system is classified as extra-large scale massive MIMO (XL-MIMO) [2]. The XL-MIMO array provides the benefits of massive MIMO with additional beamforming resolution due to the large array aperture [3]. The XL-MIMO array is characterized by key changes in the electromagnetic propagation conditions when compared to the conventional spatially stationary massive MIMO regime. First, the received signal exhibits spherical wavefront propagation, since the distance between the BS and the users is less than the Rayleigh distance [4]. Second, each cluster of scatterers sees only a portion of the array. Thus, the signal transmitted by each user reaches a small group of antennas, which comprises the visibility region (VR) of this user [2]. Additionally, the different propagation paths experienced along the array result in variations of the average received power. Results in [5], [6] demonstrate that the spatial non-stationarities produced by these two properties limit the performance of the system in terms of spectral efficiency (SE) unless an appropriate signal processing technique is applied.

Despite the benefits of high numbers of antennas, the XL-MIMO scenario imposes challenges for transceiver design. The first of them is the high cost and power consumption of fully digital implementations, which require one radio-frequency (RF) transceiver per antenna element [7], [8]. In addition, adopting a large number of antennas demands a high interconnection bandwidth to transmit the baseband data through the links to the BS processing unit. This becomes a serious implementation bottleneck, since the required bandwidth cannot be handled by the current radio interfaces [9], [10]. Lastly, handling the complexity of signal processing techniques is a relevant issue, since the number of operations executed in linear detectors, such as zero-forcing (ZF) and minimum mean-squared error (MMSE), scales with the number of antennas [11].

In order to design practical BS architectures, one can limit the number of RF transceivers to cope with the cost constraints. An implementation with a limited number of RF transceivers can still benefit from the large array by adopting techniques such as antenna selection (AS) and hybrid precoding. Often, hybrid precoding design is associated with the solution of intricate optimization problems [12]. In addition, the commonly employed analog phase shifters are more expensive and consume more power than conventional on-off switches [8]. For these reasons, combining AS procedures with linear precoding designs results in attainable strategies aiming at robust and effective implementations. Different approaches and tools can be adopted to perform AS, such as convex optimization [7], [13], [14], greedy heuristics [7], [15], machine learning [16] and metaheuristics [17]–[20].

One strategy to combat the problem of high interconnection bandwidth is to use hierarchical architectures. Adding multiple processing units to handle small groups of antennas and choosing the right signal processing methods can significantly reduce the amount of exchanged information in the regime of asymptotic number of antennas, as discussed in [9], [10]. However, the coordination of such processing units to perform different signal processing and resource allocation (RA) tasks constitutes a big challenge. In addition, many of these activities rely on the knowledge of fully reliable channel state information (CSI), which is hard to attain due to the high array dimensions. Many works on channel estimation [21], precoding and data detection [9], [10], [22]–[25] in massive and XL-MIMO consider distributed pre-processing at local nodes. However, studies on distributed RA strategies, mainly involving AS, are scarce.

The signal processing complexity is an important concern in XL-MIMO due to the high number of antenna elements. However, differently from conventional massive MIMO, XL-MIMO can benefit from the spatial non-stationarities by adopting local signal processing strategies to treat the signals inside the VRs at the BS subarrays with reduced complexity [22], [24].

A. Literature Review

AS strategies for MIMO systems are extensively discussed in the literature. An AS algorithm to improve capacity in low-rank matrix channels on point-to-point MIMO was first introduced in [26]. Later, the capacity distribution of systems with receive AS was derived in [27]. These results were extended to the massive MIMO regime in [28] and [29]. In these papers, the authors derived capacity bounds for systems with transmit and receive AS, respectively.

The authors in [13], [14] proposed AS procedures for channel capacity and downlink (DL) sum-capacity maximization, respectively, based on the convex optimization framework. A technique based on the branch-and-bound algorithm is used in [8]. Considering linearly-precoded systems, the problems of AS for SE and sum-SINR maximization are addressed in [15] and [30], respectively. Differently, the work in [31] analyzed a joint AS and power allocation (PA) procedure in a system with spatially distributed antennas. The proposed procedure runs at each antenna with side-information shared within its neighborhood. Besides, AS considering limited connections in the RF transceivers switching matrices is examined in [7].

On the other hand, only a few works consider the AS problem for XL-MIMO systems. A spatial user mapping procedure to maximize SE implemented with convolutional neural networks (CNN) is proposed in [16]. The aim is to determine each effective subarray window to precode the users' signals using ZF. Results demonstrate that the CNN-based procedure achieves SE values comparable to the optimal mapping algorithm. In [17], several transmit AS procedures to maximize the energy efficiency (EE) from the long-term fading coefficients are proposed. Asymptotic SINR expressions for the received signal with AS are derived. Since the derived optimization problem is NP-hard, three of the proposed procedures are implemented by metaheuristic techniques, one being the genetic algorithm (GA). The GA is a powerful evolutionary metaheuristic that has been used in different contexts to solve AS problems, as in [18]–[20].

B. Contribution

Motivated by the benefits of large numbers of antennas at the BS and the restricted number of RF transceivers, this work examines the joint AS and PA problem on the DL of a linearly-precoded XL-MIMO system. Differently from other papers adopting the AS strategy, a distributed BS signal processing architecture is considered and the AS procedures are characterized in terms of the information exchanged between the processing nodes. Furthermore, we extend part of the results of [17] by proposing AS algorithms for XL-MIMO that use the short-term fading coefficients instead of the long-term ones. Additionally, we address the problem of joint AS and PA in XL-MIMO subarrays using a decentralized RA algorithm. The proposed RA algorithm uses the Sherman-Morrison-Woodbury (SMW) formula to perform optimal power allocation (OPA) and AS in a decentralized fashion.

The BS is constituted by multiple non-overlapping subarrays with dedicated remote processing units (RPUs), which independently perform channel estimation, precoding calculation and RA, mainly AS and PA. Each subarray is equipped with a fixed number of antenna elements and RF transceivers. Using ZF precoding, the optimization goal is to maximize the SE subject to the constraints of subarray connections and maximum transmitted power.

The contribution of this work is fourfold: i) description of a distributed transceiver design for XL-MIMO based on a subarray switching architecture; ii) proposition of a centralized procedure based on the evolutionary heuristic GA to perform joint AS and PA to maximize the SE with subarray connection and maximum transmitted power constraints; iii) proposition of a distributed version of the GA procedure for joint AS and PA, which achieves performance close to the centralized one but with small coordination data and fewer executed operations; iv) extensive analyses of the proposed procedures in terms of number of training symbols, coordination data size and number of floating-point operations (flops).

The numerical results corroborate that the GA-based procedures achieve high performance, specifically in crowded XL-MIMO applications. Additionally, the decentralized GA version offers a good trade-off between performance, number of operations and coordination data size, outperforming the centralized procedures when proper settings are adopted.

The rest of the paper is organized as follows. Section II describes the system model, including the distributed subarray processing at the BS. Next, Section III describes the centralized and distributed GA-based optimization procedures for joint AS and PA in XL-MIMO systems, while Section IV discusses two feasible AS procedures adopted as a result of decoupling the joint AS and PA optimization problem. Section V examines the complexity of the proposed algorithms. Extensive numerical results are discussed in Section VI. Final comments and conclusions are provided in Section VII.

C. Notation

Boldface lowercase $\mathbf{a}$ and uppercase $\mathbf{A}$ letters represent vectors and matrices, respectively. Capital calligraphic letters $\mathcal{A}$ represent finite sets, and $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$. $\mathbf{I}_n$ denotes the identity matrix of size $n$. $\{\cdot\}^T$ and $\{\cdot\}^H$ denote the transpose and the conjugate transpose operators, respectively. $\mathrm{diag}(\cdot)$, $\mathrm{tr}(\cdot)$ and $\det(\cdot)$ denote the diagonal matrix, trace and determinant operators, respectively. $\lceil\cdot\rceil$ denotes the ceiling operator. $\binom{n}{k}$ denotes the binomial coefficient. $\mathcal{CN}(\mu, \sigma^2)$ is a circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$. $\mathbb{E}[\cdot]$ denotes the expectation operator.

Figure 1. XL-MIMO system deployed inside a square cell with size $L$. The BS is a ULA with $M$ antennas divided into $B$ subarrays of $M_b$ antennas each. The $K$ users are randomly distributed at a distance in the range (0. L, L) from the array.

II. SYSTEM MODEL

Consider the DL of a narrow-band multi-user XL-MIMO system with the BS equipped with $M$ antennas and $N$ RF transceivers serving $K$ single-antenna users, as depicted in Fig. 1. During the DL, the BS uses $\eta_{\mathrm{tr}}$ symbols to perform channel estimation and $\eta_{\mathrm{data}}$ symbols to transmit the payload. We assume that the time interval used to send the total DL symbols $\eta_{\mathrm{DL}} = \eta_{\mathrm{tr}} + \eta_{\mathrm{data}}$ is less than the channel coherence time.

The array in the BS is composed of $B$ independent subarrays, each with $M_b$ antennas and $N_b < M_b$ RF transceivers. Each subarray is equipped with an RPU to perform, in a distributed way, channel estimation, precoding calculation and RA tasks, especially the AS and PA procedures. In addition, the BS has a central processing unit (CPU) to coordinate the subarrays' operation. Fig. 2 depicts all the described BS blocks.

Assumption 1 (Subarray switching stage):

A flexible switching stage is implemented in each XL subarray. This stage allows every antenna of subarray $i$ to connect to any RF transceiver of the same subarray. Results in [7] demonstrate that partially connected architectures introduce lower insertion loss than fully-flexible matrices, which allow the connection of any antenna in the entire array to any RF transceiver.

We assume that each subarray has perfect knowledge of the channel coefficients associated with its antennas. See [21] for details on channel acquisition in distributed signal processing architectures. Besides, we deploy the ZF precoder to precode the signals in each subarray. We adopt the technique in [21] to calculate the ZF precoder with low interconnection traffic, splitting the computations between the RPUs and the CPU.

Figure 2. Diagram of the BS architecture for DL. The BS array is composed of $B$ subarrays, each containing $M_b$ antennas, $N_b$ RF transceivers and one RPU. Additionally, the BS has a CPU for subarray coordination.

A. Channel Model

In the XL-MIMO scenario, spatial non-stationarities arise due to the large physical dimensions of the array and the large number of antenna elements. Such non-stationarities are addressed in the adopted channel model as the variation of the mean received power along the array, as in [17], [22]. The path-loss coefficient associated with BS antenna $m$ and user $k$ is defined as

$$\beta_{m,k} = q\, d_{m,k}^{-\kappa} \tag{1}$$

where $q$ is the path-loss attenuation at a reference distance, $d_{m,k}$ is the distance between antenna $m$ and user $k$, and $\kappa$ is the path-loss exponent.

Let $\mathbf{R}_k \in \mathbb{C}^{M \times M}$, $\mathbf{R}_k = \mathrm{diag}([\beta_{1,k} \cdots \beta_{M,k}]^T)$, be the matrix with the long-term fading coefficients of user $k$. The channel vector of user $k$ is defined as

$$\mathbf{h}_k = \mathbf{R}_k \mathbf{h}'_k \tag{2}$$

where $\mathbf{h}'_k \in \mathbb{C}^{M \times 1}$, $\mathbf{h}'_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$, is the short-term fading vector. From the users' channel vectors, the channel matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ is defined as

$$\mathbf{H} = [\mathbf{h}_1 \cdots \mathbf{h}_K] = [\mathbf{h}_1^T \cdots \mathbf{h}_M^T]^T \tag{3}$$

considering $\mathbf{h}_m \in \mathbb{C}^{1 \times K}$ as the row vector with the channel coefficients associated with antenna $m$.

During the DL, the BS activates a group of antennas represented by the set $\mathcal{S} \subseteq \{1, \ldots, M\}$ such that $|\mathcal{S}| \leq N$. A partition of the set $\mathcal{S}$, i.e. $\{\mathcal{S}_b\}$, $\forall b = 1, \ldots, B$, contains the indices of the selected antennas in each subarray $b$. This partition is defined such that $|\mathcal{S}_b| \leq N_b$ $\forall b$, meeting the adopted subarray structure. The equivalent channel matrix of the active antennas is defined as a row-wise submatrix of $\mathbf{H}$, $\mathbf{H}_{\mathcal{S}} \in \mathbb{C}^{|\mathcal{S}| \times K}$. Similarly, the matrix $\mathbf{H}_{\mathcal{S}_b} \in \mathbb{C}^{|\mathcal{S}_b| \times K}$ contains only the channel vectors related to the active antennas in subarray $b$.

Let $D_m \in \{0, 1\}$, $\forall m = 1, \ldots, M$, be an indicator equal to 1 if antenna $m$ is active during the DL and 0 otherwise. These indicators form the diagonal matrix $\mathbf{D} = \mathrm{diag}([D_1 \cdots D_M]^T)$. During the precoding and SE computations, it is required to calculate the matrix product $\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}$ of the active antennas' channel matrix.
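As a rough numerical illustration of the channel model in (1)-(3) and of the antenna indicators $D_m$, the following NumPy sketch builds a small channel matrix and extracts the submatrix of the active antennas. The array geometry, the user positions and all numeric values ($q$, $\kappa$, $M$, $K$, antenna spacing) are assumptions chosen only for the example, the function names are hypothetical, and $\mathbf{R}_k$ is applied exactly as written in (2).

```python
import numpy as np

def path_loss(ant_pos, user_pos, q=1.0, kappa=3.0):
    """Path-loss coefficients beta[m, k] = q * d[m, k]**(-kappa), as in eq. (1)."""
    # Pairwise antenna-user distances, shape (M, K).
    d = np.linalg.norm(ant_pos[:, None, :] - user_pos[None, :, :], axis=2)
    return q * d ** (-kappa)

def channel_matrix(beta, rng):
    """Channel matrix H (M x K): column k is R_k h'_k with R_k = diag(beta[:, k])."""
    M, K = beta.shape
    # Short-term fading with i.i.d. CN(0, 1) entries.
    h_short = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    # Element-wise product equals diag(beta[:, k]) @ h'_k for every column k.
    return beta * h_short

rng = np.random.default_rng(0)
M, K = 16, 3                                   # illustrative sizes (assumed)
# Uniform linear array on the x-axis with 0.5 m spacing (assumed geometry).
ant_pos = np.stack([0.5 * np.arange(M), np.zeros(M)], axis=1)
# Users dropped in front of the array, away from the antennas (assumed).
user_pos = np.stack([rng.uniform(0, 8, K), rng.uniform(2, 10, K)], axis=1)

H = channel_matrix(path_loss(ant_pos, user_pos), rng)

# Antenna indicators D_m: activate every other antenna in this toy example.
d = np.zeros(M, dtype=int)
d[::2] = 1
D = np.diag(d)

H_S = H[d == 1, :]                 # row-wise submatrix of the active antennas
gram_S = H_S.conj().T @ H_S        # product H_S^H H_S needed for precoding and SE
```

The same product can be written as $\mathbf{H}^H \mathbf{D} \mathbf{H}$, which is the form that appears in constraint (15c).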
To enable the computation of $\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}$ by the distributed signal processing architecture, the Gramian matrix is defined as follows.

Remark 1 (Gramian matrix):

Let $\mathbf{G}_m = \mathbf{h}_m^H \mathbf{h}_m$, $\forall m = 1, \ldots, M$, be the Gramian matrix associated with BS antenna $m$. The set $\mathcal{M}_b$ is defined for $b = 1, \ldots, B$ as the group of antennas in subarray $b$. The Gramian matrix associated with the $b$-th subarray includes only the active antennas inside it, and it can be written as

$$\mathbf{G}_{\mathcal{S}_b} = \mathbf{H}_{\mathcal{S}_b}^H \mathbf{H}_{\mathcal{S}_b} = \sum_{m \in \mathcal{M}_b} D_m \mathbf{G}_m \tag{4}$$

Similarly, the array Gramian matrix considering only the active antennas is defined as

$$\mathbf{G}_{\mathcal{S}} = \mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}} = \sum_{m=1}^{M} D_m \mathbf{G}_m \tag{5}$$

An upper bound for the system performance considering the active antennas in the set $\mathcal{S}$, namely the DL sum-capacity, is calculated by [14]:

$$C_{\mathrm{DPC}} = \max_{\mathbf{P}} \log_2 \det\left(\mathbf{I}_K + \frac{1}{\sigma_z^2} \mathbf{P} \mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right) = \max_{\mathbf{P}} \log_2 \det\left(\mathbf{I}_K + \frac{1}{\sigma_z^2} \mathbf{P} \mathbf{G}_{\mathcal{S}}\right) \tag{6}$$

where $\sigma_z^2$ is the additive noise power, while $\mathbf{P} = \mathrm{diag}([p_1 \cdots p_K])$ denotes the matrix with the power allocated to each user. The powers $p_k$, $\forall k = 1, \ldots, K$, are defined in order to meet the total power constraint $\sum_{k=1}^{K} p_k = P_{\max}$. The DL sum-capacity is achieved by the dirty paper coding (DPC) precoder, whose complexity is prohibitively high for practical implementations.

B. Downlink Signal

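The data signal transmitted by the BS is defined as $\mathbf{x} \in \mathbb{C}^{|\mathcal{S}| \times 1}$,

$$\mathbf{x} = \mathbf{F} \mathbf{P}^{1/2} \mathbf{s} \tag{7}$$

where $\mathbf{F} \in \mathbb{C}^{|\mathcal{S}| \times K}$ denotes the ZF precoding matrix, calculated by

$$\mathbf{F} = \mathbf{H}_{\mathcal{S}} \left(\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right)^{-1} = \mathbf{H}_{\mathcal{S}} \mathbf{G}_{\mathcal{S}}^{-1} \tag{8}$$

and $\mathbf{s} = [s_1 \cdots s_K]^T$ denotes the vector of modulated data symbols such that $\mathbb{E}[|s_k|^2] = 1$, $\forall k = 1, \ldots, K$, and $\mathbb{E}[s_k^* s_{k'}] = 0$, $\forall k \neq k'$. The allocated powers in (7) are calculated in order to meet the following power constraint:

$$\mathrm{tr}\left[\mathbf{P} \left(\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right)^{-1}\right] = \mathrm{tr}\left(\mathbf{P} \mathbf{G}_{\mathcal{S}}^{-1}\right) = P_{\max} \tag{9}$$

Therefore, the entries of $\mathbf{P}$ depend on the active antennas set $\mathcal{S}$ and the PA policy.

The signal received by the users in the DL is defined as $\mathbf{y} \in \mathbb{C}^{K \times 1}$,

$$\mathbf{y} = \mathbf{H}_{\mathcal{S}}^H \mathbf{F} \mathbf{P}^{1/2} \mathbf{s} + \mathbf{z} = \mathbf{P}^{1/2} \mathbf{s} + \mathbf{z} \tag{10}$$

where $\mathbf{z} \in \mathbb{C}^{K \times 1}$, $\mathbf{z} \sim \mathcal{CN}(\mathbf{0}, \sigma_z^2 \mathbf{I}_K)$, denotes the additive noise vector. Given the ZF precoding design, the system SE is calculated by

$$\mathrm{SE} = \sum_{k=1}^{K} \log_2\left(1 + \frac{p_k}{\sigma_z^2}\right) \tag{11}$$

which is equivalent to the SE of $K$ independent Gaussian channels with received signal-to-noise ratio (SNR) equal to $p_k/\sigma_z^2$ $\forall k$.

C. Optimal Power Allocation (OPA) Policy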

The OPA policy is the one that solves the problem of maximizing the system SE in (11), subject to the maximum power constraint in (9):

$$\begin{aligned}
\underset{\mathbf{P}}{\text{maximize}} \quad & \mathrm{SE} = \sum_{k=1}^{K} \log_2\left(1 + \frac{p_k}{\sigma_z^2}\right) && \text{(12a)} \\
\text{subject to} \quad & \mathrm{tr}\left[\mathbf{P} \left(\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right)^{-1}\right] \leq P_{\max} && \text{(12b)} \\
& p_k \geq 0, \; \forall k = 1, \ldots, K && \text{(12c)}
\end{aligned}$$

The optimization problem in (12) is equivalent to the well-known PA problem on independent Gaussian channels. It has an analytical closed-form solution derived by the Lagrange multipliers method (water-filling solution). The optimal power distribution is calculated by [32]:

$$p_k = \left( \mu \left[ \left(\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right)^{-1} \right]_{k,k}^{-1} - \sigma_z^2 \right)^{+} \tag{13}$$

where $(x)^{+} = \max(x, 0)$ and $\mu$ is a constant calculated by

$$\mu = \frac{1}{K} \left\{ P_{\max} + \sigma_z^2\, \mathrm{tr}\left[ \left(\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}}\right)^{-1} \right] \right\} \tag{14}$$

If $p_k = 0$ for some user $k$, the PA problem including this user is not feasible. For this reason, the $k$-th user is deactivated and the power distribution is recalculated considering only the group of remaining active users. This process must be repeated until a group of users which results in a feasible solution is found.

III. ALGORITHM FOR JOINT ANTENNA SELECTION AND POWER ALLOCATION

The problem of jointly selecting the antenna elements of the BS and allocating appropriate power amounts to maximize the ZF SE, given the constraints of maximum RF transceivers, subarray connections and maximum power, is formulated as

$$\begin{aligned}
\underset{\mathbf{D},\,\mathbf{P}}{\text{maximize}} \quad & \mathrm{SE} = \sum_{k=1}^{K} \log_2\left(1 + \frac{p_k}{\sigma_z^2}\right) && \text{(15a)} \\
\text{subject to} \quad & \sum_{m \in \mathcal{M}_b} D_m \leq N_b, \; \forall b \in \{1, \ldots, B\} && \text{(15b)} \\
& \mathrm{tr}\left[\mathbf{P} \left(\mathbf{H}^H \mathbf{D} \mathbf{H}\right)^{-1}\right] \leq P_{\max} && \text{(15c)} \\
& D_m \in \{0, 1\}, \; \forall m \in \{1, \ldots, M\} && \text{(15d)} \\
& p_k \geq 0, \; \forall k \in \{1, \ldots, K\} && \text{(15e)}
\end{aligned}$$

The objective function in (15a) is the system SE. The constraints (15b) are the subarray connection constraints, which allow the activation of a maximum of $N_b$ RF transceivers in each subarray. Also, the constraint (15c) ensures that the transmitted power is equal to or less than $P_{\max}$. Moreover, the constraints (15d) and (15e) define the binary antenna association variables and the non-negative allocated powers, respectively.

Since $\mathbf{D}$ is binary constrained, the problem (15) constitutes a non-convex combinatorial optimization problem. One approach to solve (15) comprises two steps: first, determining the optimal active antennas set via exhaustive search assuming equal PA; after that, given the result $\mathbf{D}^\star$ from the exhaustive search, the allocated power matrix $\mathbf{P}^\star$ is calculated adopting the OPA policy in (13).

The AS via exhaustive search considering the activation of all the RF transceivers requires testing $\binom{M_b}{N_b}^B$ candidate solutions, a number that attains prohibitive dimensions in the XL-MIMO regime. For instance, in a system with $B = 8$ subarrays equipped with $M_b = 64$ antennas and $N_b = 32$ RF transceivers, the number of feasible solutions is on the order of $10^{146}$. Testing all these candidate solutions in a timely manner is impracticable. An efficient alternative to the exhaustive search is to perform a guided search along the feasible set using an intelligent metaheuristic procedure. In this way, a good-quality solution can be obtained in feasible time by testing only a few candidates.
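To make the pieces of (15) concrete, the sketch below evaluates the objective (15a) for a given antenna indicator vector: it forms the Gramian of the active antennas, applies the water-filling rule (13)-(14) and returns the SE in (11). It also counts the exhaustive-search candidates for the example above. This is a minimal NumPy sketch under assumptions, not the authors' implementation: the function names are hypothetical, the channel is a random placeholder, and when a user would receive non-positive power the ZF Gramian is recomputed for the remaining users only, which is one possible reading of the deactivation step in Section II-C.

```python
import math
import numpy as np

def opa_water_filling(G, p_max, noise_power):
    """Water-filling powers of eqs. (13)-(14) with iterative user deactivation."""
    K = G.shape[0]
    active = np.arange(K)
    powers = np.zeros(K)
    while active.size > 0:
        # Diagonal of the inverse Gramian of the active users (assumed reading).
        g = np.real(np.diag(np.linalg.inv(G[np.ix_(active, active)])))
        mu = (p_max + noise_power * g.sum()) / active.size    # eq. (14)
        p = mu / g - noise_power                              # eq. (13) before (.)^+
        if np.all(p > 0):
            powers[active] = p
            break
        active = np.delete(active, np.argmin(p))              # drop the weakest user
    return powers

def zf_se(H, d, p_max, noise_power):
    """Objective (15a): ZF spectral efficiency for antenna indicators d with OPA."""
    H_S = H[np.asarray(d, dtype=bool), :]
    G = H_S.conj().T @ H_S
    p = opa_water_filling(G, p_max, noise_power)
    return float(np.sum(np.log2(1.0 + p / noise_power)))

def num_candidates(M_b, N_b, B):
    """Number of exhaustive-search candidates, binom(M_b, N_b) ** B."""
    return math.comb(M_b, N_b) ** B

rng = np.random.default_rng(1)
M, K = 16, 3                                      # toy sizes (assumed)
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
d = np.zeros(M, dtype=int)
d[:8] = 1                                         # activate the first 8 antennas
se = zf_se(H, d, p_max=1.0, noise_power=0.01)

digits = len(str(num_candidates(64, 32, 8)))      # decimal digits of the count
```

A score function of this kind is what a search procedure has to call once per candidate, so the size of the candidate count directly drives the cost of the exhaustive search.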
A. Genetic Algorithm

One metaheuristic procedure adopted to solve many different combinatorial problems in wireless communications is the GA. This technique implements different search phases to efficiently explore the feasible set and exploit the properties of good candidates in order to find promising regions in the feasible sub-spaces. Differently from exact optimization methods, evolutionary metaheuristics do not require convex objective functions or constraints. In addition, the execution complexity can be fitted to the available computational budget by adjusting the input parameters and the number of iterations. Despite these advantages, the GA, as well as other metaheuristics, does not ensure finding the optimal solution.

Table I. Glossary of the genetic algorithm terms

Parameter     Description
Individual    Candidate solution for the optimization problem
Population    Set of candidate solutions for the optimization problem
Offspring     Set of candidate solutions generated during an iteration
Gene          One optimization variable of the candidate solution
Chromosome    Set of optimization variables of the candidate solution
Generation    Genetic algorithm iteration
Fitness       Objective function of the optimization problem
Score         Value of the objective function for a candidate solution

As the GA is a procedure inspired by principles of genetics and natural selection, it inherited several terms from biology. To simplify understanding, Table I contains a glossary of some common GA terms adopted throughout this work. In the following, the implemented GA procedures, phases and variables deployed to solve the problem (15) are briefly described.
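To connect the GA terms of Table I with concrete data structures, the following toy sketch runs a few generations of a GA on binary individuals split into per-subarray chromosomes. It is only an orientation aid under stated assumptions: the fitness is a hypothetical placeholder (a weighted count of active genes) rather than the ZF SE, the operators are simplified, and all names and parameter values are illustrative.

```python
import random

B, M_B, N_B = 4, 8, 3   # subarrays, antennas and RF chains per subarray (toy values)

def rand_individual(rng):
    """Individual: list of B chromosomes, each a list of M_B genes with N_B ones."""
    individual = []
    for _ in range(B):
        chromosome = [0] * M_B
        for m in rng.sample(range(M_B), N_B):
            chromosome[m] = 1
        individual.append(chromosome)
    return individual

def make_fitness(rng):
    """Placeholder fitness: sum of random gene weights over active genes (assumed)."""
    w = [[rng.random() for _ in range(M_B)] for _ in range(B)]
    return lambda ind: sum(w[b][m] * ind[b][m] for b in range(B) for m in range(M_B))

def crossover(p1, p2, p_c, rng):
    """Each chromosome of child 1 comes from parent 1 with probability p_c."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        first, second = (a, b) if rng.random() <= p_c else (b, a)
        c1.append(list(first))
        c2.append(list(second))
    return c1, c2

def mutate(ind, p_m, rng):
    """Flip one random gene per chromosome with probability p_m, keeping <= N_B ones."""
    for chrom in ind:
        if rng.random() <= p_m:
            m = rng.randrange(M_B)
            while chrom[m] == 0 and sum(chrom) == N_B:   # flip would break feasibility
                m = rng.randrange(M_B)
            chrom[m] = 1 - chrom[m]
    return ind

def run_ga(n_pop=20, n_elite=2, p_c=0.5, p_m=0.2, generations=30, seed=0):
    rng = random.Random(seed)
    fitness = make_fitness(rng)
    population = [rand_individual(rng) for _ in range(n_pop)]
    best_scores = []
    for _ in range(generations):                         # one generation per iteration
        ranked = sorted(population, key=fitness, reverse=True)
        next_pop = [[list(c) for c in ind] for ind in ranked[:n_elite]]  # elitism
        while len(next_pop) < n_pop:
            # Tournament selection: the better of two random individuals wins.
            parents = [max(rng.sample(population, 2), key=fitness) for _ in range(2)]
            for child in crossover(parents[0], parents[1], p_c, rng):
                next_pop.append(mutate(child, p_m, rng))
        population = next_pop[:n_pop]
        best_scores.append(max(fitness(ind) for ind in population))
    return population, best_scores

population, best_scores = run_ga()
```

Because the elite individuals are copied unchanged, the best score in `best_scores` never decreases from one generation to the next, and every chromosome keeps at most `N_B` active genes.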

Optimization variables encoding:

The optimization variables of the problem (15) are the antenna state indicators $D_m$ and the users' allocated powers $p_k$. The powers $p_k$ are determined by the OPA, eq. (13). Therefore, only the antenna indicators should be encoded as individuals. Thus, the $D_m$ are defined as genes, and the column vectors $[\mathbf{d}_{i,b}]_m = D_m$, $\forall m \in \mathcal{M}_b$, $b = 1, \ldots, B$, containing the optimization variables w.r.t. each subarray represent the chromosomes, where $i$ is the individual index. Every individual is defined by a vector $\mathbf{d}_i \in \{0, 1\}^{M \times 1}$,

$$\mathbf{d}_i = \left[\mathbf{d}_{i,1}^T \cdots \mathbf{d}_{i,B}^T\right]^T = \left[D_1 \cdots D_M\right]^T \tag{16}$$

Fitness function:

The fitness function considered for the implementation is the ZF SE defined in (11), with the power distribution computed by the OPA policy.

The implemented GA contains the following phases: a) elitism, b) tournament selection, c) crossover and d) mutation. These phases require the definition of the following parameters: population size $N_p$, number of individuals for elitism $N_e$, number of tournaments $N_s$, crossover probability $p_c$ and mutation probability $p_m$. Each procedure is summarized in the sequel.

Elitism:

The elitism aims to keep the best individuals of the current generation without change. At every generation, the $N_e$ best individuals are chosen as the first individuals of the next generation. Elitism ensures that the SE obtained with the best AS indices of each GA iteration is non-decreasing.

Tournament selection:

During the tournament selection, randomly chosen pairs of individuals are compared according to their score values. The winners of the $N_s$ tournaments become candidates for the crossover phase. The selection step compares the sets of AS indices produced at each GA iteration according to the SE achieved by them.

Crossover:

The crossover phase aims to mix the chromosomes of the tournament winners in order to obtain new solutions. This phase exploits the good properties of the current set of AS indices. Two tournament winners, named parent 1 and parent 2, are randomly selected to generate two new individuals. Each chromosome of child 1 has probability $p_c$ of being inherited from parent 1 and $1 - p_c$ from parent 2. For child 2, every chromosome has probability $p_c$ of being inherited from parent 2 and $1 - p_c$ from parent 1.

Mutation:

The mutation phase aims to add small random changes to the offspring generated by crossover. This phase promotes variability among the set of AS indices, exploring different regions of the feasible set. Each chromosome is mutated with probability $p_m$, in which case one randomly selected gene of the chromosome is flipped. To preserve the feasibility of the solutions, the mutation phase is implemented by the scheme of Algorithm 1. The set $\mathcal{P}_c$ denotes the offspring generated during the crossover, and $\mathcal{P}_m$ is the offspring after mutation.

Convergence:

There are several mechanisms to check the GA convergence. Herein, the implemented algorithm has two stopping criteria: reaching the maximum number of generations $T_{\max}$, and no improvement of the best score during the last $T_{\mathrm{stall}}$ generations.

Algorithm 2 summarizes the implemented procedure, named genetic algorithm for resource allocation (GA-RA). The set $\mathcal{P}_0$ denotes the initial population, $\mathcal{P}_t$ the population of generation $t$, $\mathcal{P}_s$ the winners of the tournament selection and $\mathcal{P}_{\mathrm{temp}}$ a temporary set for the elitism phase.

B. Quasi-Distributed Genetic Algorithm

The proposed GA-RA procedure requires knowledge of the entire channel matrix $\mathbf{H}$ at the CPU to compute the individuals' score values. Such a requirement is unfeasible in the XL-MIMO scenario due to the high bandwidth needed to transfer all the channel coefficients associated with thousands of antennas to the CPU. For this reason, a solution that does not depend on the knowledge of full CSI at the CPU is preferable.

Algorithm 1: Mutation procedure
Input: Crossover offspring $\mathcal{P}_c$, $p_m$, $B$, $M_b$, $N_b$
Output: Mutated offspring $\mathcal{P}_m$
 1: $\mathcal{P}_m \leftarrow \emptyset$;
 2: for $\mathbf{d}_i \in \mathcal{P}_c$ do
 3:     for $b = 1 : B$ do
 4:         if rand_uniform$(0, 1) \leq p_m$ then
 5:             $m \leftarrow$ rand_discrete_uniform$(1, M_b)$;
 6:             if $[\mathbf{d}_{i,b}]_m == 0$ and $\sum_{j=1}^{M_b} [\mathbf{d}_{i,b}]_j == N_b$ then
 7:                 Go to line 5;
 8:             $[\mathbf{d}_{i,b}]_m \leftarrow$ flip$([\mathbf{d}_{i,b}]_m)$;
 9:     $\mathcal{P}_m \leftarrow \mathcal{P}_m \cup \mathbf{d}_i$;

One solution to avoid the requirement of full knowledge of the $\mathbf{H}$ matrix consists of performing local AS at each subarray, considering the AS indices in the other subarrays as fixed. The contribution of these fixed AS indices can be calculated beforehand by the CPU and transmitted to the RPUs with reduced bandwidth and processing power resources. Therefore, each subarray can select its antennas using the GA. The proposed quasi-distributed genetic algorithm for resource allocation (DGA-RA) implements this concept and is presented in the following.

Analyzing the fitness function of the GA-RA procedure in (11), one can observe that it depends on the inverse of the array Gramian matrix, $\mathbf{G}_{\mathcal{S}}^{-1} = (\mathbf{H}_{\mathcal{S}}^H \mathbf{H}_{\mathcal{S}})^{-1}$. The computation of $\mathbf{G}_{\mathcal{S}}^{-1}$ can be done from the subarrays' Gramian matrices by

$$\mathbf{G}_{\mathcal{S}}^{-1} = \left( \sum_{b=1}^{B} \mathbf{G}_{\mathcal{S}_b} \right)^{-1} \tag{17}$$

Therefore, the CPU can compute the inverse of the array Gramian matrix to calculate the GA-RA fitness function only with the subarray Gramian matrices calculated locally at the RPUs. Each subarray Gramian matrix has $K^2$ entries, while the channel matrix has $MK$. Therefore, calculating the contribution of the selected antennas at the CPU using the Gramian matrix strategy requires less bandwidth than the centralized strategy if $BK^2 < MK$ holds.

Algorithm 2: GA-RA
Input: $N_p$, $N_e$, $N_s$, $p_c$, $p_m$, $T_{\mathrm{stall}}$, $B$, $M_b$, $N_b$, $\mathbf{H}$
Output: The best selected antennas set, $\mathbf{D}^\star$
$\mathcal{P}_0 \leftarrow \emptyset$;
$\mathcal{P}_0 \leftarrow \mathcal{P}_0 \cup$ N-AS$(\mathbf{H})$ (Section IV-B);
for $i = 1 : N_p - 1$ do
    $\mathcal{P}_0 \leftarrow \mathcal{P}_0 \cup$ rand_individual();
for $t = 0 : T_{\max}$ do
    $\mathcal{P}_{t+1}, \mathcal{P}_s, \mathcal{P}_c \leftarrow \emptyset$; $\mathcal{P}_{\mathrm{temp}} \leftarrow \mathcal{P}_t$;
    for $i = 1 : N_e$ do    (Elitism)
        $\mathbf{d}_e \leftarrow \arg\max_{\mathbf{d}_j} \mathrm{score}(\mathbf{d}_j)$, $\mathbf{d}_j \in \mathcal{P}_{\mathrm{temp}}$;
        $\mathcal{P}_{t+1} \leftarrow \mathcal{P}_{t+1} \cup \mathbf{d}_e$; $\mathcal{P}_{\mathrm{temp}} \leftarrow \mathcal{P}_{\mathrm{temp}} \setminus \mathbf{d}_e$;
    for $i = 1 : N_s$ do    (Tournament selection)
        $\mathbf{d}_{s_1}, \mathbf{d}_{s_2} \leftarrow$ rand$(\mathcal{P}_t)$;
        $\mathbf{d}_s \leftarrow \arg\max_{\mathbf{d}_j} [\mathrm{score}(\mathbf{d}_{s_1}), \mathrm{score}(\mathbf{d}_{s_2})]$;
        $\mathcal{P}_s \leftarrow \mathcal{P}_s \cup \mathbf{d}_s$;
    for $i = 1 : N_e$ do    (Crossover)
        $\mathbf{d}_{c_1}, \mathbf{d}_{c_2} \leftarrow$ rand$(\mathcal{P}_s)$;
        $\mathbf{d}_{o_1}, \mathbf{d}_{o_2} \leftarrow \mathbf{0}_M$;
        for $j = 1 : B$ do
            if rand_uniform$(0, 1) \leq p_c$ then
                $\mathbf{d}_{o_1,j} \leftarrow \mathbf{d}_{c_1,j}$; $\mathbf{d}_{o_2,j} \leftarrow \mathbf{d}_{c_2,j}$;
            else
                $\mathbf{d}_{o_1,j} \leftarrow \mathbf{d}_{c_2,j}$; $\mathbf{d}_{o_2,j} \leftarrow \mathbf{d}_{c_1,j}$;
        $\mathcal{P}_c \leftarrow \mathcal{P}_c \cup \mathbf{d}_{o_1} \cup \mathbf{d}_{o_2}$;
    $\mathcal{P}_m \leftarrow$ mutation$(\mathcal{P}_c)$ (Algorithm 1);
    $\mathcal{P}_{t+1} \leftarrow \mathcal{P}_{t+1} \cup \mathcal{P}_m$;
    $\mathbf{d}^\star_{t+1} \leftarrow \arg\max_{\mathbf{d}_i} \mathrm{score}(\mathbf{d}_i)$, $\mathbf{d}_i \in \mathcal{P}_{t+1}$;
    if $t > T_{\mathrm{stall}}$ then    (Stall convergence criterion)
        $\mathbf{d}_{\mathrm{stall}} \leftarrow \arg\max_{\mathbf{d}_i} \mathrm{score}(\mathbf{d}_i)$, $\mathbf{d}_i \in \mathcal{P}_{t - T_{\mathrm{stall}}}$;
        if $\mathrm{score}(\mathbf{d}^\star_{t+1}) == \mathrm{score}(\mathbf{d}_{\mathrm{stall}})$ then
            Break the loop;
$\mathbf{D}^\star \leftarrow \mathrm{diag}(\mathbf{d}^\star_{t+1})$;
return $\mathbf{D}^\star$;

Based on (17), the DGA-RA procedure operates as follows. Initially, each subarray selects an active antennas set based on a simple criterion, such as the norm-based antenna selection (N-AS) described in subsection IV-B. Then, the subarrays compute their Gramian matrices based on the selected set and transmit them to the CPU. At the CPU, the inverse of the array Gramian matrix is computed by (17) and transmitted back to the subarrays. Afterwards, every subarray performs local antenna selection by a GA implementation, considering that the other subarrays are fixed. To evaluate the fitness function in eq. (11), the subarrays compute the inverse of the array Gramian matrix adopting the SMW formula for matrix inversion, as follows.

Remark 2 (SMW formula):

The SMW formula [33] gives the inverse of the matrix $(\mathbf{A} + \mathbf{U}\mathbf{V}^H)$ from $\mathbf{A}^{-1}$, $\mathbf{U}$ and $\mathbf{V}$ by computing:

$$(\mathbf{A} + \mathbf{U}\mathbf{V}^H)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}\left(\mathbf{I} + \mathbf{V}^H\mathbf{A}^{-1}\mathbf{U}\right)^{-1}\mathbf{V}^H\mathbf{A}^{-1} \tag{18}$$

Adopting this formulation, the inverse of the array Gramian matrix can be calculated at subarray $b$ during iteration $n$ by letting

$$\mathbf{A}^{-1} = \left(\mathbf{G}_S^{(n-1)}\right)^{-1}, \tag{19}$$

$$\mathbf{U} = \left[ -\left(\mathbf{H}_{S_b}^{(n-1)}\right)^H \;\; \left(\mathbf{H}_{S_b}^{(n)}\right)^H \right], \tag{20}$$

$$\mathbf{V}^H = \begin{bmatrix} \mathbf{H}_{S_b}^{(n-1)} \\ \mathbf{H}_{S_b}^{(n)} \end{bmatrix}, \tag{21}$$

where the superscript $(n)$ denotes the variable during the $n$-th iteration of the DGA-RA procedure (proof in Appendix A).

After performing local AS, each subarray transmits its achieved SE value to the CPU. The CPU updates the AS indices of the subarray that achieved the maximum SE at iteration $n$. Then, the CPU requests the subarray Gramian matrix of the updated subarray and recalculates the inverse of the array Gramian matrix, $(\mathbf{G}_S^{(n)})^{-1}$. The process can be executed iteratively following the scheme depicted in Fig. 3.

Figure 3. Proposed DGA-RA procedure steps with coordination between the CPU and the RPUs. The superscript $(n)$ denotes the $n$-th iteration.

The GA implemented in the DGA-RA procedure is similar to the one described in Algorithm 2, except for some details in the encoding of the optimization variables and in the crossover phase. Regarding the individual encoding, the number of optimization variables at each subarray is reduced from $M$ to $M_b$, since local AS is performed at each RPU. In addition, as the optimization variables at each RPU cover only one subarray, the individuals have two chromosomes: one formed by the first $M_b/2$ genes, and another composed of the remaining genes.

Due to this new chromosome definition, one further step after the crossover phase is required to preserve the feasibility of the solution. The chosen method is to randomly deactivate antennas of individuals with more than $N_b$ active antennas until they become feasible.

IV. ANTENNA SELECTION PROCEDURES

Two techniques to perform antenna selection are presented in the sequel: the DL sum-capacity maximization antenna selection (SCMAX-AS) and the N-AS method, proposed in [14] and [7], respectively. The goal of solving only the antenna selection problem is to decouple the two RA problems associated with (15), aiming at obtaining tractable formulations.

A. Antenna Selection for DL Sum-Capacity Maximization

First, we analyze the equal power allocation (EPA) strategy, i.e., $\mathbf{P} = \frac{P_{\max}}{K}\mathbf{I}_K$, in order to obtain a manageable optimization problem. The problem of selecting the set of active antennas to maximize the DL sum-capacity, subject to the maximum number of RF transceivers and the subarray connections, is formulated as [14]:

$$\underset{\mathbf{D}}{\text{maximize}} \quad C_{\text{EPA}} = \log\det\left(\mathbf{I}_K + \frac{P_{\max}}{K\sigma_z^2}\,\mathbf{H}^H\mathbf{D}\mathbf{H}\right) \tag{22a}$$

$$\text{subject to} \quad \sum_{m \in \mathcal{M}_b} D_m \le N_b, \quad \forall b \in \{1, \ldots, B\} \tag{22b}$$

$$D_m \in \{0, 1\}, \quad \forall m \in \{1, \ldots, M\} \tag{22c}$$

Despite the concavity of the objective function in (22a) [13], the problem (22) is not convex due to the binary constraint in (22c). Hence, we define a convex relaxation of (22) by letting the variables $D_m$ take values in the range $[0, 1]$. This new problem, which can be solved with convex optimization tools, has the constraint (22c) replaced by

$$0 \le D_m \le 1, \quad \forall m \in \{1, \ldots, M\} \tag{23}$$

Notice that the solution of the convex relaxation generally yields non-binary values for the active antenna indicators $D_m$, which lie outside the original problem domain.

One method for performing the antenna selection from the solution of the convex relaxation is to activate the $N_b$ antennas with the highest $D_m$ values at each subarray. This procedure is named SCMAX-AS in this work, and is followed by the OPA policy in eq. (13). This AS procedure gives near-optimal results, except for $N \ll M$ [14]. Therefore, in an XL-MIMO system where the number of available RF transceivers is much smaller than the number of array antennas, the system SE achieved with the SCMAX-AS algorithm will be sub-optimal.

B. Norm-Based Antenna Selection (N-AS)
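The selection rule described in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that the channel matrix is stored as an $M \times K$ NumPy array whose row $m$ holds the channel vector $\mathbf{h}_m$, and that the $B$ subarrays are contiguous blocks of $M_b = M/B$ antennas. The function name `norm_based_selection` is hypothetical.

```python
import numpy as np

def norm_based_selection(H, B, N_b):
    """Return a 0/1 vector D of length M with N_b active antennas per subarray.

    H   : (M, K) complex channel matrix, row m is h_m (assumed layout)
    B   : number of subarrays, assumed to be contiguous blocks of M // B antennas
    N_b : number of antennas to activate in each subarray
    """
    M = H.shape[0]
    M_b = M // B
    # Squared norm ||h_m||^2 of the channel vector of every antenna.
    sq_norms = np.sum(np.abs(H) ** 2, axis=1)
    D = np.zeros(M, dtype=int)
    for b in range(B):
        block = slice(b * M_b, (b + 1) * M_b)
        # Local indices of the N_b largest squared norms in subarray b.
        top = np.argsort(sq_norms[block])[-N_b:]
        D[b * M_b + top] = 1
    return D

# Example with hypothetical dimensions: M = 16 antennas, K = 3 users, B = 4 subarrays.
rng = np.random.default_rng(0)
H = rng.standard_normal((16, 3)) + 1j * rng.standard_normal((16, 3))
D = norm_based_selection(H, B=4, N_b=2)
```

Because each subarray only sorts its own norms, the loop body could run independently at each RPU, which is why no channel information has to reach the CPU for the selection itself.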

The N-AS procedure focuses on selecting the subset of $N_b$ antennas with the highest channel vector norms [7]. We adopt this method to initialize the population of the GA-based procedures due to its low computational cost. The N-AS method solves the optimization problem formulated as

$$\underset{\mathbf{D}}{\text{maximize}} \quad \Pi = \sum_{m=1}^{M} D_m \|\mathbf{h}_m\|^2 \tag{24a}$$

$$\text{subject to} \quad \sum_{m \in \mathcal{M}_b} D_m \le N_b, \quad \forall b \in \{1, \ldots, B\} \tag{24b}$$

$$D_m \in \{0, 1\}, \quad \forall m \in \{1, \ldots, M\} \tag{24c}$$

where the objective function is the sum of the squared norms of the channel vectors associated with the selected antennas.

The problem (24) can be solved quickly by selecting the $N_b$ antennas with the highest channel vector norms at each subarray. After selection, the PA is performed by the OPA policy in (13).

V. COMPLEXITY ANALYSIS

The complexity of the presented procedures is evaluated in terms of the number of symbols required for channel acquisition, the size of the coordination data exchanged between the RPUs and the CPU, and the number of flops during execution.

A. Training

In the following, we analyze the procedures in terms of training symbols for CSI acquisition. The length of the mutually orthogonal pilot signals used to estimate the channel vectors at the BS depends on: a) the number of users; b) the number of available RF transceivers; c) the number of antennas at the BS.

The number of symbols needed to acquire the entire channel matrix, required in all the presented procedures except N-AS, is $K \lceil M/N \rceil$. The N-AS algorithm, in particular, requires only the knowledge of the channel vector norms for selection. For this reason, N-AS can be implemented without explicit channel estimation, supported by physical power meters [21]. With this implementation, N-AS requires a total of $2K$ symbols to operate. Of this total, $K$ symbols are required to estimate the norms of the channel vectors, and the remaining $K$ symbols are used to estimate the channel vectors associated with the selected antennas.

Table II
COORDINATION DATA EXCHANGED BETWEEN THE RPUS AND THE CPU

Procedure        Implementation        Data type         Data size
GA-RA            Centralized           Channel matrix    $MK$
SCMAX-AS [14]    Centralized           Channel matrix    $MK$
N-AS [7]         Totally distributed   –                 –
DGA-RA           Quasi-distributed     Gramian matrix    $(B + N_{it})K^2$

B. Coordination Data Size
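The data sizes in Table II can be compared numerically with a small helper. The sketch below assumes that the DGA-RA entry is $(B + N_{it})K^2$, i.e., one $K \times K$ Gramian matrix per subarray at initialization plus one per iteration, which is consistent with the quadratic growth in $K$ reported in Section VI-A. The function names are hypothetical. Setting $(B + N_{it})K^2 < MK$ gives the break-even condition $K < M/(B + N_{it})$.

```python
def centralized_data_size(M, K):
    """Entries of the M x K channel matrix sent to the CPU (GA-RA, SCMAX-AS)."""
    return M * K

def dga_ra_data_size(B, N_it, K):
    """Entries of the (B + N_it) Gramian matrices of size K x K (DGA-RA)."""
    return (B + N_it) * K ** 2

def break_even_users(M, B, N_it):
    """DGA-RA exchanges less data than the centralized schemes for K below this value."""
    return M / (B + N_it)

# Setting used in several of the numerical results: M = 512, B = 8, N_it = 16.
central = centralized_data_size(512, 50)   # 512 * 50 = 25600 entries
quasi = dga_ra_data_size(8, 16, 50)        # 24 * 2500 = 60000 entries
k_star = break_even_users(512, 8, 16)      # 512 / 24, about 21.3 users
```

For this setting the quasi-distributed scheme exchanges less data only when there are at most 21 users, which illustrates why the comparison depends on the numbers of antennas, subarrays and iterations.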

The coordination data is defined as the data originated at the RPUs that is required at the CPU during the RA procedures. Determining the coordination data size is crucial since it can grow tremendously in the XL-MIMO scenario. In practical implementations, techniques such as data compression help alleviate the high interconnection bandwidth associated with the coordination data. However, such considerations and optimizations are out of the scope of this work.

Table II contains the coordination data size associated with the considered RA procedures, detailing the type of data required in each one. The GA-RA and SCMAX-AS procedures require the entire channel matrix at the CPU, while the DGA-RA procedure relies on the subarray Gramian matrices. On the other hand, the N-AS procedure does not require any CSI knowledge at the CPU for antenna selection, making it the most appealing technique in terms of coordination data size.

C. Number of Flops

The third complexity metric is the number of flops executed by each procedure. The complexity analyses for the N-AS and the GA-based AS algorithms are as follows. The SCMAX-AS procedure is not considered due to the high complexity associated with computing the number of operations executed by the convex optimization solver.

N-AS:

The operations executed at each subarray in the N-AS procedure consist of calculating the channel vector norms and then sorting the obtained values to get the $N_b$ largest ones. Assuming that the sorting operation has complexity of the order $M_b \log(M_b)$, the per-subarray flop count for N-AS is

$$C_{\text{N-AS}} = M_b(2K - 1) + M_b \log(M_b) \tag{25}$$

GA-RA:

The complexity of the GA-RA method is dominated by the number of operations required to evaluate the GA fitness function, eq. (11). At the first iteration, the algorithm evaluates the fitness function for $N_p$ individuals. During the remaining iterations, $(T - 1)(N_p - N_e)$ fitness function evaluations are done, where $T$ denotes the total number of generations.

As the OPA policy involves simple computations, the complexity of the fitness function reduces to the inversion of the array Gramian matrix. The number of flops to compute the inverse of the array Gramian matrix is derived in Appendix B. From this result, the total number of flops for the GA-RA algorithm is

$$C_{\text{GA-RA}} = [T(N_p - N_e) + N_e]\left(\frac{7}{3}K^3 + 2NK^2 - K^2\right) \tag{26}$$

DGA-RA:

For the DGA-RA procedure, an approach similar to the one used for GA-RA can be followed. However, the inverse of the array Gramian matrix is computed by the SMW formula, which is implemented with a different number of flops. The number of flops to obtain the inverse of the array Gramian matrix in the DGA-RA procedure is derived in Appendix C. Taking into account these differences and the fact that the DGA-RA procedure runs over $N_{it}$ iterations, the total number of flops is given by:

$$C_{\text{DGA-RA}} = N_{it}\,[T(N_p - N_e) + N_e] \times \left[\frac{7}{3}N_b^3 + 2K^3 + N_b^2(4K - 1) + K^2(4N_b - 2) + N_b(1 - 2K) + K\right] \tag{27}$$

Figure 4. Convergence of the GA-RA with the number of generations $t$, varying the GA input parameters: (a) varying $N_p$; (b) varying $p_c$; (c) varying $p_m$. The "best" and "average" SE surfaces are obtained over 20 realizations. In each plot, the values of the remaining input parameters are given in Table IV.

Table III
SIMULATION PARAMETERS

Parameter                                Value
Cell size                                $L = 30$ m
Number of users                          $K \in [1, […]]$
Maximum transmitted power                $P_{\max} = 230$ µW
Path-loss at the reference distance      $q = -[…]$ dB
Path-loss exponent                       $\kappa = 3$
Noise power                              $\sigma_z^2 = -[…]$ dBm

Uniform Linear Array (ULA) Setup
Number of antennas                       $M \in [32, […]]$
Number of RF transceivers                $N \in [64, […]]$
Number of subarrays                      $B = \{2, 4, 8\}$
Antennas per subarray                    $M_b = M/B$
RF transceivers per subarray             $N_b = N/B$

VI. NUMERICAL RESULTS

The numerical evaluations of the proposed methods and of the benchmark techniques are presented in this section. The simulation parameters are given in Table III. The users are randomly located inside a square cell of size $L$, and the BS is equipped with a uniform linear array (ULA) positioned on one side of the cell, as depicted in Fig. 1. Additionally, the users are uniformly distributed at a distance in the range (0.[…]$L$, $L$) from the array. Although the following results are obtained for the ULA, they can be easily extended to other array form factors, such as the uniform planar array.

Table IV
GENETIC ALGORITHM PARAMETERS

Symbol              Description             GA-RA    DGA-RA
$N_p$               Population size         80       80
$N_e$               Elitism individuals     8        8
$N_s$               Tournaments             36       36
$p_c$               Crossover probability   0.33     0.35
$p_m$               Mutation probability    0.13     0.36
$T_{\max}$          Maximum generations     […]      […]
$T_{\text{stall}}$  Stall generations       300      30

Before comparing the proposed techniques, it is necessary to tune the GA input parameters of the GA-RA and DGA-RA procedures in order to obtain a suitable performance-complexity trade-off. The values of the input parameters $N_p$, $p_c$ and $p_m$ are selected using the iterated local search algorithm [34]. The number of individuals for elitism is equal to 10% of the population size, and the number of tournaments is defined so as to fill the population after the elitism phase. Additionally, the stall convergence criterion parameter is approximately […]% of the maximum number of generations. The selected parameters for the GA-based procedures are listed in Table IV. Notice that the DGA-RA procedure is set to run for 10 times fewer generations than the GA-RA, since the number of optimization variables decreases from $M$ in the GA-RA to $M_b$ in the DGA-RA procedure.

In Fig. 4, the convergence of the GA-RA procedure is examined by varying the parameters $N_p$, $p_c$ and $p_m$ independently. Each surface is computed by averaging the achieved scores over 20 realizations. These results on the best and average SE scores across the generations $t$ support the parameter values adopted in Table IV, while showing that the GA-RA convergence has relatively low sensitivity to the three input parameters.

Fig. 5 depicts the system SE achieved by the proposed RA procedures versus the number of available RF transceivers. In addition to the proposed solutions, the SE attained by the random AS scheme and by using all the $M$ antennas are plotted as the lower and upper performance bounds, respectively. The results consider $M = 512$, $B = 8$, $K = 50$ and $N_{it} \in \{5, 16\}$ for the DGA-RA procedure. From Fig. 5, one can see that the GA-based procedures achieve better SE results than the other ones. They are followed by SCMAX-AS and N-AS, in that order. As expected, all the performance curves are upper and lower bounded by the SE achieved using full-array ZF and random AS, respectively. The SE gap between the procedures decreases as the number of RF transceivers increases. Analyzing the GA-based procedures, the DGA-RA achieves SE values close to those of the GA-RA while running with only five iterations. Moreover, setting $N_{it} = 16$ makes the DGA-RA system SE marginally outperform the values obtained by the GA-RA procedure. Therefore, the quasi-distributed procedure can achieve a performance comparable to, or even better than, the fully centralized approach by adopting a sufficient number of iterations.

Figure 5. Comparison of SE (bps/Hz) vs. the number of available RF transceivers for GA-RA, DGA-RA ($N_{it} = 16$), DGA-RA ($N_{it} = 5$), SCMAX-AS [14], N-AS [7], full-array ZF and random AS. $M = 512$, $B = 8$, $K = 50$ and, for the DGA-RA procedure, $N_{it} \in \{5, 16\}$.

Next, Fig. 6 depicts the system SE achieved by the proposed RA procedures versus the number of users. These numerical results consider $M = 512$, $B = 8$, $N = 256$ and $N_{it} \in \{5, 16\}$ for the DGA-RA procedure. For better understanding, let $\mathcal{L} = K/N$ be the system effective loading factor. For all the proposed procedures, the SE first increases with $K$ and then decreases after a peak. This is due to the reduction of spatial degrees of freedom as the system loading factor increases, which is typically observed in linearly precoded systems [35]. Comparing the procedures, all of them attain comparable SE values for a low loading factor. However, for high loading factor values, typically $\mathcal{L} = 0.$[…], the GA-RA and DGA-RA procedures obtain substantially better results. Again, the DGA-RA outperforms the GA-RA in terms of SE when $N_{it} = 16$. Combining the results in Figs. 5 and 6, we conclude that the GA-based procedures achieve higher SE gains over the other available AS schemes [7], [14] in crowded XL-MIMO scenarios, i.e., when the loading factor is high, $\mathcal{L} >$ […].

Figure 6. Comparison of SE (bps/Hz) vs. the number of users for GA-RA, DGA-RA ($N_{it} = 16$), DGA-RA ($N_{it} = 5$), SCMAX-AS [14], N-AS [7], full-array ZF and random AS. $M = 512$, $B = 8$, $N = 256$ and, for the DGA-RA procedure, $N_{it} \in \{5, 16\}$.

A. Complexity Analysis

The following numerical results cover the computational complexity of the proposed procedures. Fig. 7(a) illustrates the coordination data size of the centralized procedures (GA-RA and SCMAX-AS) and of the DGA-RA procedure versus the number of users. The curves are evaluated by the expressions in Table II. The result considers $M \in \{512, 2048\}$ and, for the DGA-RA procedure, $N_{it} = 16$ and $B \in \{2, 4, 8\}$. When the number of users is low, the quasi-distributed approach yields lower coordination data sizes than the centralized procedures. For higher numbers of users, the coordination data size associated with DGA-RA becomes larger than that of the centralized procedures. The point at which this behavior inverts depends on the numbers of antennas, subarrays and iterations of the DGA-RA procedure. It is worth mentioning that the coordination data size grows quadratically with $K$ for the DGA-RA procedure, while it grows linearly with $K$ for the centralized RA procedures.

Fig. 7(b) depicts the coordination data size of the centralized procedures and of the DGA-RA procedure versus the number of antennas at the BS. The results consider $K = 50$ and, for the DGA-RA method, $N_{it} \in \{5, 16\}$ and $B \in \{2, 4, 8\}$. The coordination data size grows linearly with $M$ in the centralized procedures, while for the DGA-RA procedure it does not depend on $M$. In fact, this is the primary motivation for choosing a distributed RA technique in XL-MIMO, in which the BS is equipped with an asymptotically high number of antennas.

Figure 7. Coordination data size of the GA-based RA schemes vs. the number of (a) users and (b) antennas, for the centralized procedures and for DGA-RA with $B \in \{2, 4, 8\}$. When it is not specified, $N_{it} = 16$ and $K = 50$.

The next results are related to the complexity in terms of flops. Fig. 8(a) illustrates the number of flops per processing unit of the GA-based procedures versus the number of available RF transceivers. The curves are evaluated by eqs. (26) and (27). These results consider $K = 50$ and, for the DGA-RA procedure, $B = 8$ and $N_{it} \in \{1, 5, 16\}$. For low numbers of RF transceivers, the flop counts of the DGA-RA procedure are lower than those of the GA-RA algorithm. Again, after an inversion point, the flop counts of GA-RA become lower than those of the quasi-distributed procedure. This inversion point decreases as $N_{it}$ increases.

The number of flops per processing unit of the GA-based procedures versus the number of users is depicted in Fig. 8(b). This result considers $N = 256$ and, for the DGA-RA procedure, $B = 8$ and $N_{it} \in \{1, 5, 16\}$. For low numbers of users, the flop counts of the GA-RA procedure are lower than those obtained for the DGA-RA. However, this behavior inverts quickly, and the gap between the flop counts of the centralized and distributed procedures becomes constant. This constant behavior for large $K$ is due to the fact that both eqs. (26) and (27) grow asymptotically with $K^3$.

VII. CONCLUSIONS

This work proposes a subarray switching architecture for the BS antenna array, while examining the problem of joint AS and PA optimization aiming at maximizing the SE of XL-MIMO systems with a limited number of RF transceivers. Two GA-based near-optimal and low-complexity procedures are proposed. One is the centralized GA-RA, designed to operate with the entire channel matrix available at the CPU. The other is the quasi-distributed DGA-RA, based on the subarray Gramian matrices. Both evolutionary metaheuristic optimization methods are analyzed in terms of achieved SE, coordination data size and flops, and compared with benchmarks, including two procedures from the literature, the SCMAX-AS and the N-AS followed by optimal PA. Numerical results corroborate that the GA-based AS and PA procedures achieve high SE gains compared to the selected benchmarks, particularly in crowded XL-MIMO scenarios, i.e., when the effective loading factor $\mathcal{L} >$ […]. At the same time, the distributed DGA-RA method can outperform the other procedures with small coordination data and low computational complexity under appropriate system operation settings.

Figure 8. Flops per processing unit (Gflops) of the proposed GA-based procedures, GA-RA and DGA-RA with $N_{it} \in \{1, 5, 16\}$, versus the number of (a) available RF transceivers and (b) users. $B = 8$ and, when it is not specified, $K = 50$ and $N = 256$.

ACKNOWLEDGMENT

This work was supported in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001 (scholarship), in part by the Ministry of Science, Technology and Innovation (MCTIC), by the National Council for Scientific and Technological Development (CNPq) of Brazil under Grant 310681/2019-7, in part by the State University of Londrina – Paraná State Government (UEL), and in part by the Danish Council for Independent Research DFF-701700271.

APPENDIX A
LOCAL COMPUTATION OF THE INVERSE OF THE ARRAY GRAMIAN MATRIX VIA THE SHERMAN-MORRISON-WOODBURY FORMULA

To compute the array Gramian matrix at subarray $b$, the RPU must follow two steps. First, remove the contribution of the antennas selected at subarray $b$ at iteration $n - 1$. Then, add the contribution of the antennas selected at iteration $n$. Therefore, it needs to compute the inverse of the array Gramian matrix by the expression

$$\left(\mathbf{G}_S^{(n)}\right)^{-1} = \left(\mathbf{G}_S^{(n-1)} - \mathbf{G}_{S_b}^{(n-1)} + \mathbf{G}_{S_b}^{(n)}\right)^{-1} \tag{28}$$

whose evaluation would be straightforward if all the terms were available at the subarray. However, the subarray needs to compute $(\mathbf{G}_S^{(n)})^{-1}$ knowing only $(\mathbf{G}_S^{(n-1)})^{-1}$ and the local channel vectors, i.e., $\mathbf{h}_m$, $\forall m \in \mathcal{M}_b$, for subarray $b$. Writing the subarray Gramian matrices of (28) in terms of the local channel matrices results in

$$-\mathbf{G}_{S_b}^{(n-1)} + \mathbf{G}_{S_b}^{(n)} = -\left(\mathbf{H}_{S_b}^{(n-1)}\right)^H \mathbf{H}_{S_b}^{(n-1)} + \left(\mathbf{H}_{S_b}^{(n)}\right)^H \mathbf{H}_{S_b}^{(n)} = \left[ -\left(\mathbf{H}_{S_b}^{(n-1)}\right)^H \;\; \left(\mathbf{H}_{S_b}^{(n)}\right)^H \right] \begin{bmatrix} \mathbf{H}_{S_b}^{(n-1)} \\ \mathbf{H}_{S_b}^{(n)} \end{bmatrix} \tag{29}$$

From (28) and (29), it is possible to define the SMW formula variables $\mathbf{A}^{-1}$, $\mathbf{U}$ and $\mathbf{V}^H$ in terms of the information available at the subarray, as in eqs. (19), (20) and (21), respectively.

APPENDIX B
FLOPS TO COMPUTE THE INVERSE OF THE ARRAY GRAMIAN MATRIX VIA THE CHOLESKY DECOMPOSITION
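The inversion analyzed in this appendix can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `H_S` is assumed to be the $N \times K$ channel matrix of the selected antennas, both function names are hypothetical, and the general-purpose `np.linalg.solve` stands in for dedicated forward and backward substitution routines, so the code reproduces the result of the procedure but not its flop count. The helper `cholesky_inverse_flops` simply adds the three terms counted in this appendix.

```python
import numpy as np

def gramian_inverse_cholesky(H_S):
    """Invert G_S = H_S^H H_S through the factorization G_S = L L^H.

    H_S : (N, K) channel matrix of the N selected antennas (assumed layout).
    """
    K = H_S.shape[1]
    G = H_S.conj().T @ H_S                 # array Gramian matrix, K x K
    L = np.linalg.cholesky(G)              # lower triangular factor, G = L L^H
    # Solve L L^H X = I in two stages: L Y = I, then L^H X = Y.
    Y = np.linalg.solve(L, np.eye(K))
    return np.linalg.solve(L.conj().T, Y)

def cholesky_inverse_flops(K, N):
    """Flops of the Gramian product, the factorization and the substitutions."""
    gramian = K ** 2 * (2 * N - 1)         # K^2 inner products of length N
    factorization = K ** 3 / 3             # Cholesky decomposition
    substitutions = 2 * K ** 3             # K linear systems, 2 K^2 flops each
    return gramian + factorization + substitutions

# Example with hypothetical dimensions: N = 8 selected antennas, K = 3 users.
rng = np.random.default_rng(1)
H_S = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
G_inv = gramian_inverse_cholesky(H_S)
```

Summing the three terms gives $\frac{7}{3}K^3 + 2NK^2 - K^2$, the closed form that also appears as the per-evaluation cost of the GA-RA fitness function.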

Initially, the array Gramian matrix is computed by evaluating the product in (5), which costs $2K^2N - K^2$ flops [33]. Afterwards, define the Cholesky decomposition of the array Gramian matrix as

$$\mathbf{G}_S = \mathbf{L}\mathbf{L}^H \tag{30}$$

where $\mathbf{L}$ is a lower triangular matrix. The computation of $\mathbf{L}$ can be done with $K^3/3$ flops [33]. Then, each column of the inverse of the Gramian matrix can be computed by solving the set of linear systems below with forward and backward substitution,

$$\mathbf{L}\mathbf{L}^H\mathbf{x} = \mathbf{e}_i, \quad \forall i = 1, \ldots, K \tag{31}$$

where $\mathbf{e}_i$ denotes the canonical basis vector, i.e., a column vector with all entries equal to 0 except the entry $i$, which is equal to 1. Each linear system can be solved with $2K^2$ flops [33], totaling $2K^3$ flops for all the columns of $\mathbf{G}_S^{-1}$. Therefore, the total number of flops for the array Gramian matrix computation and inversion is equal to

$$C_{\text{Chol.}} = \frac{7}{3}K^3 + 2NK^2 - K^2 \tag{32}$$

APPENDIX C
FLOPS TO COMPUTE THE INVERSE OF THE ARRAY GRAMIAN MATRIX VIA THE SHERMAN-MORRISON-WOODBURY FORMULA
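Before counting flops, the SMW update of eqs. (18) to (21) can be checked numerically. The sketch below is a minimal illustration with hypothetical function names. It assumes that the local channel matrices are stored with one selected antenna per row ($N_b \times K$), applies the removal and the addition in a single step, and inverts the small inner matrix with `np.linalg.inv` instead of the Cholesky approach. The helper `smw_flops` adds the six partial counts of the SMW evaluation for one term at a time, following the decomposition used in this appendix.

```python
import numpy as np

def smw_gramian_update(G_prev_inv, H_old, H_new):
    """Update the inverse array Gramian after one subarray changes its selection.

    G_prev_inv : (K, K) inverse of the array Gramian at iteration n-1
    H_old      : (N_b, K) local channel rows selected at iteration n-1
    H_new      : (N_b, K) local channel rows selected at iteration n
    """
    K = G_prev_inv.shape[0]
    # U = [-(H_old)^H  (H_new)^H] and V^H = [H_old; H_new], so that
    # U V^H = -H_old^H H_old + H_new^H H_new.
    U = np.hstack([-H_old.conj().T, H_new.conj().T])
    Vh = np.vstack([H_old, H_new])
    Q1 = Vh @ G_prev_inv                       # V^H A^{-1}
    Q2 = np.eye(Vh.shape[0]) + Q1 @ U          # I + V^H A^{-1} U
    Q4 = U @ np.linalg.inv(Q2)                 # U (I + V^H A^{-1} U)^{-1}
    return G_prev_inv @ (np.eye(K) - Q4 @ Q1)  # A^{-1} (I - Q4 Q1)

def smw_flops(N_b, K):
    """Sum of the six partial flop counts, with V^H of size N_b x K."""
    parts = [
        2 * N_b * K ** 2 - N_b * K,            # Q1 = V^H A^{-1}
        2 * N_b ** 2 * K - N_b ** 2 + N_b,     # Q2 = I + Q1 U
        7 * N_b ** 3 / 3,                      # Q3 = Q2^{-1}
        2 * N_b ** 2 * K - N_b * K,            # Q4 = U Q3
        2 * N_b * K ** 2 - K ** 2 + K,         # Q5 = I - Q4 Q1
        2 * K ** 3 - K ** 2,                   # Q6 = A^{-1} Q5
    ]
    return sum(parts)

# Example with hypothetical dimensions: K = 3 users, N_b = 2 antennas per subarray.
rng = np.random.default_rng(2)
def crandn(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
H_rest, H_old, H_new = crandn(6, 3), crandn(2, 3), crandn(2, 3)
G_prev = H_rest.conj().T @ H_rest + H_old.conj().T @ H_old
G_next_inv = smw_gramian_update(np.linalg.inv(G_prev), H_old, H_new)
```

The update only needs the previous inverse and the local channel rows, which is what allows each RPU to evaluate its fitness function without access to the channel coefficients of the other subarrays.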

To count the flops needed to compute the matrix inversion by the SMW formula, eq. (18) is decomposed into six parts. The computations involved in each part and their respective flops are organized in Table V. The flops in Table V are counted assuming that the contribution of the antennas selected during the previous iteration is removed. This assumption is reasonable since the expression in (28) can be evaluated sequentially, by keeping only one of the terms $-\mathbf{G}_{S_b}^{(n-1)}$ or $\mathbf{G}_{S_b}^{(n)}$ at a time. All the parts include only simple matrix multiplications and sums, except for the part $\mathbf{Q}_3$. This part can be efficiently computed by the Cholesky decomposition approach followed by the forward and backward substitution procedure described in Appendix B. Therefore, the total number of flops required to compute the inverse of the array Gramian matrix via the SMW formula is equal to

$$C_{\text{SMW}} = \frac{7}{3}N_b^3 + 2K^3 + N_b^2(4K - 1) + K^2(4N_b - 2) + N_b(1 - 2K) + K \tag{33}$$

Table V
FLOPS INVOLVED IN THE SHERMAN-MORRISON-WOODBURY FORMULA COMPUTATION

Symbol            Expression                                 Number of flops
$\mathbf{Q}_1$    $\mathbf{V}^H\mathbf{A}^{-1}$              $2N_bK^2 - N_bK$
$\mathbf{Q}_2$    $\mathbf{I} + \mathbf{Q}_1\mathbf{U}$      $2N_b^2K - N_b^2 + N_b$
$\mathbf{Q}_3$    $\mathbf{Q}_2^{-1}$                        $\frac{7}{3}N_b^3$
$\mathbf{Q}_4$    $\mathbf{U}\mathbf{Q}_3$                   $2N_b^2K - N_bK$
$\mathbf{Q}_5$    $\mathbf{I} - \mathbf{Q}_4\mathbf{Q}_1$    $2N_bK^2 - K^2 + K$
$\mathbf{Q}_6$    $\mathbf{A}^{-1}\mathbf{Q}_5$              $2K^3 - K^2$

REFERENCES

[1] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, Feb. 2014.
[2] E. D. Carvalho, A. Ali, A. Amiri, M. Angjelichinoski, and R. W. Heath, “Non-stationarities in extra-large-scale massive MIMO,” IEEE Wireless Communications, vol. 27, no. 4, pp. 74–80, Aug. 2020.
[3] À. O. Martínez, E. De Carvalho, and J. Ø. Nielsen, “Towards very large aperture massive MIMO: A measurement based study,” in [venue missing], Dec. 8–12, 2014, pp. 281–286.
[4] Z. Zhou, X. Gao, J. Fang, and Z. Chen, “Spherical wave channel and analysis for large linear array in LoS conditions,” in [venue missing], Dec. 6–10, 2015, pp. 1–6.
[5] X. Li, S. Zhou, E. Björnson, and J. Wang, “Capacity analysis for spatially non-wide sense stationary uplink massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 14, no. 12, pp. 7044–7056, Dec. 2015.
[6] A. Ali, E. D. Carvalho, and R. W. Heath, “Linear receivers in non-stationary massive MIMO channels with visibility regions,” IEEE Wireless Communications Letters, vol. 8, no. 3, pp. 885–888, Jun. 2019.
[7] A. Garcia-Rodriguez, C. Masouros, and P. Rulikowski, “Reduced switching connectivity for large scale antenna selection,” IEEE Transactions on Communications, vol. 65, no. 5, pp. 2250–2263, May 2017.
[8] Y. Gao, H. Vinck, and T. Kaiser, “Massive MIMO antenna selection: Switching architectures, capacity bounds, and optimal antenna selection algorithms,” IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1346–1360, Mar. 2018.
[9] K. Li, R. R. Sharan, Y. Chen, T. Goldstein, J. R. Cavallaro, and C. Studer, “Decentralized baseband processing for massive MU-MIMO systems,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 7, no. 4, pp. 491–507, Dec. 2017.
[10] J. Rodríguez Sánchez, F. Rusek, O. Edfors, M. Sarajlić, and L. Liu, “Decentralized massive MIMO processing exploring daisy-chain architecture and recursive algorithms,” IEEE Transactions on Signal Processing, vol. 68, pp. 687–700, Jan. 2020.
[11] A. Mueller, A. Kammoun, E. Björnson, and M. Debbah, “Linear precoding based on polynomial expansion: reducing complexity in massive MIMO,” EURASIP Journal on Wireless Communications and Networking, no. 63, pp. 1687–1499, Feb. 2016.
[12] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, Apr. 2016.
[13] A. Dua, K. Medepalli, and A. J. Paulraj, “Receive antenna selection in MIMO systems using convex optimization,” IEEE Transactions on Wireless Communications, vol. 5, no. 9, pp. 2353–2357, Sep. 2006.
[14] X. Gao, O. Edfors, F. Tufvesson, and E. G. Larsson, “Massive MIMO in real propagation environments: Do all antennas contribute equally?” IEEE Transactions on Communications, vol. 63, no. 11, pp. 3917–3928, Nov. 2015.
[15] P. Lin and S. Tsai, “Performance analysis and algorithm designs for transmit antenna selection in linearly precoded multiuser MIMO systems,” IEEE Transactions on Vehicular Technology, vol. 61, no. 4, pp. 1698–1708, May 2012.
[16] A. Amiri, C. N. Manchon, and E. de Carvalho, “Deep learning based spatial user mapping on extra large MIMO arrays,” arXiv:2002.00474, Feb. 2020.
[17] J. C. Marinello, T. Abrão, A. Amiri, E. de Carvalho, and P. Popovski, “Antenna selection for improving energy efficiency in XL-MIMO systems,” IEEE Transactions on Vehicular Technology, vol. 69, no. 11, pp. 13305–13318, Nov. 2020.
[18] H. Lu and W. Fang, “Joint transmit/receive antenna selection in MIMO systems based on the priority-based genetic algorithm,” IEEE Antennas and Wireless Propagation Letters, vol. 6, pp. 588–591, Dec. 2007.
[19] J. Lain, “Joint transmit/receive antenna selection for MIMO systems: A real-valued genetic approach,” IEEE Communications Letters, vol. 15, no. 1, pp. 58–60, Jan. 2011.
[20] B. Makki, A. Ide, T. Svensson, T. Eriksson, and M. Alouini, “A genetic algorithm-based antenna selection approach for large-but-finite MIMO networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 7, pp. 6591–6595, Jul. 2017.
[21] J. R. Sánchez, J. Vidal Alegría, and F. Rusek, “Decentralized massive MIMO systems: Is there anything to be discussed?” in [venue missing], Jul. 7–12, 2019, pp. 787–791.
[22] A. Amiri, M. Angjelichinoski, E. de Carvalho, and R. W. Heath, “Extremely large aperture massive MIMO: Low complexity receiver architectures,” in [venue missing], Dec. 9–13, 2018, pp. 1–6.
[23] A. Amiri, C. N. Manchón, and E. de Carvalho, “A message passing based receiver for extra-large scale MIMO,” in [venue missing], 2019, pp. 564–568.
[24] A. Amiri, S. Rezaie, C. N. Manchon, and E. de Carvalho, “Distributed receivers for extra-large scale MIMO arrays: A message passing approach,” arXiv:2007.06930, Jul. 2020.
[25] X. Yang, F. Cao, M. Matthaiou, and S. Jin, “On the uplink transmission of multi-user extra-large scale massive MIMO systems,” arXiv:1909.06760, Nov. 2019.
[26] D. A. Gore, R. U. Nabar, and A. Paulraj, “Selecting an optimal set of transmit antennas for a low rank matrix channel,” in [venue missing], vol. 5, Jun. 5–9, 2000, pp. 2785–2788.
[27] A. F. Molisch, M. Z. Win, Yang-Seok Choi, and J. H. Winters, “Capacity of MIMO systems with antenna selection,” IEEE Transactions on Wireless Communications, vol. 4, no. 4, pp. 1759–1772, Jul. 2005.
[28] S. Asaad, A. M. Rabiei, and R. R. Müller, “Massive MIMO with antenna selection: Fundamental limits and applications,” IEEE Transactions on Wireless Communications, vol. 17, no. 12, pp. 8502–8516, Dec. 2018.
[29] C. Ouyang, Z. Ou, L. Zhang, P. Yang, and H. Yang, “Asymptotic upper capacity bound for receive antenna selection in massive MIMO systems,” in [venue missing], May 20–24, 2019, pp. 1–6.
[30] Z. Abdullah, C. C. Tsimenidis, G. Chen, M. Johnston, and J. A. Chambers, “Efficient low-complexity antenna selection algorithms in multi-user massive MIMO systems with matched filter precoding,” IEEE Transactions on Vehicular Technology, vol. 69, no. 3, pp. 2993–3007, Mar. 2020.
[31] H. Siljak, I. Macaluso, and N. Marchetti, “Distributing complexity: A new approach to antenna selection for distributed massive MIMO,” IEEE Wireless Communications Letters, vol. 7, no. 6, pp. 902–905, Dec. 2018.
[32] P. He, L. Zhao, S. Zhou, and Z. Niu, “Water-filling: A geometric approach and its application to solve generalized radio resource allocation problems,” IEEE Transactions on Wireless Communications, vol. 12, no. 7, pp. 3637–3647, Jun. 2013.
[33] G. H. Golub and C. F. V. Loan, Matrix Computations. Baltimore, MD, USA: Johns Hopkins University Press, 2013.
[34] E. Montero, M.-C. Riff, and B. Neveu, “A beginner’s guide to tuning methods,” Applied Soft Computing, vol. 17, pp. 39–51, Apr. 2014.
[35] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo,