Joint Active and Passive Beamforming Design for IRS-Assisted Multi-User MIMO Systems: A VAMP-Based Approach
Haseeb Ur Rehman, Faouzi Bellili, Member, IEEE, Amine Mezghani, Member, IEEE, and Ekram Hossain, Fellow, IEEE
Abstract
This paper tackles the problem of joint active and passive beamforming optimization for an intelligent reflective surface (IRS)-assisted multi-user downlink multiple-input multiple-output (MIMO) communication system. We aim to maximize the spectral efficiency of the users by minimizing the mean square error (MSE) of the received symbols. To this end, a joint optimization problem is formulated under the minimum mean square error (MMSE) criterion. First, block coordinate descent (BCD) is used to decouple the joint optimization into two sub-optimization problems that separately find the optimal active precoder at the base station (BS) and the optimal matrix of phase shifts for the IRS. While the MMSE active precoder is obtained in closed form, the optimal phase shifts are found iteratively using a modified version (also introduced in this paper) of the vector approximate message passing (VAMP) algorithm. We solve the joint optimization problem for two different models of the IRS phase shifts. First, we determine the optimal phase matrix under a unimodular constraint on the reflection coefficients, and then under the constraint that the IRS reflection coefficients are modeled by a reactive load, thereby validating the robustness of the proposed solution. Numerical results are presented to illustrate the performance of the proposed method under multiple channel configurations. The results validate the superiority of the proposed solution as it achieves higher throughput than state-of-the-art techniques.
Index Terms
Intelligent reflective surface (IRS), IRS-assisted multi-user MIMO, joint active and passive beamforming, vector approximate message passing (VAMP) optimization
The authors are with the Department of Electrical and Computer Engineering at the University of Manitoba, Canada (emails: [email protected], {Faouzi.Bellili, Amine.Mezghani, Ekram.Hossain}@umanitoba.ca).

I. INTRODUCTION
A. Background
The need for higher data rates in wireless communication is soaring. This calls for innovative and economically viable communication technologies that can keep up with the increasing network capacity requirement. Massive multiple-input multiple-output (MIMO) technology can fulfill the network capacity requirement for beyond fifth-generation (B5G) wireless networks [1]–[3]. The basic idea of massive MIMO is to equip the base stations (BSs) with tens (if not hundreds) of antenna elements so as to simultaneously serve multiple mobile devices using the same time/frequency resources. Despite the advantages of massive MIMO, its practical large-scale deployment is hindered by high hardware cost and energy consumption [4], [5]. Moreover, although high-frequency bands such as millimeter wave (mmWave) benefit from massive MIMO, their full potential is often not realized in practical scenarios with many blockages between the BS and the mobile device, owing to the weak penetration capability of such bands [6].

One promising technology that has been introduced recently is the intelligent reflective surface (IRS), also called reconfigurable intelligent surface (RIS) [7], [8]. An IRS is a planar metasurface consisting of a large number of passive reflective elements. Moreover, an IRS does not require a power amplifier for transmission, which makes it an energy-efficient technology. The IRS passively modifies the propagation of the signal by reconfiguring the phases of its reflective elements through a controller attached to the surface [9]. Therefore, IRSs can be utilized to perform passive beamforming (i.e., by only optimizing the IRS phases without actively powering the IRS elements, as opposed to active beamforming at the BS) to improve the received signal power while reducing the interference for unintended users, thereby enhancing the overall throughput of the network [10].
Practically, IRS deployment requires a large number of cost-effective phase shifters (PSs) on a surface that can be easily integrated into a traditional wireless network [11]. For these reasons, the IRS has gained substantial interest in the wireless research community over the recent few years [10]–[18]. In [14], a single-user multiple-input single-output (MISO) wireless system assisted by multiple IRSs in the downlink configuration is studied. Making use of statistical channel state information (CSI) only, the optimal reflective coefficients are calculated using a stochastic convex approximation method. In [19], an IRS-assisted multi-cluster MISO system serving multiple users is considered, wherein the authors seek to minimize the transmit power under a minimum signal-to-interference-plus-noise ratio (SINR) constraint by jointly optimizing the IRS phase shifts and the transmit precoding. They tackle the underlying problem through the alternating direction method of multipliers (ADMM). An IRS-aided MISO and MIMO system with discrete phase shifts for the IRS elements is discussed in [12]. The authors formulate the problem of minimizing the transmit power under a minimum SINR constraint and jointly optimize the transmit precoding and IRS phase shifts in a mixed-integer non-linear programming framework. In [17], a relatively more practical model for the IRS reflection coefficients is considered, and a penalty-based algorithm is then used to find the optimal phase matrix.

The vast majority of the existing work considers a MISO wireless system assisted by a single or multiple IRSs serving a single user [13]–[17]. So far, limited research has been conducted on IRS-aided multi-user MIMO systems. Moreover, the IRS reflection coefficients are often modeled as ideal phase shifters, and a realistic approach to modeling the reflection coefficients has rarely been investigated. In fact, most of the existing methods are limited to a single phase-shifter model, the unimodular one being the most common, and hence they are not robust enough to accommodate various hardware impairments in the reflection coefficients [13]–[17].
B. Contributions
In this paper, we consider a multi-user IRS-assisted single-cell downlink MIMO system with a single IRS. The IRS is equipped with a large number of passive phase shifters that aid the BS in serving a small number of users. We propose a robust solution for the problem of jointly optimizing the active and passive beamforming tasks under different models for the IRS reflection coefficients. The main contributions embodied by this paper are as follows:

• We minimize the mean square error (MSE) of the received symbols, and thereby maximize the spectral efficiency of the users, by jointly optimizing the transmit precoding matrix and the reflection coefficients at the IRS. To that end, we first formulate the joint optimization problem under the MMSE criterion in order to minimize the MSE of the received signals for all users at the same time.

• To solve the underlying joint optimization problem, we first split it, using block coordinate descent (BCD) [20], into two simpler sub-optimization problems that separately optimize the active precoder at the BS and the reflection coefficients at the IRS. The precoding sub-optimization problem is similar to the MMSE transmit precoding optimization for a traditional MIMO system, which can be solved in closed form through Lagrange optimization.

• We modify and extend the existing VAMP algorithm [21] and propose a robust technique for finding the optimal reflective coefficients of the IRS under multiple constraints. Precisely, we find the optimal phase matrix under two different models for the reflection coefficients: i) under the unimodular constraint on the IRS reflection coefficients, and ii) under a realistic constraint, where each IRS element is terminated by a tunable simple reactive load.

• We provide the order of complexity of the proposed solution.

• We present various numerical results to compare the proposed solution with a standalone massive MIMO system using the maximal ratio transmission (MRT) precoder and with an ADMM-based solution. The results show that the proposed solution: i) achieves higher throughput than the massive MIMO system while using a significantly smaller number of transmit antennas, and ii) outperforms the ADMM-based solution in terms of throughput, in less time, while using the same resources.

C. Paper Organization and Notations
The rest of the paper is organized as follows: the system model, along with the problem formulation for jointly optimizing the active precoder and the reflection coefficients, is discussed in Section II. Section III briefly introduces the VAMP algorithm and then extends it to solve constrained optimization problems. In Section IV, we solve the joint optimization problem at hand using the proposed extended version of VAMP. In Section V, we further solve the underlying joint optimization problem under the "simple reactive loading" constraint on the IRS phase shifts. Numerical results for the proposed solution are shown in Section VI.
Notations: Lowercase letters (e.g., r) denote scalar variables. Uppercase italic letters (e.g., N) represent scalar constants. Vectors are denoted by small boldface letters (e.g., z), and the k-th element of z is denoted by z_k. An exponent on a vector (e.g., z^n) denotes the component-wise exponent of every element of the vector. Capital boldface letters (e.g., A) are used to denote matrices; a_ik and a_i stand, respectively, for the (i,k)-th entry and the i-th column of A. C^{M×N} denotes the set of matrices of size M×N with complex elements, and A^{-k} means (A^{-1})^k. Rank(A) and Tr(A) return, respectively, the rank and the trace of any matrix A. We also use ‖·‖, ‖·‖_F, (·)*, (·)^T, and (·)^H to denote the ℓ2 norm, the Frobenius norm, the conjugate, the transpose, and the conjugate transpose operators, respectively. The operator ⟨·⟩ returns the mean of the elements/entries of any vector or matrix. Moreover, vec(·) and unvec(·) denote the vectorization of a matrix and the unvectorization of a vector back to its original matrix form, respectively. Diag(·) operates on a vector and generates a diagonal matrix by placing that vector on the diagonal. The statistical expectation is denoted by E{·}. An independent and identically distributed (i.i.d.) random vector with a complex normal distribution is represented by x ∼ CN(x; u, γ^{-1} I), where u and γ^{-1} denote its mean and variance, respectively. The imaginary unit is represented by j = √(-1). A proportional relationship between any two entities (functions or variables) is represented by the ∝ operator. Lastly, the operators ⊗ and ⊙ denote the Kronecker and Hadamard products, respectively.

II. SYSTEM MODEL, ASSUMPTIONS, AND PROBLEM FORMULATION
Consider a BS that is equipped with N antenna elements serving M (M < N) single-antenna users in the downlink. The BS is assisted by an IRS which has K (K > M) reflective elements. For each user m, we have a direct link to the BS expressed by a channel vector h_b-u,m ∈ C^{N×1}. The channel of the IRS-user link is denoted by h_s-u,m ∈ C^{K×1}. Let H_b-s ∈ C^{K×N} denote the channel matrix of the MIMO IRS-BS link. The signal received at the IRS is phase-shifted by a diagonal matrix Υ = Diag(υ_1, υ_2, ..., υ_K) ∈ C^{K×K} with unimodular diagonal entries, i.e., |υ_k| = 1 for k = 1, ..., K. In other words, for each reflection element, we have υ_k = e^{jθ_k} for some phase shift θ_k ∈ [0, 2π). The received signal for user m can be expressed as follows:

y_m = α ( h_s-u,m^H Υ H_b-s Σ_{m'=1}^{M} f_{m'} s_{m'} + h_b-u,m^H Σ_{m'=1}^{M} f_{m'} s_{m'} + w ),  m = 1, ..., M,   (1)

where s_m ∼ CN(s; 0, 1) is the unknown transmit symbol, w ∼ CN(w; 0, σ_w²) denotes the additive white Gaussian noise (AWGN), and α ∈ R refers to the scalar pre-processing at the receiver. Here, f_m ∈ C^{N×1} are the precoding vectors that are used for power allocation and beamforming purposes. Let F = [f_1, f_2, ..., f_M] be the precoding matrix and let P denote the total transmit power; then E{‖F s‖²} = P. Let H_b-u = [h_b-u,1, h_b-u,2, ..., h_b-u,M] and H_s-u = [h_s-u,1, h_s-u,2, ..., h_s-u,M]. Then, by stacking all the users' signals in one vector y = [y_1, y_2, ..., y_M]^T, we can express the input-output relationship of the multi-user MIMO system as:

y = α ( H_s-u^H Υ H_b-s F s  [users-IRS-BS]  +  H_b-u^H F s  [users-BS]  +  w ).   (2)

Fig. 1: IRS-assisted multi-user MIMO system.

The overall effective channel matrix for all the users is thus given by:

H^H = H_s-u^H Υ H_b-s + H_b-u^H.   (3)

Moreover, we assume that the channel state information (CSI) of all the channels is perfectly known at the BS. Denoting the SINR experienced by user m as ρ_m, the associated spectral efficiency is given by:

C_m = log2(1 + ρ_m).   (4)

We aim to minimize the received symbol error of each user under the MMSE criterion, which consequently maximizes the user SINR. In fact, according to the I-MMSE relationship [22], [23], the SINR of each user m is inversely related to the MMSE of its received symbol:

ρ_m = 1/MMSE_m − 1.   (5)

Subsequently, the spectral efficiency of user m can be expressed as:

C_m^MMSE = log2( 1/MMSE_m ).   (6)

The MSE of the received symbol for user m is given by E_{y_m,s_m}{|y_m − s_m|²}, and for M users, the sum symbol MSE can be written as:

Σ_{m=1}^{M} E_{y_m,s_m}{|y_m − s_m|²} = E_{y,s}{‖y − s‖²}.   (7)

Thus, our problem under the MMSE criterion can be formulated as follows:

argmin_{α, F, Υ}  E_{y,s}{‖y − s‖²},   (8a)
subject to  E_s{‖F s‖²} = P,   (8b)
υ_ik = 0,  i ≠ k,   (8c)
|υ_ii| = 1,  i = 1, 2, ..., K.   (8d)

Note that the objective function in (8a) ensures that the MSE is minimized, and hence the spectral efficiency is maximized, for each user. Taking the expectation involved in (8a) and (8b) with respect to (w.r.t.) the random vectors s and w leads to:

argmin_{α, F, Υ}  ‖α H^H F − I_M‖_F² + M α² σ_w²,   (9a)
s.t.  ‖F‖_F² = P,   (9b)
υ_ik = 0,  i ≠ k,   (9c)
|υ_ii| = 1,  i = 1, 2, ..., K.   (9d)

The problem in (9) is a non-convex optimization problem due to the unimodular constraint in (9d).
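To make the signal model concrete, the following sketch simulates (2)-(3) on a small random instance. The dimensions, the i.i.d. Gaussian channels, and all parameter values here are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 8, 2, 16          # BS antennas, single-antenna users, IRS elements
sigma_w = 0.1               # noise standard deviation
alpha = 1.0                 # scalar receiver pre-processing

def cgauss(shape):
    """I.i.d. circularly symmetric complex Gaussian entries, unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_bu = cgauss((N, M))       # direct BS-user channels (columns h_{b-u,m})
H_su = cgauss((K, M))       # IRS-user channels (columns h_{s-u,m})
H_bs = cgauss((K, N))       # IRS-BS channel matrix

theta = rng.uniform(0.0, 2.0 * np.pi, K)
Upsilon = np.diag(np.exp(1j * theta))          # unimodular phase shifts, cf. (8d)

# Effective channel of (3): H^H = H_su^H Upsilon H_bs + H_bu^H  (M x N)
H_eff_H = H_su.conj().T @ Upsilon @ H_bs + H_bu.conj().T

F = cgauss((N, M))          # a placeholder precoder (not yet optimized)
s = cgauss(M)               # unit-variance transmit symbols
w = sigma_w * cgauss(M)     # AWGN at the users

y = alpha * (H_eff_H @ F @ s + w)              # stacked received vector, (2)
```

Writing the received vector through the effective channel (3) gives exactly the two-path sum (reflected plus direct) in (2), which is what the test below checks.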
Furthermore, later in this paper, we will solve the same problem under another constraint on the reflection coefficients. VAMP is a low-complexity iterative algorithm that can be utilized to solve optimization problems, and we will show that it can efficiently handle different constraints on the optimization variables. One could also envisage the use of ADMM, which is a penalty-based iterative algorithm for solving constrained optimization problems. However, it is shown in [24] that VAMP converges faster than ADMM because of its automatic tuning of the required parameters.

III. MODIFIED VAMP ALGORITHM FOR CONSTRAINED OPTIMIZATION
Recently, message passing algorithms [21], [25], [26] have gained attention in estimation theory because of their high performance and fast convergence. Vector approximate message passing (VAMP) [21] is a low-complexity algorithm that solves the quadratic-loss optimization of recovering a vector from noisy linear measurements. In this section, we briefly discuss the standard max-sum VAMP algorithm and modify it to solve constrained optimization problems for matrices involving linear mixing.
A. Background on Max-Sum VAMP
Approximate message passing (AMP)-based computational techniques have gained a lot of attention since their introduction within the compressed sensing framework [25]. To be precise, AMP solves the standard linear regression problem of recovering a vector x ∈ C^N from noisy linear observations:

z = A x + w,   (10)

where A ∈ C^{M×N} (with M ≪ N) is called the sensing matrix and w ∼ CN(0, γ_w^{-1} I_M), with γ_w > 0, so that p_{z|x}(z|x) = CN(z; A x, γ_w^{-1} I_M). Interestingly, the performance of AMP under independent and identically distributed (i.i.d.) Gaussian sensing matrices, A, can be rigorously tracked through scalar state evolution (SE) equations [27]. One major drawback of AMP, however, is that it often diverges if the sensing matrix, A, is ill-conditioned or has a non-zero mean. To circumvent this problem, the vector AMP (VAMP) algorithm was proposed and rigorously analyzed through SE equations in [21]. Although there is no theoretical guarantee that VAMP will always converge, strong empirical evidence suggests that VAMP is more resilient to badly conditioned sensing matrices, provided that they are right-orthogonally invariant [21]. Consider the joint probability distribution function (pdf) of x and z, p_{x,z}(x, z), as follows:

p_{x,z}(x, z) = p_x(x) CN(z; A x, γ_w^{-1} I_M).   (11)

Here p_x(x) is some prior distribution on the vector x, whose elements are assumed to be i.i.d. with a common prior distribution, p_x(x), as follows:

p_x(x) = Π_{i=1}^{N} p_x(x_i).   (12)

Thus, max-sum VAMP solves the following optimization problem:

x̂ = argmin_x ‖z − A x‖²,   (13)

by finding the maximum a posteriori (MAP) estimate of x as follows:

x̂ = argmax_x p_{x|z}(x|z).   (14)

The algorithm consists of the following two modules.
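As a quick sanity check on (13)-(14): under a flat prior p_x(x) ∝ 1, the MAP estimate coincides with the least-squares minimizer of ‖z − Ax‖². The instance below is an illustrative overdetermined one (so the minimizer is unique); its sizes and noise level are assumptions for the demo only.

```python
import numpy as np

rng = np.random.default_rng(1)
M_, N_ = 30, 8                         # overdetermined for a unique minimizer
gamma_w = 1e4                          # noise precision (sigma = 0.01)

A = rng.standard_normal((M_, N_))      # sensing matrix of (10)
x_true = rng.standard_normal(N_)
z = A @ x_true + rng.standard_normal(M_) / np.sqrt(gamma_w)

# MAP under a flat prior = argmin_x ||z - A x||^2, solved directly
x_map = np.linalg.lstsq(A, z, rcond=None)[0]
```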
1) MAP/LMMSE Estimator:
At iteration t, the MAP estimator receives extrinsic information (a message) from the separable (i.e., element-wise) MAP denoiser of x in the form of a mean vector, r_{t−1}, and a scalar variance, γ_{t−1}^{-1}, and computes the MAP estimate and the associated posterior variance from the linear observations, z = A x + w, by using a Gaussian prior, CN(x; r_{t−1}, γ_{t−1}^{-1} I_N), on x. Because we are dealing with Gaussian densities, the MAP estimate is equal to the linear MMSE (LMMSE) estimate and is given as follows:

x̄_t = ( γ_w A^H A + γ_{t−1} I_N )^{-1} ( γ_w A^H z + γ_{t−1} r_{t−1} ),   (15)

and its posterior variance is given by:

γ̄_t^{-1} = (1/N) Tr( [ γ_w A^H A + γ_{t−1} I_N ]^{-1} ).   (16)

The extrinsic information on x is updated as CN(x; x̄_t, γ̄_t^{-1} I_N) / CN(x; r_{t−1}, γ_{t−1}^{-1} I_N), and then sent back to the separable MAP denoiser of x in the form of a mean vector, r̃_t = (x̄_t γ̄_t − r_{t−1} γ_{t−1}) / (γ̄_t − γ_{t−1}), and a scalar variance, γ̃_t^{-1} = (γ̄_t − γ_{t−1})^{-1}. The SVD (singular value decomposition) form of VAMP directly computes the extrinsic mean vector r̃_t and scalar variance γ̃_t^{-1}, and can be readily obtained by substituting A = U Diag(ω) V^H in (15) and (16).
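A small numerical sketch of the LMMSE step (15)-(16) and of the extrinsic update that follows it; the problem sizes and precisions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M_, N_ = 20, 10
gamma_w = 100.0                         # noise precision
gamma_prev = 1.0                        # incoming extrinsic precision
r_prev = np.zeros(N_)                   # incoming extrinsic mean

A = rng.standard_normal((M_, N_))
x_true = rng.standard_normal(N_)
z = A @ x_true + rng.standard_normal(M_) / np.sqrt(gamma_w)

# (15): LMMSE/MAP estimate under the Gaussian prior CN(x; r_prev, I/gamma_prev)
Sigma_inv = gamma_w * A.T @ A + gamma_prev * np.eye(N_)
x_bar = np.linalg.solve(Sigma_inv, gamma_w * A.T @ z + gamma_prev * r_prev)

# (16): posterior variance, averaged over the N components
gamma_bar = 1.0 / (np.trace(np.linalg.inv(Sigma_inv)) / N_)

# Extrinsic message sent back to the denoiser
gamma_ext = gamma_bar - gamma_prev
r_ext = (x_bar * gamma_bar - r_prev * gamma_prev) / gamma_ext
```

Note that the posterior precision γ̄ always exceeds the prior precision here (the data only adds information), so the extrinsic precision stays positive.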
2) MAP Estimator of x: This module computes the MAP estimate, x̂_t, of x from the joint distribution p_x(x) CN(x; r̃_t, γ̃_t^{-1} I_N). Because x is i.i.d., the MAP estimator, also called the denoiser function, is separable. It is expressed as:

x̂_{i,t} = g_{1,i}(r̃_{i,t}, γ̃_t) ≜ argmax_{x_i} [ −γ̃_t |x_i − r̃_{i,t}|² + ln p_x(x_i) ],   (17)

or equivalently,

g_{1,i}(r̃_{i,t}, γ̃_t) = argmin_{x_i} [ γ̃_t |x_i − r̃_{i,t}|² − ln p_x(x_i) ].   (18)

The derivative of the scalar MAP denoiser w.r.t. r̃_i is given by [21]:

g'_{1,i}(r̃_{i,t}, γ̃_t) = γ̃_t / γ̂_t,   (19)

where γ̂_t is the posterior precision. Similar to the LMMSE module, the denoiser module computes an extrinsic mean vector, r_t = (x̂_t γ̂_t − r̃_t γ̃_t) / (γ̂_t − γ̃_t), and a scalar variance, γ_t^{-1} = (γ̂_t − γ̃_t)^{-1}, and sends them back to the LMMSE module for the next iteration. The process is repeated until convergence.

A major advantage of VAMP is that it decouples the prior information, p_x(x), and the observations, p_{z|x}(z|x), into two separate modules. Moreover, it also enables the denoiser function to be separable even if the elements of x are correlated, in which case the LMMSE module can easily incorporate such correlation information. The steps of the standard max-sum VAMP algorithm are summarized in Algorithm 1.
Algorithm 1 Max-sum VAMP (SVD form)
Given A ∈ C^{M×N}, z ∈ C^M, a precision tolerance ε, and a maximum number of iterations T_MAX
Initialize r_0, γ_0 ≥ 0 and t ← 1
Compute the economy-size SVD A = U Diag(ω) V^H
R_A = Rank(A) = length(ω)
Compute z̃ = Diag(ω)^{-1} U^H z
repeat
  // LMMSE (SVD form)
  d_t = γ_w Diag(γ_w ω² + γ_{t−1})^{-1} ω²
  r̃_t = r_{t−1} + (N/R_A) V Diag(d_t/⟨d_t⟩) (z̃ − V^H r_{t−1})
  γ̃_t = γ_{t−1} ⟨d_t⟩ / (N/R_A − ⟨d_t⟩)
  // Denoiser
  x̂_t = g_1(r̃_t, γ̃_t)
  γ̂_t = γ̃_t / ⟨g'_1(r̃_t, γ̃_t)⟩
  γ_t = γ̂_t − γ̃_t
  r_t = (γ̂_t x̂_t − γ̃_t r̃_t) / γ_t
  t ← t + 1
until ‖x̂_t − x̂_{t−1}‖ ≤ ε ‖x̂_{t−1}‖ or t > T_MAX
return x̂_t

B. VAMP for Optimization

In this section, we study how max-sum VAMP can be applied to constrained optimization problems. Given two matrices A ∈ C^{M×N} and B ∈ C^{Q×R}, the goal is to solve an optimization problem of the form:

argmin_X ‖A X B − Z‖_F²   (20a)
s.t. f(x_ik) = 0,   (20b)

where X ∈ C^{N×Q} and Z ∈ C^{M×R}. In the context of optimization, the observation matrix, Z, is considered as the desired output matrix and is assumed to be known. Unlike the estimation problem in (13), we do not have a prior distribution on X. Yet, the above optimization problem in (20) can be solved by modifying the modules of standard max-sum VAMP.
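Before modifying the modules, it may help to see the unmodified (vector) Algorithm 1 in action. The sketch below runs max-sum VAMP on a toy sparse-recovery instance, with a soft-threshold denoiser standing in for g_1 (i.e., an assumed Laplacian prior, which is not a model used in this paper) and a small clipping of the precisions for numerical safety; all sizes and tunings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M_, N_ = 60, 100                         # M << N compressed-sensing regime
A = rng.standard_normal((M_, N_)) / np.sqrt(M_)
x_true = np.zeros(N_)
support = rng.choice(N_, size=10, replace=False)
x_true[support] = rng.standard_normal(10)
gamma_w = 1e4                            # noise precision
z = A @ x_true + rng.standard_normal(M_) / np.sqrt(gamma_w)

lam = 1.0                                # l1 weight of the assumed Laplacian prior

def denoiser(r, gamma):
    """Soft threshold: argmin_x gamma*(x-r)^2 + lam*|x|, and its derivative."""
    thr = lam / (2.0 * gamma)
    x = np.sign(r) * np.maximum(np.abs(r) - thr, 0.0)
    return x, (np.abs(r) > thr).astype(float)

# Precomputations of Algorithm 1 (economy-size SVD)
U, omega, Vh = np.linalg.svd(A, full_matrices=False)
R_A = len(omega)
z_tilde = (U.T @ z) / omega

r, gamma = np.zeros(N_), 1.0
x_hat = np.zeros(N_)
for _ in range(50):
    # LMMSE module (SVD form)
    d = gamma_w * omega**2 / (gamma_w * omega**2 + gamma)
    r_tilde = r + (N_ / R_A) * (Vh.T @ ((d / d.mean()) * (z_tilde - Vh @ r)))
    gamma_tilde = gamma * d.mean() / (N_ / R_A - d.mean())
    # Denoiser module, with <g'> clipped for numerical safety
    x_hat, dg = denoiser(r_tilde, gamma_tilde)
    alpha = np.clip(dg.mean(), 1e-3, 1.0 - 1e-3)
    gamma_hat = gamma_tilde / alpha
    gamma = max(gamma_hat - gamma_tilde, 1e-8)
    r = (gamma_hat * x_hat - gamma_tilde * r_tilde) / gamma
```

The fixed point approximates the l1-regularized least-squares solution of this toy problem; the clipping is a common practical guard, not part of the algorithm as stated.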
1) Extended LMMSE:
Through vectorization, the objective function in (20) can be written in the same form as the objective function in (13) in the following way:

vec(Z) = (B^T ⊗ A) vec(X).   (21)

The SVD form of VAMP allows for exploiting the Kronecker structure throughout the algorithm so as to avoid any large matrix multiplication. The product of a Kronecker matrix and a vector can be computed in an efficient way through reverse vectorization (unvectorization), by computing the product of three smaller matrices and then vectorizing the result. Let A = U_A Diag(ω_A) V_A^H, B^T = U_B Diag(ω_B) V_B^H, and (B^T ⊗ A) = U Diag(ω) V^H. Using the fact that:

U Diag(ω) V^H = (U_B ⊗ U_A) Diag(ω_B ⊗ ω_A) (V_B^H ⊗ V_A^H),   (22)

we modify the steps of Algorithm 1. At iteration t, the LMMSE module receives an extrinsic mean matrix, R_{t−1}, such that vec(R_{t−1}) = r_{t−1}, and a scalar variance, γ_{t−1}^{-1}, from the MAP estimator. First, the computation of z̃ in Algorithm 1 can be written as:

Z̃ ≜ unvec(z̃) = unvec( Diag(ω)^{-1} U^H vec(Z) ) = Diag(ω_A)^{-1} U_A^H Z ( Diag(ω_B)^{-1} U_B^H )^T.   (23)

Similarly, the intermediary vector d_t in Algorithm 1 can be re-expressed as follows:

d_t = γ_w Diag( γ_w (ω_B ⊗ ω_A)² + γ_{t−1} )^{-1} (ω_B ⊗ ω_A)² = γ_w ( γ_w (ω_B ⊗ ω_A)² + γ_{t−1} )^{-1} ⊙ (ω_B ⊗ ω_A)².   (24)

The extrinsic mean matrix, R̃_t, and the scalar variance, γ̃_t^{-1}, are computed directly without needing to compute the LMMSE estimate, X̄_t, and the posterior variance, γ̄_t.

Fig. 2: Block diagram of batch VAMP for optimization.

While the step for computing the extrinsic variance, γ̃_t^{-1}, remains unchanged, the extrinsic mean matrix, R̃_t, is computed through unvectorization of the extrinsic mean vector, r̃_t, as follows:

R̃_t = unvec( r_{t−1} + (NQ/Rank(B ⊗ A)) V Diag(d_t/⟨d_t⟩) (z̃ − V^H r_{t−1}) )
    = R_{t−1} + (NQ/Rank(B ⊗ A)) V_A unvec( Diag(d_t/⟨d_t⟩) vec( Z̃ − V_A^H R_{t−1} (V_B^H)^T ) ) V_B^T.   (25)

Hence, the only Kronecker product required for the LMMSE module is that of the two vectors ω_B and ω_A.
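The identity behind (21)-(25), i.e., that a Kronecker-matrix/vector product can be computed with three small matrix products, can be checked numerically (the sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
Md, Nd, Qd, Rd = 4, 5, 6, 3
A = rng.standard_normal((Md, Nd))
B = rng.standard_normal((Qd, Rd))
X = rng.standard_normal((Nd, Qd))

vec = lambda Y: Y.flatten(order="F")   # column-major vectorization

# Direct product with the (Md*Rd) x (Nd*Qd) Kronecker matrix
lhs = np.kron(B.T, A) @ vec(X)
# Same result via three small products: vec(A X B)
rhs = vec(A @ X @ B)
```

The left-hand side forms the full Kronecker matrix (memory O(M R · N Q)), whereas the right-hand side never does; both give the same vector.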
2) Scalar MAP Estimator:
Because the constraint on X is component-wise, we model the constraint on its entries, x_ik, as a prior with some precision, γ_p > 0, i.e., p_x(x_ik) ∝ exp[−γ_p f(x_ik)], and define the scalar denoiser function (now called the projector function in the context of optimization) as follows:

x̂_{ik,t} = g_{ik}(r̃_{ik,t}, γ̃_t) ≜ argmin_{x_ik} [ γ̃_t |x_ik − r̃_{ik,t}|² − ln p_x(x_ik) ],   (26)

or equivalently:

g_{ik}(r̃_{ik,t}, γ̃_t) = argmin_{x_ik} [ γ̃_t |x_ik − r̃_{ik,t}|² + γ_p f(x_ik) ].   (27)

The parameter γ_p in (27) governs the weight given to the constraint, f(x_ik). Choosing γ_p ≫ γ̃ enforces the constraint, while choosing γ_p ≪ γ̃ ignores it. Taking the derivative of the scalar projector function yields:

g'_{ik}(r̃_{ik,t}, γ̃_t) = γ̃_t / γ̂_t,   (28)

where γ̂_t is the posterior precision.
Algorithm 2 Matrix max-sum VAMP (SVD form) for optimization
Given A ∈ C^{M×N}, B ∈ C^{Q×R}, Z ∈ C^{M×R}, a precision tolerance ε, and a maximum number of iterations T_MAX
Select initial R_0, γ_0 ≥ 0 and t ← 1
Compute the economy-size SVD A = U_A Diag(ω_A) V_A^H
Compute the economy-size SVD B^T = U_B Diag(ω_B) V_B^H
Compute ω = ω_B ⊗ ω_A
R_B-A = Rank(B^T ⊗ A) = length(ω)
Compute Z̃ = Diag(ω_A)^{-1} U_A^H Z ( Diag(ω_B)^{-1} U_B^H )^T
repeat
  // LMMSE (SVD form)
  d_t = γ_w (γ_w ω² + γ_{t−1})^{-1} ⊙ ω²
  D̂_t = unvec( (d_t/⟨d_t⟩) ⊙ vec( Z̃ − V_A^H R_{t−1} (V_B^H)^T ) )
  R̃_t = R_{t−1} + (NQ/R_B-A) V_A D̂_t V_B^T
  γ̃_t = γ_{t−1} ⟨d_t⟩ / (NQ/R_B-A − ⟨d_t⟩)
  // Projector
  X̂_t = g(R̃_t, γ̃_t)
  γ̂_t = γ̃_t / ⟨g'(R̃_t, γ̃_t)⟩
  γ_t = γ̂_t − γ̃_t
  R_t = (γ̂_t X̂_t − γ̃_t R̃_t) / γ_t
  t ← t + 1
until ‖X̂_t − X̂_{t−1}‖_F ≤ ε ‖X̂_{t−1}‖_F or t > T_MAX
return X̂_t

Similar to the denoiser module, the extrinsic information from the projector module is calculated in the form of a mean matrix:

R_t = unvec( ( vec(X̂_t) γ̂_t − vec(R̃_t) γ̃_t ) / (γ̂_t − γ̃_t) ) = ( X̂_t γ̂_t − R̃_t γ̃_t ) / (γ̂_t − γ̃_t),   (29)

and a scalar variance, γ_t^{-1} = (γ̂_t − γ̃_t)^{-1}, which are then fed to the LMMSE module. In a way analogous to sum-product VAMP, the max-sum VAMP (for optimization) decouples the constraint from the objective function and also enables the projector function to be separable. The LMMSE module solves the unconstrained optimization of the objective function, whereas the projector function enforces the constraint. This favorable property makes VAMP a robust algorithm for solving optimization problems in the presence of linear mixing and under various component-wise constraints. The complete steps of VAMP for optimization are presented in Algorithm 2.

IV. VAMP-BASED SOLUTION FOR THE JOINT BEAMFORMING PROBLEM
In this section, we present a novel iterative algorithm that simultaneously finds the optimal phase matrix, Υ, as well as the optimal precoding matrix, F. We decouple the joint optimization problem into two sub-problems through alternate optimization. One of them finds the optimal phase matrix, Υ, by utilizing the modified max-sum VAMP, while the other sub-problem optimizes the transmit precoding matrix, F.

A. Alternate Optimization
We use block coordinate descent to decompose the joint optimization problem into two simpler optimization problems. It is a simple iterative approach that optimizes one variable at a time (while fixing the others), and the process is repeated for every variable. Note here that a variable can be a scalar, a vector, or a whole matrix. Thus, we divide the optimization problem in (8) into the following two sub-optimization problems:

1) argmin_Υ E_{y,s}{‖y − s‖²}   (30a)
   s.t. υ_ik = 0, i ≠ k,   (30b)
   |υ_ii| = 1, i = 1, 2, ..., K.   (30c)

2) argmin_{α,F} E_{y,s}{‖y − s‖²}   (31a)
   s.t. E_s{‖F s‖²} = P.   (31b)

Let us define the error at iteration t as follows:

E(t) ≜ ‖ α̂_t ( H_s-u^H Υ̂_t H_b-s + H_b-u^H ) F̂_t − I_M ‖_F² + M α̂_t² σ_w².   (32)

The algorithm stops iterating when |E(t) − E(t−1)| < ε E(t−1), where ε ∈ R+ is some precision tolerance. The algorithmic steps of BCD, after taking the expectation w.r.t. s and w, are shown in Algorithm 3.
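The alternating structure and the stopping rule of BCD can be illustrated on a toy least-squares objective, with exact one-variable minimizers standing in for the two sub-problems (30)-(31); the objective and tolerances below are illustrative assumptions, not the paper's problem.

```python
# Toy objective: E(x, y) = (x + 2y - 3)^2 + (2x + y - 4)^2 + (x - y - 2)^2.
# Setting dE/dx = 0 with y fixed gives x = (13 - 3y)/6; setting dE/dy = 0
# with x fixed gives y = (8 - 3x)/6.
def E(x, y):
    return (x + 2*y - 3.0)**2 + (2*x + y - 4.0)**2 + (x - y - 2.0)**2

x, y = 0.0, 0.0
eps, T_MAX = 1e-9, 1000
E_prev = E(x, y)
for t in range(T_MAX):
    x = (13.0 - 3.0 * y) / 6.0      # exact minimizer over x (y fixed)
    y = (8.0 - 3.0 * x) / 6.0       # exact minimizer over y (x fixed)
    E_cur = E(x, y)
    if abs(E_cur - E_prev) < eps * E_prev:   # relative stopping rule
        break
    E_prev = E_cur
```

Each sweep contracts the error by a constant factor here, so the relative stopping rule triggers after a few dozen sweeps at the joint minimizer (x, y) = (2, 1/3).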
Algorithm 3 Block coordinate descent
Given H_s-u, H_b-u, H_b-s, a precision tolerance ε, and a maximum number of iterations T_MAX
Initialize Υ̂_0 and t ← 1
repeat
  [α̂_t, F̂_t] = argmin_{α,F} ‖ α ( H_s-u^H Υ̂_{t−1} H_b-s + H_b-u^H ) F − I_M ‖_F² + M α² σ_w²  s.t. ‖F‖_F² = P
  Υ̂_t = argmin_Υ ‖ α̂_t ( H_s-u^H Υ H_b-s + H_b-u^H ) F̂_t − I_M ‖_F²  s.t. υ_ik = 0, i ≠ k; |υ_ii| = 1, i = 1, 2, ..., K
  t ← t + 1
until |E(t) − E(t−1)| < ε E(t−1) or t > T_MAX
return Υ̂_t, F̂_t, α̂_t

B. Optimal Phase Matrix
We adapt max-sum VAMP in order to find the optimal phase matrix, Υ. Let us restate the associated optimization problem in (30), after taking the expectation w.r.t. s and w, as:

argmin_Υ ‖ α H_s-u^H Υ H_b-s F − ( I_M − α H_b-u^H F ) ‖_F²   (33a)
s.t. υ_ik = 0, i ≠ k,   (33b)
|υ_ii| = 1, i = 1, 2, ..., K.   (33c)

Fig. 3: Block diagram of the proposed algorithm.

The solution is obtained by setting A = α H_s-u^H, B = H_b-s F, and Z = I_M − α H_b-u^H F in Algorithm 2, and then choosing a suitable projector function to satisfy the constraints on the reflection coefficients. The unconstrained minimization of the objective function in (33) is performed by the LMMSE module. We define the projector function that enforces the constraint on the reflection coefficients as:

g_{1,ik}(r̃_ik, γ̃) ≜ { argmin_{υ_ik} [ γ̃ |υ_ik − r̃_ik|² + γ_p (|υ_ik| − 1)² ],  i = k;  0,  i ≠ k }.   (34)

Solving the optimization problem in (34) for i = k results in the following closed-form expression for the underlying projector function:

g_{1,ii}(r̃_ii, γ̃) = ( γ̃ / (γ̃ + γ_p) ) r̃_ii + ( γ_p / (γ̃ + γ_p) ) r̃_ii |r̃_ii|^{-1}.   (35)

To enforce the constraint, we choose γ_p ≫ γ̃ such that γ̃/(γ̃ + γ_p) ≈ 0 and γ_p/(γ̃ + γ_p) ≈ 1. Therefore, the projector function simplifies to:

g_{1,ik}(r̃_ik) = { r̃_ik |r̃_ik|^{-1},  i = k;  0,  i ≠ k }.   (36)

Finally, the derivative of the projector function (36) w.r.t. r̃_ik is obtained as follows:

g'_{1,ik}(r̃_ik) = { |r̃_ik|^{-1},  i = k;  0,  i ≠ k }.   (37)
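A sketch of the simplified projector (36) and its derivative (37), applied entry-wise to a random extrinsic mean matrix; the test matrix below is an illustrative stand-in for R̃.

```python
import numpy as np

def projector(R_tilde):
    """Eq. (36): keep the phase of each diagonal entry (unit modulus),
    force every off-diagonal entry to zero, cf. (33b)-(33c)."""
    d = np.diag(R_tilde)
    out = np.zeros_like(R_tilde)
    np.fill_diagonal(out, d / np.abs(d))
    return out

def projector_deriv(R_tilde):
    """Eq. (37): 1/|r_ii| on the diagonal, zero elsewhere."""
    out = np.zeros(R_tilde.shape)
    np.fill_diagonal(out, 1.0 / np.abs(np.diag(R_tilde)))
    return out

rng = np.random.default_rng(5)
K = 5
R_tilde = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
Ups_hat = projector(R_tilde)
G = projector_deriv(R_tilde)
```

The projected diagonal keeps each entry's phase while normalizing its magnitude to one, which is exactly the unimodular constraint of (9d).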
The sub-optimization problem in (31) is a constrained MMSE transmit-precoding optimization for a MIMO system. It can be solved by jointly optimizing F and α via Lagrange optimization. After taking the expectation, we construct the Lagrangian function for problem (31) as follows:

L(F, α, λ) = ||α H^H F − I_M||_F² + M α² σ_w² + λ (Tr(F F^H) − P),    (38)

with λ ∈ ℝ being the Lagrange multiplier. The closed-form solutions for the optimal α and F are given below; we refer the reader to [28] for more details:

α_opt = g_α(H) ≜ √( Tr( [H H^H + (M σ_w²/P) I_N]^{−2} H H^H ) / P ),    (39)

F_opt = g_F(H) ≜ √P [H H^H + (M σ_w²/P) I_N]^{−1} H / √( Tr( [H H^H + (M σ_w²/P) I_N]^{−2} H H^H ) ).    (40)

It is worth mentioning here that, on the receiver side, α can be obtained in a blind manner from the received signal power, as it is just a scalar independent of the transmitted signal. Consequently, the receiver does not require any additional information from the BS (e.g., the phase matrix Υ) to compute α. Now that we have solved both sub-optimization problems required for the coordinate descent algorithm (i.e., Algorithm 3), we substitute the solutions of the sub-optimization problems (30) and (31) into
Algorithm 3. The overall steps are stated in Algorithm 4.

V. Joint Beamforming Under Reactive Loading at the IRS

We consider a reflective element that is combined with a tunable reactive load instead of an ideal phase shifter, i.e., υ_ii = (1 + j χ_ii)^{−1}, where χ_ii ∈ ℝ is a scalar reactance value. (Here, the value 1 is the normalized resistive part of the element impedance, whereas χ_ii is the normalized reactive part of the antenna plus its reactive termination. Accordingly, υ_ii represents the induced current flowing across the antenna. We assume the antenna elements to be uncoupled, which holds approximately for half-wavelength element spacing.)

Algorithm 4
VAMP-based joint optimization algorithm

Given H_s-u, H_b-u, H_b-s, a precision tolerance ε, and a maximum number of iterations T_MAX.
Initialize Υ̂_0, R_0, γ_0 ≥ 0, and t ← 1.
Ĥ_0 = (H_s-u^H Υ̂_0 H_b-s + H_b-u^H)^H
α̂_0 = g_α(Ĥ_0),  F̂_0 = g_F(Ĥ_0)
repeat
    // LMMSE SVD form
    A = α̂_{t−1} H_s-u^H,  B = H_b-s F̂_{t−1},  Y = I_M − α̂_{t−1} H_b-u^H F̂_{t−1}
    Compute the economy-size SVD A = U_A Diag(ω_A) V_A^H
    Compute the economy-size SVD B^T = U_B Diag(ω_B) V_B^H
    Compute ω = ω_B ⊗ ω_A and R_B-A = Rank(B^T ⊗ A) = length(ω)
    Compute Ỹ = Diag(ω_A)^{−1} U_A^H Y (Diag(ω_B)^{−1} U_B^H)^T
    d_t = γ_w (γ_w ω² + γ_{t−1})^{−1} ⊙ ω²   (element-wise operations on ω)
    D̂_t = unvec( (d_t / ⟨d_t⟩) ⊙ vec( Ỹ − V_A^H R_{t−1} (V_B^H)^T ) )
    R̃_t = R_{t−1} + (K²/R_B-A) V_A D̂_t V_B^T
    γ̃_t = γ_{t−1} ⟨d_t⟩ / ( K²/R_B-A − ⟨d_t⟩ )
    // Projector part
    Υ̂_t = g_1(R̃_t, γ̃_t)
    γ̂_t = γ̃_t / ⟨g'_1(R̃_t)⟩
    γ_t = γ̂_t − γ̃_t
    R_t = ( γ̂_t Υ̂_t − γ̃_t R̃_t ) / γ_t
    // Find α and F through their closed-form solutions
    Ĥ_t = (H_s-u^H Υ̂_t H_b-s + H_b-u^H)^H
    α̂_t = g_α(Ĥ_t),  F̂_t = g_F(Ĥ_t)
    t ← t + 1
until |E(t) − E(t−1)| < ε E(t−1) or t > T_MAX
return Υ̂_t, F̂_t, α̂_t.

The reactance χ_ii has to be optimized for each reflection coefficient. We rewrite the objective function under the new constraint on the phases as follows:

arg min_Υ ||α H_s-u^H Υ H_b-s F − (I_M − α H_b-u^H F)||_F²    (41a)
s.t. υ_ik = 0, i ≠ k,    (41b)
     υ_ii = (1 + j χ_ii)^{−1}, i = 1, 2, …, K.
(41c)

To find the optimal phase matrix under the new constraint, we change the projector function accordingly:

g_{2,ik}(r̃_ik, γ̃) ≜ { arg min_{υ_ik} [ γ̃ |υ_ik − r̃_ik|² + γ_p |υ_ik − (1 + j χ_ii^opt)^{−1}|² ],   i = k
                    { 0,   i ≠ k,    (42)

where

χ_ii^opt = g_χ(r̃_ii) ≜ arg min_{χ_ii} | r̃_ii −
(1 + j χ_ii)^{−1} |².    (43)

The optimization problem in (42) is a bi-level one [29]. The solution to (43) is substituted into (42), which is then solved as an ordinary MAP optimization. We show in Appendix A that the solution to (43) is given by:

g_χ(r̃_ii) = 1/(2 ℑ{r̃_ii}) ( 2ℜ{r̃_ii} − 1 − √( (1 − 2ℜ{r̃_ii})² + 4 ℑ{r̃_ii}² ) ).    (44)

Substituting (44) back into (42) and solving the minimization leads to the following result:

g_{2,ii}(r̃_ii, γ̃) = γ̃/(γ̃ + γ_p) r̃_ii + γ_p/(γ̃ + γ_p) (1 + j g_χ(r̃_ii))^{−1}.    (45)

The parameter γ_p is chosen in the same way as in Section IV-B to ensure that the constraint is satisfied. Thus, the projector function can be expressed as:

g_{2,ik}(r̃_ik) = { (1 + j g_χ(r̃_ik))^{−1},   i = k
                  { 0,   i ≠ k,    (46)

whose derivative is obtained as follows:

g'_{2,ik}(r̃_ik) = { −j g'_χ(r̃_ik) (1 + j g_χ(r̃_ik))^{−2},   i = k
                   { 0,   i ≠ k,    (47)

where

g'_χ(r̃_ii) = 1/2 ( ∂g_χ(r̃_ii)/∂ℜ{r̃_ii} − j ∂g_χ(r̃_ii)/∂ℑ{r̃_ii} ).    (48)

The partial derivatives involved in (48) are given by:

∂g_χ(r̃_ii)/∂ℜ{r̃_ii} = ℑ{r̃_ii}^{−1} + (1 − 2ℜ{r̃_ii}) ( ℑ{r̃_ii} √( (1 − 2ℜ{r̃_ii})² + 4 ℑ{r̃_ii}² ) )^{−1},    (49)

and

∂g_χ(r̃_ii)/∂ℑ{r̃_ii} = (1 − 2ℜ{r̃_ii}) (2 ℑ{r̃_ii}²)^{−1} + (1 − 2ℜ{r̃_ii})² ( 2 ℑ{r̃_ii}² √( (1 − 2ℜ{r̃_ii})² + 4 ℑ{r̃_ii}² ) )^{−1}.
(50)

Since the derivative is required to be a real scalar, we take the absolute value of the complex derivative and, therefore, modify the derivative of the projector function (47) as follows:

g'_{2,ik}(r̃_ik) = { | −j g'_χ(r̃_ik) (1 + j g_χ(r̃_ik))^{−2} |,   i = k
                   { 0,   i ≠ k.    (51)

Lastly, we replace the projector function g_1 with g_2 in the projector steps of Algorithm 4.

VI. Numerical Results: Performance and Complexity Analysis
A. Simulation Model and Parameters
We present simulation results to assess the performance of the proposed algorithm. We assume that the IRS is located at a fixed distance of d_IRS = 150 m from the BS and that the users are spread uniformly over a range of radial distances from the IRS. A path-based propagation channel model, also known as the parametric channel model [22], is used. Such a model accounts for both scattered and reflected signal components.

TABLE I: Simulation parameters and their notations

IRS antenna spacing: d_s
BS antenna spacing: d_b
Angle of departure (BS array vector): φ
Elevation angle (IRS array vector): ϕ
Azimuth angle (IRS array vector): ψ
Channel path gain: c
IRS-BS distance: d_IRS
User-BS distance: d
User-IRS distance: d'
Path-loss at reference distance: C_0
Reference distance: d_0
Path-loss exponent: η
Number of channel paths: Q
Noise variance: σ_w²

The channel between the IRS and the BS is generated according to:

H_b-s = √( L(d_IRS)/Q_IRS ) Σ_{q=1}^{Q_IRS} c_q a_IRS(ϕ_q, ψ_q) a_BS(φ_q)^T.    (52)

Here, Q_IRS and L(d_IRS) denote the number of channel paths and the distance-dependent path-loss factor, respectively. The vectors a_BS(φ) and a_IRS(ϕ, ψ) are the array response vectors of the BS and the IRS, respectively. The coefficients c_q in (52) denote the path gains, which are modeled by a complex normal distribution c_q ∼ CN(c_q; 0, 1). Assuming a uniform linear array (ULA) with N antennas at the BS, we have a_BS(φ) = [1, e^{2πj (d_b/λ) cos φ}, …, e^{2πj (d_b/λ)(N−1) cos φ}]^T, wherein λ, φ, and d_b represent the wavelength, the angle of departure (AOD), and the inter-antenna spacing at the BS, respectively. The IRS is equipped with a (square) uniform planar array (UPA) with K antenna elements. The array response vector of the IRS is expressed as follows:

a_IRS(ϕ, ψ) = √|cos ϕ| [1, e^{2πj (d_s/λ) sin ϕ sin ψ}, …, e^{2πj (d_s/λ)(√K − 1) sin ϕ sin ψ}]^T ⊗ [1, e^{2πj (d_s/λ) sin ϕ cos ψ}, …, e^{2πj (d_s/λ)(√K − 1) sin ϕ cos ψ}]^T.    (53)

Here, d_s represents the inter-antenna spacing at the IRS, whereas ϕ and ψ are the angles of elevation and azimuth, respectively. In the simulations we set d_b = d_s = λ/2. All three angles φ, ϕ, and ψ are uniformly distributed in the interval [0, π). The channel of the direct link between the BS and a single-antenna user m, with Q_b-u paths, is modeled as follows:

h_b-u,m = √( L(d_m)/Q_b-u ) Σ_{q=1}^{Q_b-u} c_m,q a_BS(φ_m,q),  m = 1, …, M.    (54)

Similar to the IRS-BS channel, c_m,q ∼ CN(c_m,q; 0, 1) and the angle φ_m,q is uniformly distributed in [0, π). The channel vectors in (54) are assumed to be independent among users. Finally, the channel vector for the link between each m-th user and the IRS, with Q_s-u channel paths, is modeled as follows:

h_s-u,m = √( L(d'_m)/Q_s-u ) Σ_{q=1}^{Q_s-u} c_m,q a_IRS(ϕ_m,q, ψ_m,q).    (55)

The term L(d) = C_0 (d/d_0)^{−η} in (52), (54), and (55) is the distance-dependent path-loss factor, where C_0 denotes the path-loss at the reference distance d_0 = 1 m, and η is the path-loss exponent. Moreover, to include a line-of-sight (LOS) component in a channel, the gain of one channel path is set to be several dB higher than those of the other channel paths. In the simulations, for the IRS-BS channel, d_IRS = 150 m is fixed, whereas the user-BS distance, d, and the user-IRS distance, d', vary for each user according to its location relative to the BS and the IRS, respectively. In all simulations, we set η = 3 for NLOS channels, η = 2 for LOS channels, Q_b-u = 2, Q_s-u = 2, Q_IRS = 10, and σ_w² = 1. The results are averaged over independent channel realizations. We consider the following two scenarios: first, a case where none of the channels contains a LOS component; then, a scenario where the BS-IRS and IRS-user channels have a LOS component but all the direct BS-user channels have no LOS component.
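For concreteness, the path-based model in (52)-(55) can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names, the fixed seed, and the default d/λ = 1/2 spacing are our own choices, and the 1/√Q path normalization follows (52).

```python
import numpy as np

rng = np.random.default_rng(0)

def a_bs(phi, n, d_over_lambda=0.5):
    """ULA response at the BS: [1, e^{2*pi*j*(d_b/lambda)*cos(phi)}, ...]."""
    return np.exp(2j * np.pi * d_over_lambda * np.arange(n) * np.cos(phi))

def a_irs(elev, azim, k, d_over_lambda=0.5):
    """Square-UPA response at the IRS: Kronecker product of two sqrt(K)-element factors."""
    side = int(round(np.sqrt(k)))
    v = np.exp(2j * np.pi * d_over_lambda * np.arange(side) * np.sin(elev) * np.sin(azim))
    h = np.exp(2j * np.pi * d_over_lambda * np.arange(side) * np.sin(elev) * np.cos(azim))
    return np.sqrt(np.abs(np.cos(elev))) * np.kron(v, h)

def irs_bs_channel(k, n, q_paths, path_loss=1.0):
    """Path-based model (52): a sum of Q rank-one terms with CN(0, 1) path gains."""
    h = np.zeros((k, n), dtype=complex)
    for _ in range(q_paths):
        c = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        elev, azim, phi = rng.uniform(0.0, np.pi, size=3)
        h += c * np.outer(a_irs(elev, azim, k), a_bs(phi, n))
    return np.sqrt(path_loss / q_paths) * h

H_bs = irs_bs_channel(k=16, n=8, q_paths=10)  # K = 16 IRS elements, N = 8 BS antennas
```

The user channels (54)-(55) follow the same pattern with `a_bs` or `a_irs` alone in place of the outer product.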
The proposed VAMP-based algorithm is compared against the following three configurations:
i. A MIMO system assisted by one IRS, where the joint optimization of the phase matrix and the precoding is solved through alternating optimization and penalty-based ADMM.
ii. A massive MIMO system with a large number of BS antennas, where the optimal precoding is computed using the maximum ratio transmission (MRT) technique.
iii. An IRS-assisted MIMO system with unoptimized IRS phases and MMSE transmit precoding.

B. Benchmarking Metrics
We use two metrics for performance evaluation, namely the sum-rate, Ĉ, and the normalized root mean square error (NRMSE), which are defined as follows:

Ĉ = Σ_{m=1}^{M} log_2(1 + ρ_m),    (56)

where the SINR, ρ_m, of user m is given by:

ρ_m = |h_m^H f_m|² / ( σ_w² + Σ_{i≠m} |h_m^H f_i|² ),    (57)

with h_m^H = h_s-u,m^H Υ H_b-s + h_b-u,m^H, and

NRMSE(α, Υ, F) ≜ (1/√M) √( ||α (H_s-u^H Υ H_b-s + H_b-u^H) F − I_M||_F² + M α² σ_w² ).    (58)

C. Performance Results

1) NLOS Scenario:
Here we set the number of users to M = 9 and the number of BS antennas to N = 32 for every configuration except massive MIMO, for which we use N = 512.

Fig. 4: NLOS channels: (a) Sum-rate versus transmit power (M = 9, K = 256), (b) Sum-rate versus the number of IRS reflective elements (M = 9, N = 32, P = 10 dB), and (c) NRMSE versus the number of iterations (M = 9, N = 32, K = 256).

Fig. 4(a) depicts the achievable sum-rate versus the transmit power, P, for the different transmission schemes considered. The superiority of the proposed algorithm can be observed, as it outperforms the massive MIMO system while using a significantly smaller number of transmit antennas. Since the proposed algorithm is based on VAMP, it outperforms the ADMM-based solution due to the automatic tuning of scalar parameters (i.e., scalar precisions) by VAMP, which behave similarly to a penalty parameter inside the algorithm. This improves convergence, unlike ADMM, where the penalty parameter must be chosen manually. As for the IRS-assisted configuration in which the reflection coefficients are not optimized (i.e., Υ = I_K), a huge gap is observed between its achieved sum-rate and that of the proposed algorithm.

Fig. 4(b) plots the sum-rate against the number of IRS reflective elements. Observe that, even with a small number of active transmit antennas and merely ten paths between the IRS and the BS, the sum-rate of the proposed solution keeps increasing with the number of reflective elements. In contrast, the sum-rate saturates after a small gain when the IRS reflection coefficients are not optimized. Compared to ADMM, the proposed algorithm shows higher throughput at every point.

The convergence of the proposed algorithm is investigated in Fig. 4(c), which depicts the NRMSE as a function of the number of iterations. Observe that the major portion of the gain is achieved in the first few iterations.
Fig. 5: LOS IRS-user and BS-IRS channels: (a) Sum-rate versus transmit power (M = 9, K = 256), (b) Sum-rate versus the number of IRS reflective elements (M = 9, N = 32, P = 10 dB), and (c) NRMSE versus the number of iterations (M = 9, N = 32, K = 256).

The small number of iterations required for convergence, combined with the low per-iteration complexity, makes the proposed algorithm very attractive from a practical implementation point of view. The superiority of the proposed VAMP-based algorithm over the ADMM-based approach stems from the feedback mechanism of VAMP, whose scalar precision parameters act as weighting coefficients that control how much weight is given to the Υ computed in the current iteration relative to that of the preceding one. In addition to shifting the NRMSE curves downward, increasing the transmit power widens the gap between ADMM and the proposed VAMP-based algorithm. This demonstrates that the proposed algorithm utilizes the available transmit power more efficiently than ADMM.
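All sum-rate curves in this section are computed from (56)-(57). A minimal sketch of that computation follows (the helper name is our own, assuming NumPy; `H_eff` stacks the effective user channels h_m^H as rows and `F` holds the precoding vectors f_m as columns):

```python
import numpy as np

def sum_rate(H_eff, F, noise_var=1.0):
    """Sum-rate (56): rows of H_eff are h_m^H, columns of F are the precoders f_m."""
    G = H_eff @ F                                    # G[m, i] = h_m^H f_i
    sig = np.abs(np.diag(G)) ** 2                    # desired-signal powers |h_m^H f_m|^2
    interference = np.sum(np.abs(G) ** 2, axis=1) - sig
    sinr = sig / (noise_var + interference)          # per-user SINR (57)
    return float(np.sum(np.log2(1.0 + sinr)))

# Toy check: an interference-free identity channel gives each user an SINR of 1.
rate = sum_rate(np.eye(3, dtype=complex), np.eye(3, dtype=complex), noise_var=1.0)
# rate = 3 * log2(2) = 3.0
```

For the IRS-assisted configurations, `H_eff` is built as H_s-u^H Υ H_b-s + H_b-u^H from the current phase matrix.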
2) BS-IRS and IRS-user Channels with LOS components:
This situation is encountered in typical urban/suburban environments where the BS is located far away from the users and has no direct LOS path to them, while the IRS is installed at a location where a LOS component is present in both the BS-IRS link and the user-IRS links. We include a LOS path in the IRS-BS channel and in the user-IRS channels by setting the gain of one path several dB higher than the others.

Fig. 6: LOS IRS-user and BS-IRS channels: (a) Sum-rate versus the number of IRS elements with simple reactive loading (M = 9, N = 32, P = 10 dB), and (b) NRMSE versus the number of iterations (M = 9, N = 32, K = 256).

Fig. 5(a) illustrates the sum-rate versus the transmit power for this configuration. As expected, the results show that, once the LOS component is added, the use of an IRS together with the proposed joint beamforming optimization yields considerably higher sum-rates than a massive MIMO system with no IRS. Moreover, although the ADMM-based solution now outperforms massive MIMO, the advantage of the proposed VAMP-based solution over ADMM is larger than in the NLOS case.

The results in Fig. 5(b), i.e., the sum-rate versus the number of IRS reflective elements, exhibit the same trends as in the NLOS channels, yet with a broader gap between the curves, thereby corroborating the superiority of the proposed solution. This is because the presence of a LOS component helps the VAMP-based joint beamforming scheme focus most of the transmitted/reflected energy in that direction. This is clearly depicted in Fig. 5(c), where the NRMSE achieved by the proposed algorithm is approximately equal to the NRMSE achieved by the ADMM-based solution at a considerably lower SNR.
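The unimodular projection (36) and its derivative (37), which drive all of the ideal-phase-shifter results above, amount to a simple per-entry normalization. A minimal sketch with our own function names (the derivative values follow the reconstruction in (37)):

```python
import numpy as np

def g1(R_tilde):
    """Projector (36): diagonal entries normalized to the unit circle, off-diagonals zeroed."""
    d = np.diag(R_tilde)
    return np.diag(d / np.abs(d))

def g1_prime_diag(R_tilde):
    """Diagonal entries of the projector derivative (37), i.e., 1/(2|r_ii|)."""
    return 1.0 / (2.0 * np.abs(np.diag(R_tilde)))

R = np.array([[3.0 + 4.0j, 1.0 + 0.0j],
              [2.0 + 0.0j, 0.0 - 2.0j]])
U = g1(R)  # diag(U) = [(3+4j)/5, -1j]; off-diagonal entries are 0
```

In Algorithm 4, the scalar ⟨g'_1(R̃_t)⟩ is simply the average of the values returned by `g1_prime_diag` over the K² entries (off-diagonal entries contribute 0).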
3) Reactive Loading at the IRS:
In this subsection, we assess the effect of replacing the unimodular constraint on the reflection coefficients with a reactive load. We use the same channel configuration as in Section VI-C2, but we now optimize only the reactive part of the reflection coefficients. As portrayed in Fig. 6(a), the new constraint therefore decreases the throughput compared with the ideal-phase-shifter setup. However, the resulting sum-rate is still much higher than the one obtained with unoptimized IRS reflection coefficients. Similarly, due to the reduced freedom in optimizing the reflection coefficients, Fig. 6(b) shows that the NRMSE saturates sooner and at a higher value than in the case of a unimodular constraint. Nonetheless, even with the more practical reactive-load constraint, the resulting NRMSE is close to the NRMSE achieved by ADMM with ideal phase shifters.
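The reactive-load projection used in this subsection reduces to the closed-form reactance (44). The sketch below (our own function names, assuming NumPy and ℑ{r̃} ≠ 0) also cross-checks the closed form against a dense grid search over candidate reactances:

```python
import numpy as np

def chi_opt(r):
    """Closed-form minimizer (44) of |r - 1/(1 + j*chi)|^2 over real chi (Im{r} != 0)."""
    a, b = r.real, r.imag
    return (2.0 * a - 1.0 - np.sqrt((1.0 - 2.0 * a) ** 2 + 4.0 * b ** 2)) / (2.0 * b)

def g2(r):
    """Reactive-load projector (46): map r to the admissible coefficient 1/(1 + j*chi_opt)."""
    return 1.0 / (1.0 + 1j * chi_opt(r))

# Cross-check: the closed form is never beaten by a dense grid of reactances.
r = 0.3 - 0.7j
grid = np.linspace(-50.0, 50.0, 200001)
f_grid = np.abs(r - 1.0 / (1.0 + 1j * grid)) ** 2
f_closed = np.abs(r - g2(r)) ** 2
assert f_closed <= f_grid.min() + 1e-9
```

Note that, consistent with the remark at the end of Appendix A, the returned coefficient 1/(1 + j chi_opt(r)) has an imaginary part with the same sign as ℑ{r}.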
D. Complexity Analysis and CPU Execution Time
Note that, by utilizing the Kronecker structure, we avoid any large matrix multiplication and never take the SVD of a Kronecker product of matrices. Indeed, let A = α H_s-u^H and B = H_b-s F. For our system model, the matrices A and B^T are of the same size M × K.

TABLE II: (a) Comparison between the CPU execution times of the proposed VAMP-based algorithm and the ADMM-based algorithm for various design configurations, and (b) CPU running time for a fixed number of iterations for different design configurations.
II(a) The algorithm terminates when the NRMSE falls below the tolerance or t > T_MAX (T_MAX = 100).

Design parameters            VAMP-based (msec)    ADMM-based (msec)
M = 2, N = 16, K = 64        ≈ 1                  -
M = 8, N = 16, K = 64        45                   53
M = 2, N = 16, K = 256       ≈ 6                  -
M = 8, N = 16, K = 256       47                   85

II(b) The algorithm terminates when t > T_MAX = 30.

Design parameters            VAMP-based (msec)
M = 4, N = 64, K = 64        ≈ 31
M = 8, N = 64, K = 64        35
M = 4, N = 64, K = 256       117
M = 8, N = 64, K = 256       125
M = 4, N = 256, K = 64       410
M = 8, N = 256, K = 64       424

Assuming the matrices A and B to be full rank, the complexity of their truncated SVDs is of order O(M²K). The computational complexity of the Kronecker product of two vectors in the SVD step of Algorithm 4 is of order O(M²), and the component-wise vector products are also of order O(M²). The projector function and its derivative have a complexity of order O(K²). The functions g_α(H) and g_F(H) entail a complexity of O(MN² + N³). The complexity of all other matrix multiplications, including the LMMSE part, is dominated by O(MNK + MK²). Therefore, the overall per-iteration complexity of the algorithm is of order O(M²K + MNK + MK² + MN² + N³). Since M < N in our case, the overall per-iteration complexity simplifies to O(MNK + MK² + N³).

Table II(a) compares the CPU (central processing unit) execution times of the VAMP-based solution and the ADMM-based approach for different design configurations. For a fair comparison, we measure the time until the NRMSE falls below a given threshold; we therefore run the algorithms until the NRMSE drops below the tolerance or t > T_MAX, with T_MAX = 100. The channel simulation parameters are set as in Section VI-C2 with P = 10 dB. The algorithms are simulated in MATLAB R2020a on a laptop PC with a Core i7-4720HQ processor and 16 GB of RAM running Windows 10.
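The complexity savings above rest on a standard linear-algebra fact: if A = U_A Diag(ω_A) V_A^H and B^T = U_B Diag(ω_B) V_B^H, then the singular values of B^T ⊗ A are exactly ω_B ⊗ ω_A, so the SVD of the large Kronecker product never has to be formed explicitly. A small numerical check (the dimensions and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 3, 5
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
Bt = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))  # plays the role of B^T

# Economy-size SVDs of the two small factors: O(M^2 K) each.
wA = np.linalg.svd(A, compute_uv=False)
wB = np.linalg.svd(Bt, compute_uv=False)
w_small = np.sort(np.kron(wB, wA))[::-1]  # candidate spectrum: M^2 values

# Direct SVD of the (M^2 x K^2) Kronecker product, which the algorithm avoids.
w_big = np.linalg.svd(np.kron(Bt, A), compute_uv=False)

assert np.allclose(w_small, w_big[:len(w_small)])
```

The same factorization carries over to the singular vectors, which is what the LMMSE step of Algorithm 4 exploits through U_A, U_B, V_A, and V_B.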
The superiority of the proposed solution over the ADMM-based approach in terms of convergence time is especially pronounced when there is a large number of users to serve, which requires efficient utilization of the IRS reflection elements to achieve the desired NRMSE. To validate the theoretical complexity of the VAMP-based solution, we run the proposed algorithm for a fixed number of iterations by setting T_MAX = 30 for different design configurations. It is seen from Table II(b) that the numerical results coincide with the aforementioned complexity analysis. In other words, the results are in line with the fact that the complexity is cubic in N, quadratic in K, and only linear in M.

VII. Conclusion
We tackled the problem of joint active and passive beamforming design for an IRS-assisted downlink multi-user MIMO system. Using block coordinate descent, the overall optimization was decomposed into two sub-optimization tasks. We presented a novel approach that utilizes the low-complexity, approximate-message-passing-based VAMP algorithm to solve the optimization problem at hand. We did so by first extending the traditional VAMP algorithm and then using the extended version to find the optimal matrix of phase shifters to be used by the IRS. The optimal precoder at the BS, however, was found in closed form using Lagrange optimization. The overall algorithm was shown to have a low per-iteration complexity, which is quadratic in the number of IRS reflective elements and cubic in the number of BS antennas, but grows only linearly with the number of users. We also devised a more practical version of the proposed design by solving the joint optimization problem under a different constraint on the reflection coefficients, wherein each antenna element is terminated by a reactive load instead of an ideal phase shifter. Numerical results suggest that, with a small number of active BS antennas, the proposed solution yields higher throughput (i.e., spectral efficiency) than a massive MIMO system under both LOS and NLOS propagation conditions. We also gauged the convergence of the proposed solution against ADMM and showed that the proposed algorithm converges faster while attaining higher throughput. Since the proposed solution provides flexibility in choosing the constraint on the IRS reflection coefficients, it opens up the possibility of solving the joint optimization problem with more physically realistic models for the IRS reflection elements. The optimality of the VAMP-based solution can also be investigated through a state evolution analysis.
For non-convex optimization problems such as optimizing the phase-shifter matrix under a unimodular constraint, the proposed solution may be asymptotically optimal (for large matrix sizes) if the proximal functions (i.e., g and g') are Lipschitz continuous and the state evolution analysis reveals that the VAMP-based algorithm has only one fixed point [30].

Appendix A
We solve the following optimization problem:

arg min_χ f(χ),    (59)

where

f(χ) ≜ | r̃ − (1 + jχ)^{−1} |²,    (60)

with χ ∈ ℝ and r̃ ∈ ℂ. Expanding the objective function, we re-express it as follows:

arg min_χ [ r̃* r̃ − r̃* (1 + jχ)^{−1} − r̃ (1 − jχ)^{−1} + ( (1 − jχ)(1 + jχ) )^{−1} ].    (61)

Let a ≜ ℜ{r̃} and b ≜ ℑ{r̃}. Substituting a and b into (61) and simplifying the objective function yields:

arg min_χ [ a² + b² + (1 − 2a + 2bχ)/(1 + χ²) ].    (62)

We define c ≜ (1 − 2a), take the derivative w.r.t. χ, and equate it to zero to obtain:

f'(χ) = 2b(1 − χ²)/(1 + χ²)² − 2cχ/(1 + χ²)² = 0.    (63)

Simplifying the equation in (63), we get:

bχ² + cχ − b = 0.    (64)

The roots of the quadratic equation in (64) are real and distinct and are given by:

χ_1 = ( −c − √(c² + 4b²) ) / (2b),    (65)

and

χ_2 = ( −c + √(c² + 4b²) ) / (2b),    (66)

where b ≠ 0. Taking the second derivative w.r.t. χ and simplifying, we obtain the following expression:

f''(χ) = 2/(1 + χ²)³ ( 2bχ³ + 3cχ² − 6bχ − c ).    (67)

Substituting χ = χ_1 into (67) and simplifying the result, we get:

f''(χ_1) = 1/(1 + χ_1²)³ ( c²/b² ( c + √(c² + 4b²) ) + 4 ( c + √(c² + 4b²) ) ).    (68)

Since b ≠ 0, we have √(c² + 4b²) > |c|, which implies c + √(c² + 4b²) > 0 and, therefore, f''(χ_1) > 0. Similarly, we have:

f''(χ_2) = 1/(1 + χ_2²)³ ( c²/b² ( c − √(c² + 4b²) ) + 4 ( c − √(c² + 4b²) ) ) < 0,  b ≠ 0.    (69)

Thus, we choose:

χ_opt = χ_1 = ( −(1 − 2a) − √( (1 − 2a)² + 4b² ) ) / (2b).    (70)

Interestingly, the solution χ_1 results in ℑ{(1 + jχ_1)^{−1}} having the same sign as ℑ{r̃}.

REFERENCES

[1] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, "Scaling up MIMO: Opportunities and challenges with very large arrays,"
IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40-60, 2012.
[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 186-195, 2014.
[3] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas," IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590-3600, 2010.
[4] S. Buzzi, I. Chih-Lin, T. E. Klein, H. V. Poor, C. Yang, and A. Zappone, "A survey of energy-efficient techniques for 5G networks and challenges ahead," IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 697-709, 2016.
[5] S. Zhang, Q. Wu, S. Xu, and G. Y. Li, "Fundamental green tradeoffs: Progresses, challenges, and impacts on 5G networks," IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 33-56, 2016.
[6] X. Tan, Z. Sun, D. Koutsonikolas, and J. M. Jornet, "Enabling indoor mobile millimeter-wave networks based on smart reflect-arrays," in IEEE Conference on Computer Communications (INFOCOM), 2018, pp. 270-278.
[7] C. Liaskos, S. Nie, A. Tsioliaridou, A. Pitsillides, S. Ioannidis, and I. Akyildiz, "A new wireless communication paradigm through software-controlled metasurfaces," IEEE Communications Magazine, vol. 56, no. 9, pp. 162-169, 2018.
[8] S. Hu, F. Rusek, and O. Edfors, "Beyond massive MIMO: The potential of data transmission with large intelligent surfaces," IEEE Transactions on Signal Processing, vol. 66, no. 10, pp. 2746-2758, 2018.
[9] C. Pan, H. Ren, K. Wang, W. Xu, M. Elkashlan, A. Nallanathan, and L. Hanzo, "Multicell MIMO communications relying on intelligent reflecting surfaces," IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5218-5233, 2020.
[10] Q. Wu and R. Zhang, "Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming," IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5394-5409, 2019.
[11] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, "Reconfigurable intelligent surfaces for energy efficiency in wireless communication," IEEE Transactions on Wireless Communications, vol. 18, no. 8, pp. 4157-4170, 2019.
[12] Q. Wu and R. Zhang, "Beamforming optimization for wireless network aided by intelligent reflecting surface with discrete phase shifts," IEEE Transactions on Communications, vol. 68, no. 3, pp. 1838-1851, 2019.
[13] G. Zhou, C. Pan, H. Ren, K. Wang, and A. Nallanathan, "A framework of robust transmission design for IRS-aided MISO communications with imperfect cascaded channels," IEEE Transactions on Signal Processing, vol. 68, pp. 5092-5106, 2020.
[14] H. Guo, Y.-C. Liang, and S. Xiao, "Model-free optimization for reconfigurable intelligent surface with statistical CSI," arXiv preprint arXiv:1912.10913, 2019.
[15] X. Yu, D. Xu, and R. Schober, "Optimal beamforming for MISO communications via intelligent reflecting surfaces," in IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2020, pp. 1-5.
[16] P. Wang, J. Fang, X. Yuan, Z. Chen, and H. Li, "Intelligent reflecting surface-assisted millimeter wave communications: Joint active and passive precoding design," IEEE Transactions on Vehicular Technology, 2020.
[17] S. Abeywickrama, R. Zhang, Q. Wu, and C. Yuen, "Intelligent reflecting surface: Practical phase shift model and beamforming optimization," IEEE Transactions on Communications, vol. 68, no. 9, pp. 5849-5863, 2020.
[18] Q.-U.-A. Nadeem, A. Kammoun, A. Chaaban, M. Debbah, and M.-S. Alouini, "Asymptotic analysis of large intelligent surface assisted MIMO communication," arXiv preprint arXiv:1903.08127, 2019.
[19] Y. Li, M. Jiang, Q. Zhang, and J. Qin, "Joint beamforming design in multi-cluster MISO NOMA reconfigurable intelligent surface-aided downlink communication networks," IEEE Transactions on Communications, vol. 69, no. 1, pp. 664-674, 2021.
[20] D. P. Bertsekas, "Nonlinear programming," Journal of the Operational Research Society, vol. 48, no. 3, pp. 334-334, 1997.
[21] S. Rangan, P. Schniter, and A. K. Fletcher, "Vector approximate message passing," IEEE Transactions on Information Theory, vol. 65, no. 10, pp. 6664-6684, 2019.
[22] R. W. Heath Jr. and A. Lozano, Foundations of MIMO Communication. Cambridge University Press, 2018.
[23] D. Guo, S. Shamai, and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261-1282, 2005.
[24] A. Manoel, F. Krzakala, G. Varoquaux, B. Thirion, and L. Zdeborová, "Approximate message-passing for convex optimization with non-separable penalties," arXiv preprint arXiv:1809.06304, 2018.
[25] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: I. Motivation and construction," in IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), 2010, pp. 1-5.
[26] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[27] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 764-785, 2011.
[28] M. Joham, W. Utschick, and J. A. Nossek, "Linear transmit processing in MIMO communications systems," IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2700-2712, 2005.
[29] A. Sinha, P. Malo, and K. Deb, "A review on bilevel optimization: From classical to evolutionary approaches and applications," IEEE Transactions on Evolutionary Computation, vol. 22, no. 2, pp. 276-295, 2018.
[30] J. Barbier, F. Krzakala, N. Macris, L. Miolane, and L. Zdeborová, "Optimal errors and phase transitions in high-dimensional generalized linear models,"