Reconfigurable Intelligent Surface Assisted Edge Machine Learning

Shanfeng Huang*†, Shuai Wang*, Rui Wang*, Miaowen Wen‡, and Kaibin Huang†
*Department of Electrical and Electronic Engineering, Southern University of Science and Technology
†Department of Electrical and Electronic Engineering, The University of Hong Kong
‡School of Electronic and Information Engineering, South China University of Technology
Email: {sfhuang, huangkb}@eee.hku.hk, {wangs3, wang.r}@sustech.edu.cn, [email protected]

Abstract—The ever-growing popularity and rapid improvement of artificial intelligence (AI) have prompted a rethinking of the evolution of wireless networks. Mobile edge computing (MEC) provides a natural platform for AI applications, since it offers rich computation resources to train AI models as well as low-latency access to the data generated by mobile and Internet of Things devices. In this paper, we present an infrastructure to perform machine learning tasks at an MEC server with the assistance of a reconfigurable intelligent surface (RIS). In contrast to conventional communication systems, where the principal criterion is to maximize the throughput, we aim at optimizing the learning performance. Specifically, we minimize the maximum learning error of all users by jointly optimizing the beamforming vectors of the base station and the phase-shift matrix of the RIS. An alternating optimization-based framework is proposed to optimize the two terms iteratively, where closed-form expressions of the beamforming vectors are derived, and an alternating direction method of multipliers (ADMM)-based algorithm is designed together with an error level searching framework to effectively solve the nonconvex optimization problem of the phase-shift matrix. Simulation results demonstrate significant gains of deploying an RIS and validate the advantages of our proposed algorithms over various benchmarks.
I. INTRODUCTION
The prevalence of mobile terminals and the rapid growth of Internet of Things (IoT) technology have boosted a wide spectrum of new applications, many of which are computation-intensive and latency-critical, such as image recognition, mobile augmented reality, and edge machine intelligence. Mobile edge computing (MEC) is naturally well-suited for these AI-oriented networks, and the marriage of MEC and AI has given rise to a new research area, called "edge intelligence (EI)" or "edge AI" [1]–[4]. Moreover, to overcome wireless channel hostilities, an emerging paradigm called reconfigurable intelligent surface (RIS) was proposed, aiming at creating a smart radio environment by turning the wireless environment into an optimization variable that can be controlled and programmed [5]. Hence, we investigate the design of an RIS-assisted edge learning system.
In contrast with conventional communication systems, where the general goal is to maximize the throughput,
This work was supported in part by the National Natural Science Foundation of China under Grant 62001203, in part by the Shenzhen Fundamental Research Program under Grant JCYJ20190809142403596, and in part by the Fundamental Research Funds for the Central Universities under Grant 2019SJ02.

edge ML systems aim at optimizing the learning performance. As a result, the well-known resource allocation schemes that are optimized for conventional systems, such as water-filling and max-min fairness, may lead to poor learning performance, since they do not take into account learning-specific factors such as model and data complexities. Recently, several notable works have aimed at optimizing resource allocation for learning-centric systems. In [6], the authors proposed a data-importance aware user scheduling scheme for edge ML systems, where data are regarded as having different importance levels based on a certain importance measure. Nevertheless, the analysis is mainly based on support vector machines (SVMs); for more general ML models, the importance of training data is hard to quantify. In [7], the authors investigated an RIS-assisted edge inference system. However, the inference tasks are treated as general edge computing tasks in essence, offering few insights for real ML tasks. More recently, our previous work [8] put forth and validated a nonlinear classification error model for ML tasks, based on which a learning-centric power allocation scheme was proposed and shown to significantly outperform conventional resource allocation schemes with respect to learning error. In this paper, we further extend [8] to the scenario where an RIS is deployed to provide intelligence to the wireless channels. With the presence of the RIS, new challenges arise in the beamforming vector and phase-shift optimization. In this paper, we shed light on the design of RIS-assisted edge ML with heterogeneous learning tasks.
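The error model of [8] expresses the classification error as a power law in the training sample size, Ψ(v) ≈ c v^{-d}. As an illustration of how the tuning parameters (c, d) can be recovered by least-squares fitting, here is a minimal sketch on synthetic error data; the sample sizes and error values below are hypothetical placeholders, not the paper's measurements:

```python
import numpy as np

# Hypothetical sample sizes and classification errors (synthetic, for
# illustration only; real values come from training runs as in [8]).
v = np.array([100, 200, 400, 800, 1600], dtype=float)
true_c, true_d = 7.0, 0.8
err = true_c * v ** (-true_d)

# Psi(v) ~ c * v^{-d}  =>  log(err) = log(c) - d * log(v): linear least squares.
A = np.stack([np.ones_like(v), -np.log(v)], axis=1)
coef, *_ = np.linalg.lstsq(A, np.log(err), rcond=None)
c_fit, d_fit = np.exp(coef[0]), coef[1]
```

On real measurements the fitting is performed on recorded test errors at several sample sizes, exactly as described later in the simulation section.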
Specifically, we adopt the nonlinear learning error model in [8], [9], and aim at minimizing the maximum learning error of all the learning tasks by jointly optimizing the beamforming vectors at the base station (BS) and the phase-shift matrix at the RIS. The optimization problem is nonconvex and involves many optimization variables. To address this challenge, we design an alternating optimization (AO)-based framework to decompose the primal problem, and each subproblem is efficiently solved either in closed form or with low-complexity algorithms. Specifically, the optimization of the beamforming vectors is shown to be equivalent to maximizing the signal-to-interference-plus-noise ratios (SINRs), and closed-form expressions are derived. To solve the phase-shift matrix optimization problem, we propose an error level searching (ELS)-based framework to transform the exponential objective into
SINR constraints, and exploit the alternating direction method of multipliers (ADMM) to decouple the problem into a set of subproblems that can be solved in a distributed manner. Simulations on well-known ML models and public datasets verify the nonlinear learning error model, and demonstrate that our proposed scheme achieves significantly lower learning error than various benchmarks.

Fig. 1: An RIS-assisted edge ML system.

II. SYSTEM MODEL
We consider an edge ML system as shown in Fig. 1, where an intelligent edge server attached to a BS with N antennas serves K single-antenna users, each with an ML task. The communication is assisted by an RIS consisting of M passive reflecting elements, which can rotate the phase of the incident signal waves. In particular, the edge server is designated to train K classification models by collecting data observed at the K mobile users. The classification models can be CNNs, SVMs, etc.

The training data are transmitted from the mobile users to the edge server via wireless channels, which are intrinsically random due to the multi-path effect and can suffer from high propagation loss [10]. To this end, this paper considers an RIS-assisted scheme that configures the channel intelligently by adaptively tuning the phase shifts of the reflecting elements. With the presence of the RIS, the channel from user k to the BS includes both the direct link (user-BS link) and the reflected link (user-RIS-BS link), where the reflected link consists of the user-RIS link, the phase shifts at the RIS, and the RIS-BS link [11]. Denote the channel vector from the k-th user to the BS as h_k. It can be expressed as

h_k = \underbrace{h_{d,k}}_{\text{direct link}} + \underbrace{G^H \Theta^H h_{r,k}}_{\text{reflected link}},   (1)

where h_{d,k} ∈ C^{N×1}, h_{r,k} ∈ C^{M×1}, and G ∈ C^{M×N} denote the channel vectors and matrix from user k to the BS, from user k to the RIS, and from the RIS to the BS, respectively. Moreover, Θ = β diag(e^{jϕ_1}, ..., e^{jϕ_M}) ∈ C^{M×M} denotes the phase-shift matrix of the RIS, where β ∈ [0, 1] is the amplitude reflection coefficient and ϕ_m ∈ [0, 2π) is the phase shift of the m-th reflecting element. Without loss of generality, β is set to 1. Denote the transmitted signal of user k ∈ {1, 2, ..., K} as x_k, with power E[|x_k|²] = p_k.
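The composite channel in (1) can be sketched numerically as follows; the dimensions and the i.i.d. channel draws below are illustrative assumptions, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 16          # BS antennas, RIS elements (illustrative sizes)
h_d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h_r = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

phi = rng.uniform(0.0, 2 * np.pi, size=M)   # phase shifts, beta = 1
Theta = np.diag(np.exp(1j * phi))

# Composite channel of (1): direct link plus RIS-reflected link.
h = h_d + G.conj().T @ Theta.conj().T @ h_r
```

Note that the reflected link is just an entrywise phase rotation of h_r followed by G^H, which is what later allows the SINR to be rewritten in terms of the phase vector θ.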
Accordingly, the received signal y = [y_1, ..., y_N]^T ∈ C^{N×1} at the BS can be written as

y = Σ_{k=1}^{K} h_k x_k + n,   (2)

where n ∼ CN(0, σ² I_N) is the additive white Gaussian noise (AWGN) at the BS. A beamforming vector w_k with w_k^H w_k = 1 is applied to the received signal for each user k. Thus, the estimated symbol at the BS for user k is given by

ŷ_k = w_k^H y = w_k^H h_k x_k + Σ_{i=1, i≠k}^{K} w_k^H h_i x_i + w_k^H n.   (3)

Accordingly, the achievable spectral efficiency of user k in terms of bps/Hz is given by

R_k = log₂( 1 + p_k |w_k^H h_k|² / ( Σ_{i=1, i≠k}^{K} p_i |w_k^H h_i|² + σ² ) ).   (4)

Let B denote the bandwidth of the considered system and T the total transmission time. Thus, the total number of data samples for user k's task is given by

v_k = ⌊ BT R_k / D_k ⌋ ≈ BT R_k / D_k,   (5)

where D_k is the number of bits per data sample, and the approximation is due to ⌊x⌋ → x when x ≫ 1.

III. PROBLEM FORMULATION
In contrast with conventional communication systems, where the principal design criterion is usually to maximize the throughput, edge ML systems aim at maximizing the learning performance. Specifically, in the edge ML system considered herein, we aim at minimizing the maximum learning error of all the participating users by jointly optimizing the beamforming vectors {w_k}_{k=1}^{K} at the BS and the phase-shift matrix Θ of the RIS. Thus, we have the following optimization problem:

P:  min_{ {w_k}, Θ, v }  max_{k=1,...,K}  Ψ_k(v_k)
s.t.  w_k^H w_k = 1,  k = 1, ..., K,   (6a)
      BT R_k / D_k = v_k,  k = 1, ..., K,   (6b)
      0 ≤ ϕ_m < 2π,  m = 1, ..., M,   (6c)

where Ψ_k(v_k) is the classification error of learning model k given the sample size v_k. In general, the functions {Ψ_1, ..., Ψ_K} can hardly be expressed analytically. Propitiously, their approximate expressions can be obtained based on the analysis in [8], [9], [12]. Here, we simply adopt the nonlinear model developed in [8], i.e.,

Ψ_k(v_k) ≈ c_k v_k^{-d_k},   (7)

where c_k and d_k are tuning parameters that can be obtained by curve fitting.

By substituting (6b) and (7) into the objective function, problem P is transformed into the following problem:

P1:  min_{ {w_k}, Θ }  max_{k=1,...,K}  c_k [ (BT/D_k) log₂( 1 + |w_k^H (h_{d,k} + G^H Θ^H h_{r,k})|² p_k / ( Σ_{i=1,i≠k}^{K} |w_k^H (h_{d,i} + G^H Θ^H h_{r,i})|² p_i + σ² ) ) ]^{-d_k}
s.t.  w_k^H w_k = 1,  k = 1, ..., K,   (8a)
      |θ_m| = 1,  m = 1, ..., M.   (8b)

Remark 1 (Scaling law with a large number of reflecting elements). To gain some insight into how the number of reflecting elements affects the learning accuracy, we consider the case with a single user and a single-antenna BS, i.e., K = 1 and N = 1, and ignore the direct link. Thus, G becomes a vector, denoted by g. The receive SNR becomes p |h_r^H Θ g|² / σ². Assume Θ = I_M, h_r ∼ CN(0, ϱ_h I_M), and g ∼ CN(0, ϱ_g I_M).
According to the central limit theorem, we have h_r^H g ∼ CN(0, M ϱ_h ϱ_g) as M → ∞. Thus, the average receive SNR is E_{h_r,g}[ p |h_r^H Θ g|² / σ² ] = M p ϱ_h ϱ_g / σ². This indicates that the learning error is asymptotically proportional to (log(M))^{-d}.

IV. JOINT BEAMFORMING AND PHASE-SHIFTER DESIGN
Note that problem P1 is highly nonconvex due to the nonlinear learning error model in the objective function and the unit-modulus constraints. Moreover, the large number of optimization variables makes the problem even more intractable. Fortunately, the optimization of the beamforming vectors and the phase-shift matrix can be decomposed. Hence, we adopt an AO-based algorithm to solve P1 in an iterative manner by alternately optimizing {w_k}_{k=1}^{K} and Θ.

A. Beamforming Vectors Optimization
Note that given Θ, the objective function of problem P1 is still nonconvex in w_k. However, since the objective function is monotonically decreasing in the SINR of each user and is decomposable with respect to k, the optimization of w_k with fixed Θ can be equivalently carried out by maximizing the SINR of each user k. Consequently, the optimal beamforming vectors can be obtained by solving the following K subproblems:

P_{w_k}:  max_{w_k}  |w_k^H (h_{d,k} + G^H Θ^H h_{r,k})|² p_k / ( Σ_{i=1,i≠k}^{K} |w_k^H (h_{d,i} + G^H Θ^H h_{r,i})|² p_i + σ² )
s.t.  w_k^H w_k = 1.   (9a)

Although each problem P_{w_k} is still nonconvex in w_k, its optimal solution can be obtained in closed form, as given in the following lemma.

Lemma 1.
Given Θ, the optimal solution of P_{w_k} for arbitrary k is given in closed form by

w_k^⋄ = ( I_N + Σ_{i=1}^{K} (p_i/σ²) h_i h_i^H )^{-1} h_k / ‖ ( I_N + Σ_{i=1}^{K} (p_i/σ²) h_i h_i^H )^{-1} h_k ‖,   (10)

where h_i = h_{d,i} + G^H Θ^H h_{r,i}, for i = 1, ..., K.

Proof. The proof is similar to that in [13] and is omitted here due to the page limitation.
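As a sanity check, the closed-form receiver (10) can be implemented in a few lines; the channel realizations and powers below are random placeholders for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 3
sigma2 = 1.0
p = np.array([1.0, 0.5, 0.8])   # illustrative user transmit powers
# Column k of H is the composite channel h_k of user k.
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

def mmse_beamformer(H, p, sigma2, k):
    """Closed-form solution (10): regularized channel inversion for user k."""
    N = H.shape[0]
    C = np.eye(N) + sum(p[i] / sigma2 * np.outer(H[:, i], H[:, i].conj())
                        for i in range(H.shape[1]))
    w = np.linalg.solve(C, H[:, k])
    return w / np.linalg.norm(w)

w0 = mmse_beamformer(H, p, sigma2, 0)
```

By the SINR-optimality of this receiver, w0 should attain an SINR at least as large as any other unit-norm beamformer for user 0.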
B. Phase-shift Matrix Optimization
Given the beamforming vectors {w_k}_{k=1}^{K}, only the unit-modulus constraints of the RIS elements remain. By exploiting Θ = diag(θ) and setting a_{k,i} = β diag(h_{r,i}^H) G w_k and b_{k,i} = h_{d,i}^H w_k, the optimization of the phase-shift matrix Θ can be equivalently written as the following problem:

P_θ:  min_θ  max_{k=1,...,K}  c_k [ (BT/D_k) log₂( 1 + |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ) ]^{-d_k}
s.t.  |θ_m| = 1,  ∀ m = 1, ..., M.   (11a)

A common approach to address the nonconvex unit-modulus constraints is semidefinite relaxation (SDR). Nevertheless, even though SDR can circumvent the nonconvex unit-modulus constraints, the objective function remains nonconvex due to the nonlinear learning error model. Moreover, the solution obtained by SDR generally does not conform to the rank-1 constraint, and a large number of Gaussian randomizations are required to find a rank-1 solution, which increases the complexity dramatically. Besides, SDR lifts the optimization variable from an M×1 vector to an M×M matrix; thus, SDR cannot scale up with the number of RIS elements. To this end, we propose an ELS framework and an ADMM-based algorithm to solve problem P_θ. Specifically, we first define the error level of the k-th ML task for all k as

δ_k = c_k [ (BT/D_k) log₂( 1 + |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ) ]^{-d_k}.   (12)

Thus, the maximum error level of all participating tasks is given by δ = max_{k∈K} δ_k. Then, for a given error level δ, problem P_θ can be equivalently transformed into the following feasibility problem:

P'_θ:  find θ   (13a)
s.t.  |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k,  ∀k,   (13b)
      |θ_m| = 1,  ∀m,   (13c)

where γ_k = 2^{ (D_k/(BT)) (c_k/δ)^{1/d_k} } − 1. If problem P'_θ is feasible, we can reduce δ; otherwise, we increase δ to make P'_θ feasible, until δ converges to a certain value.
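The SINR target γ_k implied by an error level δ, and the bisection over δ, can be sketched as below; the feasibility oracle and all numeric values are assumptions for illustration only (in the paper, the oracle role is played by the ADMM solver of P'_θ):

```python
def gamma_k(delta, c, d, D, B, T):
    """SINR threshold implied by target error level delta (from (13b))."""
    return 2.0 ** (D * (c / delta) ** (1.0 / d) / (B * T)) - 1.0

def els(feasible, c, d, D, B, T, lo=1e-3, hi=1.0, tol=1e-4):
    """Bisection over the error level delta; `feasible` is an assumed oracle
    that checks whether P'_theta is feasible for the given SINR targets."""
    while hi - lo > tol:
        mid = 0.5 * (hi + lo)
        g = [gamma_k(mid, c[k], d[k], D[k], B, T) for k in range(len(c))]
        if feasible(g):
            hi = mid   # feasible: try a smaller (better) error level
        else:
            lo = mid   # infeasible: relax the error level
    return hi

# Toy usage with a hypothetical feasibility oracle (a real implementation
# would solve the feasibility problem P'_theta here).
delta_star = els(lambda gams: max(gams) <= 0.3,
                 c=[7.0], d=[0.8], D=[320], B=5e6, T=10.0)
```

Since γ_k is monotonically decreasing in δ, the feasible region of δ is an interval extending upward, which is what makes the bisection valid.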
We call this procedure error level searching (ELS).

In the sequel, we design an ADMM-based algorithm to solve problem P'_θ. By introducing a series of auxiliary variables {q_k}_{k=1}^{K} and a new constraint q_1 = q_2 = ... = q_K = θ, problem P'_θ can be rewritten in the following form:

find  {q_k}_{k=1}^{K}, θ   (14a)
s.t.  |q_k^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |q_k^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k,  k = 1, ..., K,   (14b)
      |θ_m| = 1,  m = 1, ..., M,   (14c)
      q_k = θ,  k = 1, ..., K.   (14d)

The augmented Lagrangian (using the scaled dual variables) of problem (14) is given by

L_ρ(q_1, ..., q_K, θ, u_1, ..., u_K) = Σ_{k=1}^{K} I_{B_k}(q_k) + I_C(θ) + ρ Σ_{k=1}^{K} ‖q_k − θ + u_k‖²,

where B_k is the feasibility region of the k-th constraint in (14b), C is the feasibility region of constraint (14c), ρ > 0 is the penalty parameter, and u_k is the scaled dual variable. Moreover, I is the indicator function with I_X(x) = 0 if x ∈ X and +∞ otherwise.

The ADMM algorithm iteratively updates q_k, θ, and u_k as follows, until a feasible solution is found:

q_k^{t+1} := argmin_{q_k} L_ρ(q_1, ..., q_K, θ^t, u_1^t, ..., u_K^t),  ∀k,   (15a)
θ^{t+1} := argmin_θ L_ρ(q_1^{t+1}, ..., q_K^{t+1}, θ, u_1^t, ..., u_K^t),   (15b)
u_k^{t+1} := u_k^t + q_k^{t+1} − θ^{t+1},  ∀k.   (15c)

In the sequel, we show that each update in (15) can be carried out either in closed form or with low complexity.

1) q_k update: After removing the irrelevant terms, the update of q_k can be equivalently written as

q_k^{t+1} = argmin_{q_k}  Σ_{k=1}^{K} I_{B_k}(q_k) + ρ Σ_{k=1}^{K} ‖q_k − θ^t + u_k^t‖².   (16)

Note that the update of q_k can be decoupled into K subproblems, one for each k:

min_{q_k}  ‖q_k − θ^t + u_k^t‖²   (17a)
s.t.  |q_k^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |q_k^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k.   (17b)

Although problem (17) is nonconvex in general, strong duality holds and the Lagrangian relaxation produces the optimal solution, since there is only one constraint [14].
Thus, we can solve it efficiently using the Lagrangian dual method. Rephrasing problem (17), it can be equivalently written in the following compact form:

min_{q_k}  ‖q_k − ζ_k^t‖²   (18a)
s.t.  q_k^H A_k q_k − 2ℜ{b_k^H q_k} = τ_k,   (18b)

where ζ_k^t = θ^t − u_k^t, A_k = γ_k Σ_{i=1,i≠k}^{K} a_{k,i} a_{k,i}^H p_i − a_{k,k} a_{k,k}^H p_k, b_k = a_{k,k} b_{k,k}^* p_k − γ_k Σ_{i=1,i≠k}^{K} a_{k,i} b_{k,i}^* p_i, and τ_k = |b_{k,k}|² p_k − γ_k Σ_{i=1,i≠k}^{K} |b_{k,i}|² p_i − γ_k σ². Note that we have changed the constraint to an equality to simplify the follow-up derivations. When considering the inequality constraint, we can simply check whether q_k = ζ_k^t is feasible. If yes, q_k^* = ζ_k^t is the optimal solution; if not, the optimal solution must satisfy the equality constraint.

For ease of notation, we omit the subscript k in problem (18), and let A = QΛQ^H be the eigenvalue decomposition. Then, problem (18) is equivalent to

min_{q̃}  ‖q̃ − ζ̃^t‖²   (19a)
s.t.  q̃^H Λ q̃ − 2ℜ{b̃^H q̃} = τ,   (19b)

where q̃ = Q^H q, ζ̃^t = Q^H ζ^t, and b̃ = Q^H b. As a result, the optimal solution can be efficiently found via the following lemma.

Lemma 2.
The optimal solution of problem (19) is given by

q̃^* = (I + μΛ)^{-1} (ζ̃ + μb̃),   (20)

where μ is the Lagrange multiplier of problem (19). Moreover, μ can be found by solving the nonlinear equation χ(μ) = 0 with

χ(μ) = Σ_{m=1}^{M} λ_m | (ζ̃_m + μb̃_m) / (1 + μλ_m) |² − 2ℜ{ Σ_{m=1}^{M} b̃_m^* (ζ̃_m + μb̃_m) / (1 + μλ_m) } − τ,

where λ_m is the m-th diagonal entry of Λ.

Proof. Please refer to Appendix A.

Taking the derivative of χ(μ) with respect to μ, we have

χ'(μ) = −2 Σ_{m=1}^{M} |b̃_m − λ_m ζ̃_m|² / (1 + μλ_m)³.   (21)

Since we assume the feasibility of problem (19), there must exist μ with I + μΛ ⪰ 0 such that the value of q̃ minimizing the Lagrangian also satisfies the equality constraint. Thus, 1 + μλ_m ≥ 0 for m = 1, ..., M, and χ'(μ) < 0. Therefore, χ(μ) is monotonic in the possible region of the solution, and any local solution is guaranteed to be the unique solution. Moreover, the equation χ(μ) = 0 can be efficiently solved by either the bisection search method or Newton's method.

After obtaining q̃_k from problem (19), the optimal q_k update is given by

q_k^{t+1} = Q q̃_k.   (22)

2) θ update: The update of θ is obtained by solving the following problem:

θ^{t+1} = argmin_θ  Σ_{k=1}^{K} ‖q_k^{t+1} − θ + u_k^t‖²
s.t.  |θ_m| = 1,  m = 1, ..., M.   (23)

Thus, the optimal θ is simply the projection of (1/K) Σ_{k=1}^{K} (q_k^{t+1} + u_k^t) onto the unit-modulus constraints, i.e.,

θ^{t+1} = e^{j∠[ (1/K) Σ_{k=1}^{K} (q_k^{t+1} + u_k^t) ]}.   (24)

3) u_k update: The update of u_k is standard dual ascent and is given by

u_k^{t+1} = u_k^t + q_k^{t+1} − θ^{t+1}.   (25)

As a result, the optimal phase-shift matrix can be obtained by jointly exploiting ELS and ADMM.

C. Alternating Optimization Framework
We summarize the proposed alternating optimization algorithm here. Specifically, the AO algorithm is first initialized with w_k^0 and θ^0. Then, given fixed w_k^t and θ^t in the t-th iteration, w_k^{t+1} and θ^{t+1} in the (t+1)-th iteration are updated alternately. Moreover, the convergence of the AO algorithm is established in Lemma 3.

Lemma 3.
With the AO algorithm, the objective value of P1 is non-increasing over consecutive iterations.

Proof. Please refer to Appendix B.

V. SIMULATION RESULTS
In this section, we evaluate the performance of our proposed algorithms via simulations. We consider 4 users, each with a learning task. The 4 learning tasks considered herein are SVM, CNN with the MNIST dataset, CNN with the Fashion-MNIST dataset, and PointNet. The number of BS antennas varies from 10 to 50, and the number of reflecting elements of the RIS is set to 50. The total transmission time is T = 10 s, the bandwidth is B = 5 MHz, and the noise power is σ² = − dBm. All the channels involved are assumed to be Rayleigh fading, and the channel coefficients (i.e., the elements in G, h_{d,k}, and h_{r,k}, for all k) are normalized with zero mean and unit variance [15]. The pathloss exponent of the direct link (from the BS to the users) is 4, and the pathloss exponents of the BS-RIS link and the RIS-user link are set to 2.2.

A. Parameter Fitting for the Learning Tasks
In this part, the parameters c_k and d_k in the nonlinear learning error models for the K learning tasks are acquired by least mean square (LMS) fitting. Specifically, the SVM classifier is trained on the digits dataset in the Python Scikit-learn ML toolbox. The dataset contains 1797 images of size 8×8 from 10 classes, with 5 bits (representing integers 0∼16) for each pixel. Thus, each image needs D_k = 8×8×5 bits. We train the SVM classifier on the first 1000 image samples with a range of training sample sizes, and use the last 797 image samples for testing. We record the corresponding test errors for the different training sample sizes. After that, LMS fitting is applied to obtain (c_k, d_k) for the SVM classifier. Then, we consider a 6-layer CNN with the MNIST and Fashion-MNIST datasets, respectively. The CNN consists of a convolution layer (with ReLU activation, 32 channels), a max pooling layer, another convolution layer (with ReLU activation, 64 channels), a max pooling layer, a fully connected layer with 128 units (with ReLU activation), and a final softmax output layer (with 10 outputs). The MNIST dataset consists of 70000 grayscale images (a training set of 60000 examples and a test set of 10000 examples) of handwritten digits, each with 28×28 pixels; thus, each image sample needs D_k = 6276 bits. Each image sample of the Fashion-MNIST dataset also needs D_k = 6276 bits. We train the CNN classifier with increasing sample sizes for both the MNIST and Fashion-MNIST datasets, and record the test errors corresponding to the different training sample sizes. Then, similar LMS fitting is exploited to obtain (c_k, d_k) for these two learning tasks. We also consider PointNet [16] as another learning task, to classify the 3D point cloud dataset ModelNet40, which contains 12311 CAD models from 40 object categories, split into 9843 models for training and 2468 for testing. Each data sample has 2000 points with three single-precision floating-point coordinates (4 bytes each).
Thus, the data size per sample is D_k = 2000 × 3 × 4 × 8 bits. Similarly, we train PointNet with a range of sample sizes and fit the results to the nonlinear learning error model to obtain (c_k, d_k) for PointNet. The resulting fitting parameters (c_k, d_k) of the nonlinear learning error model are (7.07, 0.81), (10.79, 0.73), (0.82, 0.23), and (0.96, 0.24) for SVM, MNIST, Fashion-MNIST, and PointNet, respectively.

B. Convergence of the AO and ADMM Algorithms
The convergence of the AO algorithm has been proved theoretically, and we further illustrate it by simulations here. The top of Fig. 2 shows that the value of the objective function is non-increasing over consecutive AO iterations and converges after around 4 iterations, which is quite efficient. Moreover, the convergence of the ADMM algorithm is also verified by simulations. The bottom of Fig. 2 shows that the primal residual decreases successively and the ADMM algorithm converges after around 30 iterations.
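For concreteness, the two updates behind the bottom curve of Fig. 2, the unit-modulus projection (24) and the dual ascent (25), together with the primal residual being plotted, can be sketched as follows; the q_k's here are random placeholders rather than the output of the actual q_k-update:

```python
import numpy as np

def project_unit_modulus(z):
    """theta-update (24): entrywise projection onto |theta_m| = 1."""
    return np.exp(1j * np.angle(z))

rng = np.random.default_rng(2)
M, K = 16, 4
# Placeholder local copies and scaled duals (in the algorithm these come
# from the q_k-updates (22) and previous dual variables).
q = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(K)]
u = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(K)]

# One ADMM round: project the average onto the unit-modulus set, ...
theta = project_unit_modulus(sum(q[k] + u[k] for k in range(K)) / K)
# ... perform the scaled dual ascent (25), ...
u_next = [u[k] + q[k] - theta for k in range(K)]
# ... and track the primal residual ||q_k - theta|| plotted in Fig. 2.
primal_residual = np.sqrt(sum(np.linalg.norm(q[k] - theta) ** 2 for k in range(K)))
```

The entrywise phase projection is optimal because, for each entry, the closest point on the unit circle to a complex number z is e^{j∠z}.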
C. Comparison with Various Benchmarks
We demonstrate the superiority of our RIS-assisted learning-centric scheme over various benchmarks in Fig. 3. The three benchmarks considered in this paper are: 1) without deploying the RIS; 2) deploying the RIS with a random phase-shift matrix; and 3) maximizing the sum rate as in conventional communication systems. It is shown that the performance of the learning-centric schemes is always dramatically better than that of the conventional sum-rate maximization scheme, even without the help of the RIS, which demonstrates the necessity of redesigning wireless communication systems in learning-driven scenarios. Fig. 3 also shows that, with the presence of the RIS, the learning performance can be improved remarkably, justifying the gain of deploying the RIS. Moreover, it can be seen that our proposed phase-shift optimization further improves the learning accuracy significantly, validating the effectiveness of our proposed optimization algorithms.

To demonstrate the validity of the nonlinear learning error model, we compare the learning errors obtained from the theoretical error model with those obtained from real experiments. Specifically, we record the optimal number of data samples for each ML task and the corresponding theoretical learning error. Then, we use the optimized sample sizes to train the corresponding learning models, and average the resulting learning errors over 10 runs to obtain the experimental learning errors. Fig. 4 shows that the theoretical results conform to the experimental results very well.

Fig. 2: Convergence of the AO and ADMM algorithms.
Fig. 3: Learning error comparison of various benchmarks.
Fig. 4: Theoretical learning errors vs. experimental learning errors.

VI. CONCLUSIONS
We have investigated RIS-assisted mobile edge computing systems with learning tasks. The design of a learning-efficient system was achieved by jointly optimizing the beamforming vectors of the BS and the phase-shift matrix of the RIS in an AO framework. Efficient algorithms were elaborated to address the highly nonconvex optimization problem induced by the nonlinear learning error model and the unit-modulus constraints of the RIS elements. Experimental results demonstrated the validity of the learning error model and the superiority of our proposed scheme over various benchmarks.

APPENDIX
A. Proof of Lemma 2
Since strong duality holds for QCQP problems with one constraint, as proved in [14], we can solve the dual problem of (19). The Lagrangian of (19) is

L(q̃, μ) = ‖q̃ − ζ̃^t‖² + μ( q̃^H Λ q̃ − 2ℜ{b̃^H q̃} − τ ).

Setting ∂L(q̃, μ)/∂q̃ = 0, we obtain the optimal q̃ as

q̃^* = (I + μΛ)^{-1} (ζ̃ + μb̃).

Substituting the above equation back into the equality constraint of (19), it becomes a nonlinear equation with respect to μ:

χ(μ) = Σ_{m=1}^{M} λ_m | (ζ̃_m + μb̃_m) / (1 + μλ_m) |² − 2ℜ{ Σ_{m=1}^{M} b̃_m^* (ζ̃_m + μb̃_m) / (1 + μλ_m) } − τ,

where λ_m is the m-th diagonal entry of Λ.

B. Proof of Lemma 3
For ease of notation, we denote the objective function of P1 as g(w, θ). Assume w^t and θ^t are obtained from the corresponding optimization problems in the t-th iteration, respectively. Then, we have

g(w^t, θ^{t+1}) = min_θ g(w^t, θ) ≤ g(w^t, θ^t).

Analogously, it holds that

g(w^{t+1}, θ^{t+1}) = min_w g(w, θ^{t+1}) ≤ g(w^t, θ^{t+1}) ≤ g(w^t, θ^t).

REFERENCES
[1] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proc. IEEE, vol. 107, no. 8, pp. 1738–1762, Aug. 2019.
[2] E. Li, L. Zeng, Z. Zhou, and X. Chen, "Edge AI: On-demand accelerating deep neural network inference via edge computing," IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 447–457, Jan. 2020.
[3] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, "Toward an intelligent edge: Wireless communication meets machine learning," IEEE Commun. Mag., vol. 58, no. 1, pp. 19–25, Jan. 2020.
[4] S. Yu, X. Chen, L. Yang, D. Wu, M. Bennis, and J. Zhang, "Intelligent edge: Leveraging deep imitation learning for mobile edge computation offloading," IEEE Wireless Commun., vol. 27, no. 1, pp. 92–99, Feb. 2020.
[5] M. D. Renzo, M. Debbah, D.-T. Phan-Huy, A. Zappone, M.-S. Alouini, C. Yuen, V. Sciancalepore, G. C. Alexandropoulos, J. Hoydis, H. Gacanin, J. d. Rosny, A. Bounceur, G. Lerosey, and M. Fink, "Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come," EURASIP J. Wirel. Commun. Netw., vol. 2019, no. 1, p. 129, May 2019.
[6] D. Liu, G. Zhu, J. Zhang, and K. Huang, "Data-importance aware user scheduling for communication-efficient edge machine learning," IEEE Trans. Cogn. Commun. Netw., pp. 1–1, 2020.
[7] S. Hua and Y. Shi, "Reconfigurable intelligent surface for green edge inference in machine learning," in Proc. IEEE Global Commun. Conf. (GLOBECOM) Wkshps, 2019, pp. 1–6.
[8] S. Wang, Y. Wu, M. Xia, R. Wang, and H. V. Poor, "Machine intelligence at the edge with learning centric power allocation," IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7293–7308, Jul. 2020.
[9] M. Johnson, P. Anderson, M. Dras, and M. Steedman, "Predicting accuracy on large datasets from smaller pilot data," in Proc. ACL, Melbourne, Australia, Jul. 2018, pp. 450–455.
[10] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[11] Q. Wu and R. Zhang, "Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming," IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5394–5409, Nov. 2019.
[12] C. Beleites, U. Neugebauer, T. Bocklitz, C. Krafft, and J. Popp, "Sample size planning for classification models," Analytica Chimica Acta, vol. 760, pp. 25–33, Jan. 2013.
[13] E. Bjornson, M. Bengtsson, and B. Ottersten, "Optimal multiuser transmit beamforming: A difficult problem with a simple solution structure [lecture notes]," IEEE Signal Process. Mag., vol. 31, no. 4, pp. 142–148, Jul. 2014.
[14] S. Boyd, S. P. Boyd, and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[15] H. Guo, Y. Liang, J. Chen, and E. G. Larsson, "Weighted sum-rate maximization for reconfigurable intelligent surface aided wireless networks," IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3064–3076, May 2020.
[16] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, "Pointnet: Deep learning on point sets for 3D classification and segmentation," in