Reconfigurable Intelligent Surface Assisted Edge Machine Learning

Shanfeng Huang*†, Shuai Wang*, Rui Wang*, Miaowen Wen‡, and Kaibin Huang†
*Department of Electrical and Electronic Engineering, Southern University of Science and Technology
†Department of Electrical and Electronic Engineering, The University of Hong Kong
‡School of Electronic and Information Engineering, South China University of Technology
Email: {sfhuang, huangkb}@eee.hku.hk, {wangs3, wang.r}@sustech.edu.cn, [email protected]

Abstract—The ever-growing popularity and rapid improvement of artificial intelligence (AI) have prompted a rethinking of the evolution of wireless networks. Mobile edge computing (MEC) provides a natural platform for AI applications, since it offers rich computation resources to train AI models as well as low-latency access to the data generated by mobile and Internet of Things devices. In this paper, we present an infrastructure to perform machine learning tasks at an MEC server with the assistance of a reconfigurable intelligent surface (RIS). In contrast to conventional communication systems, where the principal criterion is to maximize the throughput, we aim at optimizing the learning performance. Specifically, we minimize the maximum learning error of all users by jointly optimizing the beamforming vectors of the base station and the phase-shift matrix of the RIS. An alternating optimization-based framework is proposed to optimize the two terms iteratively, where closed-form expressions of the beamforming vectors are derived, and an alternating direction method of multipliers (ADMM)-based algorithm is designed together with an error level searching framework to effectively solve the nonconvex optimization problem of the phase-shift matrix. Simulation results demonstrate significant gains of deploying an RIS and validate the advantages of our proposed algorithms over various benchmarks.
I. INTRODUCTION
The prevalence of mobile terminals and the rapid growth of Internet of Things (IoT) technology have boosted a wide spectrum of new applications, many of which are computation-intensive and latency-critical, such as image recognition, mobile augmented reality, and edge machine intelligence. Mobile edge computing (MEC) is naturally well-suited for these AI-oriented networks, and the marriage of MEC and AI has given rise to a new research area, called "edge intelligence (EI)" or "edge AI" [1]–[4]. Moreover, to overcome wireless channel hostilities, an emerging paradigm called reconfigurable intelligent surface (RIS) was proposed, aiming at creating a smart radio environment by turning the wireless environment into an optimization variable that can be controlled and programmed [5]. Hence, we investigate the design of an RIS-assisted edge learning system.
In contrast with conventional communication systems, where the general goal is to maximize the throughput,
This work was supported in part by the National Natural Science Foundation of China under Grant 62001203, in part by the Shenzhen Fundamental Research Program under Grant JCYJ20190809142403596, and in part by the Fundamental Research Funds for the Central Universities under Grant 2019SJ02.

edge ML systems aim at optimizing the learning performance. As a result, the well-known resource allocation schemes that are optimized for conventional systems, such as water-filling and max-min fairness, may lead to poor learning performance, since they do not take into account learning-specific factors such as model and data complexities. Recently, several notable works have aimed at optimizing resource allocation for learning-centric systems. In [6], the authors proposed a data-importance aware user scheduling scheme for edge ML systems, where data are regarded as having different importance levels based on a certain importance measure. Nevertheless, the analysis is mainly based on support vector machines (SVMs); for more general ML models, the importance of training data is hard to quantify. In [7], the authors investigated an RIS-assisted edge inference system. However, the inference tasks are treated as general edge computing tasks in essence, offering few insights for real ML tasks. More recently, our previous work [8] put forth and validated a nonlinear classification error model for ML tasks, based on which a learning-centric power allocation scheme was proposed and shown to significantly outperform conventional resource allocation schemes with respect to learning error. In this paper, we further extend [8] to the scenario where an RIS is deployed to provide intelligence to the wireless channels. With the presence of the RIS, new challenges arise in the beamforming vector and phase-shift optimization. In this paper, we shed light on the design of RIS-assisted edge ML with heterogeneous learning tasks.
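The error model of [8] expresses the classification error as a power law in the training sample size, Ψ(v) ≈ c v^{-d}. As an illustration of how the tuning parameters (c, d) can be recovered by least-squares fitting, here is a minimal sketch on synthetic error data; the sample sizes and error values below are hypothetical placeholders, not the paper's measurements:

```python
import numpy as np

# Hypothetical sample sizes and classification errors (synthetic, for
# illustration only; real values come from training runs as in [8]).
v = np.array([100, 200, 400, 800, 1600], dtype=float)
true_c, true_d = 7.0, 0.8
err = true_c * v ** (-true_d)

# Psi(v) ~ c * v^{-d}  =>  log(err) = log(c) - d * log(v): linear least squares.
A = np.stack([np.ones_like(v), -np.log(v)], axis=1)
coef, *_ = np.linalg.lstsq(A, np.log(err), rcond=None)
c_fit, d_fit = np.exp(coef[0]), coef[1]
```

On real measurements the fitting is performed on recorded test errors at several sample sizes, exactly as described later in the simulation section.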
Specifically, we adopt the nonlinear learning error model in [8], [9], and aim at minimizing the maximum learning error of all the learning tasks by jointly optimizing the beamforming vectors at the base station (BS) and the phase-shift matrix at the RIS. The optimization problem is nonconvex and involves many optimization variables. To address this challenge, we design an alternating optimization (AO)-based framework to decompose the primal problem, and each subproblem is efficiently solved either in closed form or with low-complexity algorithms. Specifically, the optimization of the beamforming vectors is shown to be equivalent to maximizing the signal-to-interference-plus-noise ratios (SINRs), and closed-form expressions are derived. To solve the phase-shift matrix optimization problem, we propose an error level searching (ELS)-based framework to transform the exponential objective into
SINR constraints, and exploit the alternating direction method of multipliers (ADMM) to decouple the problem into a set of subproblems that can be solved in a distributed manner. Simulations on well-known ML models and public datasets verify the nonlinear learning error model, and demonstrate that our proposed scheme achieves significantly lower learning error than various benchmarks.

Fig. 1: An RIS-assisted edge ML system.

II. SYSTEM MODEL
We consider an edge ML system as shown in Fig. 1, where an intelligent edge server attached to a BS with N antennas serves K single-antenna users, each with an ML task. The communication is assisted by an RIS consisting of M passive reflecting elements, which can rotate the phase of the incident signal waves. In particular, the edge server is designated to train K classification models by collecting data observed at the K mobile users. The classification models can be CNNs, SVMs, etc.

The training data are transmitted from the mobile users to the edge server via wireless channels, which are intrinsically random due to the multi-path effect and can suffer from high propagation loss [10]. To this end, this paper considers an RIS-assisted scheme that configures the channel intelligently by adaptively tuning the phase shifts of the reflecting elements. With the presence of the RIS, the channel from user k to the BS includes both the direct link (user-BS link) and the reflected link (user-RIS-BS link), where the reflected link consists of the user-RIS link, the phase shifts at the RIS, and the RIS-BS link [11]. Denote the channel vector from the k-th user to the BS as h_k. It can be expressed as

h_k = \underbrace{h_{d,k}}_{\text{direct link}} + \underbrace{G^H \Theta^H h_{r,k}}_{\text{reflected link}},   (1)

where h_{d,k} ∈ C^{N×1}, h_{r,k} ∈ C^{M×1}, and G ∈ C^{M×N} denote the channel vectors and matrix from user k to the BS, from user k to the RIS, and from the RIS to the BS, respectively. Moreover, Θ = β diag(e^{jϕ_1}, ..., e^{jϕ_M}) ∈ C^{M×M} denotes the phase-shift matrix of the RIS, where β ∈ [0, 1] is the amplitude reflection coefficient and ϕ_m ∈ [0, 2π) is the phase shift of the m-th reflecting element. Without loss of generality, β is set to 1. Denote the transmitted signal of user k ∈ {1, 2, ..., K} as x_k, with power E[|x_k|²] = p_k.
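The composite channel in (1) can be sketched numerically as follows; the dimensions and the i.i.d. channel draws below are illustrative assumptions, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 16          # BS antennas, RIS elements (illustrative sizes)
h_d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h_r = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

phi = rng.uniform(0.0, 2 * np.pi, size=M)   # phase shifts, beta = 1
Theta = np.diag(np.exp(1j * phi))

# Composite channel of (1): direct link plus RIS-reflected link.
h = h_d + G.conj().T @ Theta.conj().T @ h_r
```

Note that the reflected link is just an entrywise phase rotation of h_r followed by G^H, which is what later allows the SINR to be rewritten in terms of the phase vector θ.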
Accordingly, the received signal y = [y_1, ..., y_N]^T ∈ C^{N×1} at the BS can be written as

y = Σ_{k=1}^{K} h_k x_k + n,   (2)

where n ∼ CN(0, σ² I_N) is the additive white Gaussian noise (AWGN) at the BS. A beamforming vector w_k with w_k^H w_k = 1 is applied to the received signal for each user k. Thus, the estimated symbol at the BS for user k is given by

ŷ_k = w_k^H y = w_k^H h_k x_k + Σ_{i=1, i≠k}^{K} w_k^H h_i x_i + w_k^H n.   (3)

Accordingly, the achievable spectral efficiency of user k in terms of bps/Hz is given by

R_k = log₂( 1 + p_k |w_k^H h_k|² / ( Σ_{i=1, i≠k}^{K} p_i |w_k^H h_i|² + σ² ) ).   (4)

Let B denote the bandwidth of the considered system and T the total transmission time. Thus, the total number of data samples for user k's task is given by

v_k = ⌊ BT R_k / D_k ⌋ ≈ BT R_k / D_k,   (5)

where D_k is the number of bits per data sample, and the approximation is due to ⌊x⌋ → x when x ≫ 1.

III. PROBLEM FORMULATION
In contrast with conventional communication systems, where the principal design criterion is usually to maximize the throughput, edge ML systems aim at maximizing the learning performance. Specifically, in the edge ML system considered herein, we aim at minimizing the maximum learning error of all the participating users by jointly optimizing the beamforming vectors {w_k}_{k=1}^{K} at the BS and the phase-shift matrix Θ of the RIS. Thus, we have the following optimization problem:

P:  min_{ {w_k}, Θ, v }  max_{k=1,...,K}  Ψ_k(v_k)
s.t.  w_k^H w_k = 1,  k = 1, ..., K,   (6a)
      BT R_k / D_k = v_k,  k = 1, ..., K,   (6b)
      0 ≤ ϕ_m < 2π,  m = 1, ..., M,   (6c)

where Ψ_k(v_k) is the classification error of learning model k given the sample size v_k. In general, the functions {Ψ_1, ..., Ψ_K} can hardly be expressed analytically. Propitiously, their approximate expressions can be obtained based on the analysis in [8], [9], [12]. Here, we simply adopt the nonlinear model developed in [8], i.e.,

Ψ_k(v_k) ≈ c_k v_k^{-d_k},   (7)

where c_k and d_k are tuning parameters that can be obtained by curve fitting.

By substituting (6b) and (7) into the objective function, problem P is transformed into the following problem:

P1:  min_{ {w_k}, Θ }  max_{k=1,...,K}  c_k [ (BT/D_k) log₂( 1 + |w_k^H (h_{d,k} + G^H Θ^H h_{r,k})|² p_k / ( Σ_{i=1,i≠k}^{K} |w_k^H (h_{d,i} + G^H Θ^H h_{r,i})|² p_i + σ² ) ) ]^{-d_k}
s.t.  w_k^H w_k = 1,  k = 1, ..., K,   (8a)
      |θ_m| = 1,  m = 1, ..., M.   (8b)

Remark 1 (Scaling law with a large number of reflecting elements). To gain some insight into how the number of reflecting elements affects the learning accuracy, we consider the case with a single user and a single-antenna BS, i.e., K = 1 and N = 1, and ignore the direct link. Thus, G becomes a vector, denoted by g. The receive SNR becomes p |h_r^H Θ g|² / σ². Assume Θ = I_M, h_r ∼ CN(0, ϱ_h I_M), and g ∼ CN(0, ϱ_g I_M).
According to the central limit theorem, we have h_r^H g ∼ CN(0, M ϱ_h ϱ_g) as M → ∞. Thus, the average receive SNR is E_{h_r,g}[ p |h_r^H Θ g|² / σ² ] = M p ϱ_h ϱ_g / σ². This indicates that the learning error is asymptotically proportional to (log(M))^{-d}.

IV. JOINT BEAMFORMING AND PHASE-SHIFTER DESIGN
Note that problem P1 is highly nonconvex due to the nonlinear learning error model in the objective function and the unit-modulus constraints. Moreover, the large number of optimization variables makes the problem even more intractable. Fortunately, the optimization of the beamforming vectors and the phase-shift matrix can be decomposed. Hence, we adopt an AO-based algorithm to solve P1 in an iterative manner by alternately optimizing {w_k}_{k=1}^{K} and Θ.

A. Beamforming Vectors Optimization
Note that given Θ, the objective function of problem P1 is still nonconvex in w_k. However, since the objective function is monotonically decreasing in the SINR of each user and is decomposable with respect to k, the optimization of w_k with fixed Θ can be equivalently carried out by maximizing the SINR of each user k. Consequently, the optimal beamforming vectors can be obtained by solving the following K subproblems:

P_{w_k}:  max_{w_k}  |w_k^H (h_{d,k} + G^H Θ^H h_{r,k})|² p_k / ( Σ_{i=1,i≠k}^{K} |w_k^H (h_{d,i} + G^H Θ^H h_{r,i})|² p_i + σ² )
s.t.  w_k^H w_k = 1.   (9a)

Although each problem P_{w_k} is still nonconvex in w_k, its optimal solution can be obtained in closed form, as given in the following lemma.

Lemma 1.
Given Θ, the optimal solution of P_{w_k} for arbitrary k is given in closed form by

w_k^⋄ = ( I_N + Σ_{i=1}^{K} (p_i/σ²) h_i h_i^H )^{-1} h_k / ‖ ( I_N + Σ_{i=1}^{K} (p_i/σ²) h_i h_i^H )^{-1} h_k ‖,   (10)

where h_i = h_{d,i} + G^H Θ^H h_{r,i}, for i = 1, ..., K.

Proof. The proof is similar to that in [13] and is omitted here due to the page limitation.
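As a sanity check, the closed-form receiver (10) can be implemented in a few lines; the channel realizations and powers below are random placeholders for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 3
sigma2 = 1.0
p = np.array([1.0, 0.5, 0.8])   # illustrative user transmit powers
# Column k of H is the composite channel h_k of user k.
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

def mmse_beamformer(H, p, sigma2, k):
    """Closed-form solution (10): regularized channel inversion for user k."""
    N = H.shape[0]
    C = np.eye(N) + sum(p[i] / sigma2 * np.outer(H[:, i], H[:, i].conj())
                        for i in range(H.shape[1]))
    w = np.linalg.solve(C, H[:, k])
    return w / np.linalg.norm(w)

w0 = mmse_beamformer(H, p, sigma2, 0)
```

By the SINR-optimality of this receiver, w0 should attain an SINR at least as large as any other unit-norm beamformer for user 0.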
B. Phase-shift Matrix Optimization
Given the beamforming vectors {w_k}_{k=1}^{K}, only the unit-modulus constraints of the RIS elements remain. By exploiting Θ = diag(θ) and setting a_{k,i} = β diag(h_{r,i}^H) G w_k and b_{k,i} = h_{d,i}^H w_k, the optimization of the phase-shift matrix Θ can be equivalently written as the following problem:

P_θ:  min_θ  max_{k=1,...,K}  c_k [ (BT/D_k) log₂( 1 + |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ) ]^{-d_k}
s.t.  |θ_m| = 1,  ∀ m = 1, ..., M.   (11a)

A common approach to address the nonconvex unit-modulus constraints is semidefinite relaxation (SDR). Nevertheless, even though SDR can circumvent the nonconvex unit-modulus constraints, the objective function remains nonconvex due to the nonlinear learning error model. Moreover, the solution obtained by SDR generally does not conform to the rank-1 constraint, and a large number of Gaussian randomizations are required to find a rank-1 solution, which increases the complexity dramatically. Besides, SDR lifts the optimization variable from an M×1 vector to an M×M matrix; thus, SDR cannot scale up with the number of RIS elements. To this end, we propose an ELS framework and an ADMM-based algorithm to solve problem P_θ. Specifically, we first define the error level of the k-th ML task for all k as

δ_k = c_k [ (BT/D_k) log₂( 1 + |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ) ]^{-d_k}.   (12)

Thus, the maximum error level of all participating tasks is given by δ = max_{k∈K} δ_k. Then, for a given error level δ, problem P_θ can be equivalently transformed into the following feasibility problem:

P'_θ:  find θ   (13a)
s.t.  |θ^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |θ^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k,  ∀k,   (13b)
      |θ_m| = 1,  ∀m,   (13c)

where γ_k = 2^{ (D_k/(BT)) (c_k/δ)^{1/d_k} } − 1. If problem P'_θ is feasible, we can reduce δ; otherwise, we increase δ to make P'_θ feasible, until δ converges to a certain value.
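The SINR target γ_k implied by an error level δ, and the bisection over δ, can be sketched as below; the feasibility oracle and all numeric values are assumptions for illustration only (in the paper, the oracle role is played by the ADMM solver of P'_θ):

```python
def gamma_k(delta, c, d, D, B, T):
    """SINR threshold implied by target error level delta (from (13b))."""
    return 2.0 ** (D * (c / delta) ** (1.0 / d) / (B * T)) - 1.0

def els(feasible, c, d, D, B, T, lo=1e-3, hi=1.0, tol=1e-4):
    """Bisection over the error level delta; `feasible` is an assumed oracle
    that checks whether P'_theta is feasible for the given SINR targets."""
    while hi - lo > tol:
        mid = 0.5 * (hi + lo)
        g = [gamma_k(mid, c[k], d[k], D[k], B, T) for k in range(len(c))]
        if feasible(g):
            hi = mid   # feasible: try a smaller (better) error level
        else:
            lo = mid   # infeasible: relax the error level
    return hi

# Toy usage with a hypothetical feasibility oracle (a real implementation
# would solve the feasibility problem P'_theta here).
delta_star = els(lambda gams: max(gams) <= 0.3,
                 c=[7.0], d=[0.8], D=[320], B=5e6, T=10.0)
```

Since γ_k is monotonically decreasing in δ, the feasible region of δ is an interval extending upward, which is what makes the bisection valid.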
We call this procedure error level searching (ELS).

In the sequel, we design an ADMM-based algorithm to solve problem P'_θ. By introducing a series of auxiliary variables {q_k}_{k=1}^{K} and a new constraint q_1 = q_2 = ... = q_K = θ, problem P'_θ can be rewritten in the following form:

find  {q_k}_{k=1}^{K}, θ   (14a)
s.t.  |q_k^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |q_k^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k,  k = 1, ..., K,   (14b)
      |θ_m| = 1,  m = 1, ..., M,   (14c)
      q_k = θ,  k = 1, ..., K.   (14d)

The augmented Lagrangian (using the scaled dual variables) of problem (14) is given by

L_ρ(q_1, ..., q_K, θ, u_1, ..., u_K) = Σ_{k=1}^{K} I_{B_k}(q_k) + I_C(θ) + ρ Σ_{k=1}^{K} ‖q_k − θ + u_k‖²,

where B_k is the feasibility region of the k-th constraint in (14b), C is the feasibility region of constraint (14c), ρ > 0 is the penalty parameter, and u_k is the scaled dual variable. Moreover, I is the indicator function with I_X(x) = 0 if x ∈ X and +∞ otherwise.

The ADMM algorithm iteratively updates q_k, θ, and u_k as follows, until a feasible solution is found:

q_k^{t+1} := argmin_{q_k} L_ρ(q_1, ..., q_K, θ^t, u_1^t, ..., u_K^t),  ∀k,   (15a)
θ^{t+1} := argmin_θ L_ρ(q_1^{t+1}, ..., q_K^{t+1}, θ, u_1^t, ..., u_K^t),   (15b)
u_k^{t+1} := u_k^t + q_k^{t+1} − θ^{t+1},  ∀k.   (15c)

In the sequel, we show that each update in (15) can be carried out either in closed form or with low complexity.

1) q_k update: After removing the irrelevant terms, the update of q_k can be equivalently written as

q_k^{t+1} = argmin_{q_k}  Σ_{k=1}^{K} I_{B_k}(q_k) + ρ Σ_{k=1}^{K} ‖q_k − θ^t + u_k^t‖².   (16)

Note that the update of q_k can be decoupled into K subproblems, one for each k:

min_{q_k}  ‖q_k − θ^t + u_k^t‖²   (17a)
s.t.  |q_k^H a_{k,k} + b_{k,k}|² p_k / ( Σ_{i=1,i≠k}^{K} |q_k^H a_{k,i} + b_{k,i}|² p_i + σ² ) ≥ γ_k.   (17b)

Although problem (17) is nonconvex in general, strong duality holds and the Lagrangian relaxation produces the optimal solution, since there is only one constraint [14].
Thus, we can solve it efficiently using the Lagrangian dual method. Rephrasing problem (17), it can be equivalently written in the following compact form:

min_{q_k}  ‖q_k − ζ_k^t‖²   (18a)
s.t.  q_k^H A_k q_k − 2ℜ{b_k^H q_k} = τ_k,   (18b)

where ζ_k^t = θ^t − u_k^t, A_k = γ_k Σ_{i=1,i≠k}^{K} a_{k,i} a_{k,i}^H p_i − a_{k,k} a_{k,k}^H p_k, b_k = a_{k,k} b_{k,k}^* p_k − γ_k Σ_{i=1,i≠k}^{K} a_{k,i} b_{k,i}^* p_i, and τ_k = |b_{k,k}|² p_k − γ_k Σ_{i=1,i≠k}^{K} |b_{k,i}|² p_i − γ_k σ². Note that we have changed the constraint to an equality to simplify the follow-up derivations. When considering the inequality constraint, we can simply check whether q_k = ζ_k^t is feasible. If yes, q_k^* = ζ_k^t is the optimal solution; if not, the optimal solution must satisfy the equality constraint.

For ease of notation, we omit the subscript k in problem (18), and let A = QΛQ^H be the eigenvalue decomposition. Then, problem (18) is equivalent to

min_{q̃}  ‖q̃ − ζ̃^t‖²   (19a)
s.t.  q̃^H Λ q̃ − 2ℜ{b̃^H q̃} = τ,   (19b)

where q̃ = Q^H q, ζ̃^t = Q^H ζ^t, and b̃ = Q^H b. As a result, the optimal solution can be efficiently found via the following lemma.

Lemma 2.
The optimal solution of problem (19) is given by

q̃^* = (I + μΛ)^{-1} (ζ̃ + μb̃),   (20)

where μ is the Lagrange multiplier of problem (19). Moreover, μ can be found by solving the nonlinear equation χ(μ) = 0 with

χ(μ) = Σ_{m=1}^{M} λ_m | (ζ̃_m + μb̃_m) / (1 + μλ_m) |² − 2ℜ{ Σ_{m=1}^{M} b̃_m^* (ζ̃_m + μb̃_m) / (1 + μλ_m) } − τ,

where λ_m is the m-th diagonal entry of Λ.

Proof. Please refer to Appendix A.

Taking the derivative of χ(μ) with respect to μ, we have

χ'(μ) = −2 Σ_{m=1}^{M} |b̃_m − λ_m ζ̃_m|² / (1 + μλ_m)³.   (21)

Since we assume the feasibility of problem (19), there must exist μ with I + μΛ ⪰ 0 such that the value of q̃ minimizing the Lagrangian also satisfies the equality constraint. Thus, 1 + μλ_m ≥ 0 for m = 1, ..., M, and χ'(μ) < 0. Therefore, χ(μ) is monotonic in the possible region of the solution, and any local solution is guaranteed to be the unique solution. Moreover, the equation χ(μ) = 0 can be efficiently solved by either the bisection search method or Newton's method.

After obtaining q̃_k from problem (19), the optimal q_k update is given by

q_k^{t+1} = Q q̃_k.   (22)

2) θ update: The update of θ is obtained by solving the following problem:

θ^{t+1} = argmin_θ  Σ_{k=1}^{K} ‖q_k^{t+1} − θ + u_k^t‖²
s.t.  |θ_m| = 1,  m = 1, ..., M.   (23)

Thus, the optimal θ is simply the projection of (1/K) Σ_{k=1}^{K} (q_k^{t+1} + u_k^t) onto the unit-modulus constraints, i.e.,

θ^{t+1} = e^{j∠[ (1/K) Σ_{k=1}^{K} (q_k^{t+1} + u_k^t) ]}.   (24)

3) u_k update: The update of u_k is standard dual ascent and is given by

u_k^{t+1} = u_k^t + q_k^{t+1} − θ^{t+1}.   (25)

As a result, the optimal phase-shift matrix can be obtained by jointly exploiting ELS and ADMM.

C. Alternating Optimization Framework
We summarize the proposed alternating optimization algorithm here. Specifically, the AO algorithm is first initialized with w_k^0 and θ^0. Then, given fixed w_k^t and θ^t in the t-th iteration, w_k^{t+1} and θ^{t+1} in the (t+1)-th iteration are updated alternately. Moreover, the convergence of the AO algorithm is established in Lemma 3.

Lemma 3.
With the AO algorithm, the objective value of P1 is non-increasing over consecutive iterations.

Proof. Please refer to Appendix B.

V. SIMULATION RESULTS
In this section, we evaluate the performance of our proposed algorithms via simulations. We consider 4 users, each with a learning task. The 4 learning tasks considered herein are SVM, CNN with the MNIST dataset, CNN with the Fashion-MNIST dataset, and PointNet. The number of BS antennas varies from 10 to 50, and the number of reflecting elements of the RIS is set to 50. The total transmission time is T = 10 s, the bandwidth is B = 5 MHz, and the noise power is σ² = − dBm. All the channels involved are assumed to be Rayleigh fading, and the channel coefficients (i.e., the elements in G, h_{d,k}, and h_{r,k}, for all k) are normalized with zero mean and unit variance [15]. The pathloss exponent of the direct link (from the BS to the users) is 4, and the pathloss exponents of the BS-RIS link and the RIS-user link are set to 2.2.

A. Parameter Fitting for the Learning Tasks
In this part, the parameters c_k and d_k in the nonlinear learning error models for the K learning tasks are acquired by least mean square (LMS) fitting. Specifically, the SVM classifier is trained on the digits dataset in the Python Scikit-learn ML toolbox. The dataset contains 1797 images of size 8×8 from 10 classes, with 5 bits (representing integers 0∼16) for each pixel. Thus, each image needs D_k = 8×8×5 bits. We train the SVM classifier on the first 1000 image samples with a range of training sample sizes, and use the last 797 image samples for testing. We record the corresponding test errors for the different training sample sizes. After that, LMS fitting is applied to obtain (c_k, d_k) for the SVM classifier. Then, we consider a 6-layer CNN with the MNIST and Fashion-MNIST datasets, respectively. The CNN consists of a convolution layer (with ReLU activation, 32 channels), a max pooling layer, another convolution layer (with ReLU activation, 64 channels), a max pooling layer, a fully connected layer with 128 units (with ReLU activation), and a final softmax output layer (with 10 outputs). The MNIST dataset consists of 70000 grayscale images (a training set of 60000 examples and a test set of 10000 examples) of handwritten digits, each with 28×28 pixels; thus, each image sample needs D_k = 6276 bits. Each image sample of the Fashion-MNIST dataset also needs D_k = 6276 bits. We train the CNN classifier with increasing sample sizes for both the MNIST and Fashion-MNIST datasets, and record the test errors corresponding to the different training sample sizes. Then, similar LMS fitting is exploited to obtain (c_k, d_k) for these two learning tasks. We also consider PointNet [16] as another learning task, to classify the 3D point cloud dataset ModelNet40, which contains 12311 CAD models from 40 object categories, split into 9843 models for training and 2468 for testing. Each data sample has 2000 points with three single-precision floating-point coordinates (4 bytes each).
Thus, the data size per sample is D_k = 2000 × 3 × 4 × 8 bits. Similarly, we train PointNet with a range of sample sizes and fit the results to the nonlinear learning error model to obtain (c_k, d_k) for PointNet. The resulting fitting parameters (c_k, d_k) of the nonlinear learning error model are (7.07, 0.81), (10.79, 0.73), (0.82, 0.23), and (0.96, 0.24) for SVM, MNIST, Fashion-MNIST, and PointNet, respectively.

B. Convergence of the AO and ADMM Algorithms
The convergence of the AO algorithm has been proved theoretically, and we further illustrate it by simulations here. The top of Fig. 2 shows that the value of the objective function is non-increasing over consecutive AO iterations and converges after around 4 iterations, which is quite efficient. Moreover, the convergence of the ADMM algorithm is also verified by simulations. The bottom of Fig. 2 shows that the primal residual decreases successively and the ADMM algorithm converges after around 30 iterations.
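For concreteness, the two updates behind the bottom curve of Fig. 2, the unit-modulus projection (24) and the dual ascent (25), together with the primal residual being plotted, can be sketched as follows; the q_k's here are random placeholders rather than the output of the actual q_k-update:

```python
import numpy as np

def project_unit_modulus(z):
    """theta-update (24): entrywise projection onto |theta_m| = 1."""
    return np.exp(1j * np.angle(z))

rng = np.random.default_rng(2)
M, K = 16, 4
# Placeholder local copies and scaled duals (in the algorithm these come
# from the q_k-updates (22) and previous dual variables).
q = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(K)]
u = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(K)]

# One ADMM round: project the average onto the unit-modulus set, ...
theta = project_unit_modulus(sum(q[k] + u[k] for k in range(K)) / K)
# ... perform the scaled dual ascent (25), ...
u_next = [u[k] + q[k] - theta for k in range(K)]
# ... and track the primal residual ||q_k - theta|| plotted in Fig. 2.
primal_residual = np.sqrt(sum(np.linalg.norm(q[k] - theta) ** 2 for k in range(K)))
```

The entrywise phase projection is optimal because, for each entry, the closest point on the unit circle to a complex number z is e^{j∠z}.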
C. Comparison with Various Benchmarks
We demonstrate the superiority of our RIS-assisted learning-centric scheme over various benchmarks in Fig. 3. The three benchmarks considered in this paper are: 1) without deploying the RIS; 2) deploying the RIS with a random phase-shift matrix; and 3) maximizing the sum rate as in conventional communication systems. It is shown that the performance of the learning-centric schemes is always dramatically better than that of the conventional sum-rate maximization scheme, even without the help of the RIS, which demonstrates the necessity of redesigning wireless communication systems in learning-driven scenarios. Fig. 3 also shows that, with the presence of the RIS, the learning performance can be improved remarkably, justifying the gain of deploying the RIS. Moreover, it can be seen that our proposed phase-shift optimization further improves the learning accuracy significantly, validating the effectiveness of our proposed optimization algorithms.

To demonstrate the validity of the nonlinear learning error model, we compare the learning errors obtained from the theoretical error model with those obtained from real experiments. Specifically, we record the optimal number of data samples for each ML task and the corresponding theoretical learning error. Then, we use the optimized sample sizes to train the corresponding learning models, and average the resulting learning errors over 10 runs to obtain the experimental learning errors. Fig. 4 shows that the theoretical results conform to the experimental results very well.

Fig. 2: Convergence of the AO and ADMM algorithms.
Fig. 3: Learning error comparison of various benchmarks.
Fig. 4: Theoretical learning errors vs. experimental learning errors.

VI. CONCLUSIONS
We have investigated RIS-assisted mobile edge computing systems with learning tasks. The design of a learning-efficient system was achieved by jointly optimizing the beamforming vectors of the BS and the phase-shift matrix of the RIS in an AO framework. Efficient algorithms were elaborated to address the highly nonconvex optimization problem induced by the nonlinear learning error model and the unit-modulus constraints of the RIS elements. Experimental results demonstrated the validity of the learning error model and the superiority of our proposed scheme over various benchmarks.

APPENDIX
A. Proof of Lemma 2
Since strong duality holds for QCQP problems with one constraint, as proved in [14], we can solve the dual problem of (19). The Lagrangian of (19) is

L(q̃, μ) = ‖q̃ − ζ̃^t‖² + μ( q̃^H Λ q̃ − 2ℜ{b̃^H q̃} − τ ).

Setting ∂L(q̃, μ)/∂q̃ = 0, we obtain the optimal q̃ as

q̃^* = (I + μΛ)^{-1} (ζ̃ + μb̃).

Substituting the above equation back into the equality constraint of (19), it becomes a nonlinear equation with respect to μ:

χ(μ) = Σ_{m=1}^{M} λ_m | (ζ̃_m + μb̃_m) / (1 + μλ_m) |² − 2ℜ{ Σ_{m=1}^{M} b̃_m^* (ζ̃_m + μb̃_m) / (1 + μλ_m) } − τ,

where λ_m is the m-th diagonal entry of Λ.

B. Proof of Lemma 3
For ease of notation, we denote the objective function of P1 as g(w, θ). Assume w^t and θ^t are obtained from the corresponding optimization problems in the t-th iteration, respectively. Then, we have

g(w^t, θ^{t+1}) = min_θ g(w^t, θ) ≤ g(w^t, θ^t).

Analogously, it holds that

g(w^{t+1}, θ^{t+1}) = min_w g(w, θ^{t+1}) ≤ g(w^t, θ^{t+1}) ≤ g(w^t, θ^t).

REFERENCES
[1] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proc. IEEE, vol. 107, no. 8, pp. 1738–1762, Aug. 2019.
[2] E. Li, L. Zeng, Z. Zhou, and X. Chen, "Edge AI: On-demand accelerating deep neural network inference via edge computing," IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 447–457, Jan. 2020.
[3] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, "Toward an intelligent edge: Wireless communication meets machine learning," IEEE Commun. Mag., vol. 58, no. 1, pp. 19–25, Jan. 2020.
[4] S. Yu, X. Chen, L. Yang, D. Wu, M. Bennis, and J. Zhang, "Intelligent edge: Leveraging deep imitation learning for mobile edge computation offloading," IEEE Wireless Commun., vol. 27, no. 1, pp. 92–99, Feb. 2020.
[5] M. D. Renzo, M. Debbah, D.-T. Phan-Huy, A. Zappone, M.-S. Alouini, C. Yuen, V. Sciancalepore, G. C. Alexandropoulos, J. Hoydis, H. Gacanin, J. d. Rosny, A. Bounceur, G. Lerosey, and M. Fink, "Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come," EURASIP J. Wirel. Commun. Netw., vol. 2019, no. 1, p. 129, May 2019.
[6] D. Liu, G. Zhu, J. Zhang, and K. Huang, "Data-importance aware user scheduling for communication-efficient edge machine learning," IEEE Trans. Cogn. Commun. Netw., pp. 1–1, 2020.
[7] S. Hua and Y. Shi, "Reconfigurable intelligent surface for green edge inference in machine learning," in Proc. IEEE Global Commun. Conf. (GLOBECOM) Wkshps, 2019, pp. 1–6.
[8] S. Wang, Y. Wu, M. Xia, R. Wang, and H. V. Poor, "Machine intelligence at the edge with learning centric power allocation," IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7293–7308, Jul. 2020.
[9] M. Johnson, P. Anderson, M. Dras, and M. Steedman, "Predicting accuracy on large datasets from smaller pilot data," in Proc. ACL, Melbourne, Australia, Jul. 2018, pp. 450–455.
[10] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[11] Q. Wu and R. Zhang, "Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming," IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5394–5409, Nov. 2019.
[12] C. Beleites, U. Neugebauer, T. Bocklitz, C. Krafft, and J. Popp, "Sample size planning for classification models," Analytica Chimica Acta, vol. 760, pp. 25–33, Jan. 2013.
[13] E. Bjornson, M. Bengtsson, and B. Ottersten, "Optimal multiuser transmit beamforming: A difficult problem with a simple solution structure [lecture notes]," IEEE Signal Process. Mag., vol. 31, no. 4, pp. 142–148, Jul. 2014.
[14] S. Boyd, S. P. Boyd, and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[15] H. Guo, Y. Liang, J. Chen, and E. G. Larsson, "Weighted sum-rate maximization for reconfigurable intelligent surface aided wireless networks," IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3064–3076, May 2020.
[16] R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, "Pointnet: Deep learning on point sets for 3D classification and segmentation," in