Federated Learning on the Road: Autonomous Controller Design for Connected and Autonomous Vehicles
Tengchan Zeng, Omid Semiari, Mingzhe Chen, Walid Saad, Mehdi Bennis
Abstract
The deployment of future intelligent transportation systems is contingent upon seamless and reliable operation of connected and autonomous vehicles (CAVs). One key challenge in developing CAVs is the design of an autonomous controller that can accurately execute near real-time control decisions, such as a quick acceleration when merging onto a highway and frequent speed changes in stop-and-go traffic. However, the use of conventional feedback controllers or traditional learning-based controllers, solely trained by each CAV's local data, cannot guarantee a robust controller performance over a wide range of road conditions and traffic dynamics. In this paper, a new federated learning (FL) framework enabled by large-scale wireless connectivity is proposed for designing the autonomous controller of CAVs. In this framework, the learning models used by the controllers are collaboratively trained among a group of CAVs. To capture the varying CAV participation in the FL training process and the diverse local data quality among CAVs, a novel dynamic federated proximal (DFP) algorithm is proposed that accounts for the mobility of CAVs, the wireless fading channels, as well as the unbalanced and non-independent and identically distributed data across CAVs. A rigorous convergence analysis is performed for the proposed algorithm to identify how fast the CAVs converge to using the optimal autonomous controller. In particular, the impacts of varying CAV participation in the FL process and diverse CAV data quality on the convergence of the proposed DFP algorithm are explicitly analyzed. Leveraging this analysis, an incentive mechanism based on contract theory is designed to improve the FL convergence speed. Simulation results using real vehicular data traces show that the proposed DFP-based controller can accurately track the target CAV speed over time and under different traffic scenarios. Moreover, the results show that the proposed DFP algorithm has a much faster convergence compared to popular FL algorithms such as federated averaging (FedAvg) and federated proximal (FedProx). The results also validate the feasibility of the contract-theoretic incentive mechanism and show that the proposed mechanism can improve the convergence speed of the DFP algorithm by 40% compared to the baselines.

A preliminary version of this work has been submitted to the proceedings of the IEEE Conference on Decision and Control (CDC), 2021 [1]. This research was supported by the U.S. National Science Foundation under Grants CNS-1739642, CNS-1941348, and CNS-2008646, by the Academy of Finland Projects CARMA, MISSION, and SMARTER, as well as by the INFOTECH Project NOOR. T. Zeng and W. Saad are with Wireless@VT, Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24061 USA. E-mail: {tengchan, walids}@vt.edu. O. Semiari is with the Department of Electrical and Computer Engineering, University of Colorado Colorado Springs, Colorado Springs, CO, 80918 USA. E-mail: [email protected]. M. Chen is with the Department of Electrical Engineering, Princeton University, Princeton, NJ, 08544 USA, and also with the Shenzhen Research Institute of Big Data (SRIBD) and the Future Network of Intelligence Institute (FNii), Chinese University of Hong Kong, Shenzhen, 518172, China. E-mail: [email protected]. M. Bennis is with the Centre for Wireless Communications, University of Oulu, 90014 Oulu, Finland. E-mail: mehdi.bennis@oulu.fi.

I. INTRODUCTION
As a key component of tomorrow's intelligent transportation systems (ITSs), connected and autonomous vehicles (CAVs) are emerging as a promising solution to reduce traffic accidents, alleviate road congestion, and increase transportation efficiency. CAVs leverage sensors together with wireless systems to increase their situational awareness and improve their motion planning and automatic control. However, to operate full-fledged CAVs, we need to address a number of challenges, ranging from providing seamless wireless connectivity to designing reliable controllers. Among these challenges, designing an autonomous controller to achieve target movements for CAVs is critical in order to allow a CAV to accomplish its target tasks and operate safely. In particular, a CAV's controller must accurately execute navigation decisions so that the CAV can quickly adapt to the dynamic road traffic [2]. For example, the controllers must generate frequent slow-downs and speed-ups for CAVs in stop-and-go traffic, whereas a rapid acceleration will be the target output for the controllers when CAVs merge onto highways.
A. Motivation and Related Works
There are two common methods to design an autonomous controller for CAVs. The first method uses a conventional feedback controller. In particular, the conventional feedback controller first determines the CAV's dynamic models (e.g., the tire model [3]) as well as the road conditions (e.g., road slope [4] and slip ratio between the road and tire [5]), and then optimizes the controller design based on these settings. However, due to various types of roads, dynamic road traffic, and varying weather and payloads, the vehicle dynamics and road conditions will change constantly. Hence, a conventional feedback controller cannot guarantee the controller performance over a wide range of environmental parameter changes. To ensure that CAVs can adapt to changing vehicle dynamics and road conditions, the second method relies on the use of adaptive controllers, based on machine learning (ML), for the CAV's autonomy. For example, in [6], the authors propose a learning-based model predictive control (MPC) design where the recorded trajectory data is used to optimize the parameterization of the MPC controller that leads to the optimal closed-loop performance. In [7], a database-driven proportional-integral-derivative (PID) controller is proposed where ML algorithms are executed on a local dataset to tune the control parameters. The works in [8]–[10] use supervised learning models to train on the camera data and the steering commands of human drivers for the adaptive lateral controller design. However, when using learning methods (e.g., neural networks) for adaptive controller design, the local data can be insufficient to train the learning model due to the limited on-chip memory available on board CAVs [11]. In fact, because of the limited storage, an individual CAV can only store data pertaining to its most recent travels, and this data can be easily skewed and of poor quality.
Hence, when changing to a new traffic environment or when a CAV encounters less frequently occurring events (e.g., traffic accidents), a controller solely trained by the local data can fail to adapt to such dynamics. An effective controller design will thereby hinge on training the ML model using the data collected by more than one CAV. In other words, a cooperative training framework among multiple CAVs will be needed for properly designing the autonomous controller of a CAV.

To this end, one can leverage the wireless connectivity of CAVs and use federated learning (FL) to enable a network of CAVs to collaboratively train the learning models used by their controllers [12]. In FL, the CAVs can train the controller models based on their local data and, then, a parameter server, such as a base station (BS), can aggregate the trained controller models from the CAVs. These processes will be repeated between the CAVs and the parameter server iteratively until all controllers converge to the optimal learning model. In this way, the learning model can be collaboratively trained among multiple CAVs, and such a trained model can enable a particular CAV's controller to adapt to new traffic scenarios unknown to that CAV but experienced by other CAVs. For example, as shown in Fig. 1(a), the CAVs participating in an FL process can learn from each other to operate in a wide range of scenarios, such as accident, traffic jam, and roadwork areas. Moreover, the FL process is naturally privacy-preserving, as the CAVs do not share their local data, e.g., their trajectory history.

To reap all these benefits, we need to address a number of challenges.
First, due to the CAV's mobility and the uncertainty of wireless channels, the participation of CAVs in the FL process will vary over time, and hence, it can be challenging to guarantee a good training performance. Second, because of the unbalanced and non-independent and identically distributed (non-IID) local data across CAVs, the data quality among CAVs will differ, and such diverse data quality can impact the FL convergence. Third, when implementing FL for autonomous controller design, it is necessary to design an effective mechanism that incentivizes the CAVs to participate in the ML training. In particular, the designed incentive mechanism must offer a reward to CAVs so as to compensate for the cost of the energy spent on the local training and uplink transmission. Meanwhile, considering the diverse local data quality at the CAVs, the incentive mechanism must be designed in a way that motivates the participation of CAVs with good data quality and prevents CAVs with poor data quality from engaging in the FL process. Such an incentive mechanism design becomes more challenging in the context of FL since there exists an information asymmetry between the parameter server and the CAVs. That is, only the CAVs know their own data quality, while the parameter server cannot access the CAVs' local data.

To design an effective incentive mechanism in FL, a number of works use game-theoretic and learning concepts (such as deep reinforcement learning [13] and the Stackelberg game [14]) where the parameter server offers rewards to the local users for their participation in the FL process. In particular, in each round, the parameter server will communicate iteratively with the local users to determine the payment plan which ensures a target number of users participating in the FL process while minimizing the total payment at the parameter server. However, this process can be time-consuming and could result in a non-negligible delay, posing a safety threat to the real-time operation of CAVs.
Meanwhile, there are works that use the framework of contract theory to design realistic incentive strategies, as done in [15]–[17]. These works generally group the users into different types and design a contract for each user type. Users will then self-reveal their types by choosing the contracts specifically designed for their type. Nevertheless, these prior works group users according to simple definitions of data quality (e.g., image quality [15] and accuracy [16], [17]). Such metrics can only capture the data quality at the level of individual data samples while ignoring the data size and distribution. In addition, the works in [15]–[17] assume that any CAV participation in the FL process will improve the overall convergence. In fact, due to the diverse data quality, the convergence of the FL process can be impeded by CAVs that have poor quality data. Hence, for CAVs, one must design an incentive mechanism that can offer rewards for the energy cost at CAVs while also accelerating the convergence of the controller design.
B. Contributions and Outcomes
The main contribution of this paper is a novel FL framework that enables CAVs to collaboratively learn and optimize their autonomous controller design in the presence of wireless link uncertainties and environmental dynamics. In particular, we propose a dynamic federated proximal (DFP) algorithm. Different from the federated averaging (FedAvg) algorithm [18], the proposed algorithm introduces an L2 regularizer in the local training process at the CAVs to minimize the impact of non-IID and unbalanced data on the FL convergence. Meanwhile, in contrast to popular FL algorithms (e.g., MOCHA [19]), we also account for the varying participation of CAVs in the FL process due to the CAVs' mobility and the uncertainty of wireless channels. We perform a rigorous convergence study for the proposed DFP algorithm where we identify how the wireless connectivity, mobility, and local data quality affect the learning convergence. We also show how our algorithm can be used to design the controller for electric CAVs with stringent energy constraints, and we demonstrate how to optimize the DFP algorithm design by intelligently choosing the optimal number of iterations in the local training phase.

To improve the convergence performance of the proposed DFP algorithm, we design an incentive mechanism based on contract theory. In particular, we model the interactions between the parameter server and the CAVs as a labor market where the parameter server is the employer and the CAVs are the employees. Here, we first mathematically capture the data quality for each CAV according to how its local data affects the overall convergence. Then, according to the data quality, we partition the CAVs into different types, and we design a contract, i.e., a resource-reward bundle, for each CAV type. The contracts will be designed to improve the convergence of the controller design by motivating CAVs with good data quality to join the FL process while preventing CAVs with poor data quality from training the controller model.
Using real vehicular data traces, i.e., the Berkeley deep drive (BDD) data [20] and the dataset of annotated car trajectories (DACT) [21], we show that the controller trained by our proposed algorithm can track the target speed over time and under different traffic scenarios (e.g., traffic accidents, traffic congestion, and roadwork zones). Also, when using the proposed algorithm for the controller design, the distance error is shown to be two times smaller than that of controllers solely trained by the local data. In addition, simulation results show that the proposed algorithm can achieve a faster convergence than the FedAvg and FedProx algorithms, leading to a quick adaptation to the traffic dynamics. Furthermore, the results validate the feasibility of the proposed contract-theoretic incentive mechanism and show that the mechanism can improve the convergence speed of the DFP algorithm by 40% compared with the baseline schemes, i.e., maximum and random transmit power allocations.

Fig. 1. Illustration of our system model. The traffic model is presented in (a), where green triangles and red squares, respectively, represent CAVs that do and do not participate in the FL process. The adaptive controller and learning models are shown in (b).
To the best of our knowledge, this is the first work that develops an FL framework to optimize the autonomous controller design for CAVs.
The rest of the paper is organized as follows. Section II presents the control, learning, and communication models. The proposed algorithm and its convergence proof are studied in Section III. In Section IV, the contract-theory based incentive mechanism is introduced. Section V provides the simulation results, and conclusions are drawn in Section VI.

II. SYSTEM MODEL
Consider a cellular BS serving a set $\mathcal{N}$ of $N$ CAVs that move along a road system, as shown in Fig. 1(a). Each CAV will perceive its surrounding environment and accordingly adjust the controller decisions in order to achieve the target movement. FL is used to collaboratively train the controller so that CAVs can automatically change their control parameters, execute the control decisions, and adapt to their local traffic. We will next introduce the controller, communication, and learning models used for our FL-based autonomous controller design framework.
A. Adaptive Longitudinal Controller Model
To perceive their surrounding environment, CAVs will use sensors and communicate with nearby CAVs and the BS. This environmental perception enables the longitudinal controller of each CAV to automatically adjust its acceleration or deceleration and maintain a safe spacing and target speed. Due to the simplicity and ease of implementation of a PID controller, we assume that it is used by CAVs to control their longitudinal movement. Then, the acceleration $u_n(t)$ of vehicle $n \in \mathcal{N}$ at sample $t$ is [22]

$$u_n(t) = u_n(t-1) + \left(K_{n,p} + K_{n,i}\Delta t + \frac{K_{n,d}}{\Delta t}\right) e_n(t) + \left(-K_{n,p} - \frac{2K_{n,d}}{\Delta t}\right) e_n(t-1) + \frac{K_{n,d}}{\Delta t}\, e_n(t-2), \qquad (1)$$

where the non-negative coefficients $K_{n,p}$, $K_{n,i}$, and $K_{n,d}$ are, respectively, the proportional gain, integral time constant, and derivative time constant used by the PID controller at CAV $n \in \mathcal{N}$, $\Delta t$ is the sampling period, and $e_n(t) = v_{n,r}(t) - v_{n,a}(t)$ captures the difference between the target reference speed $v_{n,r}(t)$ and the actual speed $v_{n,a}(t)$ at sample $t$. Note that the target reference speed is decided by the motion planner in the CAV based on the environmental perception. According to (1), we can calculate the actual speed at sample $t+1$ as $v_{n,a}(t+1) = v_{n,a}(t) + u_n(t)\Delta t$ and the distance traversed between samples $t$ and $t+1$ as $d_{n,p} = \frac{v_{n,a}(t+1) + v_{n,a}(t)}{2}\Delta t$. Clearly, achieving the target speed and safe spacing will depend on the control parameter setting of the PID controller. Hence, it is imperative to adjust these control parameters adaptively to deal with varying traffic dynamics and road conditions. In particular, instead of spending a substantial time to manually tune the control parameters $K_{n,p}$, $K_{n,i}$, and $K_{n,d}$, $n \in \mathcal{N}$, a CAV can use an adaptive PID controller enabled by an artificial neural network (ANN)-based auto-tuning unit, as shown in Fig. 1(b).
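The incremental PID law in (1), together with the speed and distance updates, can be sketched as follows. This is a minimal illustration assuming the standard velocity-form arrangement of the gains; the gain values, reference speed, and sampling period are made up for the example, not taken from the paper.

```python
# Sketch of the incremental (velocity-form) PID update in (1) plus the
# speed/distance integration that follows it. All numeric values are
# illustrative placeholders.

def pid_acceleration(u_prev, e, e_prev, e_prev2, Kp, Ki, Kd, dt):
    """One step of the incremental PID law in (1)."""
    return (u_prev
            + (Kp + Ki * dt + Kd / dt) * e
            + (-Kp - 2.0 * Kd / dt) * e_prev
            + (Kd / dt) * e_prev2)

def track_speed(v0, v_ref, steps, Kp=0.5, Ki=0.1, Kd=0.01, dt=0.1):
    """Drive the actual speed v_a toward a constant reference v_ref."""
    v_a, u = v0, 0.0
    e_hist = [0.0, 0.0, 0.0]            # e(t), e(t-1), e(t-2)
    distance = 0.0
    for _ in range(steps):
        e_hist = [v_ref - v_a] + e_hist[:2]
        u = pid_acceleration(u, *e_hist, Kp, Ki, Kd, dt)
        v_next = v_a + u * dt            # v_a(t+1) = v_a(t) + u_n(t)Δt
        distance += 0.5 * (v_next + v_a) * dt   # trapezoidal distance d_{n,p}
        v_a = v_next
    return v_a, distance

v_final, dist = track_speed(v0=0.0, v_ref=20.0, steps=200)
```

With these (arbitrary) gains the closed loop is stable, so the speed settles near the 20 m/s reference within the 20 s horizon; an ANN auto-tuning unit would replace the fixed gains with traffic-dependent ones.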
In this case, to adapt to various traffic conditions, the CAV will train the auto-tuning unit using its own local data and adjust the control parameters accordingly. This is an emerging approach for adaptive controller design, as discussed in [6]–[10].

B. FL Model
The ANN-based auto-tuning unit in Fig. 1(b) can adaptively tune the PID control parameters to achieve the target speed. However, the CAV's local training data (e.g., camera data containing the longitudinal movement) is constrained by the onboard memory of the CAV, and, thus, the information that can be stored will be limited to a few traffic scenarios. For example, for CAVs driving on the highway, the longitudinal movement data captured by the camera will be mostly high-speed data. As a result, the trained controller can only operate in the highway scenario and cannot adapt to stop-and-go traffic with frequent stops and accelerations when CAVs exit the highway and drive in urban settings. In other words, by solely training the local data for the auto-tuning unit, the controller can only work in limited traffic scenarios but not in the presence of a more general traffic pattern, which could jeopardize the safe operation of CAVs. To address this challenge, we can use the wireless connectivity of CAVs to build a cooperative, learning-based training framework, i.e., FL, among multiple CAVs for the controller design.

Here, we consider that CAVs will engage in an FL process to collaboratively train the ANN auto-tuning units for their adaptive controller design. In particular, a wireless BS, operating as a parameter server, will first generate an initial global ANN model parameter $\mathbf{w}_0$ for the auto-tuning unit and send it to all CAVs over a downlink broadcast channel. Then, in the first communication round, CAVs will use the received model parameters $\mathbf{w}_0$ to independently train their own model based on their local data for $I$ iterations. In the uplink, the CAVs transmit their trained model parameters to the BS.

[Footnote: As the motion planner design has been extensively studied by the prior art and is not the main scope of this work, we omit details about the process of choosing the target speed and refer readers to [2] for further details.]
Next, the BS will aggregate all the received local model parameters to update the global model parameters, which are then sent back to all CAVs over the downlink broadcast channel. This FL process is repeated over uplink-downlink channels, and the local and global ANN models are sequentially updated in the following communication rounds. Ultimately, the ANN model parameters used by the CAVs will converge to the optimal model after solving the following optimization problem that captures the FL training process [23]:

$$\arg\min_{\mathbf{w}^{(1)},\dots,\mathbf{w}^{(N)} \in \mathbb{R}^d} \sum_{n=1}^{N} \sum_{i=1}^{s_n} \frac{1}{s_N} f_n(\mathbf{w}^{(n)}, \xi_i), \qquad (2)$$
$$\text{s.t.} \quad \mathbf{w}^{(1)} = \mathbf{w}^{(2)} = \dots = \mathbf{w}^{(N)} = \mathbf{w}, \qquad (3)$$

where $s_N = \sum_{n \in \mathcal{N}} s_n$ is the size of the entire training data of all CAVs, with $s_n$ being the size of the local data at CAV $n$. $f_n(\mathbf{w}^{(n)}, \xi_i)$ is the loss function of CAV $n$ when using the ANN model parameters $\mathbf{w}^{(n)}$ in the auto-tuning unit for the selected data $\xi_i$. Note that the loss function plays a pivotal role in determining the performance of the trained auto-tuning unit. The loss function used for the controller design can be either convex [24] or non-convex [25]. We assume $f(\mathbf{w})$ to be the value of the objective function in (2) when $\mathbf{w}^{(n)} = \mathbf{w}$, $n \in \mathcal{N}$.

When training the local ANN models at CAVs, we can calculate the energy consumption for CAV $n \in \mathcal{N}$ in each communication round as $E_{n,\mathrm{comp}} = \kappa c \varphi^2 \bar{s} I$, where $\kappa$ is the energy consumption coefficient that depends on the computing system and $\bar{s}$ is the size of the training data at each local iteration. $c$ is the number of computing cycles needed per bit, and $\varphi$ is the frequency of the central processing unit (CPU) clock of the CAV. Accordingly, we can obtain the computing delay as $t_{n,\mathrm{comp}} = I\bar{s}c/\varphi$. Due to the mobility of CAVs and the wireless fading channels, some CAVs cannot finish their local training and uplink transmission within the duration $\bar{t}$ of the communication round.
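As an illustration of the computing cost model, the following sketch evaluates the per-round local-training energy and delay for one CAV. The quadratic dependence on the CPU clock in the energy term follows the common dynamic-power model and is an assumption here, and all constants are illustrative placeholders rather than values from the paper.

```python
# Per-round local-training cost for one CAV, following the Section II-B model:
# energy = κ·c·φ²·s̄·I and delay = I·s̄·c/φ. Numbers are illustrative only.

KAPPA = 1e-28          # energy-consumption coefficient κ (device dependent)
CYCLES_PER_BIT = 40    # CPU cycles c needed per bit of training data
CPU_FREQ = 2e9         # CPU clock φ in Hz
BATCH_BITS = 5e5       # training-data size s̄ per local iteration, in bits

def local_training_cost(iterations):
    """Return (energy in joules, delay in seconds) for I local SGD iterations."""
    energy = KAPPA * CYCLES_PER_BIT * CPU_FREQ**2 * BATCH_BITS * iterations
    delay = iterations * BATCH_BITS * CYCLES_PER_BIT / CPU_FREQ
    return energy, delay

e_comp, t_comp = local_training_cost(iterations=5)
```

Both quantities scale linearly in the number of local iterations $I$, which is why Corollary 3 later treats $I_n$ as a per-CAV knob for trading energy against convergence.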
With this in mind, next, we present the communication model used to determine whether the locally trained model at a particular CAV can be used in the model aggregation or not.

C. Communication Model
For the uplink transmissions, we consider an orthogonal frequency-division multiple access (OFDMA) scheme where each CAV in set $\mathcal{N}$ will use a unique orthogonal resource block to transmit the trained ANN model parameters to the BS. The uplink data rate for the link between a CAV $n \in \mathcal{N}$ and the BS will be

$$r_n = B \log_2\left(1 + \frac{P_n h_n d_n^{-\alpha}}{\delta_n + B N_0}\right), \qquad (4)$$

where $B$ is the bandwidth of each resource block, $P_n$ is the transmit power of CAV $n$, and $h_n$ denotes the Rayleigh fading channel gain. Moreover, $d_n$ is the distance between CAV $n$ and the BS, $\alpha$ is the path-loss exponent, and $N_0$ is the noise power spectral density. In addition, $\delta_n = \sum_{j \notin \mathcal{N}} P_j h_j d_j^{-\alpha}$ is the received interference power generated by CAVs in other cells that share the same resource block with CAV $n$. From (4), the uplink transmission delay for CAV $n \in \mathcal{N}$ can be calculated as $t_{n,\mathrm{comm}} = s(\mathbf{w}^{(n)})/r_n$, where $s(\mathbf{w}^{(n)})$ is the size of the data packet that depends on the trained model parameters $\mathbf{w}^{(n)}$ transmitted by CAV $n$. The uplink energy consumption is $E_{n,\mathrm{comm}} = P_n \hat{t}$, where $\hat{t}$ is calculated as the product of the total number of data symbols and the symbol duration.

In the downlink, since the BS can have a higher transmit power and a larger bandwidth, the downlink transmission delay is considered to be negligible compared to the uplink transmission delay, as assumed in [26]. In addition, given the higher computing power of BSs, the computing delay at the BS can be ignored. Hence, to identify whether the local learning model update from CAV $n \in \mathcal{N}$ can be used for the model aggregation at the BS, we can compare the time for uplink transmission and local computing at the CAV with the duration $\bar{t}$ of the communication round. In this case, the probability that CAV $n \in \mathcal{N}$ participates at communication round $t$ of FL (i.e., the locally trained model at CAV $n$ is used in the model aggregation) will be given by $p_{n,t} = \mathbb{P}(t_{n,\mathrm{comp}} + t_{n,\mathrm{comm}} \leq \bar{t})$.
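The participation probability admits a closed form under the model above if the Rayleigh-faded channel power gain is taken to be exponentially distributed with unit mean: the uplink succeeds when the gain exceeds the SNR threshold needed to deliver the model bits in the time left after local training. A sketch, with illustrative parameter values:

```python
import math

# Closed-form sketch of p_{n,t} = P(t_comp + t_comm <= t̄) under rate model (4),
# assuming the Rayleigh channel power gain h_n ~ Exp(1). Values illustrative.

def participation_probability(p_tx, dist, alpha, bandwidth, noise_psd,
                              interference, model_bits, round_len, t_comp):
    """Probability that local training plus uplink transfer fit in the round."""
    budget = round_len - t_comp            # time left for the uplink transfer
    if budget <= 0:
        return 0.0
    # SNR needed so that B*log2(1 + SNR*h) delivers model_bits within budget.
    snr_needed = 2.0 ** (model_bits / (bandwidth * budget)) - 1.0
    mean_snr = p_tx * dist ** (-alpha) / (interference + bandwidth * noise_psd)
    # For h_n ~ Exp(1):  P(h_n >= x) = exp(-x)
    return math.exp(-snr_needed / mean_snr)

p = participation_probability(p_tx=0.2, dist=100.0, alpha=2.0,
                              bandwidth=1e6, noise_psd=4e-21,
                              interference=1e-13, model_bits=1e6,
                              round_len=0.5, t_comp=0.1)
```

Larger models, longer local training, or a farther BS all shrink the time/SNR budget and hence lower $p_{n,t}$, which is exactly the varying participation the DFP algorithm accounts for.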
When developing the FL framework for the CAV's controller design, we need to address a number of challenges. The first challenge is that the BS can only aggregate a varying subset of CAVs to update the global model at each communication round, as a result of the mobility of the CAVs and the uncertainty of wireless channels. A fast convergence for the controller design will be challenging to achieve when the participation of the CAVs in the FL process varies over time [27]. Meanwhile, as the local data is generated under various traffic scenarios and road incidents, its distribution and size will be different across CAVs. Hence, the second challenge will be mitigating the impact of the non-IID and unbalanced local data on the convergence of the controller design. In the following section, we will propose a novel FL algorithm to tackle these two challenges.

Moreover, due to the energy cost in the model training and uplink transmission, another challenge will be designing an incentive mechanism that encourages CAVs to participate in the proposed FL algorithm. However, to improve the convergence performance, the incentive mechanism should only motivate a subset of CAVs which can improve the convergence process of the controller design, while preventing other CAVs that impede the convergence from engaging in the FL process. Such an incentive mechanism is of great importance for enabling CAVs to quickly adapt to the local traffic dynamics when exploiting our proposed FL algorithm. Next, to address this challenge, we will use the insights obtained from the convergence study of the proposed FL algorithm and design a contract-theoretic incentive mechanism.

III. DYNAMIC FEDERATED PROXIMAL ALGORITHM FOR CAV CONTROLLER DESIGN
To address the challenges imposed by the varying CAVs' participation in the learning process and the non-IID and unbalanced data, we propose a new DFP algorithm. In particular, we study how the mobility of the CAVs, wireless fading channels, and the diverse local data affect the convergence of the learning model. Here, we will first introduce the proposed DFP algorithm and then study its convergence.
A. Proposed Dynamic Federated Proximal Algorithm
The proposed algorithm is summarized in Algorithm 1. In particular, we assume that the CAVs will run $I$ iterations of stochastic gradient descent (SGD) at each round. In each iteration of SGD, CAV $n \in \mathcal{N}$ will solve the following optimization problem that minimizes the sum of the loss of a randomly selected local training sample $\xi \in \mathcal{S}_n$ and an L2 regularizer:

$$\arg\min_{\mathbf{w} \in \mathbb{R}^d} f_n(\mathbf{w}, \xi) + \frac{\gamma_t}{2}\|\mathbf{w} - \mathbf{w}_t\|^2, \quad \xi \in \mathcal{S}_n, \qquad (5)$$

where $\gamma_t$ is the coefficient of the regularizer and $\mathbf{w}_t$ captures the received learning model parameters from the BS at communication round $t$. Different from the FedAvg algorithm [18], we introduce the L2 regularizer to guarantee that the trained model parameters $\mathbf{w}$ of CAV $n \in \mathcal{N}$ will be close to $\mathbf{w}_t$ during the local training, reducing the variance introduced by the non-IID and unbalanced data. Meanwhile, in contrast to popular FL algorithms, such as FedProx [28], we explicitly consider the impact of the CAVs' mobility and the uncertainty of wireless channels and model the participation probability as a dynamic variable for each CAV at each communication round. After $I$ iterations of SGD at communication round $t$, we obtain the trained model parameters of CAV $n$ as follows:

$$\mathbf{w}^{(n)}_{t+1,I} = \mathbf{w}_t - \eta_t \sum_{i=0}^{I-1} \left(\nabla f_n(\mathbf{w}^{(n)}_{t,i}, \xi_i) + \gamma_t (\mathbf{w}^{(n)}_{t,i} - \mathbf{w}_t)\right), \qquad (6)$$

where $\mathbf{w}^{(n)}_{t,0} = \mathbf{w}_t$, $n \in \mathcal{N}$.

B. Convergence of the Proposed DFP Algorithm
Next, we perform a convergence study to determine how fast CAVs converge to using the optimal model in (2) when exploiting the DFP algorithm. Unlike the convergence studies done by existing works such as [18] and [28], we need to consider how both the dynamic participation probability of CAVs and the L2 regularizer in the local training affect the convergence. To this end, we make the following standard assumptions:
• The gradient $\nabla f_n(\mathbf{w})$, $n \in \mathcal{N}$, is uniformly Lipschitz continuous in terms of $\mathbf{w}$ with positive parameter $L$.
• The variance of SGD with respect to the full gradient descent of each CAV $n \in \mathcal{N}$ is upper bounded as $\mathbb{E}_{\xi \in \mathcal{S}_n}\|\nabla f_n(\mathbf{w}, \xi) - \nabla f_n(\mathbf{w})\|^2 \leq \sigma^2$, $\forall n \in \mathcal{N}$, $\forall \mathbf{w} \in \mathbb{R}^d$, where $\sigma^2$ is the upper bound.
Both assumptions are commonly used in the convergence study of machine learning algorithms (e.g., see [29]). The first assumption can be satisfied by some popular loss functions used in control theory, such as the squared error loss function. The second assumption is often adopted in stochastic optimization, where the gradient estimator is always assumed to have a bounded variance. In the autonomous controller design problem, the second assumption can be justified by the fact that CAVs have limited acceleration and deceleration capabilities.

Algorithm 1 Dynamic Federated Proximal (DFP) Algorithm
Input: $\mathcal{N}$, $\mathcal{N}_t$, $\mathcal{S}_n$, $\eta_t$, $\mathbf{w}_0$, $I$, $\bar{t}$, $\gamma_t$, $s_n$, $n = 1, \dots, N$
Output: ANN-based auto-tuning unit $\mathbf{w}$ for the CAV's controller
for $t = 0, \dots, T-1$ do
  1. The BS sends $\mathbf{w}_t$ to all CAVs over broadcast downlink channels.
  2. CAV $n \in \mathcal{N}$ updates $\mathbf{w}_t$ for $I$ iterations of SGD with step size $\eta_t$ as in (6) and obtains $\mathbf{w}^{(n)}_{t+1,I}$, which is sent to the BS.
  3. Due to the mobility and wireless fading channels, the BS can only aggregate the trained model parameters from a subset $\mathcal{N}_t$ of $N_t$ CAVs and updates the global model parameters as $\mathbf{w}_{t+1} = \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \mathbf{w}^{(n)}_{t+1,I}$, with $s_{N_t} = \sum_{n \in \mathcal{N}_t} s_n$.
end for

Using these two assumptions, we can bound the expected loss function at communication round $t+1$ as shown by the following theorem.

Theorem 1.
Given that the BS sends the global learning model parameters $\mathbf{w}_t$ to all CAVs at communication round $t$, an upper bound for the expected loss function at communication round $t+1$ can be written as

$$\mathbb{E}_{\xi,n}(f(\mathbf{w}_{t+1})) \leq f(\mathbf{w}_t) - (\eta_t + \gamma_t \eta_t^2) \frac{\sum_{n=1}^{N} p_{n,t} s_n I \|\nabla f_n(\mathbf{w}_t)\|^2}{s_N \sum_{j=1}^{N} p_{j,t} s_j} + \left(\frac{\eta_t^2}{2 s_N} L \eta_t I^2 + \frac{\eta_t^2 \gamma_t}{2 s_N}\left(I + I^2 (1+\eta_t)^2\right) + \frac{L \eta_t^2 I}{2}\right) \frac{\sum_{n=1}^{N} p_{n,t} s_n}{\sum_{j=1}^{N} p_{j,t} s_j} \sigma^2, \qquad (7)$$

if the following two conditions are satisfied:
$$L \eta_t I + \gamma_t I (1+\eta_t)^2 + 2 s_N L \eta_t I \leq 1, \qquad (8)$$
$$L \eta_t \gamma_t I + \gamma_t^2 \eta_t I + 2 s_N \eta_t \gamma_t L I \leq 1, \qquad (9)$$

where $p_{n,t} = \exp\left(-\frac{\delta_n + B N_0}{P_n d_n^{-\alpha}}\left(2^{\frac{s(\mathbf{w}^{(n)}_t)}{B(\bar{t} - I\bar{s}c/\varphi)}} - 1\right)\right)$.

Proof: The proof is provided in Appendix A. ∎
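The per-round procedure of Algorithm 1, i.e., local proximal SGD following (6) and then size-weighted aggregation over the randomly participating subset, can be simulated on a toy problem. The scalar quadratic losses, data sizes, participation probabilities, and hyper-parameters below are illustrative assumptions, not values from the paper:

```python
import random

# Minimal simulation of one DFP communication round (Algorithm 1) on a toy
# scalar quadratic loss f_n(w) = 0.5*(w - c_n)^2, whose gradient is (w - c_n).

def local_dfp_update(w_global, c_n, eta, gamma, iters):
    """I proximal-SGD steps: w <- w - η(∇f_n(w) + γ(w - w_t)), as in (6)."""
    w = w_global
    for _ in range(iters):
        grad = (w - c_n) + gamma * (w - w_global)
        w -= eta * grad
    return w

def dfp_round(w_global, optima, sizes, probs, eta=0.1, gamma=0.5, iters=5):
    """One round: local training at every CAV, aggregation over participants."""
    participants = [n for n, p in enumerate(probs) if random.random() < p]
    if not participants:
        return w_global                    # no model update this round
    local_models = {n: local_dfp_update(w_global, optima[n], eta, gamma, iters)
                    for n in participants}
    s_total = sum(sizes[n] for n in participants)
    return sum(sizes[n] / s_total * local_models[n] for n in participants)

random.seed(0)
w = 0.0
optima = [4.0, 6.0, 5.0]                   # per-CAV loss minimizers c_n
sizes = [100, 300, 200]                    # local dataset sizes s_n
for _ in range(50):
    w = dfp_round(w, optima, sizes, probs=[0.9, 0.8, 0.7])
# w settles inside the convex hull of the per-CAV optima, near their
# size-weighted mean; the proximal term keeps each local model near w_t.
```

With the regularizer active, each local update contracts toward a blend of the local optimum and the received global model, which is the variance-reduction effect the L2 term is introduced for.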
Using Theorem 1, we can calculate how much the total loss decreases between two consecutive communication rounds and determine the speed with which the model converges to the optimal auto-tuning model in (2). In particular, as observed from Theorem 1, the convergence speed depends on the participation probability $p_{n,t}$, $n \in \mathcal{N}$. Such participation probability depends on the quality of the wireless fading channels and the distance between the CAVs and the server, as determined by the mobility of the CAVs. In addition, to identify how the participation of a particular CAV in the FL affects the convergence, we also need to consider the size and distribution of the local data at the CAVs. To do so, in the following corollary, we will first mathematically define the local data quality of CAVs and then study the impact of local data quality on the convergence of the learning models.

Corollary 1.
Given the conditions in (8) and (9), the local data quality of CAV $n \in \mathcal{N}$ can be defined as
$$\beta_n = s_n \left[\left(\frac{\eta_t}{s_N} + \frac{\gamma_t \eta_t^2}{s_N}\right) I \|\nabla f_n(\mathbf{w}_t)\|^2 - \left(\frac{\eta_t^2}{2 s_N} L \eta_t I^2 + \frac{L \eta_t^2 I}{2} + \frac{\eta_t^2 \gamma_t}{2 s_N}\left(I + I^2(1+\eta_t)^2\right)\right) \sigma^2\right].$$
The set $\mathcal{N}$ can be divided into two subsets $\mathcal{N}^{(1)}$ and $\mathcal{N}^{(2)}$ with negative and positive data quality, respectively. In this case, the result in (7) can be simplified as
$$f(\mathbf{w}_t) - \mathbb{E}_{\xi,n}(f(\mathbf{w}_{t+1})) \geq \frac{\sum_{n \in \mathcal{N}^{(1)}} p_{n,t} \beta_n}{\sum_{j=1}^{N} p_{j,t} s_j} + \sum_{n \in \mathcal{N}^{(2)}} \frac{p_{n,t} \beta_n}{s_N}. \qquad (10)$$
The proof is provided in Appendix B. ∎
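Corollary 1's sign test (a CAV helps convergence exactly when its data-quality measure is positive) can be sketched numerically. The helper below follows the reconstructed form of $\beta_n$ above; the function names and the exact grouping of the constants are illustrative rather than taken from the paper's implementation:

```python
def data_quality(s_n, grad_norm_sq, sigma_sq, eta, gamma, L, I, s_N):
    """Per-CAV data quality beta_n (reconstructed form from Corollary 1).

    Positive beta_n -> participation speeds up convergence;
    negative beta_n -> participation slows convergence down.
    """
    gain = (eta / s_N + gamma * eta / s_N) * I * grad_norm_sq
    noise = (eta / s_N * L * eta * I + L * eta**2 * I) * sigma_sq \
            + eta * gamma / s_N * (I + I * (1 + eta)) * sigma_sq
    return s_n * (gain - noise)

def partition_cavs(betas):
    """Split CAV indices into N^(1) (negative quality) and N^(2) (non-negative)."""
    n1 = [n for n, b in enumerate(betas) if b < 0]
    n2 = [n for n, b in enumerate(betas) if b >= 0]
    return n1, n2
```

CAVs in `n1` correspond to $\mathcal{N}^{(1)}$ and impede the FL convergence, while those in `n2` correspond to $\mathcal{N}^{(2)}$ and improve it; intuitively, a CAV with a large local gradient (informative data) and a large sample size ends up with a positive $\beta_n$.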
According to Corollary 1, the local data quality of a CAV $n \in \mathcal{N}$ can be calculated based on the size $s_n$ of its local data samples and the loss function $f_n(\boldsymbol{w}_t)$. Also, from Corollary 1, we observe that the participation of CAVs within the subset $\mathcal{N}^{(1)}$ in the FL will impede the convergence, whereas the participation of CAVs from subset $\mathcal{N}^{(2)}$ will improve the FL convergence. In other words, depending on the value of the data quality $\beta_n$, $n \in \mathcal{N}$, the convergence gain contributed by different CAVs can be negative or positive. In the following corollary, we also extend Theorem 1 to the case in which the vanilla FedAvg is used for the autonomous controller design.

Corollary 2.
When using the FedAvg algorithm, i.e., with no $L_2$ regularizer in each SGD step, we can obtain the following upper bound for the expected loss:
$$\mathbb{E}_{\xi,n}\big(f(\boldsymbol{w}_{t+1})\big) \le f(\boldsymbol{w}_t) - \frac{\eta_t}{s_N} \frac{\sum_{n=1}^{N} p_{n,t}\, s_n I\, \|\nabla f_n(\boldsymbol{w}_t)\|^2}{\sum_{j=1}^{N} p_{j,t}\, s_j} + \Big(\frac{\eta_t}{s_N} L \eta_t I + L \eta_t^2 I\Big) \frac{\sum_{n=1}^{N} p_{n,t}\, s_n}{\sum_{j=1}^{N} p_{j,t}\, s_j}\, \sigma^2,$$
if $L \eta_t I + \frac{2}{s_N} L \eta_t I \le 1$.

Proof:
We can set $\gamma_t = 0$ in Theorem 1 to obtain the bound. ∎

By comparing Theorem 1 and Corollary 2, we can prove that, when both constraints (8) and (9) are satisfied, the proposed DFP algorithm achieves a smaller upper bound on the expected loss than FedAvg. In other words, the proposed DFP can achieve a faster convergence for the controller design in comparison to the FedAvg algorithm, leading to a faster adaptation to the traffic dynamics for CAVs. In the following corollary, we consider a variant of Theorem 1 where the number of local training iterations varies among CAVs.

Corollary 3.
To minimize the energy spent on model training, CAVs can dynamically adjust the number of iterations $I_n$, $n \in \mathcal{N}$, of the local SGD performed at each communication round. In this case, we can obtain
$$f(\boldsymbol{w}_t) - \mathbb{E}_{\xi,n}\big(f(\boldsymbol{w}_{t+1})\big) \ge \frac{\sum_{n=1}^{N} p_{n,t}\, s_n \Big[-\Big(\frac{\eta_t \gamma_t}{s_N}(1+\eta_t)\, \sigma^2 + \frac{\eta_t}{s_N} L \eta_t\, \sigma^2\Big) I_n\Big]}{\sum_{j=1}^{N} p_{j,t}\, s_j} + \frac{\sum_{n=1}^{N} p_{n,t}\, s_n \Big[\Big(\frac{\eta_t}{s_N} + \frac{\gamma_t \eta_t}{s_N}\Big) \|\nabla f_n(\boldsymbol{w}_t)\|^2 - \frac{\eta_t \gamma_t}{s_N}\, \sigma^2 - L \eta_t^2\, \sigma^2\Big] I_n}{\sum_{j=1}^{N} p_{j,t}\, s_j}. \tag{11}$$

Proof:
We replace $I$ with $I_n$, $n \in \{1, ..., N\}$, in (7) and simplify the result to obtain (11). ∎

Corollary 3 is useful for applications with stringent energy constraints, such as electric CAVs. Also, Corollary 3 can provide guidelines on how to choose the number of iterations of local SGD that should be performed at each CAV so as to facilitate the convergence to the optimal controller model. In summary, in this section, we designed the DFP algorithm to tackle the challenges of non-IID and unbalanced data and the varying participation of CAVs in the learning process when using FL for the autonomous controller. We further proved the convergence and theoretically studied how the data quality, mobility, wireless fading channels, and number of local training iterations affect the overall convergence. Based on these insights, next, we will design a contract-theory based incentive mechanism to further improve the convergence performance of the proposed DFP algorithm.

IV. CONTRACT-THEORY BASED INCENTIVE MECHANISM DESIGN
If the data quality $\beta_n$, $n \in \mathcal{N}$, of the CAVs were known, the server could determine whether each CAV improves or impedes the FL convergence. However, due to the information asymmetry between the server and the CAVs, the server cannot obtain the needed information on the distribution of the local data at each CAV, let alone the data quality. To address this information asymmetry, we use the framework of contract theory [30] to design an efficient incentive mechanism for the FL-based autonomous controller design, where the parameter server and the CAVs are modeled as, respectively, the employer and the employees in a labor market. Contract theory is apropos here because the parameter server can avoid iterative communications with the CAVs and increase its utility by letting the CAVs instantly choose from a limited number of designed contracts. There are other conventional approaches to designing an incentive mechanism, but, unlike the proposed contract-theoretic approach, they are not suitable for the CAV controller design. For example, a deep reinforcement learning approach [13] would take a long time to converge to an effective incentive mechanism, inevitably delaying the controller training process and jeopardizing the CAVs' operation. Another alternative is a Stackelberg game [14]. However, in a game setting, each CAV seeks to maximize its own individual utility and, thus, such a strategy may not maximize the parameter server's utility, as done in the proposed contract-based approach. As will be evident from the discussion below, the utility of the parameter server is modeled as the convergence gain of the learning process, and maximizing this utility is the key goal of our problem. Hence, to avoid long delays and to improve the FL convergence, we prefer contract theory over the alternatives.
In the designed contract, the parameter server groups CAVs into different types according to the data quality $\beta_n$, $n \in \mathcal{N}$, and then designs a unique contract for each type of CAV. In this case, when faced with a list of contracts offered by the parameter server, each CAV will self-reveal the type of its local data quality by choosing the contract designed for its type. Since the data quality determines how the CAVs impact the FL convergence, the designed contract can improve the convergence of the FL-based controller to the optimal CAV controller. Next, we will define the utility functions for the parameter server and the CAVs and design the contract for the FL-based autonomous controller design.

A. Utility Function of the Parameter Server
From Corollary 1, we can obtain a modified data quality as $\theta_n = \beta_n / s_N$, $n \in \mathcal{N}$. Based on the modified data quality, we assume that all CAVs in set $\mathcal{N}^{(2)}$ can be categorized into $M$ types sorted in ascending order: $0 < \theta_1 \le ... \le \theta_M$. For CAVs in the set $\mathcal{N}^{(1)}$, the corresponding type is denoted as type $0$ with $\theta_0 = 0$. Clearly, for CAVs belonging to a higher type, the data quality is better, and their participation in the FL can expedite the convergence to the optimal autonomous controller model used by the CAVs. While the parameter server cannot identify the type of a CAV $n \in \mathcal{N}$, we assume that the parameter server knows the probability $\bar{p}_m$ that a CAV belongs to type $m \in \{1, ..., M\}$, based on historical data and previous observations, as considered in [15]–[17].

To achieve the self-revealing property, the parameter server will design the contract, i.e., the resource-reward bundle, for each type of CAV. In particular, to compensate for the energy consumption spent on the uplink transmission and local training, the resource-reward bundle for CAVs of type $m \in \{1, ..., M\}$ can be written as $(P_m, R_m)$, where $R_m$ is the reward to the CAVs with an uplink transmit power $P_m$. Since CAVs belonging to subset $\mathcal{N}^{(1)}$ actually impede the FL convergence, the parameter server will not give them any compensation, i.e., $R_0 = 0$. This zero compensation results in the unwillingness of those CAVs to participate in the FL process, leading to $P_0 = 0$. However, when incentivizing CAV $n \in \mathcal{N}^{(2)}$ of type $m \in \{1, ..., M\}$ to join the FL aggregation, the utility function of the parameter server at communication round $t$ will be
$$U_{ps}(m) = u_1 \exp\Big(-\frac{\delta_n + B N_0}{P_m d_n^{-\alpha}}\Big(2^{\frac{s(\boldsymbol{w}_t^{(n)})}{B(\bar{t} - I \bar{s} c \varphi^{-1})}} - 1\Big)\Big) \theta_m - u_2 R_m,$$
where $u_1$ captures the valuation factor for the convergence gain brought by the participation of the CAVs and $u_2$ is the unit cost of providing a reward to the CAVs.
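The per-type utility above can be evaluated numerically. The sketch below computes the utility averaged over the CAVs and the type prior; the function and parameter names are illustrative, and the closed form follows the reconstruction of $U_{ps}(m)$ given above (with `A[n]` collecting the rate term $2^{s(\boldsymbol{w}_t^{(n)})/(B(\bar{t}-I\bar{s}c\varphi^{-1}))}-1$):

```python
import math

def server_utility(p_bar, thetas, rewards, powers, u1, u2,
                   deltas, B, N0, A, d, alpha):
    """Average parameter-server utility over CAVs and contract types.

    p_bar[m]: prior probability of type m+1; thetas/rewards/powers: the
    contract menu; deltas[n], A[n], d[n]: per-CAV interference offset,
    rate term A_n, and distance to the server.
    """
    total = 0.0
    for n in range(len(deltas)):
        for m in range(len(p_bar)):
            # probability that a type-(m+1) CAV's upload succeeds
            success = math.exp(-(deltas[n] + B * N0) * A[n]
                               / (powers[m] * d[n] ** (-alpha)))
            # convergence gain of a successful upload minus the reward cost
            total += p_bar[m] * (u1 * success * thetas[m] - u2 * rewards[m])
    return total
```

Because the upload-success probability grows with the transmit power, the server's utility is increasing in each $P_m$ for a fixed reward assignment; this monotonicity is the property exploited in the proof of Theorem 2 below.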
As the CAVs in $\mathcal{N}^{(1)}$ are sorted into type $0$ with reward $R_0 = 0$ and transmit power $P_0 = 0$, the average utility for the parameter server at communication round $t$ can be written as
$$U_{ps} = \sum_{n=1}^{N} \sum_{m=1}^{M} \bar{p}_m \Big[u_1 \exp\Big(-\frac{\delta_n + B N_0}{P_m d_n^{-\alpha}}\Big(2^{\frac{s(\boldsymbol{w}_t^{(n)})}{B(\bar{t} - I \bar{s} c \varphi^{-1})}} - 1\Big)\Big) \theta_m - u_2 R_m\Big]. \tag{12}$$

B. Utility Function of the CAVs
For the CAVs, the reward received from the parameter server will be used to compensate for the energy consumption spent on local model training and uplink transmission. The utility of a CAV of type $m \in \{1, ..., M\}$ is thereby obtained as
$$U_{CAV}(m) = \theta_m R_m - u_3 (\kappa c \varphi^2 \bar{s} I + P_m \hat{t}), \tag{13}$$
where $u_3$ is the unit cost of the energy consumption.

C. Contract Design
With the utility functions obtained in (12) and (13), respectively, for the parameter server and the CAVs, we can design the optimal contract that maximizes the utility, i.e., the convergence gain between two consecutive communication rounds, at the parameter server. In particular, to design a feasible contract for the autonomous controllers, two constraints must be satisfied. First, the designed contract must meet the individual rationality (IR) constraint, whereby every CAV is rational and will not accept a contract with a negative utility [30]. That is,
$$U_{CAV}(m) = \theta_m R_m - u_3 (\kappa c \varphi^2 \bar{s} I + P_m \hat{t}) \ge 0, \quad m \in \{1, ..., M\}. \tag{14}$$
For CAVs of type $0$, since $R_0 = 0$, the CAVs will not train their local controller model and will not participate in the uplink transmission, justifying $P_0 = 0$. Moreover, for a feasible contract, we must impose an incentive compatibility (IC) constraint ensuring that each type of CAV always prefers the contract designed for its type over the contracts for other types [30]. In particular, the IC constraints for contract types $m$ and $\hat{m}$, $\forall m, \hat{m} \in \{1, ..., M\}$, will be
$$\theta_m R_m - u_3 (\kappa c \varphi^2 \bar{s} I + P_m \hat{t}) \ge \theta_m R_{\hat{m}} - u_3 (\kappa c \varphi^2 \bar{s} I + P_{\hat{m}} \hat{t}), \quad \forall m, \hat{m} \in \{1, ..., M\}. \tag{15}$$
According to (14) and (15), we can further simplify the IR and IC constraints and obtain the following five conditions for a feasible contract.

Lemma 1.
The designed contract $(P_m, R_m)$, $m \in \{1, ..., M\}$, will be feasible if and only if the following five conditions are satisfied:
$$\sum_{m=1}^{M} \bar{p}_m R_m \le R_{total}, \tag{16}$$
$$0 \le R_1 \le ... \le R_m \le ... \le R_M, \tag{17}$$
$$0 \le P_1 \le ... \le P_m \le ... \le P_M \le P_{max}, \tag{18}$$
$$\theta_1 R_1 - u_3 (\kappa c \varphi^2 \bar{s} I + P_1 \hat{t}) \ge 0, \tag{19}$$
$$\theta_{m-1} (R_m - R_{m-1}) \le u_3 \hat{t} (P_m - P_{m-1}) \le \theta_m (R_m - R_{m-1}), \quad m \in \{2, ..., M\}, \tag{20}$$
where $R_{total}$ is the total reward at the parameter server and $P_{max}$ denotes the maximum transmit power of the CAVs.

The condition in (16) stems from the fact that the parameter server has a limited reward budget to offer in a contract. The proofs for the conditions in (17)-(20) are similar to [30]. Based on the utility function defined in (12) and the conditions presented in Lemma 1, we can formulate the contract design as an optimization problem whose goal is to maximize the average utility at the parameter server, as follows:
$$\max_{(P_m, R_m),\, m \in \{1, ..., M\}} \; \sum_{n=1}^{N} \sum_{m=1}^{M} \bar{p}_m \Big(u_1 \exp\Big(-\frac{(\delta_n + B N_0) A_n}{P_m d_n^{-\alpha}}\Big) \theta_m - u_2 R_m\Big) \tag{21}$$
$$\text{s.t. } (16), (17), (18), (19), (20),$$
where $A_n = 2^{\frac{s(\boldsymbol{w}_t^{(n)})}{B(\bar{t} - I \bar{s} c \varphi^{-1})}} - 1$. Due to the non-concave objective function and the complex constraints, directly solving the optimization problem in (16)-(21) is challenging. Alternatively, we will use a sequential method where the optimal power allocation is first determined in terms of the reward assignment, and the optimal reward assignment for each data quality type is then derived. In the following theorem, we study the optimal power allocation when the reward assignment is given.

Theorem 2.
Given a reward assignment $\boldsymbol{R} = (R_1, ..., R_M)$ that satisfies conditions (16) and (17), the power allocation $\boldsymbol{P}^* = (P_1^*, ..., P_M^*)$ that maximizes the average utility at the parameter server will be
$$P_m^* = \frac{\theta_1 R_1 - u_3 \kappa c \varphi^2 \bar{s} I}{u_3 \hat{t}} + \sum_{k=1}^{m} \rho_k, \quad m \in \{1, ..., M\}, \tag{22}$$
where $\rho_k = 0$ if $k = 1$; otherwise, $\rho_k = \frac{\theta_k (R_k - R_{k-1})}{u_3 \hat{t}}$.

Proof:
To prove the optimality of the solution in (22), we proceed by contradiction. In particular, we assume there exists another feasible contract $(\boldsymbol{P}', \boldsymbol{R})$ that achieves a higher average utility for the parameter server than the contract $(\boldsymbol{P}^*, \boldsymbol{R})$. Since the utility function at the parameter server is an increasing function of the transmit power, there will be at least one type, say type $\hat{m} \in \{1, ..., M\}$, of CAVs with $P'_{\hat{m}} > P^*_{\hat{m}}$. Here, we consider the two cases $\hat{m} = 1$ and $\hat{m} \neq 1$. When $\hat{m} = 1$, $P'_1 > P^*_1$. As defined in (22), $\theta_1 R_1 - u_3 (\kappa c \varphi^2 \bar{s} I + P^*_1 \hat{t}) = 0$. When the CAVs belonging to type 1 are assigned a power $P'_1 > P^*_1$, we have $\theta_1 R_1 - u_3 (\kappa c \varphi^2 \bar{s} I + P'_1 \hat{t}) < 0$, violating the contract feasibility condition (19). When $\hat{m} \neq 1$, we have $P'_{\hat{m}} > P^*_{\hat{m}}$. From condition (20), the feasible contract $(\boldsymbol{P}', \boldsymbol{R})$ will satisfy
$$u_3 \hat{t} (P'_{\hat{m}} - P'_{\hat{m}-1}) \le \theta_{\hat{m}} (R_{\hat{m}} - R_{\hat{m}-1}). \tag{23}$$
Using the definition of $P_m^*$, $m \in \{1, ..., M\}$, in (22), the values of $R_{\hat{m}}$ and $R_{\hat{m}-1}$ satisfy
$$R_{\hat{m}} - R_{\hat{m}-1} = \frac{u_3 \hat{t} (P^*_{\hat{m}} - P^*_{\hat{m}-1})}{\theta_{\hat{m}}}. \tag{24}$$
Based on (24), we can simplify (23) and obtain $P'_{\hat{m}} - P^*_{\hat{m}} \le P'_{\hat{m}-1} - P^*_{\hat{m}-1}$. As $P'_{\hat{m}} > P^*_{\hat{m}}$, we have $P'_{\hat{m}-1} > P^*_{\hat{m}-1}$. Iterating this argument down to type 1, the transmit power allocated to the type 1 CAVs in $(\boldsymbol{P}', \boldsymbol{R})$ must exceed the one in $(\boldsymbol{P}^*, \boldsymbol{R})$, i.e., $P'_1 > P^*_1$, which was shown above to violate the feasibility condition (19). Hence, no feasible contract achieves a better average utility at the parameter server than the contract $(\boldsymbol{P}^*, \boldsymbol{R})$. In other words, for a given reward assignment $\boldsymbol{R}$, the power allocation in the optimal contract is given by (22). ∎
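The sequential construction in Theorem 2 (tighten the lowest type's IR constraint, then add the increments $\rho_k$) can be sketched as follows, together with a check of the feasibility conditions (16)-(20) of Lemma 1. Function names and the energy shorthand `E_cmp` (standing for $\kappa c \varphi^2 \bar{s} I$) are illustrative assumptions, not the paper's implementation:

```python
def optimal_powers(thetas, rewards, u3, t_hat, E_cmp):
    """Power allocation following (22): P*_1 makes the type-1 IR constraint
    tight, and each higher type adds rho_k = theta_k*(R_k - R_{k-1})/(u3*t_hat)."""
    powers = [(thetas[0] * rewards[0] - u3 * E_cmp) / (u3 * t_hat)]
    for k in range(1, len(rewards)):
        rho_k = thetas[k] * (rewards[k] - rewards[k - 1]) / (u3 * t_hat)
        powers.append(powers[-1] + rho_k)
    return powers

def is_feasible(p_bar, thetas, rewards, powers, u3, t_hat, E_cmp,
                R_total, P_max):
    """Check the five feasibility conditions (16)-(20) of Lemma 1."""
    M = len(rewards)
    if sum(p * r for p, r in zip(p_bar, rewards)) > R_total:           # (16)
        return False
    if rewards[0] < 0 or any(rewards[m] > rewards[m + 1]
                             for m in range(M - 1)):                   # (17)
        return False
    if powers[0] < 0 or powers[-1] > P_max or \
       any(powers[m] > powers[m + 1] for m in range(M - 1)):           # (18)
        return False
    if thetas[0] * rewards[0] - u3 * (E_cmp + powers[0] * t_hat) < 0:  # (19)
        return False
    for m in range(1, M):                                              # (20)
        dR = rewards[m] - rewards[m - 1]
        scaled_dP = u3 * t_hat * (powers[m] - powers[m - 1])
        if not (thetas[m - 1] * dR - 1e-12 <= scaled_dP
                <= thetas[m] * dR + 1e-12):
            return False
    return True
```

By construction, a menu produced by `optimal_powers` from a non-decreasing reward assignment passes `is_feasible` whenever the budget and power-cap conditions hold, mirroring the remark that (18)-(20) are automatically satisfied once $P^*_1 \ge 0$ and $P^*_M \le P_{max}$.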
With the optimal power allocation in Theorem 2, we can verify that the feasibility conditions in (18)-(20) are automatically satisfied when $P^*_1 \ge 0$ and $P^*_M \le P_{max}$. Next, we can replace $P_m$ with $P^*_m$, $m \in \{1, ..., M\}$, in (16)-(21) and reformulate the optimization problem as follows:
$$\max_{\boldsymbol{R}} \; \sum_{n=1}^{N} \sum_{m=1}^{M} \bar{p}_m \Bigg[u_1 \exp\Bigg(-\frac{u_3 \hat{t}\, (\delta_n + B N_0)\, d_n^{\alpha} A_n}{\theta_1 R_1 - u_3 \kappa c \varphi^2 \bar{s} I + \sum_{k=1}^{m} u_3 \hat{t}\, \rho_k}\Bigg) \theta_m - u_2 R_m\Bigg] \tag{25}$$
$$\text{s.t. } R_1 \ge \frac{u_3 \kappa c \varphi^2 \bar{s} I}{\theta_1}, \quad R_M \le \frac{P_{max} u_3 \hat{t} + u_3 \kappa c \varphi^2 \bar{s} I}{\theta_M}, \tag{26}$$
$$\sum_{m=1}^{M} \bar{p}_m R_m \le R_{total}, \tag{27}$$
$$R_m \le R_{m+1}, \quad m \in \{1, ..., M-1\}, \tag{28}$$
where the constraints in (26) result from $P^*_1 \ge 0$ and $P^*_M \le P_{max}$, and the constraint in (28) is derived from the feasibility constraint in (17). Define $\mathcal{R}$ as the set of all possible non-negative reward assignments for which the constraints in (26) are met. The Lagrangian dual function will be
$$L(\boldsymbol{R}, \lambda, \boldsymbol{\mu}) = \max_{\boldsymbol{R} \in \mathcal{R}} \sum_{n=1}^{N} \sum_{m=1}^{M} \bar{p}_m \Bigg[u_1 \exp\Bigg(-\frac{u_3 \hat{t}\, (\delta_n + B N_0)\, d_n^{\alpha} A_n}{\theta_1 R_1 - u_3 \kappa c \varphi^2 \bar{s} I + \sum_{k=1}^{m} u_3 \hat{t}\, \rho_k}\Bigg) \theta_m - u_2 R_m\Bigg] + \lambda \Big(R_{total} - \sum_{m=1}^{M} \bar{p}_m R_m\Big) + \sum_{m=1}^{M-1} \mu_m (R_{m+1} - R_m), \tag{29}$$
where $\lambda$ and $\boldsymbol{\mu} = \{\mu_1, ..., \mu_{M-1}\}$ are the Lagrangian multipliers associated with the inequality constraints (27) and (28). Hence, the dual optimization problem will be
$$\min_{\lambda, \boldsymbol{\mu}} L(\boldsymbol{R}, \lambda, \boldsymbol{\mu}) \quad \text{s.t. } \lambda \ge 0, \; \boldsymbol{\mu} \succeq \boldsymbol{0}_{1 \times (M-1)}. \tag{30}$$

Table I. Simulation parameters.
Parameter | Description | Value
η | Learning rate | –
γ | Coefficient for the L2 regularizer | –
I | Number of iterations of local SGD | –
P_max | Maximum transmit power | – W
Δt | Sampling period | – s
t̄ | Duration of each communication round | – s
κ | Energy consumption efficiency | – [32]
c | Number of computing cycles per bit | – [32]
φ | Frequency of the CPU | – cycles/s [32]
N₀ | Noise power spectral density | – dBm/Hz
B | Bandwidth | – MHz
s̄ | Size of the randomly selected data at each iteration of SGD | – bits
M | Total number of CAV types | 7
α | Path-loss exponent | 2.5
R_total | Total reward at the parameter server | 5.0
As the dual optimization problem is always convex, it can be solved by updating the Lagrangian multipliers using standard gradient-based algorithms. Note that, since the objective function in (25) is not concave, the solution obtained from the dual problem will be suboptimal. However, instead of tackling the original problem in (25)-(28) with a high complexity, the parameter server can spend less computation and incur less delay by solving the low-complexity dual problem. For example, when choosing the ellipsoid method to solve the dual optimization problem, the complexity will be $O(M^2 \ln(1/\varepsilon))$, where $\varepsilon$ is the accuracy requirement [31]. Once the reward assignment is determined, the transmit power allocation in the contract design can be derived using Theorem 2.

V. SIMULATION RESULTS
To evaluate the performance of the proposed DFP algorithm, we use two real datasets: the BDD data [20] and the DACT data [21]. The BDD data is a large-scale driving video dataset with extensive annotations for heterogeneous tasks, collected under diverse geographic, environmental, and weather conditions across the United States. The DACT data is a collection of trajectories collected in the city of Columbus, Ohio, where each trajectory records several minutes of driving data and can be divided into multiple segments annotated by the operating pattern, such as speed-up and slow-down. In terms of the traffic model, we consider a square simulation area with lanes randomly located around the center of the area. When using the BDD data and the DACT data, we assume that the CAVs are randomly assigned to these lanes and that all the training data is randomly split among the CAVs to capture the unbalanced distribution of local data. Similar to [33], the CAVs' velocity is determined by

[Fig. 2(a): velocity (miles/hour) vs. time (s); curves: target reference speed, actual speed trained only by the local data, actual speed trained by the DFP algorithm] (a) Harsh brake in a traffic accident.
[Fig. 2(b): velocity (miles/hour) vs. time (s), same curves as Fig. 2(a)] (b) Stop-and-go traffic in a congestion.
[Fig. 2(c): velocity (miles/hour) vs. time (s), same curves as Fig. 2(a)] (c) Speed limit changes in a work zone.
Fig. 2. Velocity variations over different traffic scenarios.

the headway distance to the preceding CAVs. The values of the parameters used in the simulations are summarized in Table I.

Fig. 2 shows the velocity tracking performance comparison between the autonomous controllers solely trained by the local data (i.e., smooth slow-down data) and those trained by our proposed DFP algorithm under different traffic scenarios. In this simulation, we consider three traffic scenarios from the DACT dataset. In particular, we choose a use case with a dramatic speed decline to represent a harsh brake in a traffic accident, speed variations around zero to represent stop-and-go traffic in a congestion, and a change of the average speed to represent speed limit changes in a roadwork zone. As shown in Fig. 2, the controller trained by our proposed DFP algorithm can accurately execute the control decisions and track the target speed under all three traffic scenarios. However, when using the controller trained only with the local data, we can face large speed variations around the target values. For example, as shown in Fig. 2(a), to achieve a harsh brake, the controller trained by the local data generates alternating deceleration and acceleration instead of a constant deceleration as done by the controller trained by our proposed DFP algorithm. In the traffic congestion and roadwork zone of Figs. 2(b) and 2(c), the controller trained by the local data switches between acceleration and deceleration more frequently than the target speed traces, adversely impacting the driving experience of the passengers. Also, in Figs. 2(b) and 2(c), the controller trained by the local data can perform aggressive decelerations and accelerations, and such behaviors will not only increase the CAVs' maintenance costs but also endanger following and preceding CAVs, especially when the spacing is small.

Fig. 3 shows the velocity tracking performance comparison over time between the autonomous controllers solely trained by the local data (i.e., smooth speed-up data) and those trained by our proposed DFP algorithm. In this simulation, the trajectory data in the DACT dataset is randomly assigned to the CAVs. Fig. 3 shows that the DFP-based controller design can accurately track the target velocity over time. However, the actual velocity generated by the controller trained with the local data can deviate from the target value. In particular, at time t = 311 s, the error between the actual and target velocities can be as large as 3.17 miles/hour (about 1.42 meters/second), violating two commonly used design criteria for a vehicle's controller, i.e., the error upper bound of [34] and the maximum allowable error of [35].

[Fig. 3: velocity (miles/hour) vs. time (s); curves: target reference speed, actual speed trained only by the local data, actual speed trained by the DFP algorithm; annotation at t = 311 s: 55.17 vs. 52 miles/hour]

Fig. 3. Velocity variations over time.

[Fig. 4: CDF vs. absolute distance error (m); curves: only trained by the local data, and DFP with B = 10 MHz, 5 MHz, and 1 MHz]

Fig. 4. The CDF of absolute distance errors.

Fig. 4 shows the cumulative distribution function (CDF) of the absolute distance error when the controllers track the DACT dataset. In particular, the autonomous controllers are trained, respectively, by the local data and by our proposed DFP algorithm with different bandwidths. The absolute distance error is calculated as the absolute difference between the target distance in the DACT dataset and the actual distance traversed by the CAV with the designed controller at the end of each trajectory. As observed from Fig. 4, the controller trained by the proposed DFP algorithm yields a much smaller distance error compared with the case in which the CAVs only use their local data to train the controller model; at the same CDF level, the controller solely trained with the local data can generate roughly twice the absolute distance error of the DFP-based autonomous controller. Moreover, as shown in Fig. 4, for a larger bandwidth, the proposed DFP-based controller design is more likely to yield a small distance error; for example, the probability that the distance error generated by the DFP-based controller remains below a given threshold is noticeably higher for a bandwidth B = 10 MHz than for B = 1 MHz.
That is because, with a larger bandwidth, more CAVs can meet the time constraint $\bar{t}$ and participate in the FL, leading to a better training performance. As shown in Figs. 2-4, it is clear that the autonomous controller based on the proposed DFP algorithm outperforms the baseline scheme that solely relies on the local data for training.

Fig. 5 compares the proposed DFP with FedAvg and FedProx. To test the ability of these three algorithms to handle unbalanced and non-IID data, we choose the larger BDD dataset. In particular, the BDD data collected under different traffic scenarios is assigned unevenly to different vehicles to capture the unbalanced and non-IID distribution of local data. As observed from Fig. 5, when faced with unbalanced and non-IID training data, FedAvg and FedProx fail to converge to a near-zero loss over the communication rounds. The slow convergence of FedAvg stems from the fact that its training performance is negatively impacted by the unbalanced and non-IID data. The poor performance of FedProx can be explained by the fact that, in FedProx, the CAVs that are randomly selected for the training process might not finish their uplink transmissions in time due to the path loss and fading. However, as shown in Fig. 5, our proposed DFP algorithm achieves a faster convergence, reaching a low loss within far fewer communication rounds. In other words, when dealing with the diverse local data and the varying participation of CAVs, our proposed DFP algorithm exhibits a fast convergence to the optimal autonomous controller for CAVs. Such a fast convergence can enable the CAVs to quickly adapt to the traffic dynamics and correctly track the speed determined by the motion planner.

[Fig. 5: training loss vs. communication round; curves: FedAvg, FedProx, DFP]

Fig. 5. Comparison between the proposed DFP, FedAvg, and FedProx algorithms.

[Fig. 6: (a) reward and transmit power (W) vs. type of CAVs; (b) utility of CAV types 1-7 vs. selected contract type]

Fig. 6. Feasibility of the contract-theory based incentive mechanism for the FL-based controller design.

In Fig. 6, we validate the feasibility of the proposed contract-theory based incentive mechanism

[Fig. 7: training loss vs. communication round; curves: maximal power allocation, random power allocation, proposed suboptimal contract design, optimal contract design]
for the FL-based autonomous controller design among CAVs. In particular, as shown in Fig. 6(a), the reward and the transmit power increase with the type of the CAVs. Hence, from Fig. 6(a), our designed contract meets the feasibility constraints (17) and (18). Moreover, in Fig. 6(b), we evaluate the utilities of all types of CAVs when selecting each of the contracts offered by the parameter server. As observed from Fig. 6(b), when choosing the type 1 contract, the utility of a type 1 CAV is non-negative, verifying the feasibility condition (19). Also, we observe that the utility is a concave function of the CAVs' type, and each type of CAV achieves its maximum utility if and only if it selects the contract designed for its own type. Thus, by using our proposed contract, the CAVs will self-reveal their types and choose the contracts intended for them in order to maximize their utility, satisfying the feasibility condition (20). Hence, given the results in Fig. 6, we can validate the feasibility of our proposed contract design for the CAVs and the parameter server.

Fig. 7. Training performance of the proposed contract-based approach and two baselines.

Fig. 7 shows the training performance difference when the DFP algorithm uses our proposed contract-theory based incentive mechanism and two baseline schemes for the power allocation among CAVs. The two baseline schemes are the maximum power allocation, where all CAVs use the maximum transmit power for their uplink transmissions, and the random power allocation, where each CAV uses a transmit power randomly selected between zero and the maximum power. In addition, we show the convergence of the optimal contract design, where an exhaustive search algorithm is used to determine the optimal reward assignment in (25), and the power allocation in the optimal contract is then derived using Theorem 2. As shown in Fig. 7, the training loss decreases with the communication rounds under all four power assignment strategies. However, the FL process using our proposed contract-theoretic incentive mechanism for the power allocation achieves a faster convergence than the random and maximum power allocation schemes, reaching a given loss level in considerably fewer communication rounds than both baselines. The reason is that our proposed incentive mechanism only allocates transmit power to the CAVs in $\mathcal{N}^{(2)}$, which bring a positive convergence gain to the FL process. However, in the maximum power allocation and the random power allocation, the CAVs in $\mathcal{N}^{(1)}$ are also allowed to participate in the FL, and their negative convergence gain offsets the positive gain brought by the CAVs in $\mathcal{N}^{(2)}$. Moreover, we observe that the convergence of our suboptimal contract design closely tracks that of the optimal contract solution. In other words, our suboptimal solution is effective in designing a contract that improves the convergence of the DFP algorithm.

VI. CONCLUSIONS
In this paper, we have developed an FL framework to enable collaborative training of the autonomous controller model across a group of CAVs. In particular, we have proposed a new DFP algorithm that accounts for the varying participation of CAVs in the FL process as well as the diverse data quality across CAVs. We have performed a rigorous theoretical convergence analysis for the proposed algorithm and have explicitly studied the impact of the CAVs' mobility, the uncertainty of the wireless channels, as well as the unbalanced and non-IID local data on the overall convergence performance. To improve the convergence of the proposed algorithm, we have designed a contract-theoretic incentive mechanism. Simulation results using real traces have shown that the autonomous controller designed by the proposed algorithm can track the target speed over time and under different traffic scenarios, and that the DFP algorithm leads to a better controller design in comparison to the FedAvg and FedProx algorithms. Also, the simulation results have validated the feasibility of our proposed contract-based incentive mechanism and shown that this incentive mechanism can accelerate the convergence of the controller models in CAVs. As a future extension of the proposed approach, the DFP algorithm and the contract-theory based incentive mechanism can be studied for lateral controller design and MPC design in CAVs.

APPENDIX
A. Proof of Theorem 1
To prove Theorem 1, we can bound $f(\boldsymbol{w}_{t+1})$ as follows:
$$f(\boldsymbol{w}_{t+1}) \overset{(a)}{\le} f(\boldsymbol{w}_t) + \langle \nabla f(\boldsymbol{w}_t), \boldsymbol{w}_{t+1} - \boldsymbol{w}_t \rangle + \frac{1}{2!} (\boldsymbol{w}_{t+1} - \boldsymbol{w}_t)^T \nabla^2 f(\boldsymbol{w}_t) (\boldsymbol{w}_{t+1} - \boldsymbol{w}_t)$$
$$\overset{(b)}{\le} f(\boldsymbol{w}_t) + \langle \nabla f(\boldsymbol{w}_t), \boldsymbol{w}_{t+1} - \boldsymbol{w}_t \rangle + \frac{L}{2} \|\boldsymbol{w}_{t+1} - \boldsymbol{w}_t\|^2$$
$$\overset{(c)}{=} f(\boldsymbol{w}_t) + \Big\langle \sum_{n=1}^{N} \frac{s_n}{s_N} \nabla f_n(\boldsymbol{w}_t),\; -\eta_t \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big(\nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) + \gamma_t (\boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t)\big) \Big\rangle + \frac{L}{2} \Big\| \eta_t \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big(\nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) + \gamma_t (\boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t)\big) \Big\|^2$$
$$\overset{(d)}{=} f(\boldsymbol{w}_t) - \frac{\eta_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big\langle \nabla f_n(\boldsymbol{w}_t), \nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) \big\rangle - \frac{\eta_t \gamma_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big\langle \nabla f_n(\boldsymbol{w}_t), \boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t \big\rangle + \frac{L \eta_t^2}{2} \Big\| \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big(\nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) + \gamma_t (\boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t)\big) \Big\|^2, \tag{31}$$
where (a) follows from the Taylor expansion, (b) is based on the assumption of Lipschitz continuity, (c) follows from the definition of $\nabla f(\boldsymbol{w}_t)$ and the relationship between $\boldsymbol{w}_{t+1}$ and $\boldsymbol{w}_t$, and in (d) we use the fact that the CAVs train their learning models independently and then further simplify the calculated result.

In Algorithm 1, there are two sources of randomness in the FL training. First, for the $I$ local iterations of SGD at each communication round, the local data samples selected for training the local FL model are random. Second, the participation of the CAVs in the FL varies across communication rounds due to the mobility of the CAVs and the uncertainty of the wireless channels. We will consider these two sources of randomness sequentially.
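The sequential treatment of the two sources of randomness can be summarized compactly. Assuming (as the proof does) that the minibatch draws and the participation events are independent, the overall expectation factorizes by the tower rule, and the minibatch gradient is an unbiased estimate of the full local gradient:

```latex
\mathbb{E}_{\xi,n}\big[f(\boldsymbol{w}_{t+1})\big]
  = \mathbb{E}_{n}\Big[\,\mathbb{E}_{\xi}\big[f(\boldsymbol{w}_{t+1}) \mid \mathcal{N}_t\big]\Big],
\qquad
\mathbb{E}_{\xi}\,\nabla f_n\big(\boldsymbol{w}_{t,i}^{(n)},\xi\big)
  = \nabla f_n\big(\boldsymbol{w}_{t,i}^{(n)}\big).
```

The first identity is applied in (33) below, and the second in step (a) of (32).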
First, considering the first source of randomness, we take the expectation of both sides of (31) with respect to the randomly selected sets of local samples and obtain
$$\mathbb{E}_\xi\big(f(\boldsymbol{w}_{t+1})\big) \le f(\boldsymbol{w}_t) - \frac{\eta_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \mathbb{E}_\xi \big\langle \nabla f_n(\boldsymbol{w}_t), \nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) \big\rangle - \frac{\eta_t \gamma_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \mathbb{E}_\xi \big\langle \nabla f_n(\boldsymbol{w}_t), \boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t \big\rangle + \frac{L \eta_t^2}{2} \mathbb{E}_\xi \Big\| \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \big(\nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) + \gamma_t (\boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t)\big) \Big\|^2$$
$$\overset{(a)}{\le} f(\boldsymbol{w}_t) - \frac{\eta_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \mathbb{E}_\xi \big\langle \nabla f_n(\boldsymbol{w}_t), \nabla f_n(\boldsymbol{w}_{t,i}^{(n)}) \big\rangle - \frac{\eta_t \gamma_t}{s_N} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \sum_{i=0}^{I-1} \mathbb{E}_\xi \big\langle \nabla f_n(\boldsymbol{w}_t), \boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t \big\rangle + \frac{L \eta_t^2}{2} \sum_{n \in \mathcal{N}_t} \frac{s_n}{s_{N_t}} \mathbb{E}_\xi \Big\| \sum_{i=0}^{I-1} \big(\nabla f_n(\boldsymbol{w}_{t,i}^{(n)}, \xi) + \gamma_t (\boldsymbol{w}_{t,i}^{(n)} - \boldsymbol{w}_t)\big) \Big\|^2, \tag{32}$$
where in (a) we use the fact that, for CAV $n \in \mathcal{N}$, the gradient over a random set $\xi \in \mathcal{S}_n$ of local data samples is an unbiased estimate of the full gradient, i.e., $\mathbb{E}_\xi \nabla f_n\big(\boldsymbol{w}_{t,i}^{(n)}, \xi\big) = \nabla f_n\big(\boldsymbol{w}_{t,i}^{(n)}\big)$ [36].
For the second source of randomness, we take the expectation of both sides of (32) with respect to the participating CAVs as follows:
\begin{align}
\mathbb{E}_{\xi,n}\big(f(\boldsymbol{w}_{t+1})\big) &\overset{(a)}{\leq} f(\boldsymbol{w}_{t}) - \frac{\eta_{t}}{s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\Big\langle \nabla f_{n}(\boldsymbol{w}_{t}), \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\Big\rangle \nonumber\\
&\quad - \frac{\eta_{t}\gamma_{t}}{s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\Big\langle \nabla f_{n}(\boldsymbol{w}_{t}), \boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\Big\rangle \nonumber\\
&\quad + \frac{L\eta_{t}^{2}}{2\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{i=0}^{I-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i},\xi)+\gamma_{t}(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\Big)\Bigg\|^{2} \nonumber\\
&\overset{(b)}{=} f(\boldsymbol{w}_{t}) - \frac{\eta_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})\big\|^{2} - \frac{\eta_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\big\|^{2} \nonumber\\
&\quad + \frac{\eta_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\underbrace{\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})-\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\big\|^{2}}_{T_{1}} \nonumber\\
&\quad - \frac{\eta_{t}\gamma_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})\big\|^{2} - \frac{\eta_{t}\gamma_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\big\|^{2} \nonumber\\
&\quad + \frac{\eta_{t}\gamma_{t}}{2s_{N}\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\underbrace{\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})-(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\big\|^{2}}_{T_{2}} \nonumber\\
&\quad + \frac{L\eta_{t}^{2}}{2\sum_{j=1}^{N}p_{j,t}s_{j}}\sum_{n=1}^{N}p_{n,t}s_{n}\underbrace{\mathbb{E}_{\xi,n}\Bigg\|\sum_{i=0}^{I-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i},\xi)+\gamma_{t}(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\Big)\Bigg\|^{2}}_{T_{3}}, \tag{33}
\end{align}
where in (a) we use the fact that CAV $n$ successfully uploads its trained model parameters to the parameter server at the $t$-th communication round with probability $p_{n,t}$, and in (b) we use the polarization identity in a real vector space, $\langle \boldsymbol{x},\boldsymbol{y}\rangle = \frac{1}{2}\big(\|\boldsymbol{x}\|^{2}+\|\boldsymbol{y}\|^{2}-\|\boldsymbol{x}-\boldsymbol{y}\|^{2}\big)$. In particular, we can bound $T_{1}$ as follows:
\begin{align}
T_{1} &= \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})-\nabla f_{n}(\boldsymbol{w}_{t})\big\|^{2} \overset{(a)}{\leq} L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)+\gamma_{t}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Big)\Bigg\|^{2} \nonumber\\
&\overset{(b)}{\leq} 2L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)\Bigg\|^{2} + 2L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}\gamma_{t}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Bigg\|^{2} \nonumber\\
&= 2L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)-\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\Big)\Bigg\|^{2} + 2L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\Bigg\|^{2} \nonumber\\
&\quad + 4L^{2}\eta_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\langle \sum_{k=0}^{i-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)-\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\Big), \sum_{k=0}^{i-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\Bigg\rangle + 2L^{2}\eta_{t}^{2}\gamma_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{k=0}^{i-1}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Bigg\|^{2} \nonumber\\
&\overset{(c)}{\leq} 2L^{2}\eta_{t}^{2}i\sigma^{2} + 2L^{2}\eta_{t}^{2}i\sum_{k=0}^{i-1}\mathbb{E}_{\xi,n}\big\|\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\big\|^{2} + 2L^{2}\eta_{t}^{2}\gamma_{t}^{2}i\sum_{k=0}^{i-1}\mathbb{E}_{\xi,n}\big\|\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t}\big\|^{2}, \tag{34}
\end{align}
where (a) follows from the Lipschitz continuity of the gradient and the relationship between $\boldsymbol{w}^{(n)}_{t,i}$ and $\boldsymbol{w}_{t}$, and (b) follows from the Cauchy-Schwarz inequality $\|\boldsymbol{x}+\boldsymbol{y}\|^{2}\leq 2\|\boldsymbol{x}\|^{2}+2\|\boldsymbol{y}\|^{2}$.
In (c), we use the fact that, for $n\in\mathcal{N}$, $\mathbb{E}_{\xi,n}\big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)-\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\big)=\boldsymbol{0}$, the bounded-variance assumption on the stochastic gradients, and the extension of the Cauchy-Schwarz inequality to $i$ terms, i.e., $\big\|\sum_{k=0}^{i-1}\boldsymbol{x}_{k}\big\|^{2}\leq i\sum_{k=0}^{i-1}\|\boldsymbol{x}_{k}\|^{2}$. For the term $T_{2}$, we have
\begin{align}
T_{2} &= \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})-\nabla f_{n}(\boldsymbol{w}_{t},\xi)+\nabla f_{n}(\boldsymbol{w}_{t},\xi)-(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\big\|^{2} \nonumber\\
&= \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})-\nabla f_{n}(\boldsymbol{w}_{t},\xi)\big\|^{2} + \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t},\xi)-(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\big\|^{2} \nonumber\\
&\quad + 2\,\mathbb{E}_{\xi,n}\Big\langle \nabla f_{n}(\boldsymbol{w}_{t})-\nabla f_{n}(\boldsymbol{w}_{t},\xi), \nabla f_{n}(\boldsymbol{w}_{t},\xi)-(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\Big\rangle \nonumber\\
&\overset{(a)}{=} \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t})-\nabla f_{n}(\boldsymbol{w}_{t},\xi)\big\|^{2} + \mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}_{t},\xi)-(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\big\|^{2} \nonumber\\
&\overset{(b)}{\leq} \sigma^{2} + \mathbb{E}_{\xi,n}\Bigg\| \nabla f_{n}(\boldsymbol{w}_{t},\xi)+\eta_{t}\sum_{k=0}^{i-1}\Big(\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)+\gamma_{t}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Big)\Bigg\|^{2} \nonumber\\
&\overset{(c)}{\leq} \sigma^{2} + 2\,\mathbb{E}_{\xi,n}\Bigg\| \nabla f_{n}(\boldsymbol{w}_{t},\xi)+\eta_{t}\sum_{k=0}^{i-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)\Bigg\|^{2} + 2\,\mathbb{E}_{\xi,n}\Bigg\| \eta_{t}\sum_{k=0}^{i-1}\gamma_{t}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Bigg\|^{2} \nonumber\\
&\overset{(d)}{\leq} \sigma^{2} + 2(1+\eta_{t})^{2}\,\mathbb{E}_{\xi,n}\Bigg\| \sum_{k=0}^{i-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k},\xi)\Bigg\|^{2} + 2\eta_{t}^{2}\gamma_{t}^{2}\,\mathbb{E}_{\xi,n}\Bigg\| \sum_{k=0}^{i-1}(\boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t})\Bigg\|^{2} \nonumber\\
&\overset{(e)}{\leq} \sigma^{2} + 2i(1+\eta_{t})^{2}\sigma^{2} + 2(1+\eta_{t})^{2}i\sum_{k=0}^{i-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,k})\big\|^{2} + 2\eta_{t}^{2}\gamma_{t}^{2}i\sum_{k=0}^{i-1}\mathbb{E}_{\xi,n}\big\| \boldsymbol{w}^{(n)}_{t,k}-\boldsymbol{w}_{t}\big\|^{2}, \tag{35}
\end{align}
where in (a) we use the fact that, for $n\in\mathcal{N}$, $\mathbb{E}_{\xi,n}\big(\nabla f_{n}(\boldsymbol{w}_{t},\xi)-\nabla f_{n}(\boldsymbol{w}_{t})\big)=\boldsymbol{0}$, and in (b) we expand the expression of $\boldsymbol{w}^{(n)}_{t,i}$. The changes in (c) and (d) follow from the Cauchy-Schwarz inequality and the fact that $\boldsymbol{w}_{t}=\boldsymbol{w}^{(n)}_{t,0}$, and the final inequality in (e) exploits the facts used in (34). Summing over all $I$ iterations of SGD, $T_{1}$ and $T_{2}$ satisfy
\begin{align}
\sum_{i=1}^{I} T_{1} &\leq 2L^{2}\eta_{t}^{2}I^{2}\sigma^{2} + 2L^{2}\eta_{t}^{2}I^{2}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\big\|^{2} + 2L^{2}\eta_{t}^{2}\gamma_{t}^{2}I^{2}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\big\|^{2}, \tag{36}\\
\sum_{i=1}^{I} T_{2} &\leq I\sigma^{2} + 2I^{2}(1+\eta_{t})^{2}\sigma^{2} + 2(1+\eta_{t})^{2}I^{2}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\big\|^{2} + 2\eta_{t}^{2}\gamma_{t}^{2}I^{2}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\| \boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\big\|^{2}. \tag{37}
\end{align}
To bound $T_{3}$, we have
\begin{align}
T_{3} &\overset{(a)}{\leq} 2\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{i=0}^{I-1}\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i},\xi)\Bigg\|^{2} + 2\,\mathbb{E}_{\xi,n}\Bigg\|\sum_{i=0}^{I-1}\gamma_{t}(\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t})\Bigg\|^{2} \nonumber\\
&\overset{(b)}{\leq} 2I^{2}\sigma^{2} + 2I\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\|\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\big\|^{2} + 2\gamma_{t}^{2}I\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\big\|\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\big\|^{2}, \tag{38}
\end{align}
where (a) and (b) follow from the same facts and inequalities used to bound $T_{1}$ and $T_{2}$.
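The bounds on $T_{1}$, $T_{2}$, and $T_{3}$ all rest on the two norm inequalities invoked above. As a quick numerical sanity check (illustrative only, with random vectors standing in for the gradient terms):

```python
import numpy as np

# Two inequalities used repeatedly in bounding T_1, T_2, and T_3:
#   (i)  ||x + y||^2 <= 2||x||^2 + 2||y||^2
#   (ii) ||sum_{k=0}^{i-1} x_k||^2 <= i * sum_{k=0}^{i-1} ||x_k||^2
# Both are consequences of the Cauchy-Schwarz inequality.
rng = np.random.default_rng(2)
sq = lambda v: float(np.dot(v, v))   # squared Euclidean norm

ok_pair, ok_sum = True, True
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    ok_pair &= sq(x + y) <= 2 * sq(x) + 2 * sq(y) + 1e-9
    xs = rng.standard_normal((7, 5))          # i = 7 vectors x_0, ..., x_6
    ok_sum &= sq(xs.sum(axis=0)) <= 7 * sum(sq(v) for v in xs) + 1e-9
print(ok_pair and ok_sum)  # True: neither inequality is ever violated
```

Inequality (ii) reduces to (i) by induction, which is why the same factor $i$ (and later $I$) appears in front of the sums in (34)-(38).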
After substituting (36), (37), and (38) into the corresponding terms in (33), we have
\begin{align}
\mathbb{E}_{\xi,n}\big(f(\boldsymbol{w}_{t+1})\big) &\leq f(\boldsymbol{w}_{t}) - \Big(\frac{\eta_{t}}{2s_{N}}+\frac{\gamma_{t}\eta_{t}}{2s_{N}}\Big)\frac{\sum_{n=1}^{N}p_{n,t}s_{n}I\,\|\nabla f_{n}(\boldsymbol{w}_{t})\|^{2}}{\sum_{j=1}^{N}p_{j,t}s_{j}} \nonumber\\
&\quad + \Big(\frac{L^{2}\eta_{t}^{3}I^{2}}{s_{N}} + \frac{\eta_{t}\gamma_{t}}{2s_{N}}\big(I+2I^{2}(1+\eta_{t})^{2}\big) + L\eta_{t}^{2}I^{2}\Big)\frac{\sum_{n=1}^{N}p_{n,t}s_{n}}{\sum_{j=1}^{N}p_{j,t}s_{j}}\,\sigma^{2} \nonumber\\
&\quad - \Big(\frac{\eta_{t}}{2s_{N}} - \frac{L^{2}I^{2}\eta_{t}^{3}}{s_{N}} - \frac{\eta_{t}\gamma_{t}I^{2}(1+\eta_{t})^{2}}{s_{N}} - IL\eta_{t}^{2}\Big)\frac{\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\|\nabla f_{n}(\boldsymbol{w}^{(n)}_{t,i})\|^{2}}{\sum_{j=1}^{N}p_{j,t}s_{j}} \nonumber\\
&\quad - \Big(\frac{\gamma_{t}\eta_{t}}{2s_{N}} - \frac{L^{2}I^{2}\eta_{t}^{3}\gamma_{t}^{2}}{s_{N}} - \frac{\eta_{t}^{3}\gamma_{t}^{3}I^{2}}{s_{N}} - IL\eta_{t}^{2}\gamma_{t}^{2}\Big)\frac{\sum_{n=1}^{N}p_{n,t}s_{n}\sum_{i=0}^{I-1}\mathbb{E}_{\xi,n}\|\boldsymbol{w}^{(n)}_{t,i}-\boldsymbol{w}_{t}\|^{2}}{\sum_{j=1}^{N}p_{j,t}s_{j}}. \tag{39}
\end{align}
By applying conditions (8) and (9) to (39), we obtain the result in (7). To further simplify the expression of $p_{n,t}$, we have
\begin{align}
p_{n,t} &= \mathbb{P}\big(t_{n,\textrm{comp}} + t_{n,\textrm{comm}} \leq \bar{t}\big) = \mathbb{P}\Bigg(\frac{I\bar{s}c}{\phi} + \frac{s(\boldsymbol{w}^{(n)}_{t})}{B\log_{2}\Big(1+\frac{P_{n}h_{n}d_{n}^{-\alpha}}{\delta_{n}+BN_{0}}\Big)} \leq \bar{t}\Bigg) \nonumber\\
&= \mathbb{P}\Bigg(h_{n} \geq \frac{\delta_{n}+BN_{0}}{P_{n}d_{n}^{-\alpha}}\Bigg(2^{\frac{s(\boldsymbol{w}^{(n)}_{t})}{B\left(\bar{t}-\frac{I\bar{s}c}{\phi}\right)}}-1\Bigg)\Bigg) \overset{(a)}{=} \exp\Bigg(-\frac{\delta_{n}+BN_{0}}{P_{n}d_{n}^{-\alpha}}\Bigg(2^{\frac{s(\boldsymbol{w}^{(n)}_{t})}{B\left(\bar{t}-\frac{I\bar{s}c}{\phi}\right)}}-1\Bigg)\Bigg), \tag{40}
\end{align}
where in (a) we use the fact that the channels between the parameter server and the CAVs are Rayleigh fading channels, so that $h_{n}$ is exponentially distributed with unit mean.

B. Proof of Corollary 1
Based on the definitions of $\mathcal{N}^{(1)}$ and $\mathcal{N}^{(2)}$, we have
\begin{align}
f(\boldsymbol{w}_{t}) - \mathbb{E}_{\xi,n}\big(f(\boldsymbol{w}_{t+1})\big) &\geq \frac{\sum_{n\in\mathcal{N}^{(1)}}p_{n,t}\beta_{n}}{\sum_{j=1}^{N}p_{j,t}s_{j}} + \frac{\sum_{n\in\mathcal{N}^{(2)}}p_{n,t}\beta_{n}}{\sum_{j=1}^{N}p_{j,t}s_{j}} \overset{(a)}{\geq} \frac{\sum_{n\in\mathcal{N}^{(1)}}p_{n,t}\beta_{n}}{\sum_{j=1}^{N}p_{j,t}s_{j}} + \frac{\sum_{n\in\mathcal{N}^{(2)}}p_{n,t}\beta_{n}}{\sum_{j=1}^{N}s_{j}}, \tag{41}
\end{align}
where the changes in (a) are based on the facts that $\beta_{n}>0$ for $n\in\mathcal{N}^{(2)}$ and that the probability term satisfies $0\leq p_{j,t}\leq 1$. Since $\sum_{j=1}^{N}s_{j}=s_{N}$, we obtain the result in Corollary 1.

REFERENCES

[1] T. Zeng, O. Semiari, M. Chen, W. Saad, and M. Bennis, "Federated learning for autonomous controller design in connected and autonomous vehicles," submitted to
Proc. of IEEE Conference on Decision and Control (CDC), Austin, TX, USA, Dec. 2021.
[2] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli, "A survey of motion planning and control techniques for self-driving urban vehicles," IEEE Transactions on Intelligent Vehicles, vol. 1, no. 1, pp. 33–55, Mar. 2016.
[3] J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli, "Kinematic and dynamic vehicle models for autonomous driving control design," in Proc. of IEEE Intelligent Vehicles Symposium, Seoul, South Korea, Jun. 2015.
[4] M. Kamal, M. Mukai, J. Murata, and T. Kawabe, "Ecological vehicle control on roads with up-down slopes," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 3, pp. 783–794, Sept. 2011.
[5] K. Nam, Y. Hori, and C. Lee, "Wheel slip control for improving traction-ability and energy efficiency of a personal electric vehicle," Energies, vol. 8, no. 7, pp. 6820–6840, Jul. 2015.
[6] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, "Learning-based model predictive control: Toward safe learning in control," Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 269–296, May 2020.
[7] S. Wakitani, T. Yamamoto, and B. Gopaluni, "Design and application of a database-driven PID controller with data-driven updating algorithm," Industrial & Engineering Chemistry Research, vol. 58, no. 26, pp. 11419–11429, May 2019.
[8] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.
[9] V. Rausch, A. Hansen, E. Solowjow, C. Liu, E. Kreuzer, and J. K. Hedrick, "Learning a deep neural net policy for end-to-end control of autonomous vehicles," in Proc. of IEEE American Control Conference (ACC), Seattle, WA, USA, May 2017.
[10] H. M. Eraqi, M. N. Moustafa, and J. Honer, "End-to-end deep learning for steering autonomous vehicles considering temporal dependencies," in Proc. of Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, Dec. 2017.
[11] S. C. Lin, Y. Zhang, C. H. Hsu, M. Skach, M. E. Haque, L. Tang, and J. Mars, "The architectural implications of autonomous driving: Constraints and acceleration," in Proc. of ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Williamsburg, VA, USA, Mar. 2018.
[12] H. Shiri, J. Park, and M. Bennis, "Communication-efficient massive UAV online path control: Federated learning meets mean-field game theory," IEEE Transactions on Communications, vol. 68, no. 11, pp. 6840–6857, Nov. 2020.
[13] Y. Zhan, P. Li, Z. Qu, D. Zeng, and S. Guo, "A learning-based incentive mechanism for federated learning," IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6360–6368, Jul. 2020.
[14] L. U. Khan, S. R. Pandey, N. H. Tran, W. Saad, Z. Han, M. N. H. Nguyen, and C. S. Hong, "Federated learning for edge networks: Resource optimization and incentive mechanism," IEEE Communications Magazine, vol. 58, no. 10, pp. 88–93, Oct. 2020.
[15] D. Ye, R. Yu, M. Pan, and Z. Han, "Federated learning in vehicular edge computing: A selective model aggregation approach," IEEE Access, vol. 8, pp. 23920–23935, 2020.
[16] W. Y. B. Lim, Z. Xiong, C. Miao, D. Niyato, Q. Yang, C. Leung, and H. V. Poor, "Hierarchical incentive mechanism design for federated machine learning in mobile networks," IEEE Internet of Things Journal, vol. 7, no. 10, pp. 9575–9588, Oct. 2020.
[17] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y. Liang, and D. I. Kim, "Incentive design for efficient federated learning in mobile networks: A contract theory approach," in Proc. of IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS), Singapore, Aug. 2019.
[18] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik, "Federated optimization: Distributed machine learning for on-device intelligence," arXiv preprint arXiv:1610.02527, 2016.
[19] V. Smith, C.-K. Chiang, M. Sanjabi, and A. Talwalkar, "Federated multi-task learning," in Proc. of Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, Dec. 2017.
[20] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, "BDD100K: A diverse driving dataset for heterogeneous multitask learning," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, Jun. 2020.
[21] S. Moosavi, B. Tehrani, and R. Ramnath, "Trajectory annotation by discovering driving patterns," in Proc. of ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, Redondo Beach, CA, USA, Nov. 2017.
[22] L. dos Santos Coelho and D. L. de Andrade Bernert, "An improved harmony search algorithm for synchronization of discrete-time chaotic systems," Chaos, Solitons & Fractals, vol. 41, no. 5, pp. 2526–2532, Sept. 2009.
[23] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, "A joint learning and communications framework for federated learning over wireless networks," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 269–283, Jan. 2021.
[24] Q. Song, J. C. Spall, Y. C. Soh, and J. Ni, "Robust neural network tracking controller using simultaneous perturbation stochastic approximation," IEEE Transactions on Neural Networks, vol. 19, no. 5, pp. 817–835, May 2008.
[25] X. Liu and P. Lu, "Solving nonconvex optimal control problems by convex optimization," Journal of Guidance, Control, and Dynamics, vol. 37, no. 3, pp. 750–765, Apr. 2014.
[26] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, "Federated learning over wireless networks: Optimization model design and analysis," in Proc. of IEEE Conference on Computer Communications (INFOCOM), Paris, France, May 2019.
[27] M. Chen, H. V. Poor, W. Saad, and S. Cui, "Wireless communications for collaborative federated learning," IEEE Communications Magazine, vol. 58, no. 12, pp. 48–54, Dec. 2020.
[28] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proc. of Conference on Machine Learning and Systems (MLSys), Austin, TX, USA, Mar. 2020.
[29] L. Bottou, F. Curtis, and J. Nocedal, "Optimization methods for large-scale machine learning," SIAM Review, vol. 60, no. 2, pp. 223–311, May 2018.
[30] P. Bolton, M. Dewatripont et al., Contract Theory. MIT Press, 2005.
[31] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[32] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, "Computation rate maximization in UAV-enabled wireless-powered mobile-edge computing systems," IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 1927–1941, Sept. 2018.
[33] T. Zeng, O. Semiari, W. Saad, and M. Bennis, "Joint communication and control for wireless autonomous vehicular platoon systems," IEEE Transactions on Communications, vol. 67, no. 11, pp. 7907–7922, Nov. 2019.
[34] S. Xiong, H. Xie, K. Song, and G. Zhang, "A speed tracking method for autonomous driving via ADRC with extended state observer," Applied Sciences, vol. 9, no. 16, pp. 1–21, Aug. 2019.
[35] G. Tagne, R. Talj, and A. Charara, "Higher-order sliding mode control for lateral dynamics of autonomous vehicles, with experimental validation," in Proc. of IEEE Intelligent Vehicles Symposium, Gold Coast, QLD, Australia, Jun. 2013.
[36] B. Chen, Y. Xu, and A. Shrivastava, "Fast and accurate stochastic gradient estimation," in