Federated mmWave Beam Selection Utilizing LIDAR Data
Mahdi Boloursaz Mashhadi, Mikolaj Jankowski, Tze-Yang Tung, Szymon Kobus, Deniz Gunduz
Dept. of Electrical and Electronic Eng., Imperial College London, UK

Abstract
Efficient link configuration in millimeter wave (mmWave) communication systems is a crucial yet challenging task due to the overhead imposed by beam selection on the network performance. For vehicle-to-infrastructure (V2I) networks, side information from LIDAR sensors mounted on the vehicles has been leveraged to reduce the beam search overhead. In this letter, we propose distributed LIDAR aided beam selection for V2I mmWave communication systems utilizing federated training. In the proposed scheme, connected vehicles collaborate to train a shared neural network (NN) on their locally available LIDAR data during normal operation of the system. We also propose an alternative reduced-complexity convolutional NN (CNN) architecture and LIDAR preprocessing, which significantly outperforms previous works in terms of both the performance and the complexity.
Index terms—Federated learning, mmWave beam selection, LIDAR.

I. INTRODUCTION
Millimeter wave (mmWave) is a promising technology for high data rate vehicular communications. However, efficient beam selection in mmWave vehicle-to-infrastructure (V2I) networks is challenging due to the overhead imposed by the beam search process. Recently it was shown that side information from sensors mounted on the connected vehicles can be exploited to reduce the beam-selection overhead for mmWave links. The position information from vehicles is used in [1]–[4], while out-of-band measurements are used in [5], [6] for efficient mmWave beam selection. Information from a radar located in the infrastructure is shown to be beneficial for mmWave link establishment in [7].

The use of light detection and ranging (LIDAR) technology is considered in [8], [9]. LIDAR uses a laser to scan the environment and generates a three-dimensional (3D) image with pixels indicating relative positions from the sensor [10]. Data from the LIDAR sensors mounted on vehicles can be exploited for improved beam-selection in mmWave V2I communications. On the other hand, the lack of analytical models that can relate LIDAR outputs to mmWave channels motivates employing a neural network (NN)-based approach to this problem. In [8], [9], a NN architecture is trained over simultaneous LIDAR and ray-tracing channel datasets with a top-K classification accuracy metric to identify K beam directions that include the beam pair with the best channel condition between the vehicle and the base station (BS) with the highest probability.

The approach in [8], [9] is distributed, in the sense that each vehicle uses the trained NN on the measurements from its own LIDAR sensor to infer its top-K beam directions.

*Indicates equal contribution. This work was supported by the European Research Council (ERC) through project BEACON (grant no. 677854).
It was shown in [9] that such a distributed approach outperforms centralized beam selection, where a NN at the BS infers the best beams for all the vehicles in its coverage area either by combining LIDAR data from all the vehicles or using a single LIDAR sensor mounted at the BS. Although the NN performs beam selection inference in a distributed fashion in [8], [9], it is trained offline on LIDAR and channel measurements from all the vehicles gathered in a centralized dataset. However, in a practical scenario, gathering a large centralized dataset of individual LIDAR measurements from connected vehicles is challenging, as it requires communicating a large amount of LIDAR point cloud data over the uplink channel. Note also that a separate NN needs to be trained for the coverage area of each BS, as the trained NN is site-specific, and will not perform well even for the same site if significant changes occur in the scattering environment. Therefore, continuous recollection of up-to-date LIDAR data and retraining or fine-tuning of the NN weights is necessary during normal operation of the system. This means that a centralized approach imposes a continuous overhead on the system for transmitting up-to-date LIDAR measurements to the BS.

In this work, we propose fully distributed LIDAR-aided beam selection in V2I mmWave communication systems. In the proposed approach, both the inference and training of the NN are performed in a distributed fashion at the vehicles in the coverage area of the BS. We propose a three-phase procedure, which enables the vehicles to periodically collect up-to-date data and retrain or fine-tune the NN in a federated manner during normal operation of the system. Federated training helps avoid the large communication overhead that would be imposed by the transmission of LIDAR measurements from the connected vehicles to the BS to gather a centralized dataset for offline training.
After the training phase, each vehicle leverages the trained NN and its locally available LIDAR data to infer a subset of beams that are most likely to contain the best transmitter/receiver beam pair. We also propose an alternative reduced-complexity convolutional NN (CNN) architecture along with LIDAR preprocessing, which significantly outperforms previous works. The proposed architecture achieves a top- classification accuracy of . on the benchmark Raymobtime dataset [11], which is a significant improvement over the previous works in [8], [9], while reducing the number of floating point operations (FLOPs) and parameter complexity of the NN by factors of 100 and 55, respectively. The reduction in the number of trainable NN parameters facilitates efficient federated training of the proposed architecture during normal operation of the system with reduced communication overhead.

The rest of the paper is organized as follows: Section II presents the system model. Section III presents our proposed federated LIDAR-aided beam selection scheme. Simulation results are presented in Section IV. Finally, Section V concludes the paper. For reproduction of the reported results, our code is available at: https://github.com/galidor/ITU_Beam_Selection_TF

II. SYSTEM MODEL
We consider a downlink orthogonal frequency division multiplexing (OFDM) mmWave system, where a BS located on the street curb serves connected vehicles in its coverage area over N_c subcarriers. The BS and the vehicles are equipped with N_t and N_r antennas, respectively. Denote by H_n the downlink channel matrix from the BS to a vehicle over the n'th subcarrier. We assume that both the BS and the vehicles have antenna arrays with only one radio frequency (RF) chain and fixed beam codebooks, and apply analog beamforming. We assume beam codebooks C_t = {f_i}_{i=1}^{C_t} and C_r = {w_j}_{j=1}^{C_r} at the transmitter and the receiver sides, respectively.

Utilizing a pair (i, j) ∈ C_t × C_r of precoder and combiner vectors, the resulting channel gain at subcarrier n is w_j^H H_n f_i, where (·)^H denotes the conjugate transpose. For the (i, j) pair, the normalized signal power over all subcarriers is given by

y_{ij} = \sum_{n=1}^{N_c} | w_j^H H_n f_i |^2.   (1)

Hence, the optimum beam label is b* = (i*, j*) = argmax_{(i,j)} y_{ij}. Without any side information, the transmitter and receiver would search through all C_t C_r beam pairs to identify b*. Our goal is to infer a small subset of K beam pairs S = {(i_k, j_k)}_{k=1}^{K} ⊂ C_t × C_r using the available position and LIDAR data, such that b* ∈ S. This results in a reduction of K/(C_t C_r) in the search space for beam selection. In the next section, we propose a novel NN architecture as well as a federated training approach for top-K beam classification from simultaneous position and LIDAR data.

III. FEDERATED BEAM SELECTION UTILIZING LIDAR DATA
We propose a novel data-driven beam selection scheme, where connected vehicles in the coverage area of a BS collaborate to train a shared NN for top-K beam classification using their position and LIDAR data in a distributed manner. Collaborative training is orchestrated by the BS, and takes place during normal operation of the network as depicted in Fig. 1.

Fig. 1: The proposed federated LIDAR-based beam selection scheme.

A. Three-Phase Network Operation
Our proposed solution consists of three network operation phases: (i) data collection phase, (ii) federated training phase, and (iii) distributed inference phase.

During phase (i), a subset of connected vehicles in the coverage area of the BS, denoted by V = {v}_{v=1}^{V}, each acquires a local dataset D_v = (P_v, B_v), where P_v = {P_i}_{i=1}^{|D_v|} contains instances of the point cloud P_i recorded by the LIDAR sensor and B_v = {b*_i}_{i=1}^{|D_v|} contains the corresponding best beam pair labels b*_i ∈ C_t × C_r, i.e., the index of the best beam pair. Note that during this phase the connected vehicles employ exhaustive beam search to identify the optimal beam pair. Although brute-force beam search in this phase imposes an overhead on the network, it provides the accurate beam labels required for training of the NN.

During phase (ii), the vehicles collaborate in a federated learning scheme to train or fine-tune a shared NN for top-K beam classification as depicted in Fig. 1. In particular, the vehicles employ federated averaging (FedAvg) [12], where a global model is sent to the vehicles by the BS at each round, and the vehicles perform mini-batch stochastic gradient descent (SGD) updates based on their local datasets. The local updates are aggregated by the BS, and used to update the global model for the next round. The duration of this phase is proportional to the number of global aggregation rounds required to train the model, denoted by N_a. Note that the vehicles train a single site-specific NN, which learns the statistical characteristics of the coverage area of the BS for efficient beam selection.

Fig. 2: Preprocessing of the LIDAR point cloud.

Finally, in phase (iii), any vehicle in the coverage area of the BS utilizes the trained NN on its local LIDAR data to infer the K beams and reduce the beam search overhead.
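During phase (i), the vehicles obtain the labels b*_i by exhaustive search, i.e., by evaluating (1) for every pair in the codebooks and taking the argmax. The following NumPy sketch illustrates that search; the array shapes and function names are our own illustrative choices, not from the letter:

```python
import numpy as np

def best_beam_pair(H, Ct, Cr):
    """Exhaustive beam search over fixed codebooks.

    H  : (Nc, Nr, Nt) channel matrices H_n, one per subcarrier
    Ct : (|Ct|, Nt) array whose rows are the precoders f_i
    Cr : (|Cr|, Nr) array whose rows are the combiners w_j
    Returns the optimum pair (i*, j*) and the power table y[j, i].
    """
    # gains g[n, j, i] = w_j^H H_n f_i, computed for all pairs at once
    g = np.einsum('jr,nrt,it->nji', Cr.conj(), H, Ct)
    y = np.sum(np.abs(g) ** 2, axis=0)  # eq. (1): sum over subcarriers
    j_star, i_star = np.unravel_index(np.argmax(y), y.shape)
    return (i_star, j_star), y
```

The cost of this search scales with |C_t||C_r| measurements, which is exactly the overhead that the top-K prediction set S is meant to avoid during normal operation.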
Note that, in phase (iii), the BS can use a low frequency control channel to transmit the trained NN model to any new vehicle entering its coverage area.

Each BS in a large network can orchestrate training of a site-specific NN for its own coverage area following the above three phases. This three-phase process continues periodically to enable updating the NN parameters to adapt to changes in the environment.

B. LIDAR and Location Preprocessing
For each scene, the LIDAR sensor mounted on each vehicle outputs a point cloud P = {(x_p, y_p, z_p)}_{p=1}^{|P|} representing obstacles measured by the LIDAR sensor. Each vehicle v also has its own location information (x_v, y_v, z_v), and the BS location (x_BS, y_BS, z_BS), which is broadcast to all the vehicles. We preprocess this data to obtain a tensor of fixed size, which contains both the location and LIDAR data and is input to the NN for each scene.

To reduce both the NN dimension and the computation load, we propose a two-dimensional (2D) representation of the LIDAR measurements, where we partition the coverage area of the BS into a grid of equal-size square cells from the top view. We define the corresponding 2D tensor L, where the cells containing the vehicle and the BS are set to -1 and -2, respectively, while each of the remaining cells is populated with a 1 if it accommodates at least one of the cloud points, and with a 0 otherwise. We remark that this 2D representation discards the height data along the z-axis, resulting in a significant reduction in the input size, and hence, the complexity of the NN, which in turn reduces the communication overhead for federated training. Moreover, we observed through experiments that the proposed 2D representation even improves the performance in comparison with a 3D representation. Fig. 2 illustrates this preprocessing scheme.

Fig. 3: The proposed model architecture.
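The quantization described above can be sketched as follows; the grid shape and area bounds below are illustrative placeholders (the letter does not specify them in this excerpt), and the marker values follow the text:

```python
import numpy as np

def lidar_to_grid(points, vehicle_xy, bs_xy, area, shape=(20, 200)):
    """Quantize a LIDAR point cloud into the 2D top-view tensor L.

    points     : iterable of (x, y, z) cloud points; z is discarded
    vehicle_xy : (x, y) of the ego vehicle   -> cell marked -1
    bs_xy      : (x, y) of the BS            -> cell marked -2
    area       : (xmin, xmax, ymin, ymax) bounds of the coverage area
    """
    xmin, xmax, ymin, ymax = area
    L = np.zeros(shape, dtype=np.float32)

    def cell(x, y):
        r = int((x - xmin) / (xmax - xmin) * shape[0])
        c = int((y - ymin) / (ymax - ymin) * shape[1])
        return min(max(r, 0), shape[0] - 1), min(max(c, 0), shape[1] - 1)

    for x, y, _z in points:          # height along the z-axis is dropped
        L[cell(x, y)] = 1.0          # occupied cell
    L[cell(*vehicle_xy)] = -1.0      # vehicle marker
    L[cell(*bs_xy)] = -2.0           # BS marker
    return L
```

Because each scene maps to a small fixed-size 2D tensor regardless of how many points the sensor returns, the NN input (and hence the model size) stays small.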
C. NN Architecture
Our NN architecture consists of 6 convolutional layers followed by 2 linear layers, as depicted in Fig. 3. In the convolutional layers, we vary the value of the stride between 1 and 2, depending on whether we intend to downscale the intermediate features or not. We apply batch normalization and parametric rectified linear unit (PReLU) activation after each convolutional layer. The first linear layer is followed by rectified linear unit (ReLU) activation, and softmax is used at the output to obtain the predictions. To achieve better generalization, the convolutional layers downscale the features and ensure that only essential information is preserved. This helps avoid overfitting to the training data. Note that, to reduce the communication overhead for federated training, we have minimized the number of trainable model parameters by utilizing a convolutional structure with limited kernel sizes. We denote the NN model function by π(L; θ), which outputs a vector of length C_t C_r at the softmax output. L is the preprocessed LIDAR and location input, while θ denotes the trainable NN parameters. The best beam is predicted as b̂* = argmax_{b ∈ S} y_b, where the prediction set S is given by the top-K softmax outputs.
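Constructing the prediction set S at inference time reduces to sorting the softmax output; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def top_k_beams(probs, k):
    """Return the prediction set S: the indices of the k most likely
    beam-pair labels in the softmax output pi(L; theta), ordered from
    most to least likely."""
    return np.argsort(probs)[::-1][:k]
```

The vehicle then measures only these k candidate pairs over the air and keeps the one with the largest y_b, instead of sweeping all C_t C_r pairs.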
Due to the individual characteristics of a specific vehicle (e.g., its trajectory, dimensions, speed, etc.), its local dataset may not capture all the subtleties of the coverage area. In such cases, a NN trained solely on a local dataset D_v is highly biased and may not operate reliably for other vehicles entering the coverage area of the BS. We exploit the fact that, while each vehicle may capture a limited amount of training data that is biased towards its own specific circumstances, the overall dataset captured by several vehicles (i.e., D_U = {D_v}_{v=1}^{V}) within the coverage area of the BS is diverse enough to allow training a generalizable NN model for the coverage area in consideration. On the other hand, gathering a large dataset of LIDAR measurements from various connected vehicles for centralized training at the BS increases the communication overhead, particularly due to the large size of LIDAR point cloud measurements.
Algorithm 1: FedAvg for LIDAR-assisted beam selection

Init: Initial parameters θ_v^(0) = θ^(0), ∀v ∈ V.
for each m = 1, 2, ... do
    Each vehicle performs a local epoch using mini-batch gradient descent iterations according to (3);
    if m is an integer multiple of N_v then
        Each vehicle v sends g_v^(m) = θ_v^(m) − θ_v^(m−N_v) to the BS;
        BS computes θ^(m) = θ^(m−N_v) + (μ/|V|) Σ_v g_v^(m);
        BS distributes θ^(m) such that θ_v^(m) = θ^(m), ∀v ∈ V;
    end
end
Output: Trained θ^(m) shared among all vehicles.

Based on the above insight, in our proposed scheme, the connected vehicles collaborate to train a single NN architecture on the overall dataset captured by all the vehicles within the cell area via the FedAvg algorithm [12].

To train our NN classifier we use the empirical cross entropy loss; hence the local loss calculated at vehicle v is given by

ψ_v(θ, D_v) = −(1/|D_v|) \sum_{i=1}^{|D_v|} log [π(L_i; θ)]_{b*_i},   (2)

where [π]_b denotes the b'th element of the model's softmax output. Each connected vehicle performs mini-batch SGD iterations to update its local vector of model parameters, denoted by θ_v, via

θ_v^(l) = θ_v^(l−1) − ρ_l ∇ψ_v(θ_v^(l−1), {(b_{i_l}, L_{i_l})}),   (3)

where l is the local iteration index, ρ_l > 0 is the local step-size, and {(b_{i_l}, L_{i_l})} is a mini-batch of the local dataset with i_l ∈ 1, ..., |D_v|. The training consists of N_v local epochs at each vehicle (i.e., N_v cycles of training on the vehicle's local dataset) between consecutive aggregations and N_a aggregation rounds at the BS, as summarized in Algorithm 1.

Such distributed learning orchestrated by the BS during phase (ii) requires the vehicles to periodically exchange and synchronize their local model parameters θ_v through reliable low-rate communications with the BS. This imposes an overhead of communicating O_UL = V × N_a × |θ| float32 variables in the uplink and O_DL = N_a × |θ| in the downlink channel.
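One aggregation round of Algorithm 1 can be sketched in NumPy as below. This is an illustrative sketch: `local_grad` is a placeholder gradient oracle standing in for the cross-entropy gradient in (3), and the values of μ, ρ, and the local step count are arbitrary:

```python
import numpy as np

def fedavg_round(theta, local_datasets, local_grad, mu=0.1, rho=0.01, n_local=5):
    """One BS aggregation round of FedAvg: every vehicle runs n_local SGD
    steps from the shared model theta (eq. (3)), sends its model delta
    g_v = theta_v - theta, and the BS applies
    theta <- theta + (mu / |V|) * sum_v g_v."""
    updates = []
    for D_v in local_datasets:
        theta_v = theta.copy()
        for _ in range(n_local):
            theta_v = theta_v - rho * local_grad(theta_v, D_v)  # local SGD step
        updates.append(theta_v - theta)  # g_v: the only thing sent uplink
    return theta + (mu / len(local_datasets)) * np.sum(updates, axis=0)
```

The key point for the overhead analysis is that only the model deltas g_v travel over the air; the LIDAR point clouds never leave the vehicles.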
Minimizing the number of trainable parameters |θ| is hence critical to reduce the communication overhead during phase (ii) of the network operation.

TABLE I: Comparison between the proposed NN architecture and the baseline in [8], [9], both trained in a centralized manner.

Model | Top- accuracy | Top- throughput ratio | FLOPs | |θ|
Baseline [8], [9] | . ± .93% | 86. ± .82% | 179. × |
Proposed (centralized) | . ± .28% | 94. ± .61% | 1. × |

IV. NUMERICAL EVALUATIONS
We provide numerical evaluations on the benchmark Raymobtime datasets [11], where we train the models on samples from dataset s008 and test on those from s009 (refer to [11], [13] for details on these datasets, e.g., locations, frequencies, etc.). For performance comparison, we use the top-K classification accuracy, defined as the probability of correctly identifying the optimal beam pair within the top-K output of the network, and the top-K throughput ratio, R, defined as

R ≜ ( \sum_{t=1}^{T} log_2(1 + y_{ĩj̃}) ) / ( \sum_{t=1}^{T} log_2(1 + y_{i*j*}) ),

where T is the number of test samples, and (i*, j*) and (ĩ, j̃) denote the optimum beam pair index and the best beam pair within the top-K prediction set S, respectively.

In Table I, we compare the performance of the NN architecture presented in Subsection III-C with the baseline architecture proposed in [8], [9], both trained in a centralized manner. In this experiment, we trained our model using the Adam optimizer [14] with an initial learning rate of − and a batch size of 16, and train the models for 20 epochs. Besides the learning rate adjustment imposed by the Adam optimizer, we further reduce the learning rate by a factor of 10 after the 10th epoch. In Table I, we present 95% confidence intervals for the top- accuracy and throughput ratio of the models, calculated from 10 Monte Carlo simulations.

According to Table I, our proposed architecture not only outperforms those in [8] and [9] in terms of both the top- accuracy and the throughput ratio, but also significantly reduces the complexity of the model. Our architecture reduces the FLOPs and the number of trainable parameters roughly by factors of 100 and 55, respectively. Such a significant reduction in the number of trainable model parameters is specifically desirable in federated training, as it leads to a significant reduction of the communication overhead.

Remember that the beam search complexity of these schemes depends on K, the size of the prediction set.
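Both metrics follow directly from the per-sample power table y of (1) and the model's softmax outputs. A sketch under our own array conventions (beam pairs flattened into a single label axis; log base 2 assumed for the rates):

```python
import numpy as np

def top_k_metrics(y, probs, k):
    """Top-k accuracy and throughput ratio R over T test samples.

    y     : (T, Ct*Cr) true normalized powers y_ij, beam pairs flattened
    probs : (T, Ct*Cr) softmax outputs of the model
    """
    pred = np.argsort(probs, axis=1)[:, ::-1][:, :k]   # top-k set per sample
    best = np.argmax(y, axis=1)                        # optimum label b*
    acc = np.mean([best[t] in pred[t] for t in range(len(y))])
    # achievable rate within the prediction set vs. the optimum rate
    num = sum(np.log2(1.0 + y[t, pred[t]].max()) for t in range(len(y)))
    den = sum(np.log2(1.0 + y[t, best[t]]) for t in range(len(y)))
    return acc, num / den
```

Note that R can stay close to 1 even when the top-k accuracy is imperfect, since a near-optimal pair inside S costs little throughput.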
Figure 4 plots the top-K accuracy and throughput ratio for the proposed and baseline architectures as a function of K, when trained in a centralized fashion. It is observed that our proposed model architecture significantly outperforms [8], [9]; e.g., to achieve a throughput ratio R ≥ , our proposed model architecture requires K ≥ while the baseline needs K ≥ . This is more than a 5 times reduction in the required search space for beam selection. Also, the proposed architecture can achieve close to of the optimal throughput with K = 1; that is, with no beam search at all.

Fig. 4: Top-K accuracy and throughput ratio as a function of K.

We next evaluate the performance of our proposed federated beam selection scheme. To generate the local dataset at each connected vehicle v, we choose |D_v| = 11000/V samples from the training set s008 uniformly at random, where 11,000 is the total number of samples in s008. We use mini-batch SGD with an initial learning rate of . and exponential rate decay of . with a batch size of 16 for local optimization at the vehicles. We set the learning rate μ = 0. for aggregation at the BS.

We provide the performance tradeoffs for our proposed federated beam selection scheme in Table II. The notation (N_a) in this table represents the number of global aggregation rounds required for the training to achieve a top- accuracy larger than . This is an important measure as it determines the communication overhead required to train the model to the specified accuracy. The notations (O_DL) and (O_UL) used in this table represent this overhead in terms of the number of float32 variables needed to be communicated over the downlink and uplink channels, respectively.

According to Table II, the number of aggregation rounds required to achieve a top- accuracy larger than increases when more vehicles take part in federated training. A larger (N_a) increases the communication overhead and the duration of the training phase.
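The overhead entries in Table II follow directly from the expressions O_UL = V × N_a × |θ| and O_DL = N_a × |θ| given in Section III-D. A one-line sketch of that accounting (the round count N_a used in the example below is purely hypothetical, since the measured values are specific to the experiments):

```python
def federated_overhead(n_params, n_vehicles, n_rounds):
    """Uplink and downlink federated-training overhead, counted in float32
    values: each round, every vehicle uploads |theta| values and the BS
    broadcasts |theta| values back."""
    uplink = n_vehicles * n_rounds * n_params   # O_UL = V * N_a * |theta|
    downlink = n_rounds * n_params              # O_DL = N_a * |theta|
    return uplink, downlink
```

For example, with the letter's |θ| = 7462 parameters, V = 20 vehicles, and a hypothetical N_a = 100 rounds, the uplink cost is about 1.5 × 10^7 float32 values, which remains far below the cost of offloading raw LIDAR point clouds.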
On the other hand, the duration of the data collection phase decreases with V. This is because we keep the total number of samples the same across all cases, and hence, increasing V means that each vehicle needs to collect fewer samples. Here, we assume that there are always V vehicles in the cell that can participate in the training process. This leads to a tradeoff between the duration of the data collection and training phases, and the communication overhead. For practical deployment, the number of vehicles participating in federated training should be decided according to the requirements of the system and the amount of communication overhead that can be afforded.

TABLE II: Performance tradeoffs for federated beam selection.

V | N_v | (N_a) | (O_DL) | (O_UL) | Top- Acc.

We note here that, thanks to our simple NN architecture, which only has |θ| = 7462 trainable parameters, the maximum communication overhead required for federated training (i.e., ∼ . × float32 communications for V = 20, N_v = 1) is orders of magnitude smaller than the overhead that would be imposed by offloading the LIDAR point clouds to the BS for centralized training (i.e., ∼ × float32 communications for the samples in s008).

The last column in Table II reports the final top-10 accuracy achieved for each number of vehicles V and local epochs N_v. This column shows a slight performance degradation when more users take part in federated training. This is due to the limited number of training samples available to each vehicle when the same number of training samples (e.g., the 11K samples available in s008) are distributed among more vehicles. Increasing N_v in this case tends to overfit to the local datasets, which do not efficiently represent the true distribution of the data, leading to some performance degradation. This can be mitigated if more data can be collected by each vehicle.

V. CONCLUSIONS
We have studied efficient link configuration in mmWave V2I communication networks, and considered exploiting side information in the form of LIDAR and position data in a supervised learning scheme to reduce the beam search overhead. In this letter, we first proposed a LIDAR preprocessing scheme and a convolutional NN architecture that improve the state-of-the-art classification accuracy with a significantly reduced model complexity. We then proposed a federated training scheme that enables connected vehicles to collaboratively train a shared NN on their locally available LIDAR data. Once the NN is collaboratively trained, any vehicle entering the coverage area of the BS can employ it to reduce the beam search overhead.
REFERENCES
[1] V. Va, J. Choi, T. Shimizu, G. Bansal, and R. W. Heath, "Inverse multipath fingerprinting for millimeter wave V2I beam alignment," IEEE Trans. on Vehicular Technology, vol. 67, no. 5, pp. 4042–4058, 2018.
[2] W. B. Abbas and M. Zorzi, "Context information based initial cell search for millimeter wave 5G cellular networks," in European Conf. on Networks and Comms. (EuCNC), 2016, pp. 111–116.
[3] J. C. Aviles and A. Kouki, "Position-aided mm-Wave beam training under NLOS conditions," IEEE Access, vol. 4, pp. 8703–8714, 2016.
[4] A. Loch, A. Asadi, G. H. Sim, J. Widmer, and M. Hollick, "mm-Wave on wheels: Practical 60 GHz vehicular communication without beam training," in Int'l Conf. on Communication Systems and Networks (COMSNETS), 2017, pp. 1–8.
[5] T. Nitsche, A. B. Flores, E. W. Knightly, and J. Widmer, "Steering with eyes closed: Mm-Wave beam steering without in-band measurement," in IEEE Conf. on Computer Comms. (INFOCOM), 2015, pp. 2416–2424.
[6] A. Ali, N. González-Prelcic, and R. W. Heath, "Millimeter wave beam-selection using out-of-band spatial information," IEEE Trans. on Wireless Comms., vol. 17, no. 2, pp. 1038–1052, 2018.
[7] N. González-Prelcic, R. Méndez-Rial, and R. W. Heath, "Radar aided beam alignment in mmWave V2I communications supporting antenna diversity," in Info. Theory and Apps. Workshop (ITA), 2016, pp. 1–7.
[8] A. Klautau, N. González-Prelcic, and R. W. Heath, "LIDAR data for deep learning-based mmWave beam-selection," IEEE Wireless Comms. Letters, vol. 8, no. 3, pp. 909–912, 2019.
[9] M. Dias, A. Klautau, N. González-Prelcic, and R. W. Heath, "Position and LIDAR-aided mmWave beam selection using deep learning," in IEEE 20th Int'l Workshop on Signal Proc. Adv. in Wireless Comms. (SPAWC), 2019, pp. 1–5.
[10] J. Choi, V. Va, N. González-Prelcic, R. Daniels, C. R. Bhat, and R. W. Heath, "Millimeter-wave vehicular communication to support massive automotive sensing," IEEE Comms. Magazine.
[12] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proceedings of the 20th Int'l Conf. on Artificial Intelligence and Statistics, 2017, pp. 1273–1282.
[13] A. Klautau, P. Batista, N. González-Prelcic, Y. Wang, and R. W. Heath, "5G MIMO data for machine learning: Application to beam-selection using deep learning," in Info. Theory and Apps. Workshop (ITA), 2018, pp. 1–9.
[14] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.