Enhanced Beam Alignment for Millimeter Wave MIMO Systems: A Kolmogorov Model
Qiyou Duan, Taejoon Kim, and Hadi Ghauch
Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, USA
Department of COMELEC, Telecom-ParisTech, Paris, France
Email: [email protected], [email protected], [email protected]

Abstract—We present an enhancement to the problem of beam alignment in millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems, based on a modification of the machine learning-based criterion, called Kolmogorov model (KM), previously applied to the beam alignment problem. Unlike the previous KM, whose computational complexity is not scalable with the size of the problem, a new approach, centered on discrete monotonic optimization (DMO), is proposed, leading to significantly reduced complexity. We also present a Kolmogorov-Smirnov (KS) criterion for the advanced hypothesis testing, which does not require any subjective threshold setting compared to the frequency estimation (FE) method developed for the conventional KM. Simulation results that demonstrate the efficacy of the proposed KM learning for mmWave beam alignment are presented.
I. INTRODUCTION
A fundamental bottleneck in operating large-dimensional millimeter wave (mmWave) array antenna systems is how to accurately align beams between the transmitter and receiver in low latency [1], [2]. The use of directional narrow beams for searching the entire beam space (also called exhaustive beam search) is an extremely time-consuming operation; the exhaustive beam search has been used in existing mmWave WiFi standards including IEEE 802.15.3c [3] and IEEE 802.11ad [4], for example. For reduced overhead beam alignment, hierarchical codebooks [2], [5], compressed sensing-based algorithms [6], [7], overlapped beam patterns [8], and beam coding [9] have been proposed over the years, establishing a "structured beam alignment" paradigm. Despite a plethora of such beam alignment methods, the overhead issue still remains a critical challenge in mmWave communications.

Recently, the beam alignment problem has been approached from a statistical-machine-learning point of view [10], with a primary focus on an application of the Kolmogorov model (KM) [11]. In [10], Kolmogorov elementary representations (KERs) of the received signal power values that are associated with the beam pairs in a training beam codebook are learned by solving a constrained error minimization problem. In doing so, the KERs of unsounded beam pairs are predicted by exploiting the predictive power of the KM, leading to a significantly reduced beam alignment overhead. However, there are two fundamental limitations to the conventional KM learning in the beam alignment context. First, the computational complexity of the KM training algorithm in [10], [11] is prohibitively high; the complexity is not scalable with the number of antennas and the size of codebooks.
Second, the initial work in [10] centers on a frequency estimation (FE) method to estimate empirical probabilities of the training set, which has to rely on a threshold setting for hypothesis testing; the threshold value is treated as a hyper-parameter, determined based on numerical simulations. Ultimately, the desired threshold setting must account for a specific performance criterion so as to improve the predictive power of the KM.

In fact, in mmWave-based systems, quality of service is primarily dominated by latency [12]. In particular, the requirements of low latency and overhead are perhaps even more critical than those for high throughput. Motivated by this, we propose an enhancement to the problem of mmWave multiple-input multiple-output (MIMO) beam alignment by leveraging discrete monotonic optimization (DMO) frameworks [13], [14], leading to significantly reduced computational complexity compared to the previous KM [10]. We also propose a new threshold approach to obtaining empirical probabilities of the training set, which improves the performance of hypothesis testing relative to the FE of the KM. Our approach is based on the Kolmogorov-Smirnov (KS) test criterion [15], [16], which is desirable because it can set a detection threshold without access to a priori knowledge.

The remainder of the paper is organized as follows. In Section II, we introduce the system model and briefly review the related work on KM-based beam alignment. In Section III, we propose the DMO algorithm to solve the KM learning optimization problem and provide a new method for building the empirical training statistics via the KS test. In Section IV, simulation results are presented to illustrate the superior performance of the proposed algorithm. Finally, we conclude the paper in Section V.

II. SYSTEM MODEL AND PREVIOUS WORK
We present the beam alignment system model and provide an overview of the previous work under consideration.
A. System Model
Suppose a point-to-point mmWave MIMO system where an independent block fading channel with a coherence block length T_B (channel uses) is assumed. The transmitter and receiver are equipped with N_t and N_r antennas, respectively. For simplicity, we adopt a low-complexity architecture where only one radio-frequency (RF) chain is employed at both the transmitter and receiver sides.

During a coherence block T_B, the transmitter and receiver intend to spend K (K ≪ T_B) channel uses to align the best transmit and receive beam pair for data transmission. To be specific, the transmitter and receiver choose an analog beamformer f_t ∈ C^{N_t×1} and combiner w_r ∈ C^{N_r×1} from the pre-designed beam sounding codebooks F and W such that f_t ∈ F and w_r ∈ W, respectively. We denote the index sets of F and W as I_F and I_W, respectively, with cardinalities |I_F| and |I_W|. Assume that f_t and w_r are unit-norm, i.e., ||f_t|| = ||w_r|| = 1. The received signal associated with the beam pair (f_t, w_r) is therefore given by

y_{t,r} = w_r^*(H f_t s_t + n) = w_r^* H f_t s_t + n_r, ∀(t, r) ∈ I_F × I_W, (1)

where H ∈ C^{N_r×N_t} is the channel matrix and s_t ∈ C is the training symbol satisfying ||f_t s_t|| = 1. n ∈ C^{N_r×1} is the additive complex white Gaussian noise vector with each entry independently and identically distributed (i.i.d.) as CN(0, σ_n^2). n_r ≜ w_r^* n ~ CN(0, σ_n^2) is the effective additive noise, and thus the signal-to-noise ratio (SNR) is 1/σ_n^2.

Exhaustive beam alignment (beam sounding) is a widely used method: the transmitter and receiver jointly sound all the beams in F and W to find the optimal beam pair that maximizes the received signal power,

(f_{t*}, w_{r*}) = argmax_{(f_t, w_r), (t,r) ∈ I_F × I_W} {η_{t,r} ≜ |y_{t,r}|^2}.

In fact, the training overhead for the exhaustive method is |I_F × I_W|.
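As a concrete illustration of the sounding model (1) and the exhaustive search, the following minimal NumPy sketch uses assumed toy values (DFT sounding codebooks, a random Gaussian channel, and σ_n^2 = 0.1); it is an illustration only, not the paper's simulation setup:

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, sigma2 = 16, 16, 0.1          # assumed array sizes and noise power

# DFT beam-sounding codebooks F (transmit) and W (receive), unit-norm columns
F = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)
W = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)

H = rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt))  # toy channel

def sound(t, r, s=1.0):
    """One received sample y_{t,r} = w_r^* (H f_t s + n), as in (1)."""
    n = np.sqrt(sigma2 / 2) * (rng.normal(size=Nr) + 1j * rng.normal(size=Nr))
    return np.conj(W[:, r]) @ (H @ F[:, t] * s + n)

# Exhaustive search: |I_F x I_W| soundings, then pick the pair maximizing |y|^2
eta = np.array([[abs(sound(t, r)) ** 2 for r in range(Nr)] for t in range(Nt)])
t_star, r_star = np.unravel_index(np.argmax(eta), eta.shape)
```

The quadratic cost in the codebook sizes (here 16 × 16 = 256 soundings) is exactly the overhead the KM-based approach reduces.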
Since the codebook sizes |I_F| and |I_W| are large in mmWave cellular networks, the drastic training overhead of exhaustive beam alignment overwhelms the available coherent channel resources. To tackle this issue, a learning-based approach, the KM, was proposed to reduce the beam alignment overhead while maintaining appreciable beam alignment performance [10].

B. Previous Work: KM-Based Beam Alignment
A binary random variable X_{t,r} ∈ {0, 1} is introduced to indicate the "good" and "poor" quality of the beam pair (f_t, w_r) for (t, r) ∈ I_F × I_W as

Pr(η_{t,r} ≥ τ) = Pr(X_{t,r} = 1),
Pr(η_{t,r} < τ) = Pr(X_{t,r} = 0),

where Pr(E) ∈ [0, 1] denotes the probability of the event E and τ is a pre-designed threshold value for the received signal power. We say that the beam pair (f_t, w_r) has a "good" SNR if η_{t,r} ≥ τ. Because Pr(X_{t,r} = 1) + Pr(X_{t,r} = 0) = 1, it suffices to focus on the case when X_{t,r} = 1. The D-dimensional KER of X_{t,r} is then defined by [11]

Pr(X_{t,r} = 1) = θ_t^T ψ_r, ∀(t, r) ∈ I_F × I_W, (2)

where the probability mass function vector θ_t is on the unit probability simplex P, i.e., θ_t ∈ R_+^D and 1^T θ_t = 1, with 1 the all-one vector of dimension D, and ψ_r ∈ B^D denotes the binary indicator vector of dimension D such that the d-th entry of ψ_r is ψ_{r,d} ∈ {0, 1}.

Fig. 1. Diagram of KM-based beam alignment (|I_F| = |I_W| = 4).

The beam alignment using the KM relies on subsampled codebooks with index sets I_F^train and I_W^train, such that I_F^train ⊂ I_F and I_W^train ⊂ I_W, which have much smaller sizes, |I_F^train| ≪ |I_F| and |I_W^train| ≪ |I_W| [10]. We let the empirical probability that beam pair (f_t, w_r) has a "good" SNR be p_{t,r}. In [10], an FE method was proposed to build the training set of empirical probabilities of beam pairs in the subsampled codebooks for the KM learning algorithm, i.e., {p_{t,r}}, ∀(t, r) ∈ I_F^train × I_W^train. Given the FE interval T_FE, the estimate of p_{t,r} at time-slot ϕ, i.e., p_{t,r}^(ϕ), is given by

p_{t,r}^(ϕ) = (1/ϕ) Σ_{l=1}^{ϕ} I(η_{t,r}^(l) ≥ τ), ϕ ∈ {1, ..., T_FE}, (3)

where η_{t,r}^(l) is the received signal power obtained by sounding the beam pair (f_t, w_r) at time-slot l ∈ {1, ..., ϕ} and I(·) denotes the indicator function. The best FE estimate comes from p_{t,r}^(T_FE), which is carried out at the end of the estimation interval.

Once the training set (of empirical probabilities) is constructed, the KM learning algorithm proceeds to optimize the KM parameter vectors {θ_t} and {ψ_r} by solving the constrained error minimization problem:

{θ̂_t}, {ψ̂_r} = argmin_{{θ_t}, {ψ_r}} Σ_{(t,r) ∈ I_F^train × I_W^train} (θ_t^T ψ_r − p_{t,r})^2
s.t. θ_t ∈ P, ∀t ∈ I_F^train, ψ_r ∈ B^D, ∀r ∈ I_W^train. (4)

In order to handle the coupled non-convex combinatorial optimization in (4), a block-coordinate descent (BCD) method [10], [11] was proposed by dividing the problem in (4) into two subproblems: (i) the linearly-constrained quadratic program (LCQP)

min_{θ_t ∈ P} θ_t^T S_t θ_t − 2θ_t^T v_t + ρ_t, (5)

where S_t ≜ Σ_{r ∈ I_W^train} ψ_r ψ_r^T, v_t ≜ Σ_{r ∈ I_W^train} ψ_r p_{t,r}, and ρ_t ≜ Σ_{r ∈ I_W^train} p_{t,r}^2, and (ii) the binary quadratic program (BQP)

min_{ψ_r ∈ B^D} ψ_r^T S_r ψ_r − 2v_r^T ψ_r + ρ_r, (6)

where S_r ≜ Σ_{t ∈ I_F^train} θ_t θ_t^T, v_r ≜ Σ_{t ∈ I_F^train} θ_t p_{t,r}, and ρ_r ≜ Σ_{t ∈ I_F^train} p_{t,r}^2. The KM solves the two subproblems in (5) and (6) in an alternating fashion and iteratively refines the KM parameters {θ_t} and {ψ_r}. More specifically, by exploiting the fact that the optimization in (5) is carried out over the unit probability simplex, a simple iterative Frank-Wolfe (FW) algorithm [17] was proposed to optimally solve (5), while semi-definite relaxation with randomization (SDRwR) was employed to solve (6) optimally as D grows large [18].

We let {θ̂_t, ψ̂_r} be the learned KM parameters for the problem in (4). The predictive power of the KM is exploited to infer the probabilities of the test set (i.e., beam pairs which are not sounded) as

p̂_{t,r} ≜ θ̂_t^T ψ̂_r, ∀(t, r) ∈ (I_F × I_W) \ (I_F^train × I_W^train). (7)

Finally, the optimal beam pair with the highest probability of having a "good" SNR is selected by evaluating both the training and test sets as

(t*, r*) = argmax_{(t,r) ∈ I_F × I_W} {p̂_{t,r} = θ̂_t^T ψ̂_r}. (8)

A diagram of the KM-based beam alignment, which conceptually visualizes the system model and the framework, can be found in Fig. 1.
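Once the KM parameters are learned, the prediction step (7) and selection step (8) reduce to inner products and an argmax. A small sketch with made-up (hypothetical) learned parameters {θ_t} and {ψ_r} for D = 4:

```python
import numpy as np

D = 4
# assumed learned KM parameters: each theta_t on the simplex, each psi_r binary
theta = {0: np.array([0.7, 0.1, 0.1, 0.1]),
         1: np.array([0.25, 0.25, 0.25, 0.25])}
psi = {0: np.array([1.0, 0.0, 1.0, 0.0]),
       1: np.array([1.0, 0.0, 1.0, 1.0])}

# (7): predicted probability that beam pair (t, r) has a "good" SNR
p_hat = {(t, r): float(theta[t] @ psi[r]) for t in theta for r in psi}

# (8): select the beam pair with the highest predicted probability
t_star, r_star = max(p_hat, key=p_hat.get)
```

With these toy parameters, p̂_{0,1} = 0.7 + 0.1 + 0.1 = 0.9 is the largest entry, so (t*, r*) = (0, 1).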
1) Desired Attributes of KM:
There are three main advantages of the KM that make it superior to other data representations such as matrix factorization (MF) [19], SVD-based representations [20], and nonnegative MF [21]: (i) the fact that the KM in (2) represents an actual probability is exploited to model the quality of beam pairs in terms of SNR, (ii) the KM offers improved prediction performance over nonnegative MF [22], and (iii) the interpretability of the KM in (2), namely, the insight that it exhibits about the data, which is not possible with other learning methods that fall under the black-box type.
2) Main Contribution of This Work:
While the SDRwR method for solving (6) is asymptotically optimal [10], [11], it demands a huge computational cost and thus violates the low-latency requirement of mmWave communications [12]. Moreover, the lack of an appropriate threshold design criterion for the FE method in [10] limits the beam alignment performance of the KM-based approach. To address the above limitations, we first propose an enhanced KM learning algorithm for beam alignment by leveraging DMO. A novel empirical probability estimation method based on the KS test is then provided with a proper threshold selection criterion. The proposed algorithm exhibits better beam alignment performance with significantly reduced computational time compared to the existing work.

III. PROPOSED ALGORITHM
To reduce the prohibitively high computational cost of SDRwR, in this section, a DMO framework is proposed. Moreover, a new method based on the KS test is presented.
A. Discrete Monotonic Optimization
Before presenting the proposed algorithm, we provide a lemma showing an equivalent reformulation of the problem in (6).
Lemma 1:
The BQP problem in (6) is equivalent to the maximization of a difference of two monotonically increasing functions, and the binary constraint ψ_r ∈ B^D in (6) is equivalently transformed to continuous monotonic constraints:

max_{ψ_r} {f(ψ_r) = f₊(ψ_r) − f₋(ψ_r)}
s.t. g(ψ_r) − h(ψ_r) ≤ 0, ψ_r ∈ [0, 1]^D, (9)

where f₊(ψ_r) ≜ 2v_r^T ψ_r, f₋(ψ_r) ≜ ψ_r^T S_r ψ_r, g(ψ_r) ≜ Σ_{d=1}^D ψ_{r,d}, h(ψ_r) ≜ Σ_{d=1}^D ψ_{r,d}^2, and ψ_r ∈ [0, 1]^D indicates that 0 ≤ ψ_{r,d} ≤ 1 for every d = 1, ..., D.

Proof:
Given the definitions of f₊ and f₋ in (9), the objective function f in (9) is obtained by transforming the minimization into a maximization and discarding the constant ρ_r in (6). Also, f₊ and f₋ are both increasing functions with respect to ψ_r ∈ [0, 1]^D because v_r ≥ 0 and S_r is a positive semi-definite matrix with nonnegative entries. The binary constraints ψ_{r,d} ∈ {0, 1}, d = 1, ..., D, can be equivalently rewritten as Σ_{d=1}^D ψ_{r,d}(1 − ψ_{r,d}) ≤ 0, ψ_{r,d} ∈ [0, 1], ∀d, i.e., g(ψ_r) − h(ψ_r) ≤ 0, ψ_r ∈ [0, 1]^D in (9), where g and h are increasing on R_+^D. This completes the proof.

The BQP problem in (6) cannot be directly handled due to the discrete constraints. In [10], this nuisance was tackled by using SDRwR, which incurs impractical computational complexity. Unlike SDRwR, the equivalent problem formulation leveraging the difference of monotonic functions (DMF) in (9) avoids the intractable discrete constraints without any relaxation. Motivated by Lemma 1, we propose to use a branch-reduce-and-bound (BRB) approach [13] to directly solve (9) without any relaxation and/or randomization. As will be seen in Fig. 2 in Section IV, the proposed DMO algorithm can substantially reduce the computational complexity (a two-orders-of-magnitude improvement in time complexity). We introduce the following three main steps at each iteration of the proposed DMO algorithm; the overall procedure is presented in detail in Algorithm 1.
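The constraint equivalence used in the proof, namely that g(ψ) − h(ψ) ≤ 0 on [0, 1]^D holds exactly at the binary vectors, can be checked numerically; a minimal sketch:

```python
import itertools
import numpy as np

D = 3
g = lambda psi: float(psi.sum())          # g(psi) = sum_d psi_d
h = lambda psi: float((psi ** 2).sum())   # h(psi) = sum_d psi_d^2

# At every binary vector the constraint holds (with equality) ...
for bits in itertools.product([0.0, 1.0], repeat=D):
    psi = np.array(bits)
    assert g(psi) - h(psi) <= 1e-12

# ... while any strictly fractional coordinate violates it,
# since x - x^2 > 0 whenever 0 < x < 1
psi = np.array([0.5, 1.0, 0.0])
assert g(psi) - h(psi) > 0
```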
1) Reduction:
We let M = [a, b] be one of the boxes that contain feasible solutions to (9) and ν be the current maximum value of the objective function f in (9). The reduced box M' = [a', b'] ⊂ [a, b] is defined by new lower and upper vertices a' and b', respectively, without excluding any feasible solution ψ_r ∈ [a, b] satisfying f(ψ_r) ≥ ν [13], as

a' = b − Σ_{d=1}^D α_d (b_d − a_d) e_d, (10)
b' = a' + Σ_{d=1}^D β_d (b_d − a'_d) e_d, (11)

where

α_d = sup{α | α ∈ [0, 1], g(a) − h(b − α(b_d − a_d) e_d) ≤ 0, f₊(b − α(b_d − a_d) e_d) − f₋(a) ≥ ν},
β_d = sup{β | β ∈ [0, 1], g(a' + β(b_d − a'_d) e_d) − h(b) ≤ 0, f₊(b) − f₋(a' + β(b_d − a'_d) e_d) ≥ ν},

for d = 1, ..., D, and e_d is the d-th column of the D-dimensional identity matrix I_D. Note that the optimal values of α_d and β_d can be found by exploiting the compactness of α, β ∈ [0, 1] and the monotonicity of f₊, f₋, g, and h (for instance, by using a bisection method) [14].

Algorithm 1 DMO Algorithm
Input: S_r, v_r, and D. Output: ψ*_r.
1: Initialization: Set iteration number i = 1. Let P_1 = {M}, M = [0, 1]^D, R_1 = ∅, and ν = f(0) = 0.
2: Reduction: Reduce each box in P_i according to (10) and (11) to obtain P'_i = {[a', b'] | [a, b] ∈ P_i}.
3: Bounding: Calculate µ(M') in (12) for each M' ∈ M_i ≜ P'_i ∪ R_i.
4: Find a feasible solution: ψ_r^(i) = argmax_{ψ_r} {f(ψ_r) > ν | ψ_r = ⌈(a' + b')/2⌉, M' = [a', b'] ∈ M_i}.
5: Update the current best value: If ψ_r^(i) in Step 4 exists, update ν as ν = f(ψ_r^(i)); otherwise, set ψ_r^(i) = ψ_r^(i−1) and leave ν unchanged.
6: Discarding: Delete every M' ∈ M_i such that µ(M') < ν and let R_{i+1} be the collection of remaining boxes.
7: if R_{i+1} = ∅ then terminate and return ψ*_r = ψ_r^(i).
8: else let M^(i) = argmax_{M'} {µ(M') | M' ∈ R_{i+1}}.
9:   if ν ≥ εµ(M^(i)) then ε-accuracy is reached; return ψ*_r = ψ_r^(i).
10:  else Branching: Divide M^(i) into M_1^(i) and M_2^(i) according to (13) and (14); update R_{i+1} = R_{i+1} \ M^(i) and P_{i+1} = {M_1^(i), M_2^(i)}.
11:  end if
12: end if
13: Set i = i + 1 and return to Step 2.
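The reduction factors can indeed be computed by bisection. A sketch for α_d under assumed toy data (D = 2, with f₊(ψ) = 2v_r^T ψ and f₋(ψ) = ψ^T S_r ψ as in (9)); the 1e-12 slack guards against floating-point round-off:

```python
import numpy as np

v_r = np.array([1.0, 1.0])                     # assumed toy problem data
S_r = np.array([[0.5, 0.25], [0.25, 0.5]])

f_plus = lambda p: 2.0 * float(v_r @ p)        # f+(psi) = 2 v_r^T psi
f_minus = lambda p: float(p @ S_r @ p)         # f-(psi) = psi^T S_r psi
g = lambda p: float(p.sum())
h = lambda p: float((p ** 2).sum())

def alpha_d(a, b, d, nu, tol=1e-9):
    """Largest alpha in [0, 1] keeping the shrunk box feasible and promising.
    Bisection is valid because both conditions are monotone in alpha."""
    def ok(alpha):
        bb = b.copy()
        bb[d] = b[d] - alpha * (b[d] - a[d])
        return g(a) - h(bb) <= 1e-12 and f_plus(bb) - f_minus(a) >= nu - 1e-12
    if not ok(0.0):
        return None          # the box cannot contain a point with f >= nu
    if ok(1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo

a, b = np.zeros(2), np.ones(2)
alpha0 = alpha_d(a, b, d=0, nu=3.2)  # -> ~0.4, since f+([0.6, 1]) = 3.2
```

Here shrinking coordinate 0 past α = 0.4 would push the bound f₊(b') − f₋(a) below the incumbent ν = 3.2, so the bisection stops there.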
2) Bounding:
For every reduced box M', an upper bound on ν(M') ≜ max{f(ψ_r) | g(ψ_r) − h(ψ_r) ≤ 0, ψ_r ∈ M' ∩ [0, 1]^D} is calculated such that

ν(M') ≤ µ(M') = f₊(b') − f₋(a'). (12)

The upper bound µ(M') in (12) holds because f₊ and f₋ are monotonically increasing functions. Furthermore, the bounds µ(M') ensure lim_{k→∞} µ(M'_k) = f(ψ*_r), where {M'_k} stands for any infinite nested sequence of boxes and ψ*_r is the optimal solution to (9). At each iteration, any box M' with µ(M') < ν is deleted because such a box cannot contain ψ*_r.

Algorithm 2 Enhanced KM Learning for Beam Alignment
Input: F, W, I_F^train, I_W^train, D, L, α, and T_KS. Output: (t*, r*).
1: Estimate the empirical probabilities via the KS test:
2: for each ϕ = 1, ..., T_KS do
3:   for each beam-index pair (t, r) ∈ I_F^train × I_W^train do
4:     Sound the beam pair (f_t, w_r) and obtain Z_{t,r}^(l), l ∈ {1, ..., ϕ}, as in (15) based on [η_{t,r}^(1), ..., η_{t,r}^(L)].
5:     Compute the empirical probabilities according to (16).
6:   end for
7: end for
8: Learn the KM parameters:
9: for i = 1, ..., I do
10:   Update θ_t^(i) via the FW algorithm [17]; update ψ_r^(i) via Algorithm 1.
11: end for
12: Obtain the final estimate {θ̂_t = θ_t^(I), ψ̂_r = ψ_r^(I)}.
13: Compute the predicted probabilities for the beam pairs which are not trained yet based on (7).
14: Determine the optimal beam index pair as in (8); return (t*, r*).
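The validity of the bound (12) over a box can be sanity-checked against brute force over the binary points of the box; a sketch with assumed toy data (D = 3, f₊ and f₋ as in (9)):

```python
import itertools
import numpy as np

v_r = np.array([1.0, 0.5, 0.8])                # assumed toy problem data
th = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
S_r = th.T @ th                                # PSD with nonnegative entries

f = lambda p: 2.0 * float(v_r @ p) - float(p @ S_r @ p)   # f = f+ - f-
f_plus = lambda p: 2.0 * float(v_r @ p)
f_minus = lambda p: float(p @ S_r @ p)

a, b = np.zeros(3), np.ones(3)                 # the box M' = [a, b]
mu = f_plus(b) - f_minus(a)                    # the bound in (12)

# brute force over all binary points contained in the box
best = max(f(np.array(x)) for x in itertools.product([0.0, 1.0], repeat=3))
assert mu >= best                              # (12) dominates every feasible point
```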
3) Branching:
At the end of each iteration, the box with the maximum upper bound, denoted by M* = [a*, b*], is selected and branched to accelerate the convergence of the algorithm. The box M* is divided into the two boxes

M*_1 = {ψ_r ∈ M* | ψ_{r,j} ≤ ⌊c*_j⌋}, (13)
M*_2 = {ψ_r ∈ M* | ψ_{r,j} ≥ ⌈c*_j⌉}, (14)

where j = argmax_{d=1,...,D} (b*_d − a*_d), c*_j = (a*_j + b*_j)/2, and ⌊·⌋ and ⌈·⌉ represent the element-wise floor and ceiling operations, respectively.

The DMF optimization problem in (9) is solved by iteratively executing the above three procedures until convergence within ε-accuracy, as shown in Algorithm 1.

B. Kolmogorov-Smirnov Test
The choice of τ in (3) has a profound impact on the beam alignment performance of the KM-based approach. The threshold value τ has been chosen subjectively based on numerical simulations [10], and the appropriate value can vary substantially depending on the channel conditions and operating SNR. An appropriate selection criterion is lacking, due in part to the fact that the statistics of η_{t,r} are unknown in practice. We overcome this difficulty by proposing, in this subsection, to estimate the trained empirical probabilities {p_{t,r}} by applying the detection-theoretic criterion for threshold setting introduced by Kolmogorov and Smirnov [15], [23].

We first define the binary hypotheses of a beam pair (f_t, w_r), ∀(t, r) ∈ I_F^train × I_W^train, according to the signal model in (1) as

H0: η_{t,r} = |n_r|^2,
H1: η_{t,r} = |w_r^* H f_t s_t + n_r|^2,

where the null hypothesis H0 is declared when η_{t,r} relies on noise only, and the alternative hypothesis H1 is true when η_{t,r} is a function of both the signal and noise. Under H0, given n_r ~ CN(0, σ_n^2), the theoretical cumulative distribution function (CDF) of η_{t,r} is F(η_{t,r}|H0) = 1 − e^{−η_{t,r}/σ_n^2}, while the test statistics under H1 are unknown. To circumvent this difficulty, the KS test forms the empirical CDF of η_{t,r} from the observed data samples η_{t,r}^(1), ..., η_{t,r}^(L),

F_L(x) = (1/L) Σ_{ℓ=1}^L I(η_{t,r}^(ℓ) ≤ x),

where L denotes the number of data samples in the KS test, which is distinguished from the time interval T_FE in (3). The KS test statistic is

Z_{t,r} = max_{x ∈ R} |F_L(x) − F(x|H0)|. (15)

The binary hypothesis test is then Z_{t,r} ≷ ε (decide H1 if Z_{t,r} ≥ ε and H0 otherwise), where ε is the KS threshold value. Similar to the conventional Neyman-Pearson approach, the threshold ε is chosen to meet the target false alarm rate α, such that

α ≜ Pr(Z_{t,r} ≥ ε | H0) ≈ 2e^{−2Lε^2},

where the last step is due to the Kolmogorov approximation [24]. The approximation becomes tight as L grows large, so the KS threshold can be determined by ε = sqrt(−ln(α/2)/(2L)). Finally, similar to (3), the KS-estimated empirical probability at time-slot ϕ for any beam index pair (t, r) ∈ I_F^train × I_W^train is given by

p_{t,r}^(ϕ) = (1/ϕ) Σ_{l=1}^ϕ I(Z_{t,r}^(l) ≥ ε), ϕ ∈ {1, ..., T_KS}, (16)

where Z_{t,r}^(l) is the detection statistic obtained by (15) at time-slot l ∈ {1, ..., ϕ} and T_KS denotes the KS estimation interval.

Remark 1:
The key implication of the KS criterion in (15) is threefold: (i) the maximum deviation Z_{t,r} converges to 0 almost surely as L tends to infinity if the data samples follow the distribution F(η_{t,r}|H0), (ii) the distribution of Z_{t,r} does not depend on the underlying CDF being tested, and (iii) the maximum difference between the CDFs captures a jump/concentration in probability and is thus more representative of the difference between distributions than other statistics such as the minimum or median.
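The KS statistic (15) and the threshold ε = sqrt(−ln(α/2)/(2L)) are straightforward to compute; a minimal pure-Python sketch, with σ_n^2 and the power samples as assumed inputs:

```python
import math

def ks_stat_vs_null(samples, sigma2):
    """Z = max_x |F_L(x) - F(x|H0)|, with F(x|H0) = 1 - exp(-x / sigma2),
    the CDF of the received power under the noise-only hypothesis, as in (15)."""
    xs = sorted(samples)
    L = len(xs)
    z = 0.0
    for i, x in enumerate(xs):
        f0 = 1.0 - math.exp(-x / sigma2)
        # the empirical CDF steps at x: compare both sides of the jump
        z = max(z, abs((i + 1) / L - f0), abs(i / L - f0))
    return z

def ks_threshold(alpha, L):
    """epsilon = sqrt(-ln(alpha/2) / (2L)), from alpha ≈ 2 exp(-2 L eps^2)."""
    return math.sqrt(-math.log(alpha / 2.0) / (2.0 * L))

eps = ks_threshold(alpha=0.05, L=5)               # ~0.607
z = ks_stat_vs_null([math.log(2.0)], sigma2=1.0)  # one sample at the H0 median
```

Note that the threshold depends only on α and L, never on the unknown H1 statistics; this is what removes the subjective tuning required by the FE method.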
Fig. 2. Time consumption comparison between the conventional KM learning with SDRwR [10, Algorithm 1] and the proposed KM learning with DMO (i.e., Algorithm 2) (N_t = N_r = |I_F| = |I_W| = 16, T_FE = T_KS = 8, L = 5, α = 0.05, and τ = 12 dB).

Fig. 3. Beamforming gain comparison between the conventional KM learning [10, Algorithm 1] and Algorithm 2 (N_t = N_r = |I_F| = |I_W| = 16, D = 8, T_FE = T_KS = 8, L = 5, and SR = 25%).

Incorporating Algorithm 1 to solve the BQP in (6), and the KS test in (16) to estimate the empirical probabilities instead of the FE in (3), we are ready to elucidate the overall proposed beam alignment procedure in Algorithm 2.

IV. SIMULATION RESULTS
In this section, we provide numerical results for the proposed beam alignment approach in mmWave MIMO channels. We adopt the physical representation of sparse mmWave MIMO channels [1], [5] and assume that the channel matrix has low rank. We set N_t = N_r = |I_F| = |I_W|, T_FE = T_KS = 8, and L = 5 throughout the simulation. The sampling rate, defined as the ratio of the number of beam pairs in the subsampled training codebook to the total number of beam pairs in the original codebook, is given by SR = |I_F^train × I_W^train| / |I_F × I_W|. We obtain the numerical results by conducting Monte Carlo simulations.

In Fig. 2, the average time (in seconds) consumed to execute Algorithm 2 (i.e., the proposed KM learning with DMO) is compared with the conventional KM learning with SDRwR (i.e., Algorithm 1 in [10]) for SR = 25% and D = 4, 8, respectively. Notice that we measure the running time by using the "cputime" function in MATLAB. We set the target false alarm rate α = 0.05 for the KS test in Algorithm 2 and τ = 12 dB for the FE in [10, Algorithm 1] to obtain the empirical probabilities for the training set. We further assume N_t = N_r = |I_F| = |I_W| = 16 here. It is clear from Fig. 2 that the proposed Algorithm 2 substantially accelerates the computation compared to the conventional KM learning with SDRwR [10, Algorithm 1]; an improvement of more than two orders of magnitude is observed.

In Fig. 3, the average beamforming gains of the conventional KM learning algorithm [10, Algorithm 1] and the proposed Algorithm 2 are evaluated for N_t = N_r = |I_F| = |I_W| = 16, D = 8, and SR = 25%, where, given the beam pair (f_{t*}, w_{r*}) selected by each algorithm, the beamforming gain is calculated from (1) by G_{t*,r*} = ||w_{r*}^* H f_{t*}||^2 / σ_n^2. In Fig. 3, the curves of the conventional KM learning are evaluated for the threshold values τ = 6, 12 dB, while the curve of Algorithm 2 is evaluated for α = 0.05. Moreover, the performance of the exhaustive search, a benchmark consuming |I_F × I_W| channel uses for beam alignment, is also presented. As can be seen from Fig. 3, Algorithm 2 shows an improvement over the conventional KM learning with substantially reduced complexity. The efficacy of the proposed KS test in improving the KM learning capability is further evaluated next.

Fig. 4. Beamforming gain comparison between Algorithm 2 and the KM learning with FE (N_t = N_r = |I_F| = |I_W| = 64, D = 8, T_FE = T_KS = 8, L = 5, and SR = 25%).
In Fig. 4, we show the beamforming gain of Algorithm 2 and that of the variant obtained by replacing the KS test in Algorithm 2 with the FE in (3), for N_t = N_r = |I_F| = |I_W| = 64, D = 8, and SR = 25%. Fig. 4 illustrates that, with a false alarm rate guarantee, the proposed KS test substantially improves the learning capability of the KM.

V. CONCLUSIONS
In this paper, we proposed an enhanced KM learning algorithm for beam alignment in mmWave MIMO channels. Based on DMO, one key step in learning the KM parameters, i.e., the BQP, was substantially accelerated. Considering the uncertainty introduced by the subjective threshold setting of FE, the KS test was proposed to obtain the empirical probabilities of the training set based on a detection-theoretic criterion. The simulation results demonstrate that the proposed KM learning with DMO and the KS test achieves better beam alignment performance with substantially reduced computational complexity compared to the conventional KM algorithm.

REFERENCES
[1] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, "An overview of signal processing techniques for millimeter wave MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, 2016.
[2] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, "Millimeter wave beamforming for wireless backhaul and access in small cell networks," IEEE Transactions on Communications, vol. 61, no. 10, pp. 4391–4403, 2013.
[3] IEEE Std 802.15.3c-2009. IEEE Standard, Oct. 2009.
[4] IEEE Std 802.11ad-2012. IEEE Standard, Dec. 2012.
[5] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, "Channel estimation and hybrid precoding for millimeter wave cellular systems," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 831–846, 2014.
[6] S. Sun and T. S. Rappaport, "Millimeter wave MIMO channel estimation based on adaptive compressed sensing," 2017, pp. 47–53.
[7] W. Zhang, T. Kim, D. J. Love, and E. Perrins, "Leveraging the restricted isometry property: Improved low-rank subspace decomposition for hybrid millimeter-wave systems," IEEE Transactions on Communications, vol. 66, no. 11, pp. 5814–5827, 2018.
[8] M. Kokshoorn, H. Chen, P. Wang, Y. Li, and B. Vucetic, "Millimeter wave MIMO channel estimation using overlapped beam patterns and rate adaptation," IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 601–616, 2017.
[9] Y. Shabara, C. E. Koksal, and E. Ekici, "Beam discovery using linear block codes for millimeter wave communication networks," IEEE/ACM Transactions on Networking, vol. 27, no. 4, pp. 1446–1459, 2019.
[10] W. M. Chan, H. Ghauch, T. Kim, E. De Carvalho, and G. Fodor, "Kolmogorov model for large millimeter-wave antenna arrays: Learning-based beam-alignment," 2019, pp. 411–415.
[11] H. Ghauch, M. Skoglund, H. Shokri-Ghadikolaei, C. Fischione, and A. H. Sayed, "Learning Kolmogorov models for binary random variables," in ICML, 2018.
[12] G. Yang, M. Xiao, and H. V. Poor, "Low-latency millimeter-wave communications: Traffic dispersion or network densification?" IEEE Transactions on Communications, vol. 66, no. 8, pp. 3526–3539, 2018.
[13] H. Tuy, M. Minoux, and N. T. Hoai-Phuong, "Discrete monotonic optimization with application to a discrete location problem," SIAM Journal on Optimization, vol. 17, no. 1, pp. 78–97, 2006.
[14] T. Kim, D. J. Love, M. Skoglund, and Z. Jin, "An approach to sensor network throughput enhancement by PHY-aided MAC," IEEE Transactions on Wireless Communications, vol. 14, no. 2, pp. 670–684, 2015.
[15] G. Zhang, X. Wang, Y. Liang, and J. Liu, "Fast and robust spectrum sensing via Kolmogorov-Smirnov test," IEEE Transactions on Communications, vol. 58, no. 12, pp. 3410–3416, 2010.
[16] A. C. Marcum, J. Y. Kim, D. J. Love, and J. V. Krogmeier, "Interference detection using time-frequency binary hypothesis testing," in MILCOM 2015 - 2015 IEEE Military Communications Conference, 2015, pp. 1485–1490.
[17] M. Jaggi, "Revisiting Frank-Wolfe: Projection-free sparse convex optimization," in Proceedings of the 30th International Conference on Machine Learning, vol. 28, no. 1, 2013, pp. 427–435.
[18] M. Kisialiou and Z. Luo, "Probabilistic analysis of semidefinite relaxation for binary quadratic minimization," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1906–1922, 2010.
[19] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30–37, 2009.
[20] Y. Koren, "Factorization meets the neighborhood: A multifaceted collaborative filtering model," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 426–434.
[21] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13. MIT Press, 2001, pp. 556–562.
[22] C. J. Stark, "Expressive recommender systems through normalized nonnegative models," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 1081–1087.
[23] J. Millard and L. Kurz, "The Kolmogorov-Smirnov tests in signal detection (corresp.)," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 341–342, 1967.
[24] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes. McGraw-Hill, 2002.