A General Method for Calibrating Stochastic Radio Channel Models with Kernels
Ayush Bharti, François-Xavier Briol, Troels Pedersen
Abstract—Calibrating stochastic radio channel models to new measurement data is challenging when the likelihood function is intractable. The standard approach to this problem involves sophisticated algorithms for extraction and clustering of multipath components, following which point estimates of the model parameters can be obtained using specialized estimators. We propose a likelihood-free calibration method using approximate Bayesian computation. The method is based on the maximum mean discrepancy, which is a notion of distance between probability distributions. Our method not only by-passes the need to implement any high-resolution or clustering algorithm, but is also automatic in that it does not require any additional input or manual pre-processing from the user. It also has the advantage of returning an entire posterior distribution on the value of the parameters, rather than a simple point estimate. We evaluate the performance of the proposed method by fitting two different stochastic channel models, namely the Saleh-Valenzuela model and the propagation graph model, to both simulated and measured data. The proposed method is able to estimate the parameters of both models accurately in simulations, as well as when applied to 60 GHz indoor measurement data.
Index Terms—radio channel modeling, machine learning, approximate Bayesian computation, kernel methods, maximum mean discrepancy, likelihood-free inference, calibration
I. INTRODUCTION
Stochastic channel models are used to simulate the behavior of the radio channel in order to test the performance of communication and localization systems. Often, models are flexible enough to be applied to different scenarios, provided that their parameters can be adjusted accordingly. Adjustment of the model parameters based on data collected from measurement campaigns is called calibration (or inference). Calibration is usually challenging since most state-of-the-art stochastic radio channel models have intractable likelihood functions. This renders usual inference techniques such as maximum likelihood estimation or standard Bayesian inference inapplicable.

Instead of solving the whole calibration problem at once, it is wide-spread practice (e.g. [1]–[9]) to split the task into intermediate steps, as outlined in Fig. 1(a). The first step involves resolving the multipath components, i.e. estimating
[Footnote: Ayush Bharti and Troels Pedersen are with the Department of Electronic Systems, Aalborg University, 9220 Aalborg East, Denmark (e-mail: {ayb, troels}@es.aau.dk). François-Xavier Briol is affiliated with the Department of Statistical Science at University College London and the Data-Centric Engineering Programme at The Alan Turing Institute, London, United Kingdom (email: [email protected]). This work is supported by the Danish Council for Independent Research, grant no. DFF 7017-00265. François-Xavier Briol was supported by the Lloyds Register Foundation Programme on Data-Centric Engineering at The Alan Turing Institute under the EPSRC grant [EP/N510129/1].]

path parameters including delays and complex gains. This task can be carried out using high-resolution algorithms such as MUSIC, SAGE, and RiMAX, among others; see e.g. [10, Ch. 5] for an overview. The second step is clustering of the extracted multipath components in the case of cluster-based models. Clustering is either performed manually, as in [2], or using automated algorithms such as [11]–[13]. In a final step, the model parameters are estimated from the extracted and clustered multipath components.

Despite being widely applied, the multi-step approach suffers from a range of issues owing to the composite nature of the methodology. In particular, high-resolution and clustering methods, although very useful in analyzing and understanding the radio channel, are problematic when it comes to model calibration. These methods require implementation of sophisticated and specialized algorithms at each step, which involve a number of heuristic choices and settings that might be conflicting. An emblematic example is the assumption of "well separated" paths while extracting multipath components. The high-resolution methods are prone to estimation artifacts, especially if paths are not "well separated". However, this conflicts with the inherent assumption in the clustering step that multipaths arrive "close" to each other.
Consequently, even though the performance of high-resolution and clustering algorithms has been thoroughly investigated in isolation, the accuracy of the applied multi-step calibration techniques is unknown. Moreover, the calibration technique needs to be tailored to the particular model at hand. While attempting to calibrate and compare different ultra-wideband models using a large database, Greenstein et al. in [14] noted that "the problem in doing so is that there is no simple, clear and established method for extracting cluster model parameters from measured data". As a result, they were unable to fit the cluster model to their calibration data.

[Fig. 1. Methodologies for calibration of stochastic radio channel models: (a) state-of-the-art methodology based on multipath extraction (e.g. MUSIC, SAGE, ESPRIT), clustering (e.g. K-power means, manual), and parameter estimation (e.g. maximum likelihood, least-squares fitting); (b) proposed method based on generic summaries (here exemplified by log-temporal moments) and approximate Bayesian computation (ABC), yielding a posterior p(θ|Z).]

Calibration methods that by-pass the need to resolve the multipath components have been proposed recently. They have been used to calibrate the Turin model [1], the Saleh-Valenzuela (S-V) model [2] and the polarized propagation graph (PG) model [15]. These calibration methods rely either on a Monte Carlo approximation of the likelihood [16], [17], the method of moments [18], [19], or a summary-based likelihood-free inference framework [20]–[22] such as approximate Bayesian computation (ABC). First developed in the field of population genetics in 1997, ABC has since become a popular method for calibrating models with intractable likelihoods in various fields; see [23] for an overview. The main drawback of the calibration methods [17]–[19] is their reliance on equations that explicitly link the moments of the summaries with the model parameters, or in the case of [16], on the model-specific point process. These methods should therefore be re-derived for each new model. We find this to be a non-trivial task, and it may not even be possible for the more elaborate channel models. Similar problems exist in [20]–[22], where a low-dimensional vector of statistics has to be redesigned for the channel model at hand, which is not always trivial and may not generalize to other models. Moreover, summarizing the data leads to information loss that can hamper the accuracy of the parameter estimates.

The aim of the present contribution is to propose a generic method which can be applied to stochastic channel models of very different mathematical structure. This will be done without the need for specialized summaries, or extraction and clustering of multipaths. To achieve this, we follow the proposed calibration methodology depicted in Fig. 1(b). First, we map the channel measurements into easily computable log temporal moments. These moments are then used for calibration in an ABC framework, where we use the maximum mean discrepancy (MMD) [24] to compare the distribution of simulated and measured data. The MMD has previously been used for frequentist inference in [25], [26], and in a Bayesian sense in [27]. Specific ABC methods using kernels include [28]–[31], and the MMD has also been used to train generative adversarial networks in [32]–[34]. These papers have shown MMD to be a powerful way to represent either data-sets or distributions, and as a result calibrate complex models.
They have acted as inspiration for our work, but our algorithm specializes the approach to the problem of calibration of stochastic channel models. Our calibration method is automatic since it can be applied to different models without the need for further pre- or post-processing. Additionally, the method is able to account for model misspecification, which occurs when the model is not able to represent the data for any parameter setting.

The rest of the paper is organized as follows. Section II presents the model calibration problem. Section III gives an overview of the MMD, and Section IV describes the proposed kernel-based ABC method. We demonstrate the method's generality by calibrating the seminal S-V model, which is a clustered multipath model, and the propagation graph model, which is based on a different principle, using exactly the same data and procedure. Indeed, no other method able to do this is available in the open literature. In Section V, the performance is evaluated on simulated data, and in Section VI on data from a 60 GHz indoor measurement campaign. Discussion and concluding remarks are given in Sections VII and VIII, respectively.

II. STOCHASTIC CHANNEL MODEL CALIBRATION
Consider the transfer function measurement of a linear, time-invariant radio channel in a single-input, single-output (SISO) setup using a vector network analyzer (VNA). The transfer function is measured at N_s equidistant frequency points in the bandwidth B, resulting in a frequency separation of Δf = B/(N_s − 1). The measured complex signal at the n-th frequency point, Y_n, is modeled as

Y_n = H_n + W_n,   n = 0, 1, ..., N_s − 1,   (1)

where H_n is the transfer function sampled at the n-th frequency and W_n is the complex measurement noise. The additive noise samples are assumed independent and identically distributed (iid) at each frequency point, and are usually modeled as zero-mean circular symmetric complex Gaussian variables with variance σ_W². The time-domain signal, y(t), is obtained by taking the discrete-frequency, continuous-time inverse Fourier transform of Y_n as

y(t) = (1/N_s) Σ_{n=0}^{N_s−1} Y_n exp(j2πnΔf t),   (2)

which is periodic with period t_max = 1/Δf. Multiple realizations of the channel can be obtained by repeating the measurements N_obs times, yielding an N_obs × N_s complex data matrix Y. The data can be thought of as iid realizations from some unknown distribution, Y, which is the true state of nature.

A stochastic model can be seen as a parametric family of distributions {P_θ} with a p-dimensional parameter vector θ defined on some Euclidean space. In the case of generative models such as stochastic channel models, it is straightforward to simulate realizations of Y from the model, even though the distribution P_θ is unknown. Calibration then amounts to finding the θ for which the model output fits the observed data Y well, or in other words, to find the θ such that P_θ is "closest" to Y. Standard calibration techniques involve the likelihood function of the model given Y.
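As a concrete illustration of the measurement model in (1) and (2), the following sketch (assuming NumPy; the bandwidth, delay, and grid sizes are hypothetical and not taken from the paper) evaluates (2) for a single-path channel H_n = exp(−j2πnΔf τ), whose time-domain magnitude peaks at the path delay τ:

```python
import numpy as np

def time_domain_signal(Y, B, t):
    """Evaluate Eq. (2): the discrete-frequency, continuous-time
    inverse Fourier transform of the N_s transfer-function samples Y."""
    Ns = len(Y)
    delta_f = B / (Ns - 1)            # frequency separation, as above Eq. (1)
    n = np.arange(Ns)
    # y(t) = (1/N_s) sum_n Y_n exp(j 2 pi n Delta_f t)
    return (np.exp(2j * np.pi * np.outer(t, n) * delta_f) @ Y) / Ns

# hypothetical single-path channel with delay tau (no noise term W_n)
Ns, B = 801, 4e9                      # illustrative values only
tau = 20e-9
delta_f = B / (Ns - 1)
H = np.exp(-2j * np.pi * np.arange(Ns) * delta_f * tau)
t = np.linspace(0, 1 / delta_f, 2000, endpoint=False)  # one period t_max
y = time_domain_signal(H, B, t)
print(t[int(np.argmax(np.abs(y)))])   # close to tau
```

The peak of |y(t)| sits at the path delay, illustrating that (2) maps the frequency-domain samples back to a delay-domain signal with period t_max = 1/Δf.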
For iid realizations, the likelihood function, denoted as p(Y|θ), is the product of the probability density or mass function of P_θ evaluated at each of the data points in Y.

For most stochastic radio channel models, p(Y|θ) is either intractable or cannot be approximated within reasonable computation time. Intractability here refers to the inability to numerically evaluate the likelihood function for a given value of θ. For intractable likelihoods, the posterior, p(θ|Y), also becomes intractable, as it is proportional to p(Y|θ)p(θ), where p(θ) is the prior assumed on the parameters. An intractable likelihood prevents maximum likelihood estimation of θ as well as Bayesian inference via sampling of the posterior. This is the case for stochastic multipath models, such as the Turin and the S-V models, which were constructed with ease of simulation in mind.

Since stochastic channel models are easy to simulate from given an arbitrary θ value, likelihood-free inference is possible by comparing simulated data-sets to the observed data. Therefore, we need a method to compute distances between the data-sets, which is challenging as the data-sets are high-dimensional and may have possibly different sizes. We tackle this problem using distance metrics based on kernels, in particular the maximum mean discrepancy (MMD).

[Footnote: The restriction to parameters in R^p is only needed in the adjustment method described in Section IV-B. The remaining part of the method can be used for more general parameter spaces, e.g. discrete, complex, or subsets of R^p. In this case, the adjustment algorithm should be modified to either accommodate or ignore such parameters.]

III. THE MAXIMUM MEAN DISCREPANCY BETWEEN PROBABILITY DISTRIBUTIONS
We now introduce the MMD, which is a notion of distance between arbitrary probability distributions P and Q, or between data-sets. We aim to use the MMD as a similarity measure within an ABC framework to compare simulated and observed data-sets. Note that we can identify any data-set {x_1, ..., x_n} with the empirical distribution (1/n) Σ_{i=1}^n δ_{x_i}, where δ_{x_i} denotes a distribution with mass one at x_i and zero otherwise. We restrict our discussion to distributions defined on R^d. This section provides further details on constructing the MMD [24], [35].

A. Kernels and the Maximum Mean Discrepancy (MMD)
The MMD consists of first mapping the distributions to a function space H_k, and then using the distance in that space to compare the mapped distributions; see Fig. 2 for an illustration. The mapping enables the use of the distance defined on H_k.

The spaces of functions to which we map distributions are called reproducing kernel Hilbert spaces (RKHS). We denote the RKHS by H_k, and write ⟨·,·⟩_{H_k} and ‖·‖_{H_k} for its inner product and norm, respectively. Associated to each RKHS, there exists a symmetric and positive definite function k : R^d × R^d → R called a reproducing kernel [36]. This function satisfies two properties: (i) for all f ∈ H_k, f(x) = ⟨f, k(x, ·)⟩_{H_k} (called the reproducing property), and (ii) k(x, ·) ∈ H_k for all x ∈ R^d.

It is straightforward to map a probability distribution P to H_k through what is called a kernel mean embedding, defined as

μ_P(·) = E_{X∼P}[k(X, ·)] = ∫_{R^d} k(x, ·) P(dx),   (3)

under mild regularity conditions satisfied for all kernels in this paper; see [24, Lemma 3]. Note that μ_P ∈ H_k. In the case where the probability distribution P has a probability density function p, the integral in (3) can be written in the more wide-spread form ∫_{R^d} k(x, ·) p(x) dx. Alternatively, when P is an empirical distribution corresponding to a data-set, the kernel mean embedding is given by (1/N_X) Σ_{i=1}^{N_X} k(x_i, x).

The MMD between probability distributions P and Q embedded in H_k is defined as the supremum taken over the mean of all functions in the unit ball of the RKHS, i.e. [35]

MMD_k[P, Q] = sup_{‖f‖_{H_k} ≤ 1} | E_{X∼P}[f(X)] − E_{X∼Q}[f(X)] |.   (4)

As the name suggests, the MMD is the maximum distance between means of (unit-norm) functions computed with respect to the distributions P and Q.
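For intuition, the empirical version of (3), (1/N_X) Σ_i k(x_i, x), can be compared against the exact embedding of a Gaussian. This illustrative aside (assuming NumPy; the Gaussian example and all numbers are ours, not from the paper) uses the known closed form μ_P(x) = l/√(l² + σ²) · exp(−x²/(2(l² + σ²))) for P = N(0, σ²) under the squared-exponential kernel:

```python
import numpy as np

def k_se(x, xp, l=1.0):
    """Squared-exponential kernel (a common characteristic kernel)."""
    return np.exp(-(x - xp)**2 / (2 * l**2))

def empirical_embedding(data, x, l=1.0):
    """Empirical kernel mean embedding (1/N_X) sum_i k(x_i, x) of Eq. (3)."""
    return np.mean(k_se(data[:, None], x[None, :], l), axis=0)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 1000)       # samples from P = N(0, 1)
x = np.array([-1.0, 0.0, 1.0])          # points at which to evaluate mu_P
mu_hat = empirical_embedding(data, x)
# exact embedding of N(0,1): mu_P(x) = (1/sqrt(2)) exp(-x^2/4) for l = 1
mu_exact = (1 / np.sqrt(2)) * np.exp(-x**2 / 4)
print(np.max(np.abs(mu_hat - mu_exact)))   # small Monte Carlo error
```

The empirical embedding converges to μ_P as the sample size grows, which is what makes the sample-based MMD estimates of Section III-C possible.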
As shown in [24], the MMD in (4) can equivalently be expressed as

MMD_k[P, Q] = ‖ E_{X∼P}[k(X, ·)] − E_{Y∼Q}[k(Y, ·)] ‖_{H_k} = ‖ μ_P − μ_Q ‖_{H_k}.   (5)

This gives an alternative interpretation of the MMD as the distance between mean embeddings in H_k, as Fig. 2 illustrates.

[Fig. 2. Given a kernel k, the distributions P and Q are mapped to their kernel mean embeddings μ_P and μ_Q using Equation (3). The MMD is obtained by computing the distance between μ_P and μ_Q in the function space H_k, as expressed in Equation (5). This figure is inspired by [35].]

A third expression for the MMD appears upon expanding the squared norm in (5) and using the reproducing property of k, which yields an expression in terms of k as

MMD_k[P, Q]² = E_{X,X′∼P}[k(X, X′)] − 2 E_{X∼P, Y∼Q}[k(X, Y)] + E_{Y,Y′∼Q}[k(Y, Y′)].   (6)

The latter expression is computationally more appealing than the two former, as it only calls for computation of expectations of the kernel. Thus, computation of the supremum in (4) is not required to compute the MMD. As discussed later in Section III-C, the expression (6) forms the basis for estimation of the MMD from data.

B. Selecting a Kernel
The choice of kernel defines the RKHS and thus the properties of its distance, the MMD. In addition to being reproducing, it is a great advantage if the kernel is characteristic [37], [38]. This implies that the kernel mean embedding is an injective mapping, meaning that each distribution is mapped to a unique function. Thus, in the case of characteristic kernels, the kernel mean embedding captures all the information about the distribution. As a result,
MMD_k[P, Q] = ‖μ_P − μ_Q‖_{H_k} = 0 if and only if P = Q. In this case, the MMD is capable of comparing infinitely many moments of two probability distributions without ever having to compute these moments explicitly. Consequently, the MMD is able to distinguish probability distributions even when these coincide in a finite number of moments. This gives a great advantage over methods based on comparison of finitely many moments, which are potentially blind to differences between distributions.

A very popular characteristic reproducing kernel is the squared-exponential (or Gaussian) kernel, defined as

k_SE(x, x′) = exp( −‖x − x′‖² / (2l²) ),   (7)

for x, x′ ∈ R^d. Here, ‖·‖ is the Euclidean norm and l > 0 is a parameter called the lengthscale of the kernel. The norm inside the exponent can be chosen based on the specific data and application. For additional examples of characteristic kernels, see [37], [38].

We now give a simple example comparing Gaussian distributions, in which case the MMD can be derived analytically.

[Fig. 3. MMD between a N(μ_1, σ_1²) distribution with μ_1 = 0 and σ_1 = 1 and a N(μ_2, σ_2²) distribution with varying parameters μ_2 and σ_2. Plots are shown for two different values of the lengthscale l.]

Example:
Let P = N(μ_1, σ_1²) and Q = N(μ_2, σ_2²) be two Gaussian distributions on R. For the squared-exponential kernel in (7), the MMD takes the form (see [39, Appendix C]):

MMD_{k_SE}[P, Q]² = l/√(l² + 2σ_1²) + l/√(l² + 2σ_2²) − (2l/√(l² + σ_1² + σ_2²)) exp( −(μ_1 − μ_2)² / (2l² + 2σ_1² + 2σ_2²) ).   (8)

It is apparent from (8) that the MMD is zero if and only if μ_1 = μ_2 and σ_1 = σ_2 (as guaranteed by the use of a characteristic kernel). Fig. 3 illustrates how the MMD increases as the parameters of these distributions increasingly differ. Varying the lengthscale, l, of the kernel scales the overall MMD curve, but does not affect the point at which the MMD is minimised. The overall behaviour of the curves does not vary significantly on changing the lengthscale by an order of magnitude.

C. Maximum Mean Discrepancy Between Data-sets
Unlike in the previous example, in most realistic cases it is not feasible to calculate (6) analytically. Moreover, numerical integration is problematic, as the dimension of X and Y may be large and P or Q unavailable. Fortunately, it is straightforward to estimate the MMD when each distribution is an empirical distribution, such as in the case of data-sets.

Imagine that we do not have access to P and Q, but that we instead have two data-sets consisting of realisations from these distributions. More precisely, suppose we have access to X = {x_1, ..., x_{N_X}} iid ∼ P and Y = {y_1, ..., y_{N_Y}} iid ∼ Q. Then, an unbiased empirical estimate of MMD_k[P, Q]² can be obtained as [24]

MMD̂_k[X, Y]² = Σ_{i≠i′} k(x_i, x_{i′}) / (N_X(N_X − 1)) − 2 Σ_{j=1}^{N_Y} Σ_{i=1}^{N_X} k(x_i, y_j) / (N_Y N_X) + Σ_{j≠j′} k(y_j, y_{j′}) / (N_Y(N_Y − 1)).   (9)

Note that N_X and N_Y are permitted to differ, i.e. the two data-sets are not limited to be of the same size. To use this estimator with the kernel in (7), the lengthscale should be specified. Following [24], the lengthscale can be set based on the data-set X using the median heuristic

l = √(med / 2),   (10)

where med denotes the median of the set of squared two-norm distances ‖x_i − x_j‖² for all pairs of distinct data points in X. This setting of l scales the kernel with the spread of the data, and is robust to outliers.

Concentration bounds for the MMD, such as [25, Lemma 1] or [35, Theorem 3.4], imply that with high probability,

| MMD̂_k[X, Y] − MMD_k[P, Q] | ≤ C (1/√N_X + 1/√N_Y),   (11)

for some C > 0. This tells us that the accuracy of the estimate improves as both N_X and N_Y grow, at the standard root-N rate in each. The computational cost of computing this estimate is O(N_X² + N_Y²) due to the need to compute double sums in both N_X and N_Y.
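The estimator in (9) and the median heuristic of (10) are short to implement. The following sketch (assuming NumPy; the Gaussian test case and all numbers are illustrative, not from the paper) also checks the estimate against the closed form (8) for two Gaussians:

```python
import numpy as np

def median_heuristic(X):
    """Lengthscale l = sqrt(med/2) of Eq. (10); med is the median of the
    squared distances between distinct points of X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    off_diag = d2[~np.eye(len(X), dtype=bool)]
    return np.sqrt(np.median(off_diag) / 2)

def mmd2_unbiased(X, Y, l):
    """Unbiased estimate of the squared MMD, Eq. (9), SE kernel."""
    k = lambda A, B: np.exp(
        -np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1) / (2 * l**2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    nx, ny = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (1000, 1))   # samples from P = N(0, 1)
Y = rng.normal(1.0, 1.0, (800, 1))    # samples from Q = N(1, 1); sizes differ
l = median_heuristic(X)
est = mmd2_unbiased(X, Y, l)

# closed-form squared MMD between the two Gaussians, Eq. (8)
a = l / np.sqrt(l**2 + 2) + l / np.sqrt(l**2 + 2)
b = 2 * l / np.sqrt(l**2 + 2) * np.exp(-1.0 / (2 * l**2 + 4))
print(est, a - b)   # the estimate tracks the closed form
```

Note that the data-set sizes N_X = 1000 and N_Y = 800 differ, as permitted by (9).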
In order to best balance computational cost and accuracy, N_X and N_Y should be chosen to be commensurate. These two results on accuracy and computational cost can be used to make default choices for the parameters of our ABC algorithm.

D. Kernels for Radio Channel Measurements
In order to use the MMD for calibrating stochastic radio channel models, we need a kernel defined on the space of transfer function measurements: k_Y : Y × Y → R. Given such a kernel, we could then estimate the MMD between a measured data-set Y and a data-set Y_sim simulated from the model.

A significant challenge with this approach is that, in the context of stochastic radio channel models, Y is usually a high-dimensional space. This is especially the case for large-bandwidth measurements, where N_s can be in the order of thousands. Such high-dimensional problems are challenging for kernel methods based on default kernels such as the squared-exponential kernel [40]. These kernels indeed suffer from the curse of dimensionality, a phenomenon implying that the distance between points increases exponentially with the dimension of the space.

To tackle this issue, there exist kernels specialised to certain time-series or functional data models in the literature [41]–[46]. These use specific properties of the type of data in order to avoid the curse of dimensionality. In this paper, we contribute to this literature and construct a kernel specifically tailored to transfer function measurements. We base the kernel on the temporal moments of y(t), defined as

m^(i) = ∫_0^{t_max} t^i |y(t)|² dt,   i = 0, 1, ..., I − 1.   (12)

The integral in (12) is easy to compute numerically. The temporal moments can be seen as an expansion of |y(t)|² in the basis of monomials. Since the monomials form a complete basis for finite-energy, time-limited signals [47], no information is lost compared to |y(t)|² as I → ∞. Referring to [48], [49], the first few moments are well modeled by a log-normal distribution. Thus, taking the entry-wise logarithm z^(i) = ln m^(i) brings the moments to the same scale and gives an approximately Gaussian vector z = [z^(0), ..., z^(I−1)]. Multiple channel realizations yield Z = (z_1, z_2, ..., z_{N_obs}).
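Computing the log temporal moments of (12) amounts to a simple numerical quadrature per channel realization. A minimal sketch (assuming NumPy; the exponentially decaying power profile is a hypothetical stand-in for a measured realization, not data from the paper):

```python
import numpy as np

def log_temporal_moments(y, t, I=3):
    """First I log temporal moments: z^(i) = ln m^(i), with
    m^(i) = integral of t^i |y(t)|^2 dt, Eq. (12).

    y : complex time-domain signal samples
    t : corresponding time instants (uniform grid on [0, t_max])
    """
    dt = t[1] - t[0]
    power = np.abs(y)**2
    # Riemann-sum approximation of the moment integrals
    m = np.array([np.sum(t**i * power) * dt for i in range(I)])
    return np.log(m)

# toy realization: exponential power decay with a carrier (hypothetical)
t = np.linspace(0, 200e-9, 4001)
y = np.exp(-t / (2 * 50e-9)) * np.exp(2j * np.pi * 1e8 * t)
z = log_temporal_moments(y, t)
print(z)   # z^(0), z^(1), z^(2) on comparable scales after the log
```

The raw moments here differ by many orders of magnitude (m^(1) carries an extra factor of t relative to m^(0)); the entry-wise logarithm brings them onto comparable scales, which is what makes a single kernel lengthscale workable in Section IV-A.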
Define the mapping A_I : Y → R^I from Y to the I-dimensional space of log temporal moments. We propose to construct a kernel k_Y for transfer function data as

k_Y(y, y′) := k_SE(A_I(y), A_I(y′)),   for all y, y′ ∈ Y,   (13)

where k_SE denotes the squared-exponential kernel in dimension I. We note that this is the composition of a reproducing kernel and a map, and thus, according to [50, Lemma 4.3], is a reproducing kernel on Y. We also note that the MMD with kernel k_Y computed on the original data can be obtained through the MMD with kernel k_SE on the log temporal moments. Similarly, the empirical estimators of these quantities are also identical, i.e.

MMD̂_{k_Y}[Y, Y_sim] = MMD̂_{k_SE}[Z, X],   (14)

where X is the simulated log temporal moments data-set. In practice, we will have to limit ourselves to a finite I for computational reasons. This, however, is not a problem since we can expect the signal energy to be concentrated in the lowest moments. In fact, taking I to be small also allows us to by-pass issues with the curse of dimensionality.

From a theoretical viewpoint, since the squared-exponential kernel is characteristic, we should be able to recover any distribution on the space of log temporal moments. However, since the mapping A_I leads to loss of information when I is finite, k_Y will not be characteristic on Y, and we may not be able to uniquely identify the distribution of |y(t)|². This, however, is not an issue for the considered channel models, as will be shown in Section V.

IV. PROPOSED KERNEL-BASED APPROXIMATE BAYESIAN COMPUTATION METHOD
ABC methods rely on simulation from the model to approximate the posterior, and can be used to estimate θ such that the model fits the observed data Y. Let ρ(·,·) be some notion of distance between data-sets. The basic form of ABC, called rejection ABC, proceeds by sampling M parameter values from p(θ) and generating the corresponding simulated data Y_sim from the model. The values of θ for which ρ(Y, Y_sim) is less than some pre-defined threshold ε form a sample from the approximate posterior distribution, p̃(θ|Y) = p(θ | ρ(Y, Y_sim) < ε). The tolerance threshold impacts the degree of approximation in ABC methods. Setting ε = 0 would lead to exact Bayesian inference; however, achieving equality for continuous-valued data is not possible. Hence, ε should be small but non-zero in order to be computationally feasible.

We now propose an ABC method based on the MMD as the distance metric to calibrate stochastic radio channel models. Our method combines the Population Monte Carlo (PMC) ABC method [51] with the local-linear regression adjustment [52] to improve the posterior approximation. The complete algorithm is depicted in Fig. 4 and outlined in Alg. 2. Individual steps of this algorithm will be highlighted in the next few subsections.

[Fig. 4. Diagram depicting the steps in the proposed kernel-based ABC algorithm with regression adjustment: the prior p(θ) and model P_θ feed a rejection step based on the MMD against the data Z; means and covariances are computed for the regression adjustment, after which a new population is generated. Here the term "Data" can either be obtained from physical measurements or, as in Sec. V, by simulation.]
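The rejection step, followed by the local-linear adjustment of Section IV-B, can be sketched end-to-end on a toy problem. In this sketch (assuming NumPy) the Gaussian "simulator", the uniform prior, and the unweighted univariate regression are simplifications for illustration only; a real run would simulate log temporal moments from the S-V or propagation graph model:

```python
import numpy as np

rng = np.random.default_rng(1)

def mmd2(x, y, l):
    """Unbiased squared-MMD estimate of Eq. (9) for 1-D data, SE kernel."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * l**2))
    Kxx, Kyy, Kxy = k(x, x), k(y, y), k(x, y)
    nx, ny = len(x), len(y)
    return ((Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
            - 2 * Kxy.mean())

# hypothetical stand-in for the channel simulator: theta is the mean of
# a Gaussian summary statistic
simulate = lambda theta, n: rng.normal(theta, 1.0, n)

Z = simulate(2.5, 100)                     # "observed" summaries
l = np.sqrt(np.median((Z[:, None] - Z[None, :])**2) / 2)   # Eq. (10)

M, M_eps = 500, 25                         # accept a fixed proportion
thetas = rng.uniform(0, 5, M)              # draws from a uniform prior
dists = np.array([mmd2(simulate(th, 100), Z, l) for th in thetas])
accepted = thetas[np.argsort(dists)[:M_eps]]

# simplified local-linear adjustment (Sec. IV-B): statistic s = sample
# mean, unit weights; samples are re-simulated for brevity
s = np.array([simulate(th, 100).mean() for th in accepted])
s_obs = Z.mean()
beta = np.polyfit(s - s_obs, accepted, 1)[0]
adjusted = accepted - (s - s_obs) * beta   # Eq. (16)
print(adjusted.mean())                     # posterior mean, near 2.5
```

Accepting a fixed number M_ε of the smallest distances, rather than fixing ε itself, keeps the run-time of the algorithm known in advance.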
A. Rejection based on MMD
The proposed ABC method uses the MMD between data-sets as the rejection criterion. Instead of setting the threshold ε in terms of the distance, we specify the proportion of accepted samples, i.e. ε = M_ε/M, where M_ε is the number of parameter samples accepted out of M. This is particularly convenient as it avoids the need to manually find a threshold, which may lead to unknown run-time of the algorithm.

The method computes MMD̂_{k_SE}[X, Z], where X = (x_1, ..., x_{N_sim}) is the simulated log temporal moments data-set, as this is identical to estimating the MMD between Y and Y_sim (see Eq. (14)). First, M independent parameter samples Θ = (θ_1, ..., θ_M) are drawn from the prior p(θ). For each θ_i, the log temporal moments data-set X_i ∼ P_{θ_i} is simulated. The simulated data-sets are gathered in X = (X_1, ..., X_M). Then MMD̂_{k_SE}[X_i, Z] is computed for each i using (9), setting the lengthscale of k_SE as per (10). The parameter samples resulting in the M_ε smallest MMD values are then accepted.

In principle, the MMD could be computed between the samples of the temporal moments instead of their logarithm. However, the magnitudes of the different temporal moments may vary strongly, and using a single lengthscale may then lead to poor performance. Using a log transformation helps mitigate this issue. Alternatively, a separate lengthscale would have to be defined for each dimension of the moment vector.

B. Local-linear Regression Adjustment
As proposed in [52], it is possible to improve the posterior approximation by adjusting the accepted samples using a model of the relationship between a low-dimensional vector of statistics and the parameter vector. Let s be a vector of summary statistics of X such that s = S(X) for a function S(·). Similarly, the observed summary statistics are denoted s_obs = S(Z). We begin by fitting a function g between the accepted parameters Θ* = (θ*_1, ..., θ*_{M_ε}) and the corresponding statistics S* = (s_1, ..., s_{M_ε}) as [23, Ch. 3]

θ_i = g(s_i) + ε_i,   i = 1, ..., M_ε,   (15)

where g(s) is the conditional expectation of θ given s, and ε_i is the residual. Here, θ should belong to a subset of R^p. Considering that the log of the temporal moments are well

Algorithm 1:
ABC with MMD and Regression Adjustment
Input: Parameter values Θ, corresponding simulated data X, observed Z, and number of accepted samples M_ε.
  Compute MMD̂_{k_SE}[X_i, Z] for all data-sets X_i ∈ X using (9).
  Accept the M_ε parameters with the smallest MMD distance and denote these Θ* = (θ*_1, ..., θ*_{M_ε}).
  Compute S* and s_obs = S(Z), then solve the optimisation problem in (17) with Θ*, S*, and s_obs to get β̂.
  Adjust Θ* using (16) to obtain Θ̃.
Output: Adjusted samples Θ̃ from the rejection-ABC posterior.

modeled by a Gaussian distribution, we take s_obs to be the vector consisting of the sample means and sample covariances of the elements of z, similar to [22]. In total, s_obs consists of (I² + 3I)/2 elements for I temporal moments. The statistics s are computed in the same manner for X. Note that s and s_obs are normalized by an estimate of their median absolute deviation to account for the difference in magnitude of the statistics. In case the prior distributions are bounded, a logit transformation is applied to the parameters before the adjustment.

For simplicity, we assume g to be linear as in [52] and adjust the accepted parameters as

θ̃_i = θ*_i − (s_i − s_obs)ᵀ β̂,   i = 1, ..., M_ε,   (16)

where β̂ is obtained from the weighted least-squares problem

(α̂, β̂) = argmin_{α,β} Σ_{i=1}^{M_ε} [ θ*_i − α − (s_i − s_obs)ᵀ β ]² W( MMD̂_{k_SE}[X_i, Z] ).   (17)

The weighting function W applies weights to each θ_i based on the estimated MMD value. This guarantees that parameters which yield simulated log moments "closer to" Z are weighted more heavily. We take W to be the Epanechnikov function, W(δ) = 1 − (δ/δ_max)² for |δ| ≤ δ_max and zero otherwise, as proposed in [52]. Here, δ_max is the maximum estimated MMD associated to the accepted parameters. Note that choosing a constant regression function, i.e.
β = 0, and assigning equal weights to all θ_i's, results in the basic rejection ABC algorithm. The regression adjustment therefore gives the adjusted parameter values Θ̃ = (θ̃_1, ..., θ̃_{M_ε}).

[Fig. 5. Local linear regression adjustment of parameter θ, inspired from [53]. First, the regression model is fitted based on accepted parameter and statistic values. Then, the parameters are adjusted based on the fitted model, which can move them outside the prior range if (s_i − s_obs) is large.]

C. Population Monte Carlo ABC
As a means to explore the posterior distribution over the parameter space efficiently, we employ a sequential Monte Carlo technique called PMC ABC [20], [22], [51]. The parameter samples in the initial iteration, Θ̃^(1) = (θ̃^(1)_1, ..., θ̃^(1)_{M_ε}), are assigned equal weights representing the probability of drawing that particular sample for the next iteration.

The next population is obtained by drawing M values from Θ̃^(1) and perturbing these according to a probability distribution, called the proposal. For simplicity, we perturb independently in each dimension using a Gaussian distribution, and reject values outside the prior range. Thus, the proposal reads

φ(θ; θ̃, Σ) ∝ 1(θ ∈ R) exp( −(1/2)(θ − θ̃)ᵀ Σ⁻¹ (θ − θ̃) ),   (18)

where 1(·) is an indicator function, R ⊂ R^p is the prior range, and Σ is a diagonal matrix with the variance σ_j² > 0 corresponding to parameter θ_j along the diagonal. We set the diagonal elements of Σ to twice the empirical variance of the adjusted parameter samples, denoted as Σ = 2 Var̂(Θ̃).

The population of M parameter samples at iteration t, Θ^(t), is then used to simulate X^(t) from the model for MMD computation and regression adjustment. In subsequent iterations, weights are assigned as

w^(t)_j ∝ p(θ^(t)_j) / Σ_{i=1}^{M_ε} w^(t−1)_i φ(θ^(t)_j; θ̃^(t−1)_i, Σ^(t−1)),   j = 1, ..., M_ε.   (19)

The adjusted parameter values after iteration T are taken as samples from the approximate posterior distribution. Point estimates of θ, such as the approximate posterior mean,

θ̂^(T) = (1/M_ε) Σ_{i=1}^{M_ε} θ̃^(T)_i,   (20)

are straightforward to compute from the samples.

D. Handling Model Misspecification
The framework of ABC relies on the implicit assumption that there exist parameter values in the prior support that
Algorithm 2 PMC-ABC with MMD

Input: Prior p(θ), model P_θ, observed data Z, M_ε, M and T.
Initialize t = 1, draw Θ^(1) iid ∼ p(θ) and simulate X^(1) using the parameters in Θ^(1).
Apply Algorithm 1 to {X^(1), Θ^(1)} to obtain Θ̃^(1).
Set w_j^(1) = 1 for j = 1, …, M_ε, and set Σ^(1) = 2 Var(Θ̃^(1)).
for t = 2, …, T do
    Compute q_j = w_j^(t−1) / Σ_{i=1}^{M_ε} w_i^(t−1) for j = 1, …, M_ε.
    for i = 1, …, M do
        Sample θ*_i from Θ̃^(t−1) s.t. θ̃_j^(t−1) is selected with probability q_j.
        Generate θ_i^(t) ∼ φ(·; θ*_i, Σ^(t−1)).
        Simulate X_i^(t) from the model with parameter θ_i^(t).
    end for
    Apply Algorithm 1 to {X^(t), Θ^(t)} to obtain Θ̃^(t).
    Set w_j^(t) using (19) for j = 1, …, M_ε.
    Set Σ^(t) = 2 Var(Θ̃^(t)).
end for
Output: Samples (θ̃_1^(T), …, θ̃_{M_ε}^(T)) from the PMC-ABC posterior.

yield simulated data "close" to the measured data. However, this assumption may not always hold, as it may be impossible to set the model parameters such that the data is reproduced well. In this case, we say that the model is misspecified for the data. Misspecification can be detected and accounted for in the algorithm as explained in the following.

Consider a univariate parameter θ in the range [θ_min, θ_max] resulting in a univariate statistic s in [s_min, s_max] simulated from the model. If the observed statistic s_obs ∉ [s_min, s_max], then the model is likely to be misspecified. This is a challenge since, under model misspecification, the local-linear regression adjustment has been shown to concentrate posterior mass on a completely different value than rejection ABC [54]. In fact, for parameters with bounded support, the regression adjustment can move the parameter samples outside the prior range, as illustrated in Fig. 5. Hence, if s_obs lies outside the range of statistics that the model can simulate, then there is no guarantee that the adjusted samples of θ will lie inside the prior range.

We check for model misspecification by observing whether each element of s_obs lies within the range of the corresponding statistics simulated from the model using Θ^(1). If any element of s_obs lies outside the range of values simulated from the model, then the model is deemed misspecified. In such a case, we replace s_obs by an alternative term, s̆_obs, computed from the model instead of the data using the parameter

    θ̆ = arg max_θ f(θ; Θ*),    (21)

where f(θ; Θ*) is the kernel density estimate computed from the samples Θ*, and θ̆ is the parameter corresponding to the mode of f(θ; Θ*).
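The regression adjustment and the misspecification check described above can be sketched as follows. This is an illustrative outline under our own naming, not the authors' implementation; `scipy.stats.gaussian_kde` with a grid search stands in for the kernel density estimate and its maximization in (21).

```python
import numpy as np
from scipy.stats import gaussian_kde

def regression_adjust(theta, s, s_obs):
    """Local-linear regression adjustment (sketch): fit theta ~ a + B s by
    least squares on the accepted pairs, return theta_i - B^T (s_i - s_obs)."""
    X = np.hstack([np.ones((len(s), 1)), s - s_obs])  # intercept + centred stats
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
    return theta - (s - s_obs) @ coef[1:]             # coef[1:] is the slope B

def is_misspecified(s_obs, s_sim):
    """True if any element of s_obs falls outside the range of the
    statistics simulated from the model."""
    return bool(np.any(s_obs < s_sim.min(axis=0))
                or np.any(s_obs > s_sim.max(axis=0)))

def kde_mode(samples, grid_size=512):
    """Approximate mode of a 1-D kernel density estimate via grid search,
    cf. (21)."""
    kde = gaussian_kde(samples)
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    return grid[np.argmax(kde(grid))]
```

In a PMC iteration, `is_misspecified` would be evaluated element-wise on the first-iteration statistics, and `kde_mode` applied per parameter dimension to obtain θ̆.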
Another choice for θ̆ could be the posterior mean of rejection ABC [54]. However, we found the mean estimate to be unstable, especially in the initial iterations of the algorithm. Hence, in case of model misspecification, we set s_obs = s̆_obs in each iteration of the PMC ABC algorithm, thus ensuring that the adjustment does not lead to parameter samples outside the prior range.

Fig. 6. The estimated MMD between X_true (N_obs = 1000) and X′ as a function of N_sim, with standard-deviation error bars computed by repeating the experiment 100 times for each value of N_sim. X′ is generated from a parameter value θ′ different from θ_true. The dashed green line corresponds to the estimated MMD for equal and large N_obs and N_sim.

V. SIMULATION EXPERIMENTS
We test the performance of the proposed calibration method on two different channel models, namely the Saleh-Valenzuela (S-V) and the propagation graph (PG) model. We chose models which differ significantly in their mathematical structure to highlight the generality of our approach. We first study in depth the advantages and drawbacks of our algorithm on simulated data. Then, in Section VI, we calibrate these models to data from an indoor measurement campaign [55]. For ease of comparison, we use the same measurement settings as in [55] for both simulations and measurements, i.e. B = 4 GHz, N_s = 801, and t_max = 200 ns. We map the channel measurements to the first I = 4 temporal moments. In each iteration of the ABC algorithm, M = 2000 parameter samples are generated, out of which M_ε = 100 are accepted to estimate the posterior distribution.

A. Application to the Saleh-Valenzuela model
The seminal S-V model [2] is widely used as it is easy to simulate from, but is notoriously difficult to calibrate due to its structure. Even though the model can be analyzed using the theory of spatial point processes [56], [57] and moments derived [58], its likelihood function is unavailable. Recent discussions of the physical interpretation of the S-V model, also outlining some difficulties with the model calibration, are given in [59]–[61]. These difficulties have inspired the use of many different heuristic calibration methods, as outlined in the introduction.

In the S-V model, the multipath components are assumed to arrive in clusters. The arrival times of the clusters and of the rays within the clusters are modeled as one-dimensional homogeneous Poisson point processes with arrival rates Λ and λ, respectively. The gains of the multipath components are modeled as iid zero-mean complex Gaussian random variables with a conditional variance that depends on three parameters: the average power of the first arriving multipath component Q, and the cluster and ray power decay constants Γ and γ, respectively.

Fig. 7. Estimated MMD values plotted against parameters of the S-V model. The parameters are uniformly sampled 200 times from the prior range one at a time, keeping the others fixed to the true values denoted by the green lines. See Tab. I for the prior ranges.

We refer the readers to [2] and [56] for a detailed description of the model. Including the noise variance, the parameter vector becomes θ = [Q, Λ, λ, Γ, γ, σ_W²]^⊤. We begin by finding a reasonable value of N_sim. To that end, we generate pseudo-observed log moments, X_true, with N_obs = 1000 realizations from the model by setting θ to a "true" value.
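Generating such realizations amounts to simulating the S-V cluster/ray process. The following is a simplified sketch under our own naming, not the authors' code; it places the first cluster and the first ray of each cluster at the cluster arrival time, and omits bandwidth filtering and measurement noise.

```python
import numpy as np

def simulate_sv(Q, Lam, lam, Gamma, gamma, t_max, rng):
    """One Saleh-Valenzuela channel realization (sketch).

    Cluster arrivals: Poisson process with rate Lam; ray arrivals within a
    cluster: Poisson process with rate lam; ray gains: zero-mean complex
    Gaussian with variance Q * exp(-T/Gamma) * exp(-tau/gamma), where T is
    the cluster arrival time and tau the ray delay within the cluster.
    Returns the arrival times and complex gains of all rays before t_max.
    """
    times, gains = [], []
    T = 0.0
    while T < t_max:                        # cluster arrivals
        tau = 0.0
        while T + tau < t_max:              # ray arrivals within the cluster
            var = Q * np.exp(-T / Gamma) * np.exp(-tau / gamma)
            g = np.sqrt(var / 2.0) * (rng.standard_normal()
                                      + 1j * rng.standard_normal())
            times.append(T + tau)
            gains.append(g)
            tau += rng.exponential(1.0 / lam)
        T += rng.exponential(1.0 / Lam)
    return np.array(times), np.array(gains)
```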
Using another value of the parameter vector, say θ′, we simulate X′ from the model with varying N_sim and compute the estimated MMD between X′ and X_true. This process is repeated 100 times to create error bars, as shown in Fig. 6. Although the MMD estimate gets more accurate as N_sim increases, the improvement is small. Choosing a higher N_sim improves the MMD estimate, but increases the run-time of the algorithm significantly (since the computational cost is quadratic in N_sim, and simulating from the model can also be slow). Therefore, we set N_sim = 100 as a reasonable compromise between accuracy and computational cost.

We then verify that the MMD computed from the temporal moments reacts to changes in the S-V model parameters. To that end, we generate simulated data-sets by varying one parameter uniformly in the prior support while keeping the others fixed to their true values. As can be seen from Fig. 7, the estimated MMD values increase as each of the parameters moves away from its true value, and the minimum is (approximately) achieved when both data-sets are generated from approximately the same parameters. The MMD reacts to changes in all the parameters, albeit more for some than others, as can be seen from the different scales of the y-axes. We therefore conclude that the distribution of the first four log temporal moments is informative about the S-V model parameters.

We now use the proposed method to calibrate the S-V model using X_true. We assume uninformative (flat) priors in the ranges given in Tab. I for all the parameters to ensure that their marginal posteriors are unaffected by any prior beliefs. The prior ranges were set according to the measurement settings as done in [20]. The plots indicating convergence of the algorithm and the marginal posterior distributions are shown in Fig. 8.
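The MMD estimates compared here can be computed with the squared-exponential kernel as sketched below. This is our own minimal implementation for illustration; the lengthscale is set by the median heuristic discussed in Section VII, and the function names are ours.

```python
import numpy as np

def median_heuristic(X):
    """Kernel lengthscale set to the median pairwise distance in X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return np.sqrt(np.median(d2[np.triu_indices(len(X), k=1)]))

def mmd2(X, Y, ell):
    """Unbiased estimate of the squared MMD between samples X and Y with
    the squared-exponential kernel k(x, y) = exp(-||x - y||^2 / (2 ell^2))."""
    def gram(A, B):
        d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-d2 / (2.0 * ell**2))
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    np.fill_diagonal(Kxx, 0.0)        # drop i = j terms (U-statistic)
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1))
            - 2.0 * Kxy.mean())
```

Since every Gram matrix is N × N, the quadratic cost in the number of realizations mentioned above is visible directly in this sketch.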
TABLE I
PARAMETER ESTIMATES OBTAINED FOR MEASURED DATA. THE STANDARD DEVIATION OF THE APPROXIMATE POSTERIOR SAMPLES IS GIVEN IN PARENTHESES.

θ — Prior range — Estimate (std. deviation)

S-V model:
    Q:     [10−, −]      4.×− (.×−)
    Λ:     [5×, ]        8.× (.×)
    λ:     [5×−, ×]      1.× (.×)
    Γ:     [5×−, ×−]     8.×− (.×−)
    γ:     [5×−, ×−]     4.×− (.×−)
    σ_W²:  [2×−, ×−]     3.×− (.×−)

PG model:
    g:      [0, 1]       0.50 (0.019)
    N_scat: [5, 35]      18 (1.73)
    P_vis:  [0, 1]       0.99 (.×−)
    σ_W²:   [2×−, ×−]    4.×− (.×−)

The approximate posterior samples concentrate around the true values for all the parameters. The algorithm converges rather quickly and the posteriors taper as the iterations proceed. The algorithm gives a reasonable estimate of the parameters even in the first iteration. The proposed method is able to estimate Λ accurately as well, unlike in [20] where some post-processing was required to estimate Λ.

B. Application to the Propagation Graph model
As our second example, we demonstrate the performance of our proposed method on the PG model. The PG model was first introduced in [62], and has since been applied to a wide range of scenarios [63]–[66]. Recently, it has been extended to account for polarization [15], [67], [68]. Although the model is easy to simulate from, its likelihood function is unknown. A method-of-moments estimator was applied to calibrate the model in [15], but the moment equations were based on approximations, and one of the parameters had to be fixed manually.

The PG model [62] represents the radio channel as a directed graph with the transmitters, receivers and scatterers as vertices. Edges model the wave propagation between the vertices. Edges are defined randomly depending on the probability of visibility, P_vis. Other parameters of the model include the number of scatterers, N_scat, and the reflection gain, g, resulting in the parameter vector θ = [g, N_scat, P_vis, σ_W²]^⊤. Note that N_scat is assumed to be real-valued during the regression adjustment, following which its adjusted samples are rounded to the nearest integer. We used the antenna positions and room geometry for the model according to the measurement conditions given in [55]. Hence, N_obs = N_sim = 625 for the PG model. For each call of the model, the scatterer positions are drawn uniformly across the room, and all 625 realizations are generated based on those positions.

We again use uniform priors for the parameters (see Tab. I) and apply the proposed method to calibrate the PG model to the pseudo-observed data-set generated from θ_true. To prevent biased results due to a particular configuration of the scatterers, we generate the pseudo-observed data by combining data from four different calls of the model using θ_true.

Fig. 8. Violin plots of ABC posterior samples of S-V model parameters as a function of PMC iterations. Note that a violin plot is similar to a box plot with the addition of a rotated kernel density plot on each side. The green lines denote the true parameter values θ_true.

From Fig. 9, we observe that the algorithm converges very quickly, and gives posteriors which are highly concentrated around the true values for P_vis, N_scat, and σ_W². The approximate posterior for g starts off very wide and then gets narrower as the iterations proceed. The method is therefore able to accurately calibrate the PG model.

VI. APPLICATION TO MEASURED DATA
We now attempt to fit both the S-V and the PG models to millimetre-wave radio channel measurements obtained from [55]. The measurements of the channel transfer function were performed in the bandwidth 58 GHz to 62 GHz with a VNA, using N_s = 801 equally spaced frequency points. The bandwidth of B = 4 GHz means that the frequency separation was Δf = 5 MHz and t_max = 200 ns. We use measurements taken in a small conference room in a non-line-of-sight scenario. At both the transmitter and receiver sides, 5 × 5 antenna arrays were used. Although the antenna elements used in the measurement were dual-polarized, we focus on the vertical-vertical polarization since both models are uni-polarized. This gives N_obs = 5 × 5 × 5 × 5 = 625.

Fig. 9. Violin plots of ABC posterior samples of PG model parameters as a function of PMC iterations. A violin plot is similar to a box plot with the addition of a rotated kernel density plot on each side. The true value θ_true is denoted by the green dashed lines.

We keep the settings M = 2000 and M_ε = 100 of the algorithm the same as in the simulation experiments.

A. Calibrating the Saleh-Valenzuela model
Upon applying Alg. 2 to the measured data, the regression adjustment yielded parameter samples outside the prior range, indicating that the model is misspecified. This is indeed evident from Fig. 10, where we plot elements of the vector s, namely the means and variances of two of the log temporal moments, obtained from the measurements and from the S-V model. The simulated summaries correspond to 2000 parameter values drawn from the prior. We observe that varying the parameters of the S-V model in the prior range generates mean values that overlap the mean value from the measurements. However, the variance values from the S-V model do not capture the value observed in the measurements. That is, there exists no θ in the prior range that leads to s "close" to s_obs in terms of the variance of the temporal moments. Hence, the model is misspecified for this data, and so we obtain s̆_obs from the model as per Sec. IV-D.

Fig. 10. Means (left) and variances (right) of the log temporal moments simulated from the S-V model, along with the corresponding observed summary computed from the measured data (red). The observed summary lies in the point cloud generated by the model for the means. In contrast, the S-V model is not able to replicate the higher moments of the data, as seen from the variance plot, indicating model misspecification. Each of the 2000 simulated summaries corresponds to one parameter drawn from the prior.

The posteriors obtained from the measured data are shown in Fig. 11 for T = 15 iterations. The marginal approximate posteriors for λ, Γ, and σ_W² are highly concentrated. The posteriors for Γ and σ_W² appear to converge from the second iteration itself, indicating that these parameters affect the MMD the most. The posterior for λ becomes narrow and converges after the first few iterations. The posteriors for Q, Λ and γ take around eight or nine iterations to converge to a location in the prior range different from where they began, unlike in the simulation experiment. This is potentially due to the model being misspecified for the data, so that the parameters which affect the distribution of the log temporal moments the most converge first. The approximate estimates after 15 iterations are reported in Tab. I. Considering that the regression adjustment in the first few iterations is based on a coarse estimate of s̆_obs from the model, the algorithm seems to work very well. The estimate of Λ is high, indicating the arrival of around 17 clusters on average, while that of λ is quite low. The model is therefore forced to the case of many clusters with very few multipath components each, thus approaching the "unclustered" Turin model with constant rate.

B. Calibrating the Propagation Graph model
The results obtained from calibrating the PG model on measured data after T = 10 iterations are shown in Fig. 12. In this case, the model is not misspecified for the considered data. The approximate marginal posterior distributions for all the parameters start off wide and then converge after around four or five iterations. The posteriors are also quite concentrated for all four parameters, especially P_vis and σ_W². Overall, the results are similar to those observed in the simulation experiment. See Tab. I for approximate estimates of the parameters after T = 10 iterations. The estimates are very similar to the ones reported in [22], where the polarized PG model was calibrated on data from the same measurement campaign. The estimate of P_vis is almost one, indicating that nearly all scatterers are connected. The estimates of g and P_vis are consistent with the values reported from measurements [15] in other in-room scenarios for the PG model. Moreover, these values are close to those used in simulations with the PG model in [62], [69].

C. Model Validation
While the proposed method easily calibrates both the S-V and the PG models to measured data, there is no guarantee that the fitted models replicate the data well. This issue is of course not specific to the proposed method, but pertains to any calibration method. Thus, an extra step, termed model validation, should be performed where predictions of the calibrated models are compared to the data, and possibly to other data-sets not used in the calibration process. Performing a full model validation is out of scope of this paper, as our focus is on the calibration method itself. Instead, as a final step, we check how well the two calibrated models fit the input data-set.

Fig. 11. Violin plots of ABC posterior samples of S-V model parameters as a function of PMC iterations for measured data.

Fig. 12. Violin plots of the ABC posterior samples of PG model parameters as a function of PMC iterations for measured data.

To this end, we simulate 625 channel realizations from both models with parameters set according to Tab. I. We compare the outputs from the models to the measured data in terms of the average power delay profile (APDP) and the empirical cumulative distribution function (cdf) of the root mean square (rms) delay spread τ_rms, mean delay τ̄, and received power P, computed per channel realization according to

    P = m_0,    τ̄ = m_1 / m_0,    and    τ_rms = sqrt( m_2 / m_0 − (m_1 / m_0)² ).    (22)

It appears from Fig. 13 that both models are able to fit the APDP of the measurements well. The slope is captured well by both models, along with the noise floor, although the S-V model slightly underestimates it. The S-V model, however, is not able to replicate the peaks in the APDP of the measurements, while the PG model represents the initial peaks better.
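The per-realization statistics in (22) are straightforward to compute from a sampled impulse response. The sketch below is our own illustrative discretization on a uniform delay grid; the function and variable names are not from the paper.

```python
import numpy as np

def delay_statistics(h, t):
    """Received power, mean delay and rms delay spread of one channel
    realization via the temporal moments m_k = sum_n t_n^k |h(t_n)|^2 dt,
    cf. (22)."""
    dt = t[1] - t[0]                  # uniform delay grid assumed
    p = np.abs(h)**2                  # instantaneous power
    m0 = np.sum(p) * dt
    m1 = np.sum(t * p) * dt
    m2 = np.sum(t**2 * p) * dt
    P = m0
    tau_mean = m1 / m0
    tau_rms = np.sqrt(m2 / m0 - (m1 / m0)**2)
    return P, tau_mean, tau_rms
```

Applying this per realization and taking empirical cdfs over the 625 realizations reproduces the comparison quantities used above.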
This effect is to be expected for the particular settings of the S-V model with many clusters and very few within-cluster components. The peaks from the S-V model are averaged out since its channel realizations are independent. This is unlike the PG model, where the positions of the antennas in the virtual array are included, thus simulating correlated channel realizations.

Even though the APDPs are similar, the two models yield very different empirical cdfs of τ_rms, τ̄, and P, as reported in Fig. 13. The PG model captures the behavior of the cdfs very well, while the S-V model clearly fails to do so, especially for the mean delay and the received power. The means of the rms delay spread from both models are fairly close to the measured data, but the spread differs for the S-V model. As noted theoretically in [60], [61], multipath models can yield temporal moments with similar means while differing vastly in variance. Indeed, for a stochastic multipath model, the covariance structure of the temporal moments depends on both first- and second-order properties of the underlying point process [70]. Thus, we conjecture that the poor prediction of the distribution of log temporal moments for the S-V model is due to an inappropriate underlying point process. At the same time, the PG model captures this distribution well.

VII. DISCUSSION
The proposed method makes certain choices, such as the number of temporal moments to use.

Fig. 13. S-V and PG model fit to the measured data in terms of the APDP and the empirical cdfs of the rms delay spread, mean delay, and received power. Note that the first multipath component of the S-V model arrives at t = 0, as in [2].

We found that using I = 4 temporal moments, (m_0, m_1, m_2, m_3), gives accurate estimates with narrow posteriors after the first couple of iterations, while a slight degradation in performance is observed with I = 3 moments, (m_0, m_1, m_2). Although the method permits the use of arbitrarily many moments, we did not see significant improvements in performance when including more than four moments. Although the temporal moments seem adequate for calibration, the channel measurements could in principle be summarized into other statistics, as long as they are informative about the model parameters.

Other choices for the method include the prior distribution and the settings of the ABC algorithm. We used uninformative priors to demonstrate the accuracy of the method based on data alone. However, including informative priors would speed up the convergence of the algorithm. For a reasonable approximation to the posterior distribution from samples, we suggest setting M_ε = 100 or more. Depending on the computational budget, ε can be set around 5% or less. We do not provide a stopping criterion for the algorithm, but instead encourage monitoring the posterior distributions for convergence. Potentially, a stopping criterion could be implemented where the iterations are stopped if the MMSE estimate changes by less than some tolerance over iterations.

As the MMD compares infinitely many summaries of the two data-sets, it works better than comparing only low-order moments such as the means and covariances of the temporal moments as in [19], [20], [22].
When the kernel is chosen to be characteristic, the MMD also guarantees that distributions are uniquely identified by these infinitely many summaries, unlike the case where only a finite number of moments is compared. The MMD is a strong notion of distance in the sense that recovery of the true parameter value is guaranteed as the number of data points grows. The MMD also leads to robust estimators, i.e. estimators which return reasonable estimates even in the presence of outliers in the data or mild model misspecification [25], [26]. The median heuristic is a reasonable choice for balancing robustness and efficiency, as discussed in [25]. The choice of kernel is not as impactful as the choice of the lengthscale, and the proposed squared-exponential kernel seems to work well.

The proposed method is computationally lightweight and can be run on standard laptops with reasonable run-times. In our experiments, the algorithm ran on a Lenovo ThinkPad with an Intel Core i7 processor and 24 GB of RAM. This gave a run-time of 5.5 hours for the PG model and around 2 days for the S-V model for ten iterations of the algorithm. In our tests, the computation time is dominated by the model evaluation time, while the computation of the temporal moments and the MMD is negligible. Thus, the computational cost depends heavily on the specific model and its implementation. Furthermore, the run-time is impacted by the specific settings of some parameters, e.g. Λ and λ in the S-V model and N_scat in the PG model. For higher "true" values of these parameters, the model, and in turn the calibration algorithm, takes considerably longer to run. An obvious way to reduce the run-time is to run the algorithm on hardware with more processing power, or to make parallel calls to the model during each iteration.

The proposed method relies solely on the ability to simulate from the model being calibrated, and not on the tractability of the likelihood or moment functions.
Moreover, the method does not depend on the particular mathematical construction of the model, which enables calibration of very different models using the exact same procedure. This presents the opportunity to compare and select the best-fitting model for a given data-set. Additionally, the proposed method inherently estimates the uncertainty of the fitted parameters, which is lacking in state-of-the-art calibration approaches. In contrast to the rather complex state-of-the-art calibration methods, the proposed method is simple to implement in R, MATLAB or Python and requires very few settings. This has the clear advantage that results obtained from the method are easy to reproduce. Moreover, the method can detect and calibrate misspecified models as well, an issue that is usually ignored or treated heuristically in standard algorithms.

The proposed method can be used for a broad class of models where the likelihood is unknown or difficult to compute. This is a great advantage in model development, as models can potentially be calibrated before their derivation is finalized. If the model is deemed worthy of further study, effort may be devoted to deriving its likelihood function. The proposed method may also be used in cases where such a likelihood is in fact available, or available up to some intractable normalization constant. In such cases, the ABC approach may, however, be less effective than methods based on the likelihood, and other distances could be used; see for example Stein discrepancies for cases where the likelihood is unnormalized [71]. Similarly, if a factorization of the likelihood is possible and some factors can be evaluated, more efficient inference methods than ABC may be derived, relying e.g. on message-passing techniques. Such methods rely extensively on the particular models and the structure of their likelihoods. Thus, the gain in efficiency comes at a cost in the form of a loss in generality compared to the proposed ABC method.
Finally, we remark that distance metrics such as the Wasserstein or the Hellinger distance could potentially be used instead of the MMD. However, future studies are required to assess their applicability for calibrating stochastic channel models.

VIII. CONCLUSIONS
The proposed ABC method based on the MMD is able to accurately calibrate wideband radio channel models of very different mathematical structure. The proposed method relies on computing temporal moments of the received signal, and thereby circumvents the need for multipath extraction or clustering. As a result, the method is automatic, as no pre- or post-processing of the data and estimates, nor additional input from the user, is required. We find that the method is able to fit models to both simulated and measured data. This work opens possibilities for developing similar methods for calibrating directional and time-dependent channel models. Potentially, the maximum mean discrepancy could be used for other problems in propagation and communication studies that involve comparing data-sets.

ACKNOWLEDGMENT
The authors would like to thank Dr. Carl Gustafson and Prof. Fredrik Tufvesson (Lund University) for providing the measurement data.

REFERENCES

[1] G. L. Turin, F. D. Clapp, T. L. Johnston, S. B. Fine, and D. Lavry, "A statistical model of urban multipath propagation," IEEE Trans. Veh. Technol., vol. 21, pp. 1–9, Feb. 1972.
[2] A. A. M. Saleh and R. Valenzuela, "A statistical model for indoor multipath propagation," IEEE J. Sel. Areas Commun., vol. 5, pp. 128–137, Feb. 1987.
[3] K. Haneda, J. Järveläinen, A. Karttunen, M. Kyrö, and J. Putkonen, "A statistical spatio-temporal radio channel model for large indoor environments at 60 and 70 GHz," IEEE Trans. Antennas Propag., vol. 63, no. 6, pp. 2694–2704, 2015.
[4] L. Raschkowski, P. Kyösti, K. Kusume, and E. T. Jämsä, METIS channel models, deliverable D1.4 V3.
[5] P. Kyösti, WINNER II channel models, deliverables D1.1.2 V1.2, part I: Channel models.
[6] C. Gustafson, K. Haneda, S. Wyne, and F. Tufvesson, "On mm-wave multipath clustering and channel modeling," IEEE Trans. Antennas Propag., vol. 62, no. 3, pp. 1445–1455, 2014.
[7] J. Poutanen, K. Haneda, L. Liu, C. Oestges, F. Tufvesson, and P. Vainikainen, "Parameterization of the COST 2100 MIMO channel model in indoor scenarios," in Eur. Conf. on Antennas and Propag., pp. 3606–3610, 2011.
[8] J. Li, B. Ai, R. He, M. Yang, Z. Zhong, and Y. Hao, "A cluster-based channel model for massive MIMO communications in indoor hotspot scenarios," IEEE Trans. on Wireless Commun., vol. 18, no. 8, pp. 3856–3870, 2019.
[9] M. Yang, B. Ai, R. He, G. Wang, L. Chen, X. Li, C. Huang, Z. Ma, Z. Zhong, J. Wang, Y. Li, and T. Juhana, "Measurements and cluster-based modeling of vehicle-to-vehicle channels with large vehicle obstructions," IEEE Trans. on Wireless Commun., vol. 19, no. 9, pp. 5860–5874, 2020.
[10] X. Yin and X. Cheng, Propagation Channel Characterization, Parameter Estimation, and Modeling for Wireless Communications. John Wiley & Sons Singapore Pte. Ltd, Feb. 2018.
[11] N. Czink, P. Cera, J. Salo, E. Bonek, J. P. Nuutinen, and J. Ylitalo, "A framework for automatic clustering of parametric MIMO channel data including path powers," in Proc. IEEE 64th Veh. Technol. Conf.-Fall, pp. 1–5, 2006.
[12] C. Gentile, "Using the kurtosis measure to identify clusters in wireless channel impulse responses," IEEE Trans. Antennas Propag., vol. 61, no. 6, pp. 3392–3396, 2013.
[13] R. He, W. Chen, B. Ai, A. F. Molisch, W. Wang, Z. Zhong, J. Yu, and S. Sangodoyin, "On the clustering of radio channel impulse responses using sparsity-based methods," IEEE Trans. Antennas Propag., vol. 64, no. 6, pp. 2465–2474, 2016.
[14] L. Greenstein, S. Ghassemzadeh, S.-C. Hong, and V. Tarokh, "Comparison study of UWB indoor channel models," IEEE Trans. on Wireless Commun., vol. 6, pp. 128–135, Jan. 2007.
[15] R. Adeogun, T. Pedersen, C. Gustafson, and F. Tufvesson, "Polarimetric wireless indoor channel modelling based on propagation graph," IEEE Trans. on Antennas and Propag., vol. 67, no. 10, pp. 6585–6595, 2019.
[16] C. Hirsch, A. Bharti, T. Pedersen, and R. Waagepetersen, "Maximum likelihood calibration of stochastic multipath radio channel models," accepted, IEEE Trans. on Antennas and Propag., vol. 69, 2020.
[17] A. Bharti, R. Adeogun, and T. Pedersen, "Parameter estimation for stochastic channel models using temporal moments," in Proc. 2019 IEEE Int. Symp. on Antennas and Propag. and USNC-URSI Radio Sci. Meeting, 2019.
[18] W.-D. Wu, C.-H. Wang, C.-C. Chao, and K. Witrisal, "On parameter estimation for ultra-wideband channels with clustering phenomenon," in IEEE 68th Veh. Technol. Conf., Sep. 2008.
[19] A. Bharti, R. Adeogun, and T. Pedersen, "Estimator for stochastic channel model without multipath extraction using temporal moments," 2019.
[20] A. Bharti and T. Pedersen, "Calibration of stochastic channel models using approximate Bayesian computation," in Proc. IEEE Global Commun. Conf. Workshops, 2019.
[21] R. Adeogun, "Calibration of stochastic radio propagation models using machine learning," IEEE Antennas and Wireless Propag. Lett., vol. 18, pp. 2538–2542, Dec. 2019.
[22] A. Bharti, R. Adeogun, and T. Pedersen, "Learning parameters of stochastic radio channel models from summaries," IEEE Open J. of Antennas and Propag., pp. 1–1, 2020.
[23] S. A. Sisson, Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC, Sep. 2018.
[24] A. Gretton, K. Borgwardt, M. J. Rasch, and B. Schölkopf, "A kernel two-sample test," J. of Mach. Learn. Res., vol. 13, pp. 723–773, 2012.
[25] F.-X. Briol, A. Barp, A. B. Duncan, and M. Girolami, "Statistical inference for generative models with maximum mean discrepancy," arXiv:1906.05944, 2019.
[26] B.-E. Chérief-Abdellatif and P. Alquier, "Finite sample properties of parametric MMD estimation: robustness to misspecification and dependence," arXiv:1912.05737, 2020.
[27] B.-E. Chérief-Abdellatif and P. Alquier, "MMD-Bayes: Robust Bayesian estimation via maximum mean discrepancy," in Proc. of Mach. Learn. Res., vol. 118, pp. 1–21, PMLR, Dec. 2020.
[28] S. Nakagome, K. Fukumizu, and S. Mano, "Kernel approximate Bayesian computation in population genetic inferences," Stat. Appl. in Genet. and Mol. Biol., vol. 12, no. 6, pp. 667–678, 2013.
[29] M. Park, W. Jitkrittum, and D. Sejdinovic, "K2-ABC: approximate Bayesian computation with kernel embeddings," in Proc. of the 19th Int. Conf. on Artif. Intell. and Statistics, vol. 51, pp. 398–407, 2015.
[30] J. Mitrovic, D. Sejdinovic, and Y. W. Teh, "DR-ABC: Approximate Bayesian computation with kernel-based distribution regression," vol. 3, pp. 2209–2218, 2016.
[31] K. Kisamori, M. Kanagawa, and K. Yamazaki, "Simulator calibration under covariate shift with kernels," in Proc. of the 23rd Int. Conf. on Artif. Intell. and Statistics, vol. 108, pp. 1244–1253, PMLR, Aug. 2020.
[32] G. K. Dziugaite, D. M. Roy, and Z. Ghahramani, "Training generative neural networks via maximum mean discrepancy optimization," in Proc. of 31st Conf. on Uncertain. in Artif. Intell., pp. 258–267, 2015.
[33] D. J. Sutherland, H.-Y. Tung, H. Strathmann, S. De, A. Ramdas, A. Smola, and A. Gretton, "Generative models and model criticism via optimized maximum mean discrepancy," in Int. Conf. on Learn. Represent., 2017.
[34] Y. Li, K. Swersky, and R. Zemel, "Generative moment matching networks," in Proc. of the 32nd Int. Conf. on Mach. Learn., ICML'15, pp. 1718–1727, JMLR.org, 2015.
[35] K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, "Kernel mean embedding of distributions: A review and beyond," Found. and Trends in Mach. Learn., vol. 10, no. 1–2, pp. 1–141, 2017.
[36] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics. New York: Springer Science+Business Media, 2004.
[37] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet, "Hilbert space embeddings and metrics on probability measures," J. of Mach. Learn. Res., vol. 11, 2010.
[38] C.-J. Simon-Gabriel and B. Schölkopf, "Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions,"
J. of Mach. Learn. Res. , vol. 19, no. 44, pp. 1–29, 2018.[39] F.-X. Briol, C. J. Oates, M. Girolami, and M. A. Osborne, “Frank-wolfe bayesian quadrature: Probabilistic integration with theoreticalguarantees,” in
Proc. of the 28th Int. Conf. on Neural Inf. Process. Syst.- Volume 1 , NIPS’15, p. 1162–1170, 2015.[40] S. Reddi, A. Ramdas, B. Poczos, A. Singh, and L. Wasserman, “Onthe High Dimensional Power of a Linear-Time Two Sample Test underMean-shift Alternatives,” in
Proc. of the 18th Int. Conf. on Artif. Intell.and Stat. , vol. 38, pp. 772–780, PMLR, May 2015.[41] S. R¨uping, “SVM kernels for time series analysis,” tech. rep., 2001.[42] M. Cuturi, J.-P. Vert, Ø. Birkenes, and T. Matsui, “A kernel for timeseries based on global alignments,”
IEEE Int. Conf. on Acoust,, Speechand Signal Process. , vol. 2, pp. 413–416, 2007.[43] M. Cuturi, “Fast global alignment kernels,”
Proc. of the 28th Int. Conf.on Mach. Learn. , pp. 929–936, 2011.[44] I. Chevyrev and H. Oberhauser, “Signature moments to characterize lawsof stochastic processes,” arXiv:1810.10971 , 2018.[45] F. J. Ki´raly and H. Oberhauser, “Kernels for sequentially ordered data,”
J. of Mach. Learn. Res. , vol. 20, pp. 1–45, 2019.[46] G. Wynne and A. B. Duncan, “A kernel two-sample test for functionaldata,” arXiv:2008.11095 , 2020.[47] L. E. Franks,
Signal Theory . Englewood Cliffs, N. J., Prentice-Hall,1969.[48] A. Bharti, L. Clavier, and T. Pedersen, “Joint statistical modeling ofreceived power, mean delay, and delay spread for indoor wideband radiochannels,” in , pp. 1–5, 2020.[49] A. Bharti, R. Adeogun, X. Cai, W. Fan, F.-X. Briol, L. Clavier, andT. Pedersen, “Joint modeling of received power, mean delay, and delayspread for wideband radio channels,” arXiv:2005.06808 , 2020.[50] I. Steinwart and A. Christmann,
Support Vector Machines . Springer,2008.[51] M. A. Beaumont, J.-M. Cornuet, J.-M. Marin, and C. P. Robert, “Adap-tive approximate bayesian computation,”
Biometrika , vol. 96, pp. 983–990, Oct 2009.[52] M. A. Beaumont, W. Zhang, and D. J. Balding, “Approximate bayesiancomputation in population genetics,”
Genetics , vol. 162, no. 4, pp. 2025–2035, 2002.[53] J. Lintusaari, M. U. Gutmann, R. Dutta, S. Kaski, and J. Corander,“Fundamentals and recent developments in approximate bayesian com-putation,”
Syst. Biol. , vol. 66, pp. 66–82, Jan 2017.[54] D. T. Frazier, C. P. Robert, and J. Rousseau, “Model misspecificationin approximate bayesian computation: consequences and diagnostics,”
J. of the R. Stat. Soc.: Ser. B (Stat. Methodol.) , vol. 82, pp. 421–444,Jan 2020.[55] C. Gustafson, D. Bolin, and F. Tufvesson, “Modeling the polarimetricmm-wave propagation channel using censored measurements,” in , IEEE, Dec 2016.[56] M. L. Jakobsen, T. Pedersen, and B. H. Fleury, “Analysis of thestochastic channel model by saleh & valenzuela via the theory of pointprocesses,”
Int. Zurich Seminar on Commun. , 2012.[57] J. A. Gubner, B. N. Bhaskar, and K. Hao, “Multipath-cluster channelmodels,” in , pp. 292–296,2012.[58] M. S. Derpich and R. Feick, “Second-order spectral statistics for thepower gain of wideband wireless channels,”
IEEE Trans. on Veh.Technol. , vol. 63, no. 3, pp. 1013–1031, 2014.[59] A. Meijerink and A. F. Molisch, “On the physical interpretation of thesaleh–valenzuela model and the definition of its power delay profiles,”
IEEE Trans. on Antennas and Propag. , vol. 62, no. 9, pp. 4780–4793,2014.[60] T. Pedersen, “Modeling of path arrival rate for in-room radio channelswith directive antennas,”
IEEE Trans. on Antennas and Propag. , vol. 66,no. 9, pp. 4791–4805, 2018. [61] T. Pedersen, “Stochastic Multipath Model for the In-Room RadioChannel Based on Room Electromagnetics,”
IEEE Trans. on Antennasand Propag. , vol. 67, pp. 2591–2603, April 2019.[62] T. Pedersen and B. H. Fleury, “Radio channel modelling using stochasticpropagation graphs,” in
IEEE ICC , pp. 2733–2738, June 2007.[63] L. Tian, X. Yin, Q. Zuo, J. Zhou, Z. Zhong, and S. X. Lu, “Channelmodeling based on random propagation graphs for high speed railwayscenarios,” in
IEEE PIMRC , pp. 1746–1750, Sept 2012.[64] L. Tian, V. Degli-Esposti, E. M. Vitucci, and X. Yin, “Semi-deterministicradio channel modeling based on graph theory and ray-tracing,”
IEEETrans. on Antennas and Propag. , vol. 64, pp. 2475–2486, June 2016.[65] J. Chen, X. Yin, L. Tian, and M. Kim, “Millimeter-wave channelmodeling based on a unified propagation graph theory,”
IEEE Commun.Lett. , vol. 21, pp. 246–249, Feb 2017.[66] R. O. Adeogun, A. Bharti, and T. Pedersen, “An iterative transfer matrixcomputation method for propagation graphs in multi-room environ-ments,”
IEEE Antennas and Wireless Propag. Lett. , vol. 18, pp. 616–620,April 2019.[67] R. Adeogun and T. Pedersen, “Propagation graph based model formultipolarized wireless channels,” in
IEEE WCNC , April 2018.[68] R. Adeogun and T. Pedersen, “Modelling polarimetric power delayspectrum for indoor wireless channels via propagation graph formalism,”in , May 2018.[69] T. Pedersen, G. Steinb¨ock, and B. H. Fleury, “Modeling of reverberantradio channels using propagation graphs,” vol. 60, pp. 5978–5988, Dec2012.[70] T. Pedersen, “First- and second order characterization of temporalmoments of stochastic multipath channels,” in , pp. 1–4, 2020.[71] A. Barp, F.-X. Briol, A. Duncan, M. Girolami, and L. Mackey, “Mini-mum stein discrepancy estimators,” in