Blind Estimation of Reflectivity Profile Under Bayesian Setting Using MCMC Methods
11 Blind Estimation of Reflectivity Profile UnderBayesian Setting Using MCMC Methods
Burak Cevat Civek and Emre Ertin
Abstract —In this paper, we study the problem of inverseelectromagnetic scattering to recover multilayer human tissueprofiles using ultrawideband radar systems. We pose the re-covery problem as a blind deconvolution problem, in whichwe simultaneously estimate both the transmitted pulse andthe underlying dielectric and geometric properties of the onedimensional tissue profile. We propose comprehensive BayesianMarkov Chain Monte Carlo methods, where the sampler pa-rameters are adaptively updated to maintain desired acceptanceratios. We present the recovery performance of the proposedalgorithms on simulated synthetic measurements. We also derivetheoretical bounds for the estimation of dielectric properties andprovide minimum achievable mean-square-errors for unbiasedestimators.
Index Terms —Bayesian Methods, Markov Chain Monte Carlo,Blind Deconvolution, Inverse Problems, Multilayer Tissue Pro-files.
I. I
NTRODUCTION
Remote sensing of human physiology is of growing im-portance in medical research for the diagnosis and treat-ment of chronic diseases [1], [2]. Monitoring the alterationsin internal tissue composition provides valuable informationabout the progression of life-threatening diseases, includingbut not limited to, brain tumor, pulmonary edema, and cardiacdisorders [3]. However, traditional imaging modalities, such asMagnetic Resonance Imaging (MRI), Computed Tomography(CT), or Ultrasound, are not feasible for monitoring variationsregularly, e.g., on a daily basis, due to their high cost andaccessibility issues. Therefore, more efficient, low-cost, andpossibly mobile sensing schemes are needed for frequent andlong-term measurements on the human body.Following the advancements in sensor technologies, reliablecharacterization of tissue profiles is becoming viable for bothclinic and home environments at much lower costs with easyaccess [4]. Specifically, ultrawideband (UWB) radar sensorsemitting electromagnetic (EM) waves, which can penetratethrough most of the biological tissues including skin, fat, mus-cle, etc., provide a promising alternative to those conventionalsensing modalities [5], [6]. In principle, a UWB radar systemtransmits a short duration pulse and records the backscatteredsignal composed of reflections from the target object. In humanbody, each tissue exhibits distinct dielectric properties, i.e.,permittivity and conductivity, causing impedance mismatchesat the interfaces and creating multiple reflection points for theimpinging transmitted pulse. Therefore, a rich backscatteredsignal, which is strongly affected by the dielectric properties,
B. C. Civek and E. Ertin are with the Department of Electrical andComputer Engineering, The Ohio State University, Columbus, OH, 43210,USA. Contact e-mail: [email protected] is observed and can be processed to make inferences aboutthe tissue composition underneath the skin.UWB radar systems present practical advantages which en-able their use in medical applications [7]. Due to the extremelyshort pulse duration (typically less than a nanosecond) withbroadband frequency occupation (typically from 2 to 10 GHz),UWB systems offer the considerably high range resolution (inthe order of centimeters) needed for detecting multiple layersof tissues [8]. In addition, within the limits of Federal Com-munications Commission’s (FCC) spectral emission mask [9],which is − . dBm/MHz for devices operating in between . − . GHz, the total radiation power is in the order oftens of microwatt, constituting a low power, harmless sensingscheme for the human body. Moreover, it enables contactlesssensing of the body, since it does not necessarily requirephysical contact for performing measurements.The emergence of UWB radar as a medical sensing technol-ogy occurred when McEwan described the physical principleof the UWB system which was able to detect movements ofthe heart wall in the two patents awarded to him [10], [11].Since then, several different studies have investigated the EMwave propagation in human tissues at microwave frequencies,revealing the attenuation coefficients due to both reflectionand conductivity losses [7], [12], [13]. Although the reflectedpulses from deeper tissues, such as lungs, heart, or brain, areexposed to strong attenuation, it has been shown [14] that theeffect of variations in these tissues can be observed in thebackscattered signal given sufficiently high Signal-to-Noiseratio (SNR). Therefore, adequate signal processing schemesare needed for extracting information related to deeper layersof human tissues from noisy radar measurements.Detecting vital signs of human body, such as respirationand heart rate, is one of the most widely studied problem inmedical UWB sensing [6], [15]. Many studies successfullyrecovered vital signs in a non-invasive manner due to thesensitivity of the backscattered signal to movements of theinner tissues, such as lungs or heart [16], [17]. In this work,however, instead of measuring vital signs, we focus on de-tecting the anomalies, or tracking the variations, in sub-skintissue composition, which has growing interest in the liter-ature. Possible applications include detecting or monitoringthe evolution of breast cancer, brain tumor, water retention inlungs, or pulmonary edema. In the next section, we review thetechniques employed for making inferences about the targetsilluminated by UWB radar sensors. a r X i v : . [ ee ss . SP ] J a n II. P
RIOR A RT ON I NFERENCE M ETHODS FOR
UWBR
ADAR M EASUREMENTS
In general, the inference methods for detecting alterationsin tissue compositions can be classified as indirect and direct approaches. Indirect approaches concentrate on monitoringthe changes either in the received backscattered signal orin the extracted reflectivity profile. The reflectivity profile isa function of the dielectric and spatial properties associatedwith the underlying tissue composition, and represents theimpulse response of the target body area. In time domain,it is convolved with the transmitted pulse to produce thebackscattered signal. Direct approaches, on the other hand,focus on the explicit recovery of the dielectric properties, suchas permittivity and conductivity, as well as the geometricalproperties, such as thicknesses, of the target tissues based onthe extracted reflectivity profile. In medical UWB sensing liter-ature, the attention is mostly on the indirect approaches. How-ever, there is a rich literature on direct inference approaches aswell, especially in ground penetrating radar (GPR) applicationsinvestigating variations on subsurface earth layers. Based onthe intrinsic similarities of the problem settings, e.g., bothsubsurface earth layers and human tissues are conventionallymodeled as multilayer homogeneous media, the literature onGPR applications can be employed in medical applicationsas well. Therefore, we review the indirect approaches withinthe framework of medical sensing applications and then focusmore on GPR literature for direct inference approaches.
A. Indirect Inference Approaches
Studies on indirect inference methods for medical sensingapplications commonly put particular emphasis on detectingvariations on the backscattered signal without employing anyadvanced signal processing techniques. As a result, manyof those are currently limited to feasibility studies withoutproviding quantitative analysis about the absolute changesin the dielectric properties. In [18], authors investigated theproblem of detecting water accumulation in human bladder viaUWB radar. Their analyses showed that the reflected pulsesfrom the inner bladder-muscle interface are visible in thebackscattered signal, which differs in scale and time basedon the water level in the bladder. However, the observationsare limited to visual inspections on the backscattered signal,restricting the scope of this work to pure feasibility analysis. In[19], [20], authors extended this work by providing thicknessestimations for the bladder based on the time-of-arrival ofthe reflected pulses. The proposed methods, however, neglectthe multiple reflections and completely rely on sufficientseparation of the reflected pulses in time-domain. Moreover,the dielectric properties of the tissues are assumed to be knownand the selected layered tissue model is oversimplified withonly three layers. As a result, application of these methodsto more complicated scenarios, such as monitoring the waterretention in the lungs, is prohibited due to significant effect ofmultiple reflections, overlapping pulse returns and variabilityof dielectric properties. O’Halloran et al . [21] presented adifferent approach and treated the problem in a classificationframework, where a k-NN classifier is trained to assign a given measurement into one of the three states of the bladder, e.g.,small, medium or full. Although the obtained classificationaccuracy was considerably high, the phantom model used forcollecting measurements was relatively simple and was notrepresentative of real-life, where high inter-subject variabil-ity is expected. As another application area, more recently,Lauteslager et al . [22] used a UWB impulse radar for detectinga volume of blood in the cerebral cortex and demonstrated theconsistency between the observed and expected time-delays ofthe reflected signal peaks corresponding to targets located atdifferent distances. The results showed promise by enablingdetection of sub-millimeter variations in the target location.However, detection of the reflection points was performed byvisual inspection.The algorithms discussed above directly work on the mea-sured backscattered signal without removing the effect of thetransmitted pulse. As a result, they are limited to scenarios inwhich the effect of multiple reflections are negligible and thepulses reflected from different interfaces are clearly separated.In general, however, neither of these assumptions hold foractual human tissue compositions due to significant energy ofthe multiple reflections at shallow layers and the overlappingechos caused by these reflections. In addition, despite thehigher resolution provided by UWB radars, the bandwidth ofthe pulses are usually limited due to significant attenuationat high frequencies, which prevents achieving the requiredresolution for discriminating all tissues. Moreover, since thereflections coming from deeper layers, such as lungs, arehighly attenuated, these effects become even more significant.An attempt to resolve these issues is to deconvolve thetransmitted pulse from the backscattered signal to extract thepure reflectivity profile and improve the theoretical resolution.The deconvolution process usually requires a regularizationon the reflectivity profile, since the problem at hand is ill-posed due to non-unique solutions. A natural choice, whichhas been extensively studied in geophysics, is sparsity, whichprovides a parsimonious reconstruction of the reflectivity pro-file representing the locations of significant discontinuities inthe dielectric properties. Despite its prevalent applications toseismic deconvolution problems [23], [24], it has a relativelyrecent history in medical UWB sensing applications. In [16],Gao et al . demonstrated a promising sparse reconstructionscheme for tracking the movements of the heart wall by meansof (cid:96) regularization. Although this was an attempt to estimatepurely the heart rate, the variations in the magnitude andlocation of discontinuities can reveal useful information aboutthe alterations in subsurface tissue composition.Overall, although there are substantial evidences showingthe feasibility of UWB systems on detecting variations oninner tissue composition, advanced signal processing schemesare needed for quantitative assessment of the variations. B. Direct Inference Approaches
Unlike indirect approaches, the aim of direct approachesis to estimate the value of constitutive parameters, i.e., thepermittivity, conductivity and thickness, of the target tissues.A multilayer planar model is a reasonable, widely studied model in the literature to describe the anatomical structureof the human body [7], [12], [14], [25]. One of the commontechniques for inverse EM scattering problems targeting mul-tilayer homogeneous mediums is the layer stripping, whichis extensively studied in GPR systems using UWB pulsesto evaluate the physical and geometric properties of thesubsurface earth layers [26], [27], [28], [29]. Layer strippingis a time domain approach that estimates the constitutiveparameters of each layer in a sequential manner, i.e., at eachiteration, the algorithm estimates the properties of the top-most layer and removes its effect from the backscatteredsignal, progressively reconstructing each layer until all layersare reconstructed. The estimation procedure is usually basedon the amplitude and time-of-arrival of the echos reflectedfrom the layer interfaces. Therefore, success of the techniqueis closely related to accurate estimation of reflected pulseamplitudes and corresponding time delays, which requiresclearly separated echos in time domain [29], [30]. Althoughthis requirement is satisfied for many geophysical applicationsdue to greater thicknesses of earth layers, such clear separationis usually not possible for human tissues. Moreover, typicallayer stripping techniques assume the multiple reflections arenegligible as in [26], [27], [31], illustrating the validity ofthis assumption for geophysical applications such as roadpavement evaluation and ice sheet reconstruction. However,multiple reflections have a dominating effect when the targetmedium is human body [7], [14]. Recently, Caorsi et al. [32], proposed a comprehensive layer stripping techniquewhich uses a binary decision tree approach [33] to detect andremove the pulses caused by multiple reflections to eliminateambiguities. The proposed technique successfully classifieseach echo as a direct or multiple reflection in the case of well-separated pulses with loss-less mediums (zero conductivities),but the performance significantly degrades if overlaps existor the mediums have non-zero conductivities. As a result,application of layer stripping is limited for medical UWBsensing due to overlapping pulses, multiple reflections, andnon-negligible conductivity losses.An alternative to the time-domain layer stripping approachis the EM inversion, which constructs a least squares problem(usually in frequency domain) to minimize the mean squarederror between the actual and reconstructed measurements. Thereconstructed measurement is obtained through a problemspecific forward model governing the EM wave propagation inlayered media and antenna responses. The optimization is per-formed on the constitutive parameters, i.e., permittivity, con-ductivity and thickness, to find the set of parameters achievingthe best fit to the actual measurement. In [34], Spagnolini com-pared EM inversion with layer stripping and demonstrated itspromising capabilities in radar inverse problems. Unlike layerstripping, which only concerns the time delay and amplitudeinformation, EM inversion completely utilizes the underlyingphysical interactions in EM wave propagation. Therefore, iteliminates the need for the strong simplifying assumptions andfacilitates successful recovery even for the cases where thereexist overlapping pulses, multiple reflections and non-zeroconductivities. However, the success of EM inversion methodscompletely relies on formulating an accurate forward model that appropriately describes the antenna radiation and wavepropagation in layered media. To address this issue, Lambot etal. [35] presented a rigorous system model, which consists oflinear system responses representing antenna transfer functionsand closed form solutions of Maxwell’s equation for three-dimensional wave propagation in horizontally layered homo-geneous media. The presented analytical expressions for thesolutions of Maxwell’s equations enabled more efficient im-plementation of the EM inversion technique by eliminating theneed for time consuming numerical solutions such as the finite-difference time-domain (FDTD) method. In [36], the authorsfurther improved the system model by considering the multiplereflections between the antenna and earth surface, yielding asubstantially accurate forward model for EM inversion. Thedispersive, i.e., frequency dependent, structure of the dielectricproperties is modeled by the well-known Debye relaxationmodel [37]. The proposed forward models were designed forfar-field measurements, where the antenna was modeled asan infinitesimal electric dipole away from the earth surface.More recently, an extended model was proposed for near-fieldmeasurements, which models the antenna using superpositionof equally distanced infinitesimal electric dipoles [38]. Thelatter constitutes a more realistic model for wearable deviceapplications designed for medical UWB sensing due to theclose proximity of sensors to the skin.Solution methods for the inverse EM scattering problemhave a rich literature in the GPR applications, however, theirapplication to medical setting is limited. In this work, weparticularly concentrate on the problem of monitoring thetissue composition in the thoracic cavity to detect, for example,the existence of pulmonary edema, or water retention using aUWB radar sensor system. However, we should note that thepresented methodologies are generic such that they can beeasily applied to investigate other parts of the human body,such as the head for brain tumor detection.For mobile sensing systems like wearable devices, theantenna must be placed on top of the skin (or at least withina couple of centimeters away from the skin). However, near-field on-body measurement with a UWB radar sensor raisesadditional significant technical challenges. The transmittedpulse from the antenna becomes highly dependent on theantenna-skin interface and can have high inter-subject variabil-ity. Even for the same subject antenna transfer function canchange based on the placement and skin conditions. Therefore,we pose the problem as a blind deconvolution problem andsimultaneously estimate both the antenna responses and thereflectivity profile.We follow a direct inference approach and present a blindBayesian EM inversion method, where we explicitly recoverthe dielectric properties in one-dimensional setting.III. M
EASUREMENT M ODEL FOR M ULTILAYER R EFLECTIVITY P ROFILE
A. Multilayer Reflection Model
We consider an UWB system where we transmit a shortduration UWB pulse and collect the backscattered signalswhich are reflections from an object composed of multiple planar layers. The layers are assumed to be homogeneousmediums and have distinct dielectric properties such thatthe interfaces between them can be considered as reflectivesurfaces. The backscattered signal can be expressed as acombination of scaled, shifted and distorted versions of thetransmitted waveform. The distortion occurs due to materialseither being dispersive or having non-zero conductivity. Thesefactors are completely determined by the reflectivity profileof the target being monitored. In general, for an M -layerstructure, as illustrated in Fig. 1, where the last layer isassumed to have infinite depth, the 1D downward reflectivityprofile X i ( w ) in frequency domain has the following recursiveform [39] X i ( ω ) = r i + X i +1 ( ω ) e − α i d i e − j β i d i r i X i +1 ( ω ) e − α i d i e − j β i d i , (1)for each interface I i for i = 1 , , . . . , M − , with X M ( ω ) = r M and ω representing the angular frequency in rad/sec. Here, r i denotes the downward reflection coefficient at interface I i , α i , β i and d i respectively represent the attenuation coefficient,phase constant and thickness of the medium i . The definitionsfor r i , α i and β i are explicitly given in terms of the dielectricconstant ε i and conductivity σ i in S/m of the mediums: α i = ω (cid:20) µ o ε o ε i (cid:0) ζ i − (cid:1)(cid:21) / , β i = ω (cid:20) µ o ε o ε i (cid:0) ζ i + 1 (cid:1)(cid:21) / (2)where ζ i = (cid:112) σ i /ωε o ε i ) . Here, µ o and ε o are constantsrepresenting the vacuum permeability in H/m and vacuumpermittivity in F/m respectively. The reflection coefficients atinterfaces are given in terms of the complex valued intrinsicimpedance η i of the mediums: r i = η i − η i − η i + η i − where η i = (cid:115) jωµ o σ i + jωε o ε i . (3)The multilayer reflection model given in (1) accounts for thereflection paths caused by multiple bounces in between theinterfaces, shown by the gray arrows in Fig. 1, along withthe primary reflection paths, shown by the black arrows. Italso incorporates the conductivity property of the layers, whichprovides the ability of modeling lossy mediums. These providea more accurate modeling framework compared to the studiesconsidering only the primary reflections with lossless mediums[]. B. Measurement Model
In this work, we consider the scenario in which the sourceof the transmitted pulse is d meters away from the in-terface I with normal incidence. Therefore, for any spe-cific frequency ω , the corresponding frequency componentof the transmitted pulse, H ( ω ) , is multiplied by X ( ω ) = X ( ω ) e − α d e − j β d , yielding the following backscatteringmodel Y ( ω ) = H ( ω ) X ( ω ) , (4)where Y ( ω ) represents the frequency domain representation ofthe backscattered signal. In practice, we observe the sampled Fig. 1:
Illustration of reflection paths for an M -layer structure. Blackarrows represent the primary reflection paths associated with eachinterface. Gray arrows represent the multiple bounces between theinterfaces. Inclined arrows are used only for the illustration purposes.We consider the normal incidence scenario. real valued time domain sequence { y n } N − n =0 , which is con-verted back to frequency domain by applying Discrete FourierTransform (DFT) and modeled as y = diag ( F T h ) x + v (5)where y = [ Y ( ω ) , . . . , Y ( ω N − )] T is the measurementvector and x = [ X ( ω ) , . . . , X ( ω N − )] T is the reflectivityprofile in frequency domain with ω n = 2 πn/N . Here, wemodel the transmitted waveform in time domain using a realvalued sequence h ∈ R T , where T (cid:28) N , and construct thepartial DFT matrix F T ∈ C N × T using the first T columns offull DFT matrix. Lastly, we model the measurement noise byincluding a complex valued additive noise term v ∈ C N .Since the actual measured sequence { y n } N − n =0 is real val-ued, the frequency domain model given in (5) is conjugatesymmetric, i.e., Y ( ω n ) = Y ∗ ( ω N − n ) , hence, we only workon the first half of the frequencies corresponding to theindexes n = 0 , . . . , N/ for even N . In time domain, (5)corresponds to a circular convolution model, which is validas long as the measurement length N is sufficiently large,because the backscattered signal energy significantly dropsafter a certain number of reflections. Another point to noteis that the reflection model in (1) contains infinitely manyreflections, which are not possible to capture with a finiteduration measurement vector. However, since the reflectedenergy is almost negligible after a certain time delay, this doesnot cause a problem in practice.IV. P ROBLEM D ESCRIPTION
Our goal in this work is to estimate the multilayer modelparameters { ε i } Mi =1 , { σ i } Mi =1 , and { d i } M − i =0 along with thetransmitted pulse h solely based on the measurement vector y . We note that dielectric constant ε (not to be confused (a) (b) (c) Fig. 2:
Contour plots representing example 2D cross sections of the high dimensional log-posterior distribution log p ( θ , γ , σ v | y ) for (a) n - n , (b) d - d and (c) n - d planes. Measurement is generated using (5) for a 4-layer structure. For each example, the remaining modelparameters are fixed at their true values used to generate the measurement. with vacuum permittivity ε o ) and conductivity σ of the firstmedium, where the source is located, are assumed to beknown, but the distance d between the transmitter and thefirst interface is also unknown and to be estimated. In total,the number of parameters to be estimated is M + T for an M -layer structure with a length T pulse. This problem is knownas the blind deconvolution problem in the literature, since boththe transmitted pulse and the reflectivity profile are unknown.The multilayer reflection model used for the reflectivity profileeliminates the well-known ill-posed characteristic of the blinddeconvolution problems by constraining the solution spacesignificantly. Moreover, constraints on the pulse shape furthershrink the solution space. We assume that the transmittedpulse has a relatively short time duration compared to themeasurement, i.e., T (cid:28) N , and is nearly bandlimited to thepassband Ω p .We follow a Bayesian framework, where the unknownvariables are assumed to be random quantities with specificprior distributions reflecting our prior knowledge. We nowdescribe the prior distributions assigned for each variables.
1) Prior Distributions for Multilayer Model Parameters:
The multilayer reflectivity profile X is already regularized bythe explicit use of propagation model given in ( ). Therefore,we assign a uniform distribution for each model parame-ters, where only the boundaries of the parameter space arespecified a priori. For notational convenience, we collectthe multilayer model parameters in a single vector θ =[ ε , . . . , ε M , σ , . . . , σ M , d , . . . , d M − ] T and assign uniformdistribution over the multidimensional parameter space Λ θ ,i.e., p ( θ ) = (cid:40) k, if θ ∈ Λ θ , otherwise , (6)where the parameter space Λ θ is defined as Λ θ = { θ | ε min ≤ ε i ≤ ε max for i = 1 , . . . , M,σ min ≤ σ i ≤ σ max for i = 1 , . . . , M,d min ≤ d i ≤ d max for i = 0 , . . . , M − } . (7)
2) Prior Distribution for Pulse Sequence:
We considertwo different scenarios for modeling the transmitted pulsesequence. In the first scenario, we assume the signal energy is strictly limited to the first T (cid:28) N samples in time domain, i.e.,we explicitly set the pulse length as T . The second scenarioconsiders a more general case where significant amount ofthe signal energy is within the first T samples, but there stillexist considerable amount of energy on the remaining N − T samples. For both scenarios, we also assume that the frequencysupport is restricted to the passband Ω p , where the signalenergy is negligible outside the passband.Following the first scenario, we represent the pulse h ∈ R T using a subspace A ∈ R T × L , i.e., h = Aγ , where γ ∈ R L represents the random coefficient vector. Here, A is selectedto reflect the frequency domain restrictions, i.e., it can beconstructed by selecting the first L sequence of either DiscreteProlate Spheroidal (DPS) Sequences or Hermite Functions[40]. This also generalizes the case where there is no specificfrequency domain constraints by setting A = I T . Instead ofdirectly solving for h , we solve for the coefficient vector γ ,which is assigned a zero mean i.i.d. Gaussian distribution p ( γ ) = (cid:18) πσ γ (cid:19) L/ exp (cid:18) − γ T γ σ γ (cid:19) , (8)with known variance σ γ .The second scenario is more suitable for real life ap-plications, since real life pulse sequences are usually notstrictly limited in time domain. In this case, we sample thepulse coefficients directly in frequency domain and applyinverse Discrete Fourier Transform (IDFT) to convert it backinto time domain. Without loss of generality, let the pulsesequence be bandlimited to the passband defined by Ω p =[ ω l , ω u ] , where < ω l < ω u < π are the normalizedfrequencies in radians per sample representing the lower andupper bound of the passband respectively. Defining the IDFTmatrix F − ∈ C N × N as [ F − ] n,m = N − / e j πnm/N for n, m = { , , . . . , N − } with j = − , we construct thepartial IDFT matrix F − p ∈ C N × L by taking the columns of F − corresponding to the indexes given by S +Ω p ∪ S − Ω p , where S +Ω p = { i | w l ≤ iπ/N ≤ w u , i ∈ Z } and S − Ω p = { i | w l ≤ iπ/N − π/ ≤ w u , i ∈ Z } . Here, L = |S +Ω p | = |S − Ω p | , where | · | denotes the cardinality of its argument set. Since we onlyconsider the real valued sequences, it will suffice to solve for (a) (b) (c) Fig. 3:
Illustration of different tempering approaches for 8 temperature levels, T < T < . . . < T . Usually, the lowest temperature T isset as 1, corresponding to the original distribution. (a) Simulated Annealing: A single chain with monotonically decreasing temperature levelwith deterministic cooling schedule. (b) Simulated Tempering: A single chain with stochastic temperature level shifts in both directions. (c)Parallel Tempering: Multiple chains running simultaneously and independently with random temperature level swaps. Fig. 4:
An example demonstrating the effect of tempering on the log-posterior distribution. Represented distribution is the horizontal crosssection of n - n plane shown in Fig. 2(a) where n = 5 . Irrelevantscaling factors are dropped for better visualization. only the positive frequencies. Therefore, defining the matrix Q ∈ C L × L as Q = 1 √ (cid:20) I L j I L ˜ I L − j ˜ I L (cid:21) , (9)where I L and ˜ I L are L × L dimensional identity and theexchange, i.e., row-reversed identity, matrices respectively,we can represent the extended bandlimited pulse sequence ˜ h ∈ R N in time domain as ˜ h = F − p Qγ = Aγ , where γ ∈ R L corresponds to the real and imaginary parts ofthe frequency domain coefficients associated with the positivefrequencies (the first L elements correspond the real partsand the last L elements correspond the imaginary parts).Here, it is straightforward to show that A = F − p Q is areal valued, unitary matrix, i.e., A T A = I N . Note thatthe scaling of / √ in (9) is to make sure A is unitary.The bandlimited structure of ˜ h is explicitly enforced by thesubspace A . In order to restrict most of the pulse energy to the first T samples, we let ˜ h to have a covariance matrix of Σ h = diag ( σ h , . . . , σ h T − , σ h T , . . . , σ h N − ) , where σ h i = (cid:40) σ h p , for i ∈ { , . . . , T − } σ h s , for i ∈ { T, . . . , N − } (10)with σ h s and σ h p are known and σ h s (cid:28) σ h p . Since A is aunitary matrix, the covariance matrix of γ is given by Σ γ = A T Σ h A . Hence, the prior for γ becomes p ( γ ) = 1(2 π ) N/ | Σ γ | / exp (cid:18) − γ T Σ − γ γ (cid:19) . (11)With this modeling scheme, the estimated pulse sequence willbe of length N , but the significant portion of the total energywill remain in the first T samples. Since it belongs to thesubspace defined by A , its bandlimited structure is enforcedexplicitly.
3) Prior Distribution for Noise Variance:
We model themeasurement noise v with a circularly symmetric complexGaussian law, CN ( v ; , σ v I ) , where its variance, σ v , is an-other unknown and to be estimated along with the other systemparameters. We assign Inverse-Gamma distribution to the noisevariance since it is the analytically tractable conjugate priorfor the unknown variance of Gaussian distribution. Giventhe shape parameter α v and the scale parameter β v , thedistribution has the following form p ( σ v ) = β α v v Γ( α v ) (cid:18) σ v (cid:19) α v +1 exp (cid:18) − β v σ v (cid:19) for σ v > , (12)where Γ( · ) denotes the Gamma function.Given the prior distributions for each of the variables,and assuming θ , γ and σ v are independent, the posteriordistribution has the following expression p ( θ , γ , σ v | y ) ∝ p ( y | θ , γ , σ v ) p ( θ ) p ( γ ) p ( σ v ) , (13)where we drop the irrelevant scaling factor p ( y ) . The like-lihood term has the form of circularly symmetric complexGaussian distribution p ( y | θ , γ , σ v ) = (cid:18) πσ v (cid:19) N/ exp (cid:18) − (cid:107) Y − diag ( Bγ ) X (cid:107) σ v (cid:19) (14) TABLE I:
Proposed Gibbs sampler for partially tempered posteriordistribution p ( θ , γ , σ v | y ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ , γ , σ v ) for agiven temperature T .Step 1. Draw σ v from p ( σ v | y , θ , γ ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( σ v ) Step 2. Draw γ from p ( γ | y , θ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( γ ) Step 3. Draw θ from p ( θ | y , γ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ ) where Y = F + y , B = F + A and (cid:107) · (cid:107) represents the (cid:96) normof a vector.The posterior distribution given in (13) is highly complex,having multimodal structure with many local maxima, asillustrated in Fig. 2, where we present 2D cross sections ofthe multidimensional log-posterior distribution. This makesthe Maximum A Posteriori (MAP) estimator a more appro-priate choice compared to, for example, the Minimum MeanSquare Error (MMSE) or the Minimum Mean Absolute Error(MMAE) estimators. Therefore, we use the MAP estimator forthe estimation of the variables, which is defined by ( θ ∗ , γ ∗ , σ ∗ v ) = arg max θ , γ ,σ v p ( θ , γ , σ v | y ) . (15)The probability space is well-defined and does not have anydiscontinuities, hence, we can employ any off-the-shelf gradi-ent ascent method for maximizing the posterior distribution.However, due to multimodality of the posterior, initializationplays a critical role on finding the global maximum. It isvery likely to get stuck on a local maximum if the initialpoint is selected poorly. Therefore, we propose a two stepsolution to (15), where we first explore the parameter spaceusing MCMC simulations to find a good initialization withhigh posterior probability, and then utilize the gradient ascentmethods hopingly to converge to the global maximum.V. P ROPOSED G IBBS S AMPLER WITH P ARALLEL T EMPERING
The MCMC simulations are widely used in complexBayesian inference problems to achieve numerical solutions.The core of the MCMC methods is the samplers, which areused to draw independent samples from a target distribution,which is the posterior distribution given in (13) in this case.The drawn samples construct a Markov Chain whose station-ary distribution eventually converges to the target distribution.These samples can then be used to approximate the statisticsof the target distribution, for example, the MMSE estimationcan be approximated by the mean average of the samplesdrawn from the posterior distribution. In this work, since weare interested in the MAP estimation, our goal is to findregions with high posterior density. Thus, we will make use ofthe effective exploration power of the MCMC simulations toidentify high probability regions. However, the multimodalityof the posterior distribution significantly reduces the efficiencyof the MCMC samplers, i.e., although the probability of jumpfrom one mode to another is not zero, it is generally smallenough, causing the sampler to get stuck on one mode of thedistribution for a long time. In order to resolve this issue, weadopt a tempering approach, i.e., Parallel Tempering, whichsubstantially improves the exploration power when combinedwith the standard MCMC samplers.
Algorithm 1:
Proposed Gibbs Sampler with PT
Input : T , T , . . . , T L Output: { ( θ (1 ,j ) , γ (1 ,j ) , σ ,j ) v ) } Jj =1 Draw σ (cid:96), v from p ( σ v ) for (cid:96) = 1 , , . . . , L Draw γ ( (cid:96), from p ( γ ) for (cid:96) = 1 , , . . . , L Draw θ ( (cid:96), from p ( θ ) for (cid:96) = 1 , , . . . , L for j = 1 to J dofor (cid:96) = 1 to L do Draw σ (cid:96),j ) v from p ( σ v | y , θ ( (cid:96),j − , γ ( (cid:96),j − ; T (cid:96) ) Draw γ ( (cid:96),j ) from p ( γ | y , θ ( (cid:96),j − , σ (cid:96),j ) v ; T (cid:96) ) Draw θ ( (cid:96),j ) from p ( θ | y , γ ( (cid:96),j ) , σ (cid:96),j ) v ; T (cid:96) ) end Draw a level (cid:96) uniformly from { , , . . . , L − } Compute acceptance probability α (cid:96) using (18) if U [0 , < α (cid:96) then Swap parameters σ (cid:96),j ) v (cid:10) σ (cid:96) +1 ,j ) v Swap parameters γ ( (cid:96),j ) (cid:10) γ ( (cid:96) +1 ,j ) Swap parameters θ ( (cid:96),j ) (cid:10) θ ( (cid:96) +1 ,j ) endend In this section, we first briefly discuss the general idea oftempering and specifically the Parallel Tempering, followedby the description of our proposed MCMC sampler. Overall,we consider three different samplers, i.e., the standard, well-known Metropolis-Hastings sampler, a Gibbs sampler whichincorporates Slice sampling for the intermediate steps, andfinally the Hamiltonian Monte Carlo approach, which is quitesuitable specifically for the cases where the posterior distribu-tion is fully differantiable.
A. Tempering Approaches for Multimodal Distributions
Consider a high dimensional target probability distribution π ( z ) , from which we aim to draw samples. When the targetdistribution π ( z ) is highly multimodal, the standard MCMCsamplers such as MH and Gibbs, or even more sophisticatedmethods like HMC, fail to explore the probability spaceefficiently, due to the low probability regions acting likebarriers in between the modes of the distribution. The mainidea of tempering is to augment the original target distribution π ( z ) with an additional temperature variable T to create thetempered distribution π ( z ; T ) = K ( T ) π ( z ) /T , where K ( T ) denotes the normalization constant. As illustrated in Fig. 4,tempering, when T > , has a flattening effect on the originaldistribution, which removes the low probability barriers be-tween the modes. Therefore, jumps between different modesbecome much more likely for the distributions with hightemperatures.The first idea for tempering is the Simulated Annealing(SA) approach, where a temperature ladder, T = 1 < T <. . . < T L , is created and the MCMC chain is initialized atthe highest temperature level T L such that it starts samplingfrom the hottest distribution π ( z ; T L ) . The temperature levelis then gradually decreased until the original distribution π ( z ) = π ( z ; 1) is reached at T = 1 . The process is illustrated Algorithm 2:
Metropolis-Hastings Sampling
Input : γ ( (cid:96),j ) , σ (cid:96),j ) v , θ ( (cid:96),j − , T (cid:96) Output: θ ( (cid:96),j ) Propose a new point ˜ θ using q (˜ θ | θ ( (cid:96),j − ) Compute acceptance probability α using (25) if U [0 , < α then Set θ ( (cid:96),j ) = ˜ θ else Set θ ( (cid:96),j ) = θ ( (cid:96),j − end in Fig. 3(a) for L = 8 different temperature levels. The numberof iterations spend on a specific temperature level, which isknown as the cooling schedule, has a critical effect on thealgorithm performance. However, it is usually problem specificand needs to be adjusted carefully.Another tempering idea is called the Simulated Tempering(ST), as shown in Fig. 3(b), which allows the MCMC chain toeither increase or decrease the temperature level in a stochasticmanner based on a specific MH acceptance criterion in orderto maintain the detailed balance. Specifically, a temperatureshift from T (cid:96) to T (cid:96) +1 is accepted with probability α (cid:96) → (cid:96) +1 ,which is defined by α (cid:96) → (cid:96) +1 = min (cid:26) , π ( z ) /T (cid:96) +1 K ( T (cid:96) +1 ) π ( z ) /T (cid:96) K ( T (cid:96) ) q (cid:96) +1 → (cid:96) q (cid:96) → (cid:96) +1 (cid:27) , (16)where q (cid:96) → (cid:96) +1 is the proposal probability for shifting from T (cid:96) to T (cid:96) +1 . However, for complex posterior distributions,calculation of the scaling factors K ( T (cid:96) ) and K ( T (cid:96) +1 ) requiresanalytically intractable integrations, limiting the applicabilityof the ST in many real life inverse problems.Different from SA and ST, the idea of Parallel Tempering(PT), as shown in Fig. 3(c), is to run multiple MCMC chainsindependently and simultaneously at each temperature levelwith stochastic temperature swaps between the neighbouringtemperature levels. Unlike ST, the target distribution in PT isa joint distribution over all chains given by (cid:81) L(cid:96) =1 π ( z ( (cid:96) ) ; T (cid:96) ) ,where z ( (cid:96) ) denotes the variables for the chain running attemperature level T (cid:96) . Therefore, the acceptance probability α (cid:96),(cid:96) +1 that maintains the detailed balance in the case of atemperature swap between the chains at T (cid:96) and T (cid:96) +1 is givenby α (cid:96),(cid:96) +1 = min (cid:26) , π ( z ( (cid:96) ) ) /T (cid:96) +1 K ( T (cid:96) +1 ) π ( z ( (cid:96) +1) ) /T (cid:96) +1 K ( T (cid:96) +1 ) × π ( z ( (cid:96) +1) ) /T (cid:96) K ( T (cid:96) ) π ( z ( (cid:96) ) ) /T (cid:96) K ( T (cid:96) ) q (cid:96) +1 ,(cid:96) q (cid:96),(cid:96) +1 (cid:27) = min (cid:26) , π ( z ( (cid:96) ) ) /T (cid:96) +1 π ( z ( (cid:96) +1) ) /T (cid:96) π ( z ( (cid:96) +1) ) /T (cid:96) +1 π ( z ( (cid:96) ) ) /T (cid:96) (cid:27) , (17)where the proposal distribution q (cid:96),(cid:96) +1 is symmetric, i.e., q (cid:96),(cid:96) +1 = q (cid:96) +1 ,(cid:96) , with q (cid:96),(cid:96) +1 = q (cid:96) → (cid:96) +1 q (cid:96) +1 → (cid:96) . Note that α (cid:96),(cid:96) +1 is independent of the scaling factors K ( T (cid:96) ) and K ( T (cid:96) +1 ) . Thus, PT eliminates the need for calculating scalingfactors, which makes it applicable for variety of real lifeinverse problems. Algorithm 3:
Slice Sampling
Input : γ ( (cid:96),j ) , σ (cid:96),j ) v , θ ( (cid:96),j − , T (cid:96) Output: θ ( (cid:96),j ) Draw η from U [0 , p ( θ ( (cid:96),j − | y , γ ( (cid:96),j ) , σ ( (cid:96),j ) v ; T (cid:96) )] Randomly position hyper-rectangle around θ ( (cid:96),j − Uniformly draw ˜ θ within hyper-rectangle while p (˜ θ | y , γ ( (cid:96),j ) , σ (cid:96),j ) v ; T (cid:96) ) < η do Shrink the hyper-rectangleUniformly draw ˜ θ within shrunk hyper-rectangle end Set θ ( (cid:96),j ) = ˜ θ B. Proposed Gibbs Sampler with Parallel Tempering
We begin with introducing the general structure of ourproposed sampler and discussing its connection to the ParallelTempering approach. We employ a Gibbs sampler scheme,which is a powerful MCMC tool for sampling from highdimensional distributions especially when the conditional pos-teriors are analytically tractable and straightforward to samplefrom. Here, note that the multimodality of the posterior ismainly due to the likelihood function given in (14). The priordistributions assigned to the pulse shape and the noise variancedo not contribute to the multimodality of the target posterior.Therefore, we follow an alternative tempering approach, wherewe partially temper the posterior distribution by applyingtempering only to the likelihood. With this approach, thechains running at high temperatures will sample from the priordistributions, instead of a flat distribution over the parameterspace. This is quite useful when the prior distributions areunimodal, which is the case for the Gaussian and Inverse-Gamma distributions.One iteration of the proposed Gibbs sampler for samplingfrom the partially tempered posterior p ( θ , γ , σ v | y ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ , γ , σ v ) for a given temperature T isgiven in Table I. This is a valid Gibbs sampler, whichsamples each variable at least once within one iteration.The validity of the sampler is established in Appendix Aby showing that the MH acceptance probability is always1 for each step. Here, due to our selection of conju-gate priors for σ v and γ , the partially tempered posteriorconditionals p ( σ v | y , θ , γ ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( σ v ) and p ( γ | y , θ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( γ ) in Steps 1 and 2have well-known forms in which the sampling is straight-forward. However, the posterior conditional of the multilayermodel parameters p ( θ | y , γ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ ) ,given in Step 3, is highly complex and does not have a well-known form, which prevents direct sampling of θ . Therefore,we will utilize different within-Gibbs sampling methods, in-cluding Metropolis-Hastings, Slice and Hamiltonian MonteCarlo samplers, to draw samples from p ( θ | y , γ , σ v ; T ) . Wepresent the details of the proposed within-Gibbs samplingmethods in Section V-C. We now describe how the ParallelTempering approach is incorporated with the proposed Gibbssampler, followed by the derivation of sampling distributionsfor Steps 1 and 2. Fig. 5:
Illustration of Slice sampling for two-dimensional case. Sampled parameter values for i th iteration is denoted by θ ( i ) . As the first step,a density level η i is randomly selected from U [0 , p ( θ ( i ) | y , γ , σ v ; T )] , which creates the shaded regions in I, corresponding to the parameterspace satisfying p ( θ | y , γ , σ v ; T ) ≥ η i . Then, a rectangle (or hyper-rectangle for larger dimensions) with predefined widths, w = [ w , w ] T ,is randomly positioned around θ ( i ) and a point, ˜ θ , is drawn uniformly within the rectangle, as shown in II. If the selected point is outsidethe shaded regions, i.e., p (˜ θ | y , γ , σ v ; T ) < η i , the rectangle is shrunk in both directions by keeping θ ( i ) within the rectangle. The shrinkageprocess, also known as stepping-in procedure, continues until a point within the shaded regions is selected, as shown in III. Once such apoint is selected, it is assigned as the next sample θ ( i +1) and a new level η i +1 is drawn from U [0 , p ( θ ( i +1) | y , γ , σ v ; T )] , which updatesthe shaded regions as shown in IV. Considering a Parallel Tempering scheme with L tem-perature levels, each MCMC chain samples from a specificpartially tempered version of the posterior distribution, i.e.,the chain at level T (cid:96) samples from p ( θ , γ , σ v | y ; T (cid:96) ) ∝ p ( y | θ , γ , σ v ) /T (cid:96) p ( θ , γ , σ v ) for (cid:96) = 1 , , . . . , L . After oneiteration of the Gibbs sampler is completed at all chains, aparameter exchange between the neighbouring levels, say, T (cid:96) and T (cid:96) +1 , is proposed, where (cid:96) is randomly selected from theuniformly distributed proposal distribution q (cid:96) = 1 / ( L − for (cid:96) ∈ { , , . . . , L − } . The proposal is accepted with thefollowing acceptance probability α (cid:96) = min (cid:26) , p ( y | θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v ) /T (cid:96) +1 − /T (cid:96) p ( y | θ ( (cid:96) +1 ,j ) , γ ( (cid:96) +1 ,j ) , σ (cid:96) +1 ,j ) v ) /T (cid:96) +1 − /T (cid:96) (cid:27) , (18)where ( θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v ) and ( θ ( (cid:96) +1 ,j ) , γ ( (cid:96) +1 ,j ) , σ (cid:96) +1 ,j ) v ) represent the current parameter values at j th MCMC iterationwhich are to be exchanged between the chains running atlevel T (cid:96) and T (cid:96) +1 respectively (See Appendix B for deriva-tion of the acceptance probability). Therefore, one completeMCMC cycle consists of L regular Gibbs sampling stages,followed by a single parameter exchange step. Each cycle j produces a new set of samples for each temperature level, { ( θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v ) } L(cid:96) =1 , but in the end, we are only inter-ested in the samples generated at the first level, T = 1 , whichcorresponds to the original posterior distribution. We providea more detailed description of the sampler in Algorithm 1.Next, we present the sampling distributions for the first twosteps of our sampler, associated with each temperature level.The derivations are provided in Appendix C.
1) Sampling Distribution for Step 1:
The partially temperedposterior conditional distribution for the noise variance σ v fora given temperature level T is given by p ( σ v | y , θ , γ ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( σ v ) ∝ (cid:18) σ v (cid:19) ˜ α v +1 exp (cid:18) − ˜ β v σ v (cid:19) , (19)which is an Inverse-Gamma distribution, after proper nor-malization, with ˜ α v = α v + N/ T and ˜ β v = β v + (cid:107) Y − diag ( Bγ ) X (cid:107) /T . Sampling from (19) is straightforward dueto its well-known form. Note that as T → ∞ , we have ˜ α v → α v and ˜ β v → β v , which corresponds to the priordistribution given in (12).
2) Sampling Distribution for Step 2:
This step requires thepartially tempered posterior conditional of the pulse coefficient γ for a given temperature level T . With proper normalization,the distribution has the form of a multivariate Gaussian law,i.e., p ( γ | y , θ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( γ ) ∝ exp (cid:18) −
12 ( γ − ˜ µ γ ) T ˜ Σ − γ ( γ − ˜ µ γ ) (cid:19) , (20)where the mean ˜ µ γ and covariance ˜ Σ γ is given by ˜ µ γ = 2 T σ v ˜ Σ γ (cid:60){ D H Y } , ˜ Σ γ = (cid:18) T σ v (cid:60){ D H D } + Σ − γ (cid:19) − , (21)with D = diag ( X ) B and (cid:60){·} denoting the real part of itsargument. Due to its well-known Gaussian form, samplingfrom (20) is straightforward. Similar to Step 1, as T → ∞ ,the distribution converges to the prior distribution given in (11)since ˜ µ γ → and ˜ Σ γ → Σ γ . C. Sampling Multilayer Model Parameters
The multidimensional sampling distribution for the multi-layer model parameters θ does not have a well-known formthat enables direct sampling. Therefore, we construct a hierar-chical sampling scheme that incorporates different samplingapproaches for Step 3 in Table I. Note that the samplingdistribution in Step 3 corresponds to the tempered likelihoodfunction over Λ θ since p ( θ ) is uniform over Λ θ , i.e., p ( θ | y , γ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T for θ ∈ Λ θ . (22)The tempering approach helps resolving the multimodalityissue of the likelihood, but the selected sampling scheme (a) (b) Fig. 6:
Illustration of reflective HMC for two-dimensional case.Shaded regions represent outside of the boundaries. (a) demonstratesthe case where original trajectory violates the boundary of only onedimension. (b) demonstrate the case where original trajectory violatesthe boundaries of both dimensions. also plays an important role on performance enhancement.Therefore, we consider three different sampling schemes andprovide a comparison of their performance. We begin with thewell-known, standard MH sampler, which . We then considera Slice sampling approach
1) Metropolis-Hastings Sampler:
The first sampling ap-proach we consider is the MH sampler, in which a candidatepoint ˜ θ in the probability space Λ θ is proposed using a specificproposal distribution q (˜ θ | θ ) and the proposed point is eitheraccepted or rejected based on the MH acceptance criterion.Since the parameter space Λ θ is bounded from both belowand above, we use independent Beta distributions for eachdimension, yielding the following joint proposal distribution q (˜ θ | θ ) = M (cid:89) i =1 B ( a i , b i ) ¯ θ a i − i (1 − ¯ θ i ) b i − , (23)where ¯ θ i = (˜ θ i − θ i, min ) / ( θ i, max − θ i, min ) is the scaled parameterand B ( · ) denotes the Beta function. The parameters a i and b i are specified in terms of the mode λ i and concentration κ as a i = λ i ( κ −
2) + 1 ,b i = (1 − λ i )( κ −
2) + 1 , (24)where the mode is set as λ i = ( θ i − θ i, min ) / ( θ i, max − θ i, min ) . Weuse a fixed κ for all dimensions, which is a hyper-parameterthat controls the acceptance ratio of the proposals. Whilesmaller κ causes consecutive samples to be statistically depen-dent with high acceptance ratio, higher κ values lead to moreindependent samples with a significantly reduced acceptanceratio. Therefore, κ needs to be tuned accordingly to achieve thebest trade-off. In section V-D, we provide a dynamic updatescheme for κ to maintain a desired acceptance ratio. We notethat each temperature level has its own concentration and thecorresponding acceptance probability is given by α = min (cid:26) , p ( y | ˜ θ , γ , σ v ) /T p ( y | θ , γ , σ v ) /T q ( θ | ˜ θ ) q (˜ θ | θ ) (cid:27) , (25)where the proposal distribution is calculated based on theassociated concentration for the given temperature level T .
2) Slice Sampling:
Another widely used method for within-Gibbs sampling is the Slice sampling approach. It is applicableto both univariate and multivariate cases when the target
Algorithm 4:
Reflective Hamiltonian Monte Carlo
Input : γ ( (cid:96),j ) , σ (cid:96),j ) v , θ ( (cid:96),j − , T (cid:96) Output: θ ( (cid:96),j ) Draw momentum z from N ( z ; , M − ) Set ˜ θ = θ ( (cid:96),j − for k = 1 to ∆ do z = z − . (cid:15) ∇ θ U (˜ θ ; T (cid:96) )˜ θ = ˜ θ + (cid:15) M z if ˜ θ / ∈ Λ θ then ˜ θ = ˜ θ − (cid:15) M zz = z + 0 . (cid:15) ∇ θ U (˜ θ ; T (cid:96) ) z i = − z i , ∀ i s.t. ˜ θ i / ∈ [ θ i, min , θ i, max ] else z = z − . (cid:15) ∇ θ U (˜ θ ; T (cid:96) ) endend Compute acceptance probability α using (30) if U [0 , < α then Set θ ( (cid:96),j ) = ˜ θ else Set θ ( (cid:96),j ) = θ ( (cid:96),j − end distribution can be calculated up to a scale. We providean illustration, along with a brief description, of the Slicesampling in Fig. 5 for a two-dimensional case, which can beeasily generalized to higher dimensions. Here, we directly em-ploy the multi-dimensional setting, which is based on hyper-rectangles, instead of sampling each variable in turn in a one-dimensional setting. To simplify the algorithm, we skip thestepping-out procedure and set the widths of hyper-rectangleequal to the range of parameters, i.e. w = [ w , w , . . . , w M ] T with w i = θ i, max − θ i, min .
3) Hamiltonian Monte Carlo:
The last sampling methodwe consider is HMC, which utilizes geometry of the targetdistribution to eliminate the random walk behaviour of MHby enabling longer jumps in parameter space with high accep-tance rate. It is based on an analogy with physical systems, inwhich the target distribution is translated to a potential energyfunction, where the parameters of interest, θ , are regarded asposition variables. An augmented state-space is created by in-troducing momentum variables, denoted by z , representing therate of change of the position variables. Defining the temperedpotential energy function as U ( θ ; T ) = − log p ( y | θ , γ , σ v ; T ) and the kinetic energy function as K ( z ) = z T M z , where M is a diagonal matrix consisting of masses m i associatedwith each variable in its diagonal, total energy of the systemat a given state ( θ , z ) at temperature T is given by theHamiltonian H ( θ , z ; T ) = U ( θ ; T ) + K ( z ) . The masses m i are used to balance the different scales of the parameters andset as m i = ( θ i, max − θ i, min ) as suggested in [].HMC is used to sample ( θ , z ) pairs jointly from thefollowing canonical distribution P ( θ , z ; T ) ∝ exp (cid:0) − H ( θ , z ; T ) (cid:1) , (26) Fig. 7:
The evolution of acceptance ratios (top) and (cid:15) (cid:96) values (bottom) using dynamic update model with 12 geometrically spaced temperaturelevels between T = 1 and T = 10 . The target acceptance ratio is 0.6 for each temperature level. at a given temperature level T . The sampling is achievedby exploiting the Hamiltonian dynamics, which govern theevolution of the system in continuous time: d θ dt = ∇ z H ( θ , z ; T ) , (27) d z dt = −∇ θ H ( θ , z ; T ) , (28)where ∇ θ and ∇ z denotes the gradient operators with respectto θ and z respectively. When simulated exactly for a finiteamount of time τ , (27) and (28) produce new state variables ( θ τ , z τ ) , with ( θ , z ) being the initial state. Note thatvalue of H ( θ , z ; T ) does not change during the simulationdue to conservation of Hamiltonian, i.e., H ( θ τ , z τ ; T ) = H ( θ , z ; T ) . Hence, the MH acceptance probability for theproposed state (˜ θ , ˜ z ) = ( θ τ , z τ ) is always 1, regardless ofthe simulation duration τ . This enables making very largechanges to θ quite efficiently. However, since the value of H ( θ , z ; T ) is preserved, evolution under Hamiltonian dynam-ics only produces samples having the same level of probabilitydensity. In order to achieve an ergodic sampling process,value of the Hamiltonian needs to be altered, which canbe achieved by sampling the momentum variable from itsposterior conditional, which is equivalent to its prior distri-bution due to statistical independence of θ and z . Therefore,a new momentum state is sampled from N ( z ; , M − ) beforesimulating the Hamiltonian dynamics. Another problem withthis is that exact simulation requires analytical integration ofboth (27) and (28), which is usually not possible in practice,but can be approximated by a numerical integration scheme.The most commonly used method is the leapfrog algorithm, which consists of alternating discretized updates to θ and z : z (cid:15)/ = z − (cid:15) ∇ θ U ( θ ; T ) , θ (cid:15) = θ + (cid:15) M z (cid:15)/ , z (cid:15) = z (cid:15)/ − (cid:15) ∇ θ U ( θ (cid:15) ; T ) . (29)One iteration of the leapfrog algorithm simulates the dynamicsfor a time interval (cid:15) , which is the predefined step size ofthe algorithm. In order to simulate for a duration of τ , theprocess is repeated for ∆ = τ /(cid:15) times. Although the leapfrogalgorithm provides quite accurate approximation of the con-tinuous time integration, some residual error will remain dueto discretization, which might alter the value of Hamiltonian.In order to maintain detailed balance, the proposed state isaccepted with probability α = min (cid:26) , exp( − H (˜ θ , ˜ z ; T ))exp( − H ( θ , z ; T )) (cid:27) . (30)In order to have symmetric proposal distribution, either themomentum variables are negated after completing the leapfrogalgorithm or the step size is negated with probability . beforestarting the leapfrog iterations. Here, (cid:15) and ∆ are the hyper-parameters of the HMC sampling scheme, affecting the overallperformance. In general, higher (cid:15) causes high residual errorleading to low acceptance rate. On the other hand, selectinga too small (cid:15) will require large number of steps ∆ to achievelong jumps, which increases the computational load. Hence,both parameters need to be tuned for the best trade-off. Similarto the concentration parameter of the proposal distribution ofMH sampler, the step size is distinct for different temperaturelevels and a dynamic update scheme for the step size (cid:15) , for afixed ∆ , is given in section V-D. Fig. 8:
The evolution of acceptance ratios (top) and κ (cid:96) values (bottom) using dynamic update model with 12 geometrically spaced temperaturelevels between T = 1 and T = 10 . The target acceptance ratio is 0.6 for each temperature level. HMC is conventionally used for sampling from smooth andunbounded distributions. For bounded parameter spaces, as wehave with Λ θ , a modified reflective HMC can be used, wherethe trajectory on the parameter space is bounced back whenit is blocked by a boundary. Specifically, if θ i / ∈ [ θ i, min , θ i, max ] after completing one step of the leapfrog algorithm, we undothe previous step, negate the i th momentum variable, i.e., z (cid:48) i = − z i , and then complete the remaining steps usingthe updated momentum vector. If multiple boundaries areviolated simultaneously, all of the corresponding momentumvariables are negated. In Fig. 6, we demonstrate the employedreflection method for a two-dimensional case. This methodof reflection leaves the Hamiltonian invariant, since negationdoes not change the value of kinetic energy function, i.e., K ( z (cid:48) ) = K ( z ) . Moreover, the acceptance probability givenin (30) remains valid, since the proposal distribution is stillsymmetric.Before closing this section, we provide the analytical ex-pression for the gradient of potential energy function, whichis required to calculate update equations in (29). First, notethat U ( θ ; T ) = (cid:107) Y − diag ( Bγ ) X (cid:107) /T σ v , where the onlyterm that depends on θ is X . Following the derivation givenin Appendix D, we achieve ∇ θ U ( θ ) = 2 T σ v (cid:60) (cid:8)(cid:2) X H D H D − Y H D (cid:3) ∇ θ X (cid:9) , (31)where D = diag ( Bγ ) and the gradient of X is defined as ∇ θ X = (cid:2) ∇ θ X ( ω ) , ∇ θ X ( ω ) , . . . , ∇ θ X ( ω N/ − ) (cid:3) T . (32) The individual gradient term ∇ θ X ( ω i ) has the following form ∇ θ X ( ω i ) = (cid:20) ∂X ( ω i ) ∂ε , . . . , ∂X ( ω i ) ∂ε M , ∂X ( ω i ) ∂σ , . . . ,∂X ( ω i ) ∂σ M , ∂X ( ω i ) ∂d , . . . , ∂X ( ω i ) ∂d M − (cid:21) T (33)for i = 0 , , . . . , N/ − . Exact expression for each elementof ∇ θ X ( ω i ) is also provided in Appendix D. D. Dynamic Parameter Update
The concentration κ of the proposal distribution, in the caseof MH sampler, and the step size (cid:15) of the leapfrog algorithm,in the case of HMC sampler, significantly affect the efficiencyof those samplers. We first note that curvature of the targetdistribution is substantially different at distinct temperaturelevels as shown in Fig. 4, hence, distinct parameters, κ (cid:96) and (cid:15) (cid:96) , are needed for each chain (cid:96) . Moreover, the curvature variessignificantly even for the same temperature level when thechain is exploring different modes of the target distribution.Therefore, selecting a constant κ (cid:96) or (cid:15) (cid:96) usually results ininefficient exploration of the parameter space. In order toaddress these issues, in this section, we provide dynamicmodels for both parameters where we periodically update themto maintain a predetermined acceptance ratio ξ based on thecurrent empirical acceptance ratios. The effect of changes on κ (cid:96) or (cid:15) (cid:96) can only be observed in the proceeding iterations.Therefore, we update the parameters after every J iterationsbased on the empirical acceptance ratio ˆ ξ ( j ) (cid:96) measured bythe ratio of the total accepted proposals between iterations ( j − J + 1) and j to the duration J . We employ a pro-portional controller approach and use the difference between Fig. 9:
The evolution of swap ratios (top) and temperature levels (bottom) using the adaptive temperature adjustment model. L = 32 differenttemperatures are initialized at geometrically spaced levels between T = 1 and T = 10 . The lowest and highest temperature levels arefixed at T = 1 and T = ∞ . The swap ratios are expected to converge to between each level converges to a same level. the target and empirically measured acceptance ratios, i.e., e ( j ) (cid:96) = ξ − ˆ ξ ( j ) (cid:96) , as the model feedback. Hence, the dynamicmodels are described by the following update equations: κ ( j +1) (cid:96) = exp (cid:0) log( κ ( j ) (cid:96) ) + e ( j ) (cid:96) K κ J ( j ) (cid:1) ,(cid:15) ( j +1) (cid:96) = exp (cid:0) log( (cid:15) ( j ) (cid:96) ) − e ( j ) (cid:96) K (cid:15) J ( j ) (cid:1) , (34)where we perform the updates on the logarithm of parametersto level out scale differences and use the same constant gains K κ and K (cid:15) for all temperature levels. Here, note that theminus sign at the bottom equation is due to the negativecorrelation between (cid:15) and acceptance ratio, i.e., the acceptanceratio increases as (cid:15) decreases. Also note that J ( j ) refers tothe indicator function defined as J ( j ) = 1 if j mod J = 0 and J ( j ) = 0 otherwise.In Fig. 8 and 7, we illustrate the evolution of acceptanceratios as well as the parameter values for κ and (cid:15) respectivelyusing L = 12 chains. The temperature levels are geometricallyspaced between T = 1 and T = 10 . We set J (cid:48) = 100 , K κ = 2 , K (cid:15) = 0 . and initialize the parameters as κ (cid:96) = 10 and (cid:15) (cid:96) = 10 − . The target acceptance ratio is set as ξ = 0 . for all chains. E. Adaptive Temperature Selection
For parallel tempering methods, selection of the temper-ature ladder T < . . . < T L has a substantial effect onthe overall sampling performance. The general practice is toset T = 1 to sample from the original target distributionand T L sufficiently high to explore all the modes. Thereexist different point of views to optimize the structure ofthe temperature ladder. In this work, we assume that thetotal number of temperatures is fixed and determined by the available computational budget and optimize the spacing oftemperature levels, in order to improve the overall samplingefficiency. It has been shown in the literature that the optimalstrategy, which maximizes the mean-square displacement ofthe system, is to construct the temperature ladder in a way thatthe swap ratio is approximately 0.23 for adjacent levels [41].Therefore, given T and the number of levels L , our goal is tofind out the temperature spacing that approximately gives theacceptance ratio of 0.23 for adjacent levels. In this section, weprovide an adaptive temperature selection scheme that adjuststhe temperature levels until the target swap ratio is achievedat each level. Consider an intermediate temperature ladderconfiguration { T ( j ) (cid:96) } L(cid:96) =1 at j th MCMC iteration. Similar tothe update schemes of κ and (cid:15) , we perform the updates afterevery J iterations based on the empirical swap ratio s ( j ) (cid:96) ,which is calculated by the ratio of the total accepted swaps tothe total proposed swaps between chains (cid:96) and (cid:96) + 1 duringthe iterations ( j − J + 1) and j . In order to maintain theorder, i.e., T < . . . < T L , and level out scaling of differenttemperature levels, we perform the updates on the logarithmof their difference as T ( j +1)∆ (cid:96) = T ( j )∆ (cid:96) − e ( j ) (cid:96) K T J ( j ) (35)where T ( j )∆ (cid:96) = log (cid:0) T ( j ) (cid:96) +1 − T ( j ) (cid:96) (cid:1) , e ( j ) (cid:96) = 0 . − s ( j ) (cid:96) and K T isthe controller gain. The initial configuration is L geometricallyspaced temperature levels between T and a rough estimate ofthe maximum level T max . Here, we note that any adjustmenton the temperature levels during sampling process, includingthe dynamic scheme discussed above, violates the detailedbalance. Therefore, selection of the temperatures are finalizedwithin the burn-in period and a fixed temperature configuration Fig. 10:
Recovery results based on the measurement given in top left figure with 40dB SNR. The actual and recovered reflectivity profilesare given in top right and bottom right figures.
Fig. 11:
Recovery results for different SNR levels. is used afterwards. VI. S
IMULATIONS
A. Recovery Results on Synthetic Measurements
As the first part of the experiments, we represent therecovery results of the proposed methods on synthetic mea-surements. The measurement sequences are created usingthe circular convolution model given in (5). The reflectivityprofiles are calculated using the 1D multilayer propagationmodel given in (1). The transmitted waveform used in the experiments is selected as a bandlimited Gaussian modulatedsinusoidal pulse with center frequency f c = 4 GHz andfractional bandwidth of . , which is represented in Fig. 10(bottom left) with solid line.We first consider the case where the number of layers inthe actual underlying reflectivity profile matches the numberof layers used for model fitting. The measurement sequence { y n } n =0 generated for this experiment is based on a 4-layerstructure and represented in Fig. 10 (top left). The underlyingrelative permittivity and conductivity profiles are illustrated in Fig. 12:
Unbiased Cramer-Rao Lower Bound for each parameter at different SNR levels ranging from 20dB (top curves) to 60dB (bottomcurves) with 10dB increments. The actual values of the parameters are indicated with dashed vertical lines.
Fig. 10 (top right and bottom right figures respectively) as wellwith solid black lines. With a sampling frequency of f s = 144 GHz, total duration of the measurement corresponds to . ns, which is sufficient to capture all significant reflections. Thenoise variance σ v is adjusted to achieve different levels ofSignal-to-Noise ratio (SNR) ranging between dB to dB. TABLE II:
Comparison of the log posteriors and the residualerrors (data misfit) achieved by the actual and estimated parametersfor different SNR levels. The estimated parameters provide higherposterior with better fit to the measurement compared to actualparameters used to generate the measurement.Log Posterior
Residual Sum of Squares
As an illustrative example, in Fig. 10, we represent therecovery results for the relative permittivity and conductiv-ity profiles as well as the transmitted waveform using themeasurement with dB SNR. We also represent how thereconstructed measurements fit to the actual one. The resultswere obtained after running the simulations × iterationsfor each sampler. The model parameters are selected as σ γ = 10 , α v = 1 , β v = 1 , which constitute nearly non-informative priors for the pulse sequence and noise variance.The subspace matrix A for the pulse sequence is constructedby the first 11 length- DPS sequences, which span thefrequency range between to GHz. The lower and upperbounds of the parameter space are specified as ε min = 2 , ε max = 60 , σ min = 5 × − , σ max = 2 , d min = 2 × − and d max = 3 × − . For parallel tempering, a total of L = 32 different temperature levels are employed, which areinitialized at geometrically spaced points in between T = 1 and T = 10 . For MH and HMC samplers, the concentration κ (cid:96) and step size (cid:15) (cid:96) are initialized at and − respectivelyand the target acceptance ratios are set as . for eachtemperature level with K κ = 2 and K ε = 0 . . B. Cramer-Rao Lower Bound
Let us denote all the parameters except the noise varianceas ¯ θ = ( θ , γ ) and denote the noise-free signal as s , i.e., s = diag ( F T h ) x . In our measurement model, s is corrupted bywhite Gaussian noise v . For a given noise variance σ v , thelog-likelihood is given by log p ( y | ¯ θ ) = − N log( πσ v ) / − σ v N/ (cid:88) n =1 | y n − s n | . (36) For multivariate case, the Fisher information matrix I (¯ θ ) isgiven by [ I (¯ θ )] i,j = − E (cid:20) ∂ log p ( y | ¯ θ ) ∂ ¯ θ i ∂ ¯ θ j (cid:21) , (37)where ∂ log p ( y | ¯ θ ) ∂ ¯ θ i = 2 σ v N/ (cid:88) n =1 (cid:60) (cid:26) ( y n − s n ) ∗ ∂s n ∂ ¯ θ i (cid:27) ∂ log p ( y | ¯ θ ) ∂ ¯ θ i ∂ ¯ θ j = 2 σ v N/ (cid:88) n =1 (cid:60) (cid:26) − ∂s ∗ n ∂ ¯ θ j ∂s n ∂ ¯ θ i + ( y n − s n ) ∗ ∂ s n ∂ ¯ θ i ∂ ¯ θ j (cid:27) (38)Since E [ y ∗ n ] = s ∗ n , we have [ I (¯ θ )] i,j = 2 σ v N/ (cid:88) n =1 (cid:60) (cid:26) ∂s ∗ n ∂ ¯ θ j ∂s n ∂ ¯ θ i (cid:27) . (39)Therefore, the covariance matrix of the estimator C ˆ θ satisfies C ˆ θ − I − (¯ θ ) (cid:60) , (40)which implies Var (ˆ θ i ) = [ C ˆ θ ] i,i ≥ [ I − (¯ θ )] i,i . (41) A PPENDIX AV ALIDATION OF THE G IBBS S AMPLER FOR P ARTIALLY T EMPERED P OSTERIOR
The proposed Gibbs sampler given in Table I is validatedby showing that the MH acceptance probability is 1 regardlessof the proposed values for each step, when the proposal distri-butions are selected as those given in Table I. Given the targetdistribution p ( θ , γ , σ v | y ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ , γ , σ v ) and the proposal distribution p ( y | θ , γ , σ v ) /T p ( θ ) for θ (thederivation generalizes to the other variables as well, hence, weonly consider θ here), the MH acceptance probability is givenby α = min (cid:26) , p ( θ (cid:48) , γ , σ v | y ; T ) p ( θ , γ , σ v | y ; T ) p ( y | θ , γ , σ v ) /T p ( θ ) p ( y | θ (cid:48) , γ , σ v ) /T p ( θ (cid:48) ) (cid:27) = min (cid:26) , p ( y | θ (cid:48) , γ , σ v ) /T p ( θ (cid:48) , γ , σ v ) p ( y | θ , γ , σ v ) /T p ( θ , γ , σ v ) × p ( y | θ , γ , σ v ) /T p ( θ ) p ( y | θ (cid:48) , γ , σ v ) /T p ( θ (cid:48) ) (cid:27) = min (cid:26) , p ( θ (cid:48) ) p ( γ ) p ( σ v ) p ( θ ) p ( γ ) p ( σ v ) p ( θ ) p ( θ (cid:48) ) (cid:27) = 1 . (42)where θ (cid:48) denotes the proposed value of θ , which is obtainedby sampling from the proposal distribution. This holds forthe Step 1 and 2 as well, hence, the proposed sampler is avalid Gibbs sampler that draws each variable exactly once withacceptance probability of 1.A PPENDIX BA CCEPTANCE P ROBABILITY FOR E XCHANGE P ROPOSALS
At a given MCMC iteration j , the value of the joint targetdistribution under a PT scheme with L temperature levels isgiven by p ( { z ( (cid:96),j ) } L(cid:96) =1 | y ) = L (cid:89) (cid:96) =1 p ( θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v | y ; T (cid:96) ) K ( T (cid:96) ) , (43)where z ( (cid:96),j ) = ( θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v ) denotes the set off allvariables at (cid:96) th chain and j th iteration, and K ( T (cid:96) ) denotesthe scaling factor associated with temperature level T (cid:96) . Whena parameter exchange between the temperature levels T i and T i +1 is proposed, such that the new parameter set ˜ z ( (cid:96),j ) becomes ˜ z ( (cid:96),j ) = z ( i +1 ,j ) , if (cid:96) = i z ( i,j ) , if (cid:96) = i + 1 z ( (cid:96),j ) , otherwise , (44)the value of the joint posterior distribution is updated as p ( { ˜ z ( (cid:96),j ) } L(cid:96) =1 | y ) = p ( θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v | y ; T i ) K ( T i ) × p ( θ ( i,j ) , γ ( i,j ) , σ i,j ) v | y ; T i +1 ) K ( T i +1 ) × L (cid:89) (cid:96) =1 (cid:96) (cid:54) = i,i +1 p ( θ ( (cid:96),j ) , γ ( (cid:96),j ) , σ (cid:96),j ) v | y ; T (cid:96) ) K ( T (cid:96) ) . (45) Based on the MH criterion with symmetric proposal as givenin (17), we achieve α i = min (cid:26) , p ( { ˜ z ( (cid:96),j ) } L(cid:96) =1 | y ) p ( { z ( (cid:96),j ) } L(cid:96) =1 | y ) (cid:27) = min (cid:26) , p ( θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v | y ; T i ) K ( T i ) p ( θ ( i,j ) , γ ( i,j ) , σ i,j ) v | y ; T i ) K ( T i ) × p ( θ ( i,j ) , γ ( i,j ) , σ i,j ) v | y ; T i +1 ) K ( T i +1 ) p ( θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v | y ; T i +1 ) K ( T i +1 ) (cid:27) = min (cid:26) , p ( y | θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v ) /T i p ( y | θ ( i,j ) , γ ( i,j ) , σ i,j ) v ) /T i × p ( θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v ) p ( θ ( i,j ) , γ ( i,j ) , σ i,j ) v ) p ( θ ( i,j ) , γ ( i,j ) , σ i,j ) v ) p ( θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v ) × p ( y | θ ( i,j ) , γ ( i,j ) , σ i,j ) v ) /T i +1 p ( y | θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v ) /T i +1 (cid:27) = min (cid:26) , p ( y | θ ( i,j ) , γ ( i,j ) , σ i,j ) v ) /T i +1 − /T i p ( y | θ ( i +1 ,j ) , γ ( i +1 ,j ) , σ i +1 ,j ) v ) /T i +1 − /T i (cid:27) , (46)which is the acceptance probability given in (18).A PPENDIX CD ERIVATION OF S AMPLING D ISTRIBUTIONS
In this appendix, we derive the sampling distributions forthe first two steps of our proposed Gibbs sampler. The originalposterior conditional for σ v is given by p ( σ v | y , θ , γ ) = p ( θ , γ , σ v | y ) p ( θ , γ | y )= p ( y | θ , γ , σ v ) p ( θ ) p ( γ ) p ( σ v ) p ( y ) p ( θ , γ | y ) ∝ p ( y | θ , γ , σ v ) p ( σ v ) (47)where we use the Bayes’ theorem and the independenceof variables in second line and dropped all irrelevant scal-ing factors in the last line. The partially tempered pos-terior conditional is obtained by tempering the likelihoodexpression, i.e., p ( σ v | y , θ , γ ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( σ v ) .Similarly, the partially tempered posterior conditionals forthe pulse coefficients and the multilayer model parametersare given by p ( γ | y , θ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( γ ) and p ( θ | y , γ , σ v ; T ) ∝ p ( y | θ , γ , σ v ) /T p ( θ ) respectively. It isshown in Appendix A that the sampler using these as itsproposal distributions is a valid Gibbs sampler.For the noise variance, inserting the prior distribution givenin (12) and the likelihood function given in (14) into the definition of p ( σ v | y , θ , γ ; T ) yields p ( σ v | y , θ , γ ; T ) ∝ (cid:18) σ v (cid:19) N T exp (cid:18) − (cid:107) Y − diag ( Bγ ) X (cid:107) T σ v (cid:19) × (cid:18) σ v (cid:19) α v +1 exp (cid:18) − β v σ v (cid:19) = (cid:18) σ v (cid:19) N T + α v +1 exp (cid:18) − (cid:107) Y − diag ( Bγ ) X (cid:107) /T + β v σ v (cid:19) = (cid:18) σ v (cid:19) ˜ α v +1 exp (cid:18) − ˜ β v σ v (cid:19) , (48)where ˜ α v = α v + N/ T and ˜ β v = (cid:107) Y − diag ( Bγ ) X (cid:107) /T + β v . This is the sampling distribution given in (19) .The pulse coefficient vector γ is sampled from p ( γ | y , θ , σ v ; T ) , which is calculated by using the priorand likelihood expressions given in (11) and (14) respectively, p ( γ | y , θ , σ v ; T ) ∝ exp (cid:18) − (cid:107) Y − Dγ (cid:107) T σ v (cid:19) exp (cid:18) − γ T Σ − γ γ (cid:19) ∝ exp (cid:18) Re { Y H D } γ − γ T Re { D H D } γ T σ v − γ T Σ − γ γ (cid:19) = exp (cid:18) T σ v Re { Y H D } γ − γ T (cid:20) T σ v Re { D H D } + Σ − γ (cid:21) γ (cid:19) = exp (cid:18) ˜ µ Tγ ˜ Σ − γ γ − γ T ˜ Σ − γ γ −
12 ˜ µ Tγ ˜ Σ − γ ˜ µ γ (cid:19) × exp (cid:18)
12 ˜ µ Tγ ˜ Σ − γ ˜ µ γ (cid:19) ∝ exp (cid:18) −
12 ( γ − ˜ µ γ ) T ˜ Σ − γ ( γ − ˜ µ γ ) (cid:19) (49)where we define D = diag ( X ) B and use the fact that γ isreal valued to arrive at the second line. Here, Re {·} denotesthe real part of its argument. The mean vector ˜ µ γ and thecovariance matrix ˜ Σ γ have the following definitions ˜ µ γ = 2 T σ v ˜ Σ γ Re { D H Y } , ˜ Σ γ = (cid:18) T σ v Re { D H D } + Σ − γ (cid:19) − , (50)which completes the derivation of the sampling distributiongiven in (20) and (21). A PPENDIX DG RADIENT OF P OTENTIAL E NERGY F UNCTION
The expression given in (31) for the gradient of potentialfunction U ( θ ) is found by following the steps below ∇ θ U ( θ ) = ∇ θ (cid:20) σ vN/ − (cid:88) i =0 ( Y i − D i X ( ω i )) ∗ ( Y i − D i X ( ω i )) (cid:21) = ∇ θ (cid:20) σ vN/ − (cid:88) i =0 − Y ∗ i D i X ( ω i ) − Y i D ∗ i X ∗ ( ω i )+ X ∗ ( ω i ) D ∗ i D i X ( ω i ) (cid:21) = 2 σ vN/ − (cid:88) i =0 (cid:60) (cid:8)(cid:2) X ∗ ( ω i ) D ∗ i D i − Y ∗ i D i (cid:3) ∇ θ X ( ω i ) (cid:9) = 2 σ v (cid:60) (cid:8)(cid:2) X H D H D − Y H D (cid:3) ∇ θ X (cid:9) , (51)where we defined D = diag ( B γ ) with D i = [ D ] i,i and usedthe fact that (cid:60) (cid:8) ∇ θ X ∗ ( ω i ) (cid:9) = (cid:2) (cid:60) (cid:8) ∇ θ X ( ω i ) (cid:9)(cid:3) ∗ to arrive atthe last line.We also provide the expressions for partial derivatives in(33) by employing recursive derivations on the multilayer re-flection model given in (1). Starting with the relative permittiv-ities ε k for k = 1 , , . . . , M , we first note that ∂X ( ω ) /∂ε k = e − α d e − j β d ∂X ( ω ) /∂ε k and ∂X (cid:96) ( ω ) /∂ε k = 0 if k <(cid:96) − . For k ≥ (cid:96) − , we have the following recursive expression ∂X (cid:96) ( ω ) ∂ε k = C (cid:96) (cid:20)(cid:0) − X (cid:96) +1 ρ (cid:96) ψ (cid:96) (cid:1) ∂r (cid:96) ∂ε k + (cid:0) − r (cid:96) (cid:1) × (cid:18) ρ (cid:96) ψ (cid:96) ∂X (cid:96) +1 ∂ε k + X (cid:96) +1 ψ (cid:96) ∂ρ (cid:96) ∂ε k + X (cid:96) +1 ρ (cid:96) ∂ψ (cid:96) ∂ε k (cid:19)(cid:21) , (52)where we set C (cid:96) = (cid:0) r (cid:96) X (cid:96) +1 ρ (cid:96) ψ (cid:96) (cid:1) − , ρ (cid:96) = e − α (cid:96) d (cid:96) , ψ (cid:96) = e − j β (cid:96) d (cid:96) and represent X (cid:96) +1 ( ω ) as X (cid:96) +1 to ease notation.Here note that X M ( ω ) = r M and hence ∂X M ( ω ) /∂ε k = ∂r M /∂ε k . The partial derivatives ∂ρ (cid:96) /∂ε k and ∂ψ (cid:96) /∂ε k arenonzero only for k = (cid:96) and given as ∂ρ (cid:96) ∂ε (cid:96) = d (cid:96) ρ (cid:96) α (cid:96) ε (cid:96) ζ (cid:96) , ∂ψ (cid:96) ∂ε (cid:96) = − j d (cid:96) ψ (cid:96) β (cid:96) ε (cid:96) ζ (cid:96) . (53)The partial derivative ∂r (cid:96) /∂ε k is nonzero for both k = (cid:96) − and k = (cid:96) cases, which can be combined into the followingexpression ∂r (cid:96) ∂ε k = ( − (cid:96) − k +1 ε o µ o (1 − r (cid:96) ) η k . (54)We now consider the partial derivatives w.r.t. conductivityparameters σ k for k = 1 , , . . . , M . Similar to the pre-vious case, ∂X ( ω ) /∂σ k = e − α d e − j β d ∂X ( ω ) /∂σ k , ∂X (cid:96) ( ω ) /∂σ k = 0 for k < (cid:96) − , ∂X (cid:96) ( ω ) ∂σ k = C (cid:96) (cid:20)(cid:0) − X (cid:96) +1 ρ (cid:96) ψ (cid:96) (cid:1) ∂r (cid:96) ∂σ k + (cid:0) − r (cid:96) (cid:1) × (cid:18) ρ (cid:96) ψ (cid:96) ∂X (cid:96) +1 ∂σ k + X (cid:96) +1 ψ (cid:96) ∂ρ (cid:96) ∂σ k + X (cid:96) +1 ρ (cid:96) ∂ψ (cid:96) ∂σ k (cid:19)(cid:21) , (55) for k ≥ (cid:96) and ∂X M ( ω ) /∂σ k = ∂r M /∂σ k . The partialderivatives ∂r (cid:96) /∂σ k , ∂ρ (cid:96) /∂σ k and ∂ψ (cid:96) /∂σ k are now givenas ∂ρ (cid:96) ∂σ (cid:96) = − d (cid:96) ρ (cid:96) α (cid:96) ( ζ (cid:96) + 1) σ (cid:96) ζ (cid:96) , ∂ψ (cid:96) ∂σ (cid:96) = − j d (cid:96) ψ (cid:96) β (cid:96) ( ζ (cid:96) − σ (cid:96) ζ (cid:96) (56)for k = (cid:96) and ∂r (cid:96) ∂σ k = j ( − (cid:96) − k ωµ (1 − r (cid:96) ) η k (57)for (cid:96) − ≤ k ≤ (cid:96) .We close this section by providing the partial derivativesw.r.t. depth parameters d k for k = 0 , , . . . , M − . First,note that ∂X ( ω ) /∂d = − α + jβ ) X ( ω ) ρ ψ and ∂X (cid:96) ( ω ) /∂d k = 0 if k < (cid:96) . For the case where k = (cid:96) , wehave ∂X (cid:96) ( ω ) ∂d (cid:96) = − C (cid:96) (cid:0) − r (cid:96) (cid:1) ( α (cid:96) + jβ (cid:96) ) X (cid:96) +1 ρ (cid:96) ψ (cid:96) , (58)which is replaced by the following recursive expression when k > (cid:96) : ∂X (cid:96) ( ω ) ∂d k = C (cid:96) (cid:0) − r (cid:96) (cid:1) ρ (cid:96) ψ (cid:96) ∂X (cid:96) +1 ∂d k . (59)R EFERENCES[1] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,”
IEEE Transactionson Systems, Man, and Cybernetics, Part C (Applications and Reviews) ,vol. 40, no. 1, pp. 1–12, 2010.[2] S. Majumder, T. Mondal, and M. Deen, “Wearable sensors for remotehealth monitoring,”
Sensors , vol. 17, no. 12, p. 130, Jan 2012.[3] S. Patel, H. Park, P. Bonato, L. Chan, and M. Rodgers, “A review ofwearable sensors and systems with application in rehabilitation,”
Journalof NeuroEngineering and Rehabilitation , vol. 9, no. 21, 2012.[4] J. Gao, S. Baskar, D. Teng, M. al’Absi, S. Kumar, and E. Ertin, “A newdirection for biosensing: RF sensors for monitoring cardio-pulmonaryfunction,” in
Mobile Health , J. Rehg, S. Murphy, and S. Kumar, Eds.Springer, 2017, p. 289–312.[5] C. Hsien-Chin, R. Ch´avez-Santiago, I. Balasingham, and J. Bergsland,“Ultrawideband technology in medicine: A survey,”
Journal of Electricaland Computer Engineering , 2012.[6] R. Zetik, J. Sachs, and R. S. Thoma, “UWB short-range radar sensing -The architecture of a baseband, pseudo-noise UWB radar sensor,”
IEEEInstrumentation Measurement Magazine , vol. 10, no. 2, pp. 39–45, 2007.[7] E. M. Staderini, “UWB radars in medicine,”
IEEE Aerospace andElectronic Systems Magazine , vol. 17, no. 1, pp. 13–18, 2002.[8] G. Varotto and E. Staderini, “On the UWB medical radars workingprinciples,”
Int. J. Ultra Wideband Commun. Syst. , vol. 2, pp. 83–93,2011.[9] Federal Communications Commission, “Revision of part 15 of thecommission’s rules regarding ultra-wideband transmission systems,”
First report and order, ET Docket , vol. 98153, 2002.[10] T. E. McEwan, “Body monitoring and imaging apparatus and method,”United States Patent 5,573,012, Nov. 12, 1996.[11] ——, “Body monitoring and imaging apparatus and method,” UnitedStates Patent 5,766,208, Jun. 16, 1998.[12] G. Varotto and E. M. Staderini, “A 2D simple attenuation model forEM waves in human tissues: Comparison with a FDTD 3D simulatorfor UWB medical radar,” in , vol. 3, 2008, pp. 1–4.[13] G. Varotto and E. Staderini, “On the UWB medical radars workingprinciples,”
Int. J. Ultra Wideband Commun. Syst. , vol. 2, pp. 83–93,2011.[14] M. Cavagnaro, E. Pittella, and S. Pisa, “UWB pulse propagation intohuman tissues,”
Physics in Medicine and Biology , vol. 58, no. 24, pp.8689–8707, Nov 2013.[15] D. Dias and J. P. S. Cunha, “Wearable health devices—vital signmonitoring, systems and technologies,”
Sensors , vol. 18, no. 8, Aug2018. [16] J. Gao, E. Ertin, S. Kumar, and M. al’Absi, “Contactless sensing ofphysiological signals using wideband RF probes,” in , 2013, pp. 86–90.[17] J. Gao, “Wearable sensing of cardio-pulmonary function: Non-invasivesensor design and statistical approaches to signal compression andanalysis,” Ph.D. dissertation, The Ohio State University, 2018.[18] X. Li, E. Pancera, L. Zwirello, Huaming Wu, and T. Zwick, “Ultrawideband radar for water detection in the human body,” in GermanMicrowave Conference Digest of Papers , 2010, pp. 150–153.[19] E. Pancera, X. Li, L. Zwirello, and T. Zwick, “Performance of ultrawideband antennas for monitoring water accumulation in human bodies,”in
Proceedings of the Fourth European Conference on Antennas andPropagation , 2010, pp. 1–5.[20] L. Niestoruk, O. Perkuhn, and W. Stork, “The experimental resultsof tissue thickness estimation with UWB signals for the purpose ofdetecting water accumulations in the human body,” in , 2011, pp. 1–4.[21] M. O’Halloran, F. Morgan, D. Flores Tapia, D. Byrne, M. Glavin,and E. Jones, “Ultra wideband radar system for bladder monitoringapplications,”
Progress In Electromagnetics Research C , vol. 33, pp.17–28, 01 2012.[22] T. Lauteslager, N. Nicolaou, T. S. Lande, and T. Constandinou, “Func-tional neuroimaging using UWB impulse radar: A feasibility study,”in ,2015, pp. 1–4.[23] J. J. Kormylo and J. M. Mendel, “Maximum-Likelihood seismic decon-volution,”
IEEE Transactions on Geoscience and Remote Sensing , vol.GE-21, no. 1, pp. 72–82, 1983.[24] Q. Cheng, R. Chen, and T. Li, “Simultaneous wavelet estimation anddeconvolution of reflection seismic signals,”
IEEE Transactions onGeoscience and Remote Sensing , vol. 34, no. 2, pp. 377–384, 1996.[25] M. Ketata, M. Dhieb, G. Ben Hmida, H. Ghariani, and M. Lahiani,“UWB pulse propagation in human tissue: Comparison between Gaus-sian and square waves shape,” in , 2015, pp. 158–162.[26] T. Saarenketo and T. Scullion, “Road evaluation with ground penetratingradar,”
Journal of Applied Geophysics , vol. 43, no. 2, pp. 119 – 138,2000.[27] I. AL-Qadi and S. Lahouar, “Measuring layer thicknesses with GPR– Theory to practice,”
Construction and Building Materials , vol. 19,no. 10, pp. 763 – 772, 2005.[28] A. Loizos and C. Plati, “Accuracy of pavement thicknesses estimationusing different ground penetrating radar analysis approaches,”
NDT &E International , vol. 40, no. 2, pp. 147 – 157, 2007.[29] S. Lahouar and I. L. Al-Qadi, “Automatic detection of multiple pavementlayers from GPR data,”
NDT & E International , vol. 41, no. 2, pp. 69– 81, 2008.[30] M. Africano, J. O. Vargas, R. Adriano, D. B. Oliveira, and A. C. Lisboa,“Ground-penetrating radar antenna design for homogeneous and low-loss dielectric multilayer media,”
Journal of Microwaves, Optoelectron-ics and Electromagnetic Applications , vol. 19, pp. 137 – 151, 06 2020.[31] J. Lee, C. Nguyen, and T. Scullion, “A novel, compact, low-cost, impulseground-penetrating radar for nondestructive evaluation of pavements,”
IEEE Transactions on Instrumentation and Measurement , vol. 53, no. 6,pp. 1502–1509, 2004.[32] S. Caorsi and M. Stasolla, “A layer stripping approach for EM recon-struction of stratified media,”
IEEE Transactions on Geoscience andRemote Sensing , vol. 52, no. 9, pp. 5855–5869, 2014.[33] S. Caorsi and M. Stasolla, “Towards the detection of multiple reflectionsin time-domain EM inverse scattering of multi-layered media,”
Progressin Electromagnetics Research B , vol. 38, pp. 351–365, 2012.[34] U. Spagnolini, “Permittivity measurements of multilayered media withmonostatic pulse radar,”
IEEE Transactions on Geoscience and RemoteSensing , vol. 35, no. 2, pp. 454–463, 1997.[35] S. Lambot, E. C. Slob, I. van den Bosch, B. Stockbroeckx, B. Scheers,and M. Vanclooster, “Estimating soil electric properties from monostaticground-penetrating radar signal inversion in the frequency domain,”
Water Resources Research , vol. 40, no. 4, 2004.[36] S. Lambot, E. C. Slob, I. van den Bosch, B. Stockbroeckx, andM. Vanclooster, “Modeling of ground-penetrating radar for accuratecharacterization of subsurface electric properties,”
IEEE Transactionson Geoscience and Remote Sensing , vol. 42, no. 11, pp. 2555–2568,2004.[37] P. Debye,
Polar Molecules , New York: Reinhold, 1929. [38] S. Lambot and F. Andr´e, “Full-wave modeling of near-field radar data forplanar layered media reconstruction,”
IEEE Transactions on Geoscienceand Remote Sensing , vol. 52, no. 5, pp. 2295–2303, 2014.[39] W. C. Chew,
Waves and Fields in Inhomogeneous Media . New York:IEEE Press, 1995.[40] F. Hlawatsch,
Time-Frequency Analysis and Synthesis of Linear SignalSpaces: Time-Frequency Filters, Signal Detection and Estimation, andRange-Doppler Estimation . USA: Kluwer Academic Publishers, 1998.[41] A. Kone and D. A. Kofke, “Selection of temperature intervals forparallel-tempering simulations,”