Bayesian nonparametric regression using complex wavelets
Norbert Reményi (a) and Brani Vidakovic (b)
(a) Research Group, Sabre Holdings, Southlake, Texas
(b) School of Industrial and Systems Engineering, Georgia Institute of Technology
Abstract.
In this paper we propose a new adaptive wavelet denoising methodology using complex wavelets. The method is based on a fully Bayesian hierarchical model in the complex wavelet domain that uses a bivariate mixture prior on the wavelet coefficients. The heart of the procedure is computational, where the posterior mean is computed through Markov chain Monte Carlo (MCMC) simulations. We show that the method has good performance, as demonstrated by simulations on the well-known test functions and by comparison to a well-established complex wavelet-based denoising procedure. An application to a real-life data set is also considered.
In the present paper we consider a novel Bayesian model as a solution to the classical nonparametric regression problem

y_i = f(x_i) + ε_i,   i = 1, ..., n,   (1)

where x_i, i = 1, ..., n, are equispaced sampling points, and the errors ε_i are i.i.d. normal random variables with zero mean and variance σ². The interest is to estimate the function f using the observations y_i. After applying a linear and orthogonal wavelet transform, the equation in (1) becomes

d_{jk} = θ_{jk} + ε_{jk},   (2)

where d_{jk}, θ_{jk} and ε_{jk} are the wavelet coefficients (at resolution j and position k) corresponding to y, f and ε, respectively. Note that ε_i and ε_{jk} are equal in distribution due to the orthogonality of wavelet transforms. Due to the whitening property of wavelet transforms (Flandrin, 1992), many existing methods assume independence of the coefficients and omit the double indices jk to model a generic wavelet coefficient as

d = θ + ε,   ε ∼ N(0, σ²).   (3)

Keywords and phrases.
Bayesian estimation, Hierarchical Bayes model, Markov chain Monte Carlo, Kotz distribution, Wavelet shrinkage, Complex wavelets, Nonparametric regression.
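As a toy illustration of the transform step in (1)–(3) (a sketch in Python rather than the authors' MATLAB; the Haar-matrix construction and the toy signal are our own), an orthonormal wavelet matrix W satisfies W W' = I, which is why i.i.d. normal noise stays i.i.d. normal in the wavelet domain:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix for n = 2^J (illustrative recursive construction)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                 # scaling (average) rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # wavelet (difference) rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
n = 8
W = haar_matrix(n)
assert np.allclose(W @ W.T, np.eye(n))           # orthogonality: W W' = I

x = np.linspace(0, 1, n, endpoint=False)
f = np.sin(2 * np.pi * x)                        # toy signal
sigma = 0.1
y = f + sigma * rng.standard_normal(n)           # model (1): y = f + noise
d = W @ y                                        # model (2): d = theta + eps
theta = W @ f
eps = d - theta                                  # transformed noise, still N(0, sigma^2 I)
```

Because W is orthogonal, eps = W(y − f) has the same N(0, σ²I) distribution as the original errors, matching the equal-in-distribution remark above.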
When indices are needed for clarity of exposition they will be used. To estimate θ in model (3), Bayesian shrinkage rules have been proposed in the literature by many authors. By a shrinkage rule the observed wavelet coefficients d are replaced with their shrunk version θ̂ = δ(d). Then f is estimated as the inverse wavelet transform of θ̂. Empirical distributions of detail wavelet coefficients for signals encountered in practical applications are (at each resolution level) centered around and peaked at zero (Mallat, 1989). A range of models for which the unconditional distribution of wavelet coefficients mimics these properties has been considered in the literature. The traditional Bayesian models consider a prior distribution on the wavelet coefficient θ of the form

π(θ) = ε δ₀ + (1 − ε) ξ(θ),   (4)

where δ₀ is a point mass at zero, ξ is a symmetric and unimodal distribution, and ε is a fixed parameter in [0, 1], usually level dependent, that controls the extent of shrinkage for values of d close to 0. This type of model was considered by Abramovich et al. (1998), Vidakovic (1998), Vidakovic and Ruggeri (2001) and Johnstone and Silverman (2005), among others. A recent overview of Bayesian strategies in wavelet shrinkage can be found in Reményi and Vidakovic (2012).

Wavelet shrinkage methods using complex-valued wavelets provide additional insight into the shrinkage process due to the information contained in the phase. After taking a complex wavelet transform of a real-valued signal, the model remains as in (2); however, the observed wavelet coefficients d_{jk} become complex numbers at resolution j and location k. Several papers considering Bayesian wavelet shrinkage with complex wavelets are available.
Lina and Mayrand (1995) describe the complex-valued Daubechies wavelets in detail. Lina and Macgibbon (1997), Lina (1997), and Lina et al. (1999) focus on image denoising, in which the phase of the observed wavelet coefficients is preserved, but the modulus of the coefficients is shrunk by the Bayes rule. Barber and Nason (2004) develop the complex empirical Bayes (CEB) procedure, which modifies both the phase and modulus of wavelet coefficients by a bivariate shrinkage rule. The wavelet coefficients are represented as bivariate real-valued random variables and the authors formulate a bivariate model in the complex wavelet domain. The resulting estimators are in closed form and the hyperparameters of the model are estimated by the empirical Bayes procedure.

We build on the results above and formulate a fully Bayesian hierarchical model which accounts for the uncertainty of the prior parameters by placing hyperpriors on them. Since a closed-form solution for the Bayes estimator does not exist, the Markov chain Monte Carlo (MCMC) methodology is applied and an approximate estimator (posterior mean) is computed from the output of simulational runs. Although the simplicity of a closed-form solution is lost, the procedure is fully Bayesian, adaptive to the underlying signal, and the estimation of the hyperparameters is automatic via the MCMC sampling algorithm. The estimation is governed by the data and hyperprior distributions on the parameters, and this adaptivity ensures good denoising performance.

The paper is organized as follows. Section 2 formalizes the model and presents some results related to it. Section 3 details the MCMC sampling scheme. Section 4 discusses the selection of hyperparameters and presents simulation results and comparisons to existing methods. In Section 5 we apply the proposed shrinkage to a real data set, and in Section 6 conclusions are provided.
In this section we describe a novel, fully Bayesian hierarchical model in the complex wavelet domain. After applying the complex wavelet transform to a real-valued signal, the observed wavelet coefficients d_{jk} at resolution j and location k become complex numbers. Building on the approach taken by Barber and Nason (2004), we represent the complex-valued wavelet coefficients as bivariate real-valued random variables. We consider the following Bayesian model:

d_{jk} | θ_{jk}, σ² ∼ N(θ_{jk}, σ²Σ_j)
θ_{jk} | ε_j, C_j ∼ (1 − ε_j) δ₀ + ε_j EP(μ, C_j, β),   (5)

where EP stands for the bivariate exponential power distribution. The multivariate exponential power distribution is an extension of the class of normal distributions in which the heaviness of the tails can be controlled. Its definition and properties can be found in Gomez et al. (1998). The prior on the location θ_{jk} is a bivariate extension of the standard mixture prior in the Bayesian wavelet shrinkage literature, consisting of a point mass at zero and a heavy-tailed distribution. As a prior, Barber and Nason (2004) considered a mixture of a point mass and a bivariate normal distribution. The heavy-tailed mixture prior in our proposal better captures the sparsity of wavelet coefficients common in most applications; however, a closed-form solution is lost, and we need to rely on MCMC simulations to compute estimators.

To calibrate the exponential power prior in (5) for wavelet shrinkage, we use μ = 0, because the detail wavelet coefficients are centered around zero by their definition. We also fix β = 1/2, which gives the prior on θ_{jk} the following form:

π(θ_{jk} | C_j) = 1/(8π|C_j|^{1/2}) · exp{ −(1/2) (θ'_{jk} C_j^{−1} θ_{jk})^{1/2} }.   (6)

The prior specified above is equivalent to the bivariate double exponential distribution. The univariate double exponential prior was found to have desirable properties in the real-valued wavelet denoising context (Vidakovic and Ruggeri, 2001; Johnstone and Silverman, 2005), hence it is natural to extend it to the bivariate case.

From model (5) it is apparent that the mixture prior on θ_{jk} is set depending on the dyadic level j, which ensures scale-adaptivity of the method. The quantity σ²Σ_j represents the scaled covariance matrix of the noise at each decomposition level, and C_j represents the levelwise scale matrix in the exponential power prior. Explicit expressions for the covariance Σ_j induced by white noise in complex wavelet shrinkage were derived in Barber and Nason (2004). We adopt the approach described in their paper to model the covariance structure of the noise and compute Σ_j for each dyadic level j from the expressions

Cov{Re(ε), Im(ε)} = −σ² Im(W W^T)/2,
Cov{Re(ε), Re(ε)} = σ² {I_n + Re(W W^T)}/2,   (7)
Cov{Im(ε), Im(ε)} = σ² {I_n − Re(W W^T)}/2.

We assume a common i.i.d. normal noise model e ∼ N_n(0, σ²I_n), as in (1). After taking the complex wavelet transform, the real and imaginary parts of the transformed noise ε = W e become correlated, which is expressed by (7) above. Note that W is a complex-valued unitary matrix representing the wavelet transform, therefore W̄^T W = W W̄^T = I, where W̄ denotes the complex conjugate of W.

Instead of estimating the hyperparameters σ², ε_j, and C_j in (5) in an empirical Bayes fashion, we elicit hyperprior distributions on them in a fully Bayesian manner. We specify a conjugate inverse gamma prior on the noise variance σ², and an inverse Wishart prior on the matrix C_j describing the covariance structure of the spread prior of θ_{jk}. The mixing weight ε_j regulates the strength of shrinkage of a wavelet coefficient to zero. We specify a "noninformative" uniform prior on this parameter, allowing the estimation to be governed mostly by the data.

For computational purposes, we represent the exponential power prior in (6) as a scale mixture of multivariate normal distributions, which is an essential step for efficient Monte Carlo simulation. From Gomez et al. (2008), the bivariate exponential power distribution with μ = 0 and β = 1/2 can be written as

EP(μ = 0, C_j, β = 1/2) = ∫₀^∞ N(0, vC_j) · v^{1/2} e^{−v/8} / (Γ(3/2) 8^{3/2}) dv,

which is a scale mixture of bivariate normal distributions with a gamma mixing distribution. Using the specified hyperpriors and the mixture representation, the basic model in (5) extends to

d_{jk} | θ_{jk}, σ² ∼ N(θ_{jk}, σ²Σ_j)
σ² ∼ IG(a, b)
θ_{jk} | z_{jk}, v_{jk}, C_j ∼ (1 − z_{jk}) δ₀ + z_{jk} N(0, v_{jk}C_j)
z_{jk} | ε_j ∼ Ber(ε_j)   (8)
ε_j ∼ U(0, 1)
v_{jk} ∼ Ga(3/2, 8)
C_j ∼ IW(A_j, w).

For computational purposes, we introduced a latent variable z_{jk}. The variable z_{jk} is a Bernoulli variable indicating whether the parameter θ_{jk} comes from a point mass at zero (z_{jk} = 0) or from a bivariate normal distribution (z_{jk} = 1), with prior probability 1 − ε_j or ε_j, respectively. The uniform U(0, 1) prior on ε_j is equivalent to a beta Be(1, 1) distribution, which is a conjugate prior for the Bernoulli distribution. Integrating out z_{jk} from model (8) gives back the original mixture expression in model (5).

By representing the exponential power prior as a scale mixture of normals, the hierarchical model in (8) becomes tractable, because the full conditional distributions of all the parameters become explicit. Therefore, we can develop a fast Gibbs sampling algorithm to update all the necessary parameters σ², z_{jk}, ε_j, θ_{jk}, v_{jk} and C_j. Note that in order to run the Gibbs sampling algorithm we only have to specify the hyperparameters a, b, A_j and w. The details of the Gibbs sampling scheme will be discussed in the next section.
To conduct posterior inference on the wavelet coefficients θ_{jk}, a standard Gibbs sampling procedure is adopted. In this section we provide details of how to develop a Gibbs sampler for the model in (8). Gibbs sampling is an iterative algorithm that simulates from a joint posterior distribution through iterative simulation over the list of full conditional distributions. For more details on Gibbs sampling, see Casella and George (1992) or Robert and Casella (1999). For the model in (8), the full conditionals for the parameters σ², z_{jk}, ε_j, θ_{jk}, v_{jk} and C_j can be specified exactly. Specification of the hyperparameters a, b, A_j and w will be discussed in Section 4. Technical details of this section are deferred to the Appendix.

Updating σ². Using a conjugate IG(a, b) prior on σ² results in a full conditional which is inverse gamma. Therefore, update σ² as

(σ²)^{(i)} ∼ IG( a + n, [ 1/b + (1/2) Σ_{jk} (d_{jk} − θ^{(i−1)}_{jk})' Σ_j^{−1} (d_{jk} − θ^{(i−1)}_{jk}) ]^{−1} ),   (9)

where n = 2^J − 2^{J₀} denotes the number of bivariate detail coefficients, and i denotes the i-th simulation run.

Updating z_{jk} and ε_j. In model (8) the latent variable z_{jk} has a Bernoulli prior with parameter ε_j. Its full conditional remains Bernoulli and is updated as follows:

z^{(i)}_{jk} = 0 with probability (1 − ε^{(i−1)}_j) f(d_{jk} | 0, (σ²)^{(i)}) / D_{jk},
z^{(i)}_{jk} = 1 with probability ε^{(i−1)}_j m(d_{jk} | (σ²)^{(i)}, v^{(i−1)}_{jk}, C^{(i−1)}_j) / D_{jk},   (10)

where D_{jk} = (1 − ε^{(i−1)}_j) f(d_{jk} | 0, (σ²)^{(i)}) + ε^{(i−1)}_j m(d_{jk} | (σ²)^{(i)}, v^{(i−1)}_{jk}, C^{(i−1)}_j), and

f(d_{jk} | 0, σ²) = 1/(2π|σ²Σ_j|^{1/2}) · exp{ −(1/(2σ²)) d'_{jk} Σ_j^{−1} d_{jk} },
m(d_{jk} | σ², v_{jk}, C_j) = 1/(2π|σ²Σ_j + v_{jk}C_j|^{1/2}) · exp{ −(1/2) d'_{jk} (σ²Σ_j + v_{jk}C_j)^{−1} d_{jk} }.

The parameter ε_j is given a conjugate Be(1, 1) prior. This results in a full conditional distributed as beta. Therefore we update ε_j as

ε^{(i)}_j ∼ Be( 1 + Σ_k z^{(i)}_{jk}, 1 + Σ_k (1 − z^{(i)}_{jk}) ).   (11)

Note that other choices from the Be(α, β) family are possible as the prior for ε_j. However, we used the noninformative choice of α = 1 and β = 1 to facilitate data-driven estimation of ε_j.

Updating θ_{jk}. From the conjugate setup of model (8) and using the latent variable z_{jk}, it follows that the full conditional distribution of θ_{jk} is either a point mass at zero (z_{jk} = 0) or a bivariate normal distribution (z_{jk} = 1). Therefore we update θ_{jk} as follows:

θ^{(i)}_{jk} ∼ δ₀(θ_{jk}), if z^{(i)}_{jk} = 0,
θ^{(i)}_{jk} ∼ f(θ_{jk} | d_{jk}, (σ²)^{(i)}, v^{(i−1)}_{jk}, C^{(i−1)}_j), if z^{(i)}_{jk} = 1,   (12)

where

f(θ_{jk} | d_{jk}, σ², v_{jk}, C_j) = 1/(2π|Σ̃_{jk}|^{1/2}) · exp{ −(1/2) (θ_{jk} − μ̃_{jk})' Σ̃^{−1}_{jk} (θ_{jk} − μ̃_{jk}) },
μ̃_{jk} = Σ̃_{jk} Σ_j^{−1} d_{jk} / σ²,   Σ̃_{jk} = ( Σ_j^{−1}/σ² + C_j^{−1}/v_{jk} )^{−1}.

Updating v_{jk}. In model (8), for the scale mixture of normals representation, we placed a gamma prior on v_{jk}. The full conditional distribution of v_{jk} depends on the value of z_{jk}, and the updating scheme is:

v^{(i)}_{jk} ∼ Ga(3/2, 8), if z^{(i)}_{jk} = 0,
v^{(i)}_{jk} ∼ GIG( 1/4, θ^{(i)'}_{jk} {C^{(i−1)}_j}^{−1} θ^{(i)}_{jk}, 1/2 ), if z^{(i)}_{jk} = 1.   (13)

Here GIG(a, b, p) denotes the generalized inverse Gaussian distribution (Johnson et al., 1994, p. 284) with probability density function

f(x | a, b, p) = (a/b)^{p/2} / (2 K_p(√(ab))) · x^{p−1} e^{−(ax + b/x)/2},   x > 0, a, b > 0,

where K_p denotes the modified Bessel function of the third kind. Simulation of GIG random variates is available through a MATLAB implementation based on Dagpunar (1989).

Updating C_j. Placing a conjugate inverse Wishart prior on the covariance matrix C_j results in a full conditional distribution which remains inverse Wishart. Therefore, C_j is updated as:

C^{(i)}_j ∼ IW( A_j + Σ_k z^{(i)}_{jk} θ^{(i)}_{jk} θ^{(i)'}_{jk} / v^{(i)}_{jk}, w + Σ_k z^{(i)}_{jk} ).   (14)
The implementation of the described Gibbs sampler requires simulation routines for standard distributions such as the inverse gamma, Bernoulli, beta, and normal, and also a specialized routine to simulate from the generalized inverse Gaussian. The procedure was implemented in MATLAB and is available from the authors. The Gibbs sampling procedure can be summarized as:

(i) Choose initial values for the parameters.
(ii) Repeat steps (iii)-(viii) for l = 1, ..., M.
(iii) Update σ².
(iv) Update z_{jk} for j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1.
(v) Update ε_j for j = J₀, ..., log₂(n) − 1.
(vi) Update θ_{jk} for j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1.
(vii) Update v_{jk} for j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1.
(viii) Update C_j for j = J₀, ..., log₂(n) − 1.

We call the described procedure the Complex Gibbs Sampling Wavelet Smoother (CGSWS). In the following section we explain the specification of hyperparameters a, b, A_j and w, and apply the CGSWS algorithm to denoise simulated test functions.
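A compact sketch of steps (i)-(viii) for a single resolution level, written in Python rather than the authors' MATLAB (the toy data, the identity choice for Σ_j, and all variable names are illustrative assumptions; the GIG draw uses scipy's `geninvgauss`, whose parameters map to the paper's GIG(a, b, p) via concentration √(ab) and scale √(b/a)):

```python
import numpy as np
from scipy.stats import invgamma, invwishart, geninvgauss, multivariate_normal

rng = np.random.default_rng(0)

# toy bivariate "wavelet coefficients" at one level; Sigma_j = I for brevity
K = 64
d = 0.3 * rng.standard_normal((K, 2))
d[:8] += 2.0 * rng.standard_normal((8, 2))    # a few "signal" coefficients
a, b, A, w = 2.0, 1.0, np.eye(2), 10          # hyperparameters (illustrative)

sigma2, eps, C = 0.1, 0.5, np.eye(2)          # (i) initial values
theta, z, v = np.zeros((K, 2)), np.zeros(K, dtype=int), np.ones(K)

for _ in range(200):                          # (ii) Gibbs iterations
    # (iii) sigma^2: inverse gamma full conditional, as in eq. (9)
    quad = ((d - theta) ** 2).sum()           # quadratic forms with Sigma_j = I
    sigma2 = invgamma.rvs(a + K, scale=1.0 / (1.0 / b + 0.5 * quad), random_state=rng)
    # (iv) z: Bernoulli, odds from null density f and marginal m, as in eq. (10)
    f0 = multivariate_normal.pdf(d, mean=[0, 0], cov=sigma2 * np.eye(2))
    m1 = np.array([multivariate_normal.pdf(d[k], mean=[0, 0],
                                           cov=sigma2 * np.eye(2) + v[k] * C)
                   for k in range(K)])
    p1 = eps * m1 / ((1 - eps) * f0 + eps * m1)
    z = (rng.uniform(size=K) < p1).astype(int)
    # (v) eps: beta full conditional, as in eq. (11)
    eps = rng.beta(1 + z.sum(), 1 + K - z.sum())
    # (vi) theta: point mass at zero or bivariate normal, as in eq. (12)
    theta = np.zeros((K, 2))
    for k in np.flatnonzero(z):
        Sig = np.linalg.inv(np.eye(2) / sigma2 + np.linalg.inv(v[k] * C))
        theta[k] = rng.multivariate_normal(Sig @ d[k] / sigma2, Sig)
    # (vii) v: gamma or generalized inverse Gaussian, as in eq. (13)
    for k in range(K):
        if z[k] == 0:
            v[k] = rng.gamma(1.5, 8.0)        # Ga(3/2, scale 8)
        else:
            q = theta[k] @ np.linalg.inv(C) @ theta[k]
            # GIG(1/4, q, 1/2): geninvgauss(p, sqrt(a*b), scale=sqrt(b/a))
            v[k] = geninvgauss.rvs(0.5, np.sqrt(0.25 * q),
                                   scale=np.sqrt(q / 0.25), random_state=rng)
    # (viii) C: inverse Wishart full conditional, as in eq. (14)
    S = A + sum(z[k] * np.outer(theta[k], theta[k]) / v[k] for k in range(K))
    C = invwishart.rvs(df=w + z.sum(), scale=S, random_state=rng)
```

Averaging the `theta` draws after a burn-in period gives the posterior-mean estimate of the coefficients, which is then inverted back to the signal domain.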
In this section we discuss the performance of the proposed CGSWS estimator and compare it to established methods from the literature considering complex Bayesian wavelet denoising. Within each replication of the simulations we performed 10,000 Gibbs sampling iterations, of which the first 5,000 were burn-in. We used the sample average θ̂_{jk} = Σ_i θ^{(i)}_{jk}/N as the usual estimator for the posterior mean. In our set-up N = 5,000.

In any Bayesian modeling task the selection of hyperparameters is critical for good performance of the model. It is also desirable to have a default way of selecting the hyperparameters which makes the shrinkage procedure automatic. In order to apply the CGSWS method we need to specify the hyperparameters a, b, A_j and w in the hyperprior distributions. This selection is governed by the data and hyperprior distributions. The advantage of the fully Bayesian approach is that once the hyperpriors are set, the estimation of the parameters σ², ε_j, θ_{jk}, v_{jk} and C_j is automatic via the Gibbs sampling scheme. Another advantage is that the method is relatively robust to the choice of hyperparameters, since they enter the model at a higher level of the hierarchy.

Parameters a and b. For simplicity we set a = 2 and b = 1/σ̂², where

σ̂² = ( MAD(d^{re}_{jk}) / 0.6745 )² + ( MAD(d^{im}_{jk}) / 0.6745 )²,   j = log₂(n) − 1.

This ensures that the mean of the inverse gamma prior on σ² is the standard robust estimator of the noise variation (Donoho and Johnstone, 1994). Here MAD stands for the median absolute deviation of the wavelet coefficients, which we calculate at the finest level of detail from both the real and imaginary parts of the wavelet coefficients (Barber and Nason, 2004).

Parameters A_j and w. The hyperparameters A_j and w play an important role in the prior on the covariance matrix C_j. Since in the Gibbs sampler the updates Σ_k z^{(i)}_{jk}, and therefore Σ_k z^{(i)}_{jk} θ^{(i)}_{jk} θ^{(i)'}_{jk} / v^{(i)}_{jk}, can possibly be zero, a noninformative Jeffreys prior on C_j is not computationally feasible. Also note that the mean of the inverse Wishart prior is A_j/(w − p − 1), where p is the dimension of A_j, which is equal to 2 in our case. Therefore we set

A_j = (w − p − 1) Ĉ_j,

which forces the mean of the prior to be a pre-specified estimate of C_j. In the case of the mixture bivariate double exponential prior, the covariance of the signal part is Cov(θ_{jk}) = 12 ε_j C_j, where 12 C_j is the covariance of a bivariate double exponential random variable (Gomez et al., 1998). Since the model assumes independence of the signal and error parts, we have that Cov(d_{jk}) = 12 ε_j C_j + σ²Σ_j, where Cov(d_{jk}) is the covariance of the observations d_{jk} at the j-th dyadic level. We choose ε_j = 1/√12 as a reasonable estimate, which additionally simplifies the equation in hand. Therefore a reasonable estimator for C_j is

Ĉ_j = [ Cov(d_j) − σ̂²Σ_j ] / √12,   J₀ ≤ j ≤ log₂(n) − 1,   (15)

where Cov(d_j) is the sample covariance estimator using the observations d_{jk} at the j-th dyadic level. Note that Σ_j is known, and σ̂² is the usual robust estimator of the variance of the wavelet coefficients introduced before. Also note that when Ĉ_j is not positive definite, we regularize it by adding a multiple of the identity matrix.

Finally, we set w = 10. Note that w = 4 is the least informative choice in our case; however, we found that slightly higher values of w worked better in practice.
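The two default recipes above can be sketched as follows (our own Python sketch; the toy coefficients, the identity Σ_j, and the jitter level are illustrative assumptions, and the √12 factor in the Ĉ_j estimate follows the reconstruction of eq. (15)):

```python
import numpy as np

def mad_sigma2(d_re, d_im):
    """Robust noise-variance estimate from finest-level real/imaginary parts:
    sigma2_hat = (MAD(re)/0.6745)^2 + (MAD(im)/0.6745)^2."""
    mad = lambda x: np.median(np.abs(x - np.median(x)))
    return (mad(d_re) / 0.6745) ** 2 + (mad(d_im) / 0.6745) ** 2

def estimate_Cj(d_level, sigma2, Sigma_j, jitter=1e-6):
    """Level-wise scale-matrix estimate, eq. (15): (Cov(d_j) - sigma2*Sigma_j)/sqrt(12),
    regularized by adding a multiple of the identity if not positive definite."""
    C = (np.cov(d_level, rowvar=False) - sigma2 * Sigma_j) / np.sqrt(12)
    eigmin = np.linalg.eigvalsh(C).min()
    if eigmin <= 0:
        C += (jitter - eigmin) * np.eye(C.shape[0])
    return C

rng = np.random.default_rng(2)
d = rng.standard_normal((128, 2))            # toy finest-level coefficients
s2 = mad_sigma2(d[:, 0], d[:, 1])
C_hat = estimate_Cj(d, s2, np.eye(2))
assert np.all(np.linalg.eigvalsh(C_hat) > 0)  # regularization guarantees PD
```

With pure-noise input, as here, the raw estimate is typically negative definite and the regularization branch is exercised; with signal present, Cov(d_j) dominates σ̂²Σ_j and the raw estimate is usually kept.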
In the following we present results of a simulation study in which we compare the denoising performance of the CGSWS method to two complex wavelet-based denoising methods introduced by Barber and Nason (2004). The first one (CMWS-Hard) is a phase-preserving estimator based on hard thresholding of a "thresholding statistic" d'_{jk} Σ_j^{−1} d_{jk}. The second one (CEB-Posterior mean) is a bivariate posterior mean estimator based on an empirical Bayes procedure.

Four standard test functions (Blocks, Bumps, Doppler, Heavisine) were considered (Donoho and Johnstone, 1994) in the simulations. The functions were rescaled so that the added noise produced a preassigned signal-to-noise ratio (SNR), as standardly done. The test functions were simulated at n = 256, 512, and 1024 equally spaced points in the interval [0, 1]. The coarsest decomposition level was J₀ = 3, which matches J₀ = ⌊log₂(log(n)) + 1⌋ suggested by Antoniadis et al. (2001).

Reconstruction of the theoretical signal was measured by the average mean squared error (AMSE), calculated as

AMSE = (1/(Mn)) Σ_{k=1}^{M} Σ_{i=1}^{n} ( f̂_k(t_i) − f(t_i) )²,

where M is the number of simulation runs and f(t_i), i = 1, ..., n, are the known values of the test functions considered. We denote by f̂_k(t_i), i = 1, ..., n, the estimator from the k-th simulation run. Note that in each of these simulation runs we perform 10,000 Gibbs iterations to get the estimators θ̂_{jk}. We set M = 100.

The results are summarized in Table 1, where boldface numbers indicate the smallest AMSE result for each test scenario. The results convey that the proposed CGSWS method outperforms both estimators for the majority of the test scenarios, and in most other cases it is very close in performance to the superior method. The improvement is most pronounced at small sample sizes (n = 256) and for the test functions Bumps and Heavisine. This result confirms the adaptiveness of the method and the advantage of using a heavy-tailed prior as the prior distribution for the location of wavelet coefficients. Note, however, that the computational cost of the algorithm is higher than for the competitors. The CEB-Posterior mean method can be a good compromise in terms of performance and computational efficiency.

Table 1
AMSE of the CGSWS method compared to estimators CMWS-Hard and CEB-Posterior mean.

Signal      n     Method   SNR=3   SNR=5   SNR=7   SNR=10
Blocks      256   CGSWS
                  CMWS-H   0.4929  0.5476  0.5490  0.5021
                  CEB-PM   0.4343  0.4675  0.4715  0.4547
            512   CGSWS
                  CMWS-H   0.3481  0.3627  0.3457  0.3166
                  CEB-PM   0.3028  0.3202
            1024  CGSWS
                  CMWS-H
                  CEB-PM
Doppler     256   CGSWS
                  CMWS-H   0.3332  0.3351  0.3644  0.4000
                  CEB-PM   0.3137  0.3158  0.3351  0.3723
            512   CGSWS
                  CMWS-H   0.2048  0.2217  0.2192  0.2289
                  CEB-PM
            1024  CGSWS
                  CMWS-H
                  CEB-PM   0.1087  0.1225
Bumps       256   CGSWS
                  CMWS-H   0.5972  0.5946  0.5853  0.5809
                  CEB-PM   0.4855  0.4996  0.5120  0.5390
            512   CGSWS
                  CMWS-H
                  CEB-PM   0.3295  0.3315  0.3287  0.3228
            1024  CGSWS    0.1965
                  CMWS-H
                  CEB-PM
Heavisine   256   CGSWS
                  CMWS-H   0.1547  0.2075  0.2144  0.2198
                  CEB-PM   0.1338  0.1838  0.2098  0.2188
            512   CGSWS
                  CMWS-H
                  CEB-PM   0.0881  0.1167  0.1340  0.1427
            1024  CGSWS
                  CMWS-H
                  CEB-PM
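The AMSE criterion used in Table 1 reduces to a simple mean over runs and sample points; a minimal sketch (function and toy data are our own, purely for illustration):

```python
import numpy as np

def amse(f_hat_runs, f_true):
    """Average mean squared error over M simulation runs:
    AMSE = (1/(M*n)) * sum_k sum_i (f_hat_k(t_i) - f(t_i))**2."""
    f_hat_runs = np.asarray(f_hat_runs)   # shape (M, n): one row per run
    return np.mean((f_hat_runs - f_true) ** 2)

# toy check: two "runs" each off by a constant 0.1 give AMSE = 0.01
f_true = np.zeros(4)
runs = np.full((2, 4), 0.1)
result = amse(runs, f_true)               # 0.01
```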
For illustration we apply the described CGSWS method to a real-life data set from anesthesiology collected by inductance plethysmography. The recordings were made by the Department of Anaesthesia at the Bristol Royal Infirmary and represent a measure of the flow of air during breathing. The data set was analyzed by several authors, for example Nason (1996) and Abramovich et al. (1998, 2002), where more information about the data can be found.

Figure 1 shows a section of the plethysmograph recording lasting approximately 80 s (n = 4096 observations), while Figure 2 shows the reconstruction of the signal with the CGSWS method. In the reconstruction process we applied N = 10,000 iterations of the Gibbs sampler, of which the first 5,000 were burn-in. The aim of smoothing was to preserve features such as peak heights while eliminating spurious high-frequency variation. The result provided by the proposed method satisfies these requirements, providing a very smooth result. Abramovich et al. (2002) report the height of the first peak while analyzing this data set. In our case the height is 0.8342, which is quite close to the result 0.8433 obtained by Abramovich et al. (2002), and compares favorably with the results obtained by the other established methods analyzed in their paper.
Figure 1. A section of inductance plethysmography data with n = 4096.

Figure 2. Reconstruction of the inductance plethysmography data by CGSWS.

In this paper we proposed the Complex Gibbs Sampling Wavelet Smoother (CGSWS), a complex wavelet-based method for nonparametric regression. A fully Bayesian approach was taken, in which a hierarchical model was formulated that accounts for the uncertainty of the prior parameters by placing hyperpriors on them. A mixture prior was specified on the complex wavelet coefficients, with a bivariate double exponential spread distribution to account for the large wavelet coefficients. Since all the full conditional distributions were available in an explicit distributional form, an efficient Gibbs sampling estimation procedure was proposed. The CGSWS method provided excellent denoising performance, which was demonstrated by simulations on well-known test functions and by comparison to a well-established wavelet denoising method that uses complex wavelets. The methodology was also illustrated on a real-life data set from inductance plethysmography. There the proposed method performed well in both smoothing and preserving the important features of the phenomenon.
In this Appendix we provide some results used for setting up the Gibbs sampling algorithm for model (8). To derive the full conditional distribution for a parameter of interest, we start with the joint distribution of all the parameters and collect the terms which contain the desired parameter. Denote

d = { d_{jk} : j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1 },
θ = { θ_{jk} : j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1 },

where d_{jk} and θ_{jk} are bivariate components. Similarly, denote

z = { z_{jk} : j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1 },
ε = { ε_j : j = J₀, ..., log₂(n) − 1 },
v = { v_{jk} : j = J₀, ..., log₂(n) − 1, k = 0, ..., 2^j − 1 },

and C = { C_j : j = J₀, ..., log₂(n) − 1 }, the vector containing the matrices C_j for resolution levels j. The joint distribution of the data and parameters is

f(d, θ, z, ε, v, σ², C) = ∏_{j,k} [ 1/(2π|σ²Σ_j|^{1/2}) exp{ −(1/(2σ²)) (d_{jk} − θ_{jk})' Σ_j^{−1} (d_{jk} − θ_{jk}) } ]
· 1/(Γ(a) b^a) (σ²)^{−a−1} e^{−1/(σ²b)}
· ∏_{j,k} [ (1 − z_{jk}) δ₀ + z_{jk} · 1/(2π|v_{jk}C_j|^{1/2}) exp{ −(1/(2v_{jk})) θ'_{jk} C_j^{−1} θ_{jk} } ]
· ∏_{j,k} ε_j^{z_{jk}} (1 − ε_j)^{1 − z_{jk}} · ∏_j 1{0 ≤ ε_j ≤ 1}
· ∏_{j,k} 1/(Γ(3/2) 8^{3/2}) v_{jk}^{1/2} e^{−v_{jk}/8}
· ∏_j |C_j|^{−(w+p+1)/2} exp{ −(1/2) tr(A_j C_j^{−1}) },

where p = 2 is the dimension of the bivariate coefficients.

From the joint distribution, the full conditional distribution of σ² is

p(σ² | θ, d) ∝ (1/σ²)^n exp{ −(1/(2σ²)) Σ_{j,k} (d_{jk} − θ_{jk})' Σ_j^{−1} (d_{jk} − θ_{jk}) } (σ²)^{−a−1} e^{−1/(σ²b)}
= (σ²)^{−a−n−1} exp{ −(1/σ²) [ 1/b + (1/2) Σ_{j,k} (d_{jk} − θ_{jk})' Σ_j^{−1} (d_{jk} − θ_{jk}) ] }
= IG( a + n, [ 1/b + (1/2) Σ_{j,k} (d_{jk} − θ_{jk})' Σ_j^{−1} (d_{jk} − θ_{jk}) ]^{−1} ).

The conditional distribution of z_{jk} remains Bernoulli with posterior success probability

P(z_{jk} = 1 | d_{jk}, σ², ε_j, v_{jk}, C_j) = ε_j m(d_{jk} | σ², v_{jk}, C_j) / [ (1 − ε_j) f(d_{jk} | 0, σ²) + ε_j m(d_{jk} | σ², v_{jk}, C_j) ],

where

f(d_{jk} | 0, σ²) = 1/(2π|σ²Σ_j|^{1/2}) exp{ −(1/(2σ²)) d'_{jk} Σ_j^{−1} d_{jk} },
m(d_{jk} | σ², v_{jk}, C_j) = 1/(2π|σ²Σ_j + v_{jk}C_j|^{1/2}) exp{ −(1/2) d'_{jk} (σ²Σ_j + v_{jk}C_j)^{−1} d_{jk} }.

The marginal distribution m(d_{jk} | σ², v_{jk}, C_j) is a bivariate normal distribution with zero mean and covariance matrix σ²Σ_j + v_{jk}C_j. This follows from the conjugate multivariate normal-normal structure (Lindley and Smith, 1972).

The full conditional distribution of ε_j is

p(ε_j | z) ∝ [ ∏_k ε_j^{z_{jk}} (1 − ε_j)^{1 − z_{jk}} ] 1{0 ≤ ε_j ≤ 1} = ε_j^{Σ_k z_{jk}} (1 − ε_j)^{Σ_k (1 − z_{jk})} = Be( 1 + Σ_k z_{jk}, 1 + Σ_k (1 − z_{jk}) ).
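The zero-mean bivariate normal form of the marginal m(d_{jk} | σ², v_{jk}, C_j) can be confirmed by a quick Monte Carlo check (a sketch of our own; all matrices and the sample size are illustrative): drawing θ from its normal prior and then d given θ should give observations whose covariance is σ²Σ_j + v C_j.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, v = 0.5, 2.0
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])   # illustrative noise structure Sigma_j
C = np.array([[1.5, -0.3], [-0.3, 0.8]])     # illustrative prior scale matrix C_j

n = 200_000
theta = rng.multivariate_normal([0, 0], v * C, size=n)            # theta ~ N(0, vC)
d = theta + rng.multivariate_normal([0, 0], sigma2 * Sigma, size=n)  # d|theta ~ N(theta, sigma2*Sigma)

emp_cov = np.cov(d, rowvar=False)
# marginal covariance should be sigma2*Sigma + v*C (Lindley-Smith structure)
assert np.allclose(emp_cov, sigma2 * Sigma + v * C, atol=0.05)
```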
The full conditional distribution of θ_{jk} is

p(θ_{jk} | d_{jk}, z_{jk}, σ², v_{jk}, C_j) ∝ exp{ −(1/(2σ²)) (d_{jk} − θ_{jk})' Σ_j^{−1} (d_{jk} − θ_{jk}) }
· [ (1 − z_{jk}) δ₀ + z_{jk} · 1/(2π|v_{jk}C_j|^{1/2}) exp{ −(1/(2v_{jk})) θ'_{jk} C_j^{−1} θ_{jk} } ]
= δ₀(θ_{jk}), if z_{jk} = 0, and f(θ_{jk} | d_{jk}, σ², v_{jk}, C_j), if z_{jk} = 1,

where

f(θ_{jk} | d_{jk}, σ², v_{jk}, C_j) = 1/(2π|Σ̃_{jk}|^{1/2}) exp{ −(1/2) (θ_{jk} − μ̃_{jk})' Σ̃^{−1}_{jk} (θ_{jk} − μ̃_{jk}) },
μ̃_{jk} = Σ̃_{jk} Σ_j^{−1} d_{jk} / σ²,   Σ̃_{jk} = ( Σ_j^{−1}/σ² + C_j^{−1}/v_{jk} )^{−1}.

The derivation of f(θ_{jk} | d_{jk}, σ², v_{jk}, C_j) is also a standard result contained, for example, in Lindley and Smith (1972), and was used in the wavelet shrinkage context by Barber and Nason (2004).

The full conditional distribution of v_{jk} is proportional to

p(v_{jk} | θ_{jk}, z_{jk}, C_j) ∝ [ (1 − z_{jk}) δ₀ + z_{jk} · 1/(2π|v_{jk}C_j|^{1/2}) exp{ −(1/(2v_{jk})) θ'_{jk} C_j^{−1} θ_{jk} } ] · v_{jk}^{1/2} exp{ −v_{jk}/8 }.

For z_{jk} = 0, this becomes

p(v_{jk} | θ_{jk}, z_{jk} = 0, C_j) ∝ v_{jk}^{3/2 − 1} exp{ −v_{jk}/8 } = Ga(3/2, 8),

and when z_{jk} = 1, it becomes

p(v_{jk} | θ_{jk}, z_{jk} = 1, C_j) ∝ (1/v_{jk}) exp{ −(1/(2v_{jk})) θ'_{jk} C_j^{−1} θ_{jk} } · v_{jk}^{1/2} exp{ −v_{jk}/8 }
= v_{jk}^{1/2 − 1} exp{ −(1/2) ( v_{jk}/4 + θ'_{jk} C_j^{−1} θ_{jk} / v_{jk} ) }
= GIG( 1/4, θ'_{jk} C_j^{−1} θ_{jk}, 1/2 ).

Here GIG(a, b, p) denotes the generalized inverse Gaussian distribution (Johnson et al., 1994, p. 284) with probability density function

f(x | a, b, p) = (a/b)^{p/2} / (2 K_p(√(ab))) · x^{p−1} e^{−(ax + b/x)/2},   x > 0, a, b > 0,

where K_p denotes the modified Bessel function of the third kind.

Finally, the full conditional distribution of C_j is given as

p(C_j | θ_j, z_j, v_j) ∝ ∏_k [ (1 − z_{jk}) δ₀ + z_{jk} · 1/(2π|v_{jk}C_j|^{1/2}) exp{ −(1/2) tr( θ_{jk} θ'_{jk} C_j^{−1} / v_{jk} ) } ] · |C_j|^{−(w+p+1)/2} exp{ −(1/2) tr(A_j C_j^{−1}) }
∝ |C_j|^{−( Σ_k z_{jk} + w + p + 1 )/2} exp{ −(1/2) tr[ ( A_j + Σ_k z_{jk} θ_{jk} θ'_{jk} / v_{jk} ) C_j^{−1} ] }
= IW( A_j + Σ_k z_{jk} θ_{jk} θ'_{jk} / v_{jk}, w + Σ_k z_{jk} ),

where IW denotes the inverse Wishart distribution.
References
Abramovich, F., Besbeas, P., and Sapatinas, T. (2002). Empirical Bayes approach to block wavelet function estimation. Computational Statistics and Data Analysis, 435-451.
Abramovich, F., Sapatinas, T., and Silverman, B.W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society, Series B, 725-749.
Antoniadis, A., Bigot, J., and Sapatinas, T. (2001). Wavelet estimators in nonparametric regression: a comparative simulation study. Journal of Statistical Software, 1-83.
Barber, S. and Nason, G.P. (2004). Real nonparametric regression using complex wavelets. Journal of the Royal Statistical Society, Series B, 927-939.
Casella, G. and George, E.I. (1992). Explaining the Gibbs sampler. The American Statistician, 167-174.
Dagpunar, J.S. (1989). An easily implemented generalized inverse Gaussian generator. Communications in Statistics - Simulation and Computation, 703-710.
Donoho, D.L. and Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 425-455.
Flandrin, P. (1992). Wavelet analysis and synthesis of fractional Brownian motion. IEEE Transactions on Information Theory, 910-917.
Gomez, E., Gomez-Villegas, M.A., and Marin, J.M. (1998). A multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods, 589-600.
Gomez, E., Gomez-Villegas, M.A., and Marin, J.M. (2008). Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications. Communications in Statistics - Theory and Methods, 972-985.
Johnson, N.L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1. Wiley-Interscience.
Johnstone, I.M. and Silverman, B.W. (2005). Empirical Bayes selection of wavelet thresholds. Annals of Statistics, 1700-1752.
Lina, J.M. (1997). Image processing with complex Daubechies wavelets. Journal of Mathematical Imaging and Vision, 211-233.
Lina, J.M. and Macgibbon, B. (1997). Non-linear shrinkage estimation with complex Daubechies wavelets. Proceedings of SPIE, Wavelet Applications in Signal and Image Processing V, 67-79.
Lina, J.M. and Mayrand, M. (1995). Complex Daubechies wavelets. Applied and Computational Harmonic Analysis, 219-229.
Lina, J.M., Turcotte, P., and Goulard, B. (1999). Complex dyadic multiresolution analyses. Advances in Imaging and Electron Physics, 163-197.
Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society, Series B, 1-41.
Mallat, S. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 674-693.
Nason, G.P. (1996). Wavelet shrinkage using cross-validation. Journal of the Royal Statistical Society, Series B, 463-479.
Reményi, N. and Vidakovic, B. (2012). Bayesian wavelet shrinkage strategies: new developments. In Multiscale Signal Analysis and Modeling, Lecture Notes in Electrical Engineering, X. Shen and A. Zayed (eds.). Springer-Verlag, New York, 317-346.
Robert, C.P. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer-Verlag, New York.
Vidakovic, B. (1998). Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. Journal of the American Statistical Association, 173-179.
Vidakovic, B. and Ruggeri, F. (2001). BAMS method: theory and simulations. Sankhyā, Series B, 63.