Online Supervised Acoustic System Identification exploiting Prelearned Local Affine Subspace Models
2020 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 21–24, 2020, ESPOO, FINLAND
Thomas Haubner, Andreas Brendel and Walter Kellermann
Multimedia Communications and Signal Processing, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 7, D-91058 Erlangen, Germany, [email protected]
ABSTRACT
In this paper we present a novel algorithm for improved block-online supervised acoustic system identification in adverse noise scenarios by exploiting prior knowledge about the space of Room Impulse Responses (RIRs). The method is based on the assumption that the variability of the unknown RIRs is controlled by only few physical parameters, describing, e.g., source position movements, and thus is confined to a low-dimensional manifold which is modelled by a union of affine subspaces. The offsets and bases of the affine subspaces are learned in advance from training data by unsupervised clustering followed by Principal Component Analysis. We suggest to denoise the parameter update of any supervised adaptive filter by projecting it onto an optimal affine subspace which is selected based on a novel computationally efficient approximation of the associated evidence. The proposed method significantly improves the system identification performance of state-of-the-art algorithms in adverse noise scenarios.
Index Terms — Online Supervised System Identification, Acoustic Echo Cancellation, Model Learning, Local Affine Subspace, Model Selection
1. INTRODUCTION
Online Supervised Acoustic System Identification (OSASI) is one of the classical tasks in acoustic signal processing with a multitude of applications [1, 2]. In this paper we consider linear convolutive Multiple-Input Multiple-Output (MIMO) applications with high-level interfering noise sources which are prone to non-robust OSASI performance. Such situations are typically encountered in hands-free acoustic human-machine interfaces which operate in, e.g., driving cars with open windows, or factories, and often involve negative Signal-to-Noise Ratios (SNRs). MIMO OSASI is usually tackled by frequency-domain adaptive filter algorithms which take the statistical properties of the excitation signals, e.g., non-stationarity and temporal and spatial correlation, into account for their optimization [3, 4]. Noise and interference in the observations is often addressed by Variable Step Size Selection (VSSS) methods which use either binary or smooth adaptation control. Binary adaptation control, which in the context of Acoustic Echo Cancellation (AEC) is applied to cope with double-talk, stipulates halting the adaptation during periods of high interference levels [5, 6]. In contrast, smooth adaptation control continuously adjusts the step size in dependence of a noise estimate. A powerful model-based approach for smooth adaptation control, based on an online Maximum Likelihood (ML) algorithm,
This work was supported by the DFG under contract no. Ke890/10-2 within the Research Unit FOR2457 "Acoustic Sensor Networks".

was introduced in [7]. However, VSSS-based algorithms still result in limited system identification performance for applications with persistent low SNR.

Besides adaptation control, the exploitation of prior knowledge about the unknown system has proven to be beneficial for OSASI with high-level interfering noise [8, 9, 10]. This prior knowledge is usually extracted in advance from a training data set of Room Impulse Response (RIR) samples. The main assumption behind these approaches is the existence of a low-dimensional manifold that is embedded in the high-dimensional space of adaptive filter parameters for a given OSASI scenario. This can be motivated by the assumption that the variability of the unknown RIRs is controlled by only few physical parameters, describing, e.g., source position movements, temperature changes or movement of furniture [11, 12]. There is a variety of different approaches to model this manifold, with the most prominent one assuming that the RIRs are confined to a single affine subspace which can be estimated, e.g., by Principal Component Analysis (PCA). In [8] this model has been employed by regularizing a Least-Squares (LS) cost function with the Mahalanobis distance based on the estimated RIR covariance matrix. The strong assumption of globally-correlated RIRs is, however, only rarely valid in practice, e.g., see [12]. Thus, [9] modifies it to a local PCA model, which can be motivated by the assumption of manifolds being locally Euclidean [13]. By the increased model flexibility, which results from employing several PCAs instead of a single one, [9] shows a performance improvement in an offline LS-based system identification task. Hereby, each PCA is associated with a specific source position and estimated from RIR samples which correspond to local source position movements.
By employing several mutually exclusive local models, a model selection is required. As selection criterion, [9] suggests the Frobenius norm of the difference of the a-priori-learned model covariance matrices and an estimated Finite Impulse Response (FIR) covariance matrix. The latter is estimated from the solutions of several LS system identification problems with local source position variations. In [10] another offline LS approach for noise-robust system identification is introduced which represents the training data by a globally-nonlinear manifold model. As [9] and [10] rely on an affinity measure between a statistic of the adaptive filter estimate and the model parameters, they are susceptible to nonunique solutions of the system identification problems which result, e.g., from cross-correlated input signals [14, 15].

In this paper we introduce a general method which allows to include prior knowledge about the RIRs into any OSASI algorithm to enhance its performance in adverse noise scenarios. The method relies on the assumption that the RIRs can be modelled by a set of affine subspaces whose parameters are estimated by unsupervised clustering and PCA. We suggest to denoise the estimated FIR coefficient updates of any OSASI algorithm by projecting them onto an optimally selected affine subspace. Furthermore, we introduce a probabilistic approach for computationally efficient online model selection by evidence maximization which is independent of the current FIR estimate of the OSASI algorithm.
2. SUPERVISED ADAPTIVE MIMO FILTERING
In this section we will define a signal model for MIMO OSASI. Hereby, it is assumed that there exists a linear functional relationship between the $n$th sample of the $Q$ estimated output signals

$$\hat{\mathbf{y}}(n) = \hat{\mathbf{H}}^{\mathrm{T}}(n)\,\mathbf{x}(n) \in \mathbb{R}^{Q} \quad (1)$$

and the most recent $L$ samples of the $P$ input signals

$$\mathbf{x}(n) = \left(\mathbf{x}_1^{\mathrm{T}}(n), \dots, \mathbf{x}_P^{\mathrm{T}}(n)\right)^{\mathrm{T}} \in \mathbb{R}^{PL}, \quad (2)$$

with

$$\mathbf{x}_p(n) = \left(x_p(n), \dots, x_p(n-L+1)\right)^{\mathrm{T}} \in \mathbb{R}^{L}. \quad (3)$$

The estimated transmission matrix at time instant $n$,

$$\hat{\mathbf{H}}(n) = \begin{pmatrix} \hat{\mathbf{h}}_{11}(n) & \dots & \hat{\mathbf{h}}_{1Q}(n) \\ \vdots & \ddots & \vdots \\ \hat{\mathbf{h}}_{P1}(n) & \dots & \hat{\mathbf{h}}_{PQ}(n) \end{pmatrix} \in \mathbb{R}^{PL \times Q}, \quad (4)$$

models FIR filters $\hat{\mathbf{h}}_{pq}(n)$ of length $L$ between each input and each output signal. As most algorithms directly process blocks of observations, we introduce the block output matrix

$$\hat{\mathbf{Y}}(m) = \left(\hat{\mathbf{y}}(mL), \dots, \hat{\mathbf{y}}(mL-L+1)\right) \in \mathbb{R}^{Q \times L} \quad (5)$$

which captures $L$ samples into one block indexed by $m$.

The estimation of the transmission matrix in Eq. (4) represents an optimization problem in the high-dimensional parameter space $\mathbb{R}^{R}$ of dimension $R = PLQ$ with elements $\tilde{\mathbf{h}}(n) = \mathrm{vec}(\hat{\mathbf{H}}^{\mathrm{T}}(n))$ and $\mathrm{vec}(\cdot)$ being the vectorization operator [16]. Then, the generic parameter update for iterative OSASI algorithms reads

$$\tilde{\mathbf{h}}(m) = \tilde{\mathbf{h}}(m-1) + \Delta\tilde{\mathbf{h}}(m) \quad (6)$$

with $\Delta\tilde{\mathbf{h}}(m)$ denoting the update term. Note that in the following the block-dependency $m$ of the parameters $\tilde{\mathbf{h}}(m)$ is omitted if possible for notational convenience.
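As a minimal illustration of the signal model of Eqs. (1)–(4), the following numpy sketch builds the stacked input vector and one MIMO output sample; the function names and the toy dimensions are ours, not part of the paper.

```python
import numpy as np

def stacked_input(x_sig, n, L):
    """Stack the L most recent samples of each of the P input
    signals into x(n) of Eq. (2); x_sig has shape (P, n_samples)."""
    P = x_sig.shape[0]
    # x_p(n) = (x_p(n), ..., x_p(n-L+1))^T, Eq. (3)
    return np.concatenate([x_sig[p, n - L + 1:n + 1][::-1] for p in range(P)])

def mimo_output(H_hat, x_n):
    """One output sample y_hat(n) = H_hat^T x(n), Eq. (1);
    H_hat has shape (P*L, Q)."""
    return H_hat.T @ x_n

# toy example: P = 2 inputs, Q = 2 outputs, filters of length L = 4
rng = np.random.default_rng(0)
P, Q, L = 2, 2, 4
H_hat = rng.standard_normal((P * L, Q))
x_sig = rng.standard_normal((P, 32))

x_n = stacked_input(x_sig, n=10, L=L)      # shape (P*L,)
y_n = mimo_output(H_hat, x_n)              # shape (Q,)
assert x_n.shape == (P * L,) and y_n.shape == (Q,)
```

Stacking the $P$ delay lines into one vector is what makes the update of Eq. (6) a single operation in $\mathbb{R}^{R}$ with $R = PLQ$.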
3. LOCAL AFFINE SUBSPACE MODELS
As discussed in Sec. 1, the latent FIR coefficient vectors often populate only a structured subset of the high-dimensional space $\mathbb{R}^R$ of adaptive filter parameters [11], which leads to the assumption of a low-dimensional manifold that can be learned in advance from a set of $G$ training data samples $\tilde{\mathbf{h}}_g$ with $g = 1, \dots, G$.

With the assumption of manifolds being locally Euclidean [13], the coefficient vector manifold can be approximated by patches of locally tangential hyperplanes $\mathcal{M}_i$, as illustrated exemplarily in Fig. 1 for $R = 3$. Each tangential hyperplane $\mathcal{M}_i$ describes a local approximation of the manifold. This motivates the idea of confining the FIR coefficient vectors $\tilde{\mathbf{h}}$ to a union

$$\mathcal{M}_{\mathrm{loc}} = \mathcal{M}_1 \cup \dots \cup \mathcal{M}_I \quad (7)$$

of $I$ affine subspaces $\mathcal{M}_i := \{\bar{\mathbf{h}}_i + \mathbf{V}_i \boldsymbol{\beta}_i \,|\, \boldsymbol{\beta}_i \in \mathbb{R}^{D_i}\}$ of dimension $D_i$. Each subspace $\mathcal{M}_i$ is defined by its offset $\bar{\mathbf{h}}_i$ and its basis matrix $\mathbf{V}_i \in \mathbb{R}^{R \times D_i}$.

Fig. 1: Local tangential hyperplane approximation of the FIR coefficient vector manifold for $R = 3$.

While estimating the offset and the basis of a single global affine subspace, i.e., $I = 1$, by, e.g., PCA, is straightforward, it is not obvious how to learn the parameters of the local models. However, as each affine subspace $\mathcal{M}_i$ denotes a local approximation of the manifold, its parameters can be estimated from the surrounding training data samples. Therefore, we first assign each training data sample $\tilde{\mathbf{h}}_g$ to a specific cluster $\mathcal{U}_i$ by introducing the indicator variable

$$z_{gi} := \begin{cases} 1 & \text{if } \tilde{\mathbf{h}}_g \in \mathcal{U}_i, \\ 0 & \text{if } \tilde{\mathbf{h}}_g \notin \mathcal{U}_i, \end{cases} \quad (8)$$

and then use the clustered data for estimating the model parameters. The mean and covariance matrix of the respective RIR cluster $\mathcal{U}_i$ can be estimated by

$$\bar{\mathbf{h}}_i = \frac{1}{G_i} \sum_{g=1}^{G} z_{gi}\, \tilde{\mathbf{h}}_g \quad (9)$$

$$\mathbf{C}_i = \frac{1}{G_i - 1} \sum_{g=1}^{G} z_{gi} \left[ (\tilde{\mathbf{h}}_g - \bar{\mathbf{h}}_i)(\tilde{\mathbf{h}}_g - \bar{\mathbf{h}}_i)^{\mathrm{T}} \right] \quad (10)$$

with $G_i = \sum_{g=1}^{G} z_{gi}$.
A local basis matrix $\mathbf{V}_i$ can be computed from, e.g., the eigenvectors $\mathbf{u}_{ir}$ corresponding to the largest eigenvalues $d_{ir}$ of the parameter covariance matrix $\mathbf{C}_i$. Note that one is by no means limited to PCA for extracting the model parameters and can resort to any other algorithm for estimating a linear representation [17]. Due to the broadband definition of the filter parameters in Eq. (4), the covariance matrix $\mathbf{C}_i$ describes, in addition to the correlation of different taps of one FIR filter $\hat{\mathbf{h}}_{pq}$, also the correlation between different FIR filters. Note that $I = 1$ denotes the special case of dimension reduction by a single PCA which assumes globally-correlated FIR coefficient vectors, i.e., strong correlation between all RIR samples used as training data. The local affine subspace model relaxes this assumption by requiring only a local correlation, i.e., only subsets of the RIR training data are assumed to be correlated.

In [9] it was assumed that the clusters represent local source position variations and the assignment of the samples was given by oracle knowledge. As this oracle knowledge cannot be assumed in general, and the resulting assignment is by no means guaranteed to be optimum, we suggest to learn the assignment blindly from the data by unsupervised K-means clustering [18], which employs a Euclidean affinity measure that can only be assumed to be meaningful in a local neighbourhood of the samples.

4. LOCAL PROJECTION-BASED UPDATE DENOISING

In the previous section we have introduced the union of $I$ affine subspace models as a low-dimensional approximation of the parameter space of RIR coefficient vectors. Now we will describe how to exploit this knowledge to make the general OSASI update of the form (6) more robust against noise.
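The offline learning stage of Sec. 3 (unsupervised clustering followed by per-cluster PCA, cf. Eqs. (8)–(10)) could be sketched in numpy as below. This is only an illustrative sketch: a plain Lloyd-type K-means with a deterministic initialization stands in for the K-means variant of the paper, and all function names and toy dimensions are ours.

```python
import numpy as np

def kmeans(samples, I, n_iter=50):
    """Plain Lloyd K-means with a deterministic init (evenly spaced
    training samples); assigns each RIR vector to one of I clusters
    via the Euclidean distance, yielding z_gi of Eq. (8)."""
    centers = samples[np.linspace(0, len(samples) - 1, I).astype(int)]
    for _ in range(n_iter):
        d = np.linalg.norm(samples[:, None, :] - centers[None], axis=-1)
        z = d.argmin(axis=1)                       # hard cluster assignment
        centers = np.stack([samples[z == i].mean(axis=0) for i in range(I)])
    return z

def local_models(samples, z, I, D):
    """Per-cluster offset (Eq. (9)), covariance (Eq. (10)) and
    D-dimensional PCA basis V_i (leading eigenvectors of C_i)."""
    models = []
    for i in range(I):
        cluster = samples[z == i]
        h_bar = cluster.mean(axis=0)
        C = np.cov(cluster.T)                      # unbiased, as in Eq. (10)
        w, U = np.linalg.eigh(C)                   # ascending eigenvalues
        V = U[:, ::-1][:, :D]                      # D dominant eigenvectors
        models.append((h_bar, C, V))
    return models

# toy data: G = 60 "RIR" vectors of dimension R = 8 around two offsets
rng = np.random.default_rng(1)
G, R, I, D = 60, 8, 2, 2
samples = np.concatenate([rng.normal(0.0, 0.1, (G // 2, R)),
                          rng.normal(5.0, 0.1, (G // 2, R))])
z = kmeans(samples, I)
models = local_models(samples, z, I, D)
assert len(models) == I and models[0][2].shape == (R, D)
```

Since both the offsets and the bases depend only on training data, this whole stage runs once, before any online adaptation.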
The proposed algorithm is inspired by the theory of manifold optimization, e.g., [19], in which the main idea is to exploit prior knowledge about the structure of the parameter space, e.g., matrix properties, by computing the steepest descent direction with respect to the metric defined by the manifold.

4.1. Model Selection by Evidence Maximization

A powerful method for model selection is given by the evidence maximization framework [20, 21]. It suggests to employ the likelihood of each model,

$$p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) = \int p(\mathbf{Y}(m)\,|\,\tilde{\mathbf{h}}, \mathcal{M}_i)\, p(\tilde{\mathbf{h}}\,|\,\mathcal{M}_i)\, \mathrm{d}\tilde{\mathbf{h}}, \quad (11)$$

given by the evidence of the observations, as selection criterion. By assuming i.i.d. observations $\mathbf{y}(n)$, the evidence of block $m$ is defined by

$$p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) := \prod_{n=mL-L+1}^{mL} p(\mathbf{y}(n)\,|\,\mathcal{M}_i). \quad (12)$$

Note that the assumption of i.i.d. observations is only a simplifying modelling assumption and its validity depends on the statistical properties of the excitation signal and the system. If we assume a linear Gaussian model for the likelihood [22],

$$p(\mathbf{y}(n)\,|\,\tilde{\mathbf{h}}, \mathcal{M}_i) = p(\mathbf{y}(n)\,|\,\tilde{\mathbf{h}}) = \mathcal{N}\!\left(\mathbf{y}(n)\,\big|\,\tilde{\mathbf{X}}^{\mathrm{T}}(n)\tilde{\mathbf{h}},\, \boldsymbol{\Lambda}\right), \quad (13)$$

which is independent of the model $\mathcal{M}_i$, and further assume a Gaussian prior for each model $\mathcal{M}_i$,

$$p(\tilde{\mathbf{h}}\,|\,\mathcal{M}_i) = \mathcal{N}\!\left(\tilde{\mathbf{h}}\,\big|\,\bar{\mathbf{h}}_i, \mathbf{C}_i\right), \quad (14)$$

the sample-wise evidence is given by [20]

$$p(\mathbf{y}(n)\,|\,\mathcal{M}_i) = \mathcal{N}\!\left(\mathbf{y}(n)\,\big|\,\tilde{\mathbf{X}}^{\mathrm{T}}(n)\bar{\mathbf{h}}_i,\, \mathbf{R}_i(n)\right) \quad (15)$$

with covariance matrix

$$\mathbf{R}_i(n) = \boldsymbol{\Lambda} + \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{C}_i\, \tilde{\mathbf{X}}(n). \quad (16)$$

We introduced here the input signal matrix $\tilde{\mathbf{X}}^{\mathrm{T}}(n) = \mathbf{x}^{\mathrm{T}}(n) \otimes \mathbf{I}_Q \in \mathbb{R}^{Q \times R}$, with $\otimes$ denoting the Kronecker product and $\mathbf{I}_Q \in \mathbb{R}^{Q \times Q}$ being the identity matrix, and the observation noise covariance matrix $\boldsymbol{\Lambda} \in \mathbb{R}^{Q \times Q}$. Instead of employing the logarithmic evidence $\log p(\mathbf{Y}(m)\,|\,\mathcal{M}_i)$ of block $m$ as objective function for model selection, we suggest to use the recursive average evidence estimator

$$\hat{E}_i(m) = \lambda\, \hat{E}_i(m-1) + (1-\lambda) \log p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) \quad (17)$$

to reflect the smooth trajectories on the manifolds caused by RIR changes. The recursive averaging factor $\lambda \in [0, 1)$ in Eq. (17) models an exponential weighting of temporally preceding observations and needs to be chosen according to the time-variance of the RIR. Finally, the optimum model index $i^*(m)$ at block index $m$ is computed by

$$i^*(m) = \underset{i=1,\dots,I}{\mathrm{argmax}}\ \hat{E}_i(m). \quad (18)$$

We will now aim at interpreting the logarithmic evidence

$$\log p(\mathbf{y}(n)\,|\,\mathcal{M}_i) \overset{c}{=} -\frac{1}{2}\left( \log \det \mathbf{R}_i(n) + \bar{\mathbf{e}}_i^{\mathrm{T}}(n)\, \mathbf{R}_i^{-1}(n)\, \bar{\mathbf{e}}_i(n) \right) \quad (19)$$

of the observed sample $\mathbf{y}(n)$ given the model $\mathcal{M}_i$, with the estimated average observation error

$$\bar{\mathbf{e}}_i(n) = \mathbf{y}(n) - \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \bar{\mathbf{h}}_i \quad (20)$$

and $\overset{c}{=}$ denoting equality up to a constant term. As expected for evidence-based model selection [20, 21], Eq. (19) consists of two terms which trade model complexity, described by $\log \det \mathbf{R}_i(n)$, against data fitting, described by $\bar{\mathbf{e}}_i^{\mathrm{T}}(n)\, \mathbf{R}_i^{-1}(n)\, \bar{\mathbf{e}}_i(n)$. By additionally assuming uncorrelated observations $\mathbf{y}(n)$, the logarithmic evidence (19) reduces to a sum of channel-wise measures

$$\log p(\mathbf{y}(n)\,|\,\mathcal{M}_i) \overset{c}{=} -\frac{1}{2} \sum_{q=1}^{Q} \left( \log r_{iq}(n) + \frac{\bar{e}_{iq}^2(n)}{r_{iq}(n)} \right). \quad (21)$$

The data-fitting term is given by the weighted sum of the squared average observation errors $\bar{e}_{iq}(n)$ of model $\mathcal{M}_i$. As the diagonal terms $r_{iq}(n)$ of the covariance matrix (see Eq. (16)) denote an estimate of the observation power, we can interpret the data-fitting term as a sum of the channel-dependent instantaneous inverse Echo Return Loss Enhancement (ERLE) performance measures [23] which are well-known in AEC. Thus, the logarithmic evidence (19) can be seen as an extension of the data-fitting ERLE performance measure which additionally penalizes complex models.

4.2. Efficient Low-Rank Evidence Approximation

As the direct evaluation of the logarithmic evidence by Eq. (19) is computationally demanding, we will now introduce an efficient approximation based on the low-dimensionality assumption of the subspaces. Therefore, we insert the Eigenvalue Decomposition (EVD) of the prior covariance matrix $\mathbf{C}_i = \mathbf{U}_i \mathbf{D}_i \mathbf{U}_i^{\mathrm{T}}$ of model $\mathcal{M}_i$ into the second term of the evidence covariance matrix computation (16):

$$\tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{C}_i\, \tilde{\mathbf{X}}(n) = \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{U}_i \mathbf{D}_i^{1/2} \mathbf{D}_i^{1/2} \mathbf{U}_i^{\mathrm{T}}\, \tilde{\mathbf{X}}(n) \quad (22)$$
$$= \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \check{\mathbf{U}}_i \check{\mathbf{U}}_i^{\mathrm{T}}\, \tilde{\mathbf{X}}(n) \quad (23)$$
$$= \sum_{r=1}^{R} \check{\mathbf{y}}_{ir}(n)\, \check{\mathbf{y}}_{ir}^{\mathrm{T}}(n), \quad (24)$$

which shows that it can be computed by a sum of outer products. The existence of the matrix square root is guaranteed due to the symmetry and positive semi-definiteness of the covariance matrix. Each vector $\check{\mathbf{y}}_{ir}(n)$ of the sum is computed by a multiplication of the input signal matrix with a scaled eigenvector $\check{\mathbf{u}}_{ir} = \sqrt{d_{ir}}\, \mathbf{u}_{ir}$ of the prior covariance matrix $\mathbf{C}_i$. As each matrix-vector product $\check{\mathbf{y}}_{ir}(n) = \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \check{\mathbf{u}}_{ir}$ corresponds to a linear convolution of the input signals with a scaled eigenvector, i.e., an eigenfilter, it can be efficiently computed by an overlap-save block processing structure. The latter also holds for the computation of the estimated average observation $\tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \bar{\mathbf{h}}_i$ (see Eq. (15)). Furthermore, as we originally assumed the existence of a lower-dimensional subspace (see Sec. 3), the ordered eigenvalues $d_{ir}$ with $r = 1, \dots, R$ of the covariance matrix $\mathbf{C}_i$ are assumed to exhibit a pronounced decay in magnitude. Hence, it is reasonable to approximate Eq. (24) by the $K_i \le D_i$ largest terms corresponding to the dominant eigenvalues. Note that often $K_i$ can be chosen much smaller than $D_i$, i.e., $K_i \ll D_i$, as the first $K_i$ eigenfilters provide sufficient discrimination for model selection. This allows for computationally efficient low-rank evidence approximations.
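The model-selection path above (sample-wise evidence of Eq. (15)/(19), low-rank approximation of the evidence covariance via the dominant eigenfilters, recursive smoothing by Eq. (17) and the argmax of Eq. (18)) might be sketched in numpy as follows. All function names are ours, and a direct matrix evaluation stands in for the overlap-save convolutions used in the paper.

```python
import numpy as np

def lowrank_evidence_cov(X, U, d, Lam, K):
    """R_i(n) of Eq. (16), with the second term approximated by the
    K outer products of Eq. (24): y_ir = X^T (sqrt(d_ir) u_ir)."""
    R_n = Lam.copy()
    for r in range(K):                       # dominant eigenpairs first
        y_r = X.T @ (np.sqrt(d[r]) * U[:, r])
        R_n += np.outer(y_r, y_r)
    return R_n

def log_evidence(y, X, h_bar, U, d, Lam, K):
    """Sample-wise log-evidence of Eq. (19) up to a constant."""
    R_n = lowrank_evidence_cov(X, U, d, Lam, K)
    e = y - X.T @ h_bar                      # average error, Eq. (20)
    sign, logdet = np.linalg.slogdet(R_n)
    return -0.5 * (logdet + e @ np.linalg.solve(R_n, e))

def select_model(E_hat, block_log_ev, lam=0.9):
    """Recursive smoothing (Eq. (17)) and selection (Eq. (18))."""
    E_hat = lam * E_hat + (1.0 - lam) * block_log_ev
    return E_hat, int(np.argmax(E_hat))

# toy example: two models with identical covariance; the second one
# matches the true offset and hence attains the larger evidence
rng = np.random.default_rng(0)
Rdim, Q, K = 8, 2, 3
X = rng.standard_normal((Rdim, Q))
h_true = np.ones(Rdim)
y = X.T @ h_true
U, _ = np.linalg.qr(rng.standard_normal((Rdim, Rdim)))
d = np.array([1.0, 0.5, 0.25] + [0.0] * (Rdim - 3))   # fast eigenvalue decay
Lam = 0.1 * np.eye(Q)
ev = np.array([log_evidence(y, X, hb, U, d, Lam, K)
               for hb in (np.zeros(Rdim), h_true)])
E_hat, i_star = select_model(np.zeros(2), ev)
assert i_star == 1
```

Note how the selection only involves the prior offsets and eigenfilters, never the current adaptive filter estimate, which is the independence property stressed in Sec. 1.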
4.3. Projection-Based Update Denoising

As each sub-model $\mathcal{M}_i$ denotes an affine subspace of $\mathbb{R}^R$, the parameter vector $\tilde{\mathbf{h}}_i^{\mathrm{p}}$ resulting from orthogonal projection onto $\mathcal{M}_i$ reads (see, e.g., [24])

$$\tilde{\mathbf{h}}_i^{\mathrm{p}} = \bar{\mathbf{h}}_i + \mathbf{P}_i \left( \tilde{\mathbf{h}} - \bar{\mathbf{h}}_i \right) \quad (25)$$

with the rank-deficient projection matrix

$$\mathbf{P}_i = \mathbf{V}_i (\mathbf{V}_i^{\mathrm{T}} \mathbf{V}_i)^{-1} \mathbf{V}_i^{\mathrm{T}}. \quad (26)$$

Note that the projection matrix $\mathbf{P}_i$ depends only on the training data and can thus be computed a priori.

Alg. 1 gives a detailed description of the proposed Local Projection-based Update Denoising (LPUD) for OSASI. For each block of observations, indexed by $m$, the evidence estimates of all models $\mathcal{M}_i$ are updated by Eq. (17). Hereby, the evidence $p(\mathbf{Y}(m)\,|\,\mathcal{M}_i)$ of block $m$, given model $\mathcal{M}_i$, is efficiently computed by overlap-save processing and the low-rank evidence approximation derived in Sec. 4.2. If the optimum model index $i^*(m)$ has changed relative to the previous block, the previous parameter estimate $\tilde{\mathbf{h}}(m-1)$ is projected onto the optimum affine subspace $\mathcal{M}_{i^*(m)}$ by Eq. (25). This ensures that the updated FIR estimate will be confined to $\mathcal{M}_{\mathrm{loc}}$. Subsequently, the parameter update $\Delta\tilde{\mathbf{h}}(m)$ is computed by a suitable OSASI algorithm and projected onto the optimum affine subspace by multiplication with the projection matrix $\mathbf{P}_{i^*(m)}$ (see Eq. (26)). Finally, the projected update is used for optimizing the adaptive filter coefficient vector (see Eq. (6)).

Algorithm 1
OSASI by LPUD
for m = 1, ..., M do
    Update the evidences of all I models by Eq. (17)
    Compute the optimum model M_{i*(m)} by Eq. (18)
    if i*(m) ≠ i*(m−1) then
        Project h̃(m−1) onto the optimum affine subspace by Eq. (25)
    end if
    Compute the parameter update Δh̃(m)
    Project the parameter update: Δh̃(m) ← P_{i*(m)} Δh̃(m)
    Update the FIR coefficients: h̃(m) ← h̃(m−1) + Δh̃(m)
end for

5. EXPERIMENTS

In this section we will evaluate the proposed LPUD algorithm in a simulated environment with respect to its performance in noisy scenarios. Therefore, we consider an acoustic system identification scenario with Q = 2 microphones of cm spacing and a single source, i.e., P = 1, located on a sector of a sphere with a radius of . m, an azimuth angle range θ ∈ [30°, °] and an elevation angle range φ ∈ [−°, °]. All PQ RIRs h_pq have been simulated according to the image method [25, 26] with maximum reflection order for a room of dimension [6, , .] m with a reverberation time of T = 0. s, a sampling frequency of f_s = 8 kHz and an RIR length of W = 4096 samples. The observed microphone signals have been sampled from the Gaussian density y(n) ∼ N(d(n), Λ) with d(n) = H^T x(n) ∈ R^Q denoting the true source image at the microphones and H being the acoustic transmission matrix which includes the true RIRs h_pq analogously to Eq. (4). The noise covariance matrix Λ is a scaled identity matrix with the scale factor determined by the SNR.

For assessing the performance of the proposed algorithm, we introduce the signal-dependent average ERLE measure

$$\mathrm{ERLE} = \frac{1}{(N_2 - N_1 + 1)\,Q} \sum_{n=N_1}^{N_2} \sum_{q=1}^{Q} \frac{d_q^2(n)}{\left(d_q(n) - \hat{y}_q(n)\right)^2} \quad (27)$$

and the signal-independent average system mismatch

$$\Upsilon = \frac{1}{M_2 - M_1 + 1} \sum_{m=M_1}^{M_2} \Upsilon(m) \quad (28)$$

which is computed by the temporal average of the block-dependent system mismatch

$$\Upsilon(m) = \frac{1}{PQ} \sum_{p,q=1}^{P,Q} \frac{\|\mathbf{h}_{pq} - \hat{\mathbf{h}}_{pq}(m)\|^2}{\|\mathbf{h}_{pq}\|^2}. \quad (29)$$

Note that, as the adaptive filter length L is usually much smaller than the true filter length W of the physical system to be modelled, we only use the first L taps of h_pq to obtain an estimate of the attainable system mismatch. The observed signal that is caused by the remaining W − L taps of the true RIR acts as an error in the introduced signal model Eq. (1) and results in an upper bound for the signal-dependent ERLE measure. It corresponds to the excess error in statistically optimum filtering [1].

As pointed out in Sec. 4, the presented method is not tied to any specific OSASI algorithm. In this paper we employ, as a fast-converging state-of-the-art algorithm, the Generalized Frequency-Domain Adaptive Filter (GFDAF) [3] which represents a computationally efficient optimization of the well-known block-recursive least-squares cost function in the frequency domain. For Single-Input Single-Output OSASI applications the GFDAF is equivalent to the popular FDAF [1] with a recursive power spectral density (PSD) estimation and an additional data-dependent dynamical regularization. We use a filter length of L = 1024 and no block overlap, a constant step size of µ = 1, a recursive PSD averaging factor of ν = 0. and the dynamical regularization parameters δ_max = δ = 1. Note that for stationary noise and non-stationary excitation signals, e.g., speech, VSSS is still beneficial due to the time-varying SNR.

In the following we will evaluate the proposed LPUD algorithm against two baselines, i.e., the raw GFDAF and a Global Projection-based Update Denoising (GPUD). The GPUD algorithm is a special case of the LPUD with I = 1. The training data for learning the model consisted of G = 5000 RIRs which were simulated according to randomly drawn source positions. The global affine subspace dimension is set to D = 550, which showed good overall performance.
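One block of Algorithm 1, combining the projection of Eqs. (25)/(26) with the evidence-based selection, might be sketched in numpy as below. The update Δh̃(m) would come from any OSASI algorithm (the paper uses the GFDAF); here it is simply passed in, and all names are ours.

```python
import numpy as np

def projection_matrix(V):
    """Rank-deficient projector of Eq. (26)."""
    return V @ np.linalg.solve(V.T @ V, V.T)

def lpud_step(h, dh, models, E_hat, block_log_ev, i_prev, lam=0.9):
    """One block of Alg. 1: update the evidences (Eq. (17)), select
    the model (Eq. (18)), re-project h on a model change (Eq. (25))
    and denoise the update dh by the projector (Eq. (26))."""
    E_hat = lam * E_hat + (1 - lam) * block_log_ev
    i_star = int(np.argmax(E_hat))
    h_bar, P = models[i_star]
    if i_star != i_prev:
        h = h_bar + P @ (h - h_bar)        # Eq. (25)
    h = h + P @ dh                         # projected update, Eq. (6)
    return h, E_hat, i_star

# toy run: a single 2-D affine subspace embedded in R^4
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
h_bar = np.array([0.0, 0.0, 1.0, 1.0])
models = [(h_bar, projection_matrix(V))]
h = np.zeros(4)
h, E_hat, i_star = lpud_step(h, dh=np.ones(4), models=models,
                             E_hat=np.zeros(1), block_log_ev=np.zeros(1),
                             i_prev=-1)
# h stays confined to the affine subspace: the last two taps equal h_bar's
assert np.allclose(h[2:], [1.0, 1.0])
```

Since the projectors and offsets are precomputed from training data, the per-block overhead over the underlying adaptive filter is essentially the matrix-vector products shown here.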
The LPUD algorithm consists of I = 40 clusters of identical local dimension D_i = 50. The cluster assignment was learned by the K-means algorithm [18, 27]. Furthermore, the evidence of each model M_i was approximated by the K_i = 5 most dominant eigenfilters (see Sec. 4.2).

Fig. 2 shows the block-dependent system mismatch Υ(m) of all algorithms for different types of input signals, i.e., stationary White Gaussian Noise (WGN) and speech signals, and an SNR of − dB.

Fig. 2: Block-dependent system mismatch Υ(m) for an SNR of − dB in dependence of the excitation signal type (panels: WGN and speech; axes: log Υ(m) over time in s; curves: GFDAF, GPUD, LPUD).

For each type of input signal we have averaged Υ(m) over independent Monte Carlo experiments which are defined by randomly drawing the source position and the source signals from the respective models. This limits the influence of a specific input signal and source position. As speech source signals we employed different talkers reading out random concatenations of IEEE Harvard sentences [28]. As can be concluded from Fig. 2, all algorithms reach their steady-state estimate after approximately s. While the steady-state performance of the GPUD improves only slightly in comparison to the GFDAF, the LPUD results in a significant improvement for both types of excitation signals. By comparing WGN to speech excitation, we observe that WGN shows consistently approximately dB smaller system mismatch than speech for all algorithms. This reflects the well-known difference in convergence behaviour of adaptive filters caused by the nonstationarity and non-whiteness of speech signals [1, 3, 29]. While for this demanding scenario the state-of-the-art algorithm GFDAF is not capable of achieving a sufficient system identification performance anymore, the proposed LPUD achieves an average system mismatch of − dB after convergence.
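The evaluation measures of Eqs. (27)–(29) can be transcribed directly; the sketch below is ours, and the averaging limits as well as the linear (non-dB) scaling are assumptions about details lost in the extracted formulas.

```python
import numpy as np

def avg_erle(d, y_hat, n1, n2):
    """Average ERLE of Eq. (27): ratio of true source-image power to
    residual power, averaged over samples n1..n2 and the Q channels;
    d and y_hat have shape (Q, n_samples)."""
    d, y_hat = d[:, n1:n2 + 1], y_hat[:, n1:n2 + 1]
    return np.mean(d**2 / (d - y_hat)**2)

def system_mismatch(h_true, h_est):
    """Block system mismatch of Eq. (29): normalized squared filter
    error, averaged over all P*Q channel pairs (stacked in axis 0)."""
    num = np.sum((h_true - h_est)**2, axis=-1)
    den = np.sum(h_true**2, axis=-1)
    return np.mean(num / den)

# toy check: a perfect estimate gives zero mismatch
h_true = np.array([[1.0, 0.5, 0.25], [0.8, 0.4, 0.2]])
assert system_mismatch(h_true, h_true) == 0.0
```

The temporal average Υ of Eq. (28) is then simply the mean of `system_mismatch` over the block range M_1..M_2.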
Additionally, by comparing the initial convergence phases of the algorithms, we observe an almost instantaneous gain of the LPUD which is caused by the projection onto the estimated affine subspace. This results in superior system identification performance even during the early convergence phase, i.e., the first second.

In Fig. 3 we compare the respective algorithms for different SNR levels in terms of average ERLE and system mismatch. The results are averaged over s of WGN excitation and s of speech excitation and independent Monte Carlo experiments. The respective limits of the sums in Eqs. (27) and (28), i.e., N_1, N_2, M_1, M_2, are chosen to divide the signals into two parts of equal length. This allows to assess the Convergence Phase (CP), i.e., the first part, and the Steady State (SS), i.e., the second part, independently.

Fig. 3: Performance evaluation of the various algorithms in dependence of the SNR and the excitation signal type (CP: Convergence Phase, SS: Steady State; panels: log ERLE and log Υ over SNR in dB for WGN and speech; curves: GFDAF, GPUD, LPUD in CP and SS).

As can be concluded from Fig. 3, the proposed LPUD method significantly outperforms the GFDAF for all SNR levels in terms of steady-state performance for both ERLE and system mismatch Υ. This suggests an efficient denoising of the update in low-SNR applications while still preserving a sufficient model flexibility for precise system identification in high-SNR scenarios. Additionally, by comparing the GPUD to the LPUD algorithm, one can observe the advantage of assuming only local linearity compared to the global linear approach which lacks the aforementioned trade-off opportunity. Finally, we observed that the optimum subspace dimensions D_i are strongly related to the respective SNR, which would allow even higher performance improvements by choosing the signal-dependent optimum for each scenario.
6. SUMMARY AND OUTLOOK
In this paper we presented a novel method for improved OSASI in noisy environments by exploiting prior knowledge about the space of RIRs for a given acoustic scenario. The proposed method is based on the projection of the parameter update onto an affine subspace which is selected by a novel computationally efficient approximation of the associated evidence. The benefit of the proposed update denoising for a state-of-the-art OSASI algorithm was corroborated by simulated experiments.

Future research aims at evaluating the benefit of various dictionary learning algorithms in comparison to PCA for estimating the model parameters. Furthermore, probabilistic mixtures of subspace models, e.g., [30], are of interest to improve the unsupervised clustering of the training data in Sec. 3. Finally, an adaptive estimation of the noise variances by, e.g., an Expectation-Maximization (EM) framework, and an adaptive computation of the optimum subspace dimension appear to be promising for non-stationary noise signals.

7. REFERENCES

[1] S. Haykin,
Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, 2002.
[2] P. S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation, Springer, Berlin, Heidelberg, 2007.
[3] H. Buchner, J. Benesty, and W. Kellermann, "Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication," Signal Processing, vol. 85, no. 3, pp. 549–570, Mar. 2005.
[4] S. Malik and G. Enzner, "Recursive Bayesian Control of Multichannel Acoustic Echo Cancellation," IEEE Signal Processing Letters, vol. 18, no. 11, pp. 619–622, Nov. 2011.
[5] T. Gänsler, M. Hansson, C.-J. Ivarsson, and G. Salomonsson, "A double-talk detector based on coherence," IEEE Transactions on Communications, vol. 44, no. 11, pp. 1421–1427, Nov. 1996.
[6] J. Benesty, D. R. Morgan, and J. H. Cho, "A new class of doubletalk detectors based on cross-correlation," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, pp. 168–172, Mar. 2000.
[7] S. Malik and G. Enzner, "Online maximum-likelihood learning of time-varying dynamical models in block-frequency-domain," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, USA, Mar. 2010.
[8] M. Fozunbal, T. Kalker, and R. W. Schafer, "Multi-Channel Echo Control by Model Learning," in International Workshop on Acoustic Echo and Noise Control (IWAENC), Seattle, USA, Sept. 2008.
[9] T. Koren, R. Talmon, and I. Cohen, "Supervised system identification based on local PCA models," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, Mar. 2012.
[10] R. Talmon and S. Gannot, "Relative transfer function identification on manifolds for supervised GSC beamformers," in European Conference on Signal Processing (EUSIPCO), Marrakech, Morocco, Sept. 2013.
[11] R. Talmon, I. Cohen, S. Gannot, and R. R. Coifman, "Diffusion Maps for Signal Processing: A Deeper Look at Manifold-Learning Techniques Based on Kernels and Graphs," IEEE Signal Processing Magazine, vol. 30, no. 4, pp. 75–86, July 2013.
[12] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "A Study on Manifolds of Acoustic Responses," in Latent Variable Analysis and Signal Separation (LVA/ICA), Liberec, Czech Republic, Aug. 2015.
[13] L. W. Tu, An Introduction to Manifolds, Universitext, Springer, New York, 2010.
[14] M. M. Sondhi, D. R. Morgan, and J. L. Hall, "Stereophonic acoustic echo cancellation: an overview of the fundamental problem," IEEE Signal Processing Letters, vol. 2, no. 8, pp. 148–151, Aug. 1995.
[15] J. Benesty, D. R. Morgan, and M. M. Sondhi, "A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 156–165, Mar. 1998.
[16] P. J. Dhrymes, Matrix Vectorization, Springer, New York, NY, 2000.
[17] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, New York, NY, 2010.
[18] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, Mar. 1982.
[19] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[20] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, Berlin, Heidelberg, 2007.
[21] J. Ding, V. Tarokh, and Y. Yang, "Model selection techniques: An overview," IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 16–34, 2018.
[22] S. Roweis and Z. Ghahramani, "A unifying review of linear Gaussian models," Neural Computation, vol. 11, no. 2, pp. 305–345, 1999.
[23] G. Enzner, H. Buchner, A. Favrot, and F. Kuech, "Acoustic Echo Control," in Academic Press Library in Signal Processing, vol. 4, pp. 807–877, Elsevier, 2014.
[24] G. Strang, Linear Algebra and its Applications, Thomson Brooks/Cole, Belmont, CA, 2006.
[25] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
[26] E. Habets, "Room Impulse Response Generator," Tech. Rep., Technische Universiteit Eindhoven, Sept. 2010.
[27] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, USA, 2007.
[28] L. M. Panfili, J. Haywood, D. R. McCloy, P. E. Souza, and R. A. Wright, "The UW/NU corpus, version 2.0," https://depts.washington.edu/phonlab/projects/uwnu.php, 2017.
[29] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, "Acoustic echo control. An application of very-high-order adaptive filters," IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 42–69, 1999.
[30] M. E. Tipping and C. M. Bishop, "Mixtures of Probabilistic Principal Component Analysers," Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.