Online Supervised Acoustic System Identification exploiting Prelearned Local Affine Subspace Models
2020 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 21–24, 2020, ESPOO, FINLAND
Thomas Haubner, Andreas Brendel and Walter Kellermann
Multimedia Communications and Signal Processing, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 7, D-91058 Erlangen, Germany, [email protected]
ABSTRACT
In this paper we present a novel algorithm for improved block-online supervised acoustic system identification in adverse noise scenarios by exploiting prior knowledge about the space of Room Impulse Responses (RIRs). The method is based on the assumption that the variability of the unknown RIRs is controlled by only few physical parameters, describing, e.g., source position movements, and thus is confined to a low-dimensional manifold which is modelled by a union of affine subspaces. The offsets and bases of the affine subspaces are learned in advance from training data by unsupervised clustering followed by Principal Component Analysis. We suggest to denoise the parameter update of any supervised adaptive filter by projecting it onto an optimal affine subspace which is selected based on a novel computationally efficient approximation of the associated evidence. The proposed method significantly improves the system identification performance of state-of-the-art algorithms in adverse noise scenarios.
Index Terms — Online Supervised System Identification, Acoustic Echo Cancellation, Model Learning, Local Affine Subspace, Model Selection
1. INTRODUCTION
Online Supervised Acoustic System Identification (OSASI) is one of the classical tasks in acoustic signal processing with a multitude of applications [1, 2]. In this paper we consider linear convolutive Multiple-Input Multiple-Output (MIMO) applications with high-level interfering noise sources which are prone to non-robust OSASI performance. Such situations are typically encountered in hands-free acoustic human-machine interfaces which operate in, e.g., driving cars with open windows, or factories, and often involve negative Signal-to-Noise Ratios (SNRs). MIMO OSASI is usually tackled by frequency-domain adaptive filter algorithms which take the statistical properties of the excitation signals, e.g., non-stationarity and temporal and spatial correlation, into account for their optimization [3, 4]. Noise and interference in the observations is often addressed by Variable Step Size Selection (VSSS) methods which use either binary or smooth adaptation control. Binary adaptation control, which in the context of Acoustic Echo Cancellation (AEC) is applied to cope with double-talk, stipulates halting the adaptation during periods of high interference levels [5, 6]. In contrast, smooth adaptation control continuously adjusts the step size in dependence of a noise estimate. A powerful model-based approach for smooth adaptation control, based on an online Maximum Likelihood (ML) algorithm,
This work was supported by the DFG under contract no. Ke890/10-2 within the Research Unit FOR2457 "Acoustic Sensor Networks".

was introduced in [7]. However, VSSS-based algorithms still result in limited system identification performance for applications with persistent low SNR.

Besides adaptation control, the exploitation of prior knowledge about the unknown system has proven to be beneficial for OSASI with high-level interfering noise [8, 9, 10]. This prior knowledge is usually extracted in advance from a training data set of Room Impulse Response (RIR) samples. The main assumption behind these approaches is the existence of a low-dimensional manifold that is embedded in the high-dimensional space of adaptive filter parameters for a given OSASI scenario. This can be motivated by the assumption that the variability of the unknown RIRs is controlled by only few physical parameters, describing, e.g., source position movements, temperature changes or movement of furniture [11, 12]. There is a variety of different approaches to model this manifold, with the most prominent one assuming that the RIRs are confined to a single affine subspace which can be estimated, e.g., by Principal Component Analysis (PCA). In [8] this model has been employed by regularizing a Least-Squares (LS) cost function with the Mahalanobis distance based on the estimated RIR covariance matrix. The strong assumption of globally-correlated RIRs is, however, only rarely valid in practice, e.g., see [12]. Thus, [9] modifies it to a local PCA model, which can be motivated by the assumption of manifolds being locally Euclidean [13]. By the increased model flexibility, which results from employing several PCAs instead of a single one, [9] shows a performance improvement in an offline LS-based system identification task. Hereby, each PCA is associated with a specific source position and estimated from RIR samples which correspond to local source position movements.
By employing several mutually exclusive local models, a model selection is required. As selection criterion, [9] suggests the Frobenius norm of the difference of the a-priori-learned model covariance matrices and an estimated Finite Impulse Response (FIR) covariance matrix. The latter is estimated from the solutions of several LS system identification problems with local source position variations. In [10] another offline LS approach for noise-robust system identification is introduced which represents the training data by a globally-nonlinear manifold model. As [9] and [10] rely on an affinity measure between a statistic of the adaptive filter estimate and the model parameters, they are susceptible to nonunique solutions of the system identification problems which result, e.g., from cross-correlated input signals [14, 15].

In this paper we introduce a general method which allows to include prior knowledge about the RIRs into any OSASI algorithm to enhance its performance in adverse noise scenarios. The method relies on the assumption that the RIRs can be modelled by a set of affine subspaces whose parameters are estimated by unsupervised clustering and PCA. We suggest to denoise the estimated FIR coefficient updates of any OSASI algorithm by projecting them onto an optimally selected affine subspace. Furthermore, we introduce a probabilistic approach for computationally efficient online model selection by evidence maximization which is independent of the current FIR estimate of the OSASI algorithm.
2. SUPERVISED ADAPTIVE MIMO FILTERING
In this section we will define a signal model for MIMO OSASI. Hereby, it is assumed that there exists a linear functional relationship between the $n$th sample of the $Q$ estimated output signals

$$\hat{\mathbf{y}}(n) = \hat{\mathbf{H}}^{\mathrm{T}}(n)\,\mathbf{x}(n) \in \mathbb{R}^{Q} \quad (1)$$

and the most recent $L$ samples of the $P$ input signals

$$\mathbf{x}(n) = \left(\mathbf{x}_1^{\mathrm{T}}(n), \dots, \mathbf{x}_P^{\mathrm{T}}(n)\right)^{\mathrm{T}} \in \mathbb{R}^{PL}, \quad (2)$$

with

$$\mathbf{x}_p(n) = \left(x_p(n), \dots, x_p(n-L+1)\right)^{\mathrm{T}} \in \mathbb{R}^{L}. \quad (3)$$

The estimated transmission matrix at time instant $n$,

$$\hat{\mathbf{H}}(n) = \begin{pmatrix} \hat{\mathbf{h}}_{11}(n) & \dots & \hat{\mathbf{h}}_{1Q}(n) \\ \vdots & \ddots & \vdots \\ \hat{\mathbf{h}}_{P1}(n) & \dots & \hat{\mathbf{h}}_{PQ}(n) \end{pmatrix} \in \mathbb{R}^{PL \times Q}, \quad (4)$$

models FIR filters $\hat{\mathbf{h}}_{pq}(n)$ of length $L$ between each input and each output signal. As most algorithms directly process blocks of observations, we introduce the block output matrix

$$\hat{\mathbf{Y}}(m) = \left(\hat{\mathbf{y}}(mL), \dots, \hat{\mathbf{y}}(mL-L+1)\right) \in \mathbb{R}^{Q \times L} \quad (5)$$

which captures $L$ samples into one block indexed by $m$.

The estimation of the transmission matrix in Eq. (4) represents an optimization problem in the high-dimensional parameter space $\mathbb{R}^{R}$ of dimension $R = PLQ$ with elements $\tilde{\mathbf{h}}(n) = \mathrm{vec}(\hat{\mathbf{H}}^{\mathrm{T}}(n))$ and $\mathrm{vec}(\cdot)$ being the vectorization operator [16]. Then, the generic parameter update for iterative OSASI algorithms reads

$$\tilde{\mathbf{h}}(m) = \tilde{\mathbf{h}}(m-1) + \Delta\tilde{\mathbf{h}}(m) \quad (6)$$

with $\Delta\tilde{\mathbf{h}}(m)$ denoting the update term. Note that in the following the block-dependency $m$ of the parameters $\tilde{\mathbf{h}}(m)$ is omitted if possible for notational convenience.
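As a minimal illustration of the signal model of Eqs. (1)–(4), the following numpy sketch builds the stacked input vector and one MIMO output sample; the function names and the toy dimensions are ours, not part of the paper.

```python
import numpy as np

def stacked_input(x_sig, n, L):
    """Stack the L most recent samples of each of the P input
    signals into x(n) of Eq. (2); x_sig has shape (P, n_samples)."""
    P = x_sig.shape[0]
    # x_p(n) = (x_p(n), ..., x_p(n-L+1))^T, Eq. (3)
    return np.concatenate([x_sig[p, n - L + 1:n + 1][::-1] for p in range(P)])

def mimo_output(H_hat, x_n):
    """One output sample y_hat(n) = H_hat^T x(n), Eq. (1);
    H_hat has shape (P*L, Q)."""
    return H_hat.T @ x_n

# toy example: P = 2 inputs, Q = 2 outputs, filters of length L = 4
rng = np.random.default_rng(0)
P, Q, L = 2, 2, 4
H_hat = rng.standard_normal((P * L, Q))
x_sig = rng.standard_normal((P, 32))

x_n = stacked_input(x_sig, n=10, L=L)      # shape (P*L,)
y_n = mimo_output(H_hat, x_n)              # shape (Q,)
assert x_n.shape == (P * L,) and y_n.shape == (Q,)
```

Stacking the $P$ delay lines into one vector is what makes the update of Eq. (6) a single operation in $\mathbb{R}^{R}$ with $R = PLQ$.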
3. LOCAL AFFINE SUBSPACE MODELS
As discussed in Sec. 1, the latent FIR coefficient vectors often populate only a structured subset of the high-dimensional space $\mathbb{R}^R$ of adaptive filter parameters [11], which leads to the assumption of a low-dimensional manifold that can be learned in advance from a set of $G$ training data samples $\tilde{\mathbf{h}}_g$ with $g = 1, \dots, G$.

With the assumption of manifolds being locally Euclidean [13], the coefficient vector manifold can be approximated by patches of locally tangential hyperplanes $\mathcal{M}_i$, as illustrated exemplarily in Fig. 1 for $R = 3$. Each tangential hyperplane $\mathcal{M}_i$ describes a local approximation of the manifold. This motivates the idea of confining the FIR coefficient vectors $\tilde{\mathbf{h}}$ to a union

$$\mathcal{M}_{\mathrm{loc}} = \mathcal{M}_1 \cup \dots \cup \mathcal{M}_I \quad (7)$$

of $I$ affine subspaces $\mathcal{M}_i := \{\bar{\mathbf{h}}_i + \mathbf{V}_i \boldsymbol{\beta}_i \,|\, \boldsymbol{\beta}_i \in \mathbb{R}^{D_i}\}$ of dimension $D_i$. Each subspace $\mathcal{M}_i$ is defined by its offset $\bar{\mathbf{h}}_i$ and its basis matrix $\mathbf{V}_i \in \mathbb{R}^{R \times D_i}$.

Fig. 1: Local tangential hyperplane approximation of the FIR coefficient vector manifold for $R = 3$.

While estimating the offset and the basis of a single global affine subspace, i.e., $I = 1$, by, e.g., PCA, is straightforward, it is not obvious how to learn the parameters of the local models. However, as each affine subspace $\mathcal{M}_i$ denotes a local approximation of the manifold, its parameters can be estimated from the surrounding training data samples. Therefore, we first assign each training data sample $\tilde{\mathbf{h}}_g$ to a specific cluster $\mathcal{U}_i$ by introducing the indicator variable

$$z_{gi} := \begin{cases} 1 & \text{if } \tilde{\mathbf{h}}_g \in \mathcal{U}_i, \\ 0 & \text{if } \tilde{\mathbf{h}}_g \notin \mathcal{U}_i, \end{cases} \quad (8)$$

and then use the clustered data for estimating the model parameters. The mean and covariance matrix of the respective RIR cluster $\mathcal{U}_i$ can be estimated by

$$\bar{\mathbf{h}}_i = \frac{1}{G_i} \sum_{g=1}^{G} z_{gi}\, \tilde{\mathbf{h}}_g \quad (9)$$

$$\mathbf{C}_i = \frac{1}{G_i - 1} \sum_{g=1}^{G} z_{gi} \left[ (\tilde{\mathbf{h}}_g - \bar{\mathbf{h}}_i)(\tilde{\mathbf{h}}_g - \bar{\mathbf{h}}_i)^{\mathrm{T}} \right] \quad (10)$$

with $G_i = \sum_{g=1}^{G} z_{gi}$.
A local basis matrix $\mathbf{V}_i$ can be computed from, e.g., the eigenvectors $\mathbf{u}_{ir}$ corresponding to the largest eigenvalues $d_{ir}$ of the parameter covariance matrix $\mathbf{C}_i$. Note that one is by no means limited to PCA for extracting the model parameters and can resort to any other algorithm for estimating a linear representation [17]. Due to the broadband definition of the filter parameters in Eq. (4), the covariance matrix $\mathbf{C}_i$ describes, in addition to the correlation of different taps of one FIR filter $\hat{\mathbf{h}}_{pq}$, also the correlation between different FIR filters. Note that $I = 1$ denotes the special case of dimension reduction by a single PCA which assumes globally-correlated FIR coefficient vectors, i.e., strong correlation between all RIR samples used as training data. The local affine subspace model relaxes this assumption by requiring only a local correlation, i.e., only subsets of the RIR training data are assumed to be correlated.

In [9] it was assumed that the clusters represent local source position variations and the assignment of the samples was given by oracle knowledge. As this oracle knowledge cannot be assumed in general, and the resulting assignment is by no means guaranteed to be optimum, we suggest to learn the assignment blindly from the data by unsupervised K-means clustering [18], which employs a Euclidean affinity measure that can only be assumed to be meaningful in a local neighbourhood of the samples.

4. LOCAL PROJECTION-BASED UPDATE DENOISING

In the previous section we have introduced the union of $I$ affine subspace models as a low-dimensional approximation of the parameter space of RIR coefficient vectors. Now we will describe how to exploit this knowledge to make the general OSASI update of the form (6) more robust against noise.
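The offline learning stage of Sec. 3 (unsupervised clustering followed by per-cluster PCA, cf. Eqs. (8)–(10)) could be sketched in numpy as below. This is only an illustrative sketch: a plain Lloyd-type K-means with a deterministic initialization stands in for the K-means variant of the paper, and all function names and toy dimensions are ours.

```python
import numpy as np

def kmeans(samples, I, n_iter=50):
    """Plain Lloyd K-means with a deterministic init (evenly spaced
    training samples); assigns each RIR vector to one of I clusters
    via the Euclidean distance, yielding z_gi of Eq. (8)."""
    centers = samples[np.linspace(0, len(samples) - 1, I).astype(int)]
    for _ in range(n_iter):
        d = np.linalg.norm(samples[:, None, :] - centers[None], axis=-1)
        z = d.argmin(axis=1)                       # hard cluster assignment
        centers = np.stack([samples[z == i].mean(axis=0) for i in range(I)])
    return z

def local_models(samples, z, I, D):
    """Per-cluster offset (Eq. (9)), covariance (Eq. (10)) and
    D-dimensional PCA basis V_i (leading eigenvectors of C_i)."""
    models = []
    for i in range(I):
        cluster = samples[z == i]
        h_bar = cluster.mean(axis=0)
        C = np.cov(cluster.T)                      # unbiased, as in Eq. (10)
        w, U = np.linalg.eigh(C)                   # ascending eigenvalues
        V = U[:, ::-1][:, :D]                      # D dominant eigenvectors
        models.append((h_bar, C, V))
    return models

# toy data: G = 60 "RIR" vectors of dimension R = 8 around two offsets
rng = np.random.default_rng(1)
G, R, I, D = 60, 8, 2, 2
samples = np.concatenate([rng.normal(0.0, 0.1, (G // 2, R)),
                          rng.normal(5.0, 0.1, (G // 2, R))])
z = kmeans(samples, I)
models = local_models(samples, z, I, D)
assert len(models) == I and models[0][2].shape == (R, D)
```

Since both the offsets and the bases depend only on training data, this whole stage runs once, before any online adaptation.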
The proposed algorithm is inspired by the theory of manifold optimization, e.g., [19], in which the main idea is to exploit prior knowledge about the structure of the parameter space, e.g., matrix properties, by computing the steepest descent direction with respect to the metric defined by the manifold.

4.1. Model Selection by Evidence Maximization

A powerful method for model selection is given by the evidence maximization framework [20, 21]. It suggests to employ the likelihood of each model,

$$p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) = \int p(\mathbf{Y}(m)\,|\,\tilde{\mathbf{h}}, \mathcal{M}_i)\, p(\tilde{\mathbf{h}}\,|\,\mathcal{M}_i)\, \mathrm{d}\tilde{\mathbf{h}}, \quad (11)$$

given by the evidence of the observations, as selection criterion. By assuming i.i.d. observations $\mathbf{y}(n)$, the evidence of block $m$ is defined by

$$p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) := \prod_{n=mL-L+1}^{mL} p(\mathbf{y}(n)\,|\,\mathcal{M}_i). \quad (12)$$

Note that the assumption of i.i.d. observations is only a simplifying modelling assumption and its validity depends on the statistical properties of the excitation signal and the system. If we assume a linear Gaussian model for the likelihood [22],

$$p(\mathbf{y}(n)\,|\,\tilde{\mathbf{h}}, \mathcal{M}_i) = p(\mathbf{y}(n)\,|\,\tilde{\mathbf{h}}) = \mathcal{N}\!\left(\mathbf{y}(n)\,\big|\,\tilde{\mathbf{X}}^{\mathrm{T}}(n)\tilde{\mathbf{h}},\, \boldsymbol{\Lambda}\right), \quad (13)$$

which is independent of the model $\mathcal{M}_i$, and further assume a Gaussian prior for each model $\mathcal{M}_i$,

$$p(\tilde{\mathbf{h}}\,|\,\mathcal{M}_i) = \mathcal{N}\!\left(\tilde{\mathbf{h}}\,\big|\,\bar{\mathbf{h}}_i, \mathbf{C}_i\right), \quad (14)$$

the sample-wise evidence is given by [20]

$$p(\mathbf{y}(n)\,|\,\mathcal{M}_i) = \mathcal{N}\!\left(\mathbf{y}(n)\,\big|\,\tilde{\mathbf{X}}^{\mathrm{T}}(n)\bar{\mathbf{h}}_i,\, \mathbf{R}_i(n)\right) \quad (15)$$

with covariance matrix

$$\mathbf{R}_i(n) = \boldsymbol{\Lambda} + \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{C}_i\, \tilde{\mathbf{X}}(n). \quad (16)$$

We introduced here the input signal matrix $\tilde{\mathbf{X}}^{\mathrm{T}}(n) = \mathbf{x}^{\mathrm{T}}(n) \otimes \mathbf{I}_Q \in \mathbb{R}^{Q \times R}$, with $\otimes$ denoting the Kronecker product and $\mathbf{I}_Q \in \mathbb{R}^{Q \times Q}$ being the identity matrix, and the observation noise covariance matrix $\boldsymbol{\Lambda} \in \mathbb{R}^{Q \times Q}$. Instead of employing the logarithmic evidence $\log p(\mathbf{Y}(m)\,|\,\mathcal{M}_i)$ of block $m$ as objective function for model selection, we suggest to use the recursive average evidence estimator

$$\hat{E}_i(m) = \lambda\, \hat{E}_i(m-1) + (1-\lambda) \log p(\mathbf{Y}(m)\,|\,\mathcal{M}_i) \quad (17)$$

to reflect the smooth trajectories on the manifolds caused by RIR changes. The recursive averaging factor $\lambda \in [0, 1)$ in Eq. (17) models an exponential weighting of temporally preceding observations and needs to be chosen according to the time-variance of the RIR. Finally, the optimum model index $i^*(m)$ at block index $m$ is computed by

$$i^*(m) = \underset{i=1,\dots,I}{\mathrm{argmax}}\ \hat{E}_i(m). \quad (18)$$

We will now aim at interpreting the logarithmic evidence

$$\log p(\mathbf{y}(n)\,|\,\mathcal{M}_i) \overset{c}{=} -\frac{1}{2}\left( \log \det \mathbf{R}_i(n) + \bar{\mathbf{e}}_i^{\mathrm{T}}(n)\, \mathbf{R}_i^{-1}(n)\, \bar{\mathbf{e}}_i(n) \right) \quad (19)$$

of the observed sample $\mathbf{y}(n)$ given the model $\mathcal{M}_i$, with the estimated average observation error

$$\bar{\mathbf{e}}_i(n) = \mathbf{y}(n) - \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \bar{\mathbf{h}}_i \quad (20)$$

and $\overset{c}{=}$ denoting equality up to a constant term. As expected for evidence-based model selection [20, 21], Eq. (19) consists of two terms which trade model complexity, described by $\log \det \mathbf{R}_i(n)$, against data fitting, described by $\bar{\mathbf{e}}_i^{\mathrm{T}}(n)\, \mathbf{R}_i^{-1}(n)\, \bar{\mathbf{e}}_i(n)$. By additionally assuming uncorrelated observations $\mathbf{y}(n)$, the logarithmic evidence (19) reduces to a sum of channel-wise measures

$$\log p(\mathbf{y}(n)\,|\,\mathcal{M}_i) \overset{c}{=} -\frac{1}{2} \sum_{q=1}^{Q} \left( \log r_{iq}(n) + \frac{\bar{e}_{iq}^2(n)}{r_{iq}(n)} \right). \quad (21)$$

The data-fitting term is given by the weighted sum of the squared average observation errors $\bar{e}_{iq}(n)$ of model $\mathcal{M}_i$. As the diagonal terms $r_{iq}(n)$ of the covariance matrix (see Eq. (16)) denote an estimate of the observation power, we can interpret the data-fitting term as a sum of the channel-dependent instantaneous inverse Echo Return Loss Enhancement (ERLE) performance measures [23] which are well-known in AEC. Thus, the logarithmic evidence (19) can be seen as an extension of the data-fitting ERLE performance measure which additionally penalizes complex models.

4.2. Efficient Low-Rank Evidence Approximation

As the direct evaluation of the logarithmic evidence by Eq. (19) is computationally demanding, we will now introduce an efficient approximation based on the low-dimensionality assumption of the subspaces. Therefore, we insert the Eigenvalue Decomposition (EVD) of the prior covariance matrix $\mathbf{C}_i = \mathbf{U}_i \mathbf{D}_i \mathbf{U}_i^{\mathrm{T}}$ of model $\mathcal{M}_i$ into the second term of the evidence covariance matrix computation (16):

$$\tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{C}_i\, \tilde{\mathbf{X}}(n) = \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \mathbf{U}_i \mathbf{D}_i^{1/2} \mathbf{D}_i^{1/2} \mathbf{U}_i^{\mathrm{T}}\, \tilde{\mathbf{X}}(n) \quad (22)$$
$$= \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \check{\mathbf{U}}_i \check{\mathbf{U}}_i^{\mathrm{T}}\, \tilde{\mathbf{X}}(n) \quad (23)$$
$$= \sum_{r=1}^{R} \check{\mathbf{y}}_{ir}(n)\, \check{\mathbf{y}}_{ir}^{\mathrm{T}}(n), \quad (24)$$

which shows that it can be computed by a sum of outer products. The existence of the matrix square root is guaranteed due to the symmetry and positive semi-definiteness of the covariance matrix. Each vector $\check{\mathbf{y}}_{ir}(n)$ of the sum is computed by a multiplication of the input signal matrix with a scaled eigenvector $\check{\mathbf{u}}_{ir} = \sqrt{d_{ir}}\, \mathbf{u}_{ir}$ of the prior covariance matrix $\mathbf{C}_i$. As each matrix-vector product $\check{\mathbf{y}}_{ir}(n) = \tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \check{\mathbf{u}}_{ir}$ corresponds to a linear convolution of the input signals with a scaled eigenvector, i.e., an eigenfilter, it can be efficiently computed by an overlap-save block processing structure. The latter also holds for the computation of the estimated average observation $\tilde{\mathbf{X}}^{\mathrm{T}}(n)\, \bar{\mathbf{h}}_i$ (see Eq. (15)). Furthermore, as we originally assumed the existence of a lower-dimensional subspace (see Sec. 3), the ordered eigenvalues $d_{ir}$ with $r = 1, \dots, R$ of the covariance matrix $\mathbf{C}_i$ are assumed to exhibit a pronounced decay in magnitude. Hence, it is reasonable to approximate Eq. (24) by the $K_i \le D_i$ largest terms corresponding to the dominant eigenvalues. Note that often $K_i$ can be chosen much smaller than $D_i$, i.e., $K_i \ll D_i$, as the first $K_i$ eigenfilters provide sufficient discrimination for model selection. This allows for computationally efficient low-rank evidence approximations.
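The model-selection path above (sample-wise evidence of Eq. (15)/(19), low-rank approximation of the evidence covariance via the dominant eigenfilters, recursive smoothing by Eq. (17) and the argmax of Eq. (18)) might be sketched in numpy as follows. All function names are ours, and a direct matrix evaluation stands in for the overlap-save convolutions used in the paper.

```python
import numpy as np

def lowrank_evidence_cov(X, U, d, Lam, K):
    """R_i(n) of Eq. (16), with the second term approximated by the
    K outer products of Eq. (24): y_ir = X^T (sqrt(d_ir) u_ir)."""
    R_n = Lam.copy()
    for r in range(K):                       # dominant eigenpairs first
        y_r = X.T @ (np.sqrt(d[r]) * U[:, r])
        R_n += np.outer(y_r, y_r)
    return R_n

def log_evidence(y, X, h_bar, U, d, Lam, K):
    """Sample-wise log-evidence of Eq. (19) up to a constant."""
    R_n = lowrank_evidence_cov(X, U, d, Lam, K)
    e = y - X.T @ h_bar                      # average error, Eq. (20)
    sign, logdet = np.linalg.slogdet(R_n)
    return -0.5 * (logdet + e @ np.linalg.solve(R_n, e))

def select_model(E_hat, block_log_ev, lam=0.9):
    """Recursive smoothing (Eq. (17)) and selection (Eq. (18))."""
    E_hat = lam * E_hat + (1.0 - lam) * block_log_ev
    return E_hat, int(np.argmax(E_hat))

# toy example: two models with identical covariance; the second one
# matches the true offset and hence attains the larger evidence
rng = np.random.default_rng(0)
Rdim, Q, K = 8, 2, 3
X = rng.standard_normal((Rdim, Q))
h_true = np.ones(Rdim)
y = X.T @ h_true
U, _ = np.linalg.qr(rng.standard_normal((Rdim, Rdim)))
d = np.array([1.0, 0.5, 0.25] + [0.0] * (Rdim - 3))   # fast eigenvalue decay
Lam = 0.1 * np.eye(Q)
ev = np.array([log_evidence(y, X, hb, U, d, Lam, K)
               for hb in (np.zeros(Rdim), h_true)])
E_hat, i_star = select_model(np.zeros(2), ev)
assert i_star == 1
```

Note how the selection only involves the prior offsets and eigenfilters, never the current adaptive filter estimate, which is the independence property stressed in Sec. 1.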
4.3. Projection-Based Update Denoising

As each sub-model $\mathcal{M}_i$ denotes an affine subspace of $\mathbb{R}^R$, the parameter vector $\tilde{\mathbf{h}}_i^{\mathrm{p}}$ resulting from orthogonal projection onto $\mathcal{M}_i$ reads (see, e.g., [24])

$$\tilde{\mathbf{h}}_i^{\mathrm{p}} = \bar{\mathbf{h}}_i + \mathbf{P}_i \left( \tilde{\mathbf{h}} - \bar{\mathbf{h}}_i \right) \quad (25)$$

with the rank-deficient projection matrix

$$\mathbf{P}_i = \mathbf{V}_i (\mathbf{V}_i^{\mathrm{T}} \mathbf{V}_i)^{-1} \mathbf{V}_i^{\mathrm{T}}. \quad (26)$$

Note that the projection matrix $\mathbf{P}_i$ depends only on the training data and can thus be computed a priori.

Alg. 1 gives a detailed description of the proposed Local Projection-based Update Denoising (LPUD) for OSASI. For each block of observations, indexed by $m$, the evidence estimates of all models $\mathcal{M}_i$ are updated by Eq. (17). Hereby, the evidence $p(\mathbf{Y}(m)\,|\,\mathcal{M}_i)$ of block $m$, given model $\mathcal{M}_i$, is efficiently computed by overlap-save processing and the low-rank evidence approximation derived in Sec. 4.2. If the optimum model index $i^*(m)$ has changed relative to the previous block, the previous parameter estimate $\tilde{\mathbf{h}}(m-1)$ is projected onto the optimum affine subspace $\mathcal{M}_{i^*(m)}$ by Eq. (25). This ensures that the updated FIR estimate will be confined to $\mathcal{M}_{\mathrm{loc}}$. Subsequently, the parameter update $\Delta\tilde{\mathbf{h}}(m)$ is computed by a suitable OSASI algorithm and projected onto the optimum affine subspace by multiplication with the projection matrix $\mathbf{P}_{i^*(m)}$ (see Eq. (26)). Finally, the projected update is used for optimizing the adaptive filter coefficient vector (see Eq. (6)).

Algorithm 1
OSASI by LPUD
for m = 1, ..., M do
    Update the evidences of all I models by Eq. (17)
    Compute the optimum model M_{i*(m)} by Eq. (18)
    if i*(m) ≠ i*(m−1) then
        Project h̃(m−1) onto the optimum affine subspace by Eq. (25)
    end if
    Compute the parameter update Δh̃(m)
    Project the parameter update: Δh̃(m) ← P_{i*(m)} Δh̃(m)
    Update the FIR coefficients: h̃(m) ← h̃(m−1) + Δh̃(m)
end for

5. EXPERIMENTS

In this section we will evaluate the proposed LPUD algorithm in a simulated environment with respect to its performance in noisy scenarios. Therefore, we consider an acoustic system identification scenario with Q = 2 microphones of cm spacing and a single source, i.e., P = 1, located on a sector of a sphere with a radius of . m, an azimuth angle range θ ∈ [30°, °] and an elevation angle range φ ∈ [−°, °]. All PQ RIRs h_pq have been simulated according to the image method [25, 26] with maximum reflection order for a room of dimension [6, , .] m with a reverberation time of T = 0. s, a sampling frequency of f_s = 8 kHz and an RIR length of W = 4096 samples. The observed microphone signals have been sampled from the Gaussian density y(n) ∼ N(d(n), Λ) with d(n) = H^T x(n) ∈ R^Q denoting the true source image at the microphones and H being the acoustic transmission matrix which includes the true RIRs h_pq analogously to Eq. (4). The noise covariance matrix Λ is a scaled identity matrix with the scale factor determined by the SNR.

For assessing the performance of the proposed algorithm, we introduce the signal-dependent average ERLE measure

$$\mathrm{ERLE} = \frac{1}{(N_2 - N_1 + 1)\,Q} \sum_{n=N_1}^{N_2} \sum_{q=1}^{Q} \frac{d_q^2(n)}{\left(d_q(n) - \hat{y}_q(n)\right)^2} \quad (27)$$

and the signal-independent average system mismatch

$$\Upsilon = \frac{1}{M_2 - M_1 + 1} \sum_{m=M_1}^{M_2} \Upsilon(m) \quad (28)$$

which is computed by the temporal average of the block-dependent system mismatch

$$\Upsilon(m) = \frac{1}{PQ} \sum_{p,q=1}^{P,Q} \frac{\|\mathbf{h}_{pq} - \hat{\mathbf{h}}_{pq}(m)\|^2}{\|\mathbf{h}_{pq}\|^2}. \quad (29)$$

Note that, as the adaptive filter length L is usually much smaller than the true filter length W of the physical system to be modelled, we only use the first L taps of h_pq to obtain an estimate of the attainable system mismatch. The observed signal that is caused by the remaining W − L taps of the true RIR acts as an error in the introduced signal model Eq. (1) and results in an upper bound for the signal-dependent ERLE measure. It corresponds to the excess error in statistically optimum filtering [1].

As pointed out in Sec. 4, the presented method is not tied to any specific OSASI algorithm. In this paper we employ, as a fast-converging state-of-the-art algorithm, the Generalized Frequency-Domain Adaptive Filter (GFDAF) [3] which represents a computationally efficient optimization of the well-known block-recursive least-squares cost function in the frequency domain. For Single-Input Single-Output OSASI applications the GFDAF is equivalent to the popular FDAF [1] with a recursive power spectral density (PSD) estimation and an additional data-dependent dynamical regularization. We use a filter length of L = 1024 and no block overlap, a constant step size of µ = 1, a recursive PSD averaging factor of ν = 0. and the dynamical regularization parameters δ_max = δ = 1. Note that for stationary noise and non-stationary excitation signals, e.g., speech, VSSS is still beneficial due to the time-varying SNR.

In the following we will evaluate the proposed LPUD algorithm against two baselines, i.e., the raw GFDAF and a Global Projection-based Update Denoising (GPUD). The GPUD algorithm is a special case of the LPUD with I = 1. The training data for learning the model consisted of G = 5000 RIRs which were simulated according to randomly drawn source positions. The global affine subspace dimension is set to D = 550, which showed good overall performance.
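One block of Algorithm 1, combining the projection of Eqs. (25)/(26) with the evidence-based selection, might be sketched in numpy as below. The update Δh̃(m) would come from any OSASI algorithm (the paper uses the GFDAF); here it is simply passed in, and all names are ours.

```python
import numpy as np

def projection_matrix(V):
    """Rank-deficient projector of Eq. (26)."""
    return V @ np.linalg.solve(V.T @ V, V.T)

def lpud_step(h, dh, models, E_hat, block_log_ev, i_prev, lam=0.9):
    """One block of Alg. 1: update the evidences (Eq. (17)), select
    the model (Eq. (18)), re-project h on a model change (Eq. (25))
    and denoise the update dh by the projector (Eq. (26))."""
    E_hat = lam * E_hat + (1 - lam) * block_log_ev
    i_star = int(np.argmax(E_hat))
    h_bar, P = models[i_star]
    if i_star != i_prev:
        h = h_bar + P @ (h - h_bar)        # Eq. (25)
    h = h + P @ dh                         # projected update, Eq. (6)
    return h, E_hat, i_star

# toy run: a single 2-D affine subspace embedded in R^4
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
h_bar = np.array([0.0, 0.0, 1.0, 1.0])
models = [(h_bar, projection_matrix(V))]
h = np.zeros(4)
h, E_hat, i_star = lpud_step(h, dh=np.ones(4), models=models,
                             E_hat=np.zeros(1), block_log_ev=np.zeros(1),
                             i_prev=-1)
# h stays confined to the affine subspace: the last two taps equal h_bar's
assert np.allclose(h[2:], [1.0, 1.0])
```

Since the projectors and offsets are precomputed from training data, the per-block overhead over the underlying adaptive filter is essentially the matrix-vector products shown here.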
The LPUD algorithm consists of I = 40 clusters of identical local dimension D_i = 50. The cluster assignment was learned by the K-means algorithm [18, 27]. Furthermore, the evidence of each model M_i was approximated by the K_i = 5 most dominant eigenfilters (see Sec. 4.2).

Fig. 2 shows the block-dependent system mismatch Υ(m) of all algorithms for different types of input signals, i.e., stationary White Gaussian Noise (WGN) and speech signals, and an SNR of − dB.

Fig. 2: Block-dependent system mismatch Υ(m) for an SNR of − dB in dependence of the excitation signal type (panels: WGN and speech; axes: log Υ(m) over time in s; curves: GFDAF, GPUD, LPUD).

For each type of input signal we have averaged Υ(m) over independent Monte Carlo experiments which are defined by randomly drawing the source position and the source signals from the respective models. This limits the influence of a specific input signal and source position. As speech source signals we employed different talkers reading out random concatenations of IEEE Harvard sentences [28]. As can be concluded from Fig. 2, all algorithms reach their steady-state estimate after approximately s. While the steady-state performance of the GPUD improves only slightly in comparison to the GFDAF, the LPUD results in a significant improvement for both types of excitation signals. By comparing WGN to speech excitation, we observe that WGN shows consistently approximately dB smaller system mismatch than speech for all algorithms. This reflects the well-known difference in convergence behaviour of adaptive filters caused by the nonstationarity and non-whiteness of speech signals [1, 3, 29]. While for this demanding scenario the state-of-the-art algorithm GFDAF is not capable of achieving a sufficient system identification performance anymore, the proposed LPUD achieves an average system mismatch of − dB after convergence.
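The evaluation measures of Eqs. (27)–(29) can be transcribed directly; the sketch below is ours, and the averaging limits as well as the linear (non-dB) scaling are assumptions about details lost in the extracted formulas.

```python
import numpy as np

def avg_erle(d, y_hat, n1, n2):
    """Average ERLE of Eq. (27): ratio of true source-image power to
    residual power, averaged over samples n1..n2 and the Q channels;
    d and y_hat have shape (Q, n_samples)."""
    d, y_hat = d[:, n1:n2 + 1], y_hat[:, n1:n2 + 1]
    return np.mean(d**2 / (d - y_hat)**2)

def system_mismatch(h_true, h_est):
    """Block system mismatch of Eq. (29): normalized squared filter
    error, averaged over all P*Q channel pairs (stacked in axis 0)."""
    num = np.sum((h_true - h_est)**2, axis=-1)
    den = np.sum(h_true**2, axis=-1)
    return np.mean(num / den)

# toy check: a perfect estimate gives zero mismatch
h_true = np.array([[1.0, 0.5, 0.25], [0.8, 0.4, 0.2]])
assert system_mismatch(h_true, h_true) == 0.0
```

The temporal average Υ of Eq. (28) is then simply the mean of `system_mismatch` over the block range M_1..M_2.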
Additionally, by comparing the initial convergence phases of the algorithms, we observe an almost instantaneous gain of the LPUD which is caused by the projection onto the estimated affine subspace. This results in superior system identification performance even during the early convergence phase, i.e., the first second.

In Fig. 3 we compare the respective algorithms for different SNR levels in terms of average ERLE and system mismatch. The results are averaged over s of WGN excitation and s of speech excitation and independent Monte Carlo experiments. The respective limits of the sums in Eqs. (27) and (28), i.e., N_1, N_2, M_1, M_2, are chosen to divide the signals into two parts of equal length. This allows to assess the Convergence Phase (CP), i.e., the first part, and the Steady State (SS), i.e., the second part, independently.

Fig. 3: Performance evaluation of the various algorithms in dependence of the SNR and the excitation signal type (CP: Convergence Phase, SS: Steady State; panels: log ERLE and log Υ over SNR in dB for WGN and speech; curves: GFDAF, GPUD, LPUD in CP and SS).

As can be concluded from Fig. 3, the proposed LPUD method significantly outperforms the GFDAF for all SNR levels in terms of steady-state performance for both ERLE and system mismatch Υ. This suggests an efficient denoising of the update in low-SNR applications while still preserving a sufficient model flexibility for precise system identification in high-SNR scenarios. Additionally, by comparing the GPUD to the LPUD algorithm, one can observe the advantage of assuming only local linearity compared to the global linear approach which lacks the aforementioned trade-off opportunity. Finally, we observed that the optimum subspace dimensions D_i are strongly related to the respective SNR, which would allow even higher performance improvements by choosing the signal-dependent optimum for each scenario.
6. SUMMARY AND OUTLOOK
In this paper we presented a novel method for improved OSASI in noisy environments by exploiting prior knowledge about the space of RIRs for a given acoustic scenario. The proposed method is based on the projection of the parameter update onto an affine subspace which is selected by a novel computationally efficient approximation of the associated evidence. The benefit of the proposed update denoising for a state-of-the-art OSASI algorithm was corroborated by simulated experiments.

Future research aims at evaluating the benefit of various dictionary learning algorithms in comparison to PCA for estimating the model parameters. Furthermore, probabilistic mixtures of subspace models, e.g., [30], are of interest to improve the unsupervised clustering of the training data in Sec. 3. Finally, an adaptive estimation of the noise variances by, e.g., an Expectation-Maximization (EM) framework, and an adaptive computation of the optimum subspace dimension appear to be promising for non-stationary noise signals.

7. REFERENCES

[1] S. Haykin,
Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, 2002.
[2] P. S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation, Springer, Berlin, Heidelberg, 2007.
[3] H. Buchner, J. Benesty, and W. Kellermann, "Generalized multichannel frequency-domain adaptive filtering: efficient realization and application to hands-free speech communication," Signal Processing, vol. 85, no. 3, pp. 549–570, Mar. 2005.
[4] S. Malik and G. Enzner, "Recursive Bayesian Control of Multichannel Acoustic Echo Cancellation," IEEE Signal Processing Letters, vol. 18, no. 11, pp. 619–622, Nov. 2011.
[5] T. Gänsler, M. Hansson, C.-J. Ivarsson, and G. Salomonsson, "A double-talk detector based on coherence," IEEE Transactions on Communications, vol. 44, no. 11, pp. 1421–1427, Nov. 1996.
[6] J. Benesty, D. R. Morgan, and J. H. Cho, "A new class of doubletalk detectors based on cross-correlation," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, pp. 168–172, Mar. 2000.
[7] S. Malik and G. Enzner, "Online maximum-likelihood learning of time-varying dynamical models in block-frequency-domain," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, USA, Mar. 2010.
[8] M. Fozunbal, T. Kalker, and R. W. Schafer, "Multi-Channel Echo Control by Model Learning," in International Workshop on Acoustic Echo and Noise Control (IWAENC), Seattle, USA, Sept. 2008.
[9] T. Koren, R. Talmon, and I. Cohen, "Supervised system identification based on local PCA models," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, Mar. 2012.
[10] R. Talmon and S. Gannot, "Relative transfer function identification on manifolds for supervised GSC beamformers," in European Conference on Signal Processing (EUSIPCO), Marrakech, Morocco, Sept. 2013.
[11] R. Talmon, I. Cohen, S. Gannot, and R. R. Coifman, "Diffusion Maps for Signal Processing: A Deeper Look at Manifold-Learning Techniques Based on Kernels and Graphs," IEEE Signal Processing Magazine, vol. 30, no. 4, pp. 75–86, July 2013.
[12] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "A Study on Manifolds of Acoustic Responses," in Latent Variable Analysis and Signal Separation (LVA/ICA), Liberec, Czech Republic, Aug. 2015.
[13] L. W. Tu, An Introduction to Manifolds, Universitext, Springer, New York, 2010.
[14] M. M. Sondhi, D. R. Morgan, and J. L. Hall, "Stereophonic acoustic echo cancellation: an overview of the fundamental problem," IEEE Signal Processing Letters, vol. 2, no. 8, pp. 148–151, Aug. 1995.
[15] J. Benesty, D. R. Morgan, and M. M. Sondhi, "A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 156–165, Mar. 1998.
[16] P. J. Dhrymes, Matrix Vectorization, Springer, New York, NY, 2000.
[17] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, New York, NY, 2010.
[18] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, Mar. 1982.
[19] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[20] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, Berlin, Heidelberg, 2007.
[21] J. Ding, V. Tarokh, and Y. Yang, "Model selection techniques: An overview," IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 16–34, 2018.
[22] S. Roweis and Z. Ghahramani, "A unifying review of linear Gaussian models," Neural Computation, vol. 11, no. 2, pp. 305–345, 1999.
[23] G. Enzner, H. Buchner, A. Favrot, and F. Kuech, "Acoustic Echo Control," in Academic Press Library in Signal Processing, vol. 4, pp. 807–877, Elsevier, 2014.
[24] G. Strang, Linear Algebra and its Applications, Thomson Brooks/Cole, Belmont, CA, 2006.
[25] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
[26] E. Habets, "Room Impulse Response Generator," Tech. Rep., Technische Universiteit Eindhoven, Sept. 2010.
[27] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, USA, 2007.
[28] L. M. Panfili, J. Haywood, D. R. McCloy, P. E. Souza, and R. A. Wright, "The UW/NU corpus, version 2.0," https://depts.washington.edu/phonlab/projects/uwnu.php, 2017.
[29] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp, "Acoustic echo control. An application of very-high-order adaptive filters," IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 42–69, 1999.
[30] M. E. Tipping and C. M. Bishop, "Mixtures of Probabilistic Principal Component Analysers," Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.