Publications


Featured research published by Feifei Xiong.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Blind estimation of reverberation time based on spectro-temporal modulation filtering

Feifei Xiong; Stefan Goetze; Bernd T. Meyer

A novel method for blind estimation of the reverberation time (RT60) is proposed, based on applying spectro-temporal modulation filters to time-frequency representations. 2D-Gabor filters arranged in a filterbank enable an analysis of the properties of temporal, spectral, and spectro-temporal filtering for this task. The features are used as input to a multi-layer perceptron (MLP) classifier combined with a simple decision rule that attributes a specific RT60 to a given utterance and allows the reliability of the approach to be assessed for different resolutions of RT60 classification. While the filter set including temporal, spectral, and spectro-temporal filters already outperforms an MFCC baseline, the error rates are further reduced when relying on diagonal spectro-temporal filters alone. The average error rate is 1.9% for the best feature set, which corresponds to a relative reduction of 58.3% compared to the MFCC baseline for RT60 classification at 0.1 s resolution.
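As a rough illustration of the feature-extraction step described above (not the authors' implementation; the kernel size, modulation frequencies, and mean-magnitude pooling are assumptions), a diagonal 2D-Gabor filterbank applied to a log spectrogram could be sketched as:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_2d(size=11, omega_t=0.25, omega_f=0.25, sigma=3.0):
    """Diagonal spectro-temporal Gabor kernel (illustrative parameters)."""
    t = np.arange(size) - size // 2
    T, F = np.meshgrid(t, t)
    envelope = np.exp(-(T**2 + F**2) / (2.0 * sigma**2))
    carrier = np.cos(omega_t * T + omega_f * F)   # diagonal modulation
    kernel = envelope * carrier
    return kernel - kernel.mean()                 # zero-mean (DC-free)

def gabor_features(log_spec, kernels):
    """Filter a log spectrogram with each kernel, then pool the filter
    response magnitude over time, yielding one value per frequency band."""
    feats = [np.mean(np.abs(convolve2d(log_spec, k, mode="same")), axis=1)
             for k in kernels]
    return np.concatenate(feats)

# toy log-mel spectrogram: 23 bands x 100 frames
rng = np.random.default_rng(0)
spec = rng.standard_normal((23, 100))
kernels = [gabor_2d(omega_t=w, omega_f=w) for w in (0.1, 0.25, 0.5)]
features = gabor_features(spec, kernels)   # shape (69,) = 3 kernels x 23 bands
```

In the paper the resulting features feed an MLP classifier over discretized RT60 classes; here only the filterbank front-end is sketched.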


International Conference on e-Health Networking, Applications and Services | 2010

Hands-free telecommunication for elderly persons suffering from hearing deficiencies

Stefan Goetze; Feifei Xiong; Jan Rennies; Thomas Rohdenburg; Jens-E. Appell

Speech communication is the most natural form of human interaction. Communication by means of telephones, mobile phones or video-conference systems is common nowadays, especially amongst younger persons. In recent years, a growing number of elderly people have also started to use communication systems extensively, since more and more people live apart from their relatives, friends or acquaintances. However, elderly people in particular suffer from hearing loss, which often prevents them from using acoustic communication devices. While approximately every second European adult of age 65+ has a hearing loss that requires treatment, only a minority actually wear hearing aids, for various reasons. To tackle this problem, this contribution deals with a personalized and adaptable communication system that enhances the acoustic signal and incorporates the individual hearing loss of a hearing-impaired person. In this way, the typical elderly user is enabled to take part in natural communication again.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Estimating room acoustic parameters for speech recognizer adaptation and combination in reverberant environments

Feifei Xiong; Stefan Goetze; Bernd T. Meyer

This work analyzes the influence of reverberation on automatic speech recognition (ASR) systems and how to compensate for it, with special focus on the important acoustic parameters, i.e., the room reverberation time T60 and the clarity index C50. A multilayer perceptron (MLP) using features of a spectro-temporal filter bank as input is employed to identify the acoustic conditions spanning various reverberant scenarios. The posterior probabilities of the MLP are used to design a novel selection scheme for adaptation in a cluster-based manner and for system combination achieved by recognizer output voting error reduction (ROVER). A comparison of word error rates is performed considering different training modes, and an average relative improvement of 7.1% is obtained by the proposed system compared to conventional multistyle training.
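Full ROVER performs dynamic-programming alignment of the hypotheses and weights votes by confidence; as a heavily simplified sketch of the voting idea alone (assuming pre-aligned, equal-length word sequences, which real output rarely is):

```python
from collections import Counter

def rover_vote(hypotheses):
    """Naive ROVER-style combination: per word position, output the word
    chosen by the most recognizers. Assumes hypotheses are already aligned
    and of equal length (real ROVER aligns them first)."""
    return [Counter(column).most_common(1)[0][0]
            for column in zip(*hypotheses)]

# three toy recognizer outputs for the same utterance
hyps = [["the", "cat", "sat"],
        ["the", "bat", "sat"],
        ["the", "cat", "mat"]]
result = rover_vote(hyps)   # ['the', 'cat', 'sat']
```

Each position is decided independently by majority; with confidence scores available, the vote would instead sum per-word confidences.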


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

A study on joint beamforming and spectral enhancement for robust speech recognition in reverberant environments

Feifei Xiong; Bernd T. Meyer; Stefan Goetze

This work evaluates multi-microphone beamforming and single-microphone spectral enhancement strategies to alleviate the reverberation effect for robust automatic speech recognition (ASR) systems in different reverberant environments characterized by different reverberation times T60 and direct-to-reverberation ratios (DRRs). The systems consist of minimum variance distortionless response (MVDR) beamformers in combination with minimum mean square error (MMSE) estimators, and late reverberation spectral variance (LRSV) estimators, the latter employing a generalized model of the room impulse response (RIR). Various system architectures are analyzed with a focus on optimal speech recognition performance. The system combining an MVDR beamformer and a subsequent MMSE estimator was found to lead to the best results, with relative reductions of 27.7% compared to the baseline system. This is attributed to a more accurate LRSV estimate from spatial averaging and diffuse field refinement for the MMSE estimator.
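The MVDR weight computation itself is standard; as a minimal sketch (assuming the steering vector and a noise covariance estimate are given, with diagonal loading added for robustness, which is a common practical choice rather than something stated in the abstract):

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-6):
    """MVDR beamformer weights: w = R^-1 d / (d^H R^-1 d).
    R: (M, M) Hermitian noise covariance; d: (M,) steering vector."""
    M = R.shape[0]
    # diagonal loading scaled by average channel power for conditioning
    Rl = R + diag_load * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(Rl, d)
    return Rinv_d / np.vdot(d, Rinv_d)   # vdot conjugates first argument

# toy example: 4 microphones, random Hermitian positive-definite covariance
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T + np.eye(4)
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = mvdr_weights(R, d)
# distortionless constraint: w^H d == 1
```

In the paper's best-performing architecture, the beamformer output then passes to the MMSE spectral estimator driven by the LRSV estimate.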


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

System identification for listening-room compensation by means of acoustic echo cancellation and acoustic echo suppression filters

Feifei Xiong; Jens-E. Appell; Stefan Goetze

Subsystems for dereverberation and acoustic echo cancellation (AEC)/acoustic echo suppression (AES) are important components in high-quality hands-free telecommunication systems. This contribution describes and analyzes a combined system for dereverberation and AEC/AES. The system identification inherently achieved by the AEC/AES system is used for the design of the room impulse response (RIR) equalization filter, i.e. the listening-room compensation (LRC) system. We use complex RIR smoothing and a decoupled filtered-X least-mean-squares (dFxLMS) gradient algorithm for LRC, and a combined AEC/AES system for the system identification necessary for the LRC filter design. The performance of the combined system and the mutual influences of LRC and AEC/AES are analyzed.
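The dFxLMS algorithm used in the paper is considerably more involved; as a minimal sketch of the underlying filtered-X LMS update (with the secondary path taken as identity for simplicity, which reduces it to normalized LMS system identification; the step size and filter length are arbitrary):

```python
import numpy as np

def fxlms_step(w, x_buf, d, mu=0.5, eps=1e-8):
    """One normalized filtered-X LMS step. x_buf holds the (secondary-path-
    filtered) reference samples, most recent first; d is the desired sample."""
    y = w @ x_buf                                   # filter output
    e = d - y                                       # error signal
    w = w + mu * e * x_buf / (x_buf @ x_buf + eps)  # normalized update
    return w, e

# identify an unknown 3-tap path from white-noise excitation
rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.1])      # unknown system
w = np.zeros(3)
x = rng.standard_normal(2000)
for n in range(3, len(x)):
    buf = x[n:n-3:-1]               # [x[n], x[n-1], x[n-2]]
    w, e = fxlms_step(w, buf, h @ buf)
# w now approximates h = [0.5, -0.3, 0.1]
```

In the actual LRC setting, the reference is filtered through the RIR estimate obtained from the AEC/AES subsystem before entering the update, which is where the "filtered-X" structure matters.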


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Combination strategy based on relative performance monitoring for multi-stream reverberant speech recognition

Feifei Xiong; Stefan Goetze; Bernd T. Meyer

A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy which operates on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron incorporating auditory-inspired features in order to distinguish between and generalize to various reverberant conditions. The model output is shown to be highly correlated with the relative ASR performance of the streams (relative performance monitoring), in contrast to conventional performance monitoring based on the mean temporal distance of a single stream. Compared to traditional multi-condition training, average relative word error rate improvements of 7.7% and 9.4% are achieved by the proposed combination strategies operating on posteriors and lattices, respectively, when the multi-stream ASR system is tested in known and unknown simulated reverberant environments as well as in realistically recorded conditions taken from the REVERB Challenge evaluation set.
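The conventional baseline mentioned above monitors a single stream via the mean temporal distance of its posteriors. A rough sketch of that idea (the literature typically uses a symmetric KL divergence; cosine distance and the lag range here are simplifying assumptions):

```python
import numpy as np

def mean_temporal_distance(P, lags=range(5, 55, 5)):
    """Average distance between posterior frames t and t+lag.
    P: (frames, classes) row-stochastic posteriors. Confident, varied
    posteriors yield larger distances than flat, uncertain ones."""
    dists = []
    for L in lags:
        A, B = P[:-L], P[L:]
        cos = np.sum(A * B, axis=1) / (
            np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))
        dists.append(np.mean(1.0 - cos))
    return float(np.mean(dists))

rng = np.random.default_rng(0)
flat = np.full((200, 10), 0.1)                            # uncertain stream
peaked = np.eye(10)[rng.integers(0, 10, 200)] * 0.9 + 0.01  # confident stream
score_flat = mean_temporal_distance(flat)      # 0.0: frames never change
score_peaked = mean_temporal_distance(peaked)  # clearly larger
```

The paper's point is that such single-stream scores do not directly compare streams, whereas the room-parameter model predicts their relative performance.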


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

On DNN posterior probability combination in multi-stream speech recognition for reverberant environments

Feifei Xiong; Stefan Goetze; Bernd T. Meyer

A multi-stream framework with deep neural network (DNN) classifiers has been applied in this paper to improve automatic speech recognition (ASR) performance in environments with different reverberation characteristics. We propose a room parameter estimation model to determine the stream weights for DNN posterior probability combination with the aim of obtaining reliable log-likelihoods for decoding. The model is implemented by training a multi-layer perceptron to distinguish between various reverberant environments. The method is tested in known and unknown environments against approaches based on inverse entropy and autoencoders, with average relative word error rate improvements of 46% and 29%, respectively, when performing multi-stream ASR in different reverberant situations.
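One common form of weighted posterior combination (a sketch, not necessarily the paper's exact rule) is a weighted geometric mean in the log domain, with the stream weights supplied by the room parameter estimator:

```python
import numpy as np

def combine_posteriors(streams, weights):
    """Weighted geometric-mean combination of per-stream DNN posteriors,
    computed in the log domain and renormalized per frame.
    streams: list of (frames, classes) row-stochastic arrays."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalize stream weights
    logp = sum(wi * np.log(p + 1e-12) for wi, p in zip(w, streams))
    logp -= logp.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# two toy posterior streams: 8 frames, 5 classes
rng = np.random.default_rng(0)
s1 = rng.dirichlet(np.ones(5), size=8)
s2 = rng.dirichlet(np.ones(5), size=8)
combined = combine_posteriors([s1, s2], weights=[0.7, 0.3])
```

The combined posteriors can then be converted to scaled log-likelihoods for decoding, as in a standard hybrid DNN-HMM pipeline.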


2017 Hands-free Speech Communications and Microphone Arrays (HSCMA) | 2017

Performance comparison of real-time single-channel speech dereverberation algorithms

Feifei Xiong; Bernd T. Meyer; Benjamin Cauchi; Ante Jukic; Simon Doclo; Stefan Goetze

This paper investigates four single-channel speech dereverberation algorithms, i.e., two unsupervised approaches based on (i) spectral enhancement and (ii) linear prediction, as well as two supervised approaches relying on machine learning which incorporate deep neural networks to predict either (iii) the magnitude spectrogram or (iv) the ideal ratio mask. The relative merits of the four algorithms in terms of several objective measures, automatic speech recognition performance, robustness against noise, variations between simulated and recorded reverberant speech, computation time and latency are discussed. Experimental results show that all four algorithms are capable of providing benefits in reverberant environments even with moderate background noise. In addition, their low complexity and latency indicate potential for real-time applications.
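For approach (iv), the DNN's training target is the ideal ratio mask. A sketch of one commonly used IRM definition (the square-root compression and the direct-vs-late power split are assumptions, not taken from this abstract):

```python
import numpy as np

def ideal_ratio_mask(direct_pow, late_pow, beta=0.5):
    """Ideal ratio mask per time-frequency bin from direct-speech power and
    late-reverberation power; beta=0.5 applies square-root compression."""
    return (direct_pow / (direct_pow + late_pow + 1e-12)) ** beta

# toy spectrogram powers: 257 frequency bins x 100 frames
rng = np.random.default_rng(0)
S = rng.random((257, 100))   # direct-speech power (hypothetical)
R = rng.random((257, 100))   # late-reverberation power (hypothetical)
mask = ideal_ratio_mask(S, R)
enhanced_pow = mask**2 * (S + R)   # mask applied to the reverberant power
```

At test time the clean/late split is unknown, so the DNN predicts the mask from reverberant features and the prediction is applied to the observed spectrogram.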


Journal of the Acoustical Society of America | 2016

Joint beamforming and spectral enhancement for robust automatic speech recognition in reverberant environments

Fanuel Melak Asmare; Feifei Xiong; Mathias Bode; Bernard Mayer; Stefan Goetze

This work evaluates multi-microphone beamforming techniques and single-microphone spectral enhancement strategies to alleviate the reverberation effect for robust automatic speech recognition (ASR) systems in different reverberant environments characterized by different reverberation times T60 and direct-to-reverberation ratios (DRRs). The systems under test consist of minimum variance distortionless response (MVDR) beamformers in combination with minimum mean square error (MMSE) estimators. For the latter, reliable late reverberation spectral variance (LRSV) estimation employing a generalized model of the room impulse response (RIR) is crucial. Based on the generalized RIR model which separates the direct path from the remaining RIR, two different frequency resolutions in the short time Fourier transform (STFT) domain are evaluated, referred to as short- and long-term, to effectively estimate the direct signal. Regarding the fusion between the MVDR beamformer and the MMSE estimator, the LRSV estimator...
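The generalized RIR model in the paper separates the direct path from the rest of the RIR; as a sketch of the simpler exponential-decay (Lebart-style) LRSV estimator such models build on (frame shift, delay, and sampling rate here are assumed illustration values):

```python
import numpy as np

def lrsv_estimate(rev_psd, T60, fs=16000, hop=256, delay_frames=4):
    """Late-reverberation spectral variance from an exponential-decay RIR
    model: the reverberant PSD, delayed by `delay_frames` and attenuated by
    the RIR energy decay over that delay.
    rev_psd: (frames, bins) short-time PSD of the reverberant signal."""
    alpha = 3.0 * np.log(10.0) / T60          # amplitude decay rate (1/s)
    delay_sec = delay_frames * hop / fs
    decay = np.exp(-2.0 * alpha * delay_sec)  # energy decay over the delay
    out = np.zeros_like(rev_psd)
    out[delay_frames:] = decay * rev_psd[:-delay_frames]
    return out

# toy PSD: 50 frames x 257 bins, unit power
psd = np.ones((50, 257))
late_psd = lrsv_estimate(psd, T60=0.5)
```

This late-reverberation PSD estimate is what drives the MMSE gain in the enhancement stage.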


EURASIP Journal on Advances in Signal Processing | 2015

Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features

Feifei Xiong; Bernd T. Meyer; Niko Moritz; Robert Rehr; Jörn Anemüller; Timo Gerkmann; Simon Doclo; Stefan Goetze

Collaboration


Dive into Feifei Xiong's collaboration.

Top Co-Authors

Simon Doclo
University of Oldenburg

Ante Jukic
University of Oldenburg

Robert Rehr
University of Oldenburg