Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dorothea Kolossa is active.

Publication


Featured research published by Dorothea Kolossa.


International Conference on Independent Component Analysis and Signal Separation | 2004

Nonlinear Postprocessing for Blind Speech Separation

Dorothea Kolossa; Reinhold Orglmeister

Frequency-domain ICA has been used successfully to separate the utterances of interfering speakers in convolutive environments, see e.g. [6], [7]. Improved separation results can be obtained by applying a time-frequency mask to the ICA outputs. After the direction-of-arrival information has been used for permutation correction, the time-frequency mask is obtained with little computational effort. The proposed postprocessing is applied in conjunction with two frequency-domain ICA methods and a beamforming algorithm, and it increases separation performance for reverberant as well as in-car speech recordings by an average of 3.8 dB. By combining ICA and time-frequency masking, SNR improvements of up to 15 dB are obtained in the car environment. Owing to its robustness both to the environment and to the employed ICA algorithm, time-frequency masking appears to be a good choice for enhancing the output of convolutive ICA algorithms at marginal computational cost.
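
To make the idea concrete, here is a minimal sketch of a soft time-frequency mask applied to frequency-domain ICA outputs; the mask exponent, the flooring and the array shapes are illustrative assumptions, not the paper's exact design:

import numpy as np

def tf_mask_postprocess(ica_outputs, p=2.0, floor=0.1):
    # ica_outputs: complex STFTs of the separated sources after
    # permutation correction, shape (n_sources, n_freq, n_frames).
    power = np.abs(ica_outputs) ** p
    # Each source keeps a bin in proportion to its share of the
    # total power there; competing sources are attenuated.
    mask = power / (power.sum(axis=0, keepdims=True) + 1e-12)
    mask = np.maximum(mask, floor)  # flooring limits musical noise
    return mask * ica_outputs

# Random data standing in for two separated source STFTs:
rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 257, 100)) + 1j * rng.standard_normal((2, 257, 100))
Y_masked = tf_mask_postprocess(Y)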


Workshop on Applications of Signal Processing to Audio and Acoustics | 2005

Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques

Dorothea Kolossa; Aleksander Klimas; Reinhold Orglmeister

Time-frequency masking has emerged as a powerful technique for source separation of noisy and convolved speech mixtures. It has also been applied successfully to noisy speech recognition. But while significant SNR gains are possible with adequate masking functions, speech recognition performance suffers from the involved nonlinear operations, so that the greatly improved SNR often contrasts with only slight improvements in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition, but they require speech recognition and source separation to be carried out in the same domain. However, source separation and denoising are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are, e.g., mel-frequency cepstral coefficients (MFCCs), LPC-cepstral coefficients and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested, which estimates sufficient statistics for speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to, e.g., the MFCCs of a speech recognition system, and uses the estimated statistics for missing-data speech recognition. With this approach, significant gains can be achieved in speech recognition rates; in this context, time-frequency masking yields recognition-rate improvements of more than 35% when compared to TF-masking-based source separation.
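
The central propagation step can be sketched as follows: linear transforms (mel filterbank, DCT) map means and variances exactly, while the log nonlinearity is handled here by a first-order Taylor approximation. The diagonal-covariance assumption and all shapes are illustrative, not the paper's exact scheme:

import numpy as np

def propagate_to_mfcc(mu_spec, var_spec, mel_fb, dct_mat):
    # mu_spec, var_spec: mean and variance of the power spectrum,
    #                    shape (n_freq, n_frames).
    # mel_fb:  mel filterbank, shape (n_mel, n_freq).
    # dct_mat: DCT matrix, shape (n_ceps, n_mel).
    # Linear maps transform means exactly; variances too, if the
    # components are assumed independent (diagonal covariance).
    mu_mel = mel_fb @ mu_spec
    var_mel = (mel_fb ** 2) @ var_spec
    # First-order Taylor expansion of log() around the mean:
    mu_log = np.log(mu_mel + 1e-12)
    var_log = var_mel / (mu_mel + 1e-12) ** 2  # (d log x / dx)^2 = 1/x^2
    return dct_mat @ mu_log, (dct_mat ** 2) @ var_log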


Book | 2011

Robust Speech Recognition of Uncertain or Missing Data

Dorothea Kolossa; Reinhold Häb-Umbach

Automatic speech recognition suffers from a lack of robustness with respect to noise, reverberation and interfering speech. The growing field of speech recognition in the presence of missing or uncertain input data seeks to ameliorate those problems by using not only a preprocessed speech signal but also an estimate of its reliability to selectively focus on those segments and features that are most reliable for recognition. This book presents the state of the art in recognition in the presence of uncertainty, offering examples that utilize uncertainty information for noise robustness, reverberation robustness, simultaneous recognition of multiple speech signals, and audiovisual speech recognition. The book is appropriate for scientists and researchers in the field of speech recognition who will find an overview of the state of the art in robust speech recognition, professionals working in speech recognition who will find strategies for improving recognition results in various conditions of mismatch, and lecturers of advanced courses on speech processing or speech recognition who will find a reference and a comprehensive introduction to the field. The book assumes an understanding of the fundamentals of speech recognition using Hidden Markov Models.


EURASIP Journal on Audio, Speech, and Music Processing | 2010

Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Dorothea Kolossa; Ramón Fernández Astudillo; Eugen Hoffmann; Reinhold Orglmeister

When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs in order to reduce the remaining interference. To improve robustness to the artefacts and loss of information caused by this process, recognition can be greatly enhanced by considering the processed speech feature vector as a random variable with time-varying uncertainty, rather than as deterministic. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing together with uncertainty-based decoding techniques.
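
A common way to use such time-varying uncertainty during decoding, sketched under the standard assumption of diagonal Gaussians, is to add the feature uncertainty to the acoustic model's variances when scoring a state; this is one well-known observation-uncertainty rule, not necessarily the exact decoder of the paper:

import numpy as np

def uncertain_loglik(x, var_x, mu, var):
    # x, var_x: observed feature vector and its per-dimension
    #           uncertainty; mu, var: Gaussian state model parameters.
    # The feature uncertainty simply inflates the model variance.
    v = var + var_x
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (x - mu) ** 2 / v)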


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Learning dynamic stream weights for coupled-HMM-based audio-visual speech recognition

Ahmed Hussen Abdelaziz; Steffen Zeiler; Dorothea Kolossa

With the increasing use of multimedia data in communication technologies, the idea of employing visual information in automatic speech recognition (ASR) has recently gathered momentum. In conjunction with the acoustical information, the visual data enhances the recognition performance and improves the robustness of ASR systems in noisy and reverberant environments. In audio-visual systems, dynamic weighting of audio and video streams according to their instantaneous confidence is essential for reliably and systematically achieving high performance. In this paper, we present a complete framework that allows blind estimation of dynamic stream weights for audio-visual speech recognition based on coupled hidden Markov models (CHMMs). As a stream weight estimator, we consider using multilayer perceptrons and logistic functions to map multidimensional reliability measure features to audiovisual stream weights. Training the parameters of the stream weight estimator requires numerous input-output tuples of reliability measure features and their corresponding stream weights. We estimate these stream weights based on oracle knowledge using an expectation maximization algorithm. We define 31-dimensional feature vectors that combine model-based and signal-based reliability measures as inputs to the stream weight estimator. During decoding, the trained stream weight estimator is used to blindly estimate stream weights. The entire framework is evaluated using the Grid audio-visual corpus and compared to state-of-the-art stream weight estimation strategies. The proposed framework significantly enhances the performance of the audio-visual ASR system in all examined test conditions.
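
The mapping from reliability measures to a stream weight, and the weighted combination of the two stream scores, can be sketched as follows; the logistic parameterization follows the abstract, while the concrete numbers are placeholders:

import numpy as np

def logistic_stream_weight(reliability, w, b):
    # Map a reliability-measure feature vector to a weight in (0, 1);
    # w and b would be trained from oracle stream weights.
    return 1.0 / (1.0 + np.exp(-(reliability @ w + b)))

def chmm_state_score(logp_audio, logp_video, lam):
    # Weighted log-likelihood combination of the audio and video
    # streams, as commonly used in coupled-HMM audio-visual ASR.
    return lam * logp_audio + (1.0 - lam) * logp_video

# Illustrative use with a 31-dimensional reliability feature vector:
rng = np.random.default_rng(1)
r = rng.standard_normal(31)
w, b = 0.1 * rng.standard_normal(31), 0.0
score = chmm_state_score(-42.0, -57.0, logistic_stream_weight(r, w, b))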


IEEE Journal of Selected Topics in Signal Processing | 2010

An Uncertainty Propagation Approach to Robust ASR Using the ETSI Advanced Front-End

Ramón Fernández Astudillo; Dorothea Kolossa; Philipp Mandelartz; Reinhold Orglmeister

In this paper, we show how uncertainty propagation, combined with observation-uncertainty techniques, can be applied to a realistic implementation of robust distributed speech recognition (DSR) to further improve recognition robustness with little increase in computational complexity. Uncertainty propagation, or error propagation, techniques employ a probabilistic description of speech to reflect the information lost during speech enhancement or source separation in the time or frequency domain. This uncertain description is then propagated through the feature extraction process into the domain of the features used for speech recognition. In this domain, the statistical information can be combined with the statistical parameters of the recognition model by employing observation-uncertainty techniques. We show that the combination of a piecewise uncertainty propagation scheme with front-end uncertainty decoding or modified imputation improves on the baseline of the advanced front-end (AFE), the state-of-the-art algorithm of the European Telecommunications Standards Institute (ETSI), on the AURORA5 database. We compare this method with other observation-uncertainty techniques and show how the use of uncertainty propagation reduces the word error rates without the need for any kind of adaptation to noise using stereo data or iterative parameter estimation.
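
Of the two observation-uncertainty rules named above, modified imputation admits a particularly compact sketch: the decoder replaces the observation by a variance-weighted interpolation between the observation and the model mean. This shows the general idea only; the shapes and the diagonal-covariance assumption are mine:

import numpy as np

def modified_imputation(x, var_x, mu, var):
    # With var_x -> 0 the observation is trusted completely; with
    # var_x -> inf the model mean mu is imputed instead.
    gain = var / (var + var_x)
    return mu + gain * (x - mu)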


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Variational Bayesian inference for multichannel dereverberation and noise reduction

Dominic Schmid; Gerald Enzner; Sarmad Malik; Dorothea Kolossa; Rainer Martin

Room reverberation and background noise severely degrade the quality of hands-free speech communication systems. In this work, we address the problem of combined speech dereverberation and noise reduction using a variational Bayesian (VB) inference approach. Our method relies on a multichannel state-space model for the acoustic channels that combines frame-based observation equations in the frequency domain with a first-order Markov model to describe the time-varying nature of the room impulse responses. By modeling the channels and the source signal as latent random variables, we formulate a lower bound on the log-likelihood function of the model parameters given the observed microphone signals and iteratively maximize it using an online expectation-maximization approach. Our derivation yields update equations to jointly estimate the channel and source posterior distributions and the remaining model parameters. An inspection of the resulting VB algorithm for blind equalization and channel identification (VB-BENCH) reveals that the presented framework includes previously proposed methods as special cases. Finally, we evaluate the performance of our approach in terms of speech quality, adaptation times, and speech recognition results to demonstrate its effectiveness for a wide range of reverberation and noise conditions.
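
In heavily simplified form, the state-space model above can be illustrated by a scalar Kalman filter that tracks a single-tap, first-order Markov channel in one frequency bin; the full VB algorithm additionally infers the source posterior and the noise parameters, which this sketch assumes known, and all constants are placeholders:

import numpy as np

def track_channel(x, y, a=0.999, q=1e-4, r=1e-2):
    # Channel model: h_t = a * h_{t-1} + w_t, with process noise q.
    # Observation:   y_t = h_t * x_t + n_t,  with noise variance r.
    # x: source STFT coefficients in this bin, y: microphone STFT.
    h, p = 0.0 + 0.0j, 1.0
    h_track = np.empty_like(y)
    for t in range(len(y)):
        h, p = a * h, (a ** 2) * p + q          # predict
        s = p * abs(x[t]) ** 2 + r              # innovation variance
        k = p * np.conj(x[t]) / s               # Kalman gain
        h = h + k * (y[t] - h * x[t])           # update channel mean
        p = (1.0 - (k * x[t]).real) * p         # update channel variance
        h_track[t] = h
    return h_track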


International Conference on Acoustics, Speech, and Signal Processing | 2003

Beamforming-based convolutive source separation

Wolf Baumann; Dorothea Kolossa; Reinhold Orglmeister

A robust independent component analysis (ICA) algorithm for blind separation of convolved mixtures of speech signals is introduced. It is based on two parallel frequency-dependent beamforming stages, each of which cancels the signal from one interfering source by frequency-dependent null-beamforming. The zero directions of the beamforming stages are optimized to yield maximally independent outputs, which is achieved via second- and higher-order statistics. Optimization is carried out in the frequency domain for each frequency band separately, so that phase distortions caused by the room impulse responses are compensated. In contrast to other frequency-domain source separation algorithms, this structure does not suffer from the permutation of frequency bands, while retaining the major advantage of blind methods, which do not require an external estimate of the direction of arrival (DOA).
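
For two microphones, a frequency-dependent null beamformer of the kind described above can be sketched as follows; here the zero direction is given explicitly, whereas the paper optimizes it for output independence, and the geometry values are placeholders:

import numpy as np

def null_beamformer(theta_null, mic_pos, freq, c=343.0):
    # Steering vector of a far-field source from direction theta_null
    # (radians) for microphones at positions mic_pos (metres, 1-D):
    delay = mic_pos * np.sin(theta_null) / c
    d = np.exp(-2j * np.pi * freq * delay)
    # For two sensors, any w with w^H d = 0 cancels that direction:
    w = np.conj(np.array([d[1], -d[0]]))
    return w / np.linalg.norm(w)

# Cancel an interferer at 30 degrees in the 1 kHz bin, 8 cm spacing;
# the output for an STFT frame x of shape (2,) is np.vdot(w, x).
w = null_beamformer(np.deg2rad(30.0), np.array([0.0, 0.08]), 1000.0)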


Asilomar Conference on Signals, Systems and Computers | 2006

Recognition of Convolutive Speech Mixtures by Missing Feature Techniques for ICA

Dorothea Kolossa; Hiroshi Sawada; Ramón Fernández Astudillo; Reinhold Orglmeister; Shoji Makino

One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals even in reverberant environments. However, the incurred feature distortions prove detrimental for speech recognition. To reduce the consequent recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing-feature speech recognition, which leads to far more correct and accurate recognition, also in reverberant situations at RT60 = 300 ms.
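
One simple heuristic for deriving such uncertainty information, sketched here as an assumption rather than the paper's exact estimator, is to treat the change a time-frequency mask makes to the ICA output as the per-bin uncertainty:

import numpy as np

def uncertainty_from_masking(y_ica, mask):
    # y_ica: complex STFT of one separated source; mask in [0, 1].
    # Bins that the mask alters strongly are considered unreliable.
    y_masked = mask * y_ica
    var = np.abs(y_ica - y_masked) ** 2
    return y_masked, var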


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing

Robert M. Nickel; Ramón Fernández Astudillo; Dorothea Kolossa; Rainer Martin

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
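
The cepstral smoothing building block can be sketched as a static liftering of a spectral gain curve: transform the log gain to the cepstral domain, attenuate the high quefrencies that carry musical noise, and transform back. The paper's operation also smooths over time, which this simplified sketch omits; all parameters are illustrative:

import numpy as np

def cepstral_smooth_gain(gain, n_keep=30, alpha=0.5):
    # gain: nonnegative spectral gain values, shape (n_freq,).
    c = np.fft.irfft(np.log(np.maximum(gain, 1e-6)))
    lifter = np.full(c.shape, alpha)   # attenuate high quefrencies
    lifter[:n_keep] = 1.0              # keep the spectral-envelope part
    lifter[-(n_keep - 1):] = 1.0       # and its symmetric counterpart
    return np.exp(np.fft.rfft(c * lifter).real)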

Collaboration


Dive into Dorothea Kolossa's collaboration.

Top Co-Authors


Reinhold Orglmeister

Technical University of Berlin


Eugen Hoffmann

Technical University of Berlin


Dennis Orth

Ruhr University Bochum
