Publication


Featured research published by Armin Sehr.


IEEE Signal Processing Magazine | 2012

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition

Takuya Yoshioka; Armin Sehr; Marc Delcroix; Keisuke Kinoshita; Roland Maas; Tomohiro Nakatani; Walter Kellermann

Speech recognition technology has left the research laboratory and is increasingly coming into practical use, enabling a wide spectrum of innovative and exciting voice-driven applications that are radically changing our way of accessing digital services and information. Most of today's applications still require a microphone located near the talker. However, almost all of these applications would benefit from distant-talking speech capture, where talkers are able to speak at some distance from the microphones without the encumbrance of handheld or body-worn equipment [1]. For example, applications such as meeting speech recognition, automatic annotation of consumer-generated videos, speech-to-speech translation in teleconferencing, and hands-free interfaces for controlling consumer products, like interactive TV, will greatly benefit from distant-talking operation. Furthermore, for a number of unexplored but important applications, distant microphones are a prerequisite. This means that distant-talking speech recognition technology is essential for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition

Armin Sehr; Roland Maas; Walter Kellermann

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in "Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain" (A. Sehr, in Proc. Interspeech, 2006, pp. 769-772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. To this end, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version. While the proposed RMs, with parameters estimated by straightforward training for a given room, are robust to a mismatch of the speaker-microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
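
The constrained inner optimization can be illustrated with a minimal, hypothetical sketch. It assumes a simplified per-band combination x = log(exp(s) + exp(r)) of the HMM output s and the RM output r, diagonal Gaussian densities for both, and a generic bounded optimizer from scipy in place of the reformulation derived in the paper; the function name, parameter names, and shapes are purely illustrative.

```python
# Sketch of a REMOS-style inner optimization in the logmelspec domain
# (simplified: per-band combination x = log(exp(s) + exp(r)), diagonal Gaussians).
import numpy as np
from scipy.optimize import minimize

def inner_optimization(x, mu_s, var_s, mu_r, var_r, eps=1e-6):
    """Find the clean-speech estimate s maximizing the joint Gaussian density
    of (s, r) subject to log(exp(s) + exp(r)) = x, solved per mel band."""
    def neg_log_joint(s):
        # Constraint solved for r: r = x + log(1 - exp(s - x)), valid for s < x.
        r = x + np.log1p(-np.exp(s - x))
        return (np.sum((s - mu_s) ** 2 / var_s) +
                np.sum((r - mu_r) ** 2 / var_r))

    s0 = np.minimum(mu_s, x - eps)              # feasible starting point below x
    bounds = [(None, xi - eps) for xi in x]     # enforce s < x in every band
    res = minimize(neg_log_joint, s0, bounds=bounds)
    s_hat = res.x
    r_hat = x + np.log1p(-np.exp(s_hat - x))
    return s_hat, r_hat

# Toy usage with random statistics for a single 24-band logmelspec frame.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 24)
s_hat, r_hat = inner_optimization(x, x - 1.0, np.ones(24), x - 2.0, np.ones(24))
```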


International Conference on Acoustics, Speech, and Signal Processing | 2012

On the application of reverberation suppression to robust speech recognition

Roland Maas; Emanuel A. P. Habets; Armin Sehr; Walter Kellermann

In this paper, we study the effect of the design parameters of a single-channel reverberation suppression algorithm on reverberation-robust speech recognition. At the same time, reverberation compensation at the speech recognizer is investigated. The analysis reveals that it is highly beneficial to attenuate only the reverberation tail after approximately 50 ms while coping with the early reflections and residual late reverberation by training the recognizer on moderately reverberant data. The overall system in its optimum configuration is shown to yield very promising recognition performance even in strongly reverberant environments. Since the reverberation suppression algorithm is shown to significantly reduce the dependency on the training data, it allows for very efficient training of acoustic models that are suitable for a wide range of reverberation conditions. Finally, experiments with an "ideal" reverberation suppression algorithm are carried out to cross-check the inferred guidelines.
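
The 50 ms boundary between early reflections and the late tail can be illustrated with a short sketch. This is not the paper's single-channel suppression algorithm (which operates on the reverberant signal itself); it only shows, on a synthetic exponentially decaying room impulse response, which part a tail-suppression front end would attenuate and which part is left to training on moderately reverberant data. Sampling rate and decay constant are assumptions.

```python
# Early/late split of a synthetic RIR at roughly 50 ms (illustration only).
import numpy as np

fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(int(0.4 * fs)) / fs            # 400 ms synthetic RIR
rng = np.random.default_rng(1)
rir = rng.standard_normal(t.size) * np.exp(-t / 0.1)   # roughly RT60 ~ 0.7 s

split = int(0.05 * fs)                       # 50 ms boundary
early, late = rir[:split], rir[split:]

x = rng.standard_normal(fs)                  # 1 s of white noise standing in for speech
y_full = np.convolve(x, rir)                 # fully reverberant observation
y_early = np.convolve(x, early)              # target after ideal tail suppression

late_to_early_energy = np.sum(late ** 2) / np.sum(early ** 2)
print(f"late/early energy ratio: {late_to_early_energy:.2f}")
```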


International Conference on Acoustics, Speech, and Signal Processing | 2007

A New Concept for Feature-Domain Dereverberation for Robust Distant-Talking ASR

Armin Sehr; W. Kellermann

The feature-domain dereverberation capabilities of a novel approach for automatic speech recognition in reverberant environments are investigated in this paper. By combining a network of clean-speech HMMs and a reverberation model, the most likely combination of the HMM output and the reverberation model output is found during decoding by an extended version of the Viterbi algorithm. We show that the most likely HMM output represents a good estimate of the clean-speech feature sequence and can be used as input to subsequent speech recognizers.
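
A toy sketch of this decoding idea follows, under strong simplifications not taken from the paper: scalar features, an additive combination x_t = s_t + r_t, a single Gaussian per HMM state, and a single frame-independent Gaussian as the reverberation model, so that the per-frame inner maximization has a closed form. All function and variable names are made up for illustration.

```python
# Toy "extended" Viterbi: each emission score results from splitting the
# observation into an HMM contribution s and a reverberation contribution r.
import numpy as np

def extended_viterbi(x, log_A, mu_s, var_s, mu_r, var_r):
    """x: observations (T,); log_A: log transition matrix (S, S);
    mu_s, var_s: per-state Gaussians (S,); mu_r, var_r: scalar RM Gaussian."""
    T, S = len(x), len(mu_s)
    delta = np.full((T, S), -np.inf)     # best partial-path log scores
    psi = np.zeros((T, S), dtype=int)    # backpointers
    s_hat = np.zeros((T, S))             # per-state clean-speech estimates

    def inner_max(xt, ms, vs):
        # Closed-form argmax over s of log N(s; ms, vs) + log N(xt - s; mu_r, var_r)
        s_opt = (ms / vs + (xt - mu_r) / var_r) / (1.0 / vs + 1.0 / var_r)
        score = -0.5 * (np.log(2 * np.pi * vs) + (s_opt - ms) ** 2 / vs
                        + np.log(2 * np.pi * var_r) + (xt - s_opt - mu_r) ** 2 / var_r)
        return s_opt, score

    for j in range(S):                   # initialization (uniform state prior)
        s_hat[0, j], delta[0, j] = inner_max(x[0], mu_s[j], var_s[j])
    for t in range(1, T):                # recursion
        for j in range(S):
            s_hat[t, j], emit = inner_max(x[t], mu_s[j], var_s[j])
            trans = delta[t - 1] + log_A[:, j]
            psi[t, j] = np.argmax(trans)
            delta[t, j] = trans[psi[t, j]] + emit

    state = int(np.argmax(delta[-1]))    # backtracking
    path = [state]
    for t in range(T - 1, 0, -1):
        state = int(psi[t, state])
        path.append(state)
    path.reverse()
    clean_estimate = np.array([s_hat[t, q] for t, q in enumerate(path)])
    return path, clean_estimate

# Toy usage: 2 states, 20 frames.
rng = np.random.default_rng(5)
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
x = rng.normal(1.0, 0.5, 20)
path, clean = extended_viterbi(x, log_A, mu_s=np.array([0.0, 2.0]),
                               var_s=np.array([1.0, 1.0]), mu_r=0.5, var_r=0.25)
```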


International Conference on Acoustics, Speech, and Signal Processing | 2011

Frame-wise HMM adaptation using state-dependent reverberation estimates

Armin Sehr; Roland Maas; Walter Kellermann

A novel frame-wise model adaptation approach for reverberation-robust distant-talking speech recognition is proposed. It adjusts the means of static cepstral features to capture the statistics of reverberant feature vector sequences obtained from distant-talking speech recordings. The means of the HMMs are adapted during decoding using a state-dependent estimate of the late reverberation determined by joint use of a feature-domain reverberation model and optimum partial state sequences. Since the parameters of the HMMs and the reverberation model can be estimated completely independently, the approach is very flexible with respect to changing acoustic environments. Due to the frame-wise model adaptation, some of the limitations of the HMMs are alleviated, and recognition results surpassing those of matched reverberant training are obtained at the cost of a moderately increased decoding complexity.
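
A minimal sketch of what such a frame-wise mean adaptation could look like, under the simplifying assumptions (not spelled out in the abstract) that the features are logmelspec vectors and that clean speech and late reverberation add in the linear mel domain. In the paper, the state-dependent estimate comes from the feature-domain reverberation model and optimum partial state sequences; here it is simply passed in, and all names are illustrative.

```python
# Frame-wise adaptation of HMM means with a late-reverberation estimate.
import numpy as np

def adapt_state_means(mu_clean, r_late):
    """mu_clean: (num_states, num_bands) clean-speech logmelspec HMM means.
    r_late: (num_bands,) late-reverberation logmelspec estimate for this frame.
    Returns the means adapted to the current reverberation estimate."""
    # log(exp(a) + exp(b)) per band, computed in a numerically stable way.
    return np.logaddexp(mu_clean, r_late[np.newaxis, :])

# Toy usage: 3 states, 24 mel bands.
rng = np.random.default_rng(2)
mu_clean = rng.normal(-2.0, 1.0, (3, 24))
r_late = rng.normal(-3.0, 1.0, 24)
mu_adapted = adapt_state_means(mu_clean, r_late)
```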


Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

Multi-style training of HMMs with stereo data for reverberation-robust speech recognition

Armin Sehr; Christian Hofmann; Roland Maas; Walter Kellermann

A novel training algorithm using data pairs of clean and reverberant feature vectors for estimating robust hidden Markov models (HMMs), introduced in [1] for matched training, is employed in this paper for multi-style training. The multi-style HMMs are derived from well-trained clean-speech HMMs by aligning the clean data to the clean-speech HMM and using the resulting state-frame alignment to estimate the Gaussian mixture densities from the reverberant data of several different rooms. Thus, the temporal alignment is fixed for all reverberation conditions contained in the multi-style training set so that the model mismatch between the different rooms is reduced. Therefore, this training approach is particularly suitable for multi-style training. Multi-style HMMs trained by the proposed approach and adapted to the current room condition using maximum likelihood linear regression significantly outperform the corresponding adapted multi-style HMMs trained by the conventional Baum-Welch algorithm. In strongly reverberant rooms, the proposed adapted multi-style HMMs even outperform Baum-Welch HMMs trained on matched data.
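
The alignment reuse can be illustrated with a simplified sketch: the state-frame alignment obtained from the clean features (e.g., by forced alignment to well-trained clean-speech HMMs) is reused to estimate output densities from the parallel reverberant features pooled over several rooms. Single Gaussians with diagonal covariance stand in for the Gaussian mixtures of the paper, and the alignment itself is assumed to be given; names and shapes are illustrative.

```python
# Alignment-based re-estimation from stereo (clean/reverberant) data.
import numpy as np

def reestimate_from_stereo(align, reverb_feats, num_states):
    """align: (T,) state index per frame, obtained from the *clean* data.
    reverb_feats: (T, D) parallel reverberant features, pooled over rooms.
    Returns per-state means (S, D) and diagonal variances (S, D)."""
    D = reverb_feats.shape[1]
    means = np.zeros((num_states, D))
    variances = np.ones((num_states, D))
    for s in range(num_states):
        frames = reverb_feats[align == s]
        if len(frames) > 1:
            means[s] = frames.mean(axis=0)
            variances[s] = frames.var(axis=0) + 1e-6   # variance floor
    return means, variances

# Toy usage: 1000 frames, 13-dimensional features, 5 states.
rng = np.random.default_rng(6)
align = rng.integers(0, 5, 1000)
reverb_feats = rng.normal(0.0, 1.0, (1000, 13))
means, variances = reestimate_from_stereo(align, reverb_feats, num_states=5)
```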


International Conference on Acoustics, Speech, and Signal Processing | 2010

Model-based dereverberation in the logmelspec domain for robust distant-talking speech recognition

Armin Sehr; Roland Maas; Walter Kellermann

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in [1] for melspectral features, is extended in this contribution to logarithmic melspectral (logmelspec) features. Based on a combined acoustic model consisting of a hidden Markov model network and a reverberation model, REMOS determines clean-speech and reverberation estimates during recognition by an inner optimization operation. A reformulation of this inner optimization problem for logmelspec features, allowing an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. Connected digit recognition experiments show that the proposed REMOS implementation significantly outperforms reverberantly trained HMMs in highly reverberant environments.


Asilomar Conference on Signals, Systems and Computers | 2008

Model-based dereverberation of speech in the mel-spectral domain

Armin Sehr; Walter Kellermann

A model-based dereverberation approach for robust distant-talking speech recognition employing the powerful acoustic model of the recognizer to describe the clean speech feature sequence is discussed. The clean speech model is combined with a statistical reverberation model describing the acoustic path between speaker and microphone directly in the mel-spectral domain. Dereverberation is performed during recognition by determining the most likely contributions of the combined model's components to the current reverberant feature vector. The advantages of processing feature-domain representations of speech rather than using time- or frequency-domain speech representations are the dimension reduction and the possibility of obtaining robust reverberation models valid for arbitrary speaker and microphone positions in the recording room. In this contribution, we emphasize that the criterion used for the dereverberation operation is equivalent to maximum a posteriori estimation. Connected-digit recognition experiments confirm the superior performance of the novel concept.
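
A minimal sketch of the mel-spectral feature-domain reverberation model referred to here: the reverberant melspec sequence is approximated as a per-band convolution of the clean melspec sequence with a feature-domain representation of the acoustic path (a short, made-up weight sequence below). This only illustrates the combination operation, not the dereverberation itself; names and dimensions are assumptions.

```python
# Per-band feature-domain convolution in the mel-spectral domain.
import numpy as np

def melspec_convolve(clean_melspec, rm_weights):
    """clean_melspec: (T, B) clean mel-spectral features.
    rm_weights: (K, B) per-band feature-domain representation of the acoustic path.
    Returns the (T, B) reverberant feature sequence."""
    T, B = clean_melspec.shape
    K = rm_weights.shape[0]
    reverberant = np.zeros((T, B))
    for t in range(T):
        for k in range(min(K, t + 1)):
            reverberant[t] += rm_weights[k] * clean_melspec[t - k]
    return reverberant

# Toy usage: 100 frames, 24 mel bands, a 6-frame reverberation model.
rng = np.random.default_rng(3)
clean = np.abs(rng.standard_normal((100, 24)))
weights = np.exp(-0.8 * np.arange(6))[:, np.newaxis] * np.ones((6, 24))
reverb = melspec_convolve(clean, weights)
```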


Hands-Free Speech Communication and Microphone Arrays | 2008

New Results for Feature-Domain Reverberation Modeling

Armin Sehr; Walter Kellermann

To achieve robust distant-talking automatic speech recognition in reverberant environments, the effect of reverberation on the speech feature sequences has to be modeled as accurately as possible. A convolution in the feature domain has been proposed recently in [1, 2, 3, 4] to capture the dispersion of the feature vectors caused by reverberation. These publications use a fixed representation of the acoustic path between speaker and microphone or an elementary statistical reverberation model based on simplifying assumptions. In this contribution, we propose a Monte-Carlo approach that allows for an explicit determination of the joint probability density function of a feature-domain reverberation model.
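
A rough sketch of such a Monte-Carlo estimate, under assumptions that go beyond the abstract: many room impulse responses are drawn from a synthetic exponential-decay model rather than measurements, converted to a crude frame-wise log band-energy representation standing in for melspec features, and the per-frame mean and variance of that representation are estimated from the samples. The paper targets the joint probability density function; this sketch only collects marginal first- and second-order statistics, and all parameter choices are illustrative.

```python
# Monte-Carlo estimation of feature-domain reverberation-model statistics.
import numpy as np

fs, frame_len, num_frames, num_bands = 16000, 512, 10, 8
rng = np.random.default_rng(4)
t = np.arange(num_frames * frame_len) / fs

def rir_to_features(rir):
    """Frame-wise log band energies of an RIR (placeholder for melspec features)."""
    feats = np.zeros((num_frames, num_bands))
    for i in range(num_frames):
        frame = rir[i * frame_len:(i + 1) * frame_len]
        spec = np.abs(np.fft.rfft(frame, frame_len)) ** 2
        bands = np.array_split(spec, num_bands)
        feats[i] = np.log(np.array([b.sum() for b in bands]) + 1e-12)
    return feats

samples = []
for _ in range(500):                         # Monte-Carlo draws of synthetic RIRs
    decay = rng.uniform(0.05, 0.15)          # random decay constant per draw
    rir = rng.standard_normal(t.size) * np.exp(-t / decay)
    samples.append(rir_to_features(rir))

samples = np.stack(samples)                  # (draws, frames, bands)
rm_mean = samples.mean(axis=0)               # per-frame, per-band mean
rm_var = samples.var(axis=0)                 # per-frame, per-band variance
```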


International Conference on Digital Signal Processing | 2013

Formulation of the REMOS concept from an uncertainty decoding perspective

Roland Maas; Walter Kellermann; Armin Sehr; Takuya Yoshioka; Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani

In this paper, we introduce a new formulation of the REMOS (REverberation MOdeling for Speech recognition) concept from an uncertainty decoding perspective. Based on a convolutive observation model that relaxes the conditional independence assumption of hidden Markov models, REMOS effectively adapts automatic speech recognition (ASR) systems to noisy and strongly reverberant environments. While uncertainty decoding approaches are typically designed to operate irrespective of the employed decoding routine of the ASR system, REMOS explicitly considers the additional information provided by the Viterbi decoder. In contrast to previous publications on the REMOS concept, we provide a conclusive derivation of its decoding routine using a Bayesian network representation in order to prove its inherent uncertainty decoding character.

Collaboration


Dive into Armin Sehr's collaborations.

Top Co-Authors

Walter Kellermann, University of Erlangen-Nuremberg
Roland Maas, University of Erlangen-Nuremberg
Keisuke Kinoshita, Nippon Telegraph and Telephone
Marc Delcroix, Nippon Telegraph and Telephone
Tomohiro Nakatani, Nippon Telegraph and Telephone
Christian Hofmann, University of Erlangen-Nuremberg
Emanuel A. P. Habets, University of Erlangen-Nuremberg
Marcus Zeller, University of Erlangen-Nuremberg