
Publication


Featured research published by Robert M. Nickel.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Speech Enhancement With Inventory Style Speech Resynthesis

Xiaoqiang Xiao; Robert M. Nickel

We present a new method for the enhancement of speech. The method is designed for scenarios in which targeted speaker enrollment as well as system training within the typical noise environment are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics. These characteristics are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications for the proposed method include jet cockpit communication systems and offline methods for the restoration of audio recordings.
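
As a rough illustration of the resynthesis idea (a sketch, not the authors' actual system, which uses a trained recognition front-end), the code below replaces each noisy frame with the closest-matching frame from a clean-speech inventory and overlap-adds the clean material; the function names and the simple log-spectral features are assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a signal into overlapping frames (one column per frame)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[:, None] + hop * np.arange(n_frames)
    return x[idx]

def log_spectral_features(frames):
    """Crude spectral-envelope features; the paper's front-end is richer."""
    mag = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[0])[:, None], axis=0))
    return np.log(mag + 1e-8)

def resynthesize(noisy, clean_inventory, frame_len=256, hop=128):
    """Pick, for each noisy frame, the closest clean inventory frame and
    overlap-add the clean material instead of filtering the noise."""
    noisy_frames = frame_signal(noisy, frame_len, hop)
    inv_frames = frame_signal(clean_inventory, frame_len, hop)
    f_noisy = log_spectral_features(noisy_frames)
    f_inv = log_spectral_features(inv_frames)
    out = np.zeros(len(noisy))
    win = np.hanning(frame_len)
    for t in range(noisy_frames.shape[1]):
        d = np.sum((f_inv - f_noisy[:, t:t + 1]) ** 2, axis=0)
        best = np.argmin(d)  # nearest clean unit in the inventory
        start = t * hop
        out[start:start + frame_len] += win * inv_frames[:, best]
    return out
```

The defining design choice is that the output is built entirely from clean inventory material, so noise suppression is not traded against filtering distortion in the way conventional filters must.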


International Conference on Acoustics, Speech, and Signal Processing | 2009

Inventory based speech enhancement for speaker dedicated speech communication systems

Xiaoqiang Xiao; Peng Lee; Robert M. Nickel

We present a method for the enhancement of speech in speaker dedicated speech communication systems. The proposed procedure is fundamentally different from most state-of-the-art filtering approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics. These characteristics are estimated from the distorted signal. We present a successful implementation of the proposed method for a communication system for which speaker enrollment and noise enrollment are feasible. Forty minutes of clean speech training data is usually sufficient for successful denoising. The proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments.
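
One plausible reading of the noise-enrollment step is sketched below: gather log-spectral noise statistics during enrollment and use them to normalize noisy features before inventory matching. This exact computation is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def enroll_noise(noise_clip, frame_len=256, hop=128):
    """Estimate per-bin mean/std of log-spectral features from an
    enrollment recording of the typical operating noise."""
    n = 1 + (len(noise_clip) - frame_len) // hop
    idx = np.arange(frame_len)[:, None] + hop * np.arange(n)
    frames = noise_clip[idx] * np.hanning(frame_len)[:, None]
    logspec = np.log(np.abs(np.fft.rfft(frames, axis=0)) + 1e-8)
    return logspec.mean(axis=1), logspec.std(axis=1) + 1e-8

def normalize_features(logspec, noise_mean, noise_std):
    """Whiten noisy features with the enrolled noise statistics so that
    inventory matching is less biased by the noise floor."""
    return (logspec - noise_mean[:, None]) / noise_std[:, None]
```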


Multidimensional Systems and Signal Processing | 1998

Scale and Translation Invariant Methods for Enhanced Time-Frequency Pattern Recognition

William J. Williams; Eugene J. Zalubas; Robert M. Nickel; Alfred O. Hero

Time-frequency (t-f) analysis has clearly reached a certain maturity. One can now often provide striking visual representations of the joint time-frequency energy representation of signals. However, it has been difficult to take advantage of this rich source of information concerning the signal, especially for multidimensional signals. Properly constructed time-frequency distributions enjoy many desirable properties. Attempts to incorporate t-f analysis results into pattern recognition schemes have not been notably successful to date. Aided by Cohen's scale transform, one may construct representations from the t-f results which are highly useful in pattern classification. Such methods can produce two-dimensional representations which are invariant to time-shift, frequency-shift, and scale changes. In addition, two-dimensional objects such as images can be represented in a like manner in a four-dimensional form. Even so, remaining extraneous variations often defeat the pattern classification approach. This paper presents a method based on noise subspace concepts. The noise subspace enhancement allows one to separate the desired invariant forms from extraneous variations, yielding much improved classification results. Examples from sound classification are discussed.
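
The invariance chain can be sketched compactly: a Fourier magnitude removes time shifts, and a discrete approximation of Cohen's scale transform (exponential resampling with a sqrt(t) kernel weight, followed by another Fourier magnitude) removes scale changes. This is a minimal sketch of the general construction under discrete-approximation assumptions, not the paper's full t-f pipeline.

```python
import numpy as np

def shift_invariant(x):
    """|F{x(t - t0)}| = |F{x(t)}|: the Fourier magnitude removes shifts."""
    return np.abs(np.fft.fft(x))

def scale_invariant(mag, n_out=128):
    """Scale-transform magnitude: resample on an exponential axis so that
    scaling becomes a shift, weight by sqrt(t) (the scale-transform
    kernel), then take another Fourier magnitude."""
    n = len(mag)
    expo = np.exp(np.linspace(0.0, np.log(n - 1), n_out))  # t = e^u axis
    resampled = np.interp(expo, np.arange(n), mag) * np.sqrt(expo)
    return np.abs(np.fft.fft(resampled))

# A time-shifted and time-scaled copy of a signal maps to (approximately)
# the same invariant vector:
t = np.linspace(0, 1, 1024)
x = np.sin(2 * np.pi * 40 * t) * np.exp(-5 * t)
inv = scale_invariant(shift_invariant(x))
```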


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing

Robert M. Nickel; Ramón Fernández Astudillo; Dorothea Kolossa; Rainer Martin

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. Lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
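
Of the four modifications, the cepstral smoothing is the easiest to illustrate: transform a log-gain curve to the cepstral domain, keep only the low-quefrency coefficients, and transform back. The sketch below is a plain low-pass lifter; the smoothing operation used in the paper is more elaborate.

```python
import numpy as np
from scipy.fft import dct, idct

def cepstral_smooth(gain, keep=20):
    """Smooth a spectral gain curve by discarding its high-quefrency
    cepstral coefficients (a simple low-pass lifter)."""
    log_gain = np.log(np.maximum(gain, 1e-8))
    ceps = dct(log_gain, norm='ortho')
    ceps[keep:] = 0.0  # drop fine, musical-noise-prone structure
    return np.exp(idct(ceps, norm='ortho'))

# usage: smoothed = cepstral_smooth(noisy_gain_curve)
```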


International Conference on Acoustics, Speech, and Signal Processing | 2006

A Novel Approach to Automated Source Separation in Multispeaker Environments

Robert M. Nickel; Ananth N. Iyer

We propose a new approach to the solution of the cocktail party problem (CPP). The goal of the CPP is to isolate the speech signals of individuals who are concurrently talking while being recorded with a properly positioned microphone array. The new approach provides a powerful yet simple alternative to commonly used methods for the separation of speakers. It is based on the observation that the estimation of the signal transfer matrix between speakers and microphones is significantly simplified if one can assure that during certain periods of the conversation only one speaker is active while all other speakers are silent. Methods to determine such exclusive activity periods are described and a procedure to estimate the signal transfer matrix is presented. A comparison of the proposed method with other popular source separation methods is drawn. The results show an improved performance of the proposed method over earlier approaches.
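
Under the simplifying assumption of instantaneous (non-convolutive) mixing, the exclusive-activity idea reduces to a few lines: while only speaker j talks, the microphone covariance is rank one and its principal eigenvector estimates column j of the mixing matrix. The segment indices below are assumed to come from an activity detector; the paper treats the full transfer-matrix case.

```python
import numpy as np

def estimate_column(mic_segment):
    """mic_segment: (n_mics, n_samples), recorded while only ONE speaker
    is active. The principal eigenvector of the spatial covariance
    estimates that speaker's mixing-matrix column (up to scale/sign)."""
    cov = mic_segment @ mic_segment.T / mic_segment.shape[1]
    w, v = np.linalg.eigh(cov)
    return v[:, -1]  # eigenvector of the largest eigenvalue

def separate(mics, exclusive_segments):
    """Assemble the mixing-matrix estimate from exclusive activity
    periods and demix by pseudo-inversion."""
    A = np.stack([estimate_column(mics[:, s]) for s in exclusive_segments],
                 axis=1)
    return np.linalg.pinv(A) @ mics

# usage with hypothetical detector output:
# sources = separate(mics, [slice(0, 8000), slice(20000, 28000)])
```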


International Conference on Acoustics, Speech, and Signal Processing | 1998

A new signal adaptive approach to positive time-frequency distributions with suppressed interference terms

Robert M. Nickel; Tzu-Hsien Sang; William J. Williams

Quadratic time-varying spectral analysis methods that achieve a high resolution jointly in time and frequency generally suffer from interference terms that obscure the true location of the auto-terms in the resulting time-frequency representation. Unfortunately, there is no general mathematical model available for an exact distinction between cross-terms and auto-terms. Consequently, an attempt to suppress interference can only rely on a few qualitative properties which are commonly associated with cross-terms. Most of the reduced interference distributions that have been developed so far exploit the fact that cross-terms tend to oscillate and can hence be suppressed by a properly chosen two-dimensional low-pass filter. Besides the fact that cross-terms oscillate, they are also known to be responsible for all negative density values of a time-frequency distribution. None of the currently existing methods addresses this characteristic. In this paper we introduce an entirely new approach that achieves a significant interference reduction by specifically exploiting the negative density structure of cross-terms.
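
A minimal discrete pseudo Wigner-Ville sketch makes the negativity cue visible; the naive clipping at the end only illustrates where cross-terms create negative density and is not the paper's signal-adaptive suppression method.

```python
import numpy as np

def wigner_ville(x):
    """Discrete pseudo Wigner-Ville distribution (real-valued). Cross-terms
    appear as oscillatory, partly NEGATIVE regions between auto-terms."""
    n = len(x)
    xa = np.concatenate([np.zeros(n), x, np.zeros(n)])  # zero-pad for lags
    m = np.arange(-(n // 2), n // 2)
    W = np.zeros((n, n))
    for t in range(n):
        r = xa[n + t + m] * np.conj(xa[n + t - m])  # lag product
        W[t] = np.real(np.fft.fft(np.fft.ifftshift(r)))  # lag 0 first
    return W

# Two tones produce a negative-going cross-term midway between them:
x = np.sin(2 * np.pi * 0.10 * np.arange(128)) \
  + np.sin(2 * np.pi * 0.35 * np.arange(128))
W = wigner_ville(x)
W_clipped = np.maximum(W, 0.0)  # crude illustration only
```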


Conference of the International Speech Communication Association | 2016

Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR

Sebastian Gergen; Steffen Zeiler; Ahmed Hussen Abdelaziz; Robert M. Nickel; Dorothea Kolossa

Automatic speech recognition (ASR) enables very intuitive human-machine interaction. However, signal degradations due to reverberation or noise reduce the accuracy of audio-based recognition. The introduction of a second signal stream that is not affected by degradations in the audio domain (e.g., a video stream) increases the robustness of ASR against degradations in the original domain. Here, depending on the signal quality of audio and video at each point in time, a dynamic weighting of both streams can optimize the recognition performance. In this work, we introduce a strategy for estimating optimal weights for the audio and video streams in turbo-decoding-based ASR using a discriminative cost function. The results show that turbo decoding with this maximally discriminative dynamic weighting of information yields higher recognition accuracy than turbo-decoding-based recognition with fixed stream weights or optimally dynamically weighted audiovisual decoding using coupled hidden Markov models.
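
The combination step itself is just a frame-wise weighted sum of per-state log-scores; estimating the weights discriminatively inside the turbo decoder is the paper's contribution, so the fixed weight below is a placeholder assumption.

```python
import numpy as np

def combine_streams(loglik_audio, loglik_video, weights):
    """log p_t(q) = w_t * log p_a(q) + (1 - w_t) * log p_v(q), with one
    dynamic weight per frame t and per-state scores of shape (T, Q)."""
    w = weights[:, None]
    return w * loglik_audio + (1.0 - w) * loglik_video

# toy usage, T = 5 frames, Q = 3 states (random stand-in scores):
T, Q = 5, 3
la = np.log(np.random.dirichlet(np.ones(Q), size=T))
lv = np.log(np.random.dirichlet(np.ones(Q), size=T))
w = np.full(T, 0.7)  # lean on audio while it is deemed reliable
states = np.argmax(combine_streams(la, lv, w), axis=1)
```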


Journal of the Franklin Institute - Engineering and Applied Mathematics | 2000

On local time–frequency features of speech and their employment in speaker verification

Robert M. Nickel; William J. Williams

Commonly used robust speaker verification systems are based on time-varying autoregressive (AR) spectral estimation combined with hidden Markov modeling (HMM) or dynamic time warping (DTW). An exhaustive optimization of these methods in the past has culminated in quite reliable verification schemes. It seems unlikely, though, that further significant improvements are readily obtained along the same path. While short-time AR-modeling focuses on the time-varying spectral envelope of an utterance, we introduce a new method that focuses on high-resolution estimates of the time-varying spectral structure of individual pitch periods. The new method employs reduced interference time–frequency distributions (RIDs) in combination with a scale and translation invariant pattern recognition technique (STIR). The new method by itself does not deliver better results than commonly used techniques; however, it is shown that an acceptance/rejection decision derived from both AR-DTW and RID–STIR features greatly improves the performance of the verification system.
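
The decision fusion can be sketched as a weighted combination of the two (dis)similarity scores after normalizing them to a common scale; the impostor statistics, weight, and threshold below are hypothetical quantities that would be tuned on development data.

```python
import numpy as np

def z_norm(score, impostor_mean, impostor_std):
    """Normalize a raw score against impostor statistics so the two
    heterogeneous feature streams become comparable."""
    return (score - impostor_mean) / impostor_std

def verify(score_ar_dtw, score_rid_stir, w=0.5, threshold=0.0):
    """Fuse normalized AR-DTW and RID-STIR distances into one
    accept/reject decision; a smaller fused distance means 'accept'."""
    fused = w * score_ar_dtw + (1.0 - w) * score_rid_stir
    return fused < threshold
```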


International Conference on Acoustics, Speech, and Signal Processing | 2017

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Hendrik Meutzner; Ning Ma; Robert M. Nickel; Christopher Schymura; Dorothea Kolossa

Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream weighting. The recent advances of deep neural network (DNN)-based speech recognition promise improved performance when using audio-visual information. However, the question of how to optimally integrate acoustic and visual information remains. In this paper, we propose a state-based integration scheme that uses dynamic stream weights in DNN-based audio-visual speech recognition. The dynamic weights are obtained from a time-variant reliability estimate that is derived from the audio signal. We show that this state-based integration is superior to early integration of multi-modal features, even if early integration also includes the proposed reliability estimate. Furthermore, the proposed adaptive mechanism is able to outperform a fixed weighting approach that exploits oracle knowledge of the true signal-to-noise ratio.
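
A minimal sketch of the state-based integration: map a time-variant SNR estimate through a logistic curve to obtain the audio stream weight (midpoint and slope are hypothetical stand-ins for the paper's learned reliability mapping) and combine the two streams' DNN state posteriors frame by frame.

```python
import numpy as np

def snr_to_weight(snr_db, midpoint=5.0, slope=0.25):
    """Logistic mapping from an SNR estimate (dB) to a weight in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint)))

def fuse_posteriors(log_post_audio, log_post_video, snr_db):
    """State-based integration of per-frame DNN state posteriors,
    each of shape (T, Q), driven by a (T,) reliability estimate."""
    w = snr_to_weight(snr_db)[:, None]
    return w * log_post_audio + (1.0 - w) * log_post_video
```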


International Conference on Acoustics, Speech, and Signal Processing | 2012

Inventory-style speech enhancement with uncertainty-of-observation techniques

Robert M. Nickel; Ramón Fernández Astudillo; Dorothea Kolossa; Steffen Zeiler; Rainer Martin

We present a new method for inventory-style speech enhancement that significantly improves over earlier approaches [1]. Inventory-style enhancement attempts to resynthesize a clean speech signal from a noisy signal via corpus-based speech synthesis. The advantage of such an approach is that one is not bound to trade noise suppression against signal distortion in the same way that most traditional methods do. A significant improvement in perceptual quality is typically the result. Disadvantages of this new approach, however, include speaker dependency, increased processing delays, and the necessity of substantial system training. Earlier published methods relied on a priori knowledge of the expected noise type during the training process [1]. In this paper we present a new method that exploits uncertainty-of-observation techniques to circumvent the need for noise-specific training. Experimental results show that the new method is not only able to match but also to outperform the earlier approaches in perceptual quality.
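
For diagonal Gaussian models, the core uncertainty-of-observation scoring rule is compact: the per-dimension error variance of the estimated feature is added to the model variances before evaluating the likelihood. The sketch below shows that rule in isolation; the toy numbers are hypothetical.

```python
import numpy as np

def uncertain_loglik(y, y_var, means, variances):
    """Log-likelihood of an uncertain observation under diagonal
    Gaussians: p(y | q) = N(y; mu_q, sigma_q^2 + y_var)."""
    var = variances + y_var          # inflate model variance per dim
    d = y - means
    return -0.5 * np.sum(d * d / var + np.log(2 * np.pi * var), axis=-1)

# toy usage: 3 states, 2-dim features
means = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
variances = np.ones((3, 2))
y, y_var = np.array([0.9, 1.1]), np.array([0.3, 0.3])
print(uncertain_loglik(y, y_var, means, variances))
```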

Collaboration


Dive into Robert M. Nickel's collaboration.

Top Co-Authors
Xiaoqiang Xiao

Pennsylvania State University
