Publication


Featured research published by Aleksej Chinaev.


ieee automatic speech recognition and understanding workshop | 2015

BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge

Jahn Heymann; Lukas Drude; Aleksej Chinaev

We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The utilized Generalized Eigenvalue beamforming operation with an optional Blind Analytic Normalization does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation, while at the same time introducing only very limited speech distortions. Our quite simple setup exploits the possibilities provided by simulated training data while still being able to generalize well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model yields a nearly 64% reduction of the word error rate on the real data test set.
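A minimal sketch of the mask-driven Generalized Eigenvalue (GEV) beamforming step described in the abstract above, written in Python with NumPy. It assumes the soft masks are already available (in the paper they come from a BLSTM network, which is not reproduced here) and it omits the optional Blind Analytic Normalization. The function names `masked_psd` and `gev_beamformer` and the regularization constant `eps` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y:    complex STFT observations, shape (channels, frames)
    mask: soft mask in [0, 1], shape (frames,)
    """
    weighted = Y * mask  # broadcast the mask over channels
    return (weighted @ Y.conj().T) / max(mask.sum(), 1e-10)


def gev_beamformer(phi_xx, phi_nn, eps=1e-10):
    """Return w maximizing (w^H phi_xx w) / (w^H phi_nn w).

    Solves the generalized eigenvalue problem phi_xx w = lambda phi_nn w
    by whitening with the Cholesky factor of phi_nn.
    """
    d = phi_nn.shape[0]
    # Small diagonal loading (assumed value) keeps phi_nn positive definite.
    phi_nn = phi_nn + eps * np.trace(phi_nn).real / d * np.eye(d)
    L = np.linalg.cholesky(phi_nn)           # phi_nn = L L^H
    L_inv = np.linalg.inv(L)
    A = L_inv @ phi_xx @ L_inv.conj().T      # whitened speech covariance
    A = (A + A.conj().T) / 2                 # enforce Hermitian symmetry
    _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    u = vecs[:, -1]                          # principal eigenvector
    return L_inv.conj().T @ u                # map back: w = L^{-H} u


def apply_beamformer(w, Y):
    """Beamformer output w^H y for every frame, shape (frames,)."""
    return w.conj() @ Y
```

Because the filter is obtained only from the two covariance matrices, no Direction-of-Arrival estimate enters the computation, which matches the property stated in the abstract.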


international conference on acoustics, speech, and signal processing | 2014

Source counting in speech mixtures using a variational EM approach for complex Watson mixture models

Lukas Drude; Aleksej Chinaev; Dang Hai Tran Vu

In this contribution we derive a variational EM (VEM) algorithm for model selection in complex Watson mixture models, which have recently been proposed as a model of the distribution of normalized microphone array signals in the short-time Fourier transform domain. The VEM algorithm is applied to count the number of active sources in a speech mixture by iteratively estimating the mode vectors of the Watson distributions and suppressing the signals from the corresponding directions. A key theoretical contribution is the derivation of the MMSE estimate of a quadratic form involving the mode vector of the Watson distribution. The experimental results demonstrate the effectiveness of the source counting approach at moderately low SNR. It is further shown that the VEM algorithm is more robust with respect to the threshold values used.


international workshop on acoustic signal enhancement | 2014

Towards online source counting in speech mixtures applying a variational EM for complex Watson mixture models

Lukas Drude; Aleksej Chinaev; Dang Hai Tran Vu

This contribution describes a step-wise source counting algorithm to determine the number of speakers in an offline scenario. Each speaker is identified by a variational expectation maximization (VEM) algorithm for complex Watson mixture models, which directly yields beamforming vectors for a subsequent speech separation process. An observation selection criterion is proposed which improves the robustness of the source counting in noise. The algorithm is compared to an alternative VEM approach with Gaussian mixture models based on directions of arrival and shown to deliver improved source counting accuracy. The article concludes by extending the offline algorithm towards a low-latency online estimation of the number of active sources from the streaming input data.


international conference on acoustics, speech, and signal processing | 2012

Improved noise power spectral density tracking by a MAP-based postprocessor

Aleksej Chinaev; Alexander Krueger; Dang Hai Tran Vu

In this paper we present a novel noise power spectral density tracking algorithm and its use in single-channel speech enhancement. It has the unique feature that it is able to track the noise statistics even if speech is dominant in a given time-frequency bin. As a consequence it can follow non-stationary noise superposed by speech, even in the critical case of rising noise power. The algorithm requires an initial estimate of the power spectrum of speech and is thus meant to be used as a postprocessor to a first speech enhancement stage. An experimental comparison with a state-of-the-art noise tracking algorithm demonstrates lower estimation errors under low SNR conditions and smaller fluctuations of the estimated values, resulting in improved speech quality as measured by PESQ scores.


international conference on acoustics, speech, and signal processing | 2013

Improved single-channel nonstationary noise tracking by an optimized MAP-based postprocessor

Aleksej Chinaev; Jalal Taghia; Rainer Martin

In this paper we present an improved version of the recently proposed Maximum A-Posteriori (MAP) based noise power spectral density estimator. An empirical bias compensation and bandwidth adjustment reduce bias and variance of the noise variance estimates. The main advantage of the MAP-based postprocessor is its low estimation variance. The estimator is employed in the second stage of a two-stage single-channel speech enhancement system, where eight different state-of-the-art noise tracking algorithms were tested in the first stage. While the postprocessor hardly affects the results in stationary noise scenarios, it becomes more effective the more nonstationary the noise is. The proposed postprocessor was able to improve all systems in babble noise with respect to the perceptual evaluation of speech quality performance.


conference of the international speech communication association | 2016

A priori SNR Estimation Using a Generalized Decision Directed Approach

Aleksej Chinaev

In this contribution we investigate a priori signal-to-noise ratio (SNR) estimation, a crucial component of a single-channel speech enhancement system based on spectral subtraction. The majority of the state-of-the-art a priori SNR estimators work in the power spectral domain, which is, however, not confirmed to be the optimal domain for the estimation. Motivated by the generalized spectral subtraction rule, we show how the estimation of the a priori SNR can be formulated in the so-called generalized SNR domain. This formulation allows us to generalize the widely used decision directed (DD) approach. An experimental investigation with different noise types reveals the superiority of the generalized DD approach over the conventional DD approach in terms of both the mean opinion score listening quality objective measure and the output global SNR in the medium to high input SNR regime, while we show that the power spectrum is the optimal domain for low SNR. We further develop a parameterization which adjusts the domain of estimation automatically according to the estimated input global SNR. Index Terms: single-channel speech enhancement, a priori SNR estimation, generalized spectral subtraction
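For orientation, the conventional decision directed (DD) estimator that this abstract generalizes can be sketched as follows for a single frequency bin in the power spectral domain. This is the textbook DD recursion paired with a Wiener gain, not the generalized estimator of the paper; the function name, the smoothing factor `alpha`, the floor `xi_min`, and the assumption of a known, constant noise power are illustrative choices.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Conventional decision directed a priori SNR estimate for one bin.

    noisy_power: sequence of noisy periodogram values |Y(l)|^2, one per frame
    noise_power: noise power estimate (assumed known and constant here)
    Returns the list of a priori SNR estimates, one per frame.
    """
    xi_track = []
    prev_clean_power = None
    for y_pow in noisy_power:
        gamma = y_pow / noise_power           # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)       # instantaneous (ML) estimate
        if prev_clean_power is None:
            xi = ml_term                      # first frame: no history yet
        else:
            # Weighted sum of the previous clean-speech estimate and the
            # current instantaneous estimate.
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)                  # floor to limit musical noise
        gain = xi / (1.0 + xi)                # Wiener gain (assumed gain rule)
        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
    return xi_track
```

A value of `alpha` close to one gives a smooth estimate that reacts slowly to speech onsets; this trade-off is a well-known property of the DD recursion.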


international conference on acoustics, speech, and signal processing | 2013

MAP-based estimation of the parameters of a Gaussian Mixture Model in the presence of noisy observations

Aleksej Chinaev

In this contribution we derive the Maximum A-Posteriori (MAP) estimates of the parameters of a Gaussian Mixture Model (GMM) in the presence of noisy observations. We assume the distortion to be white Gaussian noise of known mean and variance. An approximate conjugate prior of the GMM parameters is derived allowing for a computationally efficient implementation in a sequential estimation framework. Simulations on artificially generated data demonstrate the superiority of the proposed method compared to the Maximum Likelihood technique and to the ordinary MAP approach, whose estimates are corrected by the known statistics of the distortion in a straightforward manner.


international conference on acoustics, speech, and signal processing | 2017

A generalized log-spectral amplitude estimator for single-channel speech enhancement

Aleksej Chinaev

The benefits of both a logarithmic spectral amplitude (LSA) estimation and a modeling in a generalized spectral domain (where short-time amplitudes are raised to a generalized power exponent, not restricted to magnitude or power spectrum) are combined in this contribution to achieve a better tradeoff between speech quality and noise suppression in single-channel speech enhancement. A novel gain function is derived to enhance the logarithmic generalized spectral amplitudes of noisy speech. Experiments on the CHiME-3 dataset show that it outperforms the famous minimum mean squared error (MMSE) LSA gain function of Ephraim and Malah in terms of noise suppression by 1.4 dB, while the good speech quality of the MMSE-LSA estimator is maintained.
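For reference, the MMSE-LSA gain function of Ephraim and Malah that serves as the baseline in the abstract above is commonly written as follows, where \xi denotes the a priori SNR and \gamma the a posteriori SNR (the symbol names are a notational assumption). The novel generalized gain function derived in the paper is not reproduced here.

```latex
G_{\mathrm{LSA}}(\xi, \gamma)
  = \frac{\xi}{1+\xi}\,
    \exp\!\left(\frac{1}{2}\int_{v}^{\infty}\frac{e^{-t}}{t}\,\mathrm{d}t\right),
\qquad
v = \frac{\xi}{1+\xi}\,\gamma
```

The enhanced spectral amplitude is obtained by multiplying the noisy amplitude by this gain.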


itg symposium of speech communication | 2016

A Priori SNR Estimation Using Weibull Mixture Model

Aleksej Chinaev; Jens Heitkaemper


conference of the international speech communication association | 2015

On Optimal Smoothing in Minimum Statistics Based Noise Tracking

Aleksej Chinaev

Collaboration


Dive into Aleksej Chinaev's collaborations.

Top Co-Authors

Lukas Drude

University of Paderborn

Jahn Heymann

University of Paderborn
