Gueorgui Pironkov
University of Mons
Publications
Featured research published by Gueorgui Pironkov.
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Sean U. N. Wood; Jean Rouat; Stéphane Dupont; Gueorgui Pironkov
We present a blind source separation algorithm named GCC-NMF that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross correlation (GCC) method. Dictionary learning is performed on the mixture signal, with separation subsequently achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting source separation algorithm is simple yet flexible, requiring no prior knowledge or information. Separation quality is evaluated for three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: 3 and 4 concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. We show that GCC-NMF is a flexible source separation algorithm, outperforming task-specific approaches in each of the three settings, including both blind approaches and several informed ones that require prior knowledge or information.
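The core mechanism described above can be illustrated with a minimal NumPy sketch: learn an NMF dictionary on the mixture spectrogram, then reconstruct one source by keeping only the atoms assigned to it. This is not the authors' implementation; the spatial (TDOA) estimates that GCC-PHAT would provide are replaced here by stand-in random values, and all dimensions are toy-sized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrogram of a stereo mixture (freq bins x time frames).
V = rng.random((64, 100)) + 1e-9

# --- Unsupervised NMF dictionary learning on the mixture ---
# (Euclidean-distance multiplicative updates)
K = 8                      # number of dictionary atoms (assumed)
W = rng.random((64, K))    # spectral atoms
H = rng.random((K, 100))   # time activations
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# --- Spatial grouping of atoms ---
# In GCC-NMF each atom's TDOA comes from GCC-PHAT localization;
# here we substitute hypothetical values purely for illustration.
atom_tdoa = rng.uniform(-1.0, 1.0, size=K)
target_atoms = atom_tdoa > 0.0   # atoms whose TDOA falls on the target side

# --- Reconstruct the target via a Wiener-like mask built from its atoms ---
V_target = W[:, target_atoms] @ H[target_atoms]
mask = V_target / (W @ H + 1e-9)   # in [0, 1] since all factors are non-negative
target_est = mask * V

assert target_est.shape == V.shape
```

Because every factor is non-negative, the per-atom reconstruction can never exceed the full reconstruction, so the resulting soft mask stays between 0 and 1.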
European Signal Processing Conference | 2016
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
To address the common issue of overfitting in speech recognition, this article investigates Multi-Task Learning with speaker classification as the auxiliary task. Overfitting occurs when the amount of training data is limited, leading to an overly specialized acoustic model. Multi-Task Learning is one of many regularization methods; it reduces the impact of overfitting by forcing the acoustic model to train jointly on multiple different, but related, tasks. In this paper, we consider speaker classification as an auxiliary task in order to improve the generalization ability of the acoustic model, by training the model to recognize the speaker, or to find the closest one inside the training set. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed using a Recurrent Neural Network with Long Short-Term Memory cells.
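The multi-task setup described above amounts to a shared representation feeding two task-specific output heads, trained with a joint loss. A minimal feed-forward NumPy sketch follows (the paper uses an LSTM acoustic model; the auxiliary-loss weight `lam`, layer sizes, and label counts here are assumed toy values, not values from the paper).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy batch: 32 frames of 40-dimensional acoustic features.
X = rng.standard_normal((32, 40))
phone_labels = rng.integers(0, 10, 32)     # main task: 10 phone classes
speaker_labels = rng.integers(0, 5, 32)    # auxiliary task: 5 speakers

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Shared hidden layer feeding two task-specific output heads.
W_shared = rng.standard_normal((40, 64)) * 0.1
W_phone = rng.standard_normal((64, 10)) * 0.1
W_speaker = rng.standard_normal((64, 5)) * 0.1

h = np.tanh(X @ W_shared)        # representation shared by both tasks
p_phone = softmax(h @ W_phone)   # main task posteriors
p_speaker = softmax(h @ W_speaker)  # auxiliary task posteriors

# Joint objective: main cross-entropy plus a weighted auxiliary term.
lam = 0.3   # assumed auxiliary weight
ce_phone = -np.log(p_phone[np.arange(32), phone_labels]).mean()
ce_speaker = -np.log(p_speaker[np.arange(32), speaker_labels]).mean()
loss = ce_phone + lam * ce_speaker
```

Gradients flowing from the auxiliary head through `W_shared` are what regularize the shared representation; at test time the speaker head is simply discarded.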
IEEE Automatic Speech Recognition and Understanding Workshop | 2015
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
We propose an organized sparse deep neural network architecture for automatic speech recognition. The proposed method is inspired by the tonotopic organization of the auditory nerve/cortex. The approach consists of limiting the neurons' connections between hidden layers in a manner that preserves frequency proximity, resulting in a diffuse integration of the spectral information inside the neural network. This method is put in perspective with related work on sparser neural network architectures for speech recognition (tonotopy, convolutional nets, dropout). The model is trained and tested on the TIMIT database, showing encouraging results compared to the traditional fully connected architecture.
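The frequency-preserving connectivity constraint described above can be realized as a band-diagonal mask applied to a layer's weight matrix, so each hidden unit only sees nearby frequency channels. A minimal sketch under assumed sizes (the `bandwidth` hyperparameter is illustrative, not from the paper):

```python
import numpy as np

# Frequency-ordered input (e.g. 64 filterbank channels) feeding a hidden
# layer with the same ordering; each unit connects only to nearby bands.
n_in, n_hidden, bandwidth = 64, 64, 9   # bandwidth is an assumed hyperparameter

rows = np.arange(n_in)[:, None]
cols = np.arange(n_hidden)[None, :]
mask = np.abs(rows - cols) <= bandwidth // 2   # band-diagonal connectivity

rng = np.random.default_rng(2)
W = rng.standard_normal((n_in, n_hidden)) * mask   # zero out distant connections

sparsity = 1.0 - mask.mean()   # fraction of connections removed
```

During training the same mask would be re-applied after each weight update (or to the gradients), keeping the removed connections at exactly zero.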
International Conference on Pattern Recognition | 2016
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
Overfitting is a common issue in automatic speech recognition and is especially harmful when the amount of training data is limited. In order to address this problem, this article investigates acoustic modeling through Multi-Task Learning, with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method which aims at improving the network's generalization ability, by training a unique model to solve several different, but related, tasks. In this article, two auxiliary tasks are jointly examined. On the one hand, we consider speaker classification as an auxiliary task by training the acoustic model to recognize the speaker, or to find the closest one inside the training set. On the other hand, the acoustic model is also trained to extract i-vectors from the standard acoustic features. I-vectors are efficiently applied in the speaker identification community in order to characterize a speaker and their acoustic environment. The core idea of using these auxiliary tasks is to give the network an additional inter-speaker awareness, and thus, reduce overfitting. We investigate this Multi-Task Learning setup on the TIMIT database, with acoustic modeling performed using a Recurrent Neural Network with Long Short-Term Memory cells.
International Conference on Statistical Language and Speech Processing | 2017
Gueorgui Pironkov; Stéphane Dupont; Sean U. N. Wood; Thierry Dutoit
Dealing with noise that degrades the speech signal is still a major problem for automatic speech recognition. An interesting approach to tackle this problem consists of using multi-task learning. In this case, an efficient auxiliary task is clean-speech generation. This auxiliary task is trained in addition to the main speech recognition task, and its goal is to help improve the results of the main task. In this paper, we investigate this idea further by generating features extracted directly from the audio file containing only the noise, instead of the clean speech. After demonstrating that an improvement can be obtained through this multi-task learning auxiliary task, we also show that using both noise and clean-speech estimation auxiliary tasks leads to a 4% relative word error rate improvement in comparison to classic single-task learning on the CHiME4 dataset.
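With both auxiliary tasks active, the training objective is the main cross-entropy plus two weighted regression terms, one per estimated signal. A toy NumPy sketch of that combined objective follows (network outputs are random stand-ins and the weights `lam_clean`/`lam_noise` are assumed, not values from the paper).

```python
import numpy as np

rng = np.random.default_rng(4)
n_frames, n_phones, feat_dim = 16, 10, 40

# Stand-ins for network outputs: phone posteriors plus two regression heads.
logits = rng.standard_normal((n_frames, n_phones))
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
clean_est = rng.standard_normal((n_frames, feat_dim))   # clean-speech head
noise_est = rng.standard_normal((n_frames, feat_dim))   # noise head

phone_labels = rng.integers(0, n_phones, n_frames)
clean_target = rng.standard_normal((n_frames, feat_dim))
noise_target = rng.standard_normal((n_frames, feat_dim))

# Combined objective: main cross-entropy plus two weighted MSE auxiliary terms.
lam_clean, lam_noise = 0.2, 0.2   # assumed auxiliary weights
ce = -np.log(p[np.arange(n_frames), phone_labels]).mean()
loss = (ce
        + lam_clean * np.mean((clean_est - clean_target) ** 2)
        + lam_noise * np.mean((noise_est - noise_target) ** 2))
```

Since both MSE terms are non-negative, the auxiliary tasks can only add to the main loss; their benefit comes from the gradients they push through the shared layers.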
Spoken Language Technology Workshop | 2016
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
I-vectors have been successfully applied in the speaker identification community in order to characterize a speaker and their acoustic environment. Recently, i-vectors have also shown their usefulness in automatic speech recognition, when concatenated to standard acoustic features. Instead of directly feeding the acoustic model with i-vectors, we here investigate a Multi-Task Learning approach, where a neural network is trained to simultaneously recognize the phone-state posterior probabilities and extract i-vectors, using the standard acoustic features. Multi-Task Learning is a regularization method which aims at improving the network's generalization ability, by training a unique network to solve several different, but related, tasks. The core idea of using i-vector extraction as an auxiliary task is to give the network an additional inter-speaker awareness, and thus, reduce overfitting. Overfitting is a common issue in speech recognition and is especially harmful when the amount of training data is limited. The proposed setup is trained and tested on the TIMIT database, with acoustic modeling performed using a Recurrent Neural Network with Long Short-Term Memory cells.
Archive | 2018
Gueorgui Pironkov; Sean U. N. Wood; Stéphane Dupont; Thierry Dutoit
In order to properly train an automatic speech recognition system, speech with its annotated transcriptions is required. The amount of real annotated data recorded in noisy and reverberant conditions is extremely limited, especially compared to the amount of data that can be simulated by adding noise to clean annotated speech. Thus, using both real and simulated data is important in order to improve robust speech recognition. Another promising method applied to speech recognition in noisy and reverberant conditions is multi-task learning. A successful auxiliary task consists of generating clean-speech features using a regression loss (as a denoising auto-encoder). However, this auxiliary task uses clean speech as targets, which implies that real data cannot be used. In order to tackle this problem, a Hybrid-Task Learning system is proposed. This system switches frequently between multi-task and single-task learning depending on whether the input is simulated or real data, respectively. We show that the relative improvement brought by the proposed hybrid-task learning architecture can reach up to 4.4% compared to the traditional single-task learning approach on the CHiME4 database.
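The switching behavior described above can be sketched as a loss function that adds the clean-speech regression term only when a clean target exists, i.e. for simulated data. This is a toy NumPy illustration under assumed dimensions and an assumed auxiliary weight `lam`, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_phones, d = 8, 10, 40

def hybrid_loss(p_phone, phone_labels, clean_est, clean_target,
                is_simulated, lam=0.2):
    # Main task: cross-entropy on phone-state posteriors (always active).
    ce = -np.log(p_phone[np.arange(len(phone_labels)), phone_labels] + 1e-12).mean()
    if is_simulated:
        # Auxiliary denoising task: only simulated data has a clean target.
        return ce + lam * np.mean((clean_est - clean_target) ** 2)
    return ce   # real data: fall back to single-task learning

# Stand-ins for network outputs on one mini-batch.
logits = rng.standard_normal((n, n_phones))
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
labels = rng.integers(0, n_phones, n)
clean_est = rng.standard_normal((n, d))
clean_target = rng.standard_normal((n, d))

loss_real = hybrid_loss(p, labels, clean_est, None, is_simulated=False)
loss_sim = hybrid_loss(p, labels, clean_est, clean_target, is_simulated=True)
```

In training, each mini-batch would carry a real/simulated flag, and the same shared network is updated either way; only the presence of the auxiliary term changes.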
Archive | 2016
Gueorgui Pironkov; Stéphane Dupont; Thierry Dutoit
MediaEval Benchmarking Initiative for Multimedia Evaluation, 2015 | 2015
Omar Seddati; Emre Külah; Gueorgui Pironkov; Stéphane Dupont; Saïd Mahmoudi; Thierry Dutoit
Proceedings of the eNTERFACE 2015 Workshop on Intelligent Interfaces | 2018
Stéphane Dupont; Ozan Can Altiok; Aysegül Bumin; Ceren Dikmen; Ivan Giangreco; Silvan Heller; Emre Külah; Gueorgui Pironkov; Luca Rossetto; Yusuf Sahillioglu; Heiko Schuldt; Omar Seddati; Yusuf Setinkaya; Metin Sezgin; Claudiu Tanase; Emre Toyan; Sean U. N. Wood; Doguhan Yeke