Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jort F. Gemmeke is active.

Publication


Featured research published by Jort F. Gemmeke.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition

Jort F. Gemmeke; Tuomas Virtanen; Antti Hurmalainen

This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large speech exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of noise and speech exemplars, and we derive an algorithm for recovering this sparse linear combination of exemplars from the observed noisy speech. We describe how the framework can be used for doing hybrid exemplar-based/HMM recognition by using the exemplar-activations together with the phonetic information associated with the exemplars. As an alternative to hybrid recognition, the framework also allows us to take a source separation approach which enables exemplar-based feature enhancement as well as missing data mask estimation. We evaluate the performance of these exemplar-based methods in connected digit recognition on the AURORA-2 database. Our results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB. Although not as effective as two baseline recognizers at higher SNRs, the novel approach offers a promising direction of future research on exemplar-based ASR.
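
The central computation in this exemplar-based framework is estimating non-negative activation weights for a large exemplar dictionary. Below is a minimal, hypothetical sketch of that step using standard multiplicative updates for the generalized Kullback-Leibler divergence with an additive sparsity penalty; the dictionary construction, dimensions, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sparse_activations(y, A, n_iter=200, sparsity=1.0, eps=1e-12):
    """Estimate non-negative, sparse activations x such that y ~ A @ x.

    y : (d,) stacked mel-magnitude window (non-negative)
    A : (d, n_exemplars) dictionary of speech and noise exemplars
    Uses multiplicative updates for the generalized KL divergence
    with an additive L1 penalty on x (a common sparse-NMF recipe).
    """
    x = np.ones(A.shape[1])
    for _ in range(n_iter):
        approx = A @ x + eps
        # Multiplicative update: ratio of the KL objective's gradient terms,
        # with the sparsity weight added to the denominator.
        x *= (A.T @ (y / approx)) / (A.T @ np.ones_like(y) + sparsity + eps)
    return x

# Toy usage: 40 mel bands x 10 frames stacked into 400-dimensional exemplars.
rng = np.random.default_rng(0)
A = rng.random((400, 1000))                # hypothetical speech+noise dictionary
y = A[:, :3] @ np.array([1.0, 0.5, 0.2])   # observation built from 3 exemplars
x = sparse_activations(y, A)
print(x.argsort()[-3:])                    # indices of the most active exemplars
```

In the hybrid setup described above, the recovered activations of the speech exemplars would then be mapped to phonetic scores through the labels attached to each exemplar.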


International Conference on Acoustics, Speech, and Signal Processing | 2010

Noise robust exemplar-based connected digit recognition

Jort F. Gemmeke; Tuomas Virtanen

This paper proposes a noise robust exemplar-based speech recognition system where noisy speech is modeled as a linear combination of a set of speech and noise exemplars. The method works by finding a small number of labeled exemplars in a very large collection of speech and noise exemplars that jointly approximate the observed speech signal. We represent the exemplars using mel energies, which allows modeling the summation of speech and noise, and estimate the activations of the exemplars by minimizing the generalized Kullback-Leibler divergence between the observations and the model. The activations of the speech exemplars are then used directly for recognition. This approach proves to be promising, achieving up to 55.8% accuracy at signal-to-noise ratio −5 dB on the AURORA-2 connected digit recognition task.
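
Written out, the model and objective described above take the following form (the notation here is ours, not the paper's):

```latex
% Noisy mel-energy observation y modeled as a non-negative combination of
% speech exemplars (columns of S) and noise exemplars (columns of N):
%   y \approx S x_s + N x_n,  with  x_s \ge 0, x_n \ge 0.
% The activations are found by minimizing the generalized
% Kullback--Leibler divergence between observation and model:
\[
  D_{\mathrm{KL}}\!\left(\mathbf{y} \,\middle\|\, \hat{\mathbf{y}}\right)
  = \sum_{d} \left( y_d \log \frac{y_d}{\hat{y}_d} - y_d + \hat{y}_d \right),
  \qquad
  \hat{\mathbf{y}} = \mathbf{S}\mathbf{x}_s + \mathbf{N}\mathbf{x}_n .
\]
```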


IEEE Signal Processing Magazine | 2012

Exemplar-Based Processing for Speech Recognition: An Overview

Tara N. Sainath; Bhuvana Ramabhadran; David Nahamoo; Dimitri Kanevsky; Dirk Van Compernolle; Kris Demuynck; Jort F. Gemmeke; Jerome R. Bellegarda; Shiva Sundaram

Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it. The uncertainty originates from the fact that many aspects of data generation are influenced by variables that cannot be measured directly, or are too complex to model, and hence are treated as random fluctuations. For example, in speech production, uncertainty could arise from vocal tract variations among different people or corruption by noise. The goal of modeling is to establish a generalization from the set of observed data such that accurate inference (classification, decision, recognition) can be made about the data yet to be observed, which we refer to as unseen data.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio

Tuomas Virtanen; Jort F. Gemmeke; Bhiksha Raj

This paper proposes a computationally efficient algorithm for estimating the non-negative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain non-negative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that with all the tested dictionary sizes the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that when using large dictionaries, the proposed method produces a better separation quality in less time.
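
The sketch below illustrates the active-set Newton idea in simplified form: keep a small set of active atoms, take damped Newton steps on that set, and cap the step so the weights stay non-negative. It is an illustrative re-implementation under our own assumptions, not the published ASNA algorithm or its convergence safeguards.

```python
import numpy as np

def asna_sketch(y, A, n_iter=50, eps=1e-12):
    """Simplified active-set Newton sketch for min_x D_KL(y || A x), x >= 0."""
    d, n = A.shape
    x = np.zeros(n)
    # Start from a single atom chosen by a simple correlation-style heuristic.
    active = [int(np.argmax(A.T @ y / (A.sum(axis=0) + eps)))]
    x[active[0]] = y.sum() / (A[:, active[0]].sum() + eps)

    for _ in range(n_iter):
        model = A @ x + eps
        grad = A.T @ (1.0 - y / model)             # gradient of the KL objective
        # Add the atom with the most negative gradient to the active set.
        candidate = int(np.argmin(grad))
        if grad[candidate] < 0 and candidate not in active:
            active.append(candidate)

        Aa = A[:, active]
        w = y / (model ** 2)                        # Hessian weights
        H = Aa.T @ (Aa * w[:, None]) + 1e-9 * np.eye(len(active))
        step = np.linalg.solve(H, grad[active])     # Newton direction (to subtract)

        # Largest step that keeps all active weights non-negative.
        xa = x[active]
        with np.errstate(divide="ignore", invalid="ignore"):
            limits = np.where(step > 0, xa / step, np.inf)
        alpha = min(1.0, 0.99 * limits.min())
        x[active] = np.maximum(xa - alpha * step, 0.0)

        # Drop atoms whose weight reached zero.
        active = [a for a in active if x[a] > 0]
        if not active:
            break
    return x
```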


IEEE Signal Processing Magazine | 2015

Compositional Models for Audio Processing: Uncovering the structure of sound mixtures

Tuomas Virtanen; Jort F. Gemmeke; Bhiksha Raj; Paris Smaragdis

Many classes of data are composed as constructive combinations of parts. By constructive combination, we mean additive combination that does not result in subtraction or diminishment of any of the parts. We will refer to such data as compositional data. Typical examples include population or counts data, where the total count of a population is obtained as the sum of counts of subpopulations. To characterize such data, various mathematical models have been developed in the literature. These models, in conformance with the nature of the data, represent them as nonnegative linear combinations of parts, which themselves are also nonnegative to ensure that such a combination does not result in subtraction or diminishment. We will refer to such models as compositional models.
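
As a concrete, minimal illustration of a compositional model (assuming scikit-learn is available), the snippet below builds data constructively from non-negative parts and recovers a non-negative factorization under the generalized KL divergence:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy "mixture" matrix built constructively from two non-negative parts.
rng = np.random.default_rng(0)
parts = rng.random((2, 50))                 # two non-negative parts
weights = rng.random((100, 2))              # non-negative mixing weights
V = weights @ parts                         # compositional data: no cancellation

# Non-negative matrix factorization with the (generalized) KL divergence
# recovers non-negative parts and weights such that V ~ W @ H.
model = NMF(n_components=2, solver="mu", beta_loss="kullback-leibler",
            init="random", max_iter=1000, random_state=0)
W = model.fit_transform(V)
H = model.components_
print(np.abs(V - W @ H).mean())             # small reconstruction error
```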


International Conference on Acoustics, Speech, and Signal Processing | 2012

Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?

Felix Weninger; Martin Wöllmer; Jürgen T. Geiger; Björn W. Schuller; Jort F. Gemmeke; Antti Hurmalainen; Tuomas Virtanen; Gerhard Rigoll

This paper proposes a multi-stream speech recognition system that combines information from three complementary analysis methods in order to improve automatic speech recognition in highly noisy and reverberant environments, as featured in the 2011 PASCAL CHiME Challenge. We integrate word predictions by a bidirectional Long Short-Term Memory recurrent neural network and non-negative sparse classification (NSC) into a multi-stream Hidden Markov Model using convolutive non-negative matrix factorization (NMF) for speech enhancement. Our results suggest that NMF-based enhancement and NSC are complementary despite their overlap in methodology, reaching up to 91.9% average keyword accuracy on the Challenge test set at signal-to-noise ratios from -6 to 9 dB, the best result reported so far on these data.
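
A multi-stream HMM combines per-frame scores from complementary front-ends before decoding. The toy sketch below shows only the log-linear score fusion step with a hypothetical stream weight; the actual system's stream weighting, decoding, and NMF-based enhancement front-end are considerably more involved.

```python
import numpy as np

def combine_streams(log_like_a, log_like_b, weight_a=0.5):
    """Log-linear combination of per-frame state log-likelihoods from two
    streams, as used in multi-stream HMM decoding.

    log_like_a, log_like_b : (n_frames, n_states) arrays from two
    complementary front-ends (e.g. acoustic-model scores and NSC scores).
    """
    return weight_a * log_like_a + (1.0 - weight_a) * log_like_b

# Toy usage with hypothetical scores for 3 frames and 4 HMM states.
rng = np.random.default_rng(0)
a = np.log(rng.random((3, 4)))
b = np.log(rng.random((3, 4)))
combined = combine_streams(a, b, weight_a=0.6)
best_states = combined.argmax(axis=1)   # frame-wise best states (no Viterbi here)
print(best_states)
```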


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Speech enhancement under low SNR conditions via noise estimation using sparse and low-rank NMF with Kullback–Leibler divergence

Meng Sun; Yinan Li; Jort F. Gemmeke; Xiongwei Zhang

A key stage in speech enhancement is noise estimation, which usually requires prior models of speech, noise, or both. However, such prior models can be difficult to obtain. In this paper, without any prior knowledge of speech or noise, sparse and low-rank nonnegative matrix factorization (NMF) with the Kullback-Leibler divergence is proposed for noise and speech estimation by decomposing the input noisy magnitude spectrogram into a low-rank noise part and a sparse speech-like part. This initial unsupervised speech-noise estimate allows a subsequent regularized version of NMF or convolutive NMF to reconstruct the noise and speech spectrograms, either by estimating a speech dictionary on the fly (unsupervised approaches) or by using a speech dictionary pre-trained on utterances from disjoint speakers (semi-supervised approaches). Information fusion was investigated by taking the geometric mean of the outputs of multiple enhancement algorithms. The performance of the algorithms was evaluated on five metrics (PESQ, SDR, SNR, STOI, and OVERALL) in experiments on TIMIT with 15 noise types. The geometric mean of the proposed unsupervised approaches outperformed spectral subtraction (SS) and minimum mean square error (MMSE) estimation under low input SNR conditions. All of the proposed semi-supervised approaches outperformed SS and MMSE, and also performed better than state-of-the-art algorithms that use a prior noise or speech dictionary under low SNR conditions.
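
The decomposition idea, low-rank noise plus a sparse speech-like residual, can be sketched with a much simpler alternating scheme than the paper's sparse and low-rank NMF with the KL divergence. The code below uses a Euclidean fit and soft-thresholding purely for illustration; all sizes and parameters are assumptions.

```python
import numpy as np

def lowrank_plus_sparse(V, rank=4, sparsity=0.1, n_iter=30, eps=1e-12):
    """Decompose a non-negative magnitude spectrogram V (freq x time) into a
    low-rank part L = W @ H (slowly varying noise) and a non-negative sparse
    residual S (speech-like bursts). Simplified illustration only."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    S = np.zeros_like(V)
    for _ in range(n_iter):
        R = np.maximum(V - S, 0.0)                 # target for the low-rank part
        # Standard multiplicative updates (Euclidean) keeping W, H >= 0.
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
        L = W @ H
        # Sparse part: non-negative soft-thresholded residual.
        S = np.maximum(V - L - sparsity, 0.0)
    return W @ H, S

# Toy usage: a rank-1 noise floor plus a few sparse speech-like peaks.
rng = np.random.default_rng(1)
noise = np.outer(rng.random(64), np.ones(100))
speech = np.zeros((64, 100)); speech[10:20, 40:45] = 2.0
V = noise + speech
L, S = lowrank_plus_sparse(V)
mask = S / (S + L + 1e-12)                         # soft mask for enhancement
```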


Workshop on Applications of Signal Processing to Audio and Acoustics | 2013

An exemplar-based NMF approach to audio event detection

Jort F. Gemmeke; Lode Vuegen; Peter Karsmakers; Bart Vanrumste; Hugo Van hamme

We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve directly as evidence for the underlying event classes. The atoms in the dictionary span multiple frames and are created by extracting all possible fixed-length exemplars from the training data. To combat data scarcity of small training datasets, we propose to artificially augment the amount of training data by linear time warping in the feature domain at multiple rates. The method is evaluated on the Office Live and Office Synthetic datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
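
Two ingredients of the method lend themselves to a short sketch: extracting all fixed-length exemplars from training spectrograms, and augmenting scarce training data by linear time warping in the feature domain. The helper names, window length, and warp rates below are illustrative assumptions.

```python
import numpy as np

def extract_exemplars(spec, win=10):
    """Slice a (freq x time) spectrogram into all fixed-length exemplars,
    each flattened into a single dictionary atom (one column)."""
    F, T = spec.shape
    return np.stack([spec[:, t:t + win].ravel() for t in range(T - win + 1)], axis=1)

def time_warp(spec, rate):
    """Linear time warping in the feature domain: resample the time axis
    of the spectrogram by `rate` using linear interpolation."""
    F, T = spec.shape
    new_T = max(2, int(round(T * rate)))
    old_idx = np.arange(T)
    new_idx = np.linspace(0, T - 1, new_T)
    return np.stack([np.interp(new_idx, old_idx, spec[f]) for f in range(F)])

# Hypothetical usage: augment a small training set at several warp rates,
# then build an exemplar dictionary from the original and warped versions.
rng = np.random.default_rng(0)
train_spec = rng.random((40, 60))                    # one mel spectrogram
versions = [train_spec] + [time_warp(train_spec, r) for r in (0.8, 0.9, 1.1, 1.2)]
dictionary = np.concatenate([extract_exemplars(v) for v in versions], axis=1)
print(dictionary.shape)                              # (40*10, total number of exemplars)
```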


International Conference on Acoustics, Speech, and Signal Processing | 2011

Non-negative matrix deconvolution in noise robust speech recognition

Antti Hurmalainen; Jort F. Gemmeke; Tuomas Virtanen

High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. We propose a recognition system based on a shift-invariant convolutive model, where exemplar activations at all possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated on the AURORA-2 database, which contains spoken digits at noise levels ranging from clean speech down to −5 dB SNR. We obtain results superior to those obtained when the activations are found independently for each overlapping window.
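
The shift-invariant convolutive model can be summarized by its reconstruction rule: each multi-frame atom is placed at every possible temporal position, weighted by its activation. The sketch below shows only this synthesis step, with hypothetical sizes; estimating the activations themselves requires the non-negative deconvolution updates described in the paper.

```python
import numpy as np

def nmd_reconstruct(W, H):
    """Shift-invariant convolutive reconstruction:
    V_hat = sum_tau W[:, :, tau] @ shift(H, tau),
    where shift moves the activations tau frames to the right.

    W : (freq, n_atoms, n_lags) multi-frame exemplars/bases
    H : (n_atoms, n_frames) activations over all temporal positions
    """
    F, K, L = W.shape
    _, T = H.shape
    V_hat = np.zeros((F, T))
    for tau in range(L):
        shifted = np.zeros_like(H)
        if tau < T:
            shifted[:, tau:] = H[:, :T - tau]      # right-shift by tau frames
        V_hat += W[:, :, tau] @ shifted
    return V_hat

# Toy usage with hypothetical sizes: 40 mel bands, 5 atoms spanning 8 frames.
rng = np.random.default_rng(0)
W = rng.random((40, 5, 8))
H = np.zeros((5, 30)); H[2, 4] = 1.0               # one atom activated at frame 4
V_hat = nmd_reconstruct(W, H)                      # atom 2 appears over frames 4..11
```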


Computer Speech & Language | 2013

Modelling non-stationary noise with spectral factorisation in automatic speech recognition

Antti Hurmalainen; Jort F. Gemmeke; Tuomas Virtanen

Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20-40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to -6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm.
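
One of the adaptation strategies mentioned above, acquiring a noise model from the acoustic context, can be sketched as sampling multi-frame noise exemplars from noise-only frames surrounding the utterance. The function and parameters below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def noise_basis_from_context(context_spec, n_atoms=20, win=10, seed=0):
    """Acquire a noise model from the acoustic context: sample multi-frame
    exemplars from noise-only context frames (e.g. before/after the utterance)
    and use them as the noise part of the factorisation basis.

    context_spec : (freq, time) spectrogram of surrounding noise-only audio.
    """
    rng = np.random.default_rng(seed)
    F, T = context_spec.shape
    starts = rng.integers(0, T - win + 1, size=n_atoms)
    return np.stack([context_spec[:, s:s + win].ravel() for s in starts], axis=1)

# Hypothetical usage: combine with a (pre-reduced) speech basis and factorise.
rng = np.random.default_rng(1)
context = rng.random((40, 200))
noise_basis = noise_basis_from_context(context)    # shape (400, 20)
```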

Collaboration


Dive into Jort F. Gemmeke's collaborations.

Top Co-Authors

Hugo Van hamme, Katholieke Universiteit Leuven
Tuomas Virtanen, Tampere University of Technology
Bert Cranen, Radboud University Nijmegen
Antti Hurmalainen, Tampere University of Technology
Bart Ons, Katholieke Universiteit Leuven
Lou Boves, Radboud University Nijmegen
Emre Yilmaz, Katholieke Universiteit Leuven
Louis ten Bosch, Radboud University Nijmegen