Joris Driesen
Katholieke Universiteit Leuven
Publications
Featured research published by Joris Driesen.
Neurocomputing | 2011
Joris Driesen; H. Van Hamme
During the early stages of language acquisition, young infants face the task of learning a basic vocabulary without the aid of prior linguistic knowledge. Attempts have been made to model this complex behaviour computationally, using a variety of machine learning algorithms, among them non-negative matrix factorization (NMF). In this paper, we replace NMF in a vocabulary learning setting with a conceptually similar algorithm, probabilistic latent semantic analysis (PLSA), which can learn word representations incrementally through Bayesian updating. We further show that this learning framework can model certain cognitive behaviours, such as forgetting, in a simple way.
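The NMF baseline that the abstract replaces can be illustrated in a few lines. The following is a minimal, hypothetical sketch on synthetic data (not the paper's implementation): standard multiplicative updates factorize a non-negative matrix of utterance features into a small dictionary of word-like patterns.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Factorize non-negative V (features x utterances) as V ~ W @ H
    using Lee-Seung multiplicative updates for the Euclidean objective."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update word-like patterns
    return W, H

# Synthetic toy data: 3 hidden "word" patterns mixed into 30 utterances.
rng = np.random.default_rng(1)
words = rng.random((20, 3))
V = words @ rng.random((3, 30))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

PLSA reaches a comparable decomposition via EM on a probabilistic model, which is what makes the incremental Bayesian updating discussed in the paper possible.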
international conference on acoustics, speech, and signal processing | 2012
Joris Driesen; Jort F. Gemmeke; Hugo Van hamme
When applied to speech, Non-negative Matrix Factorization is capable of learning a small vocabulary of words without any prior linguistic knowledge. This makes it well suited to small-scale speech applications where flexibility is of the utmost importance, e.g. assistive technology for the speech impaired. However, its performance depends on how its inputs are represented. We propose the use of exemplar-based sparse representations of speech, and explore the influence of some of their basic parameters, such as the total number of exemplars considered and the sparseness imposed on them. We show that the resulting learning performance compares favorably with that of previously proposed approaches.
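A rough illustration of the exemplar idea, as a hypothetical sketch rather than the paper's system: fix a dictionary of stored exemplars and estimate only non-negative activations, with an L1 penalty encouraging each input to be explained by a few exemplars. All data below is synthetic.

```python
import numpy as np

def sparse_activations(A, V, lam=0.05, n_iter=300, eps=1e-9):
    """Non-negative activations H with V ~ A @ H; the L1 weight lam
    pushes H toward sparse use of the exemplar dictionary A."""
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (A.T @ V) / (A.T @ A @ H + lam + eps)
    return H

# Synthetic setup: 50 stored exemplars of 25-dim features; each of 10
# test vectors is built from only 2 exemplars, so sparse activations
# should suffice to reconstruct them.
rng = np.random.default_rng(2)
A = rng.random((25, 50))
true_H = np.zeros((50, 10))
for j in range(10):
    true_H[rng.choice(50, 2, replace=False), j] = rng.random(2) + 0.5
V = A @ true_H
H = sparse_activations(A, V)
```

The two knobs explored in the paper correspond here to the dictionary size (the number of columns of A) and the strength of the sparsity constraint (lam).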
ieee automatic speech recognition and understanding workshop | 2013
Joris Driesen; Steve Renals
Since subtitling television content is a costly process, there are large potential advantages to automating it using automatic speech recognition (ASR). However, training the necessary acoustic models can be a challenge, since the available training data usually lacks verbatim orthographic transcriptions. If approximate transcriptions exist, this problem can be overcome using light supervision methods. In this paper, we perform speech recognition on broadcasts of Weatherview, the BBC's daily weather report, as a first step towards automatic subtitling. For training, we use a large set of past broadcasts, using their manually created subtitles as approximate transcriptions. We discuss and compare two different light supervision methods, applying them to this data. The best training set finally obtained with these methods is used to create a hybrid deep neural network-based recognition system, which yields high recognition accuracies on three separate Weatherview evaluation sets.
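A core step in light supervision is deciding which stretches of the approximate transcription can be trusted. A toy sketch of that idea, assuming a simple word-level match rather than the methods actually compared in the paper: align the subtitle text against a first-pass ASR hypothesis and keep only runs where the two agree.

```python
import difflib

def select_matched(subtitle_words, asr_words, min_run=3):
    """Keep stretches where subtitles and the ASR hypothesis agree for at
    least min_run consecutive words; only these are trusted as verbatim."""
    sm = difflib.SequenceMatcher(a=subtitle_words, b=asr_words)
    return [subtitle_words[b.a:b.a + b.size]
            for b in sm.get_matching_blocks() if b.size >= min_run]

subs = "rain will spread east during the evening with clear spells later".split()
hyp = "rain will spread east in the evening with clear spells later on".split()
kept = select_matched(subs, hyp)
# kept -> [['rain', 'will', 'spread', 'east'],
#          ['the', 'evening', 'with', 'clear', 'spells', 'later']]
```

In practice the matched segments, rather than whole subtitles, are fed to acoustic model training, which is what makes imperfect subtitles usable as supervision.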
text speech and dialogue | 2009
Louis ten Bosch; Joris Driesen; Hugo Van hamme; Lou Boves
The discovery of words by young infants involves two interrelated processes: (a) the detection of recurrent word-like acoustic patterns in the speech signal, and (b) cross-modal association between auditory and visual information. This paper describes experimental results obtained by a computational model that simulates these two processes. The model is able to build word-like representations on the basis of multimodal input data (stimuli) without the help of an a priori specified lexicon. Each input stimulus consists of a speech signal accompanied by an abstract visual representation of the concepts referred to in the speech signal. In this paper we investigate how internal representations generalize across speakers. In doing so, we also analyze the cognitive plausibility of the model.
international conference on acoustics, speech, and signal processing | 2012
Joris Driesen; Hugo Van hamme
A speech recognition system that automatically learns word models for a small vocabulary from examples of its usage, without using prior linguistic information, can be of great use in cognitive robotics, human-machine interfaces, and assistive devices. In the latter case, the user's speech capabilities may also be affected. In this paper, we consider an NMF-based learning framework capable of doing this, and experimentally show that its learning rate crucially depends on how the speech data is represented. Higher-level units of speech, which hide some of the complex variability of the acoustics, are found to yield faster learning rates.
Signal Processing | 2012
Joris Driesen; H. Van Hamme
Discovering structure within a collection of high-dimensional input vectors is a problem that often recurs in the area of machine learning. A very suitable and widely used algorithm for solving such tasks is Non-negative Matrix Factorization (NMF). The high-dimensional vectors are arranged as columns in a data matrix, which is decomposed into two non-negative matrix factors of much lower rank. Here, we adopt the NMF learning scheme proposed by Van hamme (2008) [1]. It involves combining the training data with supervisory data, which imposes the low-dimensional structure known to be present. The reconstruction of such supervisory data on previously unseen inputs then reveals their underlying structure in an explicit way. It has been noted that for many problems, not all features of the training data correlate equally well with the underlying structure. In other words, some features are relevant for detecting patterns in the data, while others are not. In this paper, we propose an algorithm that builds upon the learning scheme of Van hamme (2008) [1], and automatically weights each input feature according to its relevance. Applications include both data improvement and feature selection. We experimentally show that our algorithm outperforms similar techniques on both counts.
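The base learning scheme of Van hamme (2008), on which the feature-weighting algorithm builds, can be sketched as follows, using synthetic toy data rather than the paper's code: supervisory label rows are stacked under the data matrix and factorized jointly, and the supervisory part is later reconstructed for unseen inputs.

```python
import numpy as np

def nmf_mu(V, rank, n_iter=300, eps=1e-9, W=None):
    """Multiplicative-update NMF; if W is given it is held fixed and
    only the activations H are estimated."""
    rng = np.random.default_rng(0)
    fit_W = W is None
    if fit_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if fit_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Training: stack acoustic features with one-hot label rows and factorize
# jointly, so each NMF component ties an acoustic pattern to a label.
rng = np.random.default_rng(3)
patterns = rng.random((30, 4))              # 4 synthetic "word" patterns
labels = rng.integers(0, 4, 200)
scale = 0.5 + rng.random(200)
V_train = patterns[:, labels] * scale
G_train = np.eye(4)[labels].T * scale
W, _ = nmf_mu(np.vstack([V_train, G_train]), rank=4)
W_acoustic, W_label = W[:30], W[30:]

# On unseen inputs, factor against the acoustic part only and read the
# underlying structure off the reconstructed supervisory rows.
test_labels = rng.integers(0, 4, 50)
_, H = nmf_mu(patterns[:, test_labels], rank=4, W=W_acoustic)
acc = ((W_label @ H).argmax(axis=0) == test_labels).mean()
```

The paper's contribution then weights each input row (feature) of the data matrix by its estimated relevance before this factorization, so that uninformative features no longer distort the learned components.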
spoken language technology workshop | 2010
Joris Driesen; Hugo Van hamme; W. Bastiaan Kleijn
Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation is that only a subset of these data features actually aids the learning. In this paper, we first describe a simple NMF-based recognition framework operating on speech and image data. We then propose and demonstrate a novel algorithm that scales the inputs of this framework in order to optimize its recognition performance.
language resources and evaluation | 2008
Catia Cucchiarini; Joris Driesen; H. Van Hamme; Eric Sanders
conference of the international speech communication association | 2012
Jort F. Gemmeke; Janneke van de Loo; Guy De Pauw; Joris Driesen; Hugo Van hamme; Walter Daelemans
Fundamenta Informaticae | 2009
Joris Driesen; Louis ten Bosch; Hugo Van hamme