Joris Driesen
Katholieke Universiteit Leuven
Publications
Featured research published by Joris Driesen.
Neurocomputing | 2011
Joris Driesen; H. Van Hamme
During the early stages of language acquisition, young infants face the task of learning a basic vocabulary without the aid of prior linguistic knowledge. Attempts have been made to model this complex behaviour computationally, using a variety of machine learning algorithms, among them non-negative matrix factorization (NMF). In this paper, we replace NMF in a vocabulary learning setting with a conceptually similar algorithm, probabilistic latent semantic analysis (PLSA), which can learn word representations incrementally through Bayesian updating. We further show that this learning framework can model certain cognitive behaviours, such as forgetting, in a simple way.
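The NMF baseline that the abstract replaces can be illustrated in a few lines. The following is a minimal, hypothetical sketch on synthetic data (not the paper's implementation): standard multiplicative updates factorize a non-negative matrix of utterance features into a small dictionary of word-like patterns.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Factorize non-negative V (features x utterances) as V ~ W @ H
    using Lee-Seung multiplicative updates for the Euclidean objective."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update word-like patterns
    return W, H

# Synthetic toy data: 3 hidden "word" patterns mixed into 30 utterances.
rng = np.random.default_rng(1)
words = rng.random((20, 3))
V = words @ rng.random((3, 30))
W, H = nmf(V, rank=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

PLSA reaches a comparable decomposition via EM on a probabilistic model, which is what makes the incremental Bayesian updating discussed in the paper possible.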
international conference on acoustics, speech, and signal processing | 2012
Joris Driesen; Jort F. Gemmeke; Hugo Van hamme
When applied to speech, Non-negative Matrix Factorization is capable of learning a small vocabulary of words without any prior linguistic knowledge. This makes it well suited to small-scale speech applications where flexibility is of the utmost importance, e.g. assistive technology for the speech impaired. However, its performance depends on how its inputs are represented. We propose the use of exemplar-based sparse representations of speech, and explore the influence of some of their basic parameters, such as the total number of exemplars considered and the sparseness imposed on them. We show that the resulting learning performance compares favorably with that of previously proposed approaches.
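A rough illustration of the exemplar idea, as a hypothetical sketch rather than the paper's system: fix a dictionary of stored exemplars and estimate only non-negative activations, with an L1 penalty encouraging each input to be explained by a few exemplars. All data below is synthetic.

```python
import numpy as np

def sparse_activations(A, V, lam=0.05, n_iter=300, eps=1e-9):
    """Non-negative activations H with V ~ A @ H; the L1 weight lam
    pushes H toward sparse use of the exemplar dictionary A."""
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (A.T @ V) / (A.T @ A @ H + lam + eps)
    return H

# Synthetic setup: 50 stored exemplars of 25-dim features; each of 10
# test vectors is built from only 2 exemplars, so sparse activations
# should suffice to reconstruct them.
rng = np.random.default_rng(2)
A = rng.random((25, 50))
true_H = np.zeros((50, 10))
for j in range(10):
    true_H[rng.choice(50, 2, replace=False), j] = rng.random(2) + 0.5
V = A @ true_H
H = sparse_activations(A, V)
```

The two knobs explored in the paper correspond here to the dictionary size (the number of columns of A) and the strength of the sparsity constraint (lam).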
ieee automatic speech recognition and understanding workshop | 2013
Joris Driesen; Steve Renals
Since subtitling television content is a costly process, there are large potential advantages to automating it using automatic speech recognition (ASR). However, training the necessary acoustic models can be a challenge, since the available training data usually lacks verbatim orthographic transcriptions. If approximate transcriptions exist, this problem can be overcome using light supervision methods. In this paper, we perform speech recognition on broadcasts of Weatherview, the BBC's daily weather report, as a first step towards automatic subtitling. For training, we use a large set of past broadcasts, using their manually created subtitles as approximate transcriptions. We discuss and compare two different light supervision methods, applying them to this data. The best training set finally obtained with these methods is used to create a hybrid deep neural network-based recognition system, which yields high recognition accuracies on three separate Weatherview evaluation sets.
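A core step in light supervision is deciding which stretches of the approximate transcription can be trusted. A toy sketch of that idea, assuming a simple word-level match rather than the methods actually compared in the paper: align the subtitle text against a first-pass ASR hypothesis and keep only runs where the two agree.

```python
import difflib

def select_matched(subtitle_words, asr_words, min_run=3):
    """Keep stretches where subtitles and the ASR hypothesis agree for at
    least min_run consecutive words; only these are trusted as verbatim."""
    sm = difflib.SequenceMatcher(a=subtitle_words, b=asr_words)
    return [subtitle_words[b.a:b.a + b.size]
            for b in sm.get_matching_blocks() if b.size >= min_run]

subs = "rain will spread east during the evening with clear spells later".split()
hyp = "rain will spread east in the evening with clear spells later on".split()
kept = select_matched(subs, hyp)
# kept -> [['rain', 'will', 'spread', 'east'],
#          ['the', 'evening', 'with', 'clear', 'spells', 'later']]
```

In practice the matched segments, rather than whole subtitles, are fed to acoustic model training, which is what makes imperfect subtitles usable as supervision.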
text speech and dialogue | 2009
Louis ten Bosch; Joris Driesen; Hugo Van hamme; Lou Boves
The discovery of words by young infants involves two interrelated processes: (a) the detection of recurrent word-like acoustic patterns in the speech signal, and (b) cross-modal association between auditory and visual information. This paper describes experimental results obtained by a computational model that simulates these two processes. The model is able to build word-like representations on the basis of multimodal input data (stimuli) without the help of an a priori specified lexicon. Each input stimulus consists of a speech signal accompanied by an abstract visual representation of the concepts referred to in the speech signal. In this paper we investigate how internal representations generalize across speakers. In doing so, we also analyze the cognitive plausibility of the model.
international conference on acoustics, speech, and signal processing | 2012
Joris Driesen; Hugo Van hamme
A speech recognition system that automatically learns word models for a small vocabulary from examples of its usage, without using prior linguistic information, can be of great use in cognitive robotics, human-machine interfaces, and assistive devices. In the latter case, the user's speech capabilities may also be affected. In this paper, we consider an NMF-based learning framework capable of doing this, and experimentally show that its learning rate crucially depends on how the speech data is represented. Higher-level units of speech, which hide some of the complex variability of the acoustics, are found to yield faster learning rates.
Signal Processing | 2012
Joris Driesen; H. Van Hamme
Discovering structure within a collection of high-dimensional input vectors is a problem that often recurs in the area of machine learning. A very suitable and widely used algorithm for solving such tasks is Non-negative Matrix Factorization (NMF). The high-dimensional vectors are arranged as columns in a data matrix, which is decomposed into two non-negative matrix factors of much lower rank. Here, we adopt the NMF learning scheme proposed by Van hamme (2008) [1]. It involves combining the training data with supervisory data, which imposes the low-dimensional structure known to be present. The reconstruction of such supervisory data on previously unseen inputs then reveals their underlying structure in an explicit way. It has been noted that for many problems, not all features of the training data correlate equally well with the underlying structure. In other words, some features are relevant for detecting patterns in the data, while others are not. In this paper, we propose an algorithm that builds upon the learning scheme of Van hamme (2008) [1], and automatically weights each input feature according to its relevance. Applications include both data improvement and feature selection. We experimentally show that our algorithm outperforms similar techniques on both counts.
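The base learning scheme of Van hamme (2008), on which the feature-weighting algorithm builds, can be sketched as follows, using synthetic toy data rather than the paper's code: supervisory label rows are stacked under the data matrix and factorized jointly, and the supervisory part is later reconstructed for unseen inputs.

```python
import numpy as np

def nmf_mu(V, rank, n_iter=300, eps=1e-9, W=None):
    """Multiplicative-update NMF; if W is given it is held fixed and
    only the activations H are estimated."""
    rng = np.random.default_rng(0)
    fit_W = W is None
    if fit_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if fit_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Training: stack acoustic features with one-hot label rows and factorize
# jointly, so each NMF component ties an acoustic pattern to a label.
rng = np.random.default_rng(3)
patterns = rng.random((30, 4))              # 4 synthetic "word" patterns
labels = rng.integers(0, 4, 200)
scale = 0.5 + rng.random(200)
V_train = patterns[:, labels] * scale
G_train = np.eye(4)[labels].T * scale
W, _ = nmf_mu(np.vstack([V_train, G_train]), rank=4)
W_acoustic, W_label = W[:30], W[30:]

# On unseen inputs, factor against the acoustic part only and read the
# underlying structure off the reconstructed supervisory rows.
test_labels = rng.integers(0, 4, 50)
_, H = nmf_mu(patterns[:, test_labels], rank=4, W=W_acoustic)
acc = ((W_label @ H).argmax(axis=0) == test_labels).mean()
```

The paper's contribution then weights each input row (feature) of the data matrix by its estimated relevance before this factorization, so that uninformative features no longer distort the learned components.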
spoken language technology workshop | 2010
Joris Driesen; Hugo Van hamme; W. Bastiaan Kleijn
Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation is that only a subset of these data features actually aids the learning. In this paper, we first describe a simple NMF-based recognition framework operating on speech and image data. We then propose and demonstrate a novel algorithm that scales the inputs of this framework in order to optimize its recognition performance.
language resources and evaluation | 2008
Catia Cucchiarini; Joris Driesen; H. Van Hamme; Eric Sanders
conference of the international speech communication association | 2012
Jort F. Gemmeke; Janneke van de Loo; Guy De Pauw; Joris Driesen; Hugo Van hamme; Walter Daelemans
Fundamenta Informaticae | 2009
Joris Driesen; Louis ten Bosch; Hugo Van hamme