Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where W. Bastiaan Kleijn is active.

Publication


Featured research published by W. Bastiaan Kleijn.


IEEE Transactions on Systems, Man, and Cybernetics | 2011

Graph-Preserving Sparse Nonnegative Matrix Factorization With Application to Facial Expression Recognition

Ruicong Zhi; Markus Flierl; Qiuqi Ruan; W. Bastiaan Kleijn

In this paper, a novel graph-preserving sparse nonnegative matrix factorization (GSNMF) algorithm is proposed for facial expression recognition. The GSNMF algorithm is derived from the original NMF algorithm by exploiting both sparse and graph-preserving properties. The latter may contain the class information of the samples; therefore, GSNMF can be used as either an unsupervised or a supervised dimension-reduction method. A sparse representation of the facial images is obtained by minimizing the ℓ1-norm of the basis images. Furthermore, according to graph embedding theory, the neighborhood of the samples is preserved by retaining the graph structure in the mapped space. The GSNMF decomposition transforms the high-dimensional facial expression images into a locality-preserving subspace with sparse representation. To guarantee convergence, we use the projected gradient method to calculate the nonnegative solution of GSNMF. Experiments are conducted on the JAFFE database and the Cohn-Kanade database with unoccluded and partially occluded facial images. The results show that the GSNMF algorithm provides better facial representations and achieves higher recognition rates than nonnegative matrix factorization. Moreover, GSNMF is also more robust to partial occlusions than other tested methods.
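
As a rough illustration of the kind of objective the abstract describes, here is a minimal numpy sketch of projected-gradient NMF with an ℓ1 sparsity term on the basis and a graph-Laplacian penalty on the coefficients. The penalty form, variable names, and step sizes are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def gsnmf_sketch(X, r, L, lam=0.1, mu=0.1, lr=1e-3, iters=500):
    # Simplified objective: ||X - W H||_F^2 + lam*||W||_1 + mu*tr(H L H^T),
    # with W, H >= 0 and L a symmetric graph Laplacian over the samples.
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        R = W @ H - X                        # reconstruction residual
        gW = 2 * R @ H.T + lam               # l1 subgradient is +lam on W >= 0
        gH = 2 * W.T @ R + 2 * mu * H @ L    # graph-preserving term
        W = np.maximum(W - lr * gW, 0.0)     # projected gradient step
        H = np.maximum(H - lr * gH, 0.0)
    return W, H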


European Transactions on Telecommunications | 2010

The RCELP speech‐coding algorithm

W. Bastiaan Kleijn; Peter Kroon; Dror Nahumi

At bit rates between 4 and 16 kbit/s, many state-of-the-art speech coding algorithms fall into the class of linear-prediction-based analysis-by-synthesis (LPAS) speech coders. At the lower bit rates, the waveform matching on which LPAS coders rely constrains the speech quality. To overcome this drawback, we present a coder (RCELP) that uses a generalization of the analysis-by-synthesis paradigm. This generalization relaxes the waveform-matching constraints without affecting speech quality. We describe several implementations at bit rates between 4 and 6 kbit/s. MOS tests show that a 6 kbit/s RCELP has quality similar to or better than the 13 kbit/s GSM full-rate coder, and a 4.4 kbit/s RCELP has speech quality significantly better than the 4.8 kbit/s FS1016 standard.
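
For context, the toy sketch below shows the conventional analysis-by-synthesis inner loop that RCELP generalizes: each candidate excitation is synthesized through the LP filter and scored against the target. The function names and the plain squared-error criterion are illustrative assumptions; RCELP's contribution is precisely to relax this rigid waveform matching.

import numpy as np
from scipy.signal import lfilter

def lpas_search(target, codebook, lp_coeffs):
    # Exhaustive search over candidate excitations: synthesize each one
    # through 1/A(z), fit an optimal gain, keep the lowest squared error.
    best_idx, best_gain, best_err = -1, 0.0, np.inf
    for k, c in enumerate(codebook):
        y = lfilter([1.0], lp_coeffs, c)       # LP synthesis filtering
        g = (y @ target) / max(y @ y, 1e-12)   # closed-form optimal gain
        err = np.sum((target - g * y) ** 2)
        if err < best_err:
            best_idx, best_gain, best_err = k, g, err
    return best_idx, best_gain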


international conference on acoustics, speech, and signal processing | 1995

Spectral dynamics is more important than spectral distortion

H. Petter Knagenhjelm; W. Bastiaan Kleijn

Linear prediction coefficients are used to describe the power-spectrum envelope in the majority of low-bit-rate coders. The performance of quantizers for the linear-prediction coefficients is generally evaluated in terms of spectral distortion. This paper shows that the audible distortion in low-bit-rate coders is often more a function of the dynamics of the power-spectrum envelope than of the spectral distortion as usually evaluated. Smoothing the evolution of the power-spectrum envelope over time increases the reconstructed speech quality. A reasonable objective is to find the smoothest path that keeps the quantized parameters within the Voronoi regions associated with the transmitted quantization index. We demonstrate increased quantizer performance by such smoothing of the line-spectral frequencies.
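
A minimal sketch of the idea, under the simplifying assumption that the decoder holds the full codebook and smooths frame by frame: low-pass the quantized parameter track, but accept a smoothed vector only if it stays inside the Voronoi cell of the transmitted index. The names and the first-order smoothing rule are illustrative, not the paper's algorithm.

import numpy as np

def smooth_within_voronoi(lsf_q, codebook, alpha=0.5):
    # lsf_q: (T, d) track of quantized LSF vectors (each a codebook entry).
    # Smooth each frame toward the previous output, but keep the result
    # only if its nearest codevector is still the transmitted one.
    out = lsf_q.astype(float).copy()
    for t in range(1, len(lsf_q)):
        cand = alpha * out[t - 1] + (1 - alpha) * lsf_q[t]
        idx_tx = np.argmin(np.sum((codebook - lsf_q[t]) ** 2, axis=1))
        idx_cand = np.argmin(np.sum((codebook - cand) ** 2, axis=1))
        if idx_cand == idx_tx:               # still inside the Voronoi cell
            out[t] = cand
    return out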


Journal of the Acoustical Society of America | 1993

On the use of neural networks in articulatory speech synthesis

Mazin G. Rahim; Colin C. Goodyear; W. Bastiaan Kleijn; Juergen Schroeter; Man Mohan Sondhi

A long‐standing problem in the analysis and synthesis of speech by articulatory description is the estimation of the vocal tract shape parameters from natural input speech. Methods to relate spectral parameters to articulatory positions are feasible if a sufficiently large amount of data is available. This, however, results in a high computational load and large memory requirements. Further, one needs to accommodate ambiguities in this mapping due to the nonuniqueness problem (i.e., several vocal tract shapes can result in identical spectral envelopes). This paper describes the use of artificial neural networks for acoustic to articulatory parameter mapping. Experimental results show that a single feed‐forward neural net is unable to perform this mapping sufficiently well when trained on a large data set. An alternative procedure is proposed, based on an assembly of neural networks. Each network is designated to a specific region in the articulatory space, and performs a mapping from cepstral values into ...
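
The "assembly of networks" idea can be sketched as a gating step followed by a region-specific mapping. The snippet below uses nearest-centroid gating over simple linear maps purely as a stand-in for the per-region neural networks; all names and the linear form are assumptions.

import numpy as np

def assembly_predict(ceps, centroids, region_maps):
    # Route a cepstral vector to the model owning its region of the
    # articulatory space (nearest-centroid gating), then apply that
    # region's mapping; (W, b) pairs stand in for trained networks.
    k = int(np.argmin(np.sum((centroids - ceps) ** 2, axis=1)))
    W, b = region_maps[k]
    return W @ ceps + b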


international conference on acoustics, speech, and signal processing | 1999

On speech coding in a perceptual domain

Gernot Kubin; W. Bastiaan Kleijn

For speech coders which fall within the class of waveform coders, the reconstructed signal approaches the original with increasing bit rate. In such coders, the distortion criterion generally operates on the speech signal or a signal obtained by adaptive linear filtering of the speech signal. To satisfy computational and delay constraints, the distortion criterion must be reduced to a very simple approximation of the auditory system. This drawback of conventional approaches motivates a new speech coding paradigm in which the coding is performed in a domain where the single-letter squared-error criterion forms an accurate representation of perception. The new paradigm requires a model of the auditory periphery which is accurate, can be inverted with relatively low computational effort, and represents the signal with relatively few parameters. We develop such a model of the auditory periphery and discuss its suitability for speech coding. The results indicate that the new paradigm in general and our auditory model in particular form a promising basis for the coding of both speech and audio at low bit rates.
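
The coding pattern the abstract proposes (map to a perceptual domain, quantize with a plain squared-error criterion, invert the map) can be sketched with a stand-in invertible compander. Mu-law companding below is only a placeholder for the paper's far richer auditory-periphery model.

import numpy as np

def to_domain(x, mu=255.0):
    # Stand-in invertible "perceptual" map (mu-law companding).
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def from_domain(y, mu=255.0):
    # Exact inverse of to_domain.
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def code_in_domain(x, step=0.05):
    # Map, quantize with a uniform (squared-error) quantizer, invert:
    # the single-letter criterion lives entirely in the mapped domain.
    yq = step * np.round(to_domain(x) / step)
    return from_domain(yq)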


Digital Signal Processing | 1991

Methods for waveform interpolation in speech coding

W. Bastiaan Kleijn; Wolfgang Granzow

Most speech coding algorithms operating at rates of around 8 kb/s attempt to reproduce the original speech waveform. Their efficiency of reproducing the waveform is obtained by using models which exploit knowledge of the generation of the speech signal. In contrast, most coders operating at rates of around 2.4 kb/s are completely parametric, usually transmitting parameters describing the pitch and the spectral envelope at regular intervals. However, because of model inadequacies, the quality of reconstruction of current parametric methods never reaches that of the original signal, even at high bit rates. In this paper, a new method which is positioned between the waveform coders and the parametric coders is presented. It is based on the assumption that, for voiced speech, a perceptually accurate speech signal can be reconstructed from a description of the waveform of a single, representative pitch cycle per interval of 20-30 ms. Figure 1 shows the smooth evolution of the shape of the pitch cycle, which is typical for voiced speech signals. We will show how such a signal can be reconstructed by interpolating prototype pitch cycles between the updates. The prototype-waveform interpolation (PWI) method retains the natural quality typical of coders which encode the entire waveform, but requires a bit rate close to that of the parametric coders.

We discuss PWI methods based on linear prediction (LP). In LP-based speech coders, the signal is reconstructed from knowledge of the predictor coefficients and a description of the excitation signal. Of the existing LP-based algorithms, the code-excited linear-prediction (CELP) algorithm [1] and the LP vocoder [2] are examples of waveform and parametric coders, respectively. In the simplest form of CELP, the speech waveform is described by time-varying LP filter coefficients and a filter excitation consisting of the concatenation of scaled fixed-length vectors from a codebook. To achieve high efficiency during voiced speech, most implementations include a long-term predictor [3], or adaptive codebook [4], to facilitate periodicity of the reconstructed signal. Despite recent improvements [5,6], inaccurate reproduction of the periodicity remains the main source of perceptual distortion in the current CELP algorithms at rates below 6 kb/s.

In the LP-based vocoders the voiced speech signal is modeled by a single pulse per pitch cycle. Because of excessive periodicity, this often leads to a buzzy character of the reconstructed speech. Recent work has shown that the speech quality can be improved significantly by adding more information about the evolving waveform shape. Using a cluster of pulses for each pitch cycle, with blockwise shape adaptation, in combination with a smoothly varying overall gain produced good results [7]. Alternatively, good-quality voiced speech can be obtained at rates of around 3 kb/s by careful placement of the single-pulse locations [8,9]. Although significantly improved over the LP-based vocoders, and similar in quality to 4.8 kb/s CELP, such single-pulse excited (SPE) speech coders still suffer from some buzziness.

Both the CELP and the SPE methods attempt to reproduce the original waveform by using a (spectrally weighted) signal-to-noise ratio (SNR) of the reconstructed speech signal as a criterion to determine the excitation sequence. However, maintaining the periodicity of the original speech signal is important for its perceptual quality, and maximization of the SNR often leads to a nonoptimal degree of periodicity. Thus, it was found in both the CELP [6] and the SPE coders [9] that improved speech quality can be obtained by increasing the periodicity, despite an associated reduction in SNR.
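
A toy sketch of the interpolation step, assuming two already-aligned, equal-length prototype pitch cycles (real PWI must also align and length-normalize the cycles):

import numpy as np

def pwi_interpolate(proto_a, proto_b, n_cycles):
    # Cross-fade from one prototype pitch cycle to the next over
    # n_cycles reconstructed cycles.
    cycles = []
    for i in range(n_cycles):
        w = i / max(n_cycles - 1, 1)
        cycles.append((1.0 - w) * proto_a + w * proto_b)
    return np.concatenate(cycles)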


IEEE Transactions on Audio, Speech, and Language Processing | 2006

On causal algorithms for speech enhancement

Volodya Grancharov; Jonas Samuelsson; W. Bastiaan Kleijn

Kalman filtering is a powerful technique for the estimation of a signal observed in noise that can be used to enhance speech observed in the presence of acoustic background noise. In a speech communication system, the speech signal is typically buffered for a period of 10-40 ms and, therefore, the use of either a causal or a noncausal filter is possible. We show that the causal Kalman algorithm is in conflict with the basic properties of human perception and address the problem of improving its perceptual quality. We discuss two approaches to improve perceptual performance. The first is based on a new method that combines the causal Kalman algorithm with pre- and postfiltering to introduce perceptual shaping of the residual noise. The second is based on the conventional Kalman smoother. We show that a short lag removes the conflict resulting from the causality constraint and we quantify the minimum lag required for this purpose. The results of our objective and subjective evaluations confirm that both approaches significantly outperform the conventional causal implementation. Of the two approaches, the Kalman smoother performs better if the signal statistics are precisely known; if this is not the case, the perceptually weighted Kalman filter performs better.
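
For reference, a minimal scalar causal Kalman filter for an AR(1) signal in white noise. The paper's point is that this causal estimate conflicts with perception, and that a short-lag smoother (an extra backward pass over a small buffer, not shown) removes the conflict; the AR(1) model and parameter values are illustrative assumptions.

import numpy as np

def causal_kalman_ar1(y, a=0.95, q=0.1, r=1.0):
    # State model: x[t] = a*x[t-1] + w, w ~ N(0, q);
    # observation:  y[t] = x[t] + v,   v ~ N(0, r).
    x_hat, p, est = 0.0, 1.0, []
    for y_t in y:
        x_pred, p_pred = a * x_hat, a * a * p + q   # predict
        k = p_pred / (p_pred + r)                   # Kalman gain
        x_hat = x_pred + k * (y_t - x_pred)         # update
        p = (1.0 - k) * p_pred
        est.append(x_hat)
    return np.array(est)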


international conference on computer vision | 2015

Domain Generalization for Object Recognition with Multi-task Autoencoders

Muhammad Ghifary; W. Bastiaan Kleijn; Mengjie Zhang; David Balduzzi

The problem of domain generalization is to take knowledge acquired from a number of related domains, where training data is available, and to then successfully apply it to previously unseen domains. We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. The algorithm extends the standard denoising autoencoder framework by substituting artificially induced corruption with naturally occurring inter-domain variability in the appearance of objects. Instead of reconstructing images from noisy versions, MTAE learns to transform the original image into analogs in multiple related domains. It thereby learns features that are robust to variations across domains. The learnt features are then used as inputs to a classifier. We evaluated the performance of the algorithm on benchmark image recognition datasets, where the task is to learn features from multiple datasets and to then predict the image label from unseen datasets. We found that (denoising) MTAE outperforms alternative autoencoder-based models as well as the current state-of-the-art algorithms for domain generalization.
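
The MTAE objective can be sketched with a linear shared encoder and one decoder per domain: the code of a source image must reconstruct its analogs in the related domains. The weights, the tanh nonlinearity, and the plain squared loss are illustrative assumptions, not the paper's exact architecture.

import numpy as np

def mtae_loss(x_src, x_analogs, W_enc, W_decs):
    # One shared encoder, one decoder per related domain: the code of
    # the source image must reconstruct its analog in every domain.
    h = np.tanh(W_enc @ x_src)                # shared representation
    return sum(np.sum((W_d @ h - x_t) ** 2)
               for W_d, x_t in zip(W_decs, x_analogs))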


pacific rim international conference on artificial intelligence | 2014

Domain Adaptive Neural Networks for Object Recognition

Muhammad Ghifary; W. Bastiaan Kleijn; Mengjie Zhang

We propose a simple neural network model to deal with the domain adaptation problem in object recognition. Our model incorporates the Maximum Mean Discrepancy (MMD) measure as a regularization in the supervised learning to reduce the distribution mismatch between the source and target domains in the latent space. From experiments, we demonstrate that the MMD regularization is an effective tool to provide good domain adaptation models on both SURF features and raw image pixels of a particular image data set. We also show that our proposed model, preceded by the denoising auto-encoder pretraining, achieves better performance than recent benchmark models on the same data sets. This work represents the first study of the MMD measure in the context of neural networks.
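
A sketch of the regularizer itself: the (biased) empirical squared MMD with an RBF kernel between a source batch and a target batch, which would be added to the supervised loss (e.g. total = supervised_loss + lam * mmd2_rbf(h_src, h_tgt)). The kernel choice and bandwidth are assumptions.

import numpy as np

def mmd2_rbf(Xs, Xt, gamma=1.0):
    # Biased empirical squared MMD with an RBF kernel; Xs, Xt are
    # (n, d) and (m, d) batches from the source and target domains.
    def k(A, B):
        d2 = (np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-gamma * d2)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2.0 * k(Xs, Xt).mean()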


international conference on acoustics, speech, and signal processing | 2013

Auto-localization in ad-hoc microphone arrays

Nikolay D. Gaubitch; W. Bastiaan Kleijn; Richard Heusdens

We present a method for automatic microphone localization in ad-hoc microphone arrays. The localization is based on time-of-arrival (TOA) measurements obtained from spatially distributed acoustic events. In practice, measured TOAs are an incomplete representation of the true TOAs due to unknown onset times of the acoustic events and internal delays in the capturing devices, and they make the localization problem insoluble if not addressed appropriately. The main contribution of the proposed method is an algorithm that identifies and corrects for such internal delays and acoustic event onset times in the measured TOAs. Experimental results using both simulated and real-world data demonstrate the performance of the method and highlight the significance of correct estimation of the internal delays and onset times.
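
The measurement model the abstract describes (TOA = distance/c + event onset + device delay) can be written as a residual function for a nonlinear least-squares solver. The parameter packing below is an illustrative assumption, not the paper's estimation algorithm.

import numpy as np

def toa_residuals(params, toa, c=343.0):
    # Model: toa[i, j] = ||mic_i - src_j|| / c + onset_j + delay_i.
    # params packs, in order: M mic positions (3 each), S source
    # positions (3 each), M device delays, S event onset times.
    M, S = toa.shape
    mics = params[:3 * M].reshape(M, 3)
    srcs = params[3 * M:3 * (M + S)].reshape(S, 3)
    delays = params[3 * (M + S):3 * (M + S) + M]
    onsets = params[3 * (M + S) + M:]
    dist = np.linalg.norm(mics[:, None, :] - srcs[None, :, :], axis=2)
    pred = dist / c + onsets[None, :] + delays[:, None]
    return (pred - toa).ravel()   # feed to e.g. scipy.optimize.least_squares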

Collaboration


Dive into W. Bastiaan Kleijn's collaboration network.

Top Co-Authors

Guoqiang Zhang (Delft University of Technology)
Petko N. Petkov (Royal Institute of Technology)
Richard Heusdens (Delft University of Technology)
Janusz Klejsa (Royal Institute of Technology)
Richard C. Hendriks (Delft University of Technology)
Minyue Li (Royal Institute of Technology)
Ermin Kozica (Royal Institute of Technology)
Mattias Nilsson (Royal Institute of Technology)
Mengjie Zhang (Victoria University of Wellington)
Muhammad Ghifary (Victoria University of Wellington)