D. Van Compernolle
Katholieke Universiteit Leuven
Publication
Featured research published by D. Van Compernolle.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
M. De Wachter; Mike Matton; Kris Demuynck; Patrick Wambacq; Ronald Cools; D. Van Compernolle
Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class-sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both leads to a decrease in word error rate of 17% compared to the HMM results.
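Since the recognizer's core is the classical DTW algorithm, a brief sketch may help. The Euclidean local distance, the symmetric step pattern, and the toy template set below are illustrative assumptions, not the paper's class-sensitive distance measure or data-driven candidate selection.

```python
import numpy as np

def dtw_distance(template, test):
    """Align two sequences of feature frames (T1 x D and T2 x D) and
    return the accumulated warping cost."""
    t1, t2 = len(template), len(test)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            local = np.linalg.norm(template[i - 1] - test[j - 1])  # Euclidean local distance
            cost[i, j] = local + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
    return cost[t1, t2]

# Toy example: pick the stored template closest to a test utterance.
rng = np.random.default_rng(0)
templates = {"yes": rng.normal(size=(40, 13)), "no": rng.normal(size=(35, 13))}
test = rng.normal(size=(38, 13))
best = min(templates, key=lambda w: dtw_distance(templates[w], test))
print("closest template:", best)
```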
International Conference on Acoustics, Speech, and Signal Processing | 1990
D. Van Compernolle
Switching adaptive filters, suitable for speech beamforming with no prior knowledge about the speech source, are presented. The filters have two sections, of which only one at any given time is allowed to adapt its coefficients. The switch between both is controlled by a speech detection function. The first section implements an adaptive look direction and cues in on the desired speech; this section only adapts when speech is present. The second section acts as a multichannel adaptive noise canceller. The obtained noise references are typically very bad; hence, adaptation must be restricted to silence-only periods. Several ideas were explored for the first section. The most robust solution, and the one with the best sound quality, was given by the simplest solution, i.e., a delay-and-sum beamformer that cues in on the direct path only and neglects all multipath contributions. Tests were performed with a four-microphone array in a highly reverberant room with both music and fan-type noise as jammers; SNR improvements of 10 dB were typical, with no audible distortion.
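A rough sketch of the two-section structure described above, assuming a fixed delay-and-sum front end and a gated LMS noise canceller; the delays, filter length, and step size are placeholders, not values from the paper.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """First section: align the channels on the direct path and average them
    (multipath contributions are deliberately ignored)."""
    aligned = [np.roll(x, -d) for x, d in zip(mics, delays)]
    return np.mean(aligned, axis=0)

def gated_lms_canceller(primary, references, speech_active, taps=32, mu=1e-3):
    """Second section: multichannel LMS noise canceller whose coefficients
    are only allowed to adapt while the speech detector reports silence."""
    w = np.zeros((len(references), taps))
    out = np.zeros_like(primary, dtype=float)
    for n in range(taps, len(primary)):
        segs = np.stack([r[n - taps:n][::-1] for r in references])  # reference history
        out[n] = primary[n] - np.sum(w * segs)                      # subtract noise estimate
        if not speech_active[n]:                                    # silence-only adaptation
            w += mu * out[n] * segs
    return out
```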
International Conference on Acoustics, Speech, and Signal Processing | 1987
Amir Averbuch; Lalit R. Bahl; Raimo Bakis; Peter F. Brown; G. Daggett; Subhro Das; K. Davies; S. De Gennaro; P. V. de Souza; Edward A. Epstein; D. Fraleigh; Frederick Jelinek; Burn L. Lewis; Robert Leroy Mercer; J. Moorhead; Arthur Nádas; David Nahamoo; Michael Picheny; G. Shichman; P. Spinelli; D. Van Compernolle; H. Wilkens
The Speech Recognition Group at IBM Research in Yorktown Heights has developed a real-time, isolated-utterance speech recognizer for natural language based on the IBM Personal Computer AT and IBM Signal Processors. The system has recently been enhanced by expanding the vocabulary from 5,000 words to 20,000 words and by the addition of a speech workstation to support usability studies on document creation by voice. The system supports spelling and interactive personalization to augment the vocabularies. This paper describes the implementation, user interface, and comparative performance of the recognizer.
IEEE Transactions on Speech and Audio Processing | 1998
T. Claes; Ioannis Dologlou; L. ten Bosch; D. Van Compernolle
This paper proposes a method to transform acoustic models that have been trained on a certain group of speakers for use with a different group of speakers in hidden Markov model based (HMM-based) automatic speech recognition. Features are transformed on the basis of assumptions regarding the difference in vocal tract length between the groups of speakers. First, the vocal tract length (VTL) of these groups is estimated from the average third formant F3. Second, the linear acoustic theory of speech production is applied to warp the spectral characteristics of the existing models so as to match the incoming speech. The mapping is composed of subsequent nonlinear submappings. By locally linearizing it and comparing results in the output, a linear approximation for the exact mapping was obtained which is accurate as long as the warping is reasonably small. The feature vector, which is computed from a speech frame, consists of the mel-scale cepstral coefficients (MFCCs) along with delta and delta-delta cepstra as well as delta and delta-delta energy. The method has been tested on the TI DIGITS database, which contains adult and child speech consisting of isolated digits and digit strings of different lengths. The word error rate when training on adults and testing on children with transformed adult models is decreased by more than a factor of two compared to the nontransformed case.
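The core idea, estimating a warp factor from average third formants and warping the frequency axis, can be sketched as follows. The paper actually derives a linear approximation in the cepstral/model domain, so this spectral-domain version with assumed formant values and filterbank is only illustrative.

```python
import numpy as np

def warp_factor(f3_train_hz, f3_test_hz):
    """Vocal-tract-length warp factor from average third formants."""
    return f3_train_hz / f3_test_hz

def warp_mel_filterbank(center_freqs_hz, alpha, nyquist_hz=8000.0):
    """Linearly warp filter center frequencies, clipped to the Nyquist rate."""
    return np.clip(np.asarray(center_freqs_hz) * alpha, 0.0, nyquist_hz)

# Assumed values: adults F3 ~ 2500 Hz, children F3 ~ 3000 Hz, so children's
# spectra are compressed towards the adult models before feature extraction.
alpha = warp_factor(2500.0, 3000.0)          # ~0.83
centers = np.linspace(100.0, 7000.0, 24)     # assumed 24-channel filterbank
print(warp_mel_filterbank(centers, alpha)[:5])
```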
IEEE Signal Processing Letters | 1994
P. Le Cerf; D. Van Compernolle
Variable frame rate (VFR) analysis is a technique used in speech processing and recognition for discarding frames that are too much alike. This article introduces a new method for VFR: instead of calculating the distance between frames, the norm of the derivative parameters is used in deciding whether to retain or discard a frame. Informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal. Experimental results with a hidden Markov model (HMM) based system show that the new method outperforms the classical method.
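A minimal sketch of the frame-selection rule, keeping frames whose delta-parameter norm exceeds a threshold; the delta estimate, the threshold value, and the synthetic feature trajectory are assumptions for illustration.

```python
import numpy as np

def select_frames(features, threshold):
    """features: T x D matrix of frame vectors.  Keep the frames whose
    delta-parameter norm exceeds the threshold (transient regions)."""
    deltas = np.gradient(features, axis=0)          # crude delta estimate
    norms = np.linalg.norm(deltas, axis=1)
    return np.where(norms > threshold)[0]

rng = np.random.default_rng(1)
feats = np.cumsum(rng.normal(scale=0.1, size=(200, 13)), axis=0)   # synthetic trajectory
kept = select_frames(feats, threshold=0.25)
print(f"kept {len(kept)} of {len(feats)} frames")
```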
International Conference on Acoustics, Speech, and Signal Processing | 1994
Fei Xie; D. Van Compernolle
In this paper we present a family of nonlinear spectral estimators for noise reduction which are approximated and implemented by a multilayer perceptron neural network. The estimators are approximations of the true minimum mean square error estimator in the logarithmic or a related perceptual domain. Training data for the neural networks is generated from relevant statistical speech and noise models. A single estimator network is generated for all frequency channels. Parameters describing both the noise and speech distributions are estimated on-line and provided as extra inputs to the neural net. Including these parameters significantly improves performance over standard spectral estimators, which are based on a global speech model and a noise model described by a single parameter, the noise mean.
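How a single network can serve all frequency channels might look roughly like this. The weights below are random placeholders (the paper trains on data generated from statistical speech and noise models), and the two distribution parameters are stand-ins for the on-line estimates mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)   # assumed: 3 inputs, 8 hidden units
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def mlp_estimate(noisy_log_amp, speech_param, noise_param):
    """Estimate the clean log-amplitude of one frequency channel."""
    x = np.array([noisy_log_amp, speech_param, noise_param])
    h = np.tanh(W1 @ x + b1)
    return (W2 @ h + b2).item()

# The same network is applied to every channel of a noisy log-spectral frame.
frame = rng.normal(size=24)
clean = [mlp_estimate(a, speech_param=0.5, noise_param=-1.0) for a in frame]
```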
IEEE Transactions on Speech and Audio Processing | 1994
P. Le Cerf; Weiye Ma; D. Van Compernolle
A novel combination of multilayer perceptrons (MLPs) and hidden Markov models (HMMs) is presented. Instead of using MLPs as probability generators for HMMs, the authors propose to use MLPs as labelers for discrete-parameter HMMs. Compared with the probabilistic interpretation of MLPs, this gives them the advantage of flexibility in system design (e.g., the use of word models instead of phonetic models while using the same MLPs). Moreover, since the MLPs do not need to reach a global minimum, fewer hidden nodes suffice, and they can be trained faster. In addition, the MLPs do not need to be retrained with segmentations generated by a Viterbi alignment. Compared with Euclidean labeling, the method has the advantages of needing fewer HMM parameters per state and obtaining a higher recognition accuracy. Several improvements of the baseline MLP labeling are investigated. When using one MLP, the best results are obtained when giving the labels a fuzzy interpretation. It is also possible to use parallel MLPs, each based on a different parameter set (e.g., basic parameters, their time derivatives, and their second-order time derivatives). This strategy improves the recognition results considerably. A final improvement is the training of MLPs for subphoneme classification.
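The labelling step can be sketched as follows: MLP outputs are mapped to discrete HMM observation symbols either by a hard argmax or by the fuzzy interpretation mentioned above. The posterior values and the top-k choice are invented for illustration.

```python
import numpy as np

def hard_label(posteriors):
    """Winning output unit -> discrete HMM observation symbol."""
    return int(np.argmax(posteriors))

def fuzzy_labels(posteriors, top_k=3):
    """Top-k labels with renormalised weights, for a fuzzy discrete HMM."""
    idx = np.argsort(posteriors)[::-1][:top_k]
    w = posteriors[idx] / posteriors[idx].sum()
    return list(zip(idx.tolist(), w.tolist()))

post = np.array([0.02, 0.55, 0.30, 0.10, 0.03])   # made-up MLP output for one frame
print(hard_label(post))     # 1
print(fuzzy_labels(post))   # [(1, ~0.58), (2, ~0.32), (3, ~0.11)]
```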
International Conference on Acoustics, Speech, and Signal Processing | 2011
Geoffrey Zweig; Patrick Nguyen; D. Van Compernolle; Kris Demuynck; L. Atlas; Pascal Clark; Gregory Sell; M. Wang; Fei Sha; Hynek Hermansky; Damianos Karakos; Aren Jansen; Samuel Thomas; S. Bowman; Justine T. Kao
This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. This approach uses a state-of-the-art baseline as a springboard from which to add a suite of novel features including ones derived from acoustic templates, deep neural net phoneme detections, duration models, modulation features, and whole word point-process models. The SCRF framework is able to appropriately weight these different information sources to produce significant gains on both the Broadcast News and Wall Street Journal tasks.
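A very small sketch of the log-linear segment scoring by which an SCRF can weight heterogeneous information sources; the feature names, weights, and hypothesis scores are invented, and the workshop systems score segmentations over lattices from the baseline rather than two isolated hypotheses.

```python
# Assumed learned weights for a few segment-level information sources.
weights = {"baseline_lm": 1.0, "template_score": 0.7,
           "dnn_phone_detect": 0.5, "duration": 0.3}

def segment_score(features):
    """Weighted sum of segment-level feature functions."""
    return sum(weights[name] * value for name, value in features.items())

hyp_a = {"baseline_lm": -2.1, "template_score": -1.0, "dnn_phone_detect": 0.8, "duration": 0.2}
hyp_b = {"baseline_lm": -1.8, "template_score": -2.5, "dnn_phone_detect": 0.1, "duration": 0.4}
best = max([("word_a", hyp_a), ("word_b", hyp_b)], key=lambda kv: segment_score(kv[1]))
print(best[0])   # word_a, since its weighted score is higher
```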
International Conference on Acoustics, Speech, and Signal Processing | 1994
S. Van Gerven; D. Van Compernolle
In this paper we describe a method to separate a scalar mixture of two signals. The method is based on the use of decorrelation as a signal separation criterion. It is proven analytically that decorrelating the output signals at different time lags is sufficient, provided that the normalised autocorrelation functions of the source signals are sufficiently distinct. The method involves an iterative least-squares solution of a set of nonlinear equations. Alternatively, a gradient search algorithm can also be used to find the minimum of the sum of squares of these equations. Both time- and frequency-domain formulations are given. Some convergence and stability issues are discussed and a small example is given at the end.
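A toy version of the decorrelation criterion for a 2 x 2 scalar mixture: search for unmixing coefficients that drive the output cross-correlation towards zero at several lags. The mixing values and the use of SciPy's Nelder-Mead search are assumptions; the paper uses an iterative least-squares / gradient formulation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 5000
s1 = np.convolve(rng.normal(size=n), np.ones(5) / 5, mode="same")   # "slow" source
s2 = rng.normal(size=n)                                             # white source
x1, x2 = s1 + 0.6 * s2, 0.4 * s1 + s2                               # scalar mixture

def xcorr(a, b, lag):
    """Cross-correlation of the two outputs at a given lag."""
    return np.mean(a[lag:] * b[:len(b) - lag]) if lag else np.mean(a * b)

def objective(coeffs, lags=(0, 1, 2, 3, 4)):
    c, d = coeffs
    y1, y2 = x1 - c * x2, x2 - d * x1
    return sum(xcorr(y1, y2, lag) ** 2 for lag in lags)

res = minimize(objective, x0=[0.0, 0.0], method="Nelder-Mead")
print("estimated unmixing coefficients:", res.x)   # should approach (0.6, 0.4)
```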
International Conference on Acoustics, Speech, and Signal Processing | 1994
J. Smolders; T. Claes; G. Sablon; D. Van Compernolle
One of the problems with speech recognition in the car is the position of the far-talk microphone. This position not only implies more or less noise, coming from the car (engine, tires, ...) or from other sources (traffic, wind noise, ...), but also a different acoustical transfer function. In order to compare microphone positions in the car, we recorded a multispeaker database in a car at 7 different positions and compared them on the basis of SNR and recognition rate. The position at the ceiling, right in front of the speaker, gave the best results.