Publication


Featured research published by D. Van Compernolle.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Template-Based Continuous Speech Recognition

M. De Wachter; Mike Matton; Kris Demuynck; Patrick Wambacq; Ronald Cools; D. Van Compernolle

Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class-sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both leads to a decrease in word error rate of 17% compared to the HMM results.
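
As an illustration of the DTW core underlying such a template-based recognizer, the sketch below implements classic dynamic time warping between two feature sequences and picks the best-matching template. It is a minimal, assumed example: the paper's data-driven candidate selection, subword units, and class-sensitive distance are not shown.

```python
import numpy as np

def dtw_distance(template, query):
    """Accumulated DTW alignment cost between two (T, D) feature sequences."""
    T, Q = len(template), len(query)
    D = np.full((T + 1, Q + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, Q + 1):
            cost = np.linalg.norm(template[i - 1] - query[j - 1])  # local frame distance
            # standard step pattern: diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[T, Q]

def recognize(query, templates):
    """Return the label of the template with the lowest alignment cost.
    templates: dict mapping a label to its (T, D) reference sequence."""
    return min(templates, key=lambda label: dtw_distance(templates[label], query))
```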


International Conference on Acoustics, Speech, and Signal Processing | 1990

Switching adaptive filters for enhancing noisy and reverberant speech from microphone array recordings

D. Van Compernolle

Switching adaptive filters, suitable for speech beamforming, with no prior knowledge about the speech source are presented. The filters have two sections, of which only one section at any given time is allowed to adapt its coefficients. The switch between both is controlled by a speech detection function. The first section implements an adaptive look direction and cues in on the desired speech. This section only adapts when speech is present. The second section acts as a multichannel adaptive noise canceller. The obtained noise references are typically very bad; hence, adaptation must be restricted to silence-only periods. Several ideas were explored for the first section. The most robust solution, and the one with the best sound quality, was given by the simplest solution, i.e. a delay-and-sum beamformer that cues in on the direct path only and neglects all multipath contributions. Tests were performed with a four-microphone array in a highly reverberant room with both music and fan-type noise as jammers; SNR improvements of 10 dB were typical, with no audible distortion.
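
The delay-and-sum front end that the paper found most robust can be sketched as below. This is an illustrative example using integer-sample steering delays (a simplifying assumption), not the full two-section switching adaptive filter.

```python
import numpy as np

def delay_and_sum(channels, fs, steering_delays_s):
    """Delay-and-sum beamformer steered at the direct path of the desired source.

    channels: (M, N) array of microphone signals sampled at fs Hz.
    steering_delays_s: per-microphone direct-path delays in seconds.
    Each channel is advanced by its delay so the desired speech adds
    coherently while reverberation and noise add incoherently.
    """
    M, N = channels.shape
    out = np.zeros(N)
    for m in range(M):
        shift = int(round(steering_delays_s[m] * fs))  # integer-sample approximation
        out += np.roll(channels[m], -shift)            # np.roll wraps around; fine for a sketch
    return out / M
```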


International Conference on Acoustics, Speech, and Signal Processing | 1987

Experiments with the Tangora 20,000 word speech recognizer

Amir Averbuch; Lalit R. Bahl; Raimo Bakis; Peter F. Brown; G. Daggett; Subhro Das; K. Davies; S. De Gennaro; P. V. de Souza; Edward A. Epstein; D. Fraleigh; Frederick Jelinek; Burn L. Lewis; Robert Leroy Mercer; J. Moorhead; Arthur Nádas; David Nahamoo; Michael Picheny; G. Shichman; P. Spinelli; D. Van Compernolle; H. Wilkens

The Speech Recognition Group at IBM Research in Yorktown Heights has developed a real-time, isolated-utterance speech recognizer for natural language based on the IBM Personal Computer AT and IBM Signal Processors. The system has recently been enhanced by expanding the vocabulary from 5,000 words to 20,000 words and by the addition of a speech workstation to support usability studies on document creation by voice. The system supports spelling and interactive personalization to augment the vocabularies. This paper describes the implementation, user interface, and comparative performance of the recognizer.


IEEE Transactions on Speech and Audio Processing | 1998

A novel feature transformation for vocal tract length normalization in automatic speech recognition

T. Claes; Ioannis Dologlou; L. ten Bosch; D. Van Compernolle

This paper proposes a method to transform acoustic models that have been trained with a certain group of speakers for use on speech from a different group in hidden Markov model based (HMM-based) automatic speech recognition. Features are transformed on the basis of assumptions regarding the difference in vocal tract length between the groups of speakers. First, the vocal tract length (VTL) of these groups has been estimated based on the average third formant F3. Second, the linear acoustic theory of speech production has been applied to warp the spectral characteristics of the existing models so as to match the incoming speech. The mapping is composed of subsequent nonlinear submappings. By locally linearizing it and comparing results in the output, a linear approximation for the exact mapping was obtained which is accurate as long as the warping is reasonably small. The feature vector, which is computed from a speech frame, consists of the mel-scale cepstral coefficients (MFCC) along with delta and delta-delta cepstra as well as delta and delta-delta energy. The method has been tested on the TI Digits database, containing adult and child speech, consisting of isolated digits and digit strings of different lengths. The word error rate when trained on adults and tested on children with transformed adult models is decreased by more than a factor of two compared to the nontransformed case.
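
A minimal sketch of the underlying idea, linearly warping the frequency axis by a factor derived from average third formants, is given below. The paper applies an equivalent linear transformation in the feature/model space rather than resampling spectra, so this is an assumed simplification for illustration only.

```python
import numpy as np

def vtl_warp_factor(f3_train_hz, f3_test_hz):
    """Warping factor from the average third formants of the two speaker groups,
    e.g. adults (training) versus children (testing)."""
    return f3_test_hz / f3_train_hz

def warp_power_spectrum(power_spec, alpha):
    """Resample a power spectrum onto a linearly warped frequency grid.

    alpha > 1 stretches the spectral envelope (shorter vocal tract),
    alpha < 1 compresses it. MFCCs computed from the warped spectrum then
    better match models trained on the other speaker group.
    """
    bins = np.arange(len(power_spec), dtype=float)
    source_bins = bins / alpha   # original-axis position that maps to each output bin
    return np.interp(source_bins, bins, power_spec)
```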


IEEE Signal Processing Letters | 1994

A new variable frame analysis method for speech recognition

P Le Cerf; D. Van Compernolle

Variable frame rate (VFR) analysis is a technique used in speech processing and recognition for discarding frames that are too similar. The article introduces a new method for VFR. Instead of calculating the distance between frames, the norm of the derivative parameters is used in deciding whether to retain or to discard a frame. Informal inspection of speech spectrograms shows that this new method puts more emphasis on the transient regions of the speech signal. Experimental results with a hidden Markov model (HMM) based system show that the new method outperforms the classical method.
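
The derivative-norm criterion can be sketched in a few lines; the threshold value below is a hypothetical placeholder.

```python
import numpy as np

def variable_frame_rate(frames, deltas, threshold=1.0):
    """Retain a frame only when the norm of its delta (derivative) parameters
    exceeds a threshold, so transient regions are kept densely while
    steady-state regions are thinned out.

    frames, deltas: (T, D) arrays of static and delta features.
    """
    keep = np.linalg.norm(deltas, axis=1) >= threshold
    keep[0] = True   # always keep the first frame
    return frames[keep]
```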


International Conference on Acoustics, Speech, and Signal Processing | 1994

A family of MLP based nonlinear spectral estimators for noise reduction

Fei Xie; D. Van Compernolle

In this paper we present a family of nonlinear spectral estimators for noise reduction which are approximated and implemented by a multilayer perceptron neural network. The estimators are approximations of the true minimum mean square error estimator in the logarithmic or a related perceptual domain. Training data for the neural networks is generated from relevant statistical speech and noise models. One single estimator network is generated for all frequency channels. Parameters describing both the noise and speech distribution are estimated on-line and provided as extra inputs to the neural net. Including these parameters significantly improves performance over standard spectral estimators, which are based on a global speech model and a noise model described by a single parameter, the noise mean.
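
A rough sketch of how such an estimator could be trained is shown below, using simple Gaussian log-power models for speech and noise and scikit-learn's MLPRegressor. The model shapes and parameter values are illustrative assumptions, not the paper's exact statistical models.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_training_data(n, speech_mean, noise_mean, speech_std=4.0, noise_std=2.0):
    """Generate (noisy log-power, noise mean, speech mean) -> clean log-power pairs
    from simple Gaussian speech and noise models, mixed in the power domain."""
    clean = rng.normal(speech_mean, speech_std, n)
    noise = rng.normal(noise_mean, noise_std, n)
    noisy = np.log(np.exp(clean) + np.exp(noise))
    X = np.column_stack([noisy,
                         np.full(n, noise_mean),    # extra inputs describing the current
                         np.full(n, speech_mean)])  # noise and speech statistics
    return X, clean

X, y = make_training_data(20000, speech_mean=2.0, noise_mean=0.0)
estimator = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
```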


IEEE Transactions on Speech and Audio Processing | 1994

Multilayer perceptrons as labelers for hidden Markov models

P Le Cerf; Weiye Ma; D. Van Compernolle

A novel combination of multilayer perceptrons (MLPs) and hidden Markov models (HMMs) is presented. Instead of using MLPs as probability generators for HMMs, the authors propose to use MLPs as labelers for discrete-parameter HMMs. Compared with the probabilistic interpretation of MLPs, this gives them the advantage of flexibility in system design (e.g., the use of word models instead of phonetic models while using the same MLPs). Moreover, since they do not need to reach a global minimum, they can use MLPs with fewer hidden nodes, which can be trained faster. In addition, they do not need to retrain the MLPs with segmentations generated by a Viterbi alignment. Compared with Euclidean labeling, their method has the advantages of needing fewer HMM parameters per state and obtaining a higher recognition accuracy. Several improvements of the baseline MLP labeling are investigated. When using one MLP, the best results are obtained when giving the labels a fuzzy interpretation. It is also possible to use parallel MLPs where each is based on a different parameter set (e.g., basic parameters, their time derivatives, and their second-order time derivatives). This strategy improves recognition results considerably. A final improvement is the training of MLPs for subphoneme classification.
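
The labeling step itself is straightforward: the winning MLP output becomes the discrete observation symbol for the HMM. A minimal sketch, assuming a classifier with a scikit-learn-style predict_proba, follows.

```python
import numpy as np

def mlp_label_sequence(frames, mlp):
    """Turn a (T, D) feature sequence into discrete symbols for a discrete HMM.

    Instead of Euclidean (VQ codebook) labeling, an MLP classifies each frame;
    the index of the winning output unit is the label, and the full posterior
    vector can be kept for a fuzzy (soft) interpretation of the labels.
    """
    posteriors = mlp.predict_proba(frames)   # (T, K) class posteriors
    hard_labels = posteriors.argmax(axis=1)  # symbols fed to the discrete HMM
    return hard_labels, posteriors
```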


International Conference on Acoustics, Speech, and Signal Processing | 2011

Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop

Geoffrey Zweig; Patrick Nguyen; D. Van Compernolle; Kris Demuynck; L. Atlas; Pascal Clark; Gregory Sell; M. Wang; Fei Sha; Hynek Hermansky; Damianos Karakos; Aren Jansen; Samuel Thomas; S. Bowman; Justine T. Kao

This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. This approach uses a state-of-the-art baseline as a springboard from which to add a suite of novel features including ones derived from acoustic templates, deep neural net phoneme detections, duration models, modulation features, and whole word point-process models. The SCRF framework is able to appropriately weight these different information sources to produce significant gains on both the Broadcast News and Wall Street Journal tasks.
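
At its core, the SCRF scores each hypothesized segment with a weighted combination of heterogeneous feature functions. A minimal, assumed sketch of that scoring step is given below; weight training and segmentation search are not shown.

```python
import numpy as np

def scrf_path_score(segment_features, weights):
    """Log-linear score of one hypothesized segmentation.

    segment_features: list of 1-D feature vectors, one per (segment, label) pair,
    each stacking detector outputs such as template-match scores, neural-net
    phoneme detections, and duration or modulation features.
    weights: learned weight vector of the same dimension.
    """
    return sum(float(np.dot(weights, f)) for f in segment_features)
```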


International Conference on Acoustics, Speech, and Signal Processing | 1994

On the use of decorrelation in scalar signal separation

S Van Gerven; D. Van Compernolle

In this paper we describe a method to separate a scalar mixture of two signals. The method is based on the use of decorrelation as a signal separation criterion. It is proven analytically that decorrelating the output signals at different time lags is sufficient provided that the normalised autocorrelation functions of the source signals are sufficiently distinct. The method involves an iterative least-squares solution of a set of nonlinear equations. Alternatively, a gradient search algorithm can also be used to find the minimum of the sum of squares of these equations. Both time- and frequency-domain formulations are given. Some convergence and stability issues are discussed and a small example is given at the end.
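
A small numerical sketch of the decorrelation criterion for scalar (instantaneous) mixing is shown below, using a feedforward unmixing structure and scipy's least-squares solver. The unmixing structure and the choice of lags are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares

def cross_corr(y1, y2, lag):
    """Cross-correlation of two zero-mean signals at a non-negative lag."""
    return np.mean(y1[lag:] * y2[:len(y2) - lag]) if lag else np.mean(y1 * y2)

def separate(x1, x2, lags=(0, 1, 2, 3)):
    """Find scalar unmixing coefficients (a, b) such that y1 = x1 - a*x2 and
    y2 = x2 - b*x1 are decorrelated at several time lags, which identifies
    the sources when their normalised autocorrelation functions differ."""
    def residuals(theta):
        a, b = theta
        y1, y2 = x1 - a * x2, x2 - b * x1
        return [cross_corr(y1, y2, k) for k in lags]
    a, b = least_squares(residuals, x0=[0.0, 0.0]).x
    return x1 - a * x2, x2 - b * x1
```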


International Conference on Acoustics, Speech, and Signal Processing | 1994

On the importance of the microphone position for speech recognition in the car

J Smolders; T. Claes; G. Sablon; D. Van Compernolle

One of the problems with speech recognition in the car is the position of the far-talk microphone. This position not only implies more or less noise, coming from the car (engine, tires, ...) or from other sources (traffic, wind noise, ...), but also a different acoustical transfer function. In order to compare microphone positions in the car, we recorded a multispeaker database in a car at 7 different positions and compared them on the basis of SNR and recognition rate. The position at the ceiling right in front of the speaker gave the best results.
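
One simple way to carry out such a comparison is to compute the SNR at each candidate position from separately recorded speech and noise segments, as in this hypothetical helper; the position names are placeholders.

```python
import numpy as np

def snr_db(speech, noise):
    """Global SNR in dB from speech-only and noise-only segments of a recording."""
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2))

def rank_positions(recordings):
    """recordings: dict mapping a position name (e.g. 'ceiling_front', 'visor')
    to a (speech_segment, noise_segment) pair.
    Returns position names sorted from best to worst SNR."""
    return sorted(recordings, key=lambda pos: snr_db(*recordings[pos]), reverse=True)
```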

Collaboration


Dive into D. Van Compernolle's collaborations.

Top Co-Authors

Kris Demuynck, Katholieke Universiteit Leuven
P. Le Cerf, Katholieke Universiteit Leuven
Jacques Duchateau, Katholieke Universiteit Leuven
T. Claes, Katholieke Universiteit Leuven
M. De Wachter, Katholieke Universiteit Leuven
Patrick Wambacq, Katholieke Universiteit Leuven
S. Van Gerven, Katholieke Universiteit Leuven
A. Van Hirtum, Katholieke Universiteit Leuven
Fei Xie, Katholieke Universiteit Leuven
J. Smolders, Katholieke Universiteit Leuven