Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dirk Van Compernolle is active.

Publication


Featured research published by Dirk Van Compernolle.


Computer Speech & Language | 1989

Noise adaptation in a hidden Markov model speech recognition system

Dirk Van Compernolle

Several ways of making the signal processing in an isolated word speech recognition system more robust against large variations in the background noise level are presented. Isolated word recognition systems are sensitive to accurate silence detection, and are easily overtrained on the specific noise circumstances of the training environment. Spectral subtraction provides good noise immunity in the cases where the noise level in the testing environment is lower than, or only slightly higher than, during training. Differences in residual noise energy after spectral subtraction between a clean training and a noisy testing environment can still cause severe problems. The usability of spectral subtraction is greatly increased if it is complemented with some extra noise-immunity processing. This is achieved by adding artificial noise after spectral subtraction or by adaptively re-estimating the noise statistics during a training session. Both techniques are almost equally successful in dealing with the noise. Noise addition achieves the additional robustness that the system is never allowed to learn about low-amplitude events that might not be observable in all environments; this comes, however, at the cost that some information is consistently discarded in the most favorable noise situations.
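The combination of spectral subtraction and artificial noise addition described above can be illustrated with a short sketch. This is a minimal illustration, not the paper's implementation; the frame layout, over-subtraction factor and artificial noise level are assumed values.

```python
# Minimal sketch of spectral subtraction followed by artificial noise addition.
import numpy as np

def spectral_subtract(frames, noise_psd, floor=0.01, alpha=1.0):
    """Subtract an estimated noise power spectrum from each frame's power spectrum."""
    spec = np.fft.rfft(frames, axis=-1)
    power = np.abs(spec) ** 2
    clean_power = np.maximum(power - alpha * noise_psd, floor * noise_psd)
    # Keep the noisy phase; only the magnitude is modified.
    return np.fft.irfft(np.sqrt(clean_power) * np.exp(1j * np.angle(spec)), axis=-1)

def add_artificial_noise(frames, noise_level=0.005, rng=None):
    """Mask residual low-level detail by adding a fixed, known noise floor."""
    rng = np.random.default_rng() if rng is None else rng
    return frames + noise_level * rng.standard_normal(frames.shape)

# Usage on dummy data: 100 frames of 256 samples, noise PSD estimated from leading silence.
frames = np.random.randn(100, 256) * 0.1
noise_psd = np.mean(np.abs(np.fft.rfft(frames[:10], axis=-1)) ** 2, axis=0)
enhanced = add_artificial_noise(spectral_subtract(frames, noise_psd))
```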


Speech Communication | 1990

Speech recognition in noisy environments with the aid of microphone arrays

Dirk Van Compernolle; Weiye Ma; Fei Xie; Marc Van Diest

This paper presents a microphone array adaptive beamformer with a dual function: the noise-reduced output is suited to transmission as well as to use as input to speech recognition systems. The envisaged areas of use are the car, the factory floor and noisy offices. The underlying structure is a steered Griffiths-Jim beamformer with an added speech detection switch for the selective adaptation of both sections. This beamformer is effective in suppressing both stationary and non-stationary interference and is therefore suitable as a preprocessor for a wider range of speech recognition applications than any single-channel noise suppression scheme could handle. Experiments were performed in a reverberant room with a 4-microphone array. Typical SNR improvements for communication purposes range from 4 to 12 dB. The effective SNR improvement for speech recognition purposes ranges from 4 to 8 dB.
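To make the structure concrete, here is a rough sketch of a Griffiths-Jim (generalized sidelobe canceller) beamformer with a speech-activity switch that freezes adaptation during speech, as described above. The array size, filter length, step size and the crude energy-based speech detector are illustrative assumptions, not the configuration used in the paper.

```python
# Rough sketch of a Griffiths-Jim beamformer with selective adaptation.
import numpy as np

def griffiths_jim(x, speech_active, taps=32, mu=1e-3):
    """x: (num_mics, num_samples) time-aligned microphone signals.
    speech_active: boolean array per sample controlling adaptation."""
    m, n = x.shape
    fixed = x.mean(axis=0)                      # delay-and-sum (steered) beamformer
    blocked = x[1:] - x[:-1]                    # blocking matrix: pairwise differences
    w = np.zeros((m - 1, taps))                 # one adaptive FIR filter per blocked channel
    out = np.zeros(n)
    for t in range(taps, n):
        ref = blocked[:, t - taps:t]            # recent noise-reference samples
        y = fixed[t] - np.sum(w * ref)          # subtract the adaptive noise estimate
        out[t] = y
        if not speech_active[t]:                # adapt the canceller only during noise
            w += mu * y * ref                   # LMS-style update (unnormalized here)
    return out

# Usage on synthetic data: 4 mics, 1 s at 16 kHz, crude energy-based speech switch.
x = np.random.randn(4, 16000) * 0.1
speech_active = np.abs(x.mean(axis=0)) > 0.2
enhanced = griffiths_jim(x, speech_active)
```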


Speech Communication | 2001

Recognizing speech of goats, wolves, sheep and...non-natives

Dirk Van Compernolle

This paper reviews the current understanding of acoustic–phonetic issues and the problems arising when trying to recognize speech from non-native speakers. Conceptually, regional accents are well modeled by systematic shifts in pronunciation. Therefore, simultaneous recognition of multiple regional variants may be performed by using multiple acoustic models in parallel, or by adding pronunciation variants to the dictionary. Recognition of non-native speech is much more difficult because it is influenced by both the native language of the speaker and the non-native target language. It is characterized by much greater speaker variability due to different levels of proficiency. A few language-pair-specific transformation rules describing prototypical nativized pronunciations were found to be useful both in general speech recognition and in dedicated applications. However, due to the nature of the errors and the cross-language transformations, non-native speech recognition will remain inherently much harder. Moreover, the trend in speech recognition towards more detailed modeling seems to be counterproductive for the recognition of non-native speech and limits progress in this field.
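A minimal sketch of the dictionary-based approach mentioned above: generating pronunciation variants by applying phone substitution rules to a lexicon. The rules, phone set and lexicon entries below are made-up toy examples, not rules from the paper.

```python
# Toy phone substitution rules standing in for language-pair specific
# nativization rules (hypothetical, for illustration only).
RULES = [("TH", "T"), ("DH", "D"), ("IH", "IY")]

def nativized_variants(pron, rules=RULES):
    """Generate pronunciation variants by applying each substitution rule everywhere."""
    variants = {tuple(pron)}
    for src, dst in rules:
        new = set()
        for v in variants:
            new.add(tuple(dst if p == src else p for p in v))
        variants |= new
    return [list(v) for v in variants]

# Toy lexicon expanded with nativized variants.
lexicon = {"this": ["DH", "IH", "S"], "think": ["TH", "IH", "NG", "K"]}
expanded = {w: nativized_variants(p) for w, p in lexicon.items()}
for word, prons in expanded.items():
    print(word, prons)
```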


IEEE Signal Processing Magazine | 2012

Exemplar-Based Processing for Speech Recognition: An Overview

Tara N. Sainath; Bhuvana Ramabhadran; David Nahamoo; Dimitri Kanevsky; Dirk Van Compernolle; Kris Demuynck; Jort F. Gemmeke; Jerome R. Bellegarda; Shiva Sundaram

Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it. The uncertainty originates from the fact that many aspects of data generation are influenced by variables that are not directly measurable, or are too complex to model, and hence are treated as random fluctuations. For example, in speech production, uncertainty could arise from vocal tract variations among different people or from corruption by noise. The goal of modeling is to establish a generalization from the set of observed data such that accurate inference (classification, decision, recognition) can be made about data yet to be observed, which we refer to as unseen data.
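As a concrete, if simplified, illustration of exemplar-based processing, the sketch below classifies a new feature vector from its k nearest stored exemplars instead of from a trained parametric model. The feature dimension, labels and value of k are arbitrary toy choices.

```python
# Minimal exemplar-based (nearest-neighbour) classification sketch.
import numpy as np
from collections import Counter

def knn_classify(exemplars, labels, query, k=5):
    """Return the majority label among the k exemplars closest to the query."""
    dists = np.linalg.norm(exemplars - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Usage on toy data: 1000 stored 39-dimensional frames with phone labels.
rng = np.random.default_rng(0)
exemplars = rng.standard_normal((1000, 39))
labels = rng.choice(["aa", "iy", "s"], size=1000)
print(knn_classify(exemplars, labels, rng.standard_normal(39)))
```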


Speech Communication | 2000

An efficient search space representation for large vocabulary continuous speech recognition

Kris Demuynck; Jacques Duchateau; Dirk Van Compernolle; Patrick Wambacq

In pursuit of better performance, current speech recognition systems tend to use more and more complicated models for both the acoustic and the language component. Cross-word context-dependent (CD) phone models and long-span statistical language models (LMs) are now widely used. In this paper, we present a memory-efficient search topology that enables the use of such detailed acoustic and language models in a one-pass time-synchronous recognition system. Characteristic of our approach are (1) the decoupling of the two basic knowledge sources, namely pronunciation information and LM information, and (2) the representation of the pronunciation information – the lexicon in terms of CD units – by means of a compact static network. The LM information is incorporated into the search at run time by means of a slightly modified token-passing algorithm. The decoupling of the LM and the lexicon allows great flexibility in the choice of LMs, while the static lexicon representation avoids the cost of dynamic tree expansion and facilitates the integration of additional pronunciation information such as assimilation rules. Moreover, the network representation results in a compact structure when words have various pronunciations and, due to its construction, it offers partial LM forwarding at no extra cost.
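The run-time combination of a static pronunciation network with LM scores can be illustrated with a toy token-passing step. This sketch only conveys the idea of applying the LM when a token crosses a word-end arc; the network, scores and bigram table are invented placeholders, not the paper's data structures.

```python
# Toy token passing over a static network, with the LM applied at word-end arcs.
import math

class Token:
    def __init__(self, score=0.0, history=()):
        self.score, self.history = score, history

def lm_score(history, word, bigrams):
    prev = history[-1] if history else "<s>"
    return math.log(bigrams.get((prev, word), 1e-4))

def propagate(tokens_at_node, arcs, bigrams):
    """One step over arcs given as (from_node, to_node, acoustic_logp, word_or_None)."""
    new = {}
    for src, dst, ac, word in arcs:
        for tok in tokens_at_node.get(src, []):
            score, hist = tok.score + ac, tok.history
            if word is not None:                   # word-end arc: add the LM score now
                score += lm_score(hist, word, bigrams)
                hist = hist + (word,)
            best = new.setdefault(dst, [Token(-math.inf)])[0]
            if score > best.score:
                new[dst] = [Token(score, hist)]
    return new

# Toy usage: two-arc network recognizing the word "yes".
bigrams = {("<s>", "yes"): 0.5}
arcs = [(0, 1, -1.2, None), (1, 2, -0.8, "yes")]
tokens = propagate(propagate({0: [Token()]}, arcs, bigrams), arcs, bigrams)
```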


International Conference on Acoustics, Speech, and Signal Processing | 1992

Signal separation in a symmetric adaptive noise canceler by output decorrelation

Dirk Van Compernolle; S Van Gerven

An algorithm for adaptive noise cancellation and signal separation is presented. It is an extension of the classical Widrow least-mean-square (LMS) noise canceler to the case of signal leakage into the noise reference. The algorithm is derived intuitively from the interpretation of the adaptive noise canceler as a decorrelator between the signal estimate and the noise, in which the noise reference is replaced by a signal-free noise estimate. Thus a symmetric adaptive decorrelator is obtained for signal separation, as the artificial distinction between the signal and noise concepts has disappeared. The algorithm has its limitations, as convergence to the desired solution and stability around it can only be guaranteed for a subclass of signal separation problems. These restrictions are rarely violated in real-life problems, however, and seem to be fundamental to the signal separation problem rather than algorithm dependent.
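The cross-coupled structure can be illustrated for the simplest case of scalar leakage between two channels. This is a minimal sketch of a symmetric adaptive decorrelator under that assumption; the mixing model, step size and update form are illustrative simplifications, not the exact algorithm from the paper.

```python
# Minimal symmetric adaptive decorrelator sketch for scalar leakage.
import numpy as np

def symmetric_decorrelator(x1, x2, mu=1e-3):
    """Assumed mixing: x1 = s + a*n, x2 = n + b*s. Returns separated estimates."""
    a_hat, b_hat = 0.0, 0.0
    s_est, n_est = np.zeros_like(x1), np.zeros_like(x2)
    prev_n = 0.0
    for t in range(len(x1)):
        s_est[t] = x1[t] - a_hat * prev_n       # remove leaked noise from channel 1
        n_est[t] = x2[t] - b_hat * s_est[t]     # remove leaked signal from channel 2
        a_hat += mu * s_est[t] * n_est[t]       # drive output cross-correlation to zero
        b_hat += mu * n_est[t] * s_est[t]
        prev_n = n_est[t]
    return s_est, n_est

# Toy usage: a source and a noise signal mixed with scalar leakage.
rng = np.random.default_rng(1)
s, n = rng.standard_normal(10000), rng.standard_normal(10000)
s_hat, n_hat = symmetric_decorrelator(s + 0.3 * n, n + 0.2 * s)
```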


Speech Communication | 1998

Fast and accurate acoustic modelling with semi-continuous HMMs

Jacques Duchateau; Kris Demuynck; Dirk Van Compernolle

In this paper the design of accurate Semi-Continuous Density Hidden Markov Models (SC-HMMs) for acoustic modelling in large vocabulary continuous speech recognition is presented. Two methods are described that drastically improve the efficiency of the observation likelihood calculations for the SC-HMMs. First, reduced SC-HMMs are created, in which each state does not share all the Gaussian probability density functions (pdfs) but only those which are important for it. It is shown how the average number of Gaussians per state can be reduced to 70 for a total set of 10,000 Gaussians. Second, a novel scalar selection algorithm is presented that reduces the number of Gaussians which have to be calculated to 5% of the total set of 10,000, without any degradation in recognition performance. Furthermore, the concept of tied-state context-dependent modelling with phonetic decision trees is adapted to SC-HMMs. In fact, a node splitting criterion appropriate for SC-HMMs is introduced: it is based on a distance measure between the mixtures of Gaussian pdfs as involved in SC-HMM state modelling. This contrasts with other criteria from the literature which are based on simplified pdfs to manage the algorithmic complexity. On the ARPA Resource Management task, a relative reduction in word error rate of 8% was achieved with the proposed criterion, compared with two known criteria based on simplified pdfs.
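The idea of evaluating only a shortlist of the shared Gaussians per frame can be sketched as follows. The codebook size, shortlist size and back-off floor are assumed values, and the shortlist is chosen here by a plain distance ranking rather than the paper's scalar selection algorithm.

```python
# Sketch of Gaussian shortlisting for a shared (semi-continuous) codebook.
import numpy as np

def shortlist_loglikes(frame, means, variances, top_n=16, floor=-20.0):
    """Compute exact log likelihoods only for the top_n Gaussians (by diagonal
    Mahalanobis distance to the frame); back off to a floor for the rest."""
    dist = np.sum((frame - means) ** 2 / variances, axis=1)
    shortlist = np.argpartition(dist, top_n)[:top_n]
    loglikes = np.full(len(means), floor)
    log_norm = -0.5 * np.log(2 * np.pi * variances[shortlist]).sum(axis=1)
    loglikes[shortlist] = log_norm - 0.5 * dist[shortlist]
    return loglikes

# Toy usage: 1000 shared diagonal Gaussians in 39 dimensions.
rng = np.random.default_rng(2)
means, variances = rng.standard_normal((1000, 39)), np.ones((1000, 39))
ll = shortlist_loglikes(rng.standard_normal(39), means, variances)
```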


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

The ESAT 2008 system for N-Best Dutch speech recognition benchmark

Kris Demuynck; Antti Puurula; Dirk Van Compernolle; Patrick Wambacq

This paper describes the ESAT 2008 Broadcast News transcription system for the N-Best 2008 benchmark, developed in part for testing the recent SPRAAK Speech Recognition Toolkit. The ESAT system was developed for the Southern Dutch Broadcast News subtask of N-Best using standard methods of modern speech recognition. A combination of improvements was made in commonly overlooked areas such as text normalization, pronunciation modeling, lexicon selection and morphological modeling, virtually solving the out-of-vocabulary (OOV) problem for Dutch by reducing the OOV rate to 0.06% on the N-Best development data and 0.23% on the evaluation data. Recognition experiments were run with several configurations comparing one-pass vs. two-pass decoding, high-order vs. low-order n-gram models, lexicon sizes and different types of morphological modeling. The system achieved a 7.23% word error rate (WER) on the broadcast news development data and 20.3% on the much more difficult evaluation data of N-Best.
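For reference, an out-of-vocabulary rate such as the 0.06% and 0.23% figures above is simply the fraction of running words in the reference text that are missing from the recognition lexicon; a minimal sketch with toy data and naive whitespace tokenization is shown below.

```python
# Minimal OOV rate computation over a reference word sequence and a lexicon.
def oov_rate(reference_words, lexicon):
    lexicon = set(lexicon)
    misses = sum(1 for w in reference_words if w not in lexicon)
    return 100.0 * misses / max(len(reference_words), 1)

# Toy usage with a tiny Dutch lexicon.
lexicon = {"de", "het", "een", "nieuws"}
reference = "het nieuws van de dag".split()
print(f"OOV rate: {oov_rate(reference, lexicon):.2f}%")   # 'van' and 'dag' are OOV
```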


Speech Communication | 2010

Feature subset selection for improved native accent identification

Tingyao Wu; Jacques Duchateau; Jean-Pierre Martens; Dirk Van Compernolle

In this paper, we develop methods to identify the accents of native speakers. Accent identification differs from other speaker classification tasks because accents may differ in a limited number of phonemes only and, moreover, the differences can be quite subtle. It is shown that in such cases it is essential to select a small subset of discriminative features that can be reliably estimated, and at the same time to discard non-discriminative and noisy features. For identification purposes a speaker is modeled by a supervector containing the mean values of the features for all phonemes. Initial accent models are obtained as class means from the speaker supervectors. Feature subset selection is then performed by applying either ANOVA (analysis of variance), LDA (linear discriminant analysis), SVM-RFE (support vector machine recursive feature elimination), or their hybrids, resulting in a reduced dimensionality of the speaker vector and, more importantly, a significantly enhanced recognition performance. We also compare the performance of GMM, LDA and SVM as classifiers on a full or a reduced feature subset. The methods are tested on a Flemish read speech database with speakers classified into five regions. The difficulty of the task is confirmed by a human listening experiment. We show that a relative improvement of more than 20% in accent recognition rate can be achieved with feature subset selection irrespective of the choice of classifier. We finally show that the construction of speaker-based supervectors significantly enhances results over a reference GMM system that uses the raw feature vectors directly as input, in both text-dependent and text-independent conditions.
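A small sketch of the feature-subset-selection idea using ANOVA F-scores on speaker supervectors, followed by an LDA classifier on the reduced features. The dimensions, the number of classes, the random data and the use of scikit-learn are illustrative assumptions, not the paper's exact setup.

```python
# ANOVA-based feature subset selection on speaker supervectors (illustrative).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
supervectors = rng.standard_normal((200, 500))   # 200 speakers, 500-dim supervectors
accents = rng.integers(0, 5, size=200)           # 5 regional accent classes

selector = SelectKBest(f_classif, k=50).fit(supervectors, accents)
reduced = selector.transform(supervectors)       # keep the 50 most discriminative features

clf = LinearDiscriminantAnalysis().fit(reduced, accents)
print("training accuracy:", clf.score(reduced, accents))
```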


International Conference on Acoustics, Speech, and Signal Processing | 2011

Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields

Kris Demuynck; Dino Seppi; Dirk Van Compernolle; Patrick Nguyen; Geoffrey Zweig

Exemplar-based recognition systems are characterized by the fact that, instead of abstracting large amounts of data into compact models, they store the observed data enriched with some annotations and infer on the fly from the data by finding those exemplars that resemble the input speech best. One advantage of exemplar-based systems is that, next to deriving what the current phone or word is, one can easily derive a wealth of meta-information concerning the chunk of audio under investigation. In this work we harvest meta-information from the set of best-matching exemplars that is thought to be relevant for recognition, such as word boundary predictions and speaker entropy. Integrating this meta-information into the recognition framework using segmental conditional random fields reduced the WER of the exemplar-based system on the WSJ Nov92 20k task from 8.2% to 7.6%. Adding the HMM score and multiple HMM phone detectors as features further reduced the error rate to 6.6%.
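How exemplar-derived meta-information might be turned into segment-level features for such a framework can be sketched as follows. The feature names, the entropy and boundary computations, and the exemplar annotations are illustrative assumptions, not the features used in the paper.

```python
# Sketch: segment-level features derived from the best-matching exemplars.
import math
from collections import Counter

def exemplar_meta_features(best_exemplars, hmm_score):
    """best_exemplars: list of dicts with 'speaker' and 'ends_word' annotations."""
    speakers = Counter(e["speaker"] for e in best_exemplars)
    total = sum(speakers.values())
    speaker_entropy = -sum((c / total) * math.log(c / total) for c in speakers.values())
    boundary_frac = sum(e["ends_word"] for e in best_exemplars) / total
    return {
        "speaker_entropy": speaker_entropy,   # low entropy: exemplars agree on the speaker
        "word_boundary": boundary_frac,       # fraction of exemplars ending a word here
        "hmm_score": hmm_score,               # baseline HMM acoustic score as a feature
    }

# Toy usage.
exemplars = [{"speaker": "spk1", "ends_word": True},
             {"speaker": "spk1", "ends_word": False},
             {"speaker": "spk2", "ends_word": True}]
print(exemplar_meta_features(exemplars, hmm_score=-123.4))
```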

Collaboration


Dive into Dirk Van Compernolle's collaborations.

Top Co-Authors

Kris Demuynck (Katholieke Universiteit Leuven)
Jacques Duchateau (Katholieke Universiteit Leuven)
Patrick Wambacq (Katholieke Universiteit Leuven)
S. Van Gerven (Katholieke Universiteit Leuven)
Reza Sahraeian (Katholieke Universiteit Leuven)
Hugo Van hamme (Katholieke Universiteit Leuven)
Dino Seppi (Katholieke Universiteit Leuven)
Jort F. Gemmeke (Katholieke Universiteit Leuven)
Mathias De Wachter (Katholieke Universiteit Leuven)