Publication


Featured research published by L.W.J. Boves.


Journal of the Acoustical Society of America | 1985

Pressure measurements during speech production using semiconductor miniature pressure transducers: Impact on models for speech production

Bert Cranen; L.W.J. Boves

Temperature instabilities appear to be a major obstacle to the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are required. In this paper a simple and reliable method for in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero pressure level due to temperature changes, is discussed, and an estimate of the attainable measurement accuracy is given. Moreover, some recordings of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow estimates of the volume flow in the trachea and pharynx to be obtained. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.
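
As a rough illustration of the zero-level drift problem discussed above, the sketch below removes a slowly varying baseline from a pressure recording by interpolating between two segments where the true pressure is known to be zero (for example, quiet breathing before and after the utterance). This is a generic linear-drift correction written for illustration only, not the in vivo calibration procedure of Cranen and Boves; all signal names, values, and segment boundaries are assumed.

```python
import numpy as np

def correct_zero_drift(pressure, pre_idx, post_idx):
    """Remove a slowly varying zero-level drift from a pressure recording.

    pressure : 1-D array of raw transducer samples.
    pre_idx, post_idx : slices during which the true pressure is known to
        be zero (e.g. quiet breathing before and after the utterance).
    """
    # Zero level and centre sample index of each reference segment.
    t0 = 0.5 * (pre_idx.start + pre_idx.stop)
    t1 = 0.5 * (post_idx.start + post_idx.stop)
    z0 = pressure[pre_idx].mean()
    z1 = pressure[post_idx].mean()

    # Assume the drift is roughly linear between the two reference segments.
    n = np.arange(len(pressure))
    baseline = z0 + (z1 - z0) * (n - t0) / (t1 - t0)
    return pressure - baseline

# Synthetic example: an 8 cmH2O pressure plateau with 1 cmH2O of slow drift.
n = np.arange(10000)
true_p = np.where((n > 2000) & (n < 8000), 8.0, 0.0)
recorded = true_p + 1.0 * n / n.size                  # temperature drift
corrected = correct_zero_drift(recorded, slice(0, 1000), slice(9000, 10000))
print(abs(corrected[:1000]).mean())                   # close to 0 after correction
```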


Journal of Fluency Disorders | 1992

Perceptual evaluation of the speech before and after fluency shaping stuttering therapy

Marie-Christine Franken; L.W.J. Boves; Herman F.M. Peters; Ronald L. Webster

An often-cited criterion for assessing the effect of a stuttering therapy is the ability of the stutterers to produce normally fluent speech. Many modern stuttering therapies use special techniques that may produce stutter-free speech that does not sound completely normal. The present study investigates this problem in the framework of the Dutch adaptation of the Precision Fluency Shaping Program. Pre-therapy, post-therapy, and half-year follow-up speech samples of 32 severe stutterers who were treated in a four-week intensive therapy are compared with comparable samples of 20 nonstutterers. To that end, the samples were rated on 14 bipolar scales by groups of about 20 listeners. The results show that the speech of the stutterers in all three conditions differs significantly from the speech of the nonstutterers. The pretherapy speech takes an extreme position on a Distorted Speech dimension, due to the large proportion of disfluencies. The posttherapy speech has extremely low scores on a Dynamics/Prosody dimension, while the follow-up speech differs from the normal speech on both dimensions, but with smaller distances. These results are discussed in relation to the severity of the stuttering problem in the group of treated stutterers. Finally, implications for future research on therapy evaluation are discussed.


IEEE International Conference on Cognitive Informatics | 2007

ACORNS - towards computational modeling of communication and recognition skills

L.W.J. Boves; L.F.M. ten Bosch; Roger K. Moore

In this paper the FP6 Future and Emerging Technologies project ACORNS is introduced. This project aims at simulating embodied language learning, inspired by the memory-prediction theory of intelligence. ACORNS intends to build a full computational implementation of sensory information processing. ACORNS considers linguistic units as emergent patterns. Thus, the research will not only address the issues conventionally investigated in statistical pattern recognition, but also the representations that are formed in memory. The paper discusses details of the memory and processing architecture that will be implemented in ACORNS, and explains how this architecture merges the basic concepts of the memory-prediction theory with results from previous research in the field of memory.


International Journal of Speech Technology | 1998

Evaluation of the Dutch train timetable information system developed in the ARISE project

A.A. Sanderman; Janienke Sturm; E.A. den Os; L.W.J. Boves; A.H.M. Cremers

In this paper we describe the evaluation of a version of a train timetable information system that combines explicit verification with mixed-initiative dialogue control in the first part of the interaction. In the second part of the interaction callers were given more freedom in negotiation and navigation. The evaluation is based on the responses of 68 subjects who called the service from their homes and completed a questionnaire, plus ten subjects who performed the same tasks in the laboratory. All subjects carried out three scenarios of increasing complexity. It appeared that explicit verification does not add more turns to the dialogue than implicit verification. Subjects found it difficult to deal with the open questions in the second part of the interaction, which were meant to facilitate navigation.


Journal of the Acoustical Society of America | 2004

Evaluation of formant-like features on an automatic vowel classification task

F. de Wet; Katrin Weber; L.W.J. Boves; Bert Cranen; Samy Bengio

Numerous attempts have been made to find low-dimensional, formant-related representations of speech signals that are suitable for automatic speech recognition. However, it is often not known how these features behave in comparison with true formants. The purpose of this study was to compare two sets of automatically extracted formant-like features, i.e., robust formants and HMM2 features, to hand-labeled formants. The robust formant features were derived by means of the split Levinson algorithm, while the HMM2 features correspond to the frequency segmentation of speech signals obtained by two-dimensional hidden Markov models. Mel-frequency cepstral coefficients (MFCCs) were also included in the investigation as an example of state-of-the-art automatic speech recognition features. The feature sets were compared in terms of their performance on a vowel classification task. The speech data and hand-labeled formants that were used in this study are a subset of the American English vowels database presented in Hillenbrand et al. [J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Classification performance was measured on the original, clean data and in noisy acoustic conditions. When using clean data, the classification performance of the formant-like features compared very well to the performance of the hand-labeled formants in a gender-dependent experiment, but was inferior to the hand-labeled formants in a gender-independent experiment. The results that were obtained in noisy acoustic conditions indicated that the formant-like features used in this study are not inherently noise robust. For clean and noisy data, as well as for the gender-dependent and gender-independent experiments, the MFCCs achieved results that were equal or superior to those of the formant features, but at the price of a much higher feature dimensionality.
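
As background for the comparison above, the snippet below computes mel-frequency cepstral coefficients, the state-of-the-art baseline feature set of the study, for a short synthetic vowel-like signal using librosa. The frame settings are common ASR defaults chosen for illustration; the exact front-end configuration used in the paper may differ.

```python
import numpy as np
import librosa

sr = 16000
# Synthetic 300 ms "vowel": a simple harmonic complex at 120 Hz is enough
# to exercise the front-end (a real vowel token would be used in practice).
t = np.arange(int(0.3 * sr)) / sr
y = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 6)).astype(np.float32)

# 13 MFCCs per 25 ms frame with a 10 ms hop, a common ASR front-end choice.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfcc.shape)  # (13, number_of_frames)
```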


Journal of Phonetics | 1995

Downtrend in F0 and Psb

Helmer Strik; L.W.J. Boves

In the present paper we examine the simultaneous downtrend in fundamental frequency and subglottal pressure that is often observed for running speech. In particular, we will test the hypothesis that the downtrend in fundamental frequency is caused by a gradual decrease in subglottal pressure during the course of an utterance. In the literature, various ways to model the downtrend in fundamental frequency have been proposed. Our conclusion is that whether the hypothesis stated above is true depends on the model of downtrend adopted.
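
One simple way to quantify the downtrend discussed here is to fit a declination line to the F0 contour and, analogously, to the subglottal pressure (Psb) contour, and then compare the slopes. The sketch below does this with an ordinary least-squares fit on synthetic contours; it only illustrates the idea and is not one of the specific downtrend models examined in the paper.

```python
import numpy as np

def declination_slope(t, contour):
    """Fit contour ~ a + b*t by least squares and return the slope b.

    t       : time points in seconds (voiced frames only).
    contour : F0 values in Hz, or subglottal pressure values in cmH2O.
    """
    b, a = np.polyfit(t, contour, 1)   # highest-degree coefficient first
    return b

# Hypothetical contours sampled every 10 ms over a 2 s utterance.
t = np.arange(0.0, 2.0, 0.01)
f0 = 130.0 - 10.0 * t + np.random.randn(t.size)        # Hz, downtrending
psb = 7.0 - 1.0 * t + 0.2 * np.random.randn(t.size)    # cmH2O, downtrending

print(declination_slope(t, f0), "Hz/s")
print(declination_slope(t, psb), "cmH2O/s")
```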


International Journal of Speech Technology | 2001

Annotation in the SpeechDat Projects

Henk van den Heuvel; L.W.J. Boves; Asunción Moreno; Maurizio Omologo; Gaël Richard; Eric Sanders

A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects, with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied for the SpeechDat SLR. These SLR contain typical examples of short monologue speech utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared to approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach for capturing specific phonological and phonetic phenomena is discussed. In the SpeechDat projects a number of tools have been developed to carry out the transcription of the speech. In this paper, a short description of these tools and their properties is provided. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the evaluations performed, and some of the results are presented.


Speech Communication | 1998

Channel normalization techniques for automatic speech recognition over the telephone

Johan de Veth; L.W.J. Boves

In this paper we aim to identify the underlying causes that can explain the performance of different channel normalization techniques. To this end, we compared four different channel normalization techniques within the context of connected digit recognition over telephone lines: cepstrum mean subtraction, the dynamic cepstrum representation, RASTA filtering and phase-corrected RASTA. We used context-dependent and context-independent hidden Markov models that were trained using a wide range of different model complexities. The results of our recognition experiments indicate that each channel normalization technique should preserve the modulation frequencies in the range between 2 and 16 Hz in the spectrum of the speech signals. At the same time, DC components in the modulation spectrum should be effectively removed. With context-independent models the channel normalization filter should have a flat phase response. Finally, for our connected digit recognition task it appeared that cepstrum mean subtraction and phase-corrected RASTA performed equally well for context-dependent and context-independent models when equal numbers of model parameters were used.
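
Of the techniques compared above, cepstrum mean subtraction is the simplest to state: subtracting the per-utterance mean from each cepstral coefficient removes the DC component of the modulation spectrum, which is where a stationary telephone channel shows up as an additive constant in the cepstral domain. A minimal sketch, assuming the features are stored as a frames-by-coefficients array:

```python
import numpy as np

def cepstrum_mean_subtraction(cepstra):
    """Per-utterance cepstrum mean subtraction (CMS).

    cepstra : array of shape (n_frames, n_coefficients).
    A fixed (stationary) telephone channel contributes a constant offset
    per coefficient, which subtracting the per-utterance mean removes.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical utterance: 200 frames of 13 cepstral coefficients.
c = np.random.randn(200, 13) + 0.5          # offset mimics a channel effect
c_norm = cepstrum_mean_subtraction(c)
print(np.abs(c_norm.mean(axis=0)).max())    # ~0: per-coefficient means removed
```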


International Conference on Spoken Language Processing | 1996

Comparison of channel normalisation techniques for automatic speech recognition over the phone

J.M. de Veth; L.W.J. Boves

We compared three different channel normalisation (CN) methods in the context of a connected digit recognition task over the phone: cepstrum mean subtraction (CMS), RASTA filtering and the Gaussian dynamic cepstrum representation (GDCR). Using a small set of context-independent (CI) continuous Gaussian mixture hidden Markov models (HMMs), we found that CMS and RASTA outperformed the GDCR technique. We show that the main cause of the superiority of CMS compared to RASTA is the phase distortion introduced by the RASTA filter. Recognition results for a phase-corrected RASTA technique are identical to those of CMS. Our results indicate that an ideal cepstrum-based CN method should: (1) effectively remove the DC component; (2) at least preserve modulation frequencies in the range 2-16 Hz; and (3) introduce no phase distortion when CI HMMs are used for recognition.
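
RASTA filtering, the other technique discussed above, band-passes each cepstral trajectory over time so that very slow (channel-like) and very fast variations are suppressed; the phase distortion identified in the paper comes from the filter's recursive part. The sketch below applies the band-pass filter published by Hermansky and Morgan (1994) with SciPy; the forward-backward (zero-phase) variant is shown only as one generic way to avoid phase distortion and should not be read as the phase-corrected RASTA filter proposed by the authors.

```python
import numpy as np
from scipy.signal import lfilter, filtfilt

# RASTA band-pass filter of Hermansky & Morgan (1994), applied along time
# to each cepstral coefficient trajectory. b/a give the causal form; the
# canonical definition also includes a 4-frame advance, ignored here.
b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
a = np.array([1.0, -0.98])

def rasta_filter(cepstra, zero_phase=False):
    """cepstra: (n_frames, n_coefficients). Filters each column over time."""
    if zero_phase:
        # Forward-backward filtering gives a flat phase response (offline),
        # one generic way to remove the filter's phase distortion.
        return filtfilt(b, a, cepstra, axis=0)
    return lfilter(b, a, cepstra, axis=0)

c = np.random.randn(200, 13) + 0.5              # hypothetical utterance
c_rasta = rasta_filter(c)                       # standard causal RASTA
c_rasta_zp = rasta_filter(c, zero_phase=True)   # zero-phase variant
```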


Journal of the Acoustical Society of America | 1988

On the measurement of glottal flow

Bert Cranen; L.W.J. Boves

For developing a comprehensive description of voiced speech sounds in terms of a phonation and an articulation component, it is necessary to know to what extent the volume flow modulations at the entrance of the vocal tract are due to vocal fold motions and to what extent they are due to variations in the transglottal pressure. In order to study this problem, it is important that the flow at the glottis can be measured during normal speech production in a reliable fashion. In this article, a flow measurement technique is described that differs from the more usual inverse filtering approach in that the flow is not measured at the mouth, but much closer to the glottis. The technique is based on the measurement of the pressure gradient. It is shown that the proposed method also leads to an inverse filtering problem, but that, since this problem is much simpler, the gradient method yields more reliable estimates of the shape of the glottal flow waveform, though without the zero flow level (dc component) and without a magnitude scale. By means of theoretical considerations about velocity profiles in pulsatile flow in cylindrical tubes, it is shown that the proposed method may be expected to yield reasonable flow waveform estimates in a frequency region from any normal fundamental frequency up to an upper frequency determined by the transducer sensitivity, the transducer separation, and the vocal tract geometry. In this case, the frequency limitation was estimated to be 1000 Hz.

Collaboration


An overview of L.W.J. Boves's collaborations.

Top Co-Authors

Bert Cranen, Radboud University Nijmegen

L.F.M. ten Bosch, Radboud University Nijmegen

Helmer Strik, Radboud University Nijmegen

Johan (J.M.) de Veth, Radboud University Nijmegen

Catia Cucchiarini, Radboud University Nijmegen

H. van den Heuvel, Radboud University Nijmegen

K.A. Hämäläinen, Radboud University Nijmegen

Odette Scharenborg, Radboud University Nijmegen