Publication


Featured research published by Bert Cranen.


Journal of the Acoustical Society of America | 1985

Pressure measurements during speech production using semiconductor miniature pressure transducers: Impact on models for speech production

Bert Cranen; L.W.J. Boves

Temperature instabilities appear to be a major obstacle to the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are required. In this paper a simple and reliable method for in vivo calibration of this kind of transducer is described. The most important error source, drift of the zero pressure level due to temperature changes, is discussed, and an estimate is given of the measurement accuracy that can be obtained. Moreover, some recordings of subglottal, supraglottal, and transglottal pressure are presented. It is shown that these pressure recordings allow estimates of the volume flow in the trachea and pharynx to be obtained. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.
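The pressure-handling steps described here can be made concrete with a small sketch. The code below is only an illustration under assumed conditions, not the authors' procedure: it posits two hypothetical pressure channels (subglottal and supraglottal), removes a linearly interpolated zero-level drift between two calibration instants, and forms the transglottal pressure.

```python
import numpy as np

def correct_zero_drift(p, t, t_cal0, t_cal1, zero0, zero1):
    """Remove a linearly interpolated zero-level drift from a pressure trace.

    p, t     : pressure samples (kPa) and their sample times (s)
    t_cal0/1 : times of two in vivo calibration instants
    zero0/1  : transducer readings at those instants when the true pressure is zero
    """
    zero_level = np.interp(t, [t_cal0, t_cal1], [zero0, zero1])
    return p - zero_level

# Hypothetical example: 2 s of data sampled at 10 kHz (stand-in noise, not real recordings).
fs = 10_000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
p_sub = rng.normal(0.8, 0.05, t.size)   # "subglottal" channel, kPa
p_sup = rng.normal(0.1, 0.05, t.size)   # "supraglottal" channel, kPa

p_sub = correct_zero_drift(p_sub, t, 0.0, 2.0, 0.02, 0.05)
p_sup = correct_zero_drift(p_sup, t, 0.0, 2.0, 0.01, 0.03)
p_trans = p_sub - p_sup                 # transglottal pressure driving the glottal flow
```

A linear drift model is the simplest possible assumption; any real calibration scheme depends on how often zero-pressure reference moments are available during a recording.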


Journal of Phonetics | 1995

Modeling a leaky glottis

Bert Cranen; Juergen Schroeter

The fact that the oral flow of both males and females normally contains an appreciable dc component during the "closed glottis" interval of vowel sounds produced at normal loudness levels indicates that glottal leakage is a very common phenomenon. In this paper the acoustic consequences of glottal leakage are studied by means of a computer simulation. The effects of two different types of leak were studied: (a) a linked leak, an opening (at least partly) situated in the membranous glottis and caused by abduction, and (b) a parallel chink, a leak that can be viewed as an opening which is essentially separate from (parallel to) the time-varying part of the glottis. The results of our simulations show that a moderate leak may give rise to appreciable source-tract interaction, which becomes most apparent for a parallel chink. In the time domain it manifests itself as a ripple in the glottal flow waveform just after closure. In the frequency domain, the spectrum of the flow through a glottis with a leak (both linked and parallel) is characterized by zeros at the formant frequencies. The major difference between the spectral effects of a linked leak and a parallel chink lies in the spectral slope. For a parallel chink it is of the same order of magnitude as in the no-leakage case, or even slightly flatter. In the case of a linked leak, the spectral slope falls off much more rapidly. These findings suggest that the amount of dc flow alone is not a very good basis for voice efficiency measures.
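One directly usable consequence of the observation above is that the dc flow during the closed-glottis interval can serve as a simple leakage indicator. The sketch below assumes a flow waveform and externally supplied closed-phase markings (both hypothetical); it is not the simulation framework used in the paper.

```python
import numpy as np

def closed_phase_dc_flow(flow, closed_intervals):
    """Mean flow during marked closed-glottis intervals.

    flow             : flow waveform samples (e.g. cm^3/s)
    closed_intervals : list of (start, end) sample indices of closed-glottis phases
                       (assumed to come from an external source, e.g. an EGG trace)
    A clearly non-zero result indicates leakage (linked leak and/or parallel chink).
    """
    flow = np.asarray(flow, dtype=float)
    samples = np.concatenate([flow[s:e] for s, e in closed_intervals])
    return float(samples.mean())

# Hypothetical usage:
# dc_leak = closed_phase_dc_flow(oral_flow, [(120, 180), (220, 280)])
```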


Hearing Research | 1981

The Phonochrome — A Coherent Spectro-Temporal Representation of Sound

P. I. M. Johannesma; Ad Aertsen; Bert Cranen; Leon J.Th.O. van Erning

Representation of simple stationary sounds can be given either in the temporal form, by displaying the waveform as a function of time, or in the spectral form, by intensity and phase as a function of frequency. For complex nonstationary sounds, e.g. animal vocalisations and human speech, a combined spectro-temporal representation is more directly associated with auditory perception. The well-known sonogram or dynamic power spectrum has a fixed spectro-temporal resolution and neglects phase relations between different spectral and temporal sound components. In this paper the complex spectro-temporal intensity density (CoSTID) is presented as a coherent spectro-temporal image of a sound, based on the analytic signal representation. The CoSTID allows an arbitrary form of the spectro-temporal resolution and preserves phase relations between different sound components. Since the CoSTID is a complex function of two variables, it leads naturally to the use of colour images for the spectro-temporal representation of sound: the phonochrome. Phonochromes are shown for different technical and natural sounds. Applications of this technique for the study of phonation and audition and for biomedical signal processing are indicated.
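A loose modern analogue of a coherent, phase-preserving spectro-temporal image can be sketched with standard tools: take the analytic signal via a Hilbert transform, compute a complex time-frequency representation, and map magnitude and phase to colour. This illustrates the general idea only; it does not reproduce the CoSTID definition or the arbitrary-resolution property described in the paper.

```python
import numpy as np
from scipy.signal import hilbert, stft
from matplotlib.colors import hsv_to_rgb

fs = 16_000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t + 0.3)  # toy signal

z = hilbert(x)                               # analytic signal (complex-valued)
f, frames, Z = stft(z, fs=fs, nperseg=256)   # complex spectro-temporal image (two-sided)

mag = np.abs(Z)
phase = np.angle(Z)

# Phonochrome-like colour image: hue encodes phase, brightness encodes normalized magnitude.
hue = (phase + np.pi) / (2 * np.pi)          # map [-pi, pi] onto [0, 1]
val = mag / (mag.max() + 1e-12)
sat = np.ones_like(hue)
rgb = hsv_to_rgb(np.dstack([hue, sat, val]))  # image array of shape (freq, time, 3)
```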


Journal of the Acoustical Society of America | 2004

Evaluation of formant-like features on an automatic vowel classification task

F. de Wet; Katrin Weber; L.W.J. Boves; Bert Cranen; Samy Bengio

Numerous attempts have been made to find low-dimensional, formant-related representations of speech signals that are suitable for automatic speech recognition. However, it is often not known how these features behave in comparison with true formants. The purpose of this study was to compare two sets of automatically extracted formant-like features, i.e., robust formants and HMM2 features, to hand-labeled formants. The robust formant features were derived by means of the split Levinson algorithm, while the HMM2 features correspond to the frequency segmentation of speech signals obtained by two-dimensional hidden Markov models. Mel-frequency cepstral coefficients (MFCCs) were also included in the investigation as an example of state-of-the-art automatic speech recognition features. The feature sets were compared in terms of their performance on a vowel classification task. The speech data and hand-labeled formants used in this study are a subset of the American English vowels database presented in Hillenbrand et al. [J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Classification performance was measured on the original, clean data and in noisy acoustic conditions. On clean data, the classification performance of the formant-like features compared very well with that of the hand-labeled formants in a gender-dependent experiment, but was inferior to the hand-labeled formants in a gender-independent experiment. The results obtained in noisy acoustic conditions indicated that the formant-like features used in this study are not inherently noise robust. For clean and noisy data, as well as for the gender-dependent and gender-independent experiments, the MFCCs achieved results equal or superior to those of the formant features, but at the price of a much higher feature dimensionality.
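To give a concrete feel for "formant-like features", the sketch below extracts formant candidates from the roots of an ordinary LPC polynomial. This is a hedged stand-in: the study itself used robust formants from the split Levinson algorithm and HMM2 frequency segmentations. The sketch assumes the librosa package is available.

```python
import numpy as np
from scipy.signal import lfilter
import librosa   # assumed dependency, used here for LPC (and optionally MFCCs)

def formant_candidates(frame, fs, order=12, pre_emph=0.97):
    """Formant frequency candidates (Hz) from the roots of an LPC polynomial."""
    frame = np.asarray(frame, dtype=float)
    frame = lfilter([1.0, -pre_emph], [1.0], frame) * np.hamming(len(frame))
    a = librosa.lpc(frame, order=order)          # coefficients [1, a1, ..., ap]
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > 90.0]                   # drop near-dc roots

# The MFCC baseline used in the study could be computed along the lines of:
# mfcc = librosa.feature.mfcc(y=signal, sr=fs, n_mfcc=13)
```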


Speech Communication | 1996

Physiologically motivated modelling of the voice source in articulatory analysis/synthesis

Bert Cranen; Juergen Schroeter

This paper describes the implementation of a new parametric model of the glottal geometry aimed at improving male and female speech synthesis in the framework of articulatory analysis/synthesis. The model represents glottal geometry in terms of inlet and outlet area waveforms and is controlled by parameters that are tightly coupled to physiology, such as vocal fold abduction. It is embedded in an articulatory analysis/synthesis system (articulatory speech mimic). To introduce naturally occurring details into our synthetic glottal flow waveforms, we modelled two different kinds of leakage: a "linked leak" and a "parallel chink". While the former is basically an incomplete glottal closure, the latter models a second glottal duct that is independent of the membranous (vibrating) part of the glottis. Characteristic of both types of leak is that they increase dc flow and source/tract interaction. A linked leak, however, gives rise to a steeper roll-off of the entire glottal flow spectrum, whereas a parallel chink decreases the energy of the lower frequencies more than the higher frequencies. In fact, for a parallel chink, the slope at the higher frequencies is more or less the same as in the no-leakage case.
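The kind of parameterization described above can be caricatured in a few lines: a toy glottal area trajectory whose abduction parameter creates a linked leak (the vibrating part never fully closes) and a separate constant area standing in for a parallel chink. The waveform shape and numbers are assumptions for illustration, not the model in the paper.

```python
import numpy as np

def glottal_area(t, f0=120.0, a_max=0.15, abduction=0.0, chink_area=0.0):
    """Toy glottal area trajectories (cm^2) controlled by physiology-like parameters.

    abduction  : baseline offset of the membranous glottis; > 0 yields a 'linked leak'
                 (the vibrating part never closes completely)
    chink_area : constant area of a separate 'parallel chink' duct
    Returns (membranous_area, chink_area_waveform).
    """
    vibrating = np.maximum(0.0, a_max * np.sin(2 * np.pi * f0 * t))
    return vibrating + abduction, np.full_like(t, chink_area)

# Example: a slightly abducted (leaky) glottis plus a small parallel chink.
t = np.arange(0, 0.05, 1.0 / 10_000)
a_membranous, a_chink = glottal_area(t, abduction=0.02, chink_area=0.03)
```

Note that in the paper's simulations the chink is treated as a separate duct in parallel with the vibrating glottis, which is why its spectral effect differs from simply offsetting the membranous area.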


Journal of the Acoustical Society of America | 1988

On the measurement of glottal flow

Bert Cranen; L.W.J. Boves

For developing a comprehensive description of voiced speech sounds in terms of a phonation and an articulation component, it is necessary to know to what extent the volume flow modulations at the entrance of the vocal tract are due to vocal fold motions and to what extent they are due to variations in the transglottal pressure. In order to study this problem, it is important that the flow at the glottis can be measured reliably during normal speech production. In this article, a flow measurement technique is described that differs from the more usual inverse filtering approach in that the flow is not measured at the mouth, but much closer to the glottis. The technique is based on the measurement of the pressure gradient. It is shown that the proposed method also leads to an inverse filtering problem, but that, since this problem is much simpler, the gradient method yields more reliable estimates of the shape of the glottal flow waveform, though without the zero flow level (dc component) and without a magnitude scale. By means of theoretical considerations about velocity profiles in pulsatile flow in cylindrical tubes, it is shown that the proposed method may be expected to yield reasonable flow waveform estimates in a frequency region from any normal fundamental frequency up to an upper frequency determined by the transducer sensitivity and separation and by the vocal tract geometry. In this case, the frequency limit was estimated to be 1000 Hz.
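The physical idea behind the pressure-gradient method can be sketched from the 1-D momentum equation: the pressure difference over a small transducer separation accelerates the air column, so a scaled, dc-free flow estimate follows from integrating that difference. The geometry values and the leaky integration below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.signal import lfilter

def flow_from_pressure_gradient(p1, p2, fs, area=4e-4, sep=0.02, rho=1.2, leak=0.999):
    """Scaled, dc-free flow estimate from two pressure traces along a duct.

    p1, p2 : pressure samples (Pa), p1 upstream of p2
    fs     : sample rate (Hz)
    area   : assumed cross-sectional area of the duct segment (m^2)
    sep    : assumed transducer separation (m)
    rho    : air density (kg/m^3)
    leak   : leaky-integrator coefficient (the true dc level is not recoverable anyway)
    """
    # 1-D momentum equation: rho * d(U/area)/dt = -(p2 - p1) / sep
    dU = (area / (rho * sep)) * (np.asarray(p1, float) - np.asarray(p2, float)) / fs
    # Leaky integration in place of a pure integral, to suppress slow drift.
    return lfilter([1.0], [1.0, -leak], dU)
```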


Speech Communication | 2001

Acoustic backing-off as an implementation of missing feature theory

Johan de Veth; Bert Cranen; L.W.J. Boves

In this paper, we discuss acoustic backing-off as a method to improve automatic speech recognition robustness. Acoustic backing-off aims to achieve the same objective as the marginalization approach of missing feature theory: the detrimental influence of outlier values is effectively removed from the local distance computation in the Viterbi algorithm. The proposed method is based on one of the principles of robust statistical pattern matching: during recognition the local distance function (LDF) is modeled using a mixture of the distribution observed during training and a distribution describing observations not previously seen. In order to assess the effectiveness of the new method, we used artificial distortions of the acoustic vectors in connected digit recognition over telephone lines. We found that acoustic backing-off is capable of restoring recognition performance almost to the level observed for the undisturbed features, even in cases where a conventional LDF completely fails. These results show that recognition robustness can be improved using a marginalization approach, where making the distinction between reliable and corrupted feature values is wired into the recognition process. In addition, the results show that application of acoustic backing-off is not limited to feature representations based on filter bank outputs. Finally, the results indicate that acoustic backing-off is much less effective when local distortions are smeared over all vector elements. Therefore, the acoustic pre-processing steps should be chosen with care, so that the dispersion of distortions over all acoustic vector elements as a result of within-vector feature transformations is minimal.
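The local distance function described above can be sketched per feature component: a mixture of the trained Gaussian and a broad "outlier" density bounds the penalty that any single corrupted value can contribute. The mixture weight and the flat outlier density below are illustrative assumptions.

```python
import numpy as np

def backing_off_distance(x, mu, var, eps=0.01, outlier_density=1e-3):
    """Robust local distance of one acoustic vector against one diagonal-Gaussian state.

    eps             : prior weight of the 'not previously seen' distribution
    outlier_density : assumed flat density for outlier feature values (per component)
    """
    x, mu, var = (np.asarray(a, dtype=float) for a in (x, mu, var))
    gauss = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    mix = (1.0 - eps) * gauss + eps * outlier_density
    # Each component's contribution is bounded by -log(eps * outlier_density), so a
    # single corrupted feature value cannot dominate the Viterbi path score.
    return float(-np.log(mix).sum())
```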


Speech Communication | 1996

Speaker variability in the coarticulation of /a,i,u/

H. van den Heuvel; Bert Cranen; Toni Rietveld

Speaker variability in the coarticulation of the vowels /a,i,u/ was investigated in /C1VC2ə/ pseudo-words containing the consonants /p,t,k,d,s,m,n,r/. These words were read out in isolation by fifteen male speakers of Dutch. The formants F1–F3 (in Bark) were extracted from the steady state of each vowel /a,i,u/. Coarticulation in each of 1200 realisations per vowel was measured in F1–F3 as a function of consonantal context, using a score-model-based measure called COART. The largest amount of coarticulation was found in /u/, where nasals and alveolars in C1-position had the largest effect on the formant positions, especially on F2. Coarticulation in /a,u/ proved to be speaker-specific. For these vowels, the speaker variability of COART in a given context was generally larger if COART itself was larger. Finally, when studied in a speaker identification task, COART improved identification results only when three conditions were combined: (a) COART was used as an additional parameter to F1–F3; (b) the COART values for the vowel were high; (c) all vowel contexts were pooled in the analysis. The two main conclusions from this study are that coarticulation cannot be investigated speaker-independently and that COART can contribute to speaker identification, but only under very restricted conditions.
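The abstract does not reproduce the COART formula, so the sketch below only illustrates the flavour of such a measure: formants are converted to Bark, and a vowel's per-context displacement from its grand mean is taken as a coarticulation score. Treat it as a hypothetical stand-in, not the published measure.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark scale."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def coarticulation_score(formants_by_context):
    """Hypothetical COART-like score (NOT the published measure).

    formants_by_context : dict mapping a consonant-context label to an array of shape
                          (n_tokens, 3) holding F1-F3 in Hz for one vowel in that context
    Returns, per context, the mean Bark-space distance from the vowel's grand mean.
    """
    bark = {c: hz_to_bark(F) for c, F in formants_by_context.items()}
    grand_mean = np.concatenate(list(bark.values()), axis=0).mean(axis=0)
    return {c: float(np.linalg.norm(B - grand_mean, axis=1).mean()) for c, B in bark.items()}
```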


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2001

Adding extra input/output modalities to a spoken dialogue system

Janienke Sturm; Fusi Wang; Bert Cranen

This paper describes a prototype of a multimodal railway information system that was built by extending an existing speech-only system. The purpose of the extensions is to alleviate a number of shortcomings of speech-only interfaces.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Extraction of control parameters for the voice source in a text-to-speech system

J. de Veth; Bert Cranen; H. Strik; L.W.J. Boves

In order to derive voice source control rules from natural speech, the parameters of a source model must be estimated from the acoustic signal. This is done by parameterizing the results of glottal inverse filtering. A number of different inverse filtering procedures, and the ease with which their results can be parameterized, are compared. It is shown that closed-glottis-interval covariance linear predictive coding is as powerful as more sophisticated techniques, because it is the only known method that can be strictly limited to the closed glottis interval.
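To make "closed-glottis-interval covariance LPC" concrete, the sketch below fits an all-pole vocal tract filter by least squares using only samples from a marked closed-glottis interval and then inverse-filters the frame to estimate the glottal flow derivative. The interval boundaries are assumed to be known from some external source (for example an EGG trace); that is an assumption of the sketch, not a claim about the paper.

```python
import numpy as np
from scipy.signal import lfilter

def covariance_lpc(s, start, end, order=12):
    """Covariance-method LPC using only samples s[start:end] (closed-glottis interval).

    Returns A(z) coefficients [1, a1, ..., ap] minimizing the prediction error
    e[n] = s[n] + a1*s[n-1] + ... + ap*s[n-p] over the interval (start must be >= order).
    """
    s = np.asarray(s, dtype=float)
    n = np.arange(start, end)
    X = np.column_stack([s[n - k] for k in range(1, order + 1)])
    a_tail = np.linalg.lstsq(X, -s[n], rcond=None)[0]   # least squares = covariance method
    return np.concatenate(([1.0], a_tail))

# Hypothetical usage, with closed-phase boundaries from an external source (e.g. EGG):
# a = covariance_lpc(speech_frame, start=closed_start, end=closed_end, order=12)
# dglottal = lfilter(a, [1.0], speech_frame)   # inverse-filtered: glottal flow derivative
```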

Collaboration


Dive into Bert Cranen's collaborations.

Top Co-Authors

Lou Boves (Radboud University Nijmegen)
L.W.J. Boves (Radboud University Nijmegen)
Jort F. Gemmeke (Katholieke Universiteit Leuven)
Louis ten Bosch (Radboud University Nijmegen)
L.F.M. ten Bosch (Radboud University Nijmegen)
Heyun Huang (Radboud University Nijmegen)
J.M. de Veth (Radboud University Nijmegen)
Johan de Veth (Radboud University Nijmegen)
Febe de Wet (Radboud University Nijmegen)
Yang Sun (Radboud University Nijmegen)