L.C.W. Pols
University of Amsterdam
Publications
Featured research published by L.C.W. Pols.
Speech Communication | 1999
R.J.J.H. van Son; L.C.W. Pols
The acoustic consequences of the articulatory reduction of consonants remain largely unknown; much more is known about acoustic vowel reduction. Whether the acoustical and perceptual consequences of articulatory consonant reduction are comparable in kind and extent to those of vowel reduction is still an open question. In this study we compare acoustic data for 791 VCV realizations, containing 17 Dutch intervocalic consonants and 13 vowels, extracted from read speech from a single male speaker, to otherwise identical segments isolated from spontaneous speech. Five acoustic correlates of reduction were studied. Acoustic tracers of articulation were based on F2 slope differences and locus equations. Speech effort was assessed by measuring duration, spectral balance, and the intervocalic sound energy difference of consonants. On a global level, consonants reduce acoustically like vowels on all investigated measures when the speaking style becomes informal or syllables become unstressed. Methods sensitive to speech effort proved to be more reliable indicators of reduction than F2-based measures. On a more detailed level there are differences related to the type of consonant. The acoustic results suggest that articulatory reduction will decrease the intelligibility of consonants and vowels in comparable ways.
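To make the effort-sensitive measures concrete, here is a minimal sketch of one of them, spectral balance, simplified to the dB difference between signal energy above and below 1 kHz; the split frequency and the FFT-based implementation are assumptions, not the paper's exact band definition.

```python
# Hedged sketch of a spectral-balance measure: dB difference between
# energy above and below a split frequency (assumed 1 kHz; the paper's
# exact band definition may differ).
import numpy as np

def spectral_balance_db(signal, sr=16000, split_hz=1000.0):
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    return 10.0 * np.log10(high / low)

# Toy vowel-like signal: strong low-frequency component plus a weaker
# high-frequency one; greater speech effort would raise the balance.
t = np.arange(0, 0.05, 1.0 / 16000)
vowel_like = np.sin(2 * np.pi * 250 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)
print(round(spectral_balance_db(vowel_like), 1))
```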
Journal of the Acoustical Society of America | 1990
R.J.J.H. van Son; L.C.W. Pols
Speaking rate is thought to affect the spectral features of vowels. Target-undershoot models of vowel production predict more spectral reduction and coarticulation of vowels in fast-rate speech than in normal-rate speech. To test this prediction, a meaningful Dutch text of about 850 words was read twice by an experienced newscaster, once at a normal speaking rate and once as fast as possible. All realizations of seven different vowels and some realizations of the schwa (/ə/) were isolated. The first and second formant frequencies of all realizations were measured at five different cross-section points in each vowel realization; the choice of these points follows procedures used in the literature, such as the point of maximal F1 or the mean formant value. No spectral vowel reduction was found that could be attributed to a faster speaking rate, nor was any change in coarticulation found. The only systematic effect was a higher F1 value in fast-rate speech irresp...
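One of the cross-section procedures mentioned above, reading the formants at the point of maximal F1, can be sketched in a few lines; the per-frame tracks here are hypothetical placeholders for the output of any formant tracker.

```python
# Sketch of the "maximal F1" cross-section: read F1 and F2 at the frame
# where F1 peaks (f1/f2 are hypothetical per-frame tracks in Hz).
import numpy as np

f1 = np.array([420.0, 510.0, 560.0, 540.0, 480.0])
f2 = np.array([1450.0, 1500.0, 1530.0, 1510.0, 1470.0])

i = int(np.argmax(f1))          # measurement point: maximal F1
print(f1[i], f2[i])             # formant values at that cross section
```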
Journal of the Acoustical Society of America | 1992
R.J.J.H. van Son; L.C.W. Pols
Speaking rate in general, and vowel duration more specifically, is thought to affect the dynamic structure of vowel formant tracks. To test this, a single professional speaker read a long text at two different speaking rates, fast and normal. The present project investigated the extent to which the first and second formant tracks of eight Dutch vowels varied under the two speaking-rate conditions. A total of 549 pairs of vowel realizations from various contexts were selected for analysis. The formant track shape was assessed on a point-by-point basis, using 16 samples at the same relative positions in the vowels. Differences in speaking rate only resulted in a uniform change in F1 frequency. Within each speaking rate, there was only evidence of a weak leveling off of the F1 tracks of the open vowels /a ɑ/ at shorter durations. When considering sentence stress or vowel realizations from a more uniform, alveolar-vowel-alveolar context, these same conclusions were reached. These results indicate a much more active adaptation to speaking rate than implied by the target-undershoot model.
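The point-by-point comparison can be illustrated by resampling each track to 16 samples at the same relative positions; linear interpolation is an assumption here, standing in for whatever resampling the authors used.

```python
# Sketch of the 16-point track comparison: resample two tracks to the
# same relative positions and difference them (linear interpolation is
# an assumption; the tracks below are synthetic).
import numpy as np

def sample_relative(track_hz, n=16):
    src = np.linspace(0.0, 1.0, len(track_hz))
    return np.interp(np.linspace(0.0, 1.0, n), src, track_hz)

fast = sample_relative(np.array([600.0, 640.0, 700.0, 690.0, 650.0, 610.0]))
normal = sample_relative(np.array([590.0, 650.0, 720.0, 730.0, 700.0, 640.0, 600.0]))
print(np.round(fast - normal, 1))   # per-position F1 difference (Hz)
```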
Journal of the Acoustical Society of America | 1967
R. Plomp; L.C.W. Pols; J.P. van der Geer
Traditionally, the formant frequencies are regarded as the most important characteristics of the frequency spectra of vowels. It is possible, however, to approach the differences between vowel spectra in a more general way by means of a dimensional analysis. For a particular vowel, the sound-pressure levels in each of a number of frequency passbands can be considered as coordinates of a point in a multidimensional Euclidean space. Different vowel spectra will result in different points. Frequency spectra of 15 Dutch vowels were determined with 18 bandpass filters (10 speakers). The analysis indicated that the "cloud" of 150 points can be described by four independent dimensions that are linear combinations of the original 18. The percentages of total variance "explained" by these dimensions were 37.2%, 31.2%, 9.0%, and 6.7%, respectively. This approach presents interesting perspectives for the development of vowel-discrimination equipment.
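The dimensional analysis maps directly onto a principal components analysis of a 150 x 18 matrix of band levels; the sketch below uses random placeholder data in place of the measured filter outputs.

```python
# Sketch of the dimensional analysis: band levels as coordinates in an
# 18-D space, reduced by PCA (levels_db is a random placeholder for the
# 150 measured vowel spectra).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
levels_db = rng.normal(60.0, 10.0, size=(150, 18))   # placeholder spectra

pca = PCA(n_components=4)
scores = pca.fit_transform(levels_db)    # each vowel token as a 4-D point

# Fraction of total variance "explained" per dimension; the paper
# reports 37.2%, 31.2%, 9.0%, and 6.7% for its real data.
print(pca.explained_variance_ratio_)
```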
IEEE Transactions on Computers | 1971
L.C.W. Pols
First a survey is given of a number of published vowel and word recognition systems. Then a new real-time word recognition system is described that uses only a small computer (8K memory) and a few analog peripherals. The essentials of the procedure are as follows. During the pronunciation of a word, a spectral analysis is carried out by a bank of 17 one-third-octave bandpass filters. The outputs of the filters are logarithmically amplified, and the maximal amplitude of the envelope is determined and sampled every 15 ms. In this way a word is characterized by a sequence of sample points in a 17-dimensional space. A principal components analysis is then performed, reducing the original 17 dimensions of the space to 3. After a linear time normalization, the 3-dimensional trace of the spoken word is compared with 20 reference traces, representing the 20 possible utterances (the ten digits plus 10 computer commands). The machine responds by naming the best-fitting trace. With the 20 speakers of the design set, the machine is correct 98.8 percent of the time.
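The recognition step reduces to projecting the filter-bank samples onto three principal components, normalizing duration, and naming the nearest reference trace; the sketch below assumes Euclidean distance between length-normalized traces and uses random data in place of recordings.

```python
# Sketch of the trace-matching procedure (assumptions: Euclidean
# distance between length-normalized 3-D traces; random data stands in
# for the 17-channel filter-bank measurements).
import numpy as np
from sklearn.decomposition import PCA

def normalize_length(trace, n=30):
    # Linear time normalization: resample to n equidistant frames.
    src = np.linspace(0.0, 1.0, len(trace))
    dst = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(dst, src, trace[:, d])
                     for d in range(trace.shape[1])], axis=1)

def recognize(word_17d, pca, references):
    # Project 17-D samples onto 3 components, then pick the reference
    # trace with the smallest distance to the normalized input trace.
    trace = normalize_length(pca.transform(word_17d))
    return min(references, key=lambda w: np.linalg.norm(trace - references[w]))

# Toy demonstration with random data standing in for real recordings.
rng = np.random.default_rng(1)
pca = PCA(n_components=3).fit(rng.normal(size=(500, 17)))
refs = {w: normalize_length(pca.transform(rng.normal(size=(40, 17))))
        for w in ["zero", "one", "two"]}
print(recognize(rng.normal(size=(35, 17)), pca, refs))
```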
Speech Communication | 1996
L.C.W. Pols; X. Wang; Louis ten Bosch
As indicated by Bourlard et al. (1996), the best and simplest solution so far in standard ASR technology for implementing durational knowledge seems to be imposing a (trained) minimum segment duration, simply by duplicating or adding states that cannot be skipped. We want to argue that recognition performance can be further improved by incorporating "specific knowledge" (such as duration and pitch) into the recognizer. This can be achieved by optimising the probabilistic acoustic and language models, and probably also by a post-processing step that is fully based on this specific knowledge. We used the widely available, hand-segmented TIMIT database to extract duration regularities that persist despite the great speaker variability. Two main approaches were used. In the first approach, duration distributions are considered for single phones, as well as for various broader classes, such as those specified by long or short vowels, word stress, syllable position within the word and within an utterance, post-vocalic consonants, and utterance speaking rate. The other approach uses a hierarchically structured analysis of variance to study the numerical contributions of 11 different factors to the variation in duration. Several systematic effects have been found, but several other effects appeared to be obscured by the inherent variability in this speech material. Whether this specific use of knowledge about duration in a post-processor will actually improve recognition performance still has to be shown. However, in line with the prophetic message of Bourlard et al.'s paper, we here consider the improvement of performance as of secondary importance.
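The first approach, grouping phone durations by broader classes, amounts to simple grouped statistics over a segmentation table; the sketch below assumes a hypothetical table layout rather than the actual TIMIT file format.

```python
# Sketch of duration distributions per phone and per broader class
# (hypothetical columns "phone", "stressed", "start", "end"; parsing of
# the actual TIMIT annotation files is omitted).
import pandas as pd

segments = pd.DataFrame({
    "phone":    ["iy", "iy", "ih", "ih", "s", "s"],
    "stressed": [True, False, True, False, True, False],
    "start":    [0.10, 0.50, 0.90, 1.30, 1.60, 1.95],
    "end":      [0.22, 0.58, 1.00, 1.37, 1.72, 2.03],
})
segments["duration"] = segments["end"] - segments["start"]

# Distributions for single phones, then for a broader factor (stress).
print(segments.groupby("phone")["duration"].describe())
print(segments.groupby(["phone", "stressed"])["duration"].mean())
```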
Speech Communication | 1993
L.C.W. Pols; R.J.J.H. van Son
Some 550 vowel segments were excised from a text read by a Dutch speaker, both at normal rate and at fast rate. The duration of each segment was measured, as well as static and dynamic formant characteristics, such as midpoint formant frequencies and descriptions of the formant tracks in terms of 16 equidistant points per segment or Legendre polynomial functions. We examined these formant characteristics as a function of vowel duration, but found no indication of duration-dependent undershoot. Instead, this speaker showed very consistent consonant-specific coarticulatory behavior and adapted his speaking style to the speaking rate in order to reach the same midpoint formant frequencies. Various (parabolically stylized) formant tracks, at various durations, in isolation or in CVC contexts, were synthesized and presented to listeners for identification. Net shifts in vowel responses, compared to stationary stimuli, showed no indication of perceptual overshoot. A weighted averaging method, with the greatest weight given to formant frequencies in the final part of the vowel tokens, explained the results best.
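Describing a formant track by Legendre polynomials is a short exercise with NumPy; the 16-point contour below is synthetic, and the polynomial order is an assumed choice.

```python
# Sketch of a Legendre-polynomial description of a formant track
# (synthetic 16-point F2 contour; order 3 is an assumed choice).
import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1.0, 1.0, 16)              # relative position in the vowel
f2_track = 1500 + 300 * x - 120 * x**2      # synthetic rising, bowed contour

coeffs = legendre.legfit(x, f2_track, deg=3)
reconstructed = legendre.legval(x, coeffs)

# coeffs[0] ~ mean frequency, coeffs[1] ~ overall slope,
# coeffs[2] ~ curvature of the track.
print(np.round(coeffs, 1), float(np.max(np.abs(reconstructed - f2_track))))
```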
Speech Communication | 1999
R.J.J.H. van Son; L.C.W. Pols
In two papers, Nearey (1992, 1997) discusses the fact that theories of phoneme identification generally favor strong cues that are localized in the speech signal. He proposes an alternative view in which cues to phoneme identity are relatively weak and dispersed. In the present listening experiment, Dutch subjects identified speech tokens containing fragments of vowel and consonant realizations and their immediate neighbors, taken from connected read speech. Using a measure of listener confusion based on the perplexity of the confusion matrix, it is possible to quantify the amount of information extracted by the listeners from different parts of the speech signal. Around half the information needed for the identification task was extracted from only a short (40–50 ms) speech fragment. Considerable amounts of additional information were extracted from parts of the signal at, and beyond, the conventional boundaries of the segment, here called perisegmental speech. Speech in front of the target segment improved identification more than speech following the target segment, even when this speech was not actually part of the target phoneme itself. Correct identification of pre-vocalic consonants correlated with correct identification of the following vowel, and vice versa. The identification of post-vocalic consonants was not correlated with the identification of the preceding vowel. It is concluded that human listeners extract an important fraction of the information needed to identify phonemes from outside the conventional segment boundaries. This supports Nearey's proposal that extended, "weak" cues might play an important part in the identification of phonemes.
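A perplexity-based confusion measure can be sketched directly from a confusion matrix: row-wise response entropies converted to perplexities and averaged. The normalization below is an assumption; the paper's exact definition may differ.

```python
# Sketch of a perplexity-based confusion measure (rows = presented
# segments, columns = listener responses; the averaging used here is an
# assumed simplification).
import numpy as np

def mean_response_perplexity(confusions):
    # Normalize each row to P(response | stimulus).
    p = confusions / confusions.sum(axis=1, keepdims=True)
    def perplexity(row):
        nz = row[row > 0]
        return 2.0 ** (-np.sum(nz * np.log2(nz)))
    # 1 = perfect identification; the number of response alternatives
    # is the upper bound (complete confusion).
    return float(np.mean([perplexity(row) for row in p]))

toy = np.array([[8., 1., 1.],
                [2., 6., 2.],
                [0., 1., 9.]])
print(mean_response_perplexity(toy))
```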
Journal of the Acoustical Society of America | 1995
Astrid van Wieringen; L.C.W. Pols
Two discrimination experiments were performed to determine auditory sensitivity for single and complex consonant-vowel (CV)- and vowel-consonant (VC)-like formant transitions. In experiment 1, difference limens in end-point frequency were determined by means of same/different paired-comparison tasks for 20-, 30-, and 50-ms second-formant (F2) speechlike transitions, followed or preceded by an 80-ms vowel-like steady state in initial or final position, respectively. The F2 transition was either single or part of a multiformant (complex) stimulus, also containing a fixed F1 transition with a steady state, a stationary third formant, and a 20-ms voice bar. Just-noticeable differences in end-point frequency decrease with increasing transition duration in all conditions and are smaller for single transitions than for transitions in a multiformant complex. Although difference limens in end-point frequency increase with increasing frequency extent, they are smaller in final than in initial position. As for relati...
Speech Communication | 1983
L.C.W. Pols
Dutch consonants, spoken in lists of two-syllable nonsense words of the type CVCVC embedded in short carrier phrases, were identified by listeners under various acoustic disturbance conditions. The 28 conditions were a mixture of four reverberation times, five signal-to-noise ratios, and five different noise spectra. The identification results were summed over the six talkers and five listeners, yielding 28 confusion matrices per consonant position (initial, medial, and final). These sets of matrices were processed by individual-differences multidimensional scaling programs, more specifically by TUCKALS (Kroonenberg and de Leeuw, [9]). The resulting three-dimensional stimulus configuration for the initial consonants is very stable and can be represented as a tetrahedron with /z, s/, /m, n/, /p, t, k, b, d/, and /f, v, χ/ at the four corner points and /l, r, w, j, h/ in the centre. This consonant configuration is discussed with respect to its relevance to the Dutch language given the experimental conditions. The representation of the 28 conditions turns out to be almost exclusively one-dimensional, despite the three different aspects (reverberation time, noise level, noise spectrum) of the acoustic disturbances.
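As a simplified stand-in for the three-mode TUCKALS analysis, a single confusion matrix can be symmetrized into dissimilarities and scaled with ordinary metric MDS; this illustrates the scaling idea only, not the paper's actual method.

```python
# Simplified sketch: one confusion matrix -> dissimilarities -> 2-D
# stimulus configuration via metric MDS (TUCKALS itself handles all 28
# matrices jointly; plain MDS is an assumed simplification).
import numpy as np
from sklearn.manifold import MDS

conf = np.array([[30., 5., 1.],
                 [6., 28., 2.],
                 [1., 3., 32.]])   # toy confusion counts for 3 consonants

p = conf / conf.sum(axis=1, keepdims=True)
similarity = 0.5 * (p + p.T)                    # symmetrize
dissimilarity = 1.0 - similarity / similarity.max()
np.fill_diagonal(dissimilarity, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
print(mds.fit_transform(dissimilarity))         # 2-D consonant configuration
```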