Volker Dellwo
University of Zurich
Publication
Featured research published by Volker Dellwo.
Speaker Classification I | 2007
Volker Dellwo; Mark Huckvale; Michael Ashby
As well as conveying a message in words and sounds, the speech signal carries information about the speaker's own anatomy, physiology, linguistic experience and mental state. These speaker characteristics are found in speech at all levels of description: from the spectral information in the sounds to the choice of words and utterances themselves. This chapter presents an introduction to speech production and to the phonetic description of speech to facilitate discussion of how speech can be a carrier for speaker characteristics as well as a carrier for messages. The chapter presents an overview of the physical structures of the human vocal tract used in speech, introduces the standard phonetic classification system for the description of spoken gestures, and presents a catalogue of the different ways in which individuality can be expressed through speech. The chapter ends with a brief description of some applications which require access to information about speaker characteristics in speech.
Journal of the Acoustical Society of America | 2015
Volker Dellwo; Adrian Leemann; Marie-José Kolly
Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total) for which transcripts were made and then read by all speakers (4096 sentences; 16 speakers × 256 sentences). Between-speaker variability was tested using analysis of variance with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability, while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. It was concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most plausible factor explaining between-speaker differences.
Journal of the Acoustical Society of America | 2008
Francisco Gutiérrez Díez; Volker Dellwo; Núria Gavaldà; Stuart Rosen
It has been demonstrated repeatedly that durational characteristics of consonantal (C) and vocalic (V) intervals are robust acoustic correlates of rhythm class (stress-timed, syllable-timed, mora-timed). Here, we investigate how objective rhythm measures develop during the acquisition of a second language. In a longitudinal study, nine native speakers of Spanish were recorded reading a text in English before and after a year of English language training at university level. A control group of nine native English speakers read the same text. Automatic forced alignment of speech segment boundaries using hidden Markov models allowed the calculation of C and V interval durations. Standard rhythm metrics (%V, deltaC, deltaV, PVI) were calculated for all recordings. First results show that durational C interval characteristics do not differ between native English and Spanish English (pre- and post-training). However, V interval characteristics (deltaV, nPVI, %V) are lowest for English natives, higher for the Spanish post-training group and highest for the Spanish pre-training group. The results suggest that (a) deficits of speech rhythm competence in a second language are mostly revealed on a vocalic level and (b) an increase in competence in a second language is reflected well by measurable speech rhythm.
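The interval-based metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis code: the function names are hypothetical, durations are assumed to be in seconds, and the population standard deviation is assumed for the delta measures (the abstract does not say which variant was used).

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration that is vocalic, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta(intervals):
    """deltaC / deltaV: standard deviation of interval durations.

    Assumption: population standard deviation.
    """
    return pstdev(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = zip(intervals, intervals[1:])
    return mean([abs(a - b) for a, b in pairs])


def npvi(intervals):
    """Normalized PVI: each difference is divided by the pair's mean duration."""
    pairs = zip(intervals, intervals[1:])
    return 100.0 * mean([abs(a - b) / ((a + b) / 2.0) for a, b in pairs])


# Hypothetical interval durations in seconds, for illustration only.
vocalic = [0.10, 0.20, 0.10, 0.20]
consonantal = [0.10, 0.10, 0.10, 0.10]
print(percent_v(vocalic, consonantal), delta(vocalic), npvi(vocalic))
```

In practice the interval durations would come from the forced-alignment output; the lists above are invented values that only show the call pattern.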
Conference of the International Speech Communication Association | 2016
Lei He; Volker Dellwo
A speech signal can be viewed as a high frequency carrier signal containing the temporal fine structure (TFS) that is modulated by a low frequency envelope (ENV). A widely used method to decompose a speech signal into the TFS and ENV is the Hilbert transform. Although this method has been available for about one century and is widely applied in various kinds of speech processing tasks (e.g. speech chimeras), there are only very few speech processing packages that contain readily available functions for the Hilbert transform, and there is very little textbook type literature tailored for speech scientists to explain the processes behind the transform. With this paper we provide the code for carrying out the Hilbert operation to obtain the TFS and ENV in the widely used speech processing software Praat, and explain the basics of the procedure. To verify our code, we compare the Hilbert transform in Praat with a widely applied function for the same purpose in MATLAB (“hilbert(...)”). We can confirm that both methods arrive at identical outputs.
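The decomposition described here can be illustrated outside Praat and MATLAB. The sketch below is not the authors' Praat script: it is a standard-library Python illustration that builds the analytic signal with a naive O(N²) discrete Fourier transform (suitable only for short signals), then takes its magnitude as the ENV and the cosine of its phase as the TFS. The function names `analytic_signal` and `decompose` are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive DFT (short inputs only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def decompose(x):
    """Split a signal into envelope (ENV) and temporal fine structure (TFS)."""
    z = analytic_signal(x)
    env = [abs(v) for v in z]                    # instantaneous amplitude
    tfs = [math.cos(cmath.phase(v)) for v in z]  # unit-amplitude carrier
    return env, tfs


# Synthetic amplitude-modulated tone: slow modulator times fast carrier.
n = 64
signal = [
    (1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n))
    * math.cos(2 * math.pi * 16 * t / n)
    for t in range(n)
]
env, tfs = decompose(signal)
```

Multiplying `env` and `tfs` sample by sample reconstructs the input, because the real part of the analytic signal is the original signal. For real recordings an FFT-based routine would replace the naive DFT.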
Archive | 2016
Adrian Leemann; Marie-José Kolly; Stephan Schmid; Volker Dellwo
This volume was inspired by the 9th edition of the Phonetik and Phonologie conference, held in Zurich in October 2013. It includes state of the art research on phonetics and phonology in various languages and from interdisciplinary contributors. The volume is structured into the following eight sections: segmentals, suprasegmentals, articulation in spoken and sign language, perception, phonology, crowdsourcing phonetic data, second language speech, and arts (with inevitable overlap between these areas).
Pellegrino, Elisa; He, Lei; Dellwo, Volker (2018). The Effect of Ageing on Speech Rhythm: A Study on Zurich German. In: Speech Prosody 2018, Poznan, 13 June 2018 - 16 June 2018, 133-137. | 2018
Elisa Pellegrino; Lei He; Volker Dellwo
Speech segmental and suprasegmental characteristics vary considerably across the life span, for example, due to degenerative changes in speech production mechanisms and neuromuscular control. A great deal of research on the acoustic correlates of adult speakers’ voice has focussed on changes in voice quality, vowel formant patterns, f0, amplitude and speech rate. Little attention has been paid to speech rhythm variability due to advancing age. Here we quantified age-related rhythmic variability in terms of the durational characteristics of consonantal and vocalic intervals (henceforth CV intervals). We compared the segmental durational variability of two groups of Zurich German speakers. Group 1: 16 young adults, aged from 18 to 32 years; group 2: 10 older adults, aged from 66 to 81 years. For both groups we analyzed 20 sentences in Zurich German from the TEVOID Corpus. Between-speaker durational variability across age was quantified through a variety of interval-based metrics: segment rate, %V, deltaC, deltaV, VarcoC, VarcoV, rPVI-C and nPVI-V. Results showed that rhythmic differences between younger and older adults are largely accounted for by speech rate differences. Segment rate, %V and raw measures of CV interval durational variability (deltaV, deltaC and rPVI-C) showed effects between younger and older adults. Rate-normalized metrics (VarcoC, VarcoV and nPVI-V) did not differ significantly between the two age groups.
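The contrast between raw and rate-normalized metrics can be made concrete with a small sketch. It assumes the usual definition of Varco as the standard deviation divided by the mean, times 100, and uses invented durations; it is not the study's analysis code, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def delta(intervals):
    """Raw variability: standard deviation of interval durations."""
    return pstdev(intervals)


def varco(intervals):
    """Rate-normalized variability: standard deviation relative to the mean."""
    return 100.0 * pstdev(intervals) / mean(intervals)


# Hypothetical consonantal durations (seconds) at a normal rate, and the
# same pattern uniformly stretched by 50% to mimic slower speech.
normal = [0.08, 0.12, 0.10, 0.14]
slow = [d * 1.5 for d in normal]

print(delta(normal), delta(slow))  # raw measure grows with the stretch
print(varco(normal), varco(slow))  # normalized measure stays the same
```

Uniformly slowing the speech scales delta by the same factor but leaves Varco unchanged, which is one way to read the finding that only the raw measures separated the age groups.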
Journal of the Acoustical Society of America | 2018
Thayabaran Kathiresan; Dieter Maurer; Heidy Suter; Volker Dellwo
When investigating formant pattern and spectral shape ambiguity in Klatt synthesis, an earlier study showed that the perceived vowel quality of Standard German vowel sounds can be changed by varying fundamental frequency only [Maurer et al. (2017). J. Acoust. Soc. Am. 141(5):3469-3470]. In this follow-up study, the previous original synthesis experiment was repeated twice, firstly, with fundamental frequencies (fo) of the corresponding sounds lowered by one octave, and secondly, with different ratios of the first and second formant amplitudes. Here, the role of the fo range and the formant amplitudes for the investigation of formant pattern and spectral shape ambiguity in vowel synthesis was further examined. The same five phonetic expert listeners that participated in the previous experiment also identified all of the newly synthesised sounds in a multiple-choice identification task. Results revealed that the perceived vowel quality only changes for fos above 200 Hz and that, for back vowels, the ratio of the formant amplitudes used in the Klatt synthesis also affects vowel recognition. Thus, the results of the experiments confirm earlier indications of a non-systematic relation between fo or pitch and formant patterns or spectral envelopes for vowel recognition.
Journal of the Acoustical Society of America | 2018
Dieter Maurer; Heidy Suter; Thayabaran Kathiresan; Volker Dellwo
In the literature, the recognition of sinewave vowels replicating statistical formant patterns is reported as impaired when compared to natural sounds. However, the corresponding formant simulating sinusoids were harmonically unrelated, with synthesised signals only accidentally being quasi-periodic, and vowel confusion was indicated to relate to vowel height. Involving five phonetic expert listeners, the present study tested vowel and pitch recognition of three sinusoid replicas based on statistical F1-F2-F3 patterns of the Standard German closed and mid-open vowels /i-y-e-ø-o-u/ for women, “corrected” approximations of these patterns with harmonically related sinusoids, and harmonical patterns with fixed first and third sinusoids, yet varying only the second sinusoid so as to effect a change in harmonic relation. The results showed strong vowel confusions for mid-open but only limited confusions for closed vowels. Additional effects on vowel recognition were indicated to concern harmonicity and speci...
Journal of the Acoustical Society of America | 2017
Lei He; Volker Dellwo
We model the amplitude envelope of a speech signal as a kinematic system and calculate its basic parameters: displacement, velocity, and acceleration. Such a system captures the smoothed amplitude fluctuation pattern over time, illustrating how energy is distributed across the signal. Although the pulmonic air pressure is the primary energy source of speech, the amplitude modulation pattern is largely determined by articulatory behaviors, especially mandible and lip movements. Therefore, there should be a correspondence between signal envelope kinematics and articulator kinematics. Previous research has shown substantial speaker idiosyncrasies in articulation. Such idiosyncrasies should therefore be reflected in the envelope kinematics as well. From the signal envelope kinematics, it may be possible to infer individual articulatory behaviors. This is particularly useful for forensic phoneticians who usually have no access to articulatory data, and clinical speech pathologists who usually find it difficult to make articulatory measurements in clinical consultations.
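One way to picture the kinematic view is to treat an already smoothed envelope as displacement and differentiate it numerically. The sketch below is an assumption-laden illustration, not the authors' procedure: the abstract does not specify the smoothing or differentiation method, so central finite differences are used here, and the names `derivative` and `envelope_kinematics` are hypothetical.

```python
def derivative(samples, fs):
    """First time derivative by finite differences.

    Central differences in the interior, one-sided differences at the edges.
    `fs` is the sampling rate of the envelope in Hz.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    out = [0.0] * n
    out[0] = (samples[1] - samples[0]) * fs
    out[-1] = (samples[-1] - samples[-2]) * fs
    for i in range(1, n - 1):
        out[i] = (samples[i + 1] - samples[i - 1]) * fs / 2.0
    return out


def envelope_kinematics(envelope, fs):
    """Return (displacement, velocity, acceleration) of a smoothed envelope."""
    displacement = list(envelope)               # the envelope itself
    velocity = derivative(displacement, fs)     # rate of amplitude change
    acceleration = derivative(velocity, fs)     # change of that rate
    return displacement, velocity, acceleration


# Hypothetical envelope: a quadratic rise sampled at 100 Hz.
fs = 100.0
envelope = [(i / fs) ** 2 for i in range(50)]
displacement, velocity, acceleration = envelope_kinematics(envelope, fs)
```

Differentiation amplifies noise, so the envelope would normally be low-pass filtered first; the values at the two edges of the acceleration track are less reliable because they rest on one-sided differences.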
Journal of the Acoustical Society of America | 2017
Daniel Friedrichs; Dieter Maurer; Stuart Rosen; Volker Dellwo
The phonological function of vowels can be maintained at fundamental frequencies (fo) up to 880 Hz [Friedrichs, Maurer, and Dellwo (2015). J. Acoust. Soc. Am. 138, EL36-EL42]. Here, the influence of talker variability and multiple response options on vowel recognition at high fos is assessed. The stimuli (n = 264) consisted of eight isolated vowels (/i y e ø ε a o u/) produced by three female native German talkers at 11 fos within a range of 220-1046 Hz. In a closed-set identification task, 21 listeners were presented excised 700-ms vowel nuclei with quasi-flat fo contours and resonance trajectories. The results show that listeners can identify the point vowels /i a u/ at fos up to almost 1 kHz, with a significant decrease for the vowels /y ε/ and a drop to chance level for the vowels /e ø o/ toward the upper fos. Auditory excitation patterns reveal highly differentiable representations for /i a u/ that can be used as landmarks for vowel category perception at high fos. These results suggest that theories of vowel perception based on overall spectral shape will provide a fuller account of vowel perception than those based solely on formant frequency patterns.