Teija Waaramaa
University of Tampere
Publications
Featured research published by Teija Waaramaa.
Journal of Voice | 2010
Teija Waaramaa; Anne-Maria Laukkanen; Matti Airas; Paavo Alku
This study investigated the role of voice source and formant frequencies in the perception of emotional valence and psychophysiological activity level from short vowel samples (approximately 150 milliseconds). Nine professional actors (five males and four females) read a prose passage simulating joy, tenderness, sadness, anger, and a neutral emotional state. The stress-carrying vowel [a:] was extracted from the Finnish word [ta:k:ahan] in continuous speech and analyzed for duration, fundamental frequency (F0), equivalent sound level (Leq), alpha ratio, and formant frequencies F1-F4. The alpha ratio was calculated by subtracting the Leq (dB) in the range 50 Hz-1 kHz from the Leq in the range 1-5 kHz. The samples were inverse filtered by Iterative Adaptive Inverse Filtering (IAIF), and the resulting glottal flow estimates were parameterized with the normalized amplitude quotient [NAQ = fAC/(dpeakT)]. Fifty listeners (mean age 28.5 years) identified the emotional valences from the randomized samples. Multinomial logistic regression analysis was used to study the interrelations of the parameters for perception. It proved possible to identify valences from vowel samples of this short duration. NAQ tended to differentiate between the valences and activity levels perceived in both genders. The voice source may not only reflect variations of F0 and Leq but may also play an independent role in expression, reflecting phonation types. To some extent, formant frequencies appeared to be related to valence perception, but no clear patterns could be identified. Coding of valence tends to be a complicated multiparameter phenomenon with wide individual variation.
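Both measures above reduce to short computations. The following is a minimal sketch, not the authors' code: the alpha ratio from Welch band levels, and NAQ from a single glottal cycle that is assumed to have already been inverse filtered (the IAIF step itself is not shown; function names and the nperseg value are illustrative assumptions).

```python
import numpy as np
from scipy.signal import welch

def band_level_db(x, fs, f_lo, f_hi):
    """Equivalent level (dB, arbitrary reference) of x in the band [f_lo, f_hi)."""
    f, psd = welch(x, fs=fs, nperseg=1024)
    band = (f >= f_lo) & (f < f_hi)
    return 10.0 * np.log10(np.trapz(psd[band], f[band]))

def alpha_ratio(x, fs):
    """Alpha ratio as defined in the abstract: Leq(1-5 kHz) minus Leq(50 Hz-1 kHz)."""
    return band_level_db(x, fs, 1000, 5000) - band_level_db(x, fs, 50, 1000)

def naq(glottal_flow, fs, f0):
    """NAQ = fAC / (dpeak * T) for one glottal cycle of estimated flow."""
    f_ac = glottal_flow.max() - glottal_flow.min()    # fAC: AC amplitude of the flow pulse
    d_peak = np.max(-np.gradient(glottal_flow) * fs)  # dpeak: magnitude of the negative peak of the flow derivative
    return f_ac / (d_peak * (1.0 / f0))               # T = 1 / F0, the period length
```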
Logopedics Phoniatrics Vocology | 2006
Teija Waaramaa; Paavo Alku; Anne-Maria Laukkanen
The present study investigates the role of F3 in the perception of the valence of emotional expressions, using the vowel [a:] with different F3 values: the original, F3 lowered by 30% in frequency, F3 raised by 30%, and F3 removed. The vowel [a:] was extracted from the simulated emotions, inverse filtered, and manipulated. The resulting 12 synthesized samples were randomized and presented to 30 listeners, who evaluated the valence (positiveness/negativeness) of the expressions. The vowel with raised F3 was perceived as positive more often than the samples with the original (p = 0.063), lowered (p = 0.006), or removed F3 (p = 0.066). F3 may affect the perception of valence if the signal has sufficient energy in the high frequency range.
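The study carried out its F3 manipulations during inverse filtering and resynthesis; the sketch below is only a rough, hypothetical approximation of the same idea in the time domain, cutting the original F3 region and boosting a band at 0.7x or 1.3x the measured F3 with standard peaking biquads (RBJ audio-EQ cookbook). The gain and Q values are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(fc, fs, gain_db, q):
    """RBJ audio-EQ-cookbook peaking biquad: boost (gain_db > 0) or
    cut (gain_db < 0) a band centered on fc."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def shift_f3(x, fs, f3_hz, factor=1.3, cut_db=-18.0, boost_db=12.0, q=5.0):
    """Attenuate the measured F3 region and raise a resonance at factor * f3_hz."""
    b1, a1 = peaking_eq(f3_hz, fs, cut_db, q)             # cut the original F3
    b2, a2 = peaking_eq(factor * f3_hz, fs, boost_db, q)  # boost the shifted F3 location
    return lfilter(b2, a2, lfilter(b1, a1, x))
```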
Logopedics Phoniatrics Vocology | 2006
Juhani Toivanen; Teija Waaramaa; Paavo Alku; Anne-Maria Laukkanen; Tapio Seppänen; Eero Väyrynen; Matti Airas
The aim of this investigation is to study how well voice quality conveys emotional content that can be discriminated by human listeners and by a computer. The speech data were produced by nine professional actors (four women, five men). The speakers simulated the following basic emotions in a unit consisting of a vowel extracted from running Finnish speech: neutral, sadness, joy, anger, and tenderness. Automatic discrimination was clearly more successful than human emotion recognition. Human listeners thus apparently need speech samples longer than vowel-length units for reliable emotion discrimination, whereas the machine utilizes quantitative parameters effectively even for short speech samples.
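The abstract does not name the classifier or feature set used for the automatic discrimination; the sketch below is purely illustrative of the general approach, feeding per-vowel acoustic parameters (e.g., F0, Leq, alpha ratio, NAQ) to an off-the-shelf classifier. The feature matrix here is placeholder data, not measurements from the study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder features: one row per vowel sample, one column per acoustic
# parameter. Real rows would come from measurements like those described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(45, 4))
y = np.repeat(np.arange(5), 9)  # 5 classes: neutral, sadness, joy, anger, tenderness

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)  # chance level for 5 balanced classes is 0.20
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```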
Folia Phoniatrica Et Logopaedica | 2008
Teija Waaramaa; Anne-Maria Laukkanen; Paavo Alku; Eero Väyrynen
Fundamental frequency (F₀) and intensity are known to be important variables in the communication of emotions in speech. In singing, however, pitch is predetermined and yet the voice should convey emotions. Hence, other vocal parameters are needed to express emotions. This study investigated the role of voice source characteristics and formant frequencies in the communication of emotions in monopitched vowel samples [a:], [i:] and [u:]. Student actors (5 males, 8 females) produced the emotional samples simulating joy, tenderness, sadness, anger and a neutral emotional state. Equivalent sound level (Leq), alpha ratio [SPL (1–5 kHz) – SPL (50 Hz–1 kHz)] and formant frequencies F1–F4 were measured. The [a:] samples were inverse filtered and the estimated glottal flows were parameterized with the normalized amplitude quotient [NAQ = fAC/(dpeakT)]. Interrelations of acoustic variables were studied by ANCOVA, considering the valence and psychophysiological activity of the expressions. Forty participants listened to the randomized samples (n = 210) for identification of the emotions. The capacity of monopitched vowels for conveying emotions differed. Leq and NAQ differentiated activity levels. NAQ also varied independently of Leq. In [a:], filter (formant frequencies F1–F4) was related to valence. The interplay between voice source and F1–F4 warrants a synthesis study.
Logopedics Phoniatrics Vocology | 2013
Teija Waaramaa; Elina Kankare
The aim of the present study was to investigate whether the glottal and filter variables of emotional expressions vary by the emotion and valence expressed. Prolonged emotional vowels (n = 96) were produced by professional actors and actresses (n = 4) expressing joy, surprise, interest, sadness, fear, anger, disgust, and a neutral emotional state. Acoustic parameters and the contact quotient from the electroglottographic signal (CQEGG) were calculated and analyzed statistically. Vocal fold contact time differed significantly between the emotional expressions, reflecting differences in phonation types. It was concluded that CQEGG may vary simultaneously and inversely with F3 and F4 in expressions of positive emotions. Changes in the lower pharynx and larynx may affect the higher formant frequencies.
Frontiers in Psychology | 2013
Teija Waaramaa; Timo Leisiö
The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish emotions or valence (on a positivity-neutrality-negativity axis) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA, with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above chance level in each of the five countries studied; however, the countries differed significantly from each other in the emotions identified and by the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. Listeners' age was a significant variable. Only minor gender differences were found in identification. Perceptual confusion between emotions in the listening test seemed to depend on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for listeners who share a similar language or cultural background with the speaker.
International Journal of Listening | 2018
Teija Waaramaa; Tarja Kukkonen; Molly Stoltz; Ahmed Geneid
In the present pilot study, the researchers investigated how people with impaired hearing identify emotions from auditory and visual stimuli, with people with normal hearing acting as controls. Two separate experiments were conducted. The focus was on the communicative and social function of emotion perception. Professional actors of both genders produced emotional nonsense samples without linguistic content, samples in the Finnish language, and prolonged vowel samples. In Experiment 1, nine cochlear implant users and nine controls participated in a listening test. In Experiment 2, nine users of a variety of hearing aids and nine controls participated in a perception test. The results of both experiments showed a statistically significant difference between the two groups, people with hearing impairment and people with normal hearing, in emotion identification and valence perception from both auditory and visual stimuli. The results suggest that hearing aids and cochlear implants do not adequately transfer the nuances of emotion conveyed by the voice. The results also suggest difficulties in visual perception among people with hearing impairment. This warrants further studies with larger samples.
Logopedics Phoniatrics Vocology | 2015
Teija Waaramaa
The present study focused on the identification of emotions in cross-cultural conditions on different continents and among subjects with divergent language backgrounds. The aim was to investigate whether the perception of the basic emotions from nonsense vocal samples was universal or dependent on voice quality, musicality, and/or gender. Listening tests with 350 participants were conducted on location in a variety of cultures: China, Egypt, Estonia, Finland, Russia, Sweden, and the USA. The results suggested that voice quality parameters played a role in the identification of emotions in the absence of linguistic content. Cultural background may affect the interpretation of emotions more than any presumed universality. Musical interest tended to facilitate emotion identification. No gender differences were found.
Logopedics Phoniatrics Vocology | 2015
Teija Waaramaa; Pertti Palo; Elina Kankare
Vocal emotions are expressed either in speech or in singing. The difference is that in singing the pitch is predetermined, while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels produced by professional actors. Given their profession, actors have to be able to express emotions both in speech and in singing. Electroglottographic and acoustic analyses were conducted on emotional expressions with freely varying vowels [a:], [i:], [u:] (96 samples) and on mono-pitched protracted vowels (96 samples). The contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels; the three levels were used in order to evaluate their effect on the results. Genders were studied separately. The results suggested significant gender differences at the CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, though to a lesser degree when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used a more hypofunctional phonation type in mono-pitched utterances than in expressions with freely varying pitch. The material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, as well as listening tests to investigate the perceptual value of mono-pitched vowels in the communication of emotions.
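The criterion-level CQEGG method referred to here is a short computation once the EGG signal has been segmented into cycles. A minimal sketch, assuming one segmented cycle in which higher amplitude means greater vocal fold contact (segmentation itself is not shown):

```python
import numpy as np

def cq_egg(egg_cycle, threshold=0.35):
    """Contact quotient of one segmented EGG cycle: the fraction of the
    cycle during which the signal exceeds threshold * (peak amplitude)."""
    x = egg_cycle - egg_cycle.min()            # shift the cycle baseline to zero
    return float(np.mean(x >= threshold * x.max()))

# Evaluating one cycle at the three criterion levels used in the study:
# for level in (0.35, 0.55, 0.80):
#     print(level, cq_egg(cycle, level))
```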
Logopedics Phoniatrics Vocology | 2005
Juhani Toivanen; Teija Waaramaa
Authentic Finnish-English speech data were collected as part of an English conversation class at a Finnish college. Intonation was coded within a framework involving 'tone', 'key', and 'termination'. A categorization of voice quality was chosen (e.g., 'modal voice', 'creak', 'breathy', and 'tense'). The tempo of speech was transcribed with descriptors such as 'fast' and 'slow'. The majority of dispreferred turns in the data represented mitigated disagreement marked by structural complexity (involving, e.g., wordiness). The p tone was predominant, and a low/mid key often accompanied mitigated disagreement. The r and r+ tones were virtually absent in the mitigated dispreferred turns. Instead, the speakers often used a very lax/breathy voice quality and a slow/decelerating tempo, often resulting in creak near a transition-relevance place.