
Publication


Featured research published by Martti Vainio.


Cognitive Brain Research | 1999

Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations

István Winkler; Anne Lehtokoski; Paavo Alku; Martti Vainio; István Czigler; Valéria Csépe; Olli Aaltonen; Ilkka Raimo; Kimmo Alho; Heikki Lang; Antti Iivonen; Risto Näätänen

Event-related brain potentials (ERPs) were recorded to infrequent changes of a synthesized vowel (standard) to another vowel (deviant) in speakers of Hungarian and Finnish, two remotely related languages with rather similar vowel systems. Both language groups were presented with identical stimuli. One standard-deviant pair represented an across-category vowel contrast in Hungarian but a within-category contrast in Finnish, with the other pair having the reversed roles in the two languages. Both within- and across-category contrasts elicited the mismatch negativity (MMN) ERP component in the native speakers of either language. The MMN amplitude was larger for across- than within-category contrasts in both language groups. These results suggest that the pre-attentive change-detection process generating the MMN utilized both auditory (sensory) and phonetic (categorical) representations of the test vowels.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering

Tuomo Raitio; Antti Suni; Junichi Yamagishi; Hannu Pulakka; Jani Nurminen; Martti Vainio; Paavo Alku

This paper describes a hidden Markov model (HMM) based speech synthesizer that utilizes glottal inverse filtering for generating natural-sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the HMM framework and generated in the synthesis stage according to the text input. The glottal excitation is synthesized by interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified according to the spectrum of the desired voice source characteristics. Speech is synthesized by filtering the reconstructed source signal with the vocal tract filter. Experiments show that the proposed system is capable of generating natural-sounding speech, and that its quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.
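The final synthesis step the abstract outlines is a classic source-filter operation: an excitation signal is passed through an all-pole vocal tract filter. Below is a minimal, hypothetical Python sketch of that filtering stage; an impulse train stands in for the paper's interpolated and concatenated natural glottal pulses, and all names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_frame(excitation, a, gain=1.0):
    """Filter a glottal excitation through an all-pole vocal tract
    model H(z) = gain / A(z), with a = [1, a_1, ..., a_p]."""
    return lfilter([gain], a, excitation)

# Illustrative use: an impulse train stands in for the concatenated
# natural glottal flow pulses of the actual method.
fs, f0 = 16000, 120                 # sampling rate, fundamental frequency
excitation = np.zeros(fs // 10)     # one 100 ms frame
excitation[::fs // f0] = 1.0        # one impulse per pitch period
a = np.array([1.0, -1.3, 0.8])      # toy 2nd-order vocal tract filter
speech = synthesize_frame(excitation, a)
```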


European Journal of Neuroscience | 2006

Selective tuning of cortical sound-feature processing by language experience

Mari Tervaniemi; Thomas Jacobsen; Stefan Röttger; Teija Kujala; Andreas Widmann; Martti Vainio; Risto Näätänen; Erich Schröger

In ‘quantity‐languages’, such as Japanese or Finnish, sound duration is linguistically relevant. We showed that quantity‐language speakers were superior to speakers of a non‐quantity language in discriminating the duration of even non‐speech sounds. In contrast, there was no group difference in the discrimination of sound frequency. This result, obtained both by behavioural and neural indices at attentive and automatic levels of processing, indicates precise feature‐specific tuning of the auditory‐cortex functions by the mother tongue.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis

Tuomo Raitio; Antti Suni; Hannu Pulakka; Martti Vainio; Paavo Alku

This paper describes a source modeling method for hidden Markov model (HMM) based speech synthesis with improved naturalness. A speech corpus is first decomposed into the glottal source signal and the model of the vocal tract filter using glottal inverse filtering, and parametrized into excitation and spectral features. Additionally, a library of glottal source pulses is extracted from the estimated voice source signal. In the synthesis stage, the excitation signal is generated by selecting appropriate pulses from the library according to the target cost of the excitation features and a concatenation cost between adjacent glottal source pulses. Finally, speech is synthesized by filtering the excitation signal with the vocal tract filter. Experiments show that the naturalness of the synthetic speech is equal or better, and speaker similarity is better, compared to a system using only a single glottal source pulse.
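The pulse-selection idea resembles unit selection: each pitch period receives the library pulse that minimizes a target cost on excitation features plus a concatenation cost against its neighbor. The sketch below is a simplified, greedy version of such a search (the abstract does not specify the exact procedure, and a full system might optimize the sequence jointly); the feature and cost definitions are hypothetical.

```python
import numpy as np

def select_pulses(targets, library, lam=1.0):
    """Greedy pulse selection: for each target feature vector, pick the
    library pulse minimizing a target cost plus a concatenation cost to
    the previously chosen pulse (a simplification of a joint search).

    targets : (T, d) target excitation features, one row per pitch period
    library : list of (features (d,), waveform (n,)) pulse entries
    lam     : weight of the concatenation cost
    """
    chosen, prev_wave = [], None
    for t in targets:
        best, best_cost = None, np.inf
        for feats, wave in library:
            cost = np.sum((feats - t) ** 2)          # target cost
            if prev_wave is not None:                # concatenation cost:
                m = min(len(wave), len(prev_wave))   # mismatch at the join
                cost += lam * np.mean((wave[:m] - prev_wave[-m:]) ** 2)
            if cost < best_cost:
                best, best_cost = wave, cost
        chosen.append(best)
        prev_wave = best
    return np.concatenate(chosen)

# Illustrative use with a toy two-pulse library.
rng = np.random.default_rng(0)
lib = [(rng.standard_normal(3), rng.standard_normal(80)) for _ in range(2)]
tgt = rng.standard_normal((5, 3))       # five pitch periods of target features
excitation = select_pulses(tgt, lib)
```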


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Evaluation of an Artificial Speech Bandwidth Extension Method in Three Languages

Hannu Pulakka; Laura Laaksonen; Martti Vainio; Jouni Pohjalainen; Paavo Alku

Quality and intelligibility of narrowband telephone speech can be improved by artificial bandwidth extension (ABE), which extends the speech bandwidth using only information available in the narrowband speech signal. This paper reports a three-language evaluation of an ABE method that has recently been launched in several of Nokia's mobile telephone models. The method extends the speech bandwidth to frequencies above the telephone band by first utilizing spectral folding and then modifying the magnitude spectrum of the extension band with spline curves. The performance of the method was evaluated by formal listening tests in American English, Russian, and Mandarin Chinese. The results of the listening tests indicate that ABE processing improved the subjective quality of coded narrowband speech in all these languages. Differences between bandwidth-extended American English test sentences and their original wideband counterparts were also evaluated using both an objective distance measure that simulates the characteristics of human hearing and a conventional spectral distortion measure. The average objective error was calculated for different categories of speech sounds. The error was found to be smallest in nasals and semivowels and largest in fricative sounds.
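The first step the abstract names, spectral folding, can be illustrated compactly: inserting zeros between samples doubles the sampling rate and mirrors the 0-4 kHz narrowband spectrum into the 4-8 kHz extension band, which is then reshaped. The Python sketch below shows only that folding idea; a fixed gain stands in for the spline-based magnitude shaping of the actual method, and all names and values are illustrative.

```python
import numpy as np

def spectral_fold(narrowband, fs_nb=8000, highband_gain=0.3):
    """Crude bandwidth extension by spectral folding.

    Zero-insertion upsampling mirrors the 0-4 kHz spectrum of the
    narrowband signal into the 4-8 kHz band; the folded band is then
    attenuated by a fixed gain (a stand-in for the spline-based
    magnitude shaping of the actual method).
    Returns the extended signal and its new sampling rate.
    """
    up = np.zeros(2 * len(narrowband))
    up[::2] = narrowband                # zero insertion -> mirror image
    spec = np.fft.rfft(up)
    nyq_old = len(spec) // 2            # bin index near the old 4 kHz edge
    spec[nyq_old:] *= highband_gain     # shape the extension band
    return np.fft.irfft(spec, len(up)), 2 * fs_nb

# Illustrative use with a synthetic narrowband signal.
t = np.arange(8000) / 8000.0
nb = np.sin(2 * np.pi * 1000 * t)       # 1 kHz tone at 8 kHz sampling
wb, fs_wb = spectral_fold(nb)           # folded image appears near 7 kHz
```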


Frontiers in Psychology | 2013

Music and speech prosody: a common rhythm

Maija Hausen; Ritva Torppa; Viljami R. Salmela; Martti Vainio; Teppo Särkämö

Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).


Journal of the Acoustical Society of America | 2013

Formant frequency estimation of high-pitched vowels using weighted linear prediction

Paavo Alku; Jouni Pohjalainen; Martti Vainio; Anne-Maria Laukkanen; Brad H. Story

All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract in optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract, leading to less biased formant estimates. In experiments with synthetic vowels created using a physical modeling approach, WLP-AME yielded improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., the relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions at different pitches than those computed by conventional LP.
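The core of weighted LP is stated directly in the abstract: the squared prediction error is multiplied by a weighting function before the filter coefficients are optimized. A minimal sketch of that weighted least-squares step follows, assuming a caller-supplied weight sequence (the specific AME weighting derived from the main excitation instants is not reproduced here); with unit weights it reduces to the covariance method of conventional LP.

```python
import numpy as np

def weighted_lp(x, order, w):
    """Temporally weighted LP: minimize sum_n w[n] * e[n]^2 with
    e[n] = x[n] - sum_k a_k * x[n-k]  (covariance-style, n = order..N-1).
    Returns the error-filter coefficients [1, -a_1, ..., -a_p].
    """
    N = len(x)
    C = np.zeros((order, order))   # weighted covariance matrix
    c = np.zeros(order)            # weighted cross-correlation vector
    for n in range(order, N):
        past = x[n - order:n][::-1]        # x[n-1], ..., x[n-order]
        C += w[n] * np.outer(past, past)
        c += w[n] * x[n] * past
    a = np.linalg.solve(C, c)              # solve the normal equations
    return np.concatenate(([1.0], -a))

# Illustrative use: unit weights recover conventional (covariance) LP.
x = np.random.randn(400)
A = weighted_lp(x, order=10, w=np.ones(len(x)))
```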


Journal of Phonetics | 2006

Tonal features, intensity, and word order in the perception of prominence

Martti Vainio; Juhani Järvikivi

The perception of prominence as a function of sentence stress in Finnish was investigated in four experiments. Listeners judged the relative prominence of two consecutive nouns in a three-word utterance, where the accentuation of the nouns was systematically varied by tonal means. Experiments 1 and 2 investigated both the tonal features underlying the subjects’ responses and the influence of word order on the perceived prominence of the two accented words. The results showed that similar tonal features conditioned the subjects’ judgments of prominence regardless of other phonetic differences. They further showed that changing the word order influenced the distribution of responses in the two experiments. Two further experiments were administered to check the possible influence of slight tonal and intensity differences in the first two experiments. Only intensity was found to affect the distribution of judgments; furthermore, the influence was local and affected only the last of the two words. Overall, the results suggest that the most important tonal features responsible for the perception of prominence form a so-called flat-hat pattern. This also indicates that different kinds of focus structure influence the perception of prominence even when the judgments are based on decisions about the place of sentence stress.


Journal of the Acoustical Society of America | 2007

Focus in production: Tonal shape, intensity, and word order

Martti Vainio; Juhani Järvikivi

The effect of word order and prosodic focus on tonal shape and intensity in the production of prosody was studied. The results show that the production of focus in Finnish follows a global pattern with regard to tonal features. The relative pitch-height difference between contrasted words is the most important pitch-related factor in signaling narrow prosodic focus. Narrow focus is not localized to prosodically emphasized words only but relates to the utterance as a whole. Syntactic structure was also found to modulate the relative prosodic prominence of individual words with respect to both intensity and tonal structure.


International Journal of Audiology | 2014

The perception of prosody and associated auditory cues in early-implanted children: The role of auditory working memory and musical activities

Ritva Torppa; Andrew Faulkner; Minna Huotilainen; Juhani Järvikivi; Jari Lipsanen; Marja Laasonen; Martti Vainio

Objective: To study prosodic perception in early-implanted children in relation to auditory discrimination, auditory working memory, and exposure to music. Design: Word and sentence stress perception, discrimination of fundamental frequency (F0), intensity, and duration, and forward digit span were measured twice over approximately 16 months. Musical activities were assessed by questionnaire. Study sample: Twenty-one early-implanted and age-matched normal-hearing (NH) children (4–13 years). Results: Children with cochlear implants (CIs) who were exposed to music performed better than the others in stress perception and F0 discrimination. Only this subgroup of implanted children improved with age in word stress perception and intensity discrimination, and improved over time in digit span. Prosodic perception, F0 discrimination, and forward digit span in implanted children exposed to music were equivalent to the NH group, but the other implanted children performed more poorly. For children with CIs, word stress perception was linked to digit span and intensity discrimination; sentence stress perception was additionally linked to F0 discrimination. Conclusions: Prosodic perception in children with CIs is linked to auditory working memory and aspects of auditory discrimination. Engagement in music was linked to better performance across a range of measures, suggesting that music is a valuable tool in the rehabilitation of implanted children.

Collaboration


Dive into Martti Vainio's collaborations.

Top Co-Authors

Antti Suni

University of Helsinki

Lari Vainio

University of Helsinki
