Publication


Featured research published by Paavo Alku.


Conference of the International Speech Communication Association | 1992

Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering

Paavo Alku

A new glottal wave analysis method, Pitch Synchronous Iterative Adaptive Inverse Filtering (PSIAIF), is presented. The algorithm is based on a previously developed method, Iterative Adaptive Inverse Filtering (IAIF). In the IAIF method the glottal contribution to the speech spectrum is first estimated with an iterative structure. The vocal tract transfer function is modeled after eliminating the average glottal contribution, and the glottal excitation is obtained by cancelling the effects of the vocal tract and lip radiation by inverse filtering. In the new PSIAIF method the glottal pulseform is computed by applying the IAIF algorithm twice to the same signal. The first IAIF analysis yields a glottal excitation that spans several pitch periods; this pulseform is used to determine the positions and lengths of the frames for the pitch synchronous analysis. The final result is obtained by analysing the original speech signal with the IAIF algorithm one fundamental period at a time. The PSIAIF algorithm was applied to glottal wave analysis of both synthetic and natural vowels. The results show that the method gives a fairly accurate estimate of the glottal flow, except for vowels with a low first formant produced with a pressed phonation type.
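The IAIF procedure outlined in this abstract lends itself to a compact illustration. The following Python sketch implements one IAIF-style iteration for a single voiced frame under simple assumptions; the function names, LPC orders, and the first-order lip-radiation model are illustrative choices rather than the authors' original implementation, and PSIAIF would additionally repeat the analysis pitch-synchronously using frame positions taken from a first IAIF pass.

```python
# Minimal IAIF-style sketch (illustrative, not the original PSIAIF code).
# `x` is one voiced speech frame sampled at `fs` Hz.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """All-pole coefficients via the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))          # A(z) = 1 - sum_k a_k z^-k

def iaif_frame(x, fs, vt_order=None, glottis_order=4):
    vt_order = vt_order or int(fs / 1000) + 2   # rule-of-thumb vocal-tract order
    lip = np.array([1.0, -0.99])                # approximate lip-radiation differentiator

    g1 = lpc(x, 1)                              # first-order estimate of the glottal tilt
    y = lfilter(g1, [1.0], x)                   # cancel the average glottal contribution

    v1 = lpc(y, vt_order)                       # preliminary vocal-tract model
    g = lfilter(v1, [1.0], x)                   # inverse filter: remove vocal tract
    g = lfilter([1.0], lip, g)                  # integrate: cancel lip radiation

    g2 = lpc(g, glottis_order)                  # refined glottal model (the iteration)
    y = lfilter([1.0], lip, lfilter(g2, [1.0], x))
    v2 = lpc(y, vt_order)                       # refined vocal-tract model
    return lfilter([1.0], lip, lfilter(v2, [1.0], x))   # glottal flow estimate
```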


Proceedings of the National Academy of Sciences of the United States of America | 2003

Speech–sound-selective auditory impairment in children with autism: They can perceive but do not attend

Rita Ceponiene; T. Lepistö; Anna Shestakova; Raija Vanhala; Paavo Alku; Risto Näätänen; Kyoshi Yaguchi

In autism, severe abnormalities in social behavior coexist with aberrant attention and deficient language. In the attentional domain, attention to people and socially relevant stimuli is impaired the most. Because socially meaningful stimulus events are physically complex, a deficiency in sensory processing of complex stimuli has been suggested to contribute to aberrant attention and language in autism. This study used event-related brain potentials (ERP) to examine the sensory and early attentional processing of sounds of different complexity in high-functioning children with autism. Acoustically matched simple tones, complex tones, and vowels were presented in separate oddball sequences, in which a repetitive “standard” sound was occasionally replaced by an infrequent “deviant” sound differing from the standard in frequency (by 10%). In addition to sensory responses, deviant sounds elicited an ERP index of automatic sound-change discrimination, the mismatch negativity, and an ERP index of attentional orienting, the P3a. The sensory sound processing was intact in the high-functioning children with autism and was not affected by sound complexity or “speechness.” In contrast, their involuntary orienting was affected by stimulus nature. It was normal to both simple- and complex-tone changes but was entirely abolished by vowel changes. These results demonstrate that, first, auditory orienting deficits in autism cannot be explained by sensory deficits and, second, that orienting deficit in autism might be speech–sound specific.
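For readers unfamiliar with the paradigm, the oddball logic described above, in which a repetitive standard sound is occasionally replaced by a deviant differing in frequency by 10%, can be sketched as follows. The trial count, deviant probability, and standard frequency are illustrative assumptions, not the study's actual parameters.

```python
# Sketch of an oddball stimulus sequence (illustrative parameters only).
import random

def oddball_sequence(n_trials=400, deviant_prob=0.1, standard_hz=500.0):
    deviant_hz = standard_hz * 1.10              # deviant differs by 10% in frequency
    return [("deviant", deviant_hz) if random.random() < deviant_prob
            else ("standard", standard_hz)
            for _ in range(n_trials)]

trials = oddball_sequence()
print(sum(kind == "deviant" for kind, _ in trials), "deviants out of", len(trials))
```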


Psychophysiology | 1999

Brain responses reveal the learning of foreign language phonemes

István Winkler; Teija Kujala; Hannu Tiitinen; Päivi Sivonen; Paavo Alku; Anne Lehtokoski; István Czigler; Valéria Csépe; Risto J. Ilmoniemi; Risto Näätänen

Learning to speak a new language requires the formation of recognition patterns for the speech sounds specific to the newly acquired language. The present study demonstrates the dynamic nature of cortical memory representations for phonemes in adults by using the mismatch negativity (MMN) event-related potential. We studied Hungarian and Finnish subjects, dividing the Hungarians into a naive group (no knowledge of Finnish) and a group fluent in Finnish. We found that the MMN for a contrast between two Finnish phonemes was elicited in the fluent Hungarians but not in the naive Hungarians. This result indicates that the fluent Hungarians had developed cortical memory representations for the Finnish phoneme system that enabled them to preattentively categorize phonemes specific to this language.


NeuroImage | 2001

Memory traces for words as revealed by the Mismatch Negativity

Friedemann Pulvermüller; Teija Kujala; Yury Shtyrov; Jaana Simola; Hannu Tiitinen; Paavo Alku; Kimmo Alho; Sami Martinkauppi; Risto J. Ilmoniemi; Risto Näätänen

Brain responses to the same spoken syllable completing a Finnish word or a pseudo-word were studied. Native Finnish-speaking subjects were instructed to ignore the sound stimuli and watch a silent movie while the mismatch negativity (MMN), an automatic index of experience-dependent auditory memory traces, was recorded. The MMN to each syllable was larger when it completed a word than when it completed a pseudo-word. This enhancement, reaching its maximum amplitude at about 150 ms after the word's recognition point, did not occur in foreign subjects who did not know any Finnish. These results provide the first demonstration of the presence of memory traces for individual spoken words in the human brain. Using whole-head magnetoencephalography, the major intracranial source of this word-related MMN was found in the left superior temporal lobe.


Brain Research | 2005

The discrimination of and orienting to speech and non-speech sounds in children with autism.

T. Lepistö; Teija Kujala; Raija Vanhala; Paavo Alku; Minna Huotilainen; Risto Näätänen

The present study aimed to find out how different stages of cortical auditory processing (sound encoding, discrimination, and orienting) are affected in children with autism. To this end, auditory event-related potentials (ERP) were studied in 15 children with autism and their controls. Their responses were recorded for pitch, duration, and vowel changes in speech stimuli, and for corresponding changes in the non-speech counterparts of the stimuli, while the children watched silent videos and ignored the stimuli. The responses to sound repetition were diminished in amplitude in the children with autism, reflecting impaired sound encoding. The mismatch negativity (MMN), an ERP indexing sound discrimination, was enhanced in the children with autism as far as pitch changes were concerned. This is consistent with earlier studies reporting auditory hypersensitivity and good pitch-processing abilities, as well as with theories proposing enhanced perception of local stimulus features in individuals with autism. The discrimination of duration changes was impaired in these children, however. Finally, involuntary orienting to sound changes, as reflected by the P3a ERP, was more impaired for speech than non-speech sounds in the children with autism, suggesting deficits particularly in social orienting. This has been proposed to be one of the earliest symptoms to emerge, with pervasive effects on later development.


Cognitive Brain Research | 1999

Pre-attentive detection of vowel contrasts utilizes both phonetic and auditory memory representations

István Winkler; Anne Lehtokoski; Paavo Alku; Martti Vainio; István Czigler; Valéria Csépe; Olli Aaltonen; Ilkka Raimo; Kimmo Alho; Heikki Lang; Antti Iivonen; Risto Näätänen

Event-related brain potentials (ERP) were recorded to infrequent changes of a synthesized vowel (standard) to another vowel (deviant) in speakers of Hungarian and Finnish, two remotely related languages with rather similar vowel systems. Both language groups were presented with identical stimuli. One standard-deviant pair represented an across-category vowel contrast in Hungarian but a within-category contrast in Finnish, while the other pair had the reverse roles in the two languages. Both within- and across-category contrasts elicited the mismatch negativity (MMN) ERP component in the native speakers of either language. The MMN amplitude was larger for across- than within-category contrasts in both language groups. These results suggest that the pre-attentive change-detection process generating the MMN utilized both auditory (sensory) and phonetic (categorical) representations of the test vowels.


Journal of the Acoustical Society of America | 2002

Normalized amplitude quotient for parametrization of the glottal flow

Paavo Alku; Tom Bäckström; Erkki Vilkman

The normalized amplitude quotient (NAQ) is presented as a method to parametrize the glottal closing phase using two amplitude-domain measurements from waveforms estimated by inverse filtering. In this technique, the ratio between the amplitude of the AC flow and the negative peak amplitude of the flow derivative is first computed using the concept of the equivalent rectangular pulse, a hypothetical signal located at the instant of the main excitation of the vocal tract. This ratio is then normalized with respect to the length of the fundamental period. Comparison between NAQ and its counterpart among the conventional time-domain parameters, the closing quotient, shows that the proposed parameter is more robust against distortions, such as measurement noise, that make the extraction of conventional time-based parameters of the glottal flow problematic. Experiments with breathy, normal, and pressed vowels indicate that NAQ is also able to separate the type of phonation effectively.
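Read literally, the abstract defines NAQ as the ratio of the AC flow amplitude to the magnitude of the negative peak of the flow derivative, normalized by the fundamental period. The sketch below is an illustrative implementation under that reading, not the authors' reference code; the function name and the assumption that one glottal cycle and its fundamental frequency are given are mine.

```python
# Sketch of the NAQ computation for one inverse-filtered glottal-flow cycle.
import numpy as np

def normalized_amplitude_quotient(flow, fs, f0):
    d_flow = np.diff(flow) * fs          # flow derivative (flow units per second)
    f_ac = flow.max() - flow.min()       # AC amplitude of the glottal flow pulse
    d_peak = -d_flow.min()               # magnitude of the negative peak of the derivative
    T = 1.0 / f0                         # fundamental period in seconds
    return (f_ac / d_peak) / T           # dimensionless NAQ
```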


Clinical Neurophysiology | 1999

A method for generating natural-sounding speech stimuli for cognitive brain research

Paavo Alku; Hannu Tiitinen; Risto Näätänen

OBJECTIVE: In response to the rapidly increasing interest in using the human voice in cognitive brain research, a new method, semisynthetic speech generation (SSG), is presented for the generation of speech stimuli. METHODS: The method synthesizes speech stimuli as a combination of purely artificial processes and processes that originate from the natural human speech production mechanism. SSG first estimates the source of speech, the glottal flow, from a natural utterance using an inverse filtering technique. The glottal flow obtained is then used as the excitation to an artificial digital filter that models the formant structure of speech. RESULTS: SSG is superior to commercial voice synthesizers because it yields speech stimuli of highly natural quality, owing to the contribution of the human-produced glottal excitation. CONCLUSION: The artificial modelling of the vocal tract enables one to adjust the formant frequencies of the stimuli as desired, making SSG suitable for cognitive experiments using speech sounds as stimuli.
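The SSG principle can be sketched roughly as follows: a glottal excitation, ideally obtained by inverse filtering a natural utterance, is passed through an artificial all-pole vocal-tract model whose formant frequencies and bandwidths can be set freely. In the sketch below the formant values, bandwidths, and the noise placeholder standing in for a real glottal flow are assumptions for illustration only, not the authors' settings.

```python
# Sketch of the semisynthetic speech generation (SSG) idea.
import numpy as np
from scipy.signal import lfilter

def vocal_tract_filter(formants_hz, bandwidths_hz, fs):
    """Build all-pole coefficients from formant frequencies and bandwidths."""
    a = np.array([1.0])
    for f, bw in zip(formants_hz, bandwidths_hz):
        r = np.exp(-np.pi * bw / fs)                 # pole radius from bandwidth
        theta = 2 * np.pi * f / fs                   # pole angle from formant frequency
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])
    return a

fs = 16000
glottal_excitation = np.random.randn(fs)  # placeholder; SSG uses an inverse-filtered natural glottal flow
a = vocal_tract_filter([700, 1100, 2500], [60, 90, 120], fs)  # example /a/-like formants
stimulus = lfilter([1.0], a, glottal_excitation)              # "semisynthetic" vowel stimulus
```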


BMC Neuroscience | 2009

Statistical language learning in neonates revealed by event-related brain potentials.

Tuomas Teinonen; Vineta Fellman; Risto Näätänen; Paavo Alku; Minna Huotilainen

Background: Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries. Results: We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords. Conclusion: These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.
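The transitional-probability computation underlying this kind of statistical segmentation is simple to illustrate. In the sketch below the syllable stream and pseudowords are invented for illustration; the point is only that the probability of one syllable following another is high within a pseudoword and lower across word boundaries, which is the cue assumed to mark the boundaries.

```python
# Sketch of transitional probabilities over a continuous syllable stream.
from collections import Counter

def transitional_probabilities(syllables):
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Stream built from three invented pseudowords: tu-pi-ro, go-la-bu, bi-da-ku
stream = "tu pi ro go la bu bi da ku go la bu tu pi ro bi da ku tu pi ro".split()
for (a, b), p in sorted(transitional_probabilities(stream).items(), key=lambda kv: -kv[1]):
    print(f"P({b} | {a}) = {p:.2f}")   # 1.00 within pseudowords, lower across boundaries
```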


IEEE Transactions on Audio, Speech, and Language Processing | 2011

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering

Tuomo Raitio; Antti Suni; Junichi Yamagishi; Hannu Pulakka; Jani Nurminen; Martti Vainio; Paavo Alku

This paper describes a hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural-sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the HMM framework and generated in the synthesis stage according to the text input. The glottal excitation is synthesized by interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified according to the spectrum of the desired voice source characteristics. Speech is synthesized by filtering the reconstructed source signal with the vocal tract filter. Experiments show that the proposed system is capable of generating natural-sounding speech whose quality is clearly better than that of two HMM-based speech synthesis systems based on widely used vocoder techniques.
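The synthesis stage described above can be sketched roughly as follows: a single natural glottal-flow pulse is interpolated to the target pitch of each frame, concatenated into an excitation signal, and filtered with that frame's vocal-tract (all-pole) filter. This is an illustrative simplification under assumed inputs, not the authors' implementation; it omits the spectral shaping of the excitation and the HMM parameter generation.

```python
# Sketch of pulse-based glottal excitation synthesis followed by vocal-tract filtering.
import numpy as np
from scipy.signal import lfilter, resample

def synthesize(pulse, f0_per_frame, vt_coeffs_per_frame, fs, frame_len):
    out = []
    for f0, a in zip(f0_per_frame, vt_coeffs_per_frame):
        period = int(round(fs / f0))
        scaled = resample(pulse, period)                  # interpolate pulse to target F0
        n_pulses = int(np.ceil(frame_len / period))
        excitation = np.tile(scaled, n_pulses)[:frame_len]
        out.append(lfilter([1.0], a, excitation))         # all-pole vocal-tract filtering
    return np.concatenate(out)
```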

Collaboration


Dive into Paavo Alku's collaborations.

Top Co-Authors


Erkki Vilkman

Helsinki University Central Hospital


Antti Suni

University of Helsinki
