Publication


Featured research published by Hannu Pulakka.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering

Tuomo Raitio; Antti Suni; Junichi Yamagishi; Hannu Pulakka; Jani Nurminen; Martti Vainio; Paavo Alku

This paper describes a hidden Markov model (HMM)-based speech synthesizer that utilizes glottal inverse filtering for generating natural-sounding synthetic speech. In the proposed method, speech is first decomposed into the glottal source signal and the model of the vocal tract filter through glottal inverse filtering, and thus parametrized into excitation and spectral features. The source and filter features are modeled individually in the framework of HMM and generated in the synthesis stage according to the text input. The glottal excitation is synthesized through interpolating and concatenating natural glottal flow pulses, and the excitation signal is further modified according to the spectrum of the desired voice source characteristics. Speech is synthesized by filtering the reconstructed source signal with the vocal tract filter. Experiments show that the proposed system is capable of generating natural-sounding speech, and the quality is clearly better compared to two HMM-based speech synthesis systems based on widely used vocoder techniques.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis

Tuomo Raitio; Antti Suni; Hannu Pulakka; Martti Vainio; Paavo Alku

This paper describes a source modeling method for hidden Markov model (HMM) based speech synthesis for improved naturalness. A speech corpus is first decomposed into the glottal source signal and the model of the vocal tract filter using glottal inverse filtering, and parametrized into excitation and spectral features. Additionally, a library of glottal source pulses is extracted from the estimated voice source signal. In the synthesis stage, the excitation signal is generated by selecting appropriate pulses from the library according to the target cost of the excitation features and a concatenation cost between adjacent glottal source pulses. Finally, speech is synthesized by filtering the excitation signal with the vocal tract filter. Experiments show that the naturalness of the synthetic speech is better or equal, and speaker similarity is better, compared to a system using only a single glottal source pulse.
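Selecting pulses by a target cost plus a concatenation cost can be illustrated with a dynamic-programming search. The sketch below is not taken from the paper: the Viterbi-style search, the squared-error costs, the `features` and `waveform` dictionary keys, and the `w_concat` weight are all assumptions made for illustration.

```python
def select_pulses(targets, library, w_concat=1.0):
    """Pick one library pulse index per frame, minimizing the summed
    target cost plus weighted concatenation cost (hypothetical costs)."""

    def target_cost(target, pulse):
        # Squared distance between desired and stored excitation features.
        return sum((a - b) ** 2 for a, b in zip(target, pulse["features"]))

    def concat_cost(left, right):
        # Squared waveform difference between adjacent pulses.
        return sum((a - b) ** 2 for a, b in zip(left["waveform"], right["waveform"]))

    n = len(library)
    # best[j]: lowest total cost of any path ending in pulse j at the current frame.
    best = [target_cost(targets[0], p) for p in library]
    back = []
    for target in targets[1:]:
        prev, best, ptr = best, [], []
        for q in library:
            i_min = min(
                range(n),
                key=lambda i: prev[i] + w_concat * concat_cost(library[i], q),
            )
            best.append(
                prev[i_min]
                + w_concat * concat_cost(library[i_min], q)
                + target_cost(target, q)
            )
            ptr.append(i_min)
        back.append(ptr)

    # Trace the cheapest path backwards from the last frame.
    j = min(range(n), key=lambda k: best[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]


library = [
    {"features": [0.0], "waveform": [0.0, 0.0]},
    {"features": [1.0], "waveform": [1.0, 1.0]},
]
targets = [[0.0], [1.0], [0.0]]
path_free = select_pulses(targets, library, w_concat=0.0)
path_smooth = select_pulses(targets, library, w_concat=10.0)
```

With `w_concat=0.0` the search simply follows the targets; a large `w_concat` keeps the same pulse across frames because switching becomes expensive.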


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Bandwidth Extension of Telephone Speech Using a Neural Network and a Filter Bank Implementation for Highband Mel Spectrum

Hannu Pulakka; Paavo Alku

The limited audio bandwidth used in narrowband telephone systems degrades both the quality and the intelligibility of speech. This paper presents a new method for the bandwidth extension of telephone speech. Frequency components are added to the frequency band 4-8 kHz using only the information in the narrowband speech. A neural network is used to estimate the mel spectrum in the extension band in short time frames based on features calculated from the narrowband speech. A wideband excitation signal is generated by spectral folding from the narrowband linear prediction residual and a filter bank is utilized to divide the excitation into four sub-bands that cover the extension band. These sub-bands are weighted such that the estimated mel spectrum is realized. Bandwidth-extended speech is obtained by summing the weighted sub-bands and the original narrowband signal. Listening tests show that this new method improves speech quality compared with narrowband telephone speech and with a previously published bandwidth extension method.
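Spectral folding can be illustrated by zero insertion: upsampling by two without low-pass filtering mirrors the 0-4 kHz spectrum into 4-8 kHz. The sketch below is only an illustration under assumptions; it folds a plain test tone rather than a linear prediction residual, and the function names `spectral_fold` and `dft_magnitude` are hypothetical.

```python
import cmath
import math


def spectral_fold(x):
    """Upsample by 2 with zero insertion, which mirrors the spectrum
    of x around the original Nyquist frequency."""
    y = []
    for s in x:
        y.extend((s, 0.0))
    return y


def dft_magnitude(x, freq_hz, fs_hz):
    """Magnitude of the discrete Fourier transform of x at one frequency."""
    w = -2j * math.pi * freq_hz / fs_hz
    return abs(sum(s * cmath.exp(w * n) for n, s in enumerate(x)))


# A 1 kHz tone sampled at 8 kHz (narrowband), 64 samples = 8 full periods.
fs_nb = 8000.0
tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs_nb) for n in range(64)]

folded = spectral_fold(tone)  # now interpreted at 16 kHz
mag_1k = dft_magnitude(folded, 1000.0, 16000.0)
mag_7k = dft_magnitude(folded, 7000.0, 16000.0)  # mirror image of 1 kHz
mag_3k = dft_magnitude(folded, 3000.0, 16000.0)  # no component expected here
```

The folded signal carries the same magnitude at 7 kHz as at 1 kHz, which is the raw highband material that a later stage would shape.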


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Evaluation of an Artificial Speech Bandwidth Extension Method in Three Languages

Hannu Pulakka; Laura Laaksonen; Martti Vainio; Jouni Pohjalainen; Paavo Alku

Quality and intelligibility of narrowband telephone speech can be improved by artificial bandwidth extension (ABE), which extends the speech bandwidth using only information available in the narrowband speech signal. This paper reports a three-language evaluation of an ABE method that has recently been launched in several of Nokia's mobile telephone models. The method extends the speech bandwidth to frequencies above the telephone band by first utilizing spectral folding and then modifying the magnitude spectrum of the extension band with spline curves. The performance of the method was evaluated by formal listening tests in American English, Russian, and Mandarin Chinese. The results of the listening tests indicate that ABE processing improved the subjective quality of coded narrowband speech in all these languages. Differences between bandwidth-extended American English test sentences and their original wideband counterparts were also evaluated using both an objective distance measure that simulates the characteristics of human hearing and a conventional spectral distortion measure. The average objective error was calculated for different categories of speech sounds. The error was found to be smallest in nasals and semivowels and largest in fricative sounds.


Logopedics Phoniatrics Vocology | 2007

High-speed registration of phonation-related glottal area variation during artificial lengthening of the vocal tract

Anne-Maria Laukkanen; Hannu Pulakka; Paavo Alku; E. Vilkman; Stellan Hertegård; Per-Åke Lindestad; Hans Larsson; Svante Granqvist

Vocal exercises that increase the vocal tract impedance are widely used in voice training and therapy. The present study applies a versatile methodology to investigate phonation during varying artificial extension of the vocal tract. Two males and one female phonated into a hard-walled plastic tube (diameter 2 cm), whose physical length was randomly pair-wise changed between 30 cm, 60 cm and 100 cm. High-speed image sequences (1900 frames/s) of the vocal folds were obtained via a rigid endoscope. Acoustic and electroglottographic (EGG) signals were recorded. Oral pressure during shuttering of the tube was used to give an estimate of subglottic pressure (Psub). The only trend observed was that with the two longer tubes compared to the shortest one, fundamental frequency was lower, open time of the glottis shorter, and Psub higher. The results may partly reflect increased vocal tract impedance as such and partly the increased vocal effort to compensate for it. In other parameters there were individual differences in tube length-related changes, suggesting complexity of the coupling between supraglottic space and the glottis.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Speech bandwidth extension using Gaussian mixture model-based estimation of the highband mel spectrum

Hannu Pulakka; Ulpu Remes; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

The quality and intelligibility of narrowband telephone speech can be enhanced by artificial bandwidth extension. This study combines Gaussian mixture model (GMM)-based mel spectrum extension with a filter bank implementation for generating the missing spectral content in the highband at 4–8 kHz. The narrowband mel spectrum is calculated from input speech and the GMM is used to estimate the mel spectrum in the highband. An excitation signal for the highband is generated as a combination of upsampled linear prediction residual and modulated noise. The excitation is divided into sub-bands that are weighted and summed to realize the estimated mel spectrum. The bandwidth-extended output is obtained as the sum of the artificial highband signal and narrowband speech. Listening tests indicate that this method is preferred over narrowband speech and over a previously presented artificial bandwidth extension method which is implemented in some mobile phone models.
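Weighting sub-bands so that their energies match an estimated spectrum can be sketched as a per-band gain computation. This is a simplified illustration, not the paper's implementation: it assumes one non-overlapping sub-band per estimated band, energies in the linear domain, and hypothetical helper names (`subband_gains`, `weighted_sum`, `energy`).

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(s * s for s in x)


def subband_gains(subbands, target_energies, floor=1e-12):
    """Gain per sub-band so that each scaled band has its target energy.
    The floor avoids division by zero for silent bands."""
    return [
        math.sqrt(t / max(energy(band), floor))
        for band, t in zip(subbands, target_energies)
    ]


def weighted_sum(subbands, gains):
    """Scale each sub-band by its gain and sum them sample by sample."""
    n = len(subbands[0])
    return [sum(g * band[i] for g, band in zip(gains, subbands)) for i in range(n)]


# Two toy sub-band signals with energies 2 and 8.
subbands = [[1.0, -1.0], [2.0, 2.0]]
targets = [8.0, 2.0]  # stand-in for the estimated highband energies

gains = subband_gains(subbands, targets)
highband = weighted_sum(subbands, gains)
scaled_energies = [energy([g * s for s in band]) for g, band in zip(gains, subbands)]
```

In a complete system the summed highband would then be added to the narrowband signal, as the abstract describes.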


IEEE Transactions on Consumer Electronics | 2009

Development, evaluation and implementation of an artificial bandwidth extension method of telephone speech in mobile terminal

Laura Laaksonen; Hannu Pulakka; Ville Myllylä; Paavo Alku

Artificial bandwidth extension methods aim to improve the quality and intelligibility of narrowband telephone speech by adding new, artificially generated spectral content to the highband of the received voice signal. The development cycle of an artificial bandwidth extension method from the initial idea to the implementation in a mobile terminal is discussed in this paper. Developing the algorithm in the Matlab environment was the first step in the process. The method was then evaluated in formal listening tests and simulations to verify its performance in different scenarios. Finally, the utilization of this technology in a product included its DSP implementation combined with the acoustical design of the user terminal.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Bandwidth Extension of Telephone Speech to Low Frequencies Using Sinusoidal Synthesis and a Gaussian Mixture Model

Hannu Pulakka; Ulpu Remes; Santeri Yrttiaho; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

The quality of narrowband telephone speech is degraded by the limited audio bandwidth. This paper describes a method that extends the bandwidth of telephone speech to the frequency range 0-300 Hz. The method generates the lowest harmonics of voiced speech using sinusoidal synthesis. The energy in the extension band is estimated from spectral features using a Gaussian mixture model. The amplitudes and phases of the synthesized sinusoidal components are adjusted based on the amplitudes and phases of the narrowband input speech, which provides adaptivity to varying input bandwidth characteristics. The proposed method was evaluated with listening tests in combination with another bandwidth extension method for the frequency range 4-8 kHz. While the low-frequency bandwidth extension was not found to improve perceived quality, the method reduced dissimilarity with wideband speech.
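Generating the lowest harmonics by sinusoidal synthesis can be sketched as a sum of cosines at multiples of the fundamental frequency below the cutoff. The code is an illustration under assumptions: the amplitudes and phases are passed in directly rather than estimated from the narrowband signal or a Gaussian mixture model, the 16 kHz sample rate is a guess, and the function names are hypothetical.

```python
import math


def low_harmonics(f0_hz, cutoff_hz=300.0):
    """Harmonic frequencies k * f0 that fall strictly below the cutoff."""
    if f0_hz <= 0.0:
        raise ValueError("f0_hz must be positive")
    freqs = []
    k = 1
    while k * f0_hz < cutoff_hz:
        freqs.append(k * f0_hz)
        k += 1
    return freqs


def synthesize(f0_hz, amplitudes, phases, n_samples, fs_hz=16000.0, cutoff_hz=300.0):
    """Sum of cosines at the low harmonics with the given amplitudes and phases."""
    freqs = low_harmonics(f0_hz, cutoff_hz)
    return [
        sum(
            a * math.cos(2.0 * math.pi * f * n / fs_hz + p)
            for f, a, p in zip(freqs, amplitudes, phases)
        )
        for n in range(n_samples)
    ]


harmonics = low_harmonics(120.0)
frame = synthesize(100.0, amplitudes=[1.0, 0.5], phases=[0.0, 0.0], n_samples=160)
```

For a 120 Hz voice only the first two harmonics fall below 300 Hz, so the extension band contains very few components per frame.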


Journal of the Acoustical Society of America | 2012

Signal-to-noise ratio adaptive post-filtering method for intelligibility enhancement of telephone speech

Emma Jokinen; Santeri Yrttiaho; Hannu Pulakka; Martti Vainio; Paavo Alku

Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
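The adaptive behavior described above, a flat response in quiet conditions and stronger high-frequency emphasis as noise increases, can be sketched with a first-order high-pass filter whose coefficient depends on the SNR. This is not the paper's filter: the first-order structure, the linear SNR mapping, the 0 dB and 30 dB thresholds, and the `max_strength` value are all assumptions chosen for illustration.

```python
import math


def filter_strength(snr_db, snr_clean_db=30.0, snr_noisy_db=0.0, max_strength=0.9):
    """Map SNR to a filter coefficient: 0 at or above snr_clean_db (flat
    response), max_strength at or below snr_noisy_db, linear in between."""
    x = (snr_clean_db - snr_db) / (snr_clean_db - snr_noisy_db)
    return max_strength * min(1.0, max(0.0, x))


def post_filter(signal, snr_db):
    """Apply y[n] = x[n] - a * x[n-1], then rescale so the output energy
    equals the input energy (energy is reallocated, not added)."""
    a = filter_strength(snr_db)
    out, prev = [], 0.0
    for s in signal:
        out.append(s - a * prev)
        prev = s
    e_in = sum(s * s for s in signal)
    e_out = sum(s * s for s in out)
    if e_out > 0.0:
        gain = math.sqrt(e_in / e_out)
        out = [gain * s for s in out]
    return out


speech = [math.sin(2.0 * math.pi * 500.0 * n / 8000.0) for n in range(80)]
quiet = post_filter(speech, snr_db=40.0)  # coefficient 0: signal unchanged
noisy = post_filter(speech, snr_db=0.0)   # full high-frequency emphasis
```

Normalizing the output energy mirrors the idea of moving energy toward higher frequencies rather than raising the overall level.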


International Conference on Acoustics, Speech, and Signal Processing | 2016

A subjective listening test of six different artificial bandwidth extension approaches in English, Chinese, German, and Korean

Johannes Abel; Magdalena Kaniewska; Cyril Guillaume; Wouter Tirry; Hannu Pulakka; Ville Myllylä; Jari Sjoberg; Paavo Alku; Itai Katsir; David Malah; Israel Cohen; M. A. Tugtekin Turan; Engin Erzin; Thomas Schlien; Peter Vary; Amr H. Nour-Eldin; Peter Kabal; Tim Fingscheidt

In studies on artificial bandwidth extension (ABE), there is a lack of international coordination in subjective tests between multiple methods and languages. Here we present the design of absolute category rating listening tests evaluating 12 ABE variants of six approaches in multiple languages, namely in American English, Chinese, German, and Korean. Since the number of ABE variants caused a higher-than-recommended length of the listening test, ABE variants were distributed into two separate listening tests per language. The paper focuses on the listening test design, which aimed at merging the subjective scores of both tests and thus allows for a joint analysis of all ABE variants under test at once. A language-dependent analysis, evaluating ABE variants in the context of the underlying coded narrowband speech condition, showed statistically significant improvement in English, German, and Korean for some ABE solutions.

Collaboration


Top Co-Authors

Ulpu Remes

Helsinki University of Technology

Antti Suni

University of Helsinki
