
Publication


Featured research published by Kalle J. Palomäki.


Speech Communication | 2004

A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation

Kalle J. Palomäki; Guy J. Brown; DeLiang Wang

In this study we describe a binaural auditory model for recognition of speech in the presence of spatially separated noise intrusions, under small-room reverberation conditions. The principle underlying the model is to identify time–frequency regions which constitute reliable evidence of the speech signal. This is achieved both by determining the spatial location of the speech source, and by grouping the reliable regions according to common azimuth. Reliable time–frequency regions are passed to a missing data speech recogniser, which performs decoding based on this partial description of the speech signal. In order to obtain robust estimates of spatial location in reverberant conditions, we incorporate some aspects of precedence effect processing into the auditory model. We show that the binaural auditory model improves speech recognition performance in small room reverberation conditions in the presence of spatially separated noise, particularly for conditions in which the spatial separation is 20° or larger. We also demonstrate that the binaural system outperforms a single channel approach, notably in cases where the target speech and noise intrusion have substantial spectral overlap. © 2004 Elsevier B.V. All rights reserved.
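The missing-data approach running through this and the papers below rests on a time–frequency reliability mask. As a generic illustration only, a mask can be formed by thresholding local SNR; the paper's binaural model instead derives reliability from azimuth estimates and precedence-effect processing, and the premixed energies and threshold here are assumptions:

```python
import numpy as np

def reliability_mask(speech_energy, noise_energy, snr_threshold_db=0.0):
    """Mark time-frequency cells where speech dominates noise as 'reliable'.

    Generic sketch of mask-based missing-data selection; assumes separate
    speech and noise energy estimates are available, which the paper's
    binaural front-end obtains from spatial cues rather than oracle data.
    """
    snr_db = 10.0 * np.log10(speech_energy / np.maximum(noise_energy, 1e-12))
    return snr_db > snr_threshold_db
```

Reliable cells (mask is True) would be passed to the missing-data recogniser; unreliable cells are marginalised or bounded during decoding.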


Speech Communication | 2004

Techniques for handling convolutional distortion with `missing data' automatic speech recognition

Kalle J. Palomäki; Guy J. Brown; Jon Barker

In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of human speech perception, and involves the modification of a hidden Markov model (HMM) classifier to deal with missing or unreliable features. Although the missing data paradigm was proposed as a means of handling additive noise in ASR, we demonstrate that it can also be effective in dealing with convolutional distortion. Firstly, we propose a normalisation technique for handling spectral distortions and changes of input level (possibly in the presence of additive noise). The technique computes a normalising factor only from the most intense regions of the speech spectrum, which are likely to remain intact across various noise conditions. We show that the proposed normalisation method improves performance compared to a conventional missing data approach with spectrally distorted and noise contaminated speech, and in conditions where the gain of the input signal varies. Secondly, we propose a method for handling reverberated speech which attempts to identify time-frequency regions that are not badly contaminated by reverberation and have strong speech energy. This is achieved by using modulation filtering to identify ‘reliable’ regions of the speech spectrum. We demonstrate that our approach improves recognition performance in cases where the reverberation time T60 exceeds 0.7 s, compared to a baseline system which uses acoustic features derived from perceptual linear prediction and the modulation-filtered spectrogram. © 2004 Elsevier B.V. All rights reserved.
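The first technique above, normalising only from the most intense spectral regions, can be sketched as follows; the selection rule (top fraction of frames per channel) and parameter names are assumptions for illustration, not the authors' exact specification:

```python
import numpy as np

def intense_region_norm(spectrogram, fraction=0.1):
    """Normalise each frequency channel by the mean of only its most
    intense time frames, which are the least likely to be masked by
    additive noise and so give a gain estimate robust to distortion.

    spectrogram: array of shape (channels, frames), linear energies.
    """
    n_frames = spectrogram.shape[1]
    k = max(1, int(fraction * n_frames))
    # For each channel, average the k highest-energy frames.
    top = np.sort(spectrogram, axis=1)[:, -k:]
    norm = top.mean(axis=1, keepdims=True)
    return spectrogram / np.maximum(norm, 1e-12)
```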


Neuroreport | 2000

Sound localization in the human brain: neuromagnetic observations

Kalle J. Palomäki; Paavo Alku; Ville Mäkinen; Patrick J. C. May; Hannu Tiitinen

Sound location processing in the human auditory cortex was studied with magnetoencephalography (MEG) by producing spatial stimuli using a modern stimulus generation methodology utilizing head‐related transfer functions (HRTFs). The stimulus set comprised wideband noise bursts filtered through HRTFs in order to produce natural spatial sounds. Neuromagnetic responses for stimuli representing eight equally spaced sound source directions in the azimuthal plane were measured from 10 subjects. The most prominent response, the cortically generated N1m, was investigated above the left and right hemisphere. We found, firstly, that the HRTF‐based stimuli presented from different directions elicited contralaterally prominent N1m responses. Secondly, we found that cortical activity reflecting the processing of spatial sound stimuli was more pronounced in the right than in the left hemisphere.


international conference on acoustics, speech, and signal processing | 2002

Missing data speech recognition in reverberant conditions

Kalle J. Palomäki; Guy J. Brown; Jon Barker

In this study we describe an auditory processing front-end for missing data speech recognition, which is robust in the presence of reverberation. The model attempts to identify time-frequency regions that are not badly contaminated by reverberation and have strong speech energy. This is achieved by applying reverberation masking. Subsequently, reliable time-frequency regions are passed to a ‘missing data’ speech recogniser for classification. We demonstrate that the model improves recognition performance in three different virtual rooms where reverberation time T60 varies from 0.7 sec to 2.7 sec. We also discuss the advantages of our approach over RASTA and modulation filtered spectrograms.
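As a rough illustration of the reverberation masking above, each channel's energy envelope can be compared against a smoothed version of itself, so that fast-rising speech onsets pass while slowly decaying reverberant tails are rejected. This is a simplified stand-in for the paper's approach, and the window length and threshold are assumed values:

```python
import numpy as np

def reverberation_mask(envelope, win=5, threshold=0.4):
    """Mark frames of a single channel's energy envelope as reliable when
    they stand well above a local moving average: speech onsets exceed it,
    while the smearing of a reverberant tail does not.

    Simplified sketch; the paper uses modulation filtering rather than
    this moving-average comparison.
    """
    kernel = np.ones(win) / win
    smoothed = np.convolve(envelope, kernel, mode="same")
    return envelope > (1.0 + threshold) * smoothed
```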


Neuroscience Letters | 2001

The Periodic Structure of Vowel Sounds is Reflected in Human Electromagnetic Brain Responses

Paavo Alku; Päivi Sivonen; Kalle J. Palomäki; Hannu Tiitinen

Periodicity, which is caused by the vibration of the vocal folds, is an inherent feature of vowel sounds. Whether this periodic structure is reflected in cerebral processing of vowels was addressed via the use of non-invasive brain research methods combined with advanced stimulus production methodology. We removed the contribution of the source of the periodic structure, the glottal excitation produced by the vocal folds, from vowel stimuli and found that electromagnetic responses generated in the auditory cortex reflect this removal. The N1(m) amplitude decreased even though the rest of the acoustical features of the stimuli were identical. Thus, we conclude that speech production mechanisms have significant effects on human brain dynamics as reflected by magnetoencephalography and electroencephalograph.


international conference on acoustics, speech, and signal processing | 2011

Speech bandwidth extension using Gaussian mixture model-based estimation of the highband mel spectrum

Hannu Pulakka; Ulpu Remes; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

The quality and intelligibility of narrowband telephone speech can be enhanced by artificial bandwidth extension. This study combines Gaussian mixture model-based (GMM) mel spectrum extension with a filter bank implementation for generating the missing spectral content in the highband at 4–8 kHz. The narrowband mel spectrum is calculated from input speech and the GMM is used to estimate the mel spectrum in the highband. An excitation signal for the highband is generated as a combination of upsampled linear prediction residual and modulated noise. The excitation is divided into sub-bands that are weighted and summed to realize the estimated mel spectrum. The bandwidth-extended output is obtained as the sum of the artificial highband signal and narrowband speech. Listening tests indicate that this method is preferred over narrowband speech and over a previously presented artificial bandwidth extension method which is implemented in some mobile phone models.
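The sub-band weighting step described above can be sketched as follows, taking the sub-band decomposition of the excitation and the GMM-estimated target energies as given inputs; this is an illustrative sketch under those assumptions, not the authors' implementation:

```python
import numpy as np

def shape_excitation(subband_signals, target_energies):
    """Scale each highband excitation sub-band so its energy matches the
    GMM-estimated mel-spectral energy, then sum the sub-bands to form the
    artificial highband signal.

    subband_signals: list of 1-D arrays, one per sub-band.
    target_energies: estimated energy for each sub-band.
    """
    shaped = []
    for sig, target in zip(subband_signals, target_energies):
        current = np.sum(sig ** 2)
        gain = np.sqrt(target / max(current, 1e-12))
        shaped.append(gain * sig)
    return np.sum(shaped, axis=0)
```

The result would then be added to the upsampled narrowband speech to produce the bandwidth-extended output.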


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Estimating Uncertainty to Improve Exemplar-Based Feature Enhancement for Noise Robust Speech Recognition

Heikki Kallasjoki; Jort F. Gemmeke; Kalle J. Palomäki

We present a method of improving automatic speech recognition performance under noisy conditions by using a source separation approach to extract the underlying clean speech signal. The feature enhancement processing is complemented with heuristic estimates of the uncertainty of the source separation, which are used to further assist the recognition. The uncertainty heuristics are converted to estimates of variance for the extracted clean speech using a Gaussian mixture model based mapping, and applied in the decoding stage under the observation uncertainty framework. We propose six heuristics, and evaluate them using both artificial and real-world noisy data, and with acoustic models trained on clean speech, a multi-condition noisy data set, and the multi-condition set processed with the source separation front-end. Taking the uncertainty of the enhanced features into account is shown to improve recognition performance when the acoustic models are trained on unenhanced data, while training on enhanced noisy data yields the lowest error rates.
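In the observation uncertainty framework used above, the enhancement-error variance is added to the acoustic model variance during scoring, which softens the contribution of unreliable features. A minimal single-Gaussian, diagonal-covariance sketch of that scoring rule:

```python
import numpy as np

def uncertain_log_likelihood(x, mean, var, uncertainty_var):
    """Gaussian log-likelihood with the feature-uncertainty variance added
    to the model variance, as in observation uncertainty decoding.

    x, mean, var, uncertainty_var: 1-D arrays over feature dimensions.
    """
    total_var = var + uncertainty_var
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * total_var) + (x - mean) ** 2 / total_var)
```

With zero uncertainty this reduces to the standard Gaussian score; larger uncertainty flattens the likelihood, reducing the penalty for enhancement errors.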


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Bandwidth Extension of Telephone Speech to Low Frequencies Using Sinusoidal Synthesis and a Gaussian Mixture Model

Hannu Pulakka; Ulpu Remes; Santeri Yrttiaho; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

The quality of narrowband telephone speech is degraded by the limited audio bandwidth. This paper describes a method that extends the bandwidth of telephone speech to the frequency range 0-300 Hz. The method generates the lowest harmonics of voiced speech using sinusoidal synthesis. The energy in the extension band is estimated from spectral features using a Gaussian mixture model. The amplitudes and phases of the synthesized sinusoidal components are adjusted based on the amplitudes and phases of the narrowband input speech, which provides adaptivity to varying input bandwidth characteristics. The proposed method was evaluated with listening tests in combination with another bandwidth extension method for the frequency range 4-8 kHz. While the low-frequency bandwidth extension was not found to improve perceived quality, the method reduced dissimilarity with wideband speech.
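The sinusoidal synthesis of the lowest harmonics can be sketched as below; in the paper the amplitudes and phases are derived from the narrowband input and a GMM energy estimate, whereas here they are passed in directly, so this is an illustrative sketch rather than the authors' implementation:

```python
import numpy as np

def synthesize_low_harmonics(f0, harmonic_amps, phases, fs=8000, n_samples=160):
    """Generate the lowest harmonics of voiced speech as a sum of
    sinusoids, filling only the 0-300 Hz extension band.

    f0: fundamental frequency in Hz; harmonic_amps and phases give the
    amplitude and phase of each harmonic in ascending order.
    """
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k, (amp, ph) in enumerate(zip(harmonic_amps, phases), start=1):
        freq = k * f0
        if freq < 300.0:  # stay inside the low-frequency extension band
            out += amp * np.cos(2.0 * np.pi * freq * t + ph)
    return out
```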


international conference on acoustics, speech, and signal processing | 2006

Recognition of Reverberant Speech using Full Cepstral Features and Spectral Missing Data

Kalle J. Palomäki; Guy J. Brown; Jon Barker

We describe a novel approach to feature combination within the missing data (MD) framework for automatic speech recognition, and show its application to reverberated speech. Likelihoods from a spectral MD classifier are combined with those from a full cepstral feature vector-based recogniser. Even though the performance of the cepstral recogniser is substantially below that of the MD recogniser, the combined recogniser performs better in all conditions. We also describe improvements to the generation of time-frequency masks for the MD recogniser. Our system is compared with a previous approach based on a hybrid MLP-HMM recogniser with MSG and PLP feature vectors. The proposed system has a substantial performance advantage in the most reverberated conditions.
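One simple way to combine per-state scores from two feature streams, as described above, is a weighted sum of log-likelihoods; the linear rule and the weight value here are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def combine_stream_loglik(loglik_md, loglik_cep, weight=0.7):
    """Weighted linear combination of per-state log-likelihoods from a
    spectral missing-data stream and a cepstral stream.

    weight: contribution of the missing-data stream (assumed value).
    """
    return (weight * np.asarray(loglik_md)
            + (1.0 - weight) * np.asarray(loglik_cep))
```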


IEEE Signal Processing Letters | 2011

Missing-Feature Reconstruction With a Bounded Nonlinear State-Space Model

Ulpu Remes; Kalle J. Palomäki; Tapani Raiko; Antti Honkela; Mikko Kurimo

Missing-feature reconstruction can improve speech recognition performance in unknown noisy environments. In this work, we examine using a nonlinear state-space model (NSSM) for missing-feature reconstruction and propose estimation with observed bounds to improve the NSSM performance. Evaluated in a large-vocabulary continuous speech recognition task with babble and impulsive noise, using observed bounds in NSSM state estimation significantly improved recognition performance.
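The bounding idea exploits the fact that, in the log-spectral domain, a noisy observation is an upper bound on the underlying clean speech energy. A minimal sketch of applying such bounds, taking the model-based estimate as given (the paper obtains it from the NSSM):

```python
import numpy as np

def bounded_reconstruction(estimate, observed, reliable):
    """Reconstruct unreliable log-spectral features from a model estimate,
    clipped from above by the noisy observation; reliable features keep
    their observed values.

    estimate, observed: 1-D feature arrays; reliable: boolean mask.
    """
    return np.where(reliable, observed, np.minimum(estimate, observed))
```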

Collaboration


Dive into Kalle J. Palomäki's collaboration.

Top Co-Authors

Guy J. Brown

University of Sheffield


Jort F. Gemmeke

Katholieke Universiteit Leuven


Annamaria Mesaros

Tampere University of Technology


Ville Mäkinen

Helsinki University Central Hospital
