Publication


Featured research published by Emma Jokinen.


Computer Speech & Language | 2014

An adaptive post-filtering method producing an artificial Lombard-like effect for intelligibility enhancement of narrowband telephone speech

Emma Jokinen; Marko Takanen; Martti Vainio; Paavo Alku

Post-filtering can be used in mobile communications to improve the quality and intelligibility of speech. Energy reallocation with a high-pass type filter has been shown to work effectively in improving the intelligibility of speech in difficult noise conditions. This paper introduces a post-filtering algorithm that adapts to the background noise level as well as to the fundamental frequency of the speaker and models the spectral effects observed in natural Lombard speech. The introduced method and another post-filtering technique were compared to unprocessed telephone speech in subjective listening tests in terms of intelligibility and quality. The results indicate that the proposed method outperforms the reference method in difficult noise conditions.


Journal of the Acoustical Society of America | 2012

Signal-to-noise ratio adaptive post-filtering method for intelligibility enhancement of telephone speech

Emma Jokinen; Santeri Yrttiaho; Hannu Pulakka; Martti Vainio; Paavo Alku

Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
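
The adaptation described in this abstract (flat response in favorable conditions, stronger high-frequency emphasis as noise grows) can be illustrated with a minimal sketch. This is not the published method: a first-order high-pass-type filter stands in for the formant-based energy reallocation, and the function names, the SNR thresholds (0 dB and 20 dB), and the maximum strength (0.9) are all hypothetical choices made for illustration.

```python
import math


def tilt_strength(snr_db, snr_low=0.0, snr_high=20.0, max_strength=0.9):
    """Map an SNR estimate (dB) to a filter strength in [0, max_strength].

    All thresholds are hypothetical: at or above snr_high the filter is
    flat (strength 0); at or below snr_low it is at full strength.
    """
    if snr_db >= snr_high:
        return 0.0
    if snr_db <= snr_low:
        return max_strength
    return max_strength * (snr_high - snr_db) / (snr_high - snr_low)


def post_filter(frame, snr_db):
    """Apply y[n] = x[n] - a * x[n-1], then restore the frame energy.

    The first-order filter is an assumed stand-in for the paper's
    post-filter. Rescaling to the input energy means energy is moved
    toward higher frequencies rather than added.
    """
    a = tilt_strength(snr_db)
    out = []
    prev = 0.0
    for sample in frame:
        out.append(sample - a * prev)
        prev = sample
    energy_in = sum(s * s for s in frame)
    energy_out = sum(s * s for s in out)
    if energy_out == 0.0:
        return out
    gain = math.sqrt(energy_in / energy_out)
    return [gain * s for s in out]
```

With a high SNR estimate the frame passes through unchanged, while at low SNR the output keeps the same energy but with a flatter spectral tilt.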


International Workshop on Acoustic Signal Enhancement | 2014

Spectral tilt modelling with extrapolated GMMs for intelligibility enhancement of narrowband telephone speech

Emma Jokinen; Ulpu Remes; Marko Takanen; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku

Post-processing methods are used in mobile communications to improve the intelligibility of speech in adverse background noise conditions. In this study, post-processing based on the modification of the spectral tilt with Gaussian mixture models according to the Lombard effect is investigated. A spectral envelope estimation method is studied and optimized for this purpose. Furthermore, the extrapolation of the statistical mapping in a post-processing context is investigated. The proposed post-processing methods are compared to unprocessed speech and a reference method in subjective intelligibility and quality tests in different near-end noise conditions. The results indicate that one of the extrapolated methods achieved the same intelligibility as fixed high-pass filtering without degrading the quality of speech.


Journal of the Acoustical Society of America | 2017

Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network

Emma Jokinen; Paavo Alku

Estimation of the spectral tilt of the glottal source has several applications in speech analysis and modification. However, direct estimation of the tilt from telephone speech is challenging due to vocal tract resonances and distortion caused by speech compression. In this study, a deep neural network is used for the tilt estimation from telephone speech by training the network with tilt estimates computed by glottal inverse filtering. An objective evaluation shows that the proposed technique gives more accurate estimates for the spectral tilt than previously used techniques that estimate the tilt directly from telephone speech without glottal inverse filtering.


Conference of the International Speech Communication Association | 2016

The use of read versus conversational Lombard speech in spectral tilt modeling for intelligibility enhancement in near-end noise conditions

Emma Jokinen; Ulpu Remes; Paavo Alku

Intelligibility of speech in adverse near-end noise conditions can be enhanced with post-processing. Recently, a post-processing method based on statistical mapping of the spectral tilt of normal speech to that of Lombard speech was proposed. However, previous intelligibility improvement studies utilizing Lombard speech have mainly gathered data from read sentences, which might result in a less pronounced Lombard effect. Having a mild Lombard effect in the training data weakens the statistical normal-to-Lombard mapping of the spectral tilt, which in turn deteriorates the performance of intelligibility enhancement. Therefore, a database containing both conversational and read Lombard speech was recorded in several background noise conditions in this study. Statistical models for normal-to-Lombard mapping of the spectral tilt were then trained using the obtained conversational and read speech data and evaluated using an objective intelligibility metric. The results suggest that the conversational data contains a more pronounced Lombard effect and could be used to obtain better statistical models for intelligibility enhancement.


Speech Communication | 2016

Phase modification for increasing the intelligibility of telephone speech in near-end noise conditions - evaluation of two methods

Emma Jokinen; Hannu Pulakka; Paavo Alku

In this study, two intelligibility-increasing post-processing methods based on the modification of the phase spectrum of speech are proposed for near-end noise conditions. One of the algorithms aims to reduce the dynamic range of the signal and take advantage of the energy gain resulting from amplitude normalization to increase the loudness, while the other algorithm is designed to sharpen the high-amplitude peaks in the time-domain signal generated by the periodic glottal excitation to make the speech sound more clear. Both methods are based on first modifying only the phase spectrum, after which the time-domain signal is computed using the inverse Fourier transform. Finally, the time-domain signal is amplitude normalized by scaling its sample values so that they occupy the original amplitude range of the processed frame. The performance of the proposed methods was evaluated by first comparing them to unprocessed speech using objective quality measures as well as subjective loudness and listening preference tests. Based on the results of these evaluations, the phase-modification methods were further compared to unprocessed speech and dynamic range compression using subjective word-error rate and quality tests. Both narrowband and wideband speech from several talkers were included in both evaluations. Both of the methods were able to increase loudness in some bandwidth conditions as well as outperform unprocessed speech and dynamic range compression in terms of intelligibility in high-noise levels. Both of the methods were rated lower in quality than unprocessed speech in clean conditions. In background noise, however, where intelligibility enhancement algorithms are mostly used, both methods achieved similar results to unprocessed speech in terms of listening preference in some of the bandwidth conditions tested.
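
The shared processing chain in this abstract (modify only the phase spectrum, resynthesize with the inverse Fourier transform, then amplitude-normalize the frame) can be sketched as follows. This is an illustrative sketch rather than either published algorithm: the phase rule `phase_fn` is a hypothetical placeholder, "original amplitude range" is interpreted here as matching the peak absolute sample value, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def modify_phase(frame, phase_fn):
    """Keep the magnitude spectrum, replace the phase, rescale the peak.

    phase_fn(k, old_phase) -> new_phase is a hypothetical hook; the
    phase rules used in the paper are not reproduced here.
    """
    n = len(frame)
    spectrum = dft(frame)
    new_spectrum = list(spectrum)
    for k in range(1, n // 2 + 1):
        if n % 2 == 0 and k == n // 2:
            continue  # leave the Nyquist bin untouched so the output stays real
        magnitude = abs(spectrum[k])
        new_phase = phase_fn(k, cmath.phase(spectrum[k]))
        new_spectrum[k] = cmath.rect(magnitude, new_phase)
        # Mirror bin keeps conjugate symmetry, so the output stays real.
        new_spectrum[n - k] = new_spectrum[k].conjugate()
    out = idft(new_spectrum)
    peak_in = max(abs(s) for s in frame)
    peak_out = max(abs(s) for s in out)
    if peak_out == 0.0:
        return out
    gain = peak_in / peak_out  # assumed reading of "original amplitude range"
    return [gain * s for s in out]
```

Passing an identity rule such as `lambda k, phase: phase` returns the input frame, which is a convenient sanity check. Any rule that lowers the peak of the resynthesized frame yields a gain above one, which corresponds to the energy gain from amplitude normalization mentioned in the abstract.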


NeuroImage | 2016

Previous exposure to intact speech increases intelligibility of its digitally degraded counterpart as a function of stimulus complexity

Maria Hakonen; Patrick J. C. May; Jussi Alho; Paavo Alku; Emma Jokinen; Iiro P. Jääskeläinen; Hannu Tiitinen

Recent studies have shown that acoustically distorted sentences can be perceived as either unintelligible or intelligible depending on whether one has previously been exposed to the undistorted, intelligible versions of the sentences. This allows studying processes specifically related to speech intelligibility since any change between the responses to the distorted stimuli before and after the presentation of their undistorted counterparts cannot be attributed to acoustic variability but, rather, to the successful mapping of sensory information onto memory representations. To estimate how the complexity of the message is reflected in speech comprehension, we applied this rapid change in perception to behavioral and magnetoencephalography (MEG) experiments using vowels, words and sentences. In the experiments, stimuli were initially presented to the subject in a distorted form, after which undistorted versions of the stimuli were presented. Finally, the original distorted stimuli were presented once more. The resulting increase in intelligibility observed for the second presentation of the distorted stimuli depended on the complexity of the stimulus: vowels remained unintelligible (behaviorally measured intelligibility 27%) whereas the intelligibility of the words increased from 19% to 45% and that of the sentences from 31% to 65%. This increase in the intelligibility of the degraded stimuli was reflected as an enhancement of activity in the auditory cortex and surrounding areas at early latencies of 130-160 ms. In the same regions, increasing stimulus complexity attenuated mean currents at latencies of 130-160 ms whereas at latencies of 200-270 ms the mean currents increased. These modulations in cortical activity may reflect feedback from top-down mechanisms enhancing the extraction of information from speech. The behavioral results suggest that memory-driven expectancies can have a significant effect on speech comprehension, especially in acoustically adverse conditions where the bottom-up information is decreased.


Computer Speech & Language | 2019

Vocal effort compensation for MFCC feature extraction in a shouted versus normal speaker recognition task

Emma Jokinen; Rahim Saeidi; Tomi Kinnunen; Paavo Alku

In shouting, speakers use increased vocal effort to convey spoken messages over distance or above environmental noise. For automatic speaker recognition systems trained using normal speech, shouting causes a severe vocal effort mismatch between enrollment and test, hence reducing recognition performance. In this study, two compensation methods are proposed to tackle the mismatch in a shouted versus normal speaker recognition task. These techniques are applied in the feature extraction stage of a speaker recognition system to modify the spectral envelopes of shouts to be closer to those in normal speech. The techniques modify the all-pole power spectrum of the MFCC computation chain with shouted-to-normal compensation filtering that is obtained using a GMM-based statistical mapping. In an evaluation using the state-of-the-art i-vector based recognition system, the proposed techniques provided considerable improvements in identification rates compared to the case when shouted speech spectra were not processed.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Intelligibility Enhancement of Telephone Speech Using Gaussian Process Regression for Normal-to-Lombard Spectral Tilt Conversion

Emma Jokinen; Ulpu Remes; Paavo Alku

Noise in the environment can decrease the quality and intelligibility of a telephone conversation. This study focuses on the intelligibility enhancement of narrowband telephone speech in a near-end noise scenario using a postprocessing method based on normal-to-Lombard spectral tilt conversion. The proposed technique uses nonparallel conversational normal and Lombard speech together with Gaussian process regression in order to mimic the flattening of the spectral tilt that occurs in the production of natural speech in noisy conditions. The performance of the proposed method was evaluated in comparison to two reference methods, a fixed high-pass filter and a baseline spectral tilt conversion, as well as in comparison to unprocessed speech in terms of intelligibility and listening preference in noisy conditions and in terms of pressedness in silent conditions. The results indicate that while the proposed technique provides a similar benefit in terms of intelligibility as fixed high-pass filtering, it is also able to produce a notable increase in pressedness. This suggests that the developed processing of the spectral tilt can compete with fixed high-pass filtering in intelligibility enhancement, but it is also able to convert speech to become perceptually closer to natural Lombard speech.


Brain and Behavior | 2017

Predictive processing increases intelligibility of acoustically distorted speech: Behavioral and neural correlates

Maria Hakonen; Patrick J. C. May; Iiro P. Jääskeläinen; Emma Jokinen; Mikko Sams; Hannu Tiitinen

We examined which brain areas are involved in the comprehension of acoustically distorted speech using an experimental paradigm where the same distorted sentence can be perceived at different levels of intelligibility. This change in intelligibility occurs via a single intervening presentation of the intact version of the sentence, and the effect lasts at least on the order of minutes. Since the acoustic structure of the distorted stimulus is kept fixed and only intelligibility is varied, this allows one to study brain activity related to speech comprehension specifically.

Collaboration


Dive into Emma Jokinen's collaborations.

Top Co-Authors


Marko Takanen

Technische Universität München
