Marko Takanen
Aalto University
Publication
Featured research published by Marko Takanen.
Hearing Research | 2014
Marko Takanen; Olli Santala; Ville Pulkki
The count-comparison principle in binaural auditory modeling is based on the assumption that there are nuclei in the mammalian auditory pathway that encode the directional cues in the rate of the output. When this principle is applied, the outputs of the modeled nuclei do not directly result in a topographically organized map of the auditory space that could be monitored as such. Therefore, this article presents a method for visualizing the information from the outputs as well as the nucleus models. The functionality of the auditory model presented here is tested in various binaural listening scenarios, including localization tasks and the discrimination of a target in the presence of distracting sound as well as sound scenarios consisting of multiple simultaneous sound sources. The performance of the model is illustrated with binaural activity maps. The activations seen in the maps are compared to human performance in similar scenarios, and it is shown that the performance of the model is in accordance with the psychoacoustical data.
Computer Speech & Language | 2014
Emma Jokinen; Marko Takanen; Martti Vainio; Paavo Alku
Post-filtering can be used in mobile communications to improve the quality and intelligibility of speech. Energy reallocation with a high-pass type filter has been shown to work effectively in improving the intelligibility of speech in difficult noise conditions. This paper introduces a post-filtering algorithm that adapts to the background noise level as well as to the fundamental frequency of the speaker and models the spectral effects observed in natural Lombard speech. The introduced method and another post-filtering technique were compared to unprocessed telephone speech in subjective listening tests in terms of intelligibility and quality. The results indicate that the proposed method outperforms the reference method in difficult noise conditions.
International Workshop on Acoustic Signal Enhancement | 2014
Emma Jokinen; Ulpu Remes; Marko Takanen; Kalle J. Palomäki; Mikko Kurimo; Paavo Alku
Post-processing methods are used in mobile communications to improve the intelligibility of speech in adverse background noise conditions. In this study, post-processing based on the modification of the spectral tilt with Gaussian mixture models according to the Lombard effect is investigated. A spectral envelope estimation method is studied and optimized for this purpose. Furthermore, the extrapolation of the statistical mapping in a post-processing context is investigated. The proposed post-processing methods are compared to unprocessed speech and a reference method in subjective intelligibility and quality tests in different near-end noise conditions. The results indicate that one of the extrapolated methods achieved the same intelligibility as fixed high-pass filtering without degrading the quality of speech.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Vesa Välimäki; Heidi-Maria Lehtonen; Marko Takanen
This paper investigates sparse noise sequences, including the previously proposed velvet noise and its novel variants defined here. All sequences consist only of the sample values minus one, zero, and plus one, and the location and the sign of each impulse are randomly chosen. Two of the proposed algorithms are direct variants of the original velvet noise, requiring two random number sequences to determine the impulse locations and signs. In one of the proposed algorithms, the impulse locations and signs are drawn from the same random number sequence, which is advantageous in terms of implementation. Moreover, two of the new sequences include known regions of zeros. The perceived smoothness of the proposed sequences was studied with a listening test in which test subjects compared the noise sequences against a reference signal of Gaussian white noise. The results show that the original velvet noise sounds smoother than the reference at 2000 impulses per second. At 4000 impulses per second, three of the proposed algorithms are also perceived as smoother than the Gaussian noise sequence. These observations can be exploited in the synthesis of noisy sounds and in artificial reverberation.
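As a rough illustration of the sparse sequences described in this abstract, the sketch below generates a velvet-noise-like signal: one impulse of value plus or minus one is placed at a random position inside each grid period, and every other sample is zero. It uses two draws per impulse (one for location, one for sign), which matches the abstract's description of the original scheme only in outline. The function name, the parameters `fs`, `density`, and `seed`, and the exact rounding rule are assumptions made for this example, not details taken from the paper.

```python
import random


def velvet_noise(num_samples, fs=44100, density=2000, seed=0):
    """Return a list of -1, 0, +1 samples with about `density` impulses per second.

    Illustrative sketch only: names and the rounding rule are assumptions.
    """
    rng = random.Random(seed)
    grid = fs / density  # average spacing between impulses, in samples
    out = [0] * num_samples
    m = 0  # index of the current grid period
    while True:
        # First random draw: impulse position inside grid period m.
        k = int(round(m * grid + rng.random() * (grid - 1)))
        if k >= num_samples:
            break
        # Second random draw: impulse sign.
        out[k] = 1 if rng.random() < 0.5 else -1
        m += 1
    return out


if __name__ == "__main__":
    seq = velvet_noise(44100)  # one second at 44.1 kHz
    print(sum(1 for v in seq if v != 0), "impulses in", len(seq), "samples")
```

Raising `density` from 2000 to 4000 halves the average impulse spacing, which corresponds to the two impulse rates compared in the listening test.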
Hearing Research | 2015
Nelli H. Salminen; Alessandro Altoè; Marko Takanen; Olli Santala; Ville Pulkki
Human sound source localization relies on various acoustical cues, one of the most important being the interaural time difference (ITD). ITD is best detected in the fine structure of low-frequency sounds, but it may also contribute to spatial hearing at higher frequencies if extracted from the sound envelope. The human brain mechanisms related to this envelope ITD cue remain unexplored. Here, we tested the sensitivity of the human auditory cortex to envelope ITD in magnetoencephalography (MEG) recordings. We found two types of sensitivity to envelope ITD. First, the amplitude of the auditory cortical N1m response was smaller for zero envelope ITD than for long envelope ITDs corresponding to the sound being in opposite phase in the two ears. Second, the N1m response amplitude showed ITD-specific adaptation for both fine-structure and envelope ITD. The auditory cortical sensitivity was weaker for envelope ITD in high-frequency sounds than for fine-structure ITD in low-frequency sounds, but it occurred within a range of ITDs that are encountered in natural conditions. Finally, the participants were briefly tested for their behavioral ability to detect envelope ITD. Interestingly, we found a correlation between the behavioral performance and the neural sensitivity to envelope ITD. In conclusion, our findings show that the human auditory cortex is sensitive to ITD in the envelope of high-frequency sounds and that this sensitivity may have behavioral relevance.
Archive | 2013
Marko Takanen; Olli Santala; Ville Pulkki
In parametric time-frequency-domain spatial audio techniques, the sound field is encoded as a combination of a few audio channels with metadata. The metadata parametrizes the spatial properties of the sound field that are known to be perceivable to humans. The most well-known techniques are reviewed in this chapter. The spatial artifacts specific to such techniques are described, such as dynamically or statically biased directions, spatially too narrow auditory images, and effects of off-sweet-spot listening. Such cases are analyzed with a binaural auditory model, and it is shown that the artifacts are clearly visualized thereby.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Tuomo Raitio; Marko Takanen; Olli Santala; Antti Suni; Martti Vainio; Paavo Alku
Assessing the intelligibility of synthetic speech is important in creating synthetic voices to be used in real-life applications, especially those involving interfering noise. This raises the question of how to measure the intelligibility of synthetic speech so that such conditions are simulated correctly. Conventionally, this has been done using a simple listening test setup where diotic speech and noise are played to both ears with headphones. This is very different from a real noise environment, where speech and noise are spatially distributed. This paper addresses the question of whether a realistic noise environment should be used to test the intelligibility of synthetic speech. Three different test conditions are evaluated: one with multichannel reproduction of noise and speech, and two headphone setups. Tests are performed with natural and synthetic speech, including speech especially intended for noisy conditions. The results indicate a general trend in all setups but also some interesting differences.
International Conference on Acoustics, Speech, and Signal Processing | 2014
Emma Jokinen; Marko Takanen; Paavo Alku
Post-processing methods can be used in mobile communications to improve the intelligibility of speech in adverse background noise conditions. This study addresses the improved intelligibility and the speech quality achieved with a well-known approach, dynamic range compression, by comparing it to two other real-time post-processing methods based on energy reallocation. In addition, the effects of utilizing amplitude normalization instead of energy normalization on the performance of the post-processing methods are investigated. The evaluations were conducted using subjective tests in several background noise conditions. The results indicate that the two energy reallocating approaches outperform dynamic range compression in both intelligibility and quality, and that amplitude normalization causes the performance of the tested post-processing methods to degrade in some conditions.
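The distinction between energy normalization and amplitude normalization mentioned in this abstract can be sketched as follows. The abstract does not define either term, so this example assumes that energy normalization rescales the processed signal to the sum of squares of the input, and that amplitude normalization rescales it to the input's peak absolute value. The first-order high-pass filter is only a hypothetical stand-in for an energy-reallocating post-filter, and all function names and the `alpha` parameter are made up for illustration.

```python
import math


def high_pass(x, alpha=0.95):
    """Hypothetical stand-in post-filter: y[n] = x[n] - alpha * x[n-1]."""
    y, prev = [], 0.0
    for s in x:
        y.append(s - alpha * prev)
        prev = s
    return y


def energy(x):
    """Sum of squared sample values."""
    return sum(s * s for s in x)


def peak(x):
    """Largest absolute sample value (0.0 for an empty signal)."""
    return max((abs(s) for s in x), default=0.0)


def normalize_energy(y, ref):
    """Scale y so that its energy equals that of ref (assumed definition)."""
    e = energy(y)
    if e == 0.0:
        return list(y)
    gain = math.sqrt(energy(ref) / e)
    return [gain * s for s in y]


def normalize_amplitude(y, ref):
    """Scale y so that its peak equals that of ref (assumed definition)."""
    p = peak(y)
    if p == 0.0:
        return list(y)
    gain = peak(ref) / p
    return [gain * s for s in y]


if __name__ == "__main__":
    # Toy input: a low-frequency tone plus a weaker high-frequency tone.
    x = [math.sin(0.05 * n) + 0.2 * math.sin(1.5 * n) for n in range(2000)]
    y = high_pass(x)
    print("energy-normalized peak:", peak(normalize_energy(y, x)))
    print("amplitude-normalized energy:", energy(normalize_amplitude(y, x)))
```

Under these assumed definitions the two rescalings generally apply different gains to the same filtered signal, so the output level depends on which constraint is enforced.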
Archive | 2011
Olli Santala; Marko Takanen; Ville Pulkki
Audio Engineering Society Conference: 45th International Conference: Applications of Time-Frequency Processing in Audio | 2012
Marko Takanen; Marko Hiipakka; Ville Pulkki