
Publication


Featured research published by Miikka Vilermo.


ACM Multimedia | 2001

A compressed domain beat detector using MP3 audio bitstreams

Ye Wang; Miikka Vilermo

This paper presents a novel beat detector that processes MPEG-1 Layer III (MP3) encoded audio bitstreams directly in the compressed domain. Most previous beat detection or tracking systems deal with MIDI or PCM signals and are not directly applicable to compressed audio bitstreams such as MP3. We developed the beat detector as part of a beat-pattern-based error concealment scheme for streaming music over error-prone channels. Special effort was made to obtain a tailored trade-off between performance, complexity and memory consumption for this specific application. A comparison of the machine-detected results with human annotations showed that the proposed method correctly tracked the beats in four out of six popular music test signals, and the results are analyzed in the paper.
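
The general flavour of compressed-domain beat detection can be illustrated with a short sketch: an onset envelope is derived from per-granule MDCT subband energies taken from the bitstream, and the beat period is read off the envelope's autocorrelation. This is a minimal illustration assuming the MDCT coefficients have already been parsed into a NumPy array; it is not the paper's tailored algorithm, and the function names and parameters are illustrative.

```python
import numpy as np

def onset_envelope(mdct, n_bands=32):
    """mdct: (granules, coefficients) array of MDCT values from the bitstream."""
    granules, coeffs = mdct.shape
    bands = mdct[:, :(coeffs // n_bands) * n_bands].reshape(granules, n_bands, -1)
    energy = (bands ** 2).sum(axis=2)                   # per-band energy per granule
    flux = np.diff(energy, axis=0, prepend=energy[:1])  # frame-to-frame change
    return np.maximum(flux, 0.0).sum(axis=1)            # half-wave rectified onset strength

def beat_period(envelope, min_lag=10, max_lag=200):
    """Lag (in granules) with the strongest autocorrelation peak."""
    env = envelope - envelope.mean()
    acf = np.correlate(env, env, mode="full")[len(env) - 1:]
    return min_lag + int(np.argmax(acf[min_lag:max_lag]))

# Toy usage with synthetic data: an energy burst every 50 granules.
rng = np.random.default_rng(0)
fake_mdct = rng.standard_normal((1000, 576)) * (1 + 5 * (np.arange(1000) % 50 == 0)[:, None])
print(beat_period(onset_envelope(fake_mdct)))            # expected to be close to 50
```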


Workshop on Applications of Signal Processing to Audio and Acoustics | 2011

Multichannel audio upmixing based on non-negative tensor factorization representation

Joonas Nikunen; Tuomas Virtanen; Miikka Vilermo

This paper proposes a new spatial audio coding (SAC) method based on a parametrization of multichannel audio by sound objects using non-negative tensor factorization (NTF). The spatial parameters are estimated using a perceptually motivated NTF model and are used for upmixing a downmixed and encoded mixture signal. The performance of the proposed coder is evaluated using listening tests, which show it to be on a par with conventional SAC methods. The novelty of the proposed approach is that it enables controlling the upmix content through meaningful objects.
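
To make the NTF parametrization concrete, the sketch below fits the model X[c, f, t] ≈ Σ_k Q[c, k] W[f, k] H[t, k] to a multichannel magnitude spectrogram with generic multiplicative updates for a Euclidean cost. It only illustrates the factorization idea; the paper's perceptually motivated model, the downmix/encoding chain and the upmix synthesis are not reproduced, and all names are illustrative.

```python
import numpy as np

def ntf(X, n_objects=4, n_iter=200, eps=1e-12):
    """X: non-negative tensor of shape (channels, frequencies, frames)."""
    C, F, T = X.shape
    rng = np.random.default_rng(0)
    Q = rng.random((C, n_objects))   # per-object spatial gains
    W = rng.random((F, n_objects))   # per-object spectral basis
    H = rng.random((T, n_objects))   # per-object temporal activations
    for _ in range(n_iter):
        Xhat = np.einsum('ck,fk,tk->cft', Q, W, H) + eps
        Q *= np.einsum('cft,fk,tk->ck', X, W, H) / np.einsum('cft,fk,tk->ck', Xhat, W, H)
        Xhat = np.einsum('ck,fk,tk->cft', Q, W, H) + eps
        W *= np.einsum('cft,ck,tk->fk', X, Q, H) / np.einsum('cft,ck,tk->fk', Xhat, Q, H)
        Xhat = np.einsum('ck,fk,tk->cft', Q, W, H) + eps
        H *= np.einsum('cft,ck,fk->tk', X, Q, W) / np.einsum('cft,ck,fk->tk', Xhat, Q, W)
    return Q, W, H

# Toy usage: factorize a random 5-channel magnitude spectrogram into 4 objects.
X = np.random.default_rng(1).random((5, 257, 100))
Q, W, H = ntf(X)
print(Q.shape, W.shape, H.shape)   # (5, 4) (257, 4) (100, 4)
```

The object-based view matters for upmixing because each object k carries its own spatial gains Q[:, k], so the decoder can re-render or re-position individual objects rather than fixed channels.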


International Conference on Signal Processing | 2000

On the relationship between MDCT, SDFT and DFT

Ye Wang; Leonid Yaroslavsky; Miikka Vilermo

The modified discrete cosine transform (MDCT) has emerged as a dominant time-frequency decomposition method in high-quality audio compression. The MDCT is a special case of the lapped transforms (LTs) with 50% overlap. This paper establishes the relationship between the MDCT and the shifted discrete Fourier transform (SDFT). The analysis provides insight into the following issues: (1) the relationship between the MDCT, the SDFT and the DFT; (2) the characteristics of the MDCT in the time and frequency domains; (3) the concept of time domain aliasing cancellation (TDAC).
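
For reference, the connection the abstract refers to can be written out in one common MDCT convention (a sketch; index and normalization conventions vary between references and may differ from those used in the paper):

```latex
% MDCT of a length-2N frame x(n), k = 0, ..., N-1
X_{\mathrm{MDCT}}(k) = \sum_{n=0}^{2N-1} x(n)\,
  \cos\!\left[\frac{\pi}{N}\left(n + \tfrac{1}{2} + \tfrac{N}{2}\right)\left(k + \tfrac{1}{2}\right)\right]

% Shifted DFT (SDFT) of length 2N with time shift u and frequency shift v
X_{u,v}(k) = \sum_{n=0}^{2N-1} x(n)\, e^{-j \frac{2\pi}{2N} (n + u)(k + v)}

% With u = (N+1)/2 and v = 1/2 the MDCT is the real part of this SDFT:
X_{\mathrm{MDCT}}(k) = \Re\left\{ X_{\frac{N+1}{2},\,\frac{1}{2}}(k) \right\}, \qquad k = 0, \dots, N-1
```

Because the SDFT is an ordinary DFT with pre- and post-multiplication by complex exponentials, this is also what links the MDCT to the DFT and allows FFT-based implementations.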


International Conference on Signal Processing | 2000

Some peculiar properties of the MDCT

Ye Wang; Leonid Yaroslavsky; Miikka Vilermo; Mauri Väänänen

Having established the interconnection between the MDCT, the SDFT (shifted discrete Fourier transform) and the DFT, we have successfully applied the results to audio encoder design. This paper presents some new observations and analyses: (1) the MDCT is not an orthogonal transform, which has implications for audio coding, and (2) an analysis of time domain alias cancellation (TDAC) during window switching in MPEG-2 AAC. Finally, we report some experimental results on the energy compaction properties of the MDCT.
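
The TDAC behaviour mentioned in point (2) can be checked numerically: a single MDCT block is not invertible on its own (2N samples map to only N coefficients), but overlap-adding 50%-overlapped inverse transforms with a window satisfying the Princen-Bradley condition cancels the aliasing. The sketch below uses a sine window and one common MDCT convention; it does not reproduce the AAC window-switching analysis from the paper.

```python
import numpy as np

N = 64                                                   # hop size; block length is 2N
n = np.arange(2 * N)
k = np.arange(N)
window = np.sin(np.pi / (2 * N) * (n + 0.5))             # sine window: w[n]^2 + w[n+N]^2 = 1
basis = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))   # (N, 2N) MDCT kernel

def mdct(frame):                                         # 2N samples -> N coefficients
    return basis @ (window * frame)

def imdct(coeffs):                                       # N coefficients -> 2N aliased samples
    return (2.0 / N) * window * (basis.T @ coeffs)

rng = np.random.default_rng(1)
x = rng.standard_normal(4 * N)

# A single block cannot be reconstructed: time-domain aliasing remains.
print(np.allclose(imdct(mdct(x[N:3 * N])), x[N:3 * N]))  # False

# Overlap-add of 50%-overlapped blocks cancels the aliasing (TDAC).
y1, y2, y3 = (imdct(mdct(x[i * N:(i + 2) * N])) for i in range(3))
recon = np.concatenate([y1[N:] + y2[:N], y2[N:] + y3[:N]])
print(np.allclose(recon, x[N:3 * N]))                    # True
```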


ACM Multimedia | 1999

An excitation level based psychoacoustic model for audio compression

Ye Wang; Miikka Vilermo

This paper describes an excitation-level-based psychoacoustic model for estimating the simultaneous masking threshold in audio coding. The system has the following stages: 1) a windowing function; 2) a time-to-frequency transformation; 3) an excitation level calculation block similar to that in Moore and Glasberg's loudness model; 4) a correction factor for estimating the masking threshold; 5) the inclusion of the absolute threshold of hearing; 6) the output signal-to-mask ratio. We evaluated the performance by integrating the proposed psychoacoustic model into an audio coder similar to MPEG-2 AAC, containing only the basic coding tools. Our model performs better than or as well as the psychoacoustic model suggested in the MPEG-2 AAC audio coding standard for all test signals, and we can achieve nearly transparent quality at bitrates below 64 kbps for most of the critical test signals. Significant improvements were achieved for speech signals, which are notoriously difficult for transform audio coders.
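
A heavily simplified sketch of the pipeline described above (window, time-to-frequency transform, excitation, masking threshold, absolute threshold, signal-to-mask ratio) is shown below. The band partition, spreading slopes and threshold offset are placeholder values chosen only to make the example run; they are not the paper's excitation-level model, Moore and Glasberg's model, or the MPEG-2 AAC model.

```python
import numpy as np

def signal_to_mask_ratio(frame, n_bands=24, offset_db=15.0):
    windowed = frame * np.hanning(len(frame))                    # 1) windowing
    power = np.abs(np.fft.rfft(windowed)) ** 2                   # 2) time-to-frequency transform
    edges = np.linspace(0, len(power), n_bands + 1, dtype=int)
    band_power = np.array([power[a:b].sum() + 1e-12
                           for a, b in zip(edges[:-1], edges[1:])])
    spread = np.array([[10.0 ** (-3.0 * abs(i - j)) for j in range(n_bands)]
                       for i in range(n_bands)])
    excitation = spread @ band_power                             # 3) crude excitation pattern
    threshold = excitation * 10.0 ** (-offset_db / 10.0)         # 4) masking threshold via a fixed offset
    threshold = np.maximum(threshold, 1e-9)                      # 5) absolute threshold (a flat floor here)
    return 10.0 * np.log10(band_power / threshold)               # 6) per-band SMR in dB

tone = np.sin(2 * np.pi * 1000 * np.arange(1024) / 48000.0)      # 1 kHz test tone at 48 kHz
print(signal_to_mask_ratio(tone).round(1))
```

In a coder, the per-band SMR is what the bit-allocation and quantization stage compares against the quantization noise it is about to introduce.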


Speech Communication | 2016

Binaural rendering of microphone array captures based on source separation

Joonas Nikunen; Aleksandr Diment; Tuomas Virtanen; Miikka Vilermo

Highlights: A method for binaural rendering of sound scene recordings is proposed. Source signals and their directions of arrival are estimated using a microphone array. A low-rank NMF model is used for the separation of the sound sources. A speech intelligibility test with overlapping speech is conducted. Binaural processing is shown to increase speech intelligibility over stereo.

This paper proposes a method for binaural reconstruction of a sound scene captured with a portable-sized array consisting of several microphones. The proposed processing separates the scene into a sum of a small number of sources, and the spectrogram of each source is in turn represented by a small number of latent components. The direction of arrival (DOA) of each source is estimated, followed by binaural rendering of each source at its estimated direction. For representing the sources, the proposed method uses a low-rank complex-valued non-negative matrix factorization combined with a DOA-based spatial covariance matrix model. The binaural reconstruction is achieved by applying the binaural cues (head-related transfer functions) associated with the estimated source DOAs to the separated source signals. The binaural rendering quality of the proposed method was evaluated using a speech intelligibility test. The results indicated that, in a three-speaker scenario, the proposed binaural rendering improved the intelligibility of speech over stereo recordings and over separation by a minimum variance distortionless response beamformer with the same binaural synthesis. An additional listening test evaluating the subjective quality of the rendered output indicated no processing artifacts added by the proposed method in comparison to the unprocessed stereo recording.
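
Only the last stage of this chain, rendering separated sources binaurally from their estimated directions, is easy to show compactly. The sketch below convolves each separated source with the HRIR pair nearest to its estimated azimuth and sums the results; the complex-NMF separation and the DOA estimation themselves are not reproduced, and all array shapes and names are illustrative.

```python
import numpy as np

def render_binaural(sources, doas_deg, hrir_left, hrir_right, hrir_angles_deg):
    """sources: list of 1-D source signals; doas_deg: one azimuth (degrees) per source."""
    n = len(sources[0]) + hrir_left.shape[1] - 1
    left, right = np.zeros(n), np.zeros(n)
    for sig, doa in zip(sources, doas_deg):
        idx = int(np.argmin(np.abs(np.asarray(hrir_angles_deg) - doa)))  # nearest measured angle
        left += np.convolve(sig, hrir_left[idx])
        right += np.convolve(sig, hrir_right[idx])
    return np.stack([left, right])

# Toy usage with random placeholder "HRIRs" measured every 30 degrees.
rng = np.random.default_rng(2)
angles = np.arange(0, 360, 30)
hrir_l, hrir_r = rng.standard_normal((2, len(angles), 128))
sources = [rng.standard_normal(48000), rng.standard_normal(48000)]
print(render_binaural(sources, [45.0, 300.0], hrir_l, hrir_r, angles).shape)  # (2, 48127)
```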


International Conference on Acoustics, Speech, and Signal Processing | 2010

Parametric binaural audio coding

Pasi Ojala; Mikko Tammi; Miikka Vilermo

A spatial audio scene consists of discrete audio sources and ambience. The 3D audio image is perceived through the directional sounds, but even more important are the reverberation and the so-called room effect caused by the properties of the space and by the source and listener locations. Since a human being captures the 3D image using only the signals reaching the left and right ears, two audio channels are sufficient to represent the spatial audio image for the listener. Efficient transmission and representation of a spatial audio image using two channels, however, requires a specific coding and rendering algorithm for the audio content. In this paper we present novel mechanisms to efficiently parameterize, quantize and represent the spatial audio signal.
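
One simple instance of such a parameterization is to describe the two-channel image per frequency band by an inter-channel level difference and an inter-channel coherence, which can then be quantized and sent alongside a downmix. The sketch below computes these two parameters from a stereo pair; it is a generic illustration of the idea, not the coder proposed in the paper, and the frame and band settings are arbitrary.

```python
import numpy as np

def spatial_parameters(left, right, frame=1024, hop=512, n_bands=20, eps=1e-12):
    win = np.hanning(frame)
    starts = range(0, len(left) - frame + 1, hop)
    L = np.array([np.fft.rfft(win * left[s:s + frame]) for s in starts])
    R = np.array([np.fft.rfft(win * right[s:s + frame]) for s in starts])
    edges = np.linspace(0, L.shape[1], n_bands + 1, dtype=int)
    ild, icc = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        pl = np.sum(np.abs(L[:, a:b]) ** 2) + eps              # band power, left
        pr = np.sum(np.abs(R[:, a:b]) ** 2) + eps              # band power, right
        cross = np.sum(L[:, a:b] * np.conj(R[:, a:b]))         # cross-spectrum
        ild.append(10 * np.log10(pl / pr))                     # level difference in dB
        icc.append(np.abs(cross) / np.sqrt(pl * pr))           # coherence in [0, 1]
    return np.array(ild), np.array(icc)

# Toy usage: the right channel is a quieter, slightly noisy copy of the left.
rng = np.random.default_rng(3)
l = rng.standard_normal(48000)
r = 0.5 * l + 0.1 * rng.standard_normal(48000)
ild, icc = spatial_parameters(l, r)
print(ild.round(1), icc.round(2))
```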


Archive | 2003

Efficiency improvements in scalable audio coding

Sebastian Streich; Miikka Vilermo


Archive | 2016

Method, apparatus and computer program product for input detection

Miikka Vilermo; Koray Ozcan


Journal of the Audio Engineering Society | 2003

Modified Discrete Cosine Transform: Its Implications for Audio Coding and Error Concealment

Ye Wang; Miikka Vilermo
