
Publications


Featured research published by Joonas Nikunen.


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation

Joonas Nikunen; Tuomas Virtanen

This paper addresses the problem of sound source separation from a multichannel microphone array capture via estimation of the source spatial covariance matrix (SCM) of a short-time Fourier transformed mixture signal. In many conventional audio separation algorithms, the source mixing parameter estimation is done separately for each frequency, making them prone to errors and leading to suboptimal source estimates. In this paper we propose an SCM model that consists of a weighted sum of direction of arrival (DoA) kernels and estimate only the weights, which depend on the source directions. In the proposed algorithm, the spatial properties of the sources are jointly optimized over all frequencies, leading to more coherent source estimates and mitigating the effect of spatial aliasing at high frequencies. The proposed SCM model is combined with a linear model for magnitudes, and the parameter estimation is formulated in a complex-valued non-negative matrix factorization (CNMF) framework. Simulations consist of recordings made with a hand-held device-sized array having multiple microphones embedded inside the device casing. The separation quality of the proposed algorithm is shown to exceed that of existing state-of-the-art separation methods with two sources when evaluated by objective separation quality metrics.
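
As a rough illustration of the kind of DoA-based SCM model described above, the sketch below builds rank-1 DoA kernels from far-field steering vectors over a direction grid and sums them with non-negative weights. It is a minimal sketch only: the array geometry, the function names, and the plane-wave (far-field) assumption are illustrative choices, and the CNMF magnitude model and parameter estimation are not reproduced here.

import numpy as np

def doa_kernels(mic_pos, doa_grid, freqs, c=343.0):
    """Rank-1 DoA kernels W_o(f) = h_o(f) h_o(f)^H built from far-field
    steering vectors, one kernel per candidate look direction.

    mic_pos  : (M, 3) microphone coordinates in metres
    doa_grid : (O, 3) unit vectors pointing towards candidate directions
    freqs    : (F,)   frequency bin centres in Hz
    returns  : (O, F, M, M) complex-valued kernels
    """
    # time difference of arrival per direction and microphone, shape (O, M)
    tdoa = doa_grid @ mic_pos.T / c
    # far-field steering vectors h_o(f), shape (O, F, M)
    h = np.exp(-2j * np.pi * freqs[None, :, None] * tdoa[:, None, :])
    # outer products h h^H, shape (O, F, M, M)
    return h[..., :, None] * h[..., None, :].conj()

def scm_model(kernels, z):
    """Source SCM modelled as a weighted sum of DoA kernels.

    kernels : (O, F, M, M) DoA kernels, z : (O,) non-negative direction weights
    returns : (F, M, M) spatial covariance matrix for one source
    """
    return np.einsum('o,ofmn->fmn', z, kernels)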


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2011

Multichannel audio upmixing based on non-negative tensor factorization representation

Joonas Nikunen; Tuomas Virtanen; Miikka Vilermo

This paper proposes a new spatial audio coding (SAC) method that is based on parametrization of multichannel audio by sound objects using non-negative tensor factorization (NTF). The spatial parameters are estimated using a perceptually motivated NTF model and are used for upmixing a downmixed and encoded mixture signal. The performance of the proposed coding is evaluated using listening tests, which show that the coding performance is on a par with conventional SAC methods. The novelty of the proposed coding is that it enables controlling the upmixed content through meaningful objects.
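
The paper's NTF model is perceptually motivated and tied to the spatial-parameter estimation, which is not reproduced here. Purely as an illustration of the underlying non-negative tensor factorization idea, the sketch below runs generic multiplicative updates for a three-way (channel x frequency x time) non-negative CP factorization under a plain squared-error cost; all names and the cost function are assumptions for illustration.

import numpy as np

def ntf_cp(V, K, n_iter=200, eps=1e-12):
    """Generic non-negative CP/PARAFAC factorization with squared-error
    multiplicative updates:
        V[c, f, t] ~= sum_k A[c, k] * B[f, k] * C[t, k]
    where V could be a (channels x frequencies x frames) magnitude tensor.
    """
    nc, nf, nt = V.shape
    rng = np.random.default_rng(0)
    A = rng.random((nc, K)) + eps
    B = rng.random((nf, K)) + eps
    C = rng.random((nt, K)) + eps
    for _ in range(n_iter):
        Vh = np.einsum('ck,fk,tk->cft', A, B, C)
        A *= np.einsum('cft,fk,tk->ck', V, B, C) / (np.einsum('cft,fk,tk->ck', Vh, B, C) + eps)
        Vh = np.einsum('ck,fk,tk->cft', A, B, C)
        B *= np.einsum('cft,ck,tk->fk', V, A, C) / (np.einsum('cft,ck,tk->fk', Vh, A, C) + eps)
        Vh = np.einsum('ck,fk,tk->cft', A, B, C)
        C *= np.einsum('cft,ck,fk->tk', V, A, B) / (np.einsum('cft,ck,fk->tk', Vh, A, B) + eps)
    return A, B, C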


Speech Communication | 2015

Distant speech separation using predicted time-frequency masks from spatial features

Pasi Pertilä; Joonas Nikunen

Highlights:
A neural network for time-frequency mask prediction is proposed.
The network is trained using simulated speech to produce naturally occurring masks.
After a post-processing stage, the mask is used as a post-filter of a beamformer.
Speech mixtures recorded with a circular array in two rooms are separated.
The method shows the best intelligibility and SNR values compared to the contrasted methods.

Speech separation algorithms face the difficult task of producing a high degree of separation without introducing unwanted artifacts. The time-frequency (T-F) masking technique applies a real-valued (or binary) mask on top of the signal's spectrum to filter out unwanted components. The practical difficulty lies in the mask estimation: using efficient masks engineered for separation performance often leads to unwanted musical noise artifacts in the separated signal, which lowers the perceptual quality and intelligibility of the output.

Microphone arrays have long been studied for the processing of distant speech. This work uses a feed-forward neural network for mapping the microphone array's spatial features into a T-F mask. A Wiener filter is used as the desired mask for training the neural network on speech examples in a simulated setting. The T-F masks predicted by the neural network are combined to obtain an enhanced separation mask that exploits the information regarding interference between all sources. The final mask is applied to the delay-and-sum beamformer (DSB) output.

The algorithm's objective separation capability, together with the intelligibility of the separated speech, is tested with speech recorded from distant talkers in two rooms at two distances. The results show improvement in an instrumental measure of intelligibility and in frequency-weighted SNR over a complex-valued non-negative matrix factorization (CNMF) source separation approach, spatial sound source separation, and conventional beamforming methods such as the DSB and the minimum variance distortionless response (MVDR) beamformer.
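
The network architecture, the spatial input features, and the mask-combination step are specific to the paper and not reproduced here. As a minimal illustration of two pieces the abstract does describe, the sketch below computes an oracle Wiener-filter mask (the kind of training target mentioned) and applies a predicted mask as a post-filter to a delay-and-sum beamformer output; the function names and STFT conventions are assumptions.

import numpy as np

def wiener_mask(target_stft, interference_stft, eps=1e-12):
    """Oracle Wiener-filter mask used as a training target: ratio of target
    power to total power in each time-frequency bin, values in [0, 1]."""
    t_pow = np.abs(target_stft) ** 2
    i_pow = np.abs(interference_stft) ** 2
    return t_pow / (t_pow + i_pow + eps)

def apply_postfilter(dsb_stft, predicted_mask):
    """Apply a network-predicted T-F mask as a post-filter on the
    delay-and-sum beamformer (DSB) output spectrogram."""
    return predicted_mask * dsb_stft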


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2010

Noise-to-mask ratio minimization by weighted non-negative matrix factorization

Joonas Nikunen; Tuomas Virtanen

This paper proposes a novel algorithm for minimizing the perceptual distortion in non-negative matrix factorization (NMF) based audio representation. We formulate the noise-to-mask ratio audio quality criterion in a form where it can be used in NMF and propose an algorithm for optimizing the criterion. We also propose a method for compensating for the spreading of the representation error in the synthesis filterbank. The objective perceptual quality produced by the proposed method is found to exceed that of all the reference methods. We also study the trade-off between the window length and the rank of the factorization at a fixed data rate, and find that the best performance is obtained with window lengths between 10 and 30 ms.
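
Purely as an illustration of weighted NMF with perceptual weights, the sketch below runs multiplicative updates that minimize a per-bin weighted squared error. The derivation of the weights from the noise-to-mask ratio and the compensation of error spreading in the synthesis filterbank are specific to the paper and not shown; here the weight matrix P is simply an assumed input, and the function name is illustrative.

import numpy as np

def weighted_nmf(V, P, K, n_iter=200, eps=1e-12):
    """NMF with a per-bin weight matrix P (e.g. larger weights for bins that
    are perceptually more important), minimizing sum(P * (V - W @ H)**2)."""
    nf, nt = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((nf, K)) + eps
    H = rng.random((K, nt)) + eps
    for _ in range(n_iter):
        WH = W @ H
        W *= ((P * V) @ H.T) / ((P * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (P * V)) / (W.T @ (P * WH) + eps)
    return W, H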


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2014

Multichannel audio separation by direction of arrival based spatial covariance model and non-negative matrix factorization

Joonas Nikunen; Tuomas Virtanen

This paper studies multichannel audio separation using non-negative matrix factorization (NMF) combined with a new model for spatial covariance matrices (SCMs). The proposed SCM model is parameterized by the source direction of arrival (DoA), and its parameters can be optimized to yield a spatially coherent solution over frequencies, thus avoiding permutation ambiguity and spatial aliasing. The model constrains the estimation of SCMs to a set of geometrically possible solutions. Additionally, we present a method for using a priori DoA information of the sources, extracted blindly from the mixture, for the initialization of the parameters of the proposed model. The simulations show that the proposed algorithm exceeds the separation quality of existing spatial separation methods.
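
The abstract does not spell out how the a priori DoA information is extracted from the mixture, so the sketch below shows one common, purely illustrative possibility: estimating a pairwise time difference of arrival with GCC-PHAT, which can then be mapped to a coarse direction given the array geometry. The function name and parameters are assumptions, not the paper's method.

import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Generalized cross-correlation with phase transform (GCC-PHAT).
    Returns an estimated time difference of arrival in seconds between two
    microphone signals (sign convention depends on the channel ordering)."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12            # phase transform weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)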


IEEE Transactions on Audio, Speech, and Language Processing | 2018

Separation of Moving Sound Sources Using Multichannel NMF and Acoustic Tracking

Joonas Nikunen; Aleksandr Diment; Tuomas Virtanen

In this paper, we propose a method for the separation of moving sound sources. The method is based on first tracking the sources, then estimating the source spectrograms using multichannel non-negative matrix factorization (NMF), and finally extracting the sources from the mixture by single-channel Wiener filtering. We propose a novel multichannel NMF model with time-varying mixing of the sources, represented by spatial covariance matrices (SCMs), and provide update equations for optimizing the model parameters by minimizing the squared Frobenius norm. The SCMs of the model are obtained from the estimated directions of arrival of the tracked sources at each time frame. The evaluation is based on established objective separation criteria and uses real recordings of two and three simultaneous moving sound sources. The compared methods include conventional beamforming and ideal ratio mask separation. The proposed method is shown to exceed the separation quality of the other evaluated blind approaches according to all measured quantities. Additionally, we evaluate the method's susceptibility to tracking errors by comparing against the separation quality achieved using annotated ground-truth source trajectories.
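
Acoustic tracking and the time-varying multichannel NMF that produce the source power estimates are the core of the paper and are not reproduced here. The sketch below only illustrates the final single-channel Wiener filtering step the abstract mentions, under the assumption that non-negative power spectrograms for each source are already available; the names and array shapes are illustrative.

import numpy as np

def wiener_separate(mix_stft, source_powers, eps=1e-12):
    """Single-channel Wiener filtering: each source is extracted from the
    mixture STFT with a soft mask given by its estimated power spectrogram
    divided by the sum over all sources.

    mix_stft      : (F, T) complex mixture spectrogram of one channel
    source_powers : (S, F, T) non-negative source power spectrograms
    returns       : (S, F, T) complex source spectrogram estimates
    """
    total = source_powers.sum(axis=0) + eps
    masks = source_powers / total
    return masks * mix_stft[None, :, :]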


Speech Communication | 2016

Binaural rendering of microphone array captures based on source separation

Joonas Nikunen; Aleksandr Diment; Tuomas Virtanen; Miikka Vilermo

Highlights:
A method for binaural rendering of sound scene recordings is proposed.
Source signals and their directions of arrival are estimated using a microphone array.
A low-rank NMF model is used for the separation of sound sources.
A speech intelligibility test with overlapping speech is conducted.
Binaural processing is shown to increase speech intelligibility over stereo.

This paper proposes a method for binaural reconstruction of a sound scene captured with a portable-sized array consisting of several microphones. The proposed processing separates the scene into a sum of a small number of sources, and the spectrogram of each source is in turn represented by a small number of latent components. The direction of arrival (DOA) of each source is estimated, which is followed by binaural rendering of each source at its estimated direction. For representing the sources, the proposed method uses low-rank complex-valued non-negative matrix factorization combined with a DOA-based spatial covariance matrix model. The binaural reconstruction is achieved by applying the binaural cues (head-related transfer functions) associated with the estimated source DOAs to the separated source signals. The binaural rendering quality of the proposed method was evaluated using a speech intelligibility test. The results indicated that the proposed binaural rendering was able to improve the intelligibility of speech over stereo recordings and over separation by a minimum variance distortionless response beamformer with the same binaural synthesis in a three-speaker scenario. An additional listening test evaluating the subjective quality of the rendered output indicated no added processing artifacts from the proposed method in comparison to the unprocessed stereo recording.
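
As a minimal sketch of the final rendering step only: once the sources are separated and their DOAs estimated (neither is shown here), each source can be convolved with the head-related impulse response pair associated with its direction and the results summed per ear. The hrir_lookup callable is a hypothetical stand-in for whatever HRTF set is used; equal source and HRIR lengths are assumed.

import numpy as np

def render_binaural(separated_sources, source_doas, hrir_lookup):
    """Binaural rendering: convolve each separated time-domain source with
    the HRIR pair for its estimated direction of arrival and sum per ear.

    separated_sources : list of equal-length 1-D time-domain signals
    source_doas       : list of (azimuth, elevation) tuples in degrees
    hrir_lookup       : callable mapping a DoA to an (hrir_left, hrir_right)
                        pair of equal-length impulse responses (assumed)
    """
    left, right = 0.0, 0.0
    for sig, doa in zip(separated_sources, source_doas):
        h_l, h_r = hrir_lookup(doa)             # nearest-neighbour HRIR pair
        left = left + np.convolve(sig, h_l)     # 'full' convolution per ear
        right = right + np.convolve(sig, h_r)
    return np.stack([left, right])              # shape (2, samples)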


European Signal Processing Conference (EUSIPCO) | 2017

Time-difference of arrival model for spherical microphone arrays and application to direction of arrival estimation

Joonas Nikunen; Tuomas Virtanen

This paper investigates different steering techniques for spherical microphone arrays and proposes a time-difference of arrival (TDOA) model for microphones on the surface of a rigid sphere. The model is based on a geometric interpretation of the wavefront incidence angle and the extra distance the wavefront needs to travel to reach microphones on the opposite side of the sphere. We evaluate the proposed model by comparing analytic TDOAs to measured TDOAs extracted from impulse responses (IRs) of a rigid sphere (r = 7.5 cm). The proposed method achieves over 40% relative improvement in TDOA accuracy in comparison to free-field propagation, and TDOAs extracted from analytic IRs of a spherical microphone array provide an additional 10% improvement. We test the proposed model in the application of source direction of arrival (DOA) estimation using steered response power (SRP) with real reverberant recordings of moving speech sources. All tested methods perform equally well in the noise-free scenario, while the proposed model and simulated IRs improve over the free-field assumption in low-SNR conditions. The proposed model has the benefit of using only a single delay for steering the array.
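
In the spirit of the geometric model described above (though not necessarily the paper's exact formulation), the sketch below implements a Woodworth-style ray-tracing approximation: microphones on the illuminated side of the sphere see a direct path, while microphones in the acoustic shadow are reached via an extra arc along the surface. The radius and speed of sound are the only physical parameters; the function name and conventions are assumptions.

import numpy as np

def rigid_sphere_tdoa(mic_dirs, doa, r=0.075, c=343.0):
    """Ray-tracing TDOA approximation for microphones on a rigid sphere,
    expressed relative to the sphere centre.

    mic_dirs : (M, 3) unit vectors from the sphere centre to each microphone
    doa      : (3,)   unit vector from the array towards the source
    r        : sphere radius in metres (7.5 cm, as in the abstract)
    c        : speed of sound in m/s
    """
    # incidence angle between the source direction and each microphone direction
    cos_theta = np.clip(mic_dirs @ doa, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # illuminated side: straight path; shadowed side: extra arc along the surface
    direct = -(r / c) * cos_theta
    shadow = (r / c) * (theta - np.pi / 2)
    return np.where(theta <= np.pi / 2, direct, shadow)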


Journal of the Audio Engineering Society | 2010

Object-Based Audio Coding Using Non-Negative Matrix Factorization for the Spectrogram Representation

Joonas Nikunen; Tuomas Virtanen


Conference of the International Speech Communication Association (INTERSPEECH) | 2014

Microphone array post-filtering using supervised machine learning for speech enhancement.

Pasi Pertilä; Joonas Nikunen

Collaboration


Dive into Joonas Nikunen's collaborations.

Top Co-Authors

Tuomas Virtanen
Tampere University of Technology

Pasi Pertilä
Tampere University of Technology

Aleksandr Diment
Tampere University of Technology

Gaurav Naithani
Tampere University of Technology

Julio Jose Carabias-Orti
Tampere University of Technology

Sharath Adavanne
Tampere University of Technology