Publication


Featured research published by Pasi Pertilä.


EURASIP Journal on Audio, Speech, and Music Processing | 2008

Measurement Combination for Acoustic Source Localization in a Room Environment

Pasi Pertilä; Teemu Korhonen; Ari Visa

The behavior of time delay estimation (TDE) is well understood and therefore attractive to apply in acoustic source localization (ASL). A time delay between microphones maps into a hyperbola. Furthermore, the likelihoods for different time delays are mapped into a set of weighted nonoverlapping hyperbolae in the spatial domain. Combining TDE functions from several microphone pairs results in a spatial likelihood function (SLF) which is a combination of sets of weighted hyperbolae. Traditionally, the maximum SLF point is considered as the source location but is corrupted by reverberation and noise. Particle filters utilize past source information to improve localization performance in such environments. However, uncertainty exists on how to combine the TDE functions. Results from simulated dialogues in various conditions favor TDE combination using intersection-based methods over union. The real-data dialogue results agree with the simulations, showing a 45% RMSE reduction when choosing the intersection over union of TDE functions.
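
The difference between the two combination rules can be illustrated with a toy grid search. The sketch below is not the paper's implementation: it assumes that "union" can be modeled as a sum and "intersection" as a product of pairwise likelihoods, and it substitutes a Gaussian around the expected delay for a real TDE function. The room size, microphone layout, `sigma`, and all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def pair_likelihood(point, mic_a, mic_b, measured_delay, sigma=1e-4):
    """Likelihood that a source at `point` explains one pair's measured delay.

    A Gaussian around the expected delay stands in for a real TDE function.
    """
    expected = (math.dist(point, mic_a) - math.dist(point, mic_b)) / SPEED_OF_SOUND
    return math.exp(-0.5 * ((expected - measured_delay) / sigma) ** 2)

def combine(values, mode):
    """Union is modeled as a sum, intersection as a product (an assumption)."""
    if mode == "union":
        return sum(values)
    if mode == "intersection":
        return math.prod(values)
    raise ValueError(f"unknown mode: {mode}")

def localize(pairs, delays, grid, mode):
    """Return the grid point that maximizes the combined SLF."""
    def slf(point):
        return combine(
            [pair_likelihood(point, a, b, d) for (a, b), d in zip(pairs, delays)],
            mode,
        )
    return max(grid, key=slf)

# Toy 4 m x 4 m room with microphones in the corners and a source at (1, 2).
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
pairs = [(mics[0], mics[1]), (mics[2], mics[3]), (mics[0], mics[2])]
source = (1.0, 2.0)
delays = [(math.dist(source, a) - math.dist(source, b)) / SPEED_OF_SOUND
          for a, b in pairs]
grid = [(x * 0.5, y * 0.5) for x in range(9) for y in range(9)]

estimate = localize(pairs, delays, grid, "intersection")
```

With noise-free delays both rules peak at the true position. The product suppresses every point that any single pair considers unlikely, whereas the sum keeps each pair's whole hyperbola alive, which is one plausible reason for the difference the abstract reports under reverberation.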


2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

Closed-form self-localization of asynchronous microphone arrays

Pasi Pertilä; Mikael Mieskolainen; Matti Hämäläinen

The utilization of distributed microphone arrays in many speech processing applications, such as beamforming and speaker localization, relies on precise knowledge of microphone locations. Several self-localization approaches have been presented in the literature, but a simple, accurate, and robust method for asynchronous devices is still lacking. This work presents an analytical solution for estimating the positions and rotations of asynchronous loudspeaker-equipped microphone arrays or devices. The method is based on emitting and receiving calibration signals from each device and extracting the time of arrival (TOA) values. Utilizing the knowledge of array geometry in the TOA estimation is proposed to improve the accuracy of translation. Results with measurements using four devices on a table surface demonstrate a mean translation error of 11 mm with a standard deviation of 6 mm and a mean z-axis rotation error of 0.11 rad with a standard deviation of 0.14 rad, compared against computer vision annotations, with 200 rotation and translation estimates.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Passive Temporal Offset Estimation of Multichannel Recordings of an Ad-Hoc Microphone Array

Pasi Pertilä; Matti Hämäläinen; Mikael Mieskolainen

In recent years, ad-hoc microphone arrays have become ubiquitous, and the capture hardware and quality are increasingly sophisticated. Ad-hoc arrays hold vast potential for audio applications, but they are inherently asynchronous, i.e., a temporal offset exists in each channel, and furthermore the device locations are generally unknown. Therefore, the data is not directly suitable for traditional microphone array applications such as source localization and beamforming. This work presents a least squares method for temporal offset estimation of a static ad-hoc microphone array. The method utilizes the captured audio content without the need to emit calibration signals, provided that a sufficient number of sound sources surround the array during the recording. The Cramér-Rao lower bound of the estimator is given, and the effect of a limited number of surrounding sources on the solution accuracy is investigated. A practical implementation is then presented using non-linear filtering with automatic parameter adjustment. Simulations over a range of reverberation and noise levels demonstrate the algorithm's robustness. Using smartphones, an average RMS error of 3.5 samples (at 48 kHz) was reached when the algorithm's assumptions were met.
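
The abstract does not spell out the estimator, so the following is only a sketch of one generic least squares step that such a method could contain. It assumes, hypothetically, that a pairwise offset difference `o_i - o_j` (in samples) has already been measured for every channel pair, and it solves for per-channel offsets relative to channel 0. The function name and input layout are illustrative, not taken from the paper.

```python
def offsets_from_pairwise(diffs, n_channels):
    """Least squares channel offsets from pairwise offset differences.

    `diffs` maps (i, j), i < j, to a measured o_i - o_j. Every pair must be
    present. For a complete set of equally weighted pairs, the zero-mean least
    squares solution is the row sum of the antisymmetric difference matrix
    divided by the channel count.
    """
    row_sums = [0.0] * n_channels
    for (i, j), d in diffs.items():
        row_sums[i] += d   # contributes  (o_i - o_j) to row i
        row_sums[j] -= d   # contributes  (o_j - o_i) to row j
    zero_mean = [s / n_channels for s in row_sums]
    # Report offsets relative to channel 0, which acts as the reference clock.
    return [o - zero_mean[0] for o in zero_mean]

# Noise-free example with four channels (offsets in samples).
true_offsets = [0.0, 12.0, -7.0, 30.0]
diffs = {
    (i, j): true_offsets[i] - true_offsets[j]
    for i in range(4) for j in range(i + 1, 4)
}
estimated = offsets_from_pairwise(diffs, 4)
```

With noisy pairwise measurements the same formula averages the errors over all pairs that involve a channel, which is the usual benefit of solving for all offsets jointly.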


International Symposium on Circuits and Systems | 2004

Detection and compensation of sensor malfunction in time delay based direction of arrival estimation

Tuomo W. Pirinen; Jari Yli-Hietanen; Pasi Pertilä; Ari Visa

There is an increasing need for robust localization of signal sources of various types. With recent developments in sensory instrumentation, some of these needs can be answered. These new developments have also introduced new requirements. Sensor arrays and networks operate for long periods of time, perhaps unattended, and hardware malfunctions may occur between scheduled maintenance. Sensor systems should be able to detect and compensate for hardware failures. This paper presents a new method to detect and compensate for a failure of one sensor in an array performing time delay based direction of arrival (DOA) estimation. The method utilizes confidence factors based on the planar wave assumption. The proposed method is combined with time delay based DOA estimators and tested with simulations. Results indicate that the given method can be used to detect the failed sensor and improve DOA estimation performance when a failure has occurred.


International Conference on Acoustics, Speech, and Signal Processing | 2010

A track before detect approach for sequential Bayesian tracking of multiple speech sources

Pasi Pertilä; Matti Hämäläinen

This paper describes a novel multiple acoustic source tracking method based on the track-before-detect paradigm. Multiple particle filters are used to represent the state of all sources. Sources are detected and removed using a likelihood ratio obtained from particle weights. The weights are obtained by evaluating the likelihood of the microphone pair phase difference. Tracking performance on recorded data with rich sequences of speech is presented using multiple object tracking metrics. Results show that the proposed method can detect and track multiple temporally overlapping speech sources as well as switching talkers, even at low signal-to-noise ratios.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Toward intelligent sensors - reliability for time delay based direction of arrival estimates

Tuomo W. Pirinen; Pasi Pertilä; Ari Visa

Increasing demand for automatic large area surveillance has made methods estimating direction of arrival (DOA), especially acoustic methods, an interesting topic. In these systems, false alarms and faults are a disturbing factor. This paper presents a reliability criterion and a new method to diminish these errors. In very low signal-to-noise ratio conditions there are seldom any means to improve the performance of a DOA estimator. In these cases it is important to have some information about the estimation reliability. Our focus is on wideband signals propagating as planar waves in three-dimensional space. If the estimation method makes no assumptions about the signal propagation speed, it is possible to compute a reliability measure for the obtained estimates. With this reliability measure we are able to decide whether the data produced by a time delay based DOA estimation system is usable or not. Indeed, with this criterion an estimation system can state its own reliability.
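
One way such a reliability measure could be formed is to compare the propagation speed implied by the measured delays with the known speed of sound. The sketch below is an illustration under stated assumptions, not the paper's criterion: it works in two dimensions rather than three, uses a hypothetical orthogonal three-sensor array (reference at the origin, the others at `(spacing, 0)` and `(0, spacing)`), and applies an arbitrary 10 % tolerance.

```python
import math

def implied_speed(tau_x, tau_y, spacing):
    """Propagation speed implied by two delays of a plane wave.

    For a plane wave, the delay to a sensor at position x relative to the
    reference is x . k, where k is the slowness vector with |k| = 1 / speed.
    With sensors on the two axes, each delay yields one component of k.
    """
    kx, ky = tau_x / spacing, tau_y / spacing
    norm = math.hypot(kx, ky)
    return math.inf if norm == 0.0 else 1.0 / norm

def is_usable(tau_x, tau_y, spacing, nominal_speed=343.0, tolerance=0.1):
    """Accept the delays only if the implied speed is close to the nominal one."""
    speed = implied_speed(tau_x, tau_y, spacing)
    return abs(speed - nominal_speed) <= tolerance * nominal_speed

# Consistent delays: plane wave travelling at 30 degrees, 0.2 m sensor spacing.
spacing = 0.2
angle = math.radians(30.0)
tau_x = spacing * math.cos(angle) / 343.0
tau_y = spacing * math.sin(angle) / 343.0

good = is_usable(tau_x, tau_y, spacing)
bad = is_usable(2.0 * tau_x, tau_y, spacing)  # one delay corrupted
```

The check only works because the delay-to-slowness step never assumes a propagation speed; an estimator that fixes the speed in advance leaves nothing to compare against.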


Computer Speech & Language | 2013

Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking

Pasi Pertilä

Separating speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. Real-time applications require online solutions capable of separating the signals as they are observed, in contrast to separating the signals offline after observation. Often a talker may move, which should also be considered by the separation system. This work proposes an online method for speaker detection, speaker direction tracking, and speech separation. The separation is based on multiple acoustic source tracking (MAST) using Bayesian filtering and time-frequency masking. Measurements from three room environments with varying amounts of reverberation, using two different microphone array designs, are used to evaluate the capability of the method to separate up to four simultaneously active speakers. Separation of moving talkers is also considered. Results are compared to two reference methods: ideal binary masking (IBM) and oracle tracking (O-T). Simulations are used to evaluate the effect of the number of microphones and their spacing.


Speech Communication | 2015

Distant speech separation using predicted time-frequency masks from spatial features

Pasi Pertilä; Joonas Nikunen

Highlights: A neural network for time-frequency mask prediction is proposed. The network is trained using simulated speech to produce naturally occurring masks. After a post-processing stage, the mask is used as a post-filter of a beamformer. Speech mixtures recorded with a circular array in two rooms are separated. The method shows the best intelligibility and SNR values compared to contrast methods.

Speech separation algorithms face the difficult task of producing a high degree of separation without introducing unwanted artifacts. The time-frequency (T-F) masking technique applies a real-valued (or binary) mask on top of the signal's spectrum to filter out unwanted components. The practical difficulty lies in the mask estimation. Often, using efficient masks engineered for separation performance leads to the presence of unwanted musical noise artifacts in the separated signal. This lowers the perceptual quality and intelligibility of the output. Microphone arrays have long been studied for processing of distant speech. This work uses a feed-forward neural network for mapping a microphone array's spatial features into a T-F mask. A Wiener filter is used as the desired mask for training the neural network using speech examples in a simulated setting. The T-F masks predicted by the neural network are combined to obtain an enhanced separation mask that exploits the information regarding interference between all sources. The final mask is applied to the delay-and-sum beamformer (DSB) output. The algorithm's objective separation capability, in conjunction with the separated speech intelligibility, is tested with recorded speech from distant talkers in two rooms from two distances. The results show improvement in an instrumental measure for intelligibility and in frequency-weighted SNR over a complex-valued non-negative matrix factorization (CNMF) source separation approach, spatial sound source separation, and conventional beamforming methods such as the DSB and minimum variance distortionless response (MVDR).
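
The masking step itself is simple to illustrate. The sketch below computes a Wiener-style mask from known target and interference powers, which is the kind of target a network could be trained toward in a simulated setting, and applies a mask to a complex spectrogram such as a beamformer output. The neural network, the spatial features, and the mask combination stage are not shown, and the function names, the `eps` regularizer, and the nested-list spectrogram layout are assumptions made for illustration.

```python
def wiener_mask(target_power, interference_power, eps=1e-12):
    """Real-valued mask in [0, 1] for each time-frequency bin.

    Both inputs are nested lists indexed as [frame][bin]. `eps` avoids a
    division by zero in bins where both powers vanish.
    """
    return [
        [t / (t + i + eps) for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_power, interference_power)
    ]

def apply_mask(mask, spectrum):
    """Scale every complex bin of `spectrum` by the matching mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, spectrum)
    ]

# One frame, three bins: target-dominated, interference-dominated, and mixed.
target_power = [[4.0, 0.0, 1.0]]
interference_power = [[0.0, 1.0, 3.0]]
mask = wiener_mask(target_power, interference_power)

beamformer_output = [[2 + 0j, 1j, 2 - 2j]]  # hypothetical DSB spectrum
separated = apply_mask(mask, beamformer_output)
```

Because the mask is real-valued, it changes only the magnitude of each bin and leaves the phase of the beamformer output untouched.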


International Conference on Acoustics, Speech, and Signal Processing | 2017

Sound event detection using spatial features and convolutional recurrent neural network

Sharath Adavanne; Pasi Pertilä; Tuomas Virtanen

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a volume. Using the proposed spatial features over monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
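
The two input layouts contrasted in the abstract differ only in how the per-channel features are arranged. The sketch below shows both arrangements on a tiny nested-list example; the sizes (2 channels, 3 frames, 4 feature bins) and function names are hypothetical, and the network itself is not shown.

```python
def concatenate_channels(features):
    """features[channel][frame][bin] -> one long feature vector per frame."""
    n_frames = len(features[0])
    return [
        [value for channel in features for value in channel[t]]
        for t in range(n_frames)
    ]

def stack_as_volume(features):
    """Keep channels as separate layers: (channels, frames, bins)."""
    return [[list(frame) for frame in channel] for channel in features]

def shape(nested):
    """Shape of a regular nested list, e.g. (2, 3, 4)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Value encodes its origin: 100 * channel + 10 * frame + bin.
features = [
    [[float(100 * c + 10 * t + f) for f in range(4)] for t in range(3)]
    for c in range(2)
]

flat = concatenate_channels(features)   # shape (3, 8)
volume = stack_as_volume(features)      # shape (2, 3, 4)
```

In the volume layout a convolution kernel sees the same frame and bin position in every channel at once, whereas in the concatenated layout the channels sit side by side along the feature axis.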


2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) | 2014

Self-localization of wireless acoustic sensors in meeting rooms

Mikko Parviainen; Pasi Pertilä; Matti Hämäläinen

This paper presents a passive acoustic self-localization and synchronization system, which estimates the positions of wireless acoustic sensors utilizing the signals emitted by the persons present in the same room. The system is designed to utilize common off-the-shelf devices such as mobile phones. Once devices are self-localized and synchronized, the system could be utilized by traditional array processing methods. The proposed calibration system is evaluated with real recordings from meeting scenarios. The proposed system builds on earlier work; the added contributions of this work are i) increased positioning accuracy and ii) the introduction of data-driven data association. The results show improvement over the existing methods in all tested recordings with 10 smartphones.

Collaboration


Dive into Pasi Pertilä's collaborations.

Top Co-Authors

Mikko Parviainen

Tampere University of Technology

Ari Visa

Tampere University of Technology

Teemu Korhonen

Tampere University of Technology

Tuomas Virtanen

Tampere University of Technology

Tuomo W. Pirinen

Tampere University of Technology

Joonas Nikunen

Tampere University of Technology

Mikael Mieskolainen

Tampere University of Technology

Sharath Adavanne

Tampere University of Technology

Toni Mäkinen

Tampere University of Technology
