Takuya Yoshioka | Researchain

Publication

Featured research published by Takuya Yoshioka.

2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

A microphone array system integrating beamforming, feature enhancement, and spectral mask-based noise estimation

Takuya Yoshioka; Tomohiro Nakatani

This paper proposes a microphone array system that integrates beamforming, feature enhancement, and highly accurate noise feature model estimation based on spectral masking. Previously proposed methods for combining beamformers and single-channel post-filters estimate noise power spectra or noise features based only on spatial information acquired from multiple microphones. These methods suffer from low noise estimation accuracy when the number of available microphones is limited or when there are array calibration or steering vector estimation errors. By contrast, the proposed method estimates a noise feature model accurately in a highly adaptive way by capitalizing on both spatial information and the characteristics of speech. Specifically, the method leverages an inter-microphone phase difference model, a clean feature model, and a harmonicity-based spectral mask model for the accurate estimation of spectral masks, each of which indicates the presence or absence of speech at a particular frequency bin. The estimated spectral masks are used to obtain the time-varying noise feature model. Results of a digit recognition experiment show that the proposed system significantly outperforms an existing microphone array system combining a beamformer and a post-filter.
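As a rough illustration of the masking idea, the sketch below marks a time-frequency bin as speech when the phase difference between two microphones matches the delay expected for the target direction, then averages the power of the remaining bins as a noise estimate. It is a simplified stand-in under stated assumptions, not the paper's method: it uses a hard threshold and the phase cue only, whereas the abstract describes combining the phase difference model with clean feature and harmonicity-based models. The function names, the two-microphone setup, and the tolerance tol_rad are hypothetical.

```python
import cmath
import math


def wrap_phase(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def speech_mask(x1, x2, freq_hz, target_delay_s, tol_rad=0.5):
    """Return 1 if the bin's inter-microphone phase difference matches the target.

    x1, x2: complex STFT coefficients of the same bin at microphones 1 and 2.
    target_delay_s: assumed arrival delay of the target at microphone 2
    relative to microphone 1 (hypothetical parameter).
    """
    observed = cmath.phase(x1 * x2.conjugate())
    expected = 2 * math.pi * freq_hz * target_delay_s
    return 1 if abs(wrap_phase(observed - expected)) < tol_rad else 0


def noise_power(frames_mic1, masks):
    """Average |X|^2 over the frames of one bin where the mask says speech is absent."""
    noise = [abs(x) ** 2 for x, m in zip(frames_mic1, masks) if m == 0]
    return sum(noise) / len(noise) if noise else 0.0
```

In this simplified form, any error in target_delay_s shifts every decision, which is one reason a system might prefer to combine the spatial cue with speech-specific cues as the abstract describes.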

Archive | 2010

Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information

Masato Miyoshi; Marc Delcroix; Keisuke Kinoshita; Takuya Yoshioka; Tomohiro Nakatani; Takafumi Hikichi

This chapter discusses multi-microphone inverse filtering, which does not use a priori information about room acoustics, such as room impulse responses between the target speaker and the microphones. One major problem in achieving this type of processing is the degradation of the recovered speech caused by excessive equalization of the speech characteristics. To overcome this problem, several approaches have been studied based on a multichannel linear prediction framework, since the framework may be able to perform speech dereverberation as well as noise attenuation. Here, we first discuss the relationship between optimal filtering and linear prediction. Then, we review our four approaches, which differ in terms of their treatment of the statistical properties of a speech signal.
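The excessive-equalization issue can be seen in a deliberately reduced setting. The sketch below fits a single-channel, first-order least-squares linear predictor (not the multichannel framework the chapter reviews) to an exponentially decaying sequence and shows that the prediction residual removes the decay almost entirely. The function names and the test signal are hypothetical. The predictor has no way to tell whether such a decay comes from the room or from the speech itself, which is the intuition behind the degradation mentioned above.

```python
def lp_coefficient(x):
    """Least-squares first-order predictor: minimizes sum((x[t] - a * x[t-1])**2)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def lp_residual(x, a):
    """Prediction error; the first sample is passed through unchanged."""
    return [x[0]] + [x[t] - a * x[t - 1] for t in range(1, len(x))]


# Hypothetical test signal: an impulse followed by an exponential decay.
decay = [0.9 ** t for t in range(20)]
a = lp_coefficient(decay)
residual = lp_residual(decay, a)
```

Here a comes out very close to 0.9 and every residual sample after the first is close to zero, so the filter has flattened all correlation in the signal regardless of its origin.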

Archive | 2010

Speech Dereverberation and Denoising Based on Time Varying Speech Model and Autoregressive Reverberation Model

Takuya Yoshioka; Tomohiro Nakatani; Keisuke Kinoshita; Masato Miyoshi

Speech dereverberation and denoising have been important problems for decades in the speech processing field. As regards denoising, a model-based approach has been intensively studied and many practical methods have been developed. In contrast, research on dereverberation has been relatively limited. Only in very recent years have studies on a model-based approach to dereverberation made rapid progress. This chapter reviews a model-based dereverberation method developed by the authors. This dereverberation method is effectively combined with a traditional denoising technique, specifically a multichannel Wiener filter. This combined method is derived by solving a dereverberation and denoising problem with a model-based approach. The combined dereverberation and denoising method as well as the original dereverberation method are developed by using a multichannel autoregressive model of room acoustics and a time-varying power spectrum model of clean speech signals.
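For readers unfamiliar with Wiener filtering, the sketch below shows the single-channel, per-bin form of the gain, in which a time-varying speech power estimate and a noise power estimate set how much of each noisy coefficient is kept. This scalar version is an assumption made for illustration; the chapter uses a multichannel Wiener filter together with an autoregressive reverberation model, neither of which is reproduced here. The function names are hypothetical.

```python
def wiener_gain(speech_power, noise_power):
    """Per-bin Wiener gain S / (S + N); returns 0 when both powers are zero."""
    total = speech_power + noise_power
    return speech_power / total if total > 0 else 0.0


def enhance_frame(noisy_bins, speech_powers, noise_powers):
    """Scale each noisy STFT coefficient of one frame by its Wiener gain."""
    return [
        wiener_gain(s, n) * y
        for y, s, n in zip(noisy_bins, speech_powers, noise_powers)
    ]
```

Because the speech power is allowed to change from frame to frame, the gain follows the speech: bins dominated by speech pass nearly unchanged, and bins where the speech model predicts little energy are attenuated.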

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2006

Common Acoustical Pole Estimation from Multi-Channel Musical Audio Signals

Takuya Yoshioka; Takafumi Hikichi; Masato Miyoshi; Hiroshi G. Okuno

This paper describes a method for estimating the amplitude characteristics of poles common to multiple room transfer functions from musical audio signals received by multiple microphones. Knowledge of these pole characteristics would make it easier to manipulate audio equalizers, since they correspond to the room resonance. It has been proven that an estimate of the poles can be calculated precisely when a source signal is white. However, if a source signal is colored as in the case of a musical audio signal, the estimate is degraded by the frequency characteristics originally contained in the source signal. In this paper, we consider that an amplitude spectrum of a musical audio signal consists of its envelope and fine structure. We assume that musical pieces can be classified into several categories according to their average amplitude spectral envelopes. Based on this assumption, the amplitude spectral envelope of the musical audio signal can be obtained from prior knowledge of the average amplitude spectral envelope of a musical piece category into which the target piece is classified. On the other hand, the fine structure is identified based on its time variance. By removing both the spectral envelope and the fine structure from the amplitude spectrum estimated with the conventional method, the amplitude characteristics of the acoustical poles can be extracted. Simulation results for 20 popular songs revealed that our method was capable of estimating the amplitude characteristics of the acoustical poles with a spectral distortion of 3.11 dB. In particular, most of the spectral peaks, corresponding to the room resonance modes, were successfully detected.
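The removal step and the reported error measure can be sketched as follows, assuming the amplitude spectrum factors multiplicatively into pole characteristics, spectral envelope, and fine structure, so that removal becomes subtraction in decibels. The root-mean-square log-spectral definition of spectral distortion used here is a common one, but it is an assumption that the paper uses exactly this form. All function names are hypothetical.

```python
import math


def to_db(amplitudes):
    """Convert linear amplitudes to decibels."""
    return [20.0 * math.log10(a) for a in amplitudes]


def remove_source_coloration(estimate_db, envelope_db, fine_db):
    """Subtract the source's envelope and fine structure (in dB) from the raw estimate."""
    return [e - env - fine for e, env, fine in zip(estimate_db, envelope_db, fine_db)]


def spectral_distortion_db(reference_db, estimate_db):
    """Root-mean-square difference between two log-amplitude spectra, in dB."""
    squared = [(r - e) ** 2 for r, e in zip(reference_db, estimate_db)]
    return math.sqrt(sum(squared) / len(squared))
```

Under this definition, a distortion of a few dB means the estimated pole characteristics deviate from the reference by that amount on average across frequency bins.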

Archive | 2015

Maximum A Posteriori Spectral Estimation with Source Log-Spectral Priors for Multichannel Speech Enhancement

Yasuaki Iwata; Tomohiro Nakatani; Takuya Yoshioka; Masakiyo Fujimoto; Hirofumi Saito

When speech signals are captured in real acoustical environments, the captured signals are distorted by certain types of interference, such as ambient noise, reverberation, and extraneous speakers’ utterances. There are two important approaches to speech enhancement that reduce such interference in the captured signals. One approach is based on the spatial features of the signals, such as direction of arrival and acoustic transfer functions, and enhances speech using multichannel audio signal processing. The other approach is based on speech spectral models that represent the probability density function of the speech spectra, and it enhances speech by distinguishing between speech and noise based on the spectral models. In this chapter, we propose a new approach that integrates the above two approaches. The proposed approach uses the spatial and spectral features of signals in a complementary manner to achieve reliable and accurate speech enhancement. The approach can be applied to various speech enhancement problems, including denoising, dereverberation, and blind source separation (BSS). In particular, in this chapter, we focus on applying the approach to BSS. We show experimentally that the proposed integration can improve the performance of BSS compared with a conventional approach.

International Symposium/Conference on Music Information Retrieval | 2004