Publications


Featured research published by Masato Miyoshi.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1988

Inverse filtering of room acoustics

Masato Miyoshi; Yutaka Kaneda

A novel method is proposed for realizing exact inverse filtering of acoustic impulse responses in a room. This method is based on a principle called the multiple-input/output inverse theorem (MINT). The inverse is constructed from multiple finite-impulse-response (FIR) filters (transversal filters) by adding extra acoustic signal-transmission channels produced by multiple loudspeakers or microphones. The coefficients of these FIR filters can be computed by the well-known rules of matrix algebra. Inverse filtering in a sound field is investigated experimentally. It is shown that the proposed method is greatly superior to previous methods that use only one acoustic signal-transmission channel. The results prove the possibility of sound reproduction and sound reception without any distortion caused by reflected sounds.
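The core of MINT can be sketched in a few lines of NumPy: with two channels of length L and inverse filters of length L-1, the stacked convolution matrix is square, so an exact solution exists whenever the two channel polynomials share no common zeros. This is a minimal illustrative sketch, not the paper's implementation; the function names are ours.

```python
import numpy as np

def conv_matrix(h, m):
    # Toeplitz convolution matrix: conv_matrix(h, m) @ g == np.convolve(h, g)
    C = np.zeros((len(h) + m - 1, m))
    for j in range(m):
        C[j:j + len(h), j] = h
    return C

def mint_inverse(h1, h2):
    # MINT with two channels: find FIR filters g1, g2 such that
    # h1*g1 + h2*g2 equals a unit impulse. With channel length L and
    # filter length m = L-1 the system is square, so the solution is
    # exact when the channels share no common zeros.
    L = len(h1)
    m = L - 1
    H = np.hstack([conv_matrix(h1, m), conv_matrix(h2, m)])
    d = np.zeros(L + m - 1)
    d[0] = 1.0                     # target: distortionless unit impulse
    g = np.linalg.solve(H, d)
    return g[:m], g[m:]
```

For example, two coprime length-3 channels yield length-2 inverse filters whose combined output is a unit impulse, which a single-channel FIR inverse cannot achieve exactly for non-minimum-phase responses.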


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction

Keisuke Kinoshita; Marc Delcroix; Tomohiro Nakatani; Masato Miyoshi

A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speech recognition (ASR) performance. One way to solve this problem is to dereverberate the observed signal prior to ASR. In this paper, a room impulse response is assumed to consist of three parts: a direct-path response, early reflections and late reverberations. Since late reverberations are known to be a major cause of ASR performance degradation, this paper focuses on dealing with the effect of late reverberations. The proposed method first estimates the late reverberations using long-term multi-step linear prediction, and then reduces the late reverberation effect by employing spectral subtraction. The algorithm provided good dereverberation with training data corresponding to the duration of one speech utterance, in our case, less than 6 s. This paper describes the proposed framework for both single-channel and multichannel scenarios. Experimental results showed substantial improvements in ASR performance with real recordings under severe reverberant conditions.
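The two stages described above can be sketched in NumPy as follows. This is a simplified single-channel, real-valued illustration under our own naming and framing choices (one long prediction over the whole signal, one analysis frame), not the paper's exact algorithm.

```python
import numpy as np

def delayed_lp(x, delay, order):
    # Long-term multi-step linear prediction: predict x[n] only from samples
    # at least `delay` steps in the past, so the prediction tracks the late
    # reverberation while sparing the direct sound and early reflections.
    N = len(x)
    start = delay + order - 1
    A = np.array([x[n - delay - order + 1:n - delay + 1][::-1]
                  for n in range(start, N)])
    b = x[start:]
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return A @ c                   # late-reverberation estimate for x[start:]

def spectral_subtract(frame, late_frame, floor=0.1):
    # Subtract the estimated late-reverberation power spectrum from the
    # observed one, keep the observed phase, and floor the result so the
    # power never goes negative.
    X = np.fft.rfft(frame)
    R = np.fft.rfft(late_frame)
    p = np.maximum(np.abs(X) ** 2 - np.abs(R) ** 2, floor * np.abs(X) ** 2)
    return np.fft.irfft(np.sqrt(p) * np.exp(1j * np.angle(X)), n=len(frame))
```

Because the subtraction operates on power spectra, the enhanced frame can never carry more energy than the observed one, which keeps the method stable even when the late-reverberation estimate is imperfect.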


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization

Takuya Yoshioka; Tomohiro Nakatani; Masato Miyoshi; Hiroshi G. Okuno

This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases, while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction

Tomohiro Nakatani; Takuya Yoshioka; Keisuke Kinoshita; Masato Miyoshi; Biing-Hwang Juang

This paper proposes a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal captured by distant microphones without prior knowledge of the room impulse responses. With this approach, the generative model of the captured signal is composed of a source process, which is assumed to be a Gaussian process with a time-varying variance, and an observation process modeled by a delayed linear prediction (DLP). The optimization objective for the dereverberation problem is derived to be the sum of the squared prediction errors normalized by the source variances; hence, this approach is referred to as variance-normalized delayed linear prediction (NDLP). Inheriting the characteristic of DLP, NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal. In addition, owing to the use of variance normalization, NDLP allows us to improve the dereverberation result especially with relatively short (of the order of a few seconds) observations. Furthermore, NDLP can be implemented in a computationally efficient manner in the time-frequency domain. Experimental results demonstrate the effectiveness and efficiency of the proposed approach in comparison with two existing approaches.
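The alternating structure of NDLP can be sketched compactly: weighted least squares for the prediction coefficients with weights equal to the reciprocal source variances, then a variance re-estimate from the current dereverberated signal. The sketch below is a real-valued single-sequence toy under our own naming (the paper works with complex time-frequency coefficients and a more careful optimization), so treat it as an illustration of the iteration, not the published algorithm.

```python
import numpy as np

def ndlp(x, delay, order, n_iter=3, eps=1e-4):
    # Variance-normalized delayed linear prediction (sketch).
    # Alternates: (1) weighted least squares for the delayed-prediction
    # coefficients, with weights 1/variance so loud frames do not dominate;
    # (2) re-estimate the time-varying source variance from the current
    # dereverberated signal.
    N = len(x)
    start = delay + order - 1
    A = np.array([x[n - delay - order + 1:n - delay + 1][::-1]
                  for n in range(start, N)])
    b = x[start:]
    d = b.copy()                               # initial dereverberated estimate
    for _ in range(n_iter):
        lam = np.maximum(d ** 2, eps)          # time-varying source variance
        W = 1.0 / lam
        c = np.linalg.solve(A.T @ (A * W[:, None]), A.T @ (W * b))
        d = b - A @ c                          # remove predicted late reverberation
    return np.r_[x[:start], d]
```

The prediction delay is what protects the direct signal: samples closer than `delay` steps are never used as regressors, so short-term speech correlation is not flattened the way ordinary linear prediction would flatten it.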


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2003

Blind dereverberation of single channel speech signal based on harmonic structure

Tomohiro Nakatani; Masato Miyoshi

The paper presents a new method for dereverberation of speech signals with a single microphone. For applications such as speech recognition, reverberant speech causes serious problems when a distant microphone is used in recording. This is especially severe when the reverberation time exceeds 0.5 s. We propose a method which uses the fundamental frequency (F0) of the target speech as the primary feature for dereverberation. This method initially estimates F0 and the harmonic structure of the speech signal and then obtains a dereverberation operator. This operator transforms the reverberant signal to its direct signal based on an inverse filtering operation. Dereverberation is achieved without prior knowledge of either the room acoustics or the target speech. Experimental results show that the dereverberation operator estimated from 5240 Japanese word utterances could effectively reduce the reverberation when the reverberation time is longer than 0.1 s.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Blind speech dereverberation with multi-channel linear prediction based on short-time Fourier transform representation

Tomohiro Nakatani; Takuya Yoshioka; Keisuke Kinoshita; Masato Miyoshi; Biing-Hwang Juang

It has recently been shown that the use of the time-varying nature of speech signals allows us to achieve high quality speech dereverberation based on multi-channel linear prediction (MCLP). However, this approach requires a huge computing cost for calculating large covariance matrices in the time domain. In addition, we face the important problem of how to combine the speech dereverberation efficiently with many other useful speech enhancement techniques in the short time Fourier transform (STFT) domain. As the first step to overcoming these problems, this paper presents methods for implementing MCLP based speech dereverberation that allow it to work in the STFT domain with much less computing cost. The effectiveness of the present methods is confirmed by experiments in terms of the recovered signal quality and the computing time.
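The cost saving comes from the fact that, in the STFT domain, the prediction problem decouples across frequency bins: each bin needs only a small normal-equation system instead of one huge time-domain covariance matrix. A toy single-channel sketch of that per-bin decoupling (the paper's methods are multichannel; the function name is ours):

```python
import numpy as np

def dlp_per_bin(X, delay, order):
    # Delayed linear prediction run independently in each STFT frequency bin.
    # X: complex array of shape (n_bins, n_frames). Each bin solves only a
    # small (order-by-order) least-squares problem over frames -- the source
    # of the computational saving versus the time-domain formulation.
    n_bins, T = X.shape
    start = delay + order - 1
    Y = X.copy()
    for k in range(n_bins):
        A = np.array([X[k, t - delay - order + 1:t - delay + 1][::-1]
                      for t in range(start, T)])
        b = X[k, start:]
        c, *_ = np.linalg.lstsq(A, b, rcond=None)
        Y[k, start:] = b - A @ c   # subtract the predicted reverberant part
    return Y
```

Working on STFT coefficients also makes it natural to chain the dereverberation with other STFT-domain enhancement stages, which is the second motivation given in the abstract.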


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Harmonicity-Based Blind Dereverberation for Single-Channel Speech Signals

Tomohiro Nakatani; Keisuke Kinoshita; Masato Miyoshi

The distant acquisition of acoustic signals in an enclosed space often produces reverberant artifacts due to the room impulse response. Speech dereverberation is desirable in situations where the distant acquisition of acoustic signals is involved. These situations include hands-free speech recognition, teleconferencing, and meeting recording, to name a few. This paper proposes a processing method, named Harmonicity-based dEReverBeration (HERB), to reduce the amount of reverberation in the signal picked up by a single microphone. The method makes extensive use of harmonicity, a unique characteristic of speech, in the design of a dereverberation filter. In particular, harmonicity enhancement is proposed and demonstrated as an effective way of estimating a filter that approximates an inverse filter corresponding to the room impulse response. Two specific harmonicity enhancement techniques are presented and compared; one based on an average transfer function and the other on the minimization of a mean squared error function. Prototype HERB systems are implemented by introducing several techniques to improve the accuracy of dereverberation filter estimation, including time warping analysis. Experimental results show that the proposed methods can achieve high-quality speech dereverberation, when the reverberation time is between 0.1 and 1.0 s, in terms of reverberation energy decay curves and automatic speech recognition accuracy.


EURASIP Journal on Advances in Signal Processing | 2007

Inverse filtering for speech dereverberation less sensitive to noise and room transfer function fluctuations

Takafumi Hikichi; Marc Delcroix; Masato Miyoshi

Inverse filtering of room transfer functions (RTFs) is considered an attractive approach for speech dereverberation given that the time-invariance assumption of the used RTFs holds. However, in a realistic environment, this assumption is not necessarily guaranteed, and the performance is degraded because the RTFs fluctuate over time and the inverse filter fails to remove the effect of the RTFs. The inverse filter may amplify a small fluctuation in the RTFs and may cause large distortions in the filter's output. Moreover, when interference noise is present at the microphones, the filter may also amplify the noise. This paper proposes a design strategy for the inverse filter that is less sensitive to such disturbances. We consider that reducing the filter energy is the key to making the filter less sensitive to the disturbances. Using this idea as a basis, we focus on the influence of three design parameters on the filter energy and the performance, namely, the regularization parameter, modeling delay, and filter length. By adjusting these three design parameters, we confirm that the performance can be improved in the presence of RTF fluctuations and interference noise.
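The regularization idea can be sketched as ridge-regularized least squares over the multichannel convolution matrix, with all three design parameters exposed. The function name and notation below are illustrative, not the paper's:

```python
import numpy as np

def regularized_inverse(h_list, m, delay, delta):
    # Multichannel inverse filter design with Tikhonov regularization:
    # minimize ||H g - d||^2 + delta * ||g||^2, where d is a unit impulse
    # shifted by the modeling delay. Increasing delta lowers the filter
    # energy, trading exact equalization for robustness to RTF
    # fluctuations and interference noise.
    L = len(h_list[0])
    n = L + m - 1                       # m: filter length per channel
    def conv_mat(h):
        C = np.zeros((n, m))
        for j in range(m):
            C[j:j + len(h), j] = h
        return C
    H = np.hstack([conv_mat(h) for h in h_list])
    d = np.zeros(n)
    d[delay] = 1.0                      # modeling delay
    return np.linalg.solve(H.T @ H + delta * np.eye(H.shape[1]), H.T @ d)
```

Because the ridge solution norm is nonincreasing in the regularization parameter, a larger `delta` always yields a lower-energy (hence less disturbance-sensitive) filter, at the cost of a less exact equalized response.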


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Spectral Subtraction Steered by Multi-Step Forward Linear Prediction For Single Channel Speech Dereverberation

Keisuke Kinoshita; Tomohiro Nakatani; Masato Miyoshi

A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speech recognition (ASR) performance. In this paper, we propose a novel dereverberation method utilizing multi-step forward linear prediction. It precisely estimates and suppresses the late reflections, which constitute a major cause of ASR performance degradation. Our experimental results showed that the proposed method can improve ASR performance significantly even without using special adaptation methods such as multi-condition acoustic model training.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Precise Dereverberation Using Multichannel Linear Prediction

Marc Delcroix; Takafumi Hikichi; Masato Miyoshi

In this paper, we discuss the numerical problems posed by the previously reported LInear-predictive Multi-input Equalization (LIME) algorithm when dealing with dereverberation of long room transfer functions (RTF). The LIME algorithm consists of two steps. First, a speech residual is calculated using multichannel linear prediction. The residual is free from the room reverberation effect but it is also excessively whitened because the average speech characteristics have been removed. In the second step, LIME estimates such average speech characteristics to compensate for the excessive whitening. When multiple microphones are used, the speech characteristics are common to all microphones whereas the room reverberation differs for each microphone. LIME estimates the average speech characteristics as the characteristics that are common to all the microphones. Therefore, LIME relies on the hypothesis that there are no zeros common to all channels. However, it is known that RTFs have a large number of zeros close to the unit circle on the z-plane. Consequently, the zeros of the RTFs are distributed in the same regions of the z-plane and, if an insufficient number of microphones are used, the channels would present numerically overlapping zeros. In such a case, the dereverberation algorithm would perform poorly. We discuss the influence of overlapping zeros on the dereverberation performance of LIME. Spatial information can be used to deal with the problem of overlapping zeros. By increasing the number of microphones, the number of overlapping zeros decreases and the dereverberation performance is improved. We also examine the use of cepstral mean normalization for post-processing to reduce the remaining distortions caused by the overlapping zeros.
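The overlapping-zeros condition is easy to probe numerically: compute the zeros of each channel polynomial and look at the smallest cross-channel separation. A tiny separation signals numerically shared zeros, the failure mode discussed above. This diagnostic (and its name) is our own illustration, not part of LIME:

```python
import numpy as np

def min_zero_separation(h1, h2):
    # Smallest distance between any zero of one channel polynomial and any
    # zero of the other. Near-zero separation means the two channels have
    # numerically overlapping zeros, making multichannel inversion (and the
    # common-characteristics estimate in LIME) ill-conditioned.
    z1, z2 = np.roots(h1), np.roots(h2)
    return np.min(np.abs(z1[:, None] - z2[None, :]))
```

Adding microphones helps precisely because a zero must be shared by every channel to cause trouble, and the probability of that shrinks as channels are added.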

Collaboration


Dive into Masato Miyoshi's collaborations.

Top Co-Authors

Keisuke Kinoshita
Nippon Telegraph and Telephone

Tomohiro Nakatani
Nippon Telegraph and Telephone

Biing-Hwang Juang
Georgia Institute of Technology