Shota Morita
Japan Advanced Institute of Science and Technology
Publication
Featured research published by Shota Morita.
International Symposium on Chinese Spoken Language Processing | 2014
Shota Morita; Masashi Unoki; Xugang Lu; Masato Akagi
Voice activity detection (VAD) is used to detect speech and non-speech periods in observed speech signals. It is an important front-end technique for many speech technology applications. Many VAD methods have been proposed. However, most of them have been applied under clean or noisy conditions. Only a few methods have been proposed for reverberant conditions, particularly for noisy reverberant conditions. We therefore need to understand the ill effects of noise and reverberation on speech to design an accurate and robust VAD method for noisy reverberant conditions. These effects can be characterized by the modulation transfer function (MTF) under noisy and reverberant conditions. Our study therefore builds on the MTF concept to reduce the ill effects of noise and reverberation on speech, and proposes a robust VAD method. In this method, noise reduction and dereverberation are first applied to the temporal power envelope of the speech signal to restore it. Then, power thresholding on the restored temporal power envelope serves as the VAD decision. A method of estimating the signal-to-noise ratio (SNR) was also proposed so that the SNR can be estimated accurately in the noise reduction stage. Experiments under both artificial and realistic noisy reverberant conditions were carried out to evaluate the proposed VAD method and compare it with conventional VAD methods. The results revealed that the proposed method significantly outperformed the conventional methods under both artificial and realistic noisy reverberant conditions.
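The final thresholding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the noise reduction and dereverberation stages entirely, and the frame-based envelope, the function names, and the -20 dB threshold relative to the peak are all assumptions made for illustration.

```python
import math

def power_envelope(signal, frame_len):
    """Mean power per non-overlapping frame (a crude temporal power envelope)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def threshold_vad(envelope, threshold_db=-20.0):
    """Mark a frame as speech when its power is within threshold_db of the peak.

    The relative threshold is an assumed rule, not the one used in the paper.
    """
    peak = max(envelope)
    if peak <= 0.0:
        return [False] * len(envelope)
    limit = peak * 10.0 ** (threshold_db / 10.0)
    return [p >= limit for p in envelope]

# Toy input: low-level hum, a tone burst standing in for speech, then hum again.
fs = 8000
hum = [0.001 * math.sin(2 * math.pi * 50 * n / fs) for n in range(800)]
burst = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
signal = hum + burst + hum

decisions = threshold_vad(power_envelope(signal, frame_len=200))
print(decisions)
```

In the toy input only the four frames covering the tone burst exceed the threshold. On real noisy reverberant speech, the envelope would first need the restoration stages that the abstract describes before a threshold like this becomes reliable.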
Speech Communication | 2016
Yang Liu; Naushin Nower; Shota Morita; Masashi Unoki
We introduced a restoration scheme for instantaneous amplitude and phase in noisy reverberant environments. We dealt with the summation of additive noise and late reverberant speech in the Kalman filter and removed the early reflection effect by CMN. Objective and subjective experiments revealed that the proposed method can improve both quality and intelligibility of speech in noisy reverberant environments. The results of ASR experiments showed that the proposed method outperforms the conventional methods in noisy reverberant environments. We previously proved that restoring the instantaneous amplitude as well as the instantaneous phase on the output of a Gammatone filterbank plays a significant role in speech enhancement. However, dereverberation is still a challenge since the previously proposed scheme can only work in noisy environments. In this paper, we extend our previously proposed scheme to general speech enhancement, removing the effects of both noise and reverberation by restoring instantaneous amplitude and phase simultaneously. Objective and subjective experiments were conducted under various noisy reverberant conditions to evaluate the effectiveness of the extended scheme. The signal-to-error ratio (SER), correlation, PESQ, and SNR loss were used in objective evaluations. The normalized mean preference score and correctness in the modified rhyme test (MRT) were used in subjective evaluations. We also tested how effective our proposed scheme is as a front-end for an automatic speech recognition (ASR) system in realistic noisy reverberant environments. The results of all evaluations revealed that the proposed scheme could effectively improve quality and intelligibility of speech signals under noisy reverberant conditions.
International Symposium on Chinese Spoken Language Processing | 2016
Yang Liu; Naushin Nower; Shota Morita; Masashi Unoki
This paper proposes a robust front-end for speech applications based on a restoration scheme for instantaneous amplitude and phase. Typical applications such as hearing aids and automatic speech recognition systems still face challenges with regard to robustness against noise and reverberation. The proposed front-end combines our previously proposed method for restoring instantaneous amplitude and phase on a Gammatone filterbank with cepstral mean normalization (CMN). The first method can remove late reverberation and additive noise components from the observed speech, while the second method can remove the early reflection. In this paper, we compared the proposed method with other typical methods as a robust front-end for speech recognition by humans and machines in noisy reverberant environments. Modified rhyme tests and word recognition tests were carried out to assess speech recognition by humans and machines, respectively. The results of both evaluations revealed that the proposed front-end could effectively improve the correctness in speech intelligibility tests and the word recognition rate in noisy reverberant environments. In addition, phase information was found to greatly improve the quality and intelligibility of speech.
International Symposium on Chinese Spoken Language Processing | 2014
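CMN, as named in this abstract, is commonly implemented by subtracting the per-coefficient mean over time from a sequence of cepstral feature vectors. A short convolutional distortion becomes an approximately constant offset in the cepstral domain, so removing the mean suppresses it. The sketch below shows only that generic operation; the function name and the toy feature values are hypothetical and are not taken from the paper.

```python
def cepstral_mean_normalization(frames):
    """Subtract the per-coefficient mean over time from each cepstral frame.

    `frames` is a list of equal-length lists, one list of cepstral
    coefficients per time frame.
    """
    if not frames:
        return []
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Toy features: three frames with two coefficients each (made-up values).
cepstra = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
normalized = cepstral_mean_normalization(cepstra)
print(normalized)
```

After normalization each coefficient track has zero mean, so any offset shared by all frames is removed.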
Shota Morita; Xugang Lu; Masashi Unoki
Estimates of the signal-to-noise ratio (SNR) of speech play an important role in noise reduction and in predictions of speech intelligibility based on the speech transmission index (STI). Voice activity detection (VAD) techniques must be used explicitly or implicitly during SNR estimation to detect speech and non-speech sections. In most studies, the decision threshold that VAD uses to classify speech and non-speech during SNR estimation has been fixed. We argue that fixing the decision threshold for all testing conditions is not optimal for controlling the false acceptance and miss detection rates of speech. In this paper, we propose SNR estimation using a speech and non-speech detection algorithm based on optimizing the trade-off between false speech acceptance and miss detection rates on a receiver operating characteristic (ROC) curve. Rather than fixing the decision threshold in VAD for all SNR conditions, we optimally estimate the decision threshold using an ROC curve for each SNR condition. Thresholds are optimized in subband signals on a large training data set composed of various SNR conditions and noise types. After speech and non-speech are detected, SNR is estimated by combining the subband powers of speech and noise from all subbands. We evaluated the proposed SNR estimation method on the AURORA2J and NOISEX-92 corpora. The experimental results demonstrated that the proposed method was more accurate than the classical method of estimating SNR. The proposed approach could be used in robust VAD and STI estimates.
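The idea of choosing a decision threshold from the ROC trade-off and then estimating SNR from the detected frames can be sketched as follows. This is a single-band toy under stated assumptions, not the paper's algorithm: the cost being minimized (false-acceptance rate plus miss rate), the function names, and the frame powers are all hypothetical, and the subband processing described in the abstract is left out.

```python
import math

def pick_threshold(powers, is_speech):
    """Sweep candidate thresholds over labeled training frames and return the
    one minimizing false-acceptance rate + miss rate (an assumed criterion).

    Assumes the labels contain at least one speech and one non-speech frame.
    """
    n_speech = sum(is_speech)
    n_noise = len(is_speech) - n_speech
    best_thr, best_cost = None, float("inf")
    for thr in sorted(set(powers)):
        false_accepts = sum(p >= thr and not s for p, s in zip(powers, is_speech))
        misses = sum(p < thr and s for p, s in zip(powers, is_speech))
        cost = false_accepts / n_noise + misses / n_speech
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr

def estimate_snr_db(powers, threshold):
    """Estimate SNR from frame powers split by the threshold.

    Frames at or above the threshold are treated as speech plus noise, so the
    noise power is subtracted before taking the ratio. Assumes both groups
    are non-empty.
    """
    speech = [p for p in powers if p >= threshold]
    noise = [p for p in powers if p < threshold]
    noise_power = sum(noise) / len(noise)
    speech_power = max(sum(speech) / len(speech) - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)

# Made-up frame powers with ground-truth labels (True marks speech).
powers = [1.0, 1.2, 0.8, 11.0, 9.0, 1.1, 13.0, 0.9]
labels = [False, False, False, True, True, False, True, False]

threshold = pick_threshold(powers, labels)
snr_db = estimate_snr_db(powers, threshold)
print(threshold, round(snr_db, 2))
```

Repeating the threshold search separately for each SNR condition, as the abstract describes, would amount to calling `pick_threshold` on training data grouped by condition.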
Journal of the Acoustical Society of America | 2012
Shota Morita; Masashi Unoki; Xugang Lu; Yang Liu; Masato Akagi; Ruediger Hoffmann
The concept of the modulation transfer function (MTF) can be successfully applied to evaluating the quality of speech transmission in room acoustics (noisy reverberant environments) as functions of reverberation (reverberation time) and additive noise (signal-to-noise ratio) (Houtgast and Steeneken, J. Acoust. Soc. Am., 77, 1069-1077, 1985). This paper proposes a method of restoring the power envelope from noisy reverberant speech based on the MTF concept. The proposed method can enhance speech without measuring the impulse response or the noise conditions of the room. The proposed approach suppresses the effects of reverberation and noise on the power envelopes by restoring the smeared MTF. We carried out extensive simulations of noise suppression and dereverberation on noisy reverberant speech to objectively evaluate the proposed method. The results revealed that the proposed method could suppress noise and reverberation simultaneously. We further tested the pr...
Conference of the International Speech Communication Association | 2011
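The dependence of the MTF on reverberation time and SNR mentioned here is usually written, following Houtgast and Steeneken, as m(f_m) = [1 + (2π f_m T_R / 13.8)^2]^(-1/2) · [1 + 10^(-SNR/10)]^(-1), which assumes an exponentially decaying room response and stationary additive noise. A minimal sketch of that standard formula, with hypothetical function and parameter names, is given below; it shows the forward model only, not the restoration method proposed in the paper.

```python
import math

def mtf(mod_freq_hz, reverb_time_s, snr_db):
    """Modulation transfer function for reverberation plus stationary noise.

    reverb_term models an exponentially decaying room response with
    reverberation time reverb_time_s; noise_term models additive noise.
    """
    reverb_term = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * reverb_time_s / 13.8) ** 2
    )
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term

# Modulation reduction at a few modulation frequencies for an assumed room
# with a 1 s reverberation time and 10 dB SNR.
for f_m in (1.0, 4.0, 16.0):
    print(f_m, round(mtf(f_m, reverb_time_s=1.0, snr_db=10.0), 3))
```

The reverberation term acts as a low-pass filter on the envelope modulations, while the noise term scales all modulation frequencies by the same factor.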
Masashi Unoki; Xugang Lu; Rico Petrick; Shota Morita; Masato Akagi; Rüdiger Hoffmann
Conference of the International Speech Communication Association | 2013
Yasuaki Kanai; Shota Morita; Masashi Unoki
Journal of Signal Processing | 2014
Akikazu Miyazaki; Shota Morita; Masashi Unoki
IEICE technical report. Speech | 2011
Shota Morita; Xugang Lu; Masashi Unoki; Masato Akagi; Ruediger Hoffmann
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2016
Yang Liu; Shota Morita; Masashi Unoki
Collaboration
National Institute of Information and Communications Technology