Publication


Featured research published by Kou Tanaka.


International Conference on Acoustics, Speech, and Signal Processing | 2014

An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement

Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura

As part of a hybrid approach to improving statistical excitation prediction in electrolaryngeal (EL) speech enhancement, we implement two techniques: removing micro-prosody with low-pass filtering and avoiding unvoiced/voiced (U/V) prediction. An electrolarynx is a device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding noise to EL speech. To address these issues, in our previous work, we proposed a hybrid method that uses noise reduction to enhance spectral parameters and voice conversion to predict excitation parameters. In this paper, we evaluate the effect of removing micro-prosody with low-pass filtering and avoiding U/V prediction in the hybrid enhancement process.
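The two techniques named in this abstract can be pictured with a small numpy sketch. It assumes a frame-level F0 contour in Hz with 0 marking unvoiced frames. It first interpolates log-F0 through unvoiced frames, which is one common way to obtain a continuous contour so that no U/V decision has to be predicted, and then removes fast fluctuations (micro-prosody) with an ideal low-pass filter applied in the frequency domain. The function names, the 200 Hz frame rate (5 ms shift) and the 10 Hz cutoff are illustrative assumptions, not the settings or filter used in the paper.

```python
import numpy as np


def continuous_log_f0(f0: np.ndarray) -> np.ndarray:
    """Interpolate log-F0 linearly through unvoiced frames (f0 == 0).

    The result is defined at every frame, so no U/V flag is needed.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("contour has no voiced frames")
    idx = np.arange(len(f0))
    lf0 = np.empty_like(f0)
    lf0[voiced] = np.log(f0[voiced])
    # np.interp holds the first/last voiced value constant at the edges.
    lf0[~voiced] = np.interp(idx[~voiced], idx[voiced], lf0[voiced])
    return lf0


def lowpass(x: np.ndarray, frame_rate_hz: float, cutoff_hz: float) -> np.ndarray:
    """Ideal (brick-wall) low-pass filter applied via the real FFT.

    Treats the contour as periodic, so the edges may ring slightly.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / frame_rate_hz)
    spectrum[freqs > cutoff_hz] = 0.0  # discard components above the cutoff
    return np.fft.irfft(spectrum, n=len(x))


if __name__ == "__main__":
    frame_rate = 200.0  # assumed 5 ms frame shift
    t = np.arange(400) / frame_rate
    # Synthetic contour: slow 1 Hz movement plus 30 Hz jitter, with an unvoiced gap.
    f0 = 120.0 * np.exp(0.1 * np.sin(2 * np.pi * 1.0 * t)
                        + 0.02 * np.sin(2 * np.pi * 30.0 * t))
    f0[150:180] = 0.0
    smooth = lowpass(continuous_log_f0(f0), frame_rate, cutoff_hz=10.0)
    print(np.exp(smooth[:5]))
```

A brick-wall FFT filter keeps the sketch short; a practical real-time system would more likely use a causal FIR or IIR filter, since the FFT approach needs the whole utterance.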


International Conference on Acoustics, Speech, and Signal Processing | 2016

Statistical F0 prediction for electrolaryngeal speech enhancement considering generative process of F0 contours within product of experts framework

Kou Tanaka; Hirokazu Kameoka; Tomoki Toda; Satoshi Nakamura

We have previously proposed a statistical fundamental frequency (F0) prediction method that makes it possible to predict the underlying F0 contour of electrolaryngeal (EL) speech from its spectral feature sequence. Although this method was shown to contribute to improving the naturalness of EL speech as a whole, the predicted F0 contour was still unnatural compared with that in normal speech. One possible way to improve the naturalness of the predicted F0 contours would be to take into account the physical mechanism of vocal phonation. Recently, a statistical model of voice F0 contours was formulated by constructing a stochastic counterpart of the Fujisaki model, a well-founded mathematical model representing the control mechanism of vocal fold vibration. This paper proposes a Product-of-Experts model to incorporate this generative model of voice F0 contours into the statistical F0 prediction model. Based on the constructed model, we derive algorithms for parameter training and F0 prediction. Experimental results revealed that the proposed method outperformed our previously proposed method in terms of the naturalness of the predicted F0 contours.
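For readers unfamiliar with the Fujisaki model, this sketch computes its standard deterministic formulation: log-F0 is a baseline ln(Fb) plus phrase components (impulse responses of a critically damped second-order system) and accent components (step responses capped at a ceiling gamma). It does not implement the stochastic counterpart or the Product-of-Experts model proposed in the paper. The defaults alpha = 3.0 /s, beta = 20.0 /s and gamma = 0.9 are values commonly used in the literature and are assumptions here, as are all function names and the command-list format.

```python
import numpy as np

# Standard Fujisaki model (deterministic form):
#   log F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
#                     + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]


def phrase_response(t, alpha=3.0):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)  # clip so exp() never sees large positive arguments
    return np.where(t >= 0, alpha ** 2 * tp * np.exp(-alpha * tp), 0.0)


def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    g = np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)
    return np.where(t >= 0, g, 0.0)


def fujisaki_log_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Natural-log F0 contour at times t (seconds).

    phrases: list of (T0, Ap) phrase commands (hypothetical input format)
    accents: list of (T1, T2, Aa) accent commands (onset, offset, amplitude)
    """
    t = np.asarray(t, dtype=float)
    lf0 = np.full_like(t, np.log(fb))
    for t0, ap in phrases:
        lf0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        lf0 += aa * (accent_response(t - t1, beta, gamma)
                     - accent_response(t - t2, beta, gamma))
    return lf0


if __name__ == "__main__":
    t = np.arange(0.0, 2.5, 0.005)  # 5 ms frames (assumed)
    # One phrase command at 0.0 s, one accent command from 0.4 s to 0.9 s (hypothetical).
    lf0 = fujisaki_log_f0(t, fb=80.0, phrases=[(0.0, 0.5)], accents=[(0.4, 0.9, 0.35)])
    print(np.exp(lf0).max())
```

Because every component is added in the log domain, F0 itself is a product of a baseline and component factors; this is the structure the stochastic version turns into a generative model.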


Conference on Computers and Accessibility | 2015

An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction

Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura

An electrolarynx is a speaking aid device that mechanically generates excitation sounds to help laryngectomees produce electrolaryngeal (EL) speech. Although EL speech is quite intelligible, its naturalness suffers from the monotonous fundamental frequency patterns of the mechanical excitation sounds. To make it possible to generate more natural excitation sounds, we have proposed a method to automatically control the fundamental frequency of the sounds generated by the electrolarynx based on a statistical prediction model, which predicts the fundamental frequency patterns from the produced EL speech in real time. In this paper, we develop a prototype system by implementing the proposed control method in an actual, physical electrolarynx and evaluate its performance.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

An evaluation of target speech for a nonaudible murmur enhancement system in noisy environments

Sakura Tsuruta; Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura

Nonaudible murmur (NAM) is a soft whispered voice recorded with a NAM microphone through body conduction. NAM allows for silent speech communication because it enables speakers to convey their message in a nonaudible voice. However, its intelligibility and naturalness are significantly degraded compared with those of natural speech owing to acoustic changes caused by body conduction. To address this issue, statistical voice conversion (VC) methods from NAM to normal speech (NAM-to-Speech) and to a whispered voice (NAM-to-Whisper) have been proposed. It has been reported that these NAM enhancement methods significantly improve the speech quality and intelligibility of NAM, and that NAM-to-Whisper is more effective than NAM-to-Speech. However, it is still not clear which method is more effective when a listener hears the enhanced speech in noisy environments, a situation that often arises in silent speech communication. In this paper, assuming a typical situation in which NAM is uttered by a speaker in a quiet environment and conveyed to a listener in a noisy environment, we investigate which kinds of target speech are more effective for NAM enhancement. We also propose NAM enhancement methods that convert NAM to other types of voiced target speech. Experiments show that conversion into voiced speech is more effective than conversion into unvoiced speech for generating more intelligible speech in noisy environments.


European Signal Processing Conference | 2016

Real-time vibration control of an electrolarynx based on statistical F0 contour prediction

Kou Tanaka; Tomoki Toda; Graham Neubig; Satoshi Nakamura

An electrolarynx is a speaking aid device that artificially generates excitation sounds to help laryngectomees produce electrolaryngeal (EL) speech. Although EL speech is quite intelligible, its naturalness suffers significantly from the unnatural fundamental frequency (F0) patterns of the mechanical excitation sounds. To make it possible to produce more natural-sounding EL speech, we have proposed a method to automatically control the F0 patterns of the excitation sounds generated by the electrolarynx based on statistical F0 prediction, which predicts F0 patterns from the produced EL speech in real time. In our previous work, we developed a prototype system by implementing the proposed real-time prediction method in an actual, physical electrolarynx, and through the use of the prototype system, we found that the improvements in the naturalness of EL speech yielded by the prototype system tend to be smaller than those yielded by batch-type prediction. In this paper, we examine the negative impacts of real-time prediction latency on F0 prediction accuracy, and to alleviate them, we propose two methods: 1) modeling of segmented continuous F0 (CF0) patterns and 2) prediction of forthcoming F0 values. The experimental results demonstrate that 1) the conventional real-time prediction method needs a large delay to predict CF0 patterns and 2) the proposed methods have positive impacts on real-time prediction.
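The latency problem described in this abstract can be illustrated with a toy numpy sketch: delaying an F0 contour by a few frames raises its error against the reference, measured here as RMSE in cents. The sketch also shows one generic way to frame "prediction of forthcoming F0 values": pairing input features at frame t with the F0 target at frame t + L, so that a regression model trained on those pairs learns to look ahead. All function names, the cents metric and the target-shifting formulation are illustrative assumptions; the paper's statistical model and its segmented CF0 modeling are not reproduced here.

```python
import numpy as np


def rmse_cents(f0_ref: np.ndarray, f0_est: np.ndarray) -> float:
    """RMSE in cents between two positive F0 contours given in Hz."""
    return float(np.sqrt(np.mean((1200.0 * np.log2(f0_est / f0_ref)) ** 2)))


def apply_latency(f0: np.ndarray, delay_frames: int) -> np.ndarray:
    """Simulate a predictor whose output at frame t is the value meant for t - delay.

    The first `delay_frames` outputs repeat the first value.
    """
    if not 0 <= delay_frames < len(f0):
        raise ValueError("delay must be in [0, len(f0))")
    if delay_frames == 0:
        return f0.copy()
    return np.concatenate([np.full(delay_frames, f0[0]), f0[:-delay_frames]])


def make_lookahead_pairs(features: np.ndarray, f0: np.ndarray, lookahead_frames: int):
    """Pair features at frame t with the F0 target at frame t + lookahead (hypothetical helper).

    A model trained on these pairs is asked to predict forthcoming F0 values,
    which can offset a known processing delay.
    """
    if not 0 <= lookahead_frames < len(f0):
        raise ValueError("lookahead must be in [0, len(f0))")
    n = len(f0) - lookahead_frames
    return features[:n], f0[lookahead_frames:lookahead_frames + n]


if __name__ == "__main__":
    frames = np.arange(200)
    # Synthetic contour swinging +/- 2 semitones around 120 Hz.
    f0 = 120.0 * 2.0 ** (2.0 * np.sin(2 * np.pi * frames / 100.0) / 12.0)
    for d in (0, 5, 10, 20):
        print(d, round(rmse_cents(f0, apply_latency(f0, d)), 1))
```

In this toy setup the error grows with the delay because the output lags every rise and fall of the contour; shifting the training targets forward is one simple way to let a predictor compensate for a known, fixed delay.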


Journal of the Acoustical Society of America | 2016

Evaluation of electrolarynx controlled by real-time statistical F0 prediction

Kou Tanaka; Tomoki Toda; Satoshi Nakamura

One of the major speaking methods for laryngectomees uses an electrolarynx to generate artificial excitation sounds instead of vocal fold vibration. Although electrolaryngeal (EL) speech is relatively intelligible, its naturalness is quite low owing to the artificial excitation sounds. To make it possible to produce more natural-sounding EL speech, we have proposed a method for automatically controlling the fundamental frequency (F0) patterns of the excitation sounds generated by the electrolarynx based on real-time statistical F0 prediction. In this method, the vibration of the electrolarynx that generates the excitation sounds is controlled not by additional signals consciously provided by the laryngectomee but solely by the EL speech signals they produce. In a previous report, we developed a prototype system by implementing our proposed method in the electrolarynx and evaluated its performance objectively through simulation. In this paper, we evaluate its performan...


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

An inter-speaker evaluation through simulation of electrolarynx control based on statistical F0 prediction

Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura

An electrolarynx is a device that artificially generates excitation sounds to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce intelligible EL speech by using this device, it sounds quite unnatural due to the mechanical excitation. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and showed that statistical prediction of excitation parameters, such as F0 patterns, was essential to significantly improving the naturalness of EL speech. Based on this result, we have also proposed a method for directly controlling the F0 patterns of excitation sounds generated by the electrolarynx based on statistical excitation prediction, which may allow EL speech enhancement to be applied to face-to-face conversation. In our previous work, this direct control method was evaluated through simulation using EL speech from only a single laryngectomee, and it was demonstrated that this method improves the naturalness of EL speech while preserving listenability. However, because the quality of EL speech depends heavily on the proficiency of each laryngectomee, it is still not clear whether these methods generalize to other speakers. In addition, while previous work evaluated only naturalness and listenability, intelligibility is also an important factor that has not been evaluated. In this paper, we apply the direct control method to multiple speakers, consisting of two real laryngectomees and one non-laryngectomee, and evaluate its performance through simulations in terms of naturalness, listenability, and intelligibility. The experimental results demonstrate that the proposed method yields significant improvements in the naturalness of EL speech for multiple laryngectomees while maintaining listenability and intelligibility.


IEICE Transactions on Information and Systems | 2014

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura


Conference of the International Speech Communication Association | 2013

A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Spectral Subtraction and Statistical Voice Conversion

Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura


Conference of the International Speech Communication Association | 2015

Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments

Yusuke Tajiri; Kou Tanaka; Tomoki Toda; Graham Neubig; Sakriani Sakti; Satoshi Nakamura

Collaboration



Top Co-Authors

Satoshi Nakamura

Nara Institute of Science and Technology

Sakriani Sakti

Nara Institute of Science and Technology

Graham Neubig

Carnegie Mellon University

Sakura Tsuruta

Nara Institute of Science and Technology
