Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Keiichi Tokuda is active.

Publication


Featured research published by Keiichi Tokuda.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Statistical Parametric Speech Synthesis

Alan W. Black; Heiga Zen; Keiichi Tokuda

This paper gives a general overview of techniques in statistical parametric speech synthesis. One of these techniques, HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable synthetic speech. This paper also contrasts these techniques with the more conventional unit-selection technology that has dominated speech synthesis over the last ten years. Advantages and disadvantages of statistical parametric synthesis are highlighted, and we identify where we expect the key developments to appear in the immediate future.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2000

Speech parameter generation algorithms for HMM-based speech synthesis

Keiichi Tokuda; Takayoshi Yoshimura; Takashi Masuko; Takao Kobayashi; Tadashi Kitamura

This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where the state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.
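The dynamic features mentioned above are typically regression coefficients computed over a short window of static frames. A minimal NumPy sketch of appending such deltas to a static trajectory (the window weights and boundary clamping here are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def delta_features(c, win=(-0.5, 0.0, 0.5)):
    """Append delta features to a T x D static trajectory.

    `win` is an assumed first-order regression window; boundary
    frames are clamped rather than padded.
    """
    T, _ = c.shape
    d = np.zeros_like(c)
    for t in range(T):
        for j, w in enumerate(win):
            tau = min(max(t + j - 1, 0), T - 1)  # clamp at the edges
            d[t] += w * c[tau]
    return np.hstack([c, d])  # observation vector o_t = [c_t, delta c_t]
```

The HMM observation vectors in the paper concatenate the static vector with such delta (and often delta-delta) terms; the generation algorithm then recovers a static trajectory consistent with all of them.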


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Tomoki Toda; Alan W. Black; Keiichi Tokuda

In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.
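The conventional frame-by-frame MMSE mapping that the paper improves upon can be sketched as follows. This is an illustrative NumPy version with hypothetical model shapes (mixture weights `w`, per-mixture means and covariance blocks of the joint source-target density), not the authors' implementation:

```python
import numpy as np

def log_gauss(x, mu, S):
    # log N(x; mu, S) with a full covariance matrix
    d = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

def mmse_convert(x, w, mu_x, mu_y, S_xx, S_yx):
    """Map one source frame x to a target frame under a joint GMM.

    w: (M,) mixture weights; mu_x, mu_y: (M, D) means;
    S_xx, S_yx: (M, D, D) covariance blocks of the joint density.
    """
    M = len(w)
    # posterior p(m | x) over mixture components
    logp = np.log(w) + np.array([log_gauss(x, mu_x[m], S_xx[m]) for m in range(M)])
    p = np.exp(logp - logp.max())
    p /= p.sum()
    # MMSE estimate: posterior-weighted conditional means E[y | x, m]
    y = np.zeros(mu_y.shape[1])
    for m in range(M):
        y += p[m] * (mu_y[m] + S_yx[m] @ np.linalg.solve(S_xx[m], x - mu_x[m]))
    return y
```

The paper's contribution replaces this per-frame estimate with a maximum-likelihood trajectory over static and dynamic features, further regularized by a global-variance term.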


IEICE Transactions on Information and Systems | 2007

A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis

Tomoki Toda; Keiichi Tokuda

This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm yields considerable improvements in the naturalness of synthetic speech.
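The global variance (GV) of a generated trajectory, and the Gaussian GV log-likelihood that serves as the anti-over-smoothing penalty, can be sketched as follows (a single-Gaussian, diagonal GV model is assumed, as in the paper; shapes are illustrative):

```python
import numpy as np

def global_variance(c):
    # per-dimension variance of a T x D trajectory over time
    return ((c - c.mean(axis=0)) ** 2).mean(axis=0)

def gv_log_likelihood(c, gv_mean, gv_prec):
    # log-likelihood of the trajectory's GV under a diagonal Gaussian
    # (up to an additive constant); an over-smoothed trajectory has a
    # GV below the trained gv_mean and is therefore penalized
    diff = global_variance(c) - gv_mean
    return -0.5 * diff @ (gv_prec * diff)
```

In generation, this term is weighted against the HMM output likelihood and the combined objective is maximized over the static trajectory.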


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1992

An adaptive algorithm for mel-cepstral analysis of speech

Toshiaki Fukada; Keiichi Tokuda; Takao Kobayashi; Satoshi Imai

The authors describe a mel-cepstral analysis method and its adaptive algorithm. In the proposed method, the authors apply the criterion used in the unbiased estimation of log spectrum to the spectral model represented by the mel-cepstral coefficients. To solve the nonlinear minimization problem involved in the method, they give an iterative algorithm whose convergence is guaranteed. Furthermore, they derive an adaptive algorithm for the mel-cepstral analysis by introducing an instantaneous estimate for the gradient of the criterion. The adaptive mel-cepstral analysis system is implemented with an IIR adaptive filter which has an exponential transfer function, and whose stability is guaranteed. The authors also present examples of speech analysis and results of an isolated word recognition experiment.
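The mel-cepstrum underlying this analysis represents the spectrum on a frequency axis warped by a first-order all-pass function. As background, the standard recursion for warping an ordinary cepstrum onto that axis (the frequency transform often called `freqt`; an illustrative NumPy sketch, not the paper's analysis algorithm itself) is:

```python
import numpy as np

def freqt(c, order, alpha):
    """Warp cepstrum c onto an all-pass (mel-like) frequency axis.

    alpha is the warping factor (commonly around 0.35 at 10 kHz
    sampling); alpha = 0 leaves the cepstrum unchanged.
    """
    d = np.zeros(order + 1)
    for i in reversed(range(len(c))):
        prev = d.copy()
        d[0] = c[i] + alpha * prev[0]
        if order >= 1:
            d[1] = (1.0 - alpha ** 2) * prev[0] + alpha * prev[1]
        for m in range(2, order + 1):
            d[m] = prev[m - 1] + alpha * (prev[m] - d[m - 1])
    return d
```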


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1995

Speech parameter generation from HMM using dynamic features

Keiichi Tokuda; Takao Kobayashi; Satoshi Imai

This paper proposes an algorithm for speech parameter generation from HMMs which include the dynamic features. The performance of speech recognition based on HMMs has been improved by introducing the dynamic features of speech. Thus we surmise that, if there is a method for speech parameter generation from HMMs which include the dynamic features, it will be useful for speech synthesis by rule. It is shown that parameter generation from HMMs using the dynamic features results in searching for the optimum state sequence and solving a set of linear equations for each possible state sequence. We derive a fast algorithm for the solution by analogy with the RLS algorithm for adaptive filtering. We also show the effect of incorporating the dynamic features with an example of speech parameter generation.
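For a fixed state sequence, the generation problem described above reduces to a weighted least-squares system: writing the static-plus-delta observations as o = W c, maximizing the Gaussian output likelihood over the static trajectory c gives (W' P W) c = W' P mu, with P the stacked precisions and mu the stacked state means. A minimal 1-D sketch with dense matrices and an assumed delta window (real implementations exploit the banded structure of W, as the paper's RLS-style algorithm does):

```python
import numpy as np

def build_window(T, win=(-0.5, 0.0, 0.5)):
    # W stacks a static row and a delta row per frame: o = W c
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                      # static coefficient
        for j, wgt in enumerate(win):          # delta regression window
            tau = min(max(t + j - 1, 0), T - 1)
            W[2 * t + 1, tau] += wgt
    return W

def ml_trajectory(mu, prec, W):
    # solve (W' P W) c = W' P mu with P = diag(prec): the static
    # trajectory maximizing the Gaussian output likelihood
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

With the delta precisions set to zero the solution degenerates to the state means frame by frame; nonzero delta precisions are what smooth the trajectory across state boundaries.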


Proceedings of the IEEE | 2013

Speech Synthesis Based on Hidden Markov Models

Keiichi Tokuda; Yoshihiko Nankaku; Tomoki Toda; Heiga Zen; Junichi Yamagishi; Keiichiro Oura

This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. This paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated over the last decades. Finally, advanced techniques for future developments are described.


Speech Communication | 2008

Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model

Tomoki Toda; Alan W. Black; Keiichi Tokuda

In this paper, we describe a statistical approach to both an articulatory-to-acoustic mapping and an acoustic-to-articulatory inversion mapping without using phonetic information. The joint probability density of an articulatory parameter and an acoustic parameter is modeled using a Gaussian mixture model (GMM) based on a parallel acoustic-articulatory speech database. We apply the GMM-based mapping using the minimum mean-square error (MMSE) criterion, which has been proposed for voice conversion, to the two mappings. Moreover, to improve the mapping performance, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method. The determination of a target parameter trajectory having appropriate static and dynamic properties is obtained by imposing an explicit relationship between static and dynamic features in the MLE-based mapping. Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping in both the articulatory-to-acoustic mapping and the inversion mapping.


IEICE Transactions on Information and Systems | 2007

A Hidden Semi-Markov Model-Based Speech Synthesis System

Heiga Zen; Keiichi Tokuda; Takashi Masuko; Takao Kobayashi; Tadashi Kitamura

A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that the use of HSMMs improves the naturalness of synthesized speech.


IEICE Transactions on Information and Systems | 2007

Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005

Heiga Zen; Tomoki Toda; Masaru Nakamura; Keiichi Tokuda

In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named the Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe new features integrated into Nitech-HTS 2005 such as STRAIGHT-based vocoding, HSMM-based acoustic modeling, and a speech parameter generation algorithm considering GV. The constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3×RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and the footprint of each voice is less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.

Collaboration


Dive into Keiichi Tokuda's collaboration.

Top Co-Authors

Yoshihiko Nankaku, Nagoya Institute of Technology
Takao Kobayashi, Tokyo Institute of Technology
Heiga Zen, Nagoya Institute of Technology
Tadashi Kitamura, Nagoya Institute of Technology
Takashi Masuko, Tokyo Institute of Technology
Keiichiro Oura, Nagoya Institute of Technology
Kei Hashimoto, Nagoya Institute of Technology
Tomoki Toda, National Institute of Information and Communications Technology
Junichi Yamagishi, National Institute of Informatics
Satoshi Imai, Tokyo Institute of Technology