
Yamato Ohtani

Nara Institute of Science and Technology



Publication

Featured research published by Yamato Ohtani.

International Conference on Acoustics, Speech, and Signal Processing | 2010

Non-parallel training for many-to-many eigenvoice conversion

Yamato Ohtani; Tomoki Toda; Hiroshi Saruwatari; Kiyohiro Shikano

This paper presents a novel method for training an eigenvoice Gaussian mixture model (EV-GMM) that makes effective use of non-parallel data sets for many-to-many eigenvoice conversion, a technique for converting an arbitrary source speaker's voice into an arbitrary target speaker's voice. In the proposed method, an initial EV-GMM is trained with the conventional method using parallel data sets consisting of a single reference speaker and multiple pre-stored speakers. Then, the initial EV-GMM is further refined using non-parallel data sets that include a larger number of pre-stored speakers, while treating the reference speaker's voices as hidden variables. The experimental results demonstrate that the proposed method yields significant quality improvements in converted speech by enabling the use of data from a larger number of pre-stored speakers.

Journal of the Acoustical Society of America | 2006

Evaluation of eigenvoice conversion based on Gaussian mixture model

Yamato Ohtani; Tomoki Toda; Hiroshi Saruwatari; Kiyohiro Shikano

Eigenvoice conversion (EVC) has been proposed as a new framework of voice conversion (VC) based on the Gaussian mixture model (GMM) [Toda et al., “Eigenvoice Conversion Based on Gaussian Mixture Model,” ICSLP, Pittsburgh, Sept. 2006]. This paper evaluates the performance of EVC in conversion from one source speaker’s voice to arbitrary target speakers’ voices. This framework trains a canonical GMM (EV-GMM) in advance using multiple parallel data sets consisting of utterance pairs of the source and many prestored target speakers. This model is adapted to a specific target speaker by estimating a small number of free parameters using a few utterances of the target speaker. This paper compares the spectral distortion between converted and target voices in EVC with that of conventional GMM-based VC when varying the amount of training data and the number of mixtures. Results show that EVC outperforms conventional VC when using small amounts of training data. EVC can effectively train a complex conversion model using the ...

Conference of the International Speech Communication Association | 2016

Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training.

Yamato Ohtani; Koichiro Mori; Masahiro Morita

This paper describes a novel method for controlling the voice quality of synthetic speech using cluster adaptive training (CAT). In this method, we model voice quality factors labeled with perceptual expressions such as “Gender,” “Age” and “Brightness.” In advance, we obtain the intensity scores of the perceptual expressions by conducting a listening test, which evaluates differences in voice quality between synthetic speech of the average voice and that of the target. Then we build perceptual expression (PE) clusters, which we call PE models (PEMs), under the conditions that the average voice model is used as the bias cluster and the PE intensity scores are employed as the CAT weights. In synthesis, we can generate controlled synthetic speech by the linear combination of PEMs and the existing speaker’s model. Subjective results demonstrate that the proposed method can control the voice qualities with PEs in many cases and that the target synthetic speech modified by PEMs achieves comparatively good speech quality.
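
The linear combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each model is reduced to a single mean vector and that control amounts to adding weighted PE cluster vectors to the speaker's mean. The function name, the vector dimensions, and all numeric values are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of combining a speaker model with PE clusters.
# All names and numbers below are assumptions made for this sketch.

def combine_models(speaker_mean, pe_cluster_means, pe_weights):
    """Return speaker_mean plus the weighted sum of the PE cluster vectors."""
    if len(pe_cluster_means) != len(pe_weights):
        raise ValueError("need exactly one weight per PE cluster")
    result = list(speaker_mean)
    for cluster, weight in zip(pe_cluster_means, pe_weights):
        if len(cluster) != len(result):
            raise ValueError("cluster and speaker vectors differ in length")
        for d, value in enumerate(cluster):
            result[d] += weight * value
    return result

# Toy 3-dimensional mean vector for an existing speaker (assumed values).
speaker_mean = [1.0, 2.0, 3.0]

# One toy vector per perceptual expression (assumed values).
pe_clusters = {
    "Gender": [0.5, 0.0, -0.5],
    "Age": [0.0, 1.0, 0.0],
    "Brightness": [0.25, 0.25, 0.25],
}

# Desired intensity for each perceptual expression (assumed values).
weights = {"Gender": 2.0, "Age": -1.0, "Brightness": 0.0}

names = list(pe_clusters)
controlled = combine_models(
    speaker_mean,
    [pe_clusters[n] for n in names],
    [weights[n] for n in names],
)
print(controlled)
```

Setting every weight to zero returns the speaker's original vector, which matches the intuition that the PE clusters only describe deviations in voice quality.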

Conference of the International Speech Communication Association | 2006