Publication


Featured research published by Kayoko Yanagisawa.


IEEE Journal of Selected Topics in Signal Processing | 2014

Building HMM-TTS Voices on Diverse Data

Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Norbert Braunschweiler; Langzhou Chen; Mark J. F. Gales; Masami Akamine

The statistical models of hidden Markov model based text-to-speech (HMM-TTS) systems are typically built using homogeneous data. It is possible to acquire data from many different sources, but combining them leads to a non-homogeneous, or diverse, dataset. This paper describes the application of average voice models (AVMs) and a novel application of cluster adaptive training (CAT) with multiple context-dependent decision trees to create HMM-TTS voices using diverse data: speech recorded in studios mixed with speech obtained from the internet. Training AVM and CAT models on diverse data yields better quality speech than training on high quality studio data alone. Tests show that CAT is able to create a voice for a target speaker with as little as 7 seconds of adaptation data; an AVM would need more data to reach the same level of similarity to the target speaker. Tests also show that CAT produces higher quality voices than AVMs irrespective of the amount of adaptation data. Lastly, it is shown that it is beneficial to model the data using multiple context-dependent decision trees.
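As a rough sketch of the CAT idea summarised above (not the paper's implementation), a target speaker is represented as a low-dimensional weight vector, and the speaker-specific model mean is a weighted combination of cluster means. All names, shapes, and the least-squares estimation step below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the CAT speaker representation, assuming K mean-parameter
# clusters of dimension D (hypothetical values). A target speaker is a point
# lambda in a low-dimensional space; the speaker-specific Gaussian mean is a
# weighted sum of the cluster mean vectors.

rng = np.random.default_rng(0)
K, D = 4, 60                      # number of clusters, feature dimension
M = rng.standard_normal((D, K))   # columns: cluster means (one is often a bias)

def cat_mean(lam: np.ndarray) -> np.ndarray:
    """Speaker-adapted mean: mu(s) = M @ lambda(s)."""
    return M @ lam

# In this simplified view, estimating lambda from a little target data reduces
# to a least-squares fit of the speaker's accumulated mean statistics (the
# real estimation is occupancy- and precision-weighted):
target_stats = rng.standard_normal(D)   # stand-in for accumulated statistics
lam, *_ = np.linalg.lstsq(M, target_stats, rcond=None)
print(cat_mean(lam).shape)              # (60,)
```

Because only the K-dimensional weight vector is estimated from the target speaker's data, very little adaptation material is needed, which is consistent with the 7-second result reported above.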


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Cluster Adaptive Training of Average Voice Models

Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Mark J. F. Gales; Yannis Stylianou

Hidden Markov model based text-to-speech systems may be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach uses linear transforms to achieve this, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target-speaker similarity than both AVM and CAT, while its speech quality lies between the two.
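A minimal sketch, under simplified single-Gaussian assumptions, of how the two schemes could compose: the CAT weight vector places the speaker in the low-dimensional space, and an AVM-style affine transform of the mean (in the spirit of MLLR-type adaptation) is applied on top. Shapes and names are hypothetical, not the paper's implementation.

```python
import numpy as np

# Sketch: combine a CAT speaker point with an AVM-style linear transform,
# for a single Gaussian mean of dimension D. Illustrative values only.

rng = np.random.default_rng(1)
K, D = 4, 60
M = rng.standard_normal((D, K))        # CAT cluster mean vectors
lam = rng.standard_normal(K)           # target speaker's CAT weight vector

# AVM adaptation applies an affine transform to the model mean:
A = np.eye(D) + 0.01 * rng.standard_normal((D, D))
b = 0.1 * rng.standard_normal(D)

mu_cat = M @ lam                       # CAT: mean from the speaker-space point
mu_combined = A @ mu_cat + b           # affine transform on top of the CAT mean

print(mu_combined.shape)               # (60,)
```

Read this way, the combination behaves like an AVM whose average voice is already placed near the target speaker by CAT, which matches the abstract's interpretation of it as a more powerful AVM.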


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Multi-stream spectral representation for statistical parametric speech synthesis

Kayoko Yanagisawa; Ranniery Maia; Yannis Stylianou

In statistical parametric speech synthesis such as Hidden Markov Model (HMM) based synthesis, one of the problems is the over-smoothing of parameters, which leads to a muffled quality in the synthesised output. In this paper, we propose an approach in which the high-frequency spectrum is modelled separately from the low-frequency spectrum. The high-frequency band, which does not carry much linguistic information, is clustered using a very large decision tree so as to generate parameters as close as possible to natural speech samples. The boundary frequency can be adjusted at synthesis time for each state. Subjective listening tests show that the proposed approach is significantly preferred over the conventional approach using a single spectrum stream. Samples synthesised using the proposed approach sound less muffled and more natural.
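A minimal sketch of the two-stream idea, assuming plain magnitude-spectrum frames and a hypothetical per-state boundary frequency; the paper's actual spectral parameterisation is not reproduced here.

```python
import numpy as np

# Sketch: split each spectral frame at a boundary frequency into a
# low-frequency stream and a high-frequency stream, model them separately,
# and recombine at synthesis time. All values are illustrative assumptions.

fft_bins = 513
freqs = np.linspace(0, 8000, fft_bins)        # Hz, assuming 16 kHz audio

def split_spectrum(spec: np.ndarray, boundary_hz: float):
    """Split one spectral frame into low- and high-frequency streams."""
    low_mask = freqs < boundary_hz
    return spec[low_mask], spec[~low_mask]

def merge_spectrum(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Recombine the streams; the boundary may differ per HMM state."""
    return np.concatenate([low, high])

frame = np.abs(np.random.default_rng(2).standard_normal(fft_bins))
low, high = split_spectrum(frame, boundary_hz=4000.0)   # per-state boundary
assert merge_spectrum(low, high).shape == frame.shape
```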


Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment | 2013

Crowdsourced Assessment of Speech Synthesis

Sabine Buchholz; Javier Latorre; Kayoko Yanagisawa


Conference of the International Speech Communication Association (Interspeech) | 2013

Photo-realistic expressive text to talking head synthesis

Vincent Wan; Robert Anderson; A. Blokland; Norbert Braunschweiler; Langzhou Chen; BalaKrishna Kolluru; Javier Latorre; Ranniery Maia; Björn Stenger; Kayoko Yanagisawa; Yannis Stylianou; Masami Akamine; Mark J. F. Gales; Roberto Cipolla


Computer Vision and Image Understanding | 2016

Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions

Sarah Cassidy; Björn Stenger; L. Van Dongen; Kayoko Yanagisawa; Robert Anderson; Vincent Wan; Simon Baron-Cohen; Roberto Cipolla


ISCA Speech Synthesis Workshop (SSW) | 2013

Noise Robustness in HMM-TTS Speaker Adaptation

Kayoko Yanagisawa; Javier Latorre; Vincent Wan; Mark J. F. Gales; Simon King


Conference of the International Speech Communication Association (Interspeech) | 2014

Speech intonation for TTS: Study on evaluation methodology

Javier Latorre; Kayoko Yanagisawa; Vincent Wan; BalaKrishna Kolluru; Mark J. F. Gales


Patent | 2014

Synthetic audiovisual storyteller

Javier Latorre-Martinez; Vincent Wan; Balakrishna Venkata Jagannadha Kolluru; Ioannis Stylianou; Robert Arthur Blokland; Norbert Braunschweiler; Kayoko Yanagisawa; Langzhou Chen; Ranniery Maia; Robert Anderson; Björn Stenger; Roberto Cipolla; Neil Baker


Conference of the International Speech Communication Association (Interspeech) | 2014

Generating multiple-accent pronunciations for TTS using joint sequence model interpolation

BalaKrishna Kolluru; Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Mark J. F. Gales
