Kayoko Yanagisawa
Toshiba
Publications
Featured research published by Kayoko Yanagisawa.
IEEE Journal of Selected Topics in Signal Processing | 2014
Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Norbert Braunschweiler; Langzhou Chen; Mark J. F. Gales; Masami Akamine
The statistical models of hidden Markov model based text-to-speech (HMM-TTS) systems are typically built using homogeneous data. It is possible to acquire data from many different sources, but combining them leads to a non-homogeneous or diverse dataset. This paper describes the application of average voice models (AVMs) and a novel application of cluster adaptive training (CAT) with multiple context-dependent decision trees to create HMM-TTS voices using diverse data: speech data recorded in studios mixed with speech data obtained from the internet. Training AVM and CAT models on diverse data yields better quality speech than training on high quality studio data alone. Tests show that CAT can create a voice for a target speaker from as little as 7 seconds of adaptation data, whereas an AVM needs more data to reach the same level of similarity to the target speaker. Tests also show that CAT produces higher quality voices than AVMs irrespective of the amount of adaptation data. Lastly, it is shown that it is beneficial to model the data using multiple context-dependent clustering decision trees.
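To make the cluster adaptive training idea above concrete, here is a minimal Python sketch of CAT-style speaker interpolation, in which a target speaker is represented by a weight vector over cluster mean vectors; the function name, array shapes, and example weights are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def cat_mean(cluster_means, speaker_weights):
        # Cluster adaptive training (CAT) sketch: the mean vector for a target
        # speaker is a weighted sum of per-cluster mean vectors, so a speaker
        # is a point (the weight vector) in a low-dimensional space.
        # cluster_means: (num_clusters, feature_dim); speaker_weights: (num_clusters,)
        return cluster_means.T @ speaker_weights

    # Toy example: 4 clusters, 3-dimensional spectral features (hypothetical values).
    M = np.random.randn(4, 3)
    lam = np.array([0.5, 0.2, 0.2, 0.1])  # speaker weights estimated from adaptation data
    print(cat_mean(M, lam))

Estimating the weight vector requires far fewer parameters than a full model rebuild, which is consistent with the abstract's observation that a few seconds of adaptation data can suffice.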
International Conference on Acoustics, Speech, and Signal Processing | 2014
Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Mark J. F. Gales; Yannis Stylianou
Hidden Markov model based text-to-speech systems may be adapted so that the synthesised speech sounds like a particular person. The average voice model (AVM) approach uses linear transforms to achieve this, while multiple decision tree cluster adaptive training (CAT) represents different speakers as points in a low-dimensional space. This paper describes a novel combination of CAT and AVM for modelling speakers. CAT yields higher quality synthetic speech than AVMs, but AVMs model the target speaker better. The resulting combination may be interpreted as a more powerful version of the AVM. Results show that the combination achieves better target speaker similarity than both AVM and CAT, while the speech quality lies between that of AVM and CAT.
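One way to read the combination described above, as a hedged sketch only, is that CAT interpolation yields a speaker-dependent mean which an AVM-style linear transform then refines; the names, shapes, and the identity transform used as a placeholder below are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def combined_mean(cluster_means, speaker_weights, A, b):
        # Hypothetical view of the CAT + AVM combination: first place the
        # speaker in the CAT space (weighted sum of cluster means), then
        # refine the result with an AVM-style linear transform (A, b)
        # estimated from the target speaker's adaptation data.
        cat_mu = cluster_means.T @ speaker_weights  # CAT interpolation
        return A @ cat_mu + b                       # AVM-style linear transform

    # Toy numbers: 4 clusters, 3-dimensional features; identity transform as placeholder.
    M = np.random.randn(4, 3)
    lam = np.array([0.4, 0.3, 0.2, 0.1])
    A, b = np.eye(3), np.zeros(3)
    print(combined_mean(M, lam, A, b))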
International Conference on Acoustics, Speech, and Signal Processing | 2016
Kayoko Yanagisawa; Ranniery Maia; Yannis Stylianou
In statistical parametric speech synthesis such as Hidden Markov Model (HMM) based synthesis, one of the problems is in the over-smoothing of parameters, which leads to a muffled sensation in the synthesised output. In this paper, we propose an approach in which the high frequency spectrum is modelled separately from the low frequency spectrum. The high frequency band, which does not carry much linguistic information, is clustered using a very large decision tree so as to generate parameters as close as possible to natural speech samples. The boundary frequency can be adjusted at synthesis time for each state. Subjective listening tests show that the proposed approach is significantly preferred over the conventional approach using a single spectrum stream. Samples synthesised using the proposed approach sound less muffled and more natural.
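The band-splitting idea can be pictured as stitching a low-band and a high-band spectrum together at a per-state boundary chosen at synthesis time; the sketch below is a simplified illustration with assumed array layouts, not the paper's synthesis pipeline.

    import numpy as np

    def combine_bands(low_band, high_band, boundary_bin):
        # Band-split sketch: keep spectral values below the boundary from the
        # conventionally clustered low-band model, and take values at or above
        # it from the high-band model (clustered with a much larger decision
        # tree). boundary_bin is the per-state boundary chosen at synthesis time.
        assert low_band.shape == high_band.shape
        spectrum = low_band.copy()
        spectrum[boundary_bin:] = high_band[boundary_bin:]
        return spectrum

    # Toy 8-bin spectra with the boundary at bin 5 (hypothetical values).
    low = np.linspace(1.0, 0.2, 8)
    high = np.random.rand(8)
    print(combine_bands(low, high, boundary_bin=5))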
Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment | 2013
Sabine Buchholz; Javier Latorre; Kayoko Yanagisawa
Conference of the International Speech Communication Association | 2013
Vincent Wan; Robert Anderson; A. Blokland; Norbert Braunschweiler; Langzhou Chen; BalaKrishna Kolluru; Javier Latorre; Ranniery Maia; Björn Stenger; Kayoko Yanagisawa; Yannis Stylianou; Masami Akamine; Mark J. F. Gales; Roberto Cipolla
Computer Vision and Image Understanding | 2016
Sarah Cassidy; Björn Stenger; L. Van Dongen; Kayoko Yanagisawa; Robert Anderson; Vincent Wan; Simon Baron-Cohen; Roberto Cipolla
SSW | 2013
Kayoko Yanagisawa; Javier Latorre; Vincent Wan; Mark J. F. Gales; Simon King
Conference of the International Speech Communication Association | 2014
Javier Latorre; Kayoko Yanagisawa; Vincent Wan; BalaKrishna Kolluru; Mark J. F. Gales
Archive | 2014
Javier Latorre-Martinez; Vincent Wan; Balakrishna Venkata Jagannadha Kolluru; Ioannis Stylianou; Robert Arthur Blokland; Norbert Braunschweiler; Kayoko Yanagisawa; Langzhou Chen; Ranniery Maia; Robert Anderson; Björn Stenger; Roberto Cipolla; Neil Baker
Conference of the International Speech Communication Association | 2014
BalaKrishna Kolluru; Vincent Wan; Javier Latorre; Kayoko Yanagisawa; Mark J. F. Gales