
Publication


Featured research published by Yoshinori Shiga.


IEEE Journal of Selected Topics in Signal Processing | 2014

Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis

Shinnosuke Takamichi; Tomoki Toda; Yoshinori Shiga; Sakriani Sakti; Graham Neubig; Satoshi Nakamura

In this paper, we propose parameter generation methods using rich context models as yet another hybrid method combining Hidden Markov Model (HMM)-based speech synthesis and unit selection synthesis. Traditional HMM-based speech synthesis enables flexible modeling of acoustic features based on a statistical approach; however, the speech parameters tend to be excessively smoothed. To address this problem, several hybrid methods combining HMM-based speech synthesis and unit selection synthesis have been proposed. Although they significantly improve the quality of synthetic speech, they usually lose the flexibility of the original HMM-based speech synthesis. In the proposed methods, we use rich context models, which are statistical models that represent individual acoustic parameter segments. In training, the rich context models are reformulated as Gaussian Mixture Models (GMMs). In synthesis, initial speech parameters are generated from probability distributions over-fitted to individual segments, and the speech parameter sequence is iteratively generated from the GMMs using a parameter generation method based on the maximum likelihood criterion. Since the basic framework of the proposed methods remains the same as the traditional one, the capability of flexibly modeling acoustic features is retained. The experimental results demonstrate that: (1) approximation with a single Gaussian component sequence yields better synthetic speech quality than the EM algorithm in the proposed parameter generation method; (2) state-based model selection yields quality improvements at the same level as frame-based model selection; (3) the initial parameters generated from the over-fitted speech probability distributions are very effective in further improving speech quality; and (4) the proposed methods for the spectral and F0 components yield significant improvements in synthetic speech quality compared with traditional HMM-based speech synthesis.
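The generation step above rests on the standard maximum-likelihood parameter generation (MLPG) criterion: per-frame Gaussians over static and dynamic (delta) features are turned into one smooth static trajectory by solving a set of normal equations. The sketch below is a minimal illustration of that criterion, not the paper's implementation; the delta window, dimensions, and toy values are assumptions for demonstration.

```python
# Minimal sketch of maximum-likelihood parameter generation (MLPG).
# Window coefficients and the toy data below are illustrative assumptions.
import numpy as np

def mlpg(means, variances, delta=(-0.5, 0.0, 0.5)):
    """Generate a smooth static trajectory c (length T) from per-frame
    Gaussians over [static, delta] features with diagonal variances."""
    T = len(means)
    # W maps static parameters c to observed features o = W c:
    # one static row and one delta row per frame.
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                        # static row
        for k, w in enumerate(delta):            # delta row over t-1, t, t+1
            tau = min(max(t + k - 1, 0), T - 1)  # clip at the boundaries
            W[2 * t + 1, tau] += w
    P = np.diag(1.0 / np.asarray(variances, dtype=float).reshape(-1))
    mu = np.asarray(means, dtype=float).reshape(-1)
    # ML solution of the Gaussian likelihood: (W' P W) c = W' P mu
    return np.linalg.solve(W.T @ P @ W, W.T @ P @ mu)

# Toy usage: 5 frames, zero target deltas, unit variances.
means = np.column_stack([[0, 1, 2, 1, 0], np.zeros(5)])
print(mlpg(means, np.ones((5, 2))))
```

Solving the normal equations couples neighboring frames through the delta window, which is what removes frame-by-frame discontinuities while staying close to the per-frame means.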


Mobile Data Management | 2013

Multilingual Speech-to-Speech Translation System: VoiceTra

Shigeki Matsuda; Xinhui Hu; Yoshinori Shiga; Hideki Kashioka; Chiori Hori; Keiji Yasuda; Hideo Okuma; Masao Uchiyama; Eiichiro Sumita; Hisashi Kawai; Satoshi Nakamura

This study presents an overview of VoiceTra, which was developed by NICT and released as the world's first network-based multilingual speech-to-speech translation system for smartphones, and describes in detail its multilingual speech recognition, translation, and speech synthesis components in the context of field experiments. We show the effects of system updates that use the data collected from the field experiments to improve our acoustic and language models.
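As a rough illustration of the architecture the abstract describes, the sketch below chains the three server-side stages of a network-based speech-to-speech translation system (ASR, then MT, then TTS). Every name here is a hypothetical placeholder; none of it is NICT's actual API.

```python
# Minimal sketch of a network-based speech-to-speech translation
# pipeline (ASR -> MT -> TTS). All names are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class S2STResult:
    source_text: str       # recognized source-language text
    translated_text: str   # machine-translated target-language text
    audio: bytes           # synthesized target-language speech

def speech_to_speech(audio_in: bytes, src_lang: str, tgt_lang: str,
                     recognize, translate, synthesize) -> S2STResult:
    """Each callable stands in for a network service in the real
    client-server system; the smartphone client only ships audio."""
    text = recognize(audio_in, lang=src_lang)                  # multilingual ASR
    translated = translate(text, src=src_lang, tgt=tgt_lang)   # MT
    audio_out = synthesize(translated, lang=tgt_lang)          # multilingual TTS
    return S2STResult(text, translated, audio_out)
```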


Conference of the International Speech Communication Association | 2016

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework

Kentaro Tachibana; Tomoki Toda; Yoshinori Shiga; Hisashi Kawai

In this paper, we propose a model integration method for hidden Markov model (HMM) and deep neural network (DNN) based acoustic models using a product-of-experts (PoE) framework in statistical parametric speech synthesis. In speech parameter generation, a DNN predicts the mean vector of the probability density function of speech parameters frame by frame while keeping its covariance matrix constant over all frames. An HMM, on the other hand, predicts the covariance matrix as well as the mean vector, but both are fixed within an HMM state; that is, they vary only state by state. To predict a better probability density function by leveraging the advantages of the individual models, the proposed method integrates the DNN and the HMM as a PoE, generating a new probability density function that satisfies the conditions of both. Furthermore, we propose a joint optimization method for the DNN and the HMM within the PoE framework that makes effective use of additional latent variables. We conducted objective and subjective evaluations, demonstrating that the proposed method significantly outperforms both DNN-based and HMM-based speech synthesis.
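The PoE combination of two Gaussian experts has a closed form: the product density is again Gaussian, with precision equal to the sum of the experts' precisions and a precision-weighted mean. A minimal sketch, assuming scalar per-dimension variances; the values and names are illustrative, not the paper's code.

```python
# Product of two Gaussian experts: precisions add, and the mean is the
# precision-weighted average of the experts' means. Values are toy data.
import numpy as np

def poe_gaussian(mu_dnn, var_dnn, mu_hmm, var_hmm):
    prec = 1.0 / var_dnn + 1.0 / var_hmm     # combined precision
    var = 1.0 / prec
    mu = var * (mu_dnn / var_dnn + mu_hmm / var_hmm)
    return mu, var

# DNN expert: frame-wise means with one fixed variance over all frames;
# HMM expert: mean and variance fixed within a state, varying by state.
mu_d, v_d = np.array([0.2, 0.4, 0.6]), 0.1
mu_h, v_h = np.array([0.3, 0.3, 0.3]), np.array([0.05, 0.2, 0.2])
print(poe_gaussian(mu_d, v_d, mu_h, v_h))
```

Whichever expert is more confident (smaller variance) at a given frame dominates the combined mean, which is exactly the complementarity the paper exploits.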


Conference of the International Speech Communication Association | 2016

Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure

Jinfu Ni; Yoshinori Shiga; Hisashi Kawai

Humans use expressive intonation to convey linguistic and paralinguistic meaning, in particular placing focal prominence to emphasize and highlight the focus of speech. Automatically extracting dynamic intonation features from a speech corpus and representing them in a continuous form is desirable in multilingual speech synthesis. This paper presents a method for extracting dynamic prosodic structure from the speech signal using a zero-frequency resonator to detect glottal cycle epochs and to filter both the voice amplitude and fundamental frequency (F0) contours. We choose stable voiced F0 segments that are free from micro-prosodic effects to recover the relevant F0 trajectory of an utterance, taking into consideration the inter-correlation of micro-prosody with the phonetic segments and syllable structure of the utterance, and further filter out long-term global pitch movements. The method is evaluated by objective tests on multilingual speech corpora covering Chinese, Japanese, Korean, and Myanmar. Our experimental results show that the extracted intonation contour matches the F0 contour obtained by a conventional approach with very high accuracy, and that the estimated long-term pitch movements exhibit regular characteristics of intonation across languages. The proposed method is language-independent and robust to noisy speech.
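For orientation, zero-frequency filtering for glottal epoch detection passes the signal through a cascade of resonators with poles at 0 Hz and then removes the resulting polynomial trend with repeated local-mean subtraction; epochs appear as positive-going zero crossings. The sketch below is an illustration of that general technique, not the authors' implementation; the window length and number of trend-removal passes are assumed settings (commonly near 1-2 average pitch periods).

```python
# Minimal sketch of zero-frequency filtering for glottal epoch detection.
# Window length and pass count are assumptions, not the paper's settings.
import numpy as np

def zero_frequency_epochs(x, fs, avg_pitch_hz=150.0, passes=3):
    x = np.asarray(x, dtype=float)
    x = np.diff(x, prepend=x[0])                 # remove DC bias first
    # Cascade of two zero-frequency resonators (each a double pole at
    # 0 Hz): y[n] = x[n] + 2*y[n-1] - y[n-2], applied twice.
    y = x
    for _ in range(2):
        out = np.zeros(len(y) + 2)               # two zero initial samples
        for n in range(len(y)):
            out[n + 2] = y[n] + 2.0 * out[n + 1] - out[n]
        y = out[2:]
    # Trend removal: repeatedly subtract a local mean over a window of
    # roughly 1.5 average pitch periods.
    win = max(3, int(1.5 * fs / avg_pitch_hz)) | 1   # force odd length
    kernel = np.ones(win) / win
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Epochs: positive-going zero crossings of the filtered signal.
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1

# Toy usage: a crude 120 Hz "voiced" pulse train at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
print(zero_frequency_epochs(np.sign(np.sin(2 * np.pi * 120 * t)), fs)[:5])
```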


Journal of the Acoustical Society of America | 2000

Speech synthesizing apparatus, and recording medium that stores text-to-speech conversion program and can be read mechanically

Yoshinori Shiga


Journal of the Acoustical Society of America | 2004

Clustered Patterns for Text-to-Speech Synthesis

Takehiko Kagoshima; Takaaki Nii; Shigenobu Seto; Masahiro Morita; Masami Akamine; Yoshinori Shiga


Conference of the International Speech Communication Association | 2003

Estimating the Spectral Envelope of Voiced Speech Using Multi-Frame Analysis

Yoshinori Shiga; Simon King


Conference of the International Speech Communication Association | 2003

Estimation of Voice Source and Vocal Tract Characteristics Based on Multi-Frame Analysis

Yoshinori Shiga; Simon King


Archive | 1999

Speech synthesizer and machine readable recording medium which records sentence to speech converting program

Yoshinori Shiga


Archive | 1999

Voice synthesizing method and voice synthesizer and recording medium recorded with text voice converting program

Yoshinori Shiga

Collaboration


Dive into Yoshinori Shiga's collaborations.

Top Co-Authors

Hisashi Kawai | National Institute of Information and Communications Technology
Jinfu Ni | National Institute of Information and Communications Technology
Satoshi Nakamura | Nara Institute of Science and Technology
Simon King | University of Edinburgh
Chiori Hori | Tokyo Institute of Technology
Eiichiro Sumita | National Institute of Information and Communications Technology
Kentaro Tachibana | Nara Institute of Science and Technology
Sakriani Sakti | Nara Institute of Science and Technology