Nobukatsu Hojo
University of Tokyo
Publication
Featured research published by Nobukatsu Hojo.
International Conference on Acoustics, Speech, and Signal Processing | 2017
Takuhiro Kaneko; Hirokazu Kameoka; Nobukatsu Hojo; Yusuke Ijima; Kaoru Hiramatsu; Kunio Kashino
We propose a postfilter based on a generative adversarial network (GAN) to compensate for the differences between natural speech and speech synthesized by statistical parametric speech synthesis. In particular, we focus on the differences caused by over-smoothing, which makes the sounds muffled. Over-smoothing occurs in both the time and frequency directions, and its effects are highly correlated across the two; conventional heuristic methods are too limited to cover all of these factors (e.g., global variance was designed only to recover the dynamic range). To solve this problem, we focus on “spectral texture”, i.e., the details of the time-frequency representation, and propose a learning-based postfilter that captures these structures directly from the data. To estimate the true distribution, we use a GAN composed of a generator and a discriminator: the generator is optimized to produce samples that the adversarial discriminator cannot distinguish from the dataset. This adversarial process encourages the generator to fit the true data distribution, i.e., to generate realistic spectral texture. Objective evaluation shows that the GAN-based postfilter can compensate for detailed spectral structures, including the modulation spectrum, and subjective evaluation shows that its generated speech is comparable to natural speech.
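The adversarial training described in this abstract can be illustrated with a deliberately tiny sketch. It is not the authors' implementation, which uses learned networks on spectral features. Here, as assumptions, the generator is a residual linear map over flattened time-frequency patches, the discriminator is a logistic model on per-bin energy, the "synthesized" data are scaled-down Gaussian noise standing in for over-smoothed spectra, and all names (`train_gan_postfilter`, `generator`, `disc_logit`) are hypothetical. The loop alternates the standard discriminator update with a non-saturating generator update, using hand-derived gradients.

```python
import numpy as np


def sigmoid(s):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * s))


def generator(x, W, b):
    # Residual postfilter: refined patch = input patch + learned correction.
    return x + x @ W + b


def disc_logit(y, v, c):
    # Discriminator logit on per-bin energy (y ** 2), so it can detect the
    # variance gap that a purely linear score would miss.
    return (y ** 2) @ v + c


def train_gan_postfilter(x_syn, x_nat, steps=300, lr_d=0.05, lr_g=0.05):
    n, d = x_syn.shape
    W, b = np.zeros((d, d)), np.zeros(d)  # generator parameters
    v, c = np.zeros(d), 0.0               # discriminator parameters
    for _ in range(steps):
        y_fake = generator(x_syn, W, b)
        # Discriminator step: minimize -mean log D(real) - mean log(1 - D(fake)).
        p_real = sigmoid(disc_logit(x_nat, v, c))
        p_fake = sigmoid(disc_logit(y_fake, v, c))
        g_real = -(1.0 - p_real) / len(x_nat)  # dL_D / d logit, real rows
        g_fake = p_fake / n                    # dL_D / d logit, fake rows
        v = v - lr_d * ((x_nat ** 2).T @ g_real + (y_fake ** 2).T @ g_fake)
        c = c - lr_d * (g_real.sum() + g_fake.sum())
        # Generator step, non-saturating loss: minimize -mean log D(G(x)).
        p_fake = sigmoid(disc_logit(y_fake, v, c))
        dl_dy = (-(1.0 - p_fake) / n)[:, None] * 2.0 * y_fake * v
        W = W - lr_g * (x_syn.T @ dl_dy)
        b = b - lr_g * dl_dy.sum(axis=0)
    return W, b


# Toy data (assumption): rows are flattened 8-bin x 4-frame patches; the
# "synthesized" rows have a reduced dynamic range, mimicking over-smoothing.
rng = np.random.default_rng(0)
x_nat = rng.normal(size=(256, 32))
x_syn = 0.3 * rng.normal(size=(256, 32))

W, b = train_gan_postfilter(x_syn, x_nat)
y = generator(x_syn, W, b)
print(f"variance: synthesized {x_syn.var():.3f}, "
      f"postfiltered {y.var():.3f}, natural {x_nat.var():.3f}")
```

Because this toy discriminator sees only per-bin energy, it can at most push the generator toward the natural per-bin variance, which is roughly the dynamic-range correction that global variance targets. Capturing the correlated time-frequency texture the abstract emphasizes would require a discriminator with richer features. The printed numbers are illustrative only and do not reproduce the paper's results.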
Conference of the International Speech Communication Association | 2016
Nobukatsu Hojo; Yusuke Ijima; Hideyuki Mizuno
Recent studies have shown that DNN-based speech synthesis can produce more natural speech than conventional HMM-based speech synthesis. However, it remains an open question whether synthesized speech quality can be improved by utilizing a multi-speaker speech corpus. To address this question, this paper proposes DNN-based speech synthesis using speaker codes, a simple method for improving on the conventional speaker-dependent DNN-based method. To model speaker variation in the DNN, an augmented feature (the speaker code) is fed to the hidden layer(s) of the conventional DNN. The proposed method trains the connection weights of the whole DNN using a multi-speaker speech corpus. When synthesizing a speech parameter sequence, a target speaker is chosen from the corpus, and the speaker code corresponding to that speaker is fed to the DNN to generate the speaker’s voice. We investigated the relationship between prediction performance and DNN architecture by changing the hidden layer into which the speaker codes are input. Experimental results showed that the proposed model outperformed the conventional speaker-dependent DNN when the model architecture was optimized for the amount of training data available for the target speaker.
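The speaker-code architecture in this abstract can be sketched as a forward pass. This is an illustration under assumptions, not the authors' code: the class `SpeakerCodeDNN`, its `code_layer` parameter, the layer sizes, the tanh activations, and the random (untrained) weights are all hypothetical. The one-hot speaker code is projected into the pre-activation of one chosen hidden layer, which is equivalent to concatenating the code to that layer's input; every other weight is shared across speakers, as it would be after training on a multi-speaker corpus.

```python
import numpy as np


def one_hot(index, size):
    # Speaker code: a one-hot vector identifying a speaker in the corpus.
    code = np.zeros(size)
    code[index] = 1.0
    return code


class SpeakerCodeDNN:
    """Feed-forward acoustic model whose hidden layer `code_layer` also
    receives a speaker code (illustrative sketch with random weights)."""

    def __init__(self, in_dim, hidden_dims, out_dim, n_speakers,
                 code_layer, seed=0):
        if not 0 <= code_layer < len(hidden_dims):
            raise ValueError("code_layer must index a hidden layer")
        rng = np.random.default_rng(seed)
        dims = [in_dim, *hidden_dims, out_dim]
        # One (weight, bias) pair per layer, shared by all speakers.
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k))
                        for m, k in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(k) for k in dims[1:]]
        # Extra weights mapping the speaker code into the chosen hidden layer.
        self.code_weights = rng.normal(
            0.0, 0.1, size=(n_speakers, hidden_dims[code_layer]))
        self.n_speakers = n_speakers
        self.code_layer = code_layer

    def forward(self, x, speaker):
        code = one_hot(speaker, self.n_speakers)
        h = x
        last = len(self.weights) - 1
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            pre = h @ W + b
            if i == self.code_layer:
                # Same as feeding [h, code] through a larger weight matrix.
                pre = pre + code @ self.code_weights
            h = pre if i == last else np.tanh(pre)  # linear output layer
        return h


model = SpeakerCodeDNN(in_dim=10, hidden_dims=[16, 16, 16], out_dim=4,
                       n_speakers=3, code_layer=1)
frames = np.random.default_rng(1).normal(size=(5, 10))
out_a = model.forward(frames, speaker=0)
out_b = model.forward(frames, speaker=2)
print(out_a.shape)  # (5, 4)
print(np.allclose(out_a, out_b))
```

Moving `code_layer` between 0 and `len(hidden_dims) - 1` mirrors, in miniature, the architectural variation the abstract describes; training the shared weights on multi-speaker data is omitted here.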
Conference of the International Speech Communication Association | 2014
Kento Kadowaki; Tatsuma Ishihara; Nobukatsu Hojo; Hirokazu Kameoka
Archive | 2018
Keisuke Oyamada; Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo; Hiroyasu Ando
SSW | 2013
Nobukatsu Hojo; Kota Yoshizato; Hirokazu Kameoka; Daisuke Saito; Shigeki Sagayama
arXiv: eess.AS | 2018
Kou Tanaka; Takuhiro Kaneko; Nobukatsu Hojo; Hirokazu Kameoka
arXiv: Sound | 2018
Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo
arXiv: Machine Learning | 2018
Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo
IEICE Transactions on Information and Systems | 2018
Nobukatsu Hojo; Yusuke Ijima; Hideyuki Mizuno
Conference of the International Speech Communication Association | 2017
Yusuke Ijima; Nobukatsu Hojo; Ryo Masumura; Taichi Asami