Publication


Featured research published by Nobukatsu Hojo.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Generative adversarial network-based postfilter for statistical parametric speech synthesis

Takuhiro Kaneko; Hirokazu Kameoka; Nobukatsu Hojo; Yusuke Ijima; Kaoru Hiramatsu; Kunio Kashino

We propose a postfilter based on a generative adversarial network (GAN) to compensate for the differences between natural speech and speech synthesized by statistical parametric speech synthesis. In particular, we focus on the differences caused by over-smoothing, which makes the sounds muffled. Over-smoothing occurs in the time and frequency directions and is highly correlated in both directions, and conventional methods based on heuristics are too limited to cover all the factors (e.g., global variance was designed only to recover the dynamic range). To solve this problem, we focus on “spectral texture”, i.e., the details of the time-frequency representation, and propose a learning-based postfilter that captures the structures directly from the data. To estimate the true distribution, we utilize a GAN composed of a generator and a discriminator. This optimizes the generator to produce samples imitating the dataset according to the adversarial discriminator. This adversarial process encourages the generator to fit the true data distribution, i.e., to generate realistic spectral texture. Objective evaluation of experimental results shows that the GAN-based postfilter can compensate for detailed spectral structures including modulation spectrum, and subjective evaluation shows that its generated speech is comparable to natural speech.
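The adversarial objective described above can be illustrated with a minimal sketch. The abstract does not state the exact loss used, so this assumes the standard GAN cross-entropy objective with a non-saturating generator loss; the function names `discriminator_loss` and `generator_loss` are hypothetical, and the inputs are discriminator output probabilities rather than real spectral features.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Assumed standard GAN discriminator loss.

    d_real: discriminator probabilities D(x) for natural-speech samples.
    d_fake: discriminator probabilities D(G(z)) for postfiltered synthetic samples.
    """
    real_term = sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Assumed non-saturating generator loss.

    The loss falls as the discriminator rates generated samples as more realistic.
    """
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```

In this sketch the discriminator loss shrinks when natural samples score near 1 and generated samples score near 0, while the generator loss shrinks when generated samples score near 1, which is the opposing pressure the abstract refers to as the adversarial process.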


Conference of the International Speech Communication Association | 2016

An Investigation of DNN-Based Speech Synthesis Using Speaker Codes

Nobukatsu Hojo; Yusuke Ijima; Hideyuki Mizuno

Recent studies have shown that DNN-based speech synthesis can produce more natural synthesized speech than conventional HMM-based speech synthesis. However, an open problem remains as to whether the synthesized speech quality can be improved by utilizing a multi-speaker speech corpus. To address this problem, this paper proposes DNN-based speech synthesis using speaker codes as a simple method to improve the performance of the conventional speaker-dependent DNN-based method. In order to model speaker variation in the DNN, an augmented feature (the speaker code) is fed to the hidden layer(s) of the conventional DNN. The proposed method trains connection weights of the whole DNN using a multi-speaker speech corpus. When synthesizing a speech parameter sequence, a target speaker is chosen from the corpus and the speaker code corresponding to the selected target speaker is fed to the DNN to generate the speaker’s voice. We investigated the relationship between the prediction performance and architecture of the DNNs by changing the hidden layer to which the speaker codes are input. Experimental results showed that the proposed model outperformed the conventional speaker-dependent DNN when the model architecture was chosen optimally for the amount of training data of the selected target speaker.
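The idea of feeding a speaker code into a chosen hidden layer can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the speaker code is assumed to be a one-hot vector, the weights are random and untrained, and the names `one_hot`, `build_network`, `forward`, and the `code_layer` parameter are hypothetical.

```python
import math
import random


def one_hot(index, size):
    """Assumed speaker-code format: a one-hot vector over corpus speakers."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_network(n_linguistic, n_hidden, n_layers, n_out, n_speakers,
                  code_layer, seed=0):
    """Create random weights; layer `code_layer` gets extra inputs for the code."""
    rng = random.Random(seed)
    layers = []
    n_in = n_linguistic
    for k in range(n_layers + 1):  # hidden layers followed by the output layer
        extra = n_speakers if k == code_layer else 0
        n_units = n_out if k == n_layers else n_hidden
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + extra)]
                   for _ in range(n_units)]
        layers.append((weights, [0.0] * n_units))
        n_in = n_units
    return layers


def forward(layers, linguistic, speaker_code, code_layer):
    """Map linguistic features to speech parameters for one speaker."""
    h = list(linguistic)
    last = len(layers) - 1
    for k, (weights, biases) in enumerate(layers):
        if k == code_layer:
            h = h + speaker_code  # concatenate the code to this layer's input
        h = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if k != last:
            h = [math.tanh(v) for v in h]  # hidden layers use tanh; output is linear
    return h


# Same linguistic input, two different target speakers.
layers = build_network(5, 4, 2, 3, n_speakers=3, code_layer=1)
out_a = forward(layers, [0.1] * 5, one_hot(0, 3), code_layer=1)
out_b = forward(layers, [0.1] * 5, one_hot(2, 3), code_layer=1)
```

Changing `code_layer` moves the point at which the speaker code enters the network, which corresponds to the architectural choice the abstract says was varied in the experiments.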


Conference of the International Speech Communication Association | 2014

Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours

Kento Kadowaki; Tatsuma Ishihara; Nobukatsu Hojo; Hirokazu Kameoka


Archive | 2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms.

Keisuke Oyamada; Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo; Hiroyasu Ando


SSW | 2013

Text-to-speech synthesizer based on combination of composite wavelet and hidden Markov models

Nobukatsu Hojo; Kota Yoshizato; Hirokazu Kameoka; Daisuke Saito; Shigeki Sagayama


arXiv: eess.AS | 2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks.

Kou Tanaka; Takuhiro Kaneko; Nobukatsu Hojo; Hirokazu Kameoka


arXiv: Sound | 2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.

Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo


arXiv: Machine Learning | 2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder.

Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo


IEICE Transactions on Information and Systems | 2018

DNN-Based Speech Synthesis Using Speaker Codes

Nobukatsu Hojo; Yusuke Ijima; Hideyuki Mizuno


Conference of the International Speech Communication Association | 2017

Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.

Yusuke Ijima; Nobukatsu Hojo; Ryo Masumura; Taichi Asami

Collaboration


Dive into Nobukatsu Hojo's collaborations.

Top Co-Authors

Yusuke Ijima

Tokyo Institute of Technology

Kou Tanaka

Nara Institute of Science and Technology

Kaoru Hiramatsu

Nippon Telegraph and Telephone
