Publication


Featured research published by Kenji Nagamatsu.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Scalable Implementation of Unit Selection Based Text-to-Speech System for Embedded Solutions

Nobuo Nukaga; Ryota Kamoshida; Kenji Nagamatsu; Yoshinori Kitahara

In this paper we propose two methods for implementing a unit selection-based text-to-speech engine on resource-limited embedded systems. While we have improved the quality of synthesized speech with unit selection-based text-to-speech technology, there is a practical trade-off between the size of the database and the quality of the synthesized speech: generating highly natural-sounding voices requires a large database and expensive computation, yet the text-to-speech system must meet the specification of the target system. To address this problem, we introduce frequency-based approaches to reduce the size of the speech database. The experimental results showed that the step-by-step downsizing method was better than the direct one in terms of cumulative join cost and target cost. Furthermore, several techniques were introduced and evaluated in order to implement our text-to-speech engine on an embedded system. The experiments showed that the run-time workload for the test sentences was approximately 80 MIPS and that the implemented engine was useful and scalable for mid-class embedded systems.
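
The trade-off described above comes from the standard unit-selection formulation, in which the engine searches for the candidate unit sequence that minimizes the summed target cost plus join cost. The following is a minimal, generic sketch of that search (dynamic programming over positions, with caller-supplied cost functions), not the authors' implementation:

```python
# Generic unit-selection search: pick one candidate unit per target position so that
# the summed target cost (fit to the target spec) plus join cost (mismatch at each
# concatenation point) is minimal. Dynamic programming over positions.

def select_units(targets, candidates, target_cost, join_cost):
    # best[i][u] = (lowest cumulative cost ending at position i with unit u, backpointer)
    best = [{u: (target_cost(targets[0], u), None) for u in candidates[0]}]
    for i in range(1, len(targets)):
        layer = {}
        for u in candidates[i]:
            prev_u, cum = min(
                ((p, c + join_cost(p, u)) for p, (c, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            layer[u] = (cum + target_cost(targets[i], u), prev_u)
        best.append(layer)
    # Trace back the cheapest path.
    u, (total, _) = min(best[-1].items(), key=lambda item: item[1][0])
    path = [u]
    for i in range(len(targets) - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path)), total
```

Shrinking the database (as in the paper) reduces the candidate sets and therefore both memory and the cost of this search, at the price of higher achievable join and target costs.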


ACM Multimedia | 2018

Face-Voice Matching using Cross-modal Embeddings

Shota Horiguchi; Naoyuki Kanda; Kenji Nagamatsu

Face-voice matching is the task of finding correspondences between faces and voices. Many studies in cognitive science have confirmed the human ability to perform face-voice matching, an ability that is useful for creating natural human-machine interaction systems and in many other applications. In this paper, we propose a face-voice matching model that learns cross-modal embeddings between face images and voice characteristics. We constructed a novel FVCeleb dataset consisting of face images and utterances from 1,078 persons, selected from the MS-Celeb-1M face image dataset and the VoxCeleb audio dataset. In a two-alternative forced-choice matching task with an audio input and two same-gender face-image candidates, our model achieved 62.2% and 56.5% accuracy on FVCeleb and a subset of the GRID corpus, respectively. These results are very similar to human performance reported in cognitive science studies.
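
In a cross-modal embedding model of this kind, the two-alternative forced-choice decision reduces to comparing similarities in the shared embedding space. A minimal sketch, assuming hypothetical pretrained encoders embed_face and embed_voice (the paper's actual network is not reproduced here):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def two_afc(voice, face_a, face_b, embed_voice, embed_face):
    """Return 'A' if face_a matches the voice better than face_b, else 'B'.
    embed_voice / embed_face are assumed encoders mapping raw inputs into
    the shared embedding space."""
    v = embed_voice(voice)
    sim_a = cosine(v, embed_face(face_a))
    sim_b = cosine(v, embed_face(face_b))
    return "A" if sim_a >= sim_b else "B"

# Accuracy on a labeled 2AFC test set is then simply the fraction of trials
# where the returned choice matches the ground-truth face.
```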


European Signal Processing Conference (EUSIPCO) | 2017

Independent vector analysis with frequency range division and prior switching

Rintaro Ikeshita; Yohei Kawaguchi; Masahito Togami; Yusuke Fujita; Kenji Nagamatsu

A novel source model is developed to improve the separation performance of independent vector analysis (IVA) for speech mixtures. The source model of IVA generally assumes the same amount of statistical dependency on each pair of frequency bins, which is not effective for speech signals with strong correlations among neighboring frequency bins. In the proposed model, the set of all frequency bins is divided into frequency bands, and the statistical dependency is assumed only within each band to better represent speech signals. In addition, each source prior is switched depending on the source states, active or inactive, since intermittent silent periods have totally different priors from those of speech periods. The optimization of the model is based on an EM algorithm, in which the IVA filters, states of sources, and permutation alignments between each pair of bands are jointly optimized. The experimental results show the effectiveness of the proposed model.
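
As a rough illustration of the source-model difference (notation assumed here for exposition, not taken from the paper), standard IVA ties all frequency bins of a source together through a single spherical prior, while the band-divided model applies that dependency only within each frequency band F_k:

```latex
% Standard IVA source prior: all F frequency bins of source n are jointly dependent.
p\bigl(\mathbf{s}_n(t)\bigr) \propto \exp\!\Bigl(-\sqrt{\textstyle\sum_{f=1}^{F} |s_{n,f}(t)|^{2}}\Bigr)

% Band-divided prior (sketch): dependency is assumed only within each band F_k,
% so bins belonging to different bands are modeled as independent.
p\bigl(\mathbf{s}_n(t)\bigr) \propto \prod_{k} \exp\!\Bigl(-\sqrt{\textstyle\sum_{f \in F_k} |s_{n,f}(t)|^{2}}\Bigr)
```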


Archive | 2003

Information providing system and information providing apparatus for mobile object

Kenji Nagamatsu; Kenichi Mizuishi


Journal of the Acoustical Society of America | 2007

Voice synthesizing method and voice synthesizer performing the same

Nobuo Nukaga; Kenji Nagamatsu; Yoshinori Kitahara


Archive | 2002

Media delivery system and multimedia conversion server

Junichi Kimura; Kenji Nagamatsu; Yoshinori Suzuki


Speech Synthesis Workshop (SSW) | 2004

Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis

Nobuo Nukaga; Ryota Kamoshida; Kenji Nagamatsu


Archive | 2003

Speech Synthesizer and Speech Synthesis Program

Kenji Nagamatsu; Nobuo Nukaga


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2017

Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence

Naoyuki Kanda; Yusuke Fujita; Kenji Nagamatsu


Archive | 2012

Speech Synthesizer, Navigation Apparatus and Speech Synthesizing Method

Qinghua Sun; Kenji Nagamatsu; Yusuke Fujita
