Naohiro Tawara
Waseda University
Publication
Featured research published by Naohiro Tawara.
international conference on acoustics, speech, and signal processing | 2012
Naohiro Tawara; Tetsuji Ogawa; Shinji Watanabe; Tetsunori Kobayashi
This study aims to identify effective optimization methods for estimating parametric, fully Bayesian models in speech processing. To that end, we investigate how the choice of optimization method for the multi-scale Gaussian mixture model, which is well suited to speaker clustering, affects clustering accuracy. The Markov chain Monte Carlo (MCMC)-based method was compared with the variational Bayesian method in speaker clustering experiments. With a small amount of data, the MCMC-based method was more effective; with large-scale data (more than one million samples), the difference in clustering accuracy between the two methods decreased, and the MCMC-based method was computationally efficient.
international conference on acoustics, speech, and signal processing | 2015
Naohiro Tawara; Tetsuji Ogawa; Tetsunori Kobayashi
This paper deals with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering depends significantly on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture speaker similarity under noisy conditions. We therefore examine the effectiveness of spectral clustering on i-vector-based similarity for noise-corrupted speech, because spectral clustering can provide robustness against noise through non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvements over conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.
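The combination described in this abstract, cosine similarity followed by spectral clustering, can be illustrated with a minimal two-cluster sketch. This is not the paper's implementation: the short toy vectors stand in for i-vectors, and the exponential kernel, its width `sigma`, and the function names are assumptions made for illustration only.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def spectral_bipartition(vectors, sigma=0.1, iterations=200, seed=0):
    """Split vectors into two clusters from the second eigenvector of the
    normalized affinity matrix (hypothetical helper, illustration only)."""
    n = len(vectors)
    # Affinity: cosine similarity passed through an exponential kernel
    # (an assumed choice; it keeps all affinities strictly positive).
    affinity = [
        [math.exp((cosine_similarity(vectors[i], vectors[j]) - 1.0) / sigma)
         for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in affinity]
    # Normalized affinity D^{-1/2} A D^{-1/2}, shifted by the identity so
    # that every eigenvalue is non-negative and power iteration is safe.
    shifted = [
        [affinity[i][j] / math.sqrt(degree[i] * degree[j])
         + (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # The leading eigenvector is proportional to sqrt(degree); it carries
    # no cluster information, so it is projected out at every step.
    scale = math.sqrt(sum(degree))
    top = [math.sqrt(d) / scale for d in degree]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        proj = sum(a * b for a, b in zip(x, top))
        x = [a - proj * b for a, b in zip(x, top)]
        x = [sum(shifted[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    # The sign of each entry of the second eigenvector gives the cluster.
    return [0 if value < 0 else 1 for value in x]


# Toy "utterance" vectors: the first two point one way, the last two another.
toy_vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1],
               [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels = spectral_bipartition(toy_vectors)
```

In this sketch the clustering operates on the eigenvector of the affinity graph rather than on the raw vectors, which is the non-linear projection step the abstract refers to. A practical system would use several eigenvectors followed by k-means to handle more than two speakers.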
APSIPA Transactions on Signal and Information Processing | 2015
Naohiro Tawara; Tetsuji Ogawa; Shinji Watanabe; Atsushi Nakamura; Tetsunori Kobayashi
An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization, making it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo method and incorporated into the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The paper demonstrates that UO-DPMM can be applied successfully to large-scale data and outperforms conventional hierarchical agglomerative clustering, especially for large numbers of utterances.
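How a Dirichlet process mixture leaves the number of clusters open can be sketched with the Chinese restaurant process, the prior over partitions that such a model induces. The sketch below draws cluster assignments from that prior only. It omits the likelihood terms and the utterance-oriented structure of UO-DPMM, and the function name and the concentration parameter value are assumptions for illustration.

```python
import random


def sample_crp(num_items, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    alpha (> 0) is the concentration parameter: larger values make opening
    a new cluster more likely. Hypothetical helper, illustration only.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster index of item i
    for _ in range(num_items):
        # An existing cluster is chosen in proportion to its size;
        # a new cluster is opened in proportion to alpha.
        weights = counts + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
        assignments.append(choice)
    return assignments, counts


assignments, counts = sample_crp(num_items=100, alpha=1.0)
num_clusters = len(counts)  # determined by the draw, not fixed in advance
```

In a full sampler, each weight would additionally be multiplied by the likelihood of the utterance under the corresponding speaker model, so that the data, not just the prior, determines how many clusters remain occupied.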
international workshop on machine learning for signal processing | 2013
Naohiro Tawara; Tetsuji Ogawa; Shinji Watanabe; Atsushi Nakamura; Tetsunori Kobayashi
A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, each GMM represents intra-speaker dynamics arising from differences in attributes such as phoneme context and the presence of non-stationary noise, while the mixture of GMMs (MoGMMs) represents inter-speaker dynamics arising from differences between speakers. Gibbs sampling is a powerful technique for estimating such hierarchically structured models, but depending on how it is used it can easily become trapped in local optima, especially when the elemental GMMs have a complex structure. To solve this problem, a highly accurate and robust sampling method based on blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and applied effectively to reduce the singular solutions that arise in models with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved clustering performance by 17% on average compared with conventional sampling-based methods.
conference of the international speech communication association | 2011
Naohiro Tawara; Shinji Watanabe; Tetsuji Ogawa; Tetsunori Kobayashi
international conference on acoustics, speech, and signal processing | 2018
Taira Tsuchiya; Naohiro Tawara; Tetsuji Ogawa; Tetsunori Kobayashi
international conference on acoustics, speech, and signal processing | 2018
Tsuyoshi Morioka; Naohiro Tawara; Tetsuji Ogawa; Atsunori Ogawa; Tomoharu Iwata; Tetsunori Kobayashi
asia pacific signal and information processing association annual summit and conference | 2017
Hiroto Ashikawa; Naohiro Tawara; Atsunori Ogawa; Tomoharu Iwata; Tetsunori Kobayashi; Tetsuji Ogawa
Journal of The Japan Society for Precision Engineering | 2014
Kazuya Ueki; Youhei Shiraishi; Naohiro Tawara; Tetsunori Kobayashi
conference of the international speech communication association | 2012
Naohiro Tawara; Tetsuji Ogawa; Shinji Watanabe; Atsushi Nakamura; Tetsunori Kobayashi