Hirokazu Kameoka
University of Tokyo
Publications
Featured research published by Hirokazu Kameoka.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Hirokazu Kameoka; Takuya Nishimoto; Shigeki Sagayama
This paper proposes a multipitch analyzer called the harmonic temporal structured clustering (HTC) method, which jointly estimates the pitch, intensity, onset, duration, etc., of each underlying source in a multipitch audio signal. HTC decomposes the energy patterns diffused in time-frequency space, i.e., the power spectrum time series, into distinct clusters such that each originates from a single source. The problem is equivalent to approximating the observed power spectrum time series by superimposed HTC source models, whose parameters are associated with the acoustic features that we wish to extract. The update equations of HTC are explicitly derived by formulating the HTC source model with a Gaussian kernel representation. We verified the potential of the HTC method through experiments.
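The core idea of clustering time-frequency energy with Gaussian kernels can be illustrated on toy data. The following is a minimal sketch (not the authors' HTC implementation; the synthetic spectrogram, initialization, and variable names are ours): a power-weighted EM fits per-source isotropic Gaussian kernels to the energy of a synthetic "power spectrogram".

```python
import numpy as np

# Toy "power spectrogram": two Gaussian energy blobs in time-frequency.
t, f = np.meshgrid(np.arange(40.0), np.arange(60.0), indexing="ij")
pts = np.stack([t.ravel(), f.ravel()], axis=1)            # (N, 2) time-freq points

def blob(center, width):
    return np.exp(-((pts - center) ** 2).sum(axis=1) / (2 * width ** 2))

power = blob([10.0, 15.0], 3.0) + blob([28.0, 40.0], 3.0)  # two energy clusters

# Power-weighted EM for a 2-component isotropic Gaussian mixture:
# each time-frequency point contributes in proportion to its power,
# so the energy pattern is decomposed into single-source clusters.
K = 2
mu = np.array([[5.0, 5.0], [35.0, 55.0]])                  # initial kernel centers
var = np.full(K, 25.0)                                     # isotropic variances
pi = np.full(K, 1.0 / K)                                   # mixture weights
for _ in range(50):
    # E-step: responsibility of each kernel for each point
    d2 = ((pts[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logp = np.log(pi) - d2 / (2 * var) - np.log(var)
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update kernels, weighting every point by its power
    w = r * power[:, None]
    wsum = w.sum(axis=0)
    mu = (w.T @ pts) / wsum[:, None]
    d2 = ((pts[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    var = (w * d2).sum(axis=0) / (2 * wsum)                # 2-D isotropic variance
    pi = wsum / wsum.sum()
```

After convergence the kernel centers land on the two energy blobs, i.e., each cluster captures the energy originating from one source.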
International Conference on Acoustics, Speech, and Signal Processing | 2009
Hirokazu Kameoka; Nobutaka Ono; Kunio Kashino; Shigeki Sagayama
This paper presents a new sparse representation for acoustic signals which is based on a mixing model defined in the complex-spectrum domain (where additivity holds), and allows us to extract the recurrent magnitude-spectrum patterns that underlie the observed complex spectra, along with phase estimates of the constituent signals. An efficient iterative algorithm is derived, which reduces to the multiplicative update algorithm for non-negative matrix factorization developed by Lee and Seung under a particular condition.
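For reference, the multiplicative updates being alluded to look as follows. This is a minimal sketch of the classic Lee-Seung updates for the Euclidean cost (one standard variant; the precise condition under which the complex-NMF algorithm reduces to this form is given in the paper, and the function name is ours):

```python
import numpy as np

def nmf_euclidean(V, K, n_iter=200, seed=0):
    """Classic multiplicative updates minimizing ||V - WH||_F^2
    with nonnegative W (F x K) and H (K x T). Each update multiplies
    the current factor by a ratio of nonnegative terms, so
    nonnegativity is preserved automatically."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    eps = 1e-12                      # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On data that is exactly low-rank, the factorization fits the input closely while both factors stay elementwise nonnegative.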
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Hiroshi Sawada; Hirokazu Kameoka; Shoko Araki; Naonori Ueda
This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) the proposed methods successfully performed BSS on music mixtures of three instrumental parts recorded with two microphones.
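To make the multichannel divergence concrete, here is a minimal sketch assuming the common trace/log-det form tr(X X̂⁻¹) − log det(X X̂⁻¹) − M, which generalizes the scalar IS divergence x/y − log(x/y) − 1 to Hermitian positive definite matrices (the paper should be consulted for the exact definition it uses):

```python
import numpy as np

def multichannel_is_divergence(X, Xhat):
    """Itakura-Saito-type divergence between Hermitian positive
    definite M x M matrices: tr(X Xhat^-1) - log det(X Xhat^-1) - M.
    Nonnegative, and zero exactly when X == Xhat."""
    M = X.shape[0]
    P = X @ np.linalg.inv(Xhat)
    # det(X) / det(Xhat) is real and positive for HPD inputs, so we
    # take the real parts to discard numerical imaginary residue.
    return np.real(np.trace(P)) - np.log(np.real(np.linalg.det(P))) - M
```

Like its scalar counterpart, the divergence vanishes only for a perfect match and penalizes mismatched covariance models otherwise.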
International Conference on Acoustics, Speech, and Signal Processing | 2009
Hirokazu Kameoka; Tomohiro Nakatani; Takuya Yoshioka
This paper presents a blind dereverberation method designed to recover the subband envelope of an original speech signal from its reverberant version. The problem is formulated as a blind deconvolution problem with non-negative constraints, regularized by the sparse nature of speech spectrograms. We derive an iterative algorithm for its optimization, which can be seen as a special case of the non-negative matrix factor deconvolution. We confirmed through experiments that the algorithm is fast and robust to speaker movement.
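A much-simplified version of the deconvolution step can be sketched as follows. Here the reverberation kernel is assumed known (unlike the blind setting of the paper, which also estimates the kernel and adds a sparsity regularizer), and the dry subband envelope is recovered by multiplicative updates under a nonnegativity constraint; all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 120, 8
h = 0.6 ** np.arange(K)                                   # assumed known reverberation kernel
s_true = np.zeros(T)
idx = rng.choice(T, 15, replace=False)
s_true[idx] = rng.random(15) + 0.5                        # sparse dry envelope

# Observed reverberant subband envelope: y[t] = sum_k h[k] s[t-k],
# written as y = A s with a Toeplitz convolution matrix A.
A = np.zeros((T, T))
for k, hk in enumerate(h):
    A[np.arange(k, T), np.arange(T - k)] = hk
y = A @ s_true

# Multiplicative updates minimizing ||y - A s||^2 subject to s >= 0;
# a non-blind special case of the nonnegative deconvolution idea.
s = np.full(T, y.mean())
eps = 1e-12
for _ in range(2000):
    s *= (A.T @ y) / (A.T @ (A @ s) + eps)
```

Because every update multiplies by a nonnegative ratio, the estimate stays nonnegative throughout while the residual shrinks.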
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Shoichiro Saito; Hirokazu Kameoka; Keigo Takahashi; Takuya Nishimoto; Shigeki Sagayama
This paper introduces a new music signal processing method for extracting multiple fundamental frequencies, which we call specmurt analysis. In contrast to the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound share a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures stretched linearly along frequency. In the log-frequency domain, this is formulated as the convolution of a common harmonic structure with the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, which is either given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through the generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated on several polyphonic music signals and compared with manually annotated MIDI data.
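The convolution-then-deconvolution idea is easy to demonstrate on synthetic data. The sketch below (illustrative, not the authors' implementation) builds a log-frequency spectrum as the convolution of a common harmonic structure with a two-note fundamental-frequency distribution, then recovers the distribution by regularized division in the Fourier domain; bin positions use semitone spacing, so harmonics fall at offsets of 12, 19, and 24 semitones.

```python
import numpy as np

N = 256
h = np.zeros(N)
h[[0, 12, 19, 24]] = [1.0, 0.6, 0.4, 0.3]   # common harmonic structure (log-freq offsets)
u = np.zeros(N)
u[40], u[55] = 1.0, 0.8                      # fundamental frequencies of two tones

# Observed log-frequency spectrum = common structure (*) F0 distribution.
v = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(u)))

# Deconvolution by regularized spectral division (Wiener-style guard
# against near-zero bins of the harmonic structure's transform).
H = np.fft.fft(h)
u_est = np.real(np.fft.ifft(np.fft.fft(v) * np.conj(H) / (np.abs(H) ** 2 + 1e-8)))
```

The two largest peaks of `u_est` sit at the original fundamental-frequency bins, which is the piano-roll-like information the method extracts.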
IEEE Transactions on Audio, Speech, and Language Processing | 2007
J. Le Roux; Hirokazu Kameoka; Nobutaka Ono; A. de Cheveigne; Shigeki Sagayama
This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech derived from the power spectrum. The algorithm is able to perform in a wide variety of noisy environments as well as to estimate the F0s of cochannel concurrent speech. The speech spectrum is modeled as a sequence of spectral clusters governed by a common F0 contour expressed as a spline curve. These clusters are obtained by an unsupervised 2-D time-frequency clustering of the power density using a new formulation of the EM algorithm, and their common F0 contour is estimated at the same time. A smooth F0 contour is extracted for the whole utterance, linking together its voiced parts. A noise model is used to cope with nonharmonic background noise, which would otherwise interfere with the clustering of the harmonic portions of speech. We evaluate our algorithm in comparison with existing methods on several tasks, and show 1) that it is competitive on clean single-speaker speech, 2) that it outperforms existing methods in the presence of noise, and 3) that it outperforms existing methods for the estimation of multiple F0 contours of cochannel concurrent speech.
International Workshop on Machine Learning for Signal Processing | 2010
Masahiro Nakano; Hirokazu Kameoka; Jonathan Le Roux; Yu Kitano; Nobutaka Ono; Shigeki Sagayama
This paper presents a new multiplicative algorithm for nonnegative matrix factorization with β-divergence. The derived update rules have a similar form to those of the conventional multiplicative algorithm, only differing through the presence of an exponent term depending on β. The convergence is theoretically proven for any real-valued β based on the auxiliary function method. The convergence speed is experimentally investigated in comparison with previous works.
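The exponent-augmented updates can be sketched compactly. The following is an illustrative implementation assuming the exponent φ(β) = 1/(2−β) for β < 1, 1 for 1 ≤ β ≤ 2, and 1/(β−1) for β > 2, which is the form associated with convergence-guaranteed β-divergence NMF in the literature (the function name and defaults are ours):

```python
import numpy as np

def nmf_beta(V, K, beta, n_iter=200, seed=0):
    """Multiplicative-update NMF minimizing the beta-divergence
    D_beta(V | WH). The updates are the conventional ratio form
    raised to an exponent phi depending on beta, which is what makes
    the cost provably non-increasing for any real beta."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    if beta < 1:
        phi = 1.0 / (2.0 - beta)
    elif beta <= 2:
        phi = 1.0
    else:
        phi = 1.0 / (beta - 1.0)
    eps = 1e-12
    for _ in range(n_iter):
        Lam = W @ H + eps
        W *= ((Lam ** (beta - 2) * V) @ H.T / (Lam ** (beta - 1) @ H.T + eps)) ** phi
        Lam = W @ H + eps
        H *= (W.T @ (Lam ** (beta - 2) * V) / (W.T @ Lam ** (beta - 1) + eps)) ** phi
    return W, H
```

Setting β = 1 recovers the familiar KL-divergence updates (φ = 1), while values of β outside [1, 2] pick up a nontrivial exponent.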
International Conference on Acoustics, Speech, and Signal Processing | 2012
Hiroshi Sawada; Hirokazu Kameoka; Shoko Araki; Naonori Ueda
This paper proposes new algorithms for multichannel extensions of nonnegative matrix factorization (NMF) with the Itakura-Saito (IS) divergence. We employ Hermitian positive definite matrices for modeling the covariance matrix of a multivariate complex Gaussian distribution. Such matrices are estimated for each NMF basis; a source separation task can then be performed by introducing variables that relate the NMF bases to sources. The new algorithms are derived by using a majorization scheme with properly designed auxiliary functions. The algorithms are in the form of multiplicative updates, and exhibit good convergence behavior. We have succeeded in separating a professionally produced music recording into its vocal and guitar components.
Workshop on Applications of Signal Processing to Audio and Acoustics | 2011
Masahiro Nakano; Jonathan Le Roux; Hirokazu Kameoka; Tomohiko Nakamura; Nobutaka Ono; Shigeki Sagayama
This paper presents a Bayesian nonparametric latent source discovery method for music signal analysis. In audio signal analysis, an important goal is to decompose music signals into individual notes, with applications such as music transcription, source separation, or note-level manipulation. Recently, the use of latent variable decompositions, especially nonnegative matrix factorization (NMF), has been a very active area of research. These methods face two mutually dependent problems: first, instrument sounds often exhibit time-varying spectra, and capturing this time-varying nature is an important factor in characterizing the diversity of each instrument; moreover, in many cases we do not know in advance the number of sources or which instruments are played. Conventional decompositions generally fail to cope with these issues, as they suffer from the difficulties of automatically determining the number of sources and automatically grouping spectra into single events. We address both these problems by developing a Bayesian nonparametric fusion of NMF and a hidden Markov model (HMM). Our model decomposes music spectrograms into an automatically estimated number of components, each of which consists of an HMM whose number of states is also automatically estimated from the data.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Daichi Kitamura; Nobutaka Ono; Hiroshi Sawada; Hirokazu Kameoka; Hiroshi Saruwatari
This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.
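The two ingredients of the unification, an IS-NMF variance model per source and independence-based demixing updates, can be sketched together. The following is a compact illustrative reading (not the authors' reference implementation; variable names, the iterative-projection demixing step, and all defaults are our assumptions, and practical implementations add normalization and other refinements):

```python
import numpy as np

def ilrma(X, n_basis=2, n_iter=30, seed=0):
    """Sketch of an IVA+NMF separation loop for a determined mixture.
    X: complex STFT observations, shape (F, T, M); M sources assumed."""
    rng = np.random.default_rng(seed)
    F, T, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))     # demixing matrices per freq
    B = rng.random((M, F, n_basis)) + 0.1                # NMF bases per source
    A = rng.random((M, n_basis, T)) + 0.1                # NMF activations per source
    eps = 1e-9
    for _ in range(n_iter):
        Y = np.einsum("fmn,ftn->ftm", W, X)              # separated signals
        P = np.abs(Y) ** 2                               # power spectrograms
        for m in range(M):
            # IS-NMF multiplicative updates of the low-rank variance model
            R = B[m] @ A[m] + eps
            B[m] *= np.sqrt(((P[..., m] / R ** 2) @ A[m].T) / ((1.0 / R) @ A[m].T + eps))
            R = B[m] @ A[m] + eps
            A[m] *= np.sqrt((B[m].T @ (P[..., m] / R ** 2)) / (B[m].T @ (1.0 / R) + eps))
            R = B[m] @ A[m] + eps
            # iterative-projection update of the m-th demixing row
            for f in range(F):
                U = (X[f].T * (1.0 / R[f])) @ X[f].conj() / T   # weighted covariance
                w = np.linalg.solve(W[f] @ U, np.eye(M)[:, m])
                w /= np.sqrt(np.real(w.conj() @ U @ w)) + eps
                W[f, m] = w.conj()
    return np.einsum("fmn,ftn->ftm", W, X), W
```

Each pass tightens the spectral model of every source and then re-estimates the demixing matrices under that model, which is the interplay between the NMF source model and the IVA-style optimization described above.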