Hirokazu Kameoka | Researchain

Publication

Featured research published by Hirokazu Kameoka.

IEEE Transactions on Audio, Speech, and Language Processing | 2007

A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering

Hirokazu Kameoka; Takuya Nishimoto; Shigeki Sagayama

This paper proposes a multipitch analyzer called the harmonic temporal structured clustering (HTC) method, which jointly estimates the pitch, intensity, onset, duration, etc., of each underlying source in a multipitch audio signal. HTC decomposes the energy patterns diffused in time-frequency space, i.e., the power spectrum time series, into distinct clusters such that each originates from a single source. The problem is equivalent to approximating the observed power spectrum time series by superimposed HTC source models, whose parameters are associated with the acoustic features that we wish to extract. The update equations of HTC are explicitly derived by formulating the HTC source model with a Gaussian kernel representation. We verified the potential of the HTC method through experiments.
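To make the Gaussian-kernel idea concrete, here is a minimal single-frame sketch. It assumes a drastically simplified HTC-like source: one source, no temporal (onset/duration) structure, equal harmonic weights and a fixed kernel width, so only the log-F0 `mu` is estimated. The n-th partial sits at mu + log(n) on the log-frequency axis; the E-step distributes each bin's energy among the partials, and the M-step averages the F0 values those partials imply. The names `estimate_log_f0`, `sigma` and `n_harmonics` are hypothetical, not taken from the paper.

```python
import math


def gaussian(x, mean, sigma):
    """Normalized Gaussian kernel evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_log_f0(log_freqs, power, mu_init, n_harmonics=4, sigma=0.02, n_iter=30):
    """EM estimate of one log-F0 `mu` in a single frame.

    Assumed model (simplified, no temporal structure): the n-th partial is a
    Gaussian kernel centred at mu + log(n) on the log-frequency axis, with
    equal weights and a fixed shared width `sigma`.
    """
    offsets = [math.log(n) for n in range(1, n_harmonics + 1)]
    mu = mu_init
    for _ in range(n_iter):
        num = 0.0
        den = 0.0
        for x, p in zip(log_freqs, power):
            # E-step: split the energy p at bin x among the harmonic kernels.
            dens = [gaussian(x, mu + o, sigma) for o in offsets]
            total = sum(dens)
            if total == 0.0:
                continue  # bin is far from every kernel (densities underflow)
            for o, d in zip(offsets, dens):
                q = d / total
                # M-step accumulators: partial n "votes" for mu = x - log(n).
                num += p * q * (x - o)
                den += p * q
        if den == 0.0:
            break  # no energy near the current kernels; keep the last estimate
        mu = num / den
    return mu
```

Because every partial shares the same `mu`, energy found near any harmonic pulls the common F0 estimate, which is the essence of clustering spectral energy by source.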

International Conference on Acoustics, Speech, and Signal Processing | 2009

Complex NMF: A new sparse representation for acoustic signals

Hirokazu Kameoka; Nobutaka Ono; Kunio Kashino; Shigeki Sagayama

This paper presents a new sparse representation for acoustic signals based on a mixing model defined in the complex-spectrum domain (where additivity holds). It allows us to extract the recurrent patterns of magnitude spectra that underlie observed complex spectra, together with phase estimates of the constituent signals. An efficient iterative algorithm is derived, which, under a particular condition, reduces to the multiplicative update algorithm for non-negative matrix factorization developed by Lee and Seung.
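For reference, the kind of multiplicative NMF update that the Complex NMF algorithm reduces to can be sketched as below. The abstract does not name the cost function; this sketch assumes the squared-Euclidean variant of the Lee and Seung updates, and the helper names are hypothetical. It uses plain lists so it runs without third-party packages.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def squared_error(v, w, h):
    """||V - WH||_F^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_euclidean(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~= WH with W, H >= 0."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    costs = [squared_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(rows)]
        costs.append(squared_error(v, w, h))
    return w, h, costs
```

Because each update multiplies nonnegative entries by nonnegative ratios, W and H stay nonnegative, and the returned `costs` list should be non-increasing.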

IEEE Transactions on Audio, Speech, and Language Processing | 2013

Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data

Hiroshi Sawada; Hirokazu Kameoka; Shoko Araki; Naonori Ueda

This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully.
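The multichannel IS divergence compares Hermitian matrices rather than scalars. Below is a minimal sketch for M = 2 channels, assuming the form D_IS(X, Y) = tr(X Y^-1) - log det(X Y^-1) - M and positive definite arguments; the helper names are hypothetical.

```python
import math


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def multichannel_is_divergence(x, y):
    """tr(X Y^-1) - log det(X Y^-1) - M for 2x2 Hermitian positive definite X, Y (M = 2)."""
    p = mul2(x, inv2(y))
    # The eigenvalues of X Y^-1 are real and positive, so trace and det are real.
    trace = (p[0][0] + p[1][1]).real
    log_det = math.log(det2(p).real)
    return trace - log_det - 2.0
```

For M = 1 the same expression reduces to the familiar scalar IS divergence x/y - log(x/y) - 1; it is zero when the two matrices coincide and positive otherwise.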

International Conference on Acoustics, Speech, and Signal Processing | 2009

Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms

Hirokazu Kameoka; Tomohiro Nakatani; Takuya Yoshioka

This paper presents a blind dereverberation method designed to recover the subband envelope of an original speech signal from its reverberant version. The problem is formulated as a blind deconvolution problem with non-negative constraints, regularized by the sparse nature of speech spectrograms. We derive an iterative algorithm for its optimization, which can be seen as a special case of the non-negative matrix factor deconvolution. We confirmed through experiments that the algorithm is fast and robust to speaker movement.
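The non-negative convolutive model behind this formulation can be sketched for a single subband. This is a simplified illustration, not the paper's algorithm: it assumes the reverberation filter `h` is already known (the paper estimates it blindly), a squared-error fit, and an L1 penalty standing in for the sparsity regularizer. The source update is the standard multiplicative rule for non-negative least squares with an L1 term; all names are hypothetical.

```python
def convolve(h, s, length):
    """y_hat(t) = sum_tau h(tau) * s(t - tau), for t = 0 .. length - 1."""
    return [sum(h[tau] * s[t - tau] for tau in range(len(h)) if 0 <= t - tau < len(s))
            for t in range(length)]


def correlate(h, r, n_src):
    """Adjoint of `convolve`: c(t) = sum_tau h(tau) * r(t + tau)."""
    return [sum(h[tau] * r[t + tau] for tau in range(len(h)) if t + tau < len(r))
            for t in range(n_src)]


def sparse_cost(y, h, s, lam):
    """0.5 * ||y - h * s||^2 + lam * sum(s)."""
    y_hat = convolve(h, s, len(y))
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y, y_hat)) + lam * sum(s)


def update_source(y, h, s, lam=0.1, n_iter=100, eps=1e-12):
    """Multiplicative updates for the non-negative subband envelope s, with h fixed."""
    costs = [sparse_cost(y, h, s, lam)]
    for _ in range(n_iter):
        num = correlate(h, y, len(s))
        den = correlate(h, convolve(h, s, len(y)), len(s))
        # The penalty weight enters the denominator, shrinking small activations.
        s = [si * ni / (di + lam + eps) for si, ni, di in zip(s, num, den)]
        costs.append(sparse_cost(y, h, s, lam))
    return s, costs
```

Larger `lam` values push weak activations toward zero, which is one simple way to favour a sparse speech envelope.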

IEEE Transactions on Audio, Speech, and Language Processing | 2008

Specmurt Analysis of Polyphonic Music Signals

Shoichiro Saito; Hirokazu Kameoka; Keigo Takahashi; Takuya Nishimoto; Shigeki Sagayama

This paper introduces a new music signal processing method for extracting multiple fundamental frequencies, which we call specmurt analysis. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound share a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures, each linearly stretched along the frequency axis. In the log-frequency domain, this is formulated as the convolution of the common harmonic structure with the distribution density of the fundamental frequencies of the multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is either given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated by generating a piano-roll-like display from a polyphonic music signal and by automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data.
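Because the log-frequency spectrum is the convolution of the common harmonic structure with the F0 distribution, deconvolution becomes a division after a Fourier transform. The sketch below assumes circular convolution on a discrete log-frequency grid and a known common harmonic structure `h`; it uses a plain O(N^2) DFT to stay dependency-free, and the names are hypothetical.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n)) for m in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]


def specmurt_deconvolve(v, h, floor=1e-9):
    """Recover the F0 distribution u from v = h (*) u (circular convolution).

    v: observed spectrum on a log-frequency grid
    h: common harmonic structure on the same grid (assumed known)
    """
    V = dft(v)
    H = dft(h)
    # Division in the transform domain; components where H vanishes are zeroed.
    U = [vm / hm if abs(hm) > floor else 0.0 for vm, hm in zip(V, H)]
    return [c.real for c in idft(U)]
```

Division is only safe where the transform of `h` stays away from zero; the `floor` argument merely zeroes problematic components, a crude stand-in for the regularization a practical system would need.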

IEEE Transactions on Audio, Speech, and Language Processing | 2007

Single and Multiple F0 Contour Estimation Through Parametric Spectrogram Modeling of Speech in Noisy Environments

J. Le Roux; Hirokazu Kameoka; Nobutaka Ono; A. de Cheveigné; Shigeki Sagayama

This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech derived from the power spectrum. The algorithm can operate in a wide variety of noisy environments and can also estimate the F0s of cochannel concurrent speech. The speech spectrum is modeled as a sequence of spectral clusters governed by a common F0 contour expressed as a spline curve. These clusters are obtained by an unsupervised 2-D time-frequency clustering of the power density using a new formulation of the EM algorithm, and their common F0 contour is estimated at the same time. A smooth F0 contour is extracted for the whole utterance, linking together its voiced parts. A noise model is used to cope with nonharmonic background noise, which would otherwise interfere with the clustering of the harmonic portions of speech. We evaluate our algorithm against existing methods on several tasks and show that 1) it is competitive on clean single-speaker speech, 2) it outperforms existing methods in the presence of noise, and 3) it outperforms existing methods in estimating multiple F0 contours of cochannel concurrent speech.
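To illustrate how a single spline can tie the harmonic clusters together, the sketch below evaluates a log-F0 contour as a natural cubic spline through knot values and places the k-th harmonic cluster at mu(t) + log k on a log-frequency axis. The natural spline, the knot parametrization and all names are assumptions made for this illustration; the abstract only says that the contour is a spline curve, and the EM estimation itself is not shown.

```python
import math


def natural_cubic_spline(knots_t, knots_y):
    """Return a function evaluating the natural cubic spline through (knots_t, knots_y)."""
    n = len(knots_t) - 1
    h = [knots_t[i + 1] - knots_t[i] for i in range(n)]
    # Tridiagonal system for the second derivatives; natural ends force M_0 = M_n = 0.
    a, b, c, d = [0.0] * (n + 1), [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        a[i], b[i], c[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        d[i] = 6.0 * ((knots_y[i + 1] - knots_y[i]) / h[i]
                      - (knots_y[i] - knots_y[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    second = [0.0] * (n + 1)
    second[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):
        second[i] = (d[i] - c[i] * second[i + 1]) / b[i]

    def evaluate(t):
        i = 0
        while i < n - 1 and t > knots_t[i + 1]:
            i += 1  # locate the segment containing t
        A = (knots_t[i + 1] - t) / h[i]
        B = (t - knots_t[i]) / h[i]
        return (A * knots_y[i] + B * knots_y[i + 1]
                + ((A ** 3 - A) * second[i] + (B ** 3 - B) * second[i + 1]) * h[i] ** 2 / 6.0)

    return evaluate


def harmonic_centers(contour, t, n_harmonics):
    """Log-frequency centres of the harmonic clusters at time t: mu(t) + log k."""
    mu = contour(t)
    return [mu + math.log(k) for k in range(1, n_harmonics + 1)]
```

Since every cluster centre is tied to the same `contour`, adjusting a single knot value moves all harmonics coherently, and the spline keeps the contour smooth across frames.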

International Workshop on Machine Learning for Signal Processing | 2010

Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence

Masahiro Nakano; Hirokazu Kameoka; Jonathan Le Roux; Yu Kitano; Nobutaka Ono; Shigeki Sagayama

This paper presents a new multiplicative algorithm for nonnegative matrix factorization with β-divergence. The derived update rules have a similar form to those of the conventional multiplicative algorithm, only differing through the presence of an exponent term depending on β. The convergence is theoretically proven for any real-valued β based on the auxiliary function method. The convergence speed is experimentally investigated in comparison with previous works.
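The update rules described here can be sketched with the exponent written out explicitly. The sketch assumes the commonly cited exponent phi(beta) = 1/(2 - beta) for beta < 1, 1 for 1 <= beta <= 2, and 1/(beta - 1) for beta > 2, applied to the usual multiplicative ratio; helper names are hypothetical, and V is assumed strictly positive.

```python
import math
import random


def beta_div(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for x, y > 0."""
    if beta == 0:
        return x / y - math.log(x / y) - 1.0
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (x ** beta + (beta - 1) * y ** beta - beta * x * y ** (beta - 1)) / (beta * (beta - 1))


def exponent(beta):
    """Exponent applied to the multiplicative ratio (assumed form, see lead-in)."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def total_div(v, w, h, beta):
    wh = matmul(w, h)
    return sum(beta_div(v[i][j], wh[i][j], beta) for i in range(len(v)) for j in range(len(v[0])))


def nmf_beta(v, rank, beta, n_iter=100, seed=0):
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    phi = exponent(beta)
    costs = [total_div(v, w, h, beta)]
    for _ in range(n_iter):
        wh = matmul(w, h)  # fixed during the whole H step
        for k in range(rank):
            for j in range(cols):
                num = sum(w[i][k] * v[i][j] * wh[i][j] ** (beta - 2) for i in range(rows))
                den = sum(w[i][k] * wh[i][j] ** (beta - 1) for i in range(rows))
                h[k][j] *= (num / den) ** phi
        wh = matmul(w, h)  # fixed during the whole W step
        for i in range(rows):
            for k in range(rank):
                num = sum(h[k][j] * v[i][j] * wh[i][j] ** (beta - 2) for j in range(cols))
                den = sum(h[k][j] * wh[i][j] ** (beta - 1) for j in range(cols))
                w[i][k] *= (num / den) ** phi
        costs.append(total_div(v, w, h, beta))
    return w, h, costs
```

For 1 <= beta <= 2 the exponent is 1, so the rule coincides with the conventional multiplicative update; outside that range the exponent damps each step.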

International Conference on Acoustics, Speech, and Signal Processing | 2012

Efficient algorithms for multichannel extensions of Itakura-Saito nonnegative matrix factorization

Hiroshi Sawada; Hirokazu Kameoka; Shoko Araki; Naonori Ueda

This paper proposes new algorithms for multichannel extensions of nonnegative matrix factorization (NMF) with the Itakura-Saito (IS) divergence. We employ Hermitian positive definite matrices for modeling the covariance matrix of a multivariate complex Gaussian distribution. Such matrices are basically estimated for NMF bases, but a source separation task can be performed by introducing variables that relate NMF bases and sources. The new algorithms are derived by using a majorization scheme with properly designed auxiliary functions. The algorithms are in the form of multiplicative updates, and exhibit good convergence behavior. We have succeeded in separating a professionally produced music recording into its vocal and guitar components.

Workshop on Applications of Signal Processing to Audio and Acoustics | 2011

Bayesian nonparametric spectrogram modeling based on infinite factorial infinite hidden Markov model

Masahiro Nakano; Jonathan Le Roux; Hirokazu Kameoka; Tomohiko Nakamura; Nobutaka Ono; Shigeki Sagayama

This paper presents a Bayesian nonparametric latent source discovery method for music signal analysis. In audio signal analysis, an important goal is to decompose music signals into individual notes, with applications such as music transcription, source separation, and note-level manipulation. Recently, latent variable decompositions, especially nonnegative matrix factorization (NMF), have been a very active area of research. These methods face two mutually dependent problems: first, instrument sounds often exhibit time-varying spectra, and capturing this time-varying nature is important for characterizing the diversity of each instrument; second, in many cases we do not know in advance the number of sources or which instruments are played. Conventional decompositions generally fail to cope with these issues, as they have difficulty automatically determining the number of sources and automatically grouping spectra into single events. We address both problems by developing a Bayesian nonparametric fusion of NMF and the hidden Markov model (HMM). Our model decomposes music spectrograms into an automatically estimated number of components, each of which consists of an HMM whose number of states is also automatically estimated from the data.

IEEE Transactions on Audio, Speech, and Language Processing | 2016

Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization

Daichi Kitamura; Nobutaka Ono; Hiroshi Sawada; Hirokazu Kameoka; Hiroshi Saruwatari

This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.
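The IVA half of the optimization can be sketched as one iterative-projection sweep at a single frequency bin with two microphones and two sources. This assumes the AuxIVA-style update w_n = (W V_n)^-1 e_n followed by the normalization w_n^H V_n w_n = 1, where V_n weights each frame's outer product by the source variance r_n; in the unified method those variances would come from the NMF source model. The NMF update of r and all names here are illustrative assumptions.

```python
import math


def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def weighted_covariance(xs, r_n):
    """V_n = (1/J) sum_j x_j x_j^H / r_n[j] for two-channel observations x_j."""
    J = len(xs)
    return [[sum(x[a] * x[b].conjugate() / r for x, r in zip(xs, r_n)) / J for b in range(2)]
            for a in range(2)]


def ip_sweep(W, xs, r):
    """One iterative-projection sweep over both rows of a 2x2 demixing matrix.

    W[n] holds w_n^H, so the separated signals are y_j = W x_j.
    r[n][j] > 0 is the (assumed given) variance of source n at frame j.
    """
    W = [row[:] for row in W]
    for n in range(2):
        V = weighted_covariance(xs, r[n])
        inv = inv2(mul2(W, V))
        w = [inv[0][n], inv[1][n]]  # w_n = (W V_n)^-1 e_n
        quad = sum(w[a].conjugate() * V[a][b] * w[b] for a in range(2) for b in range(2)).real
        w = [c / math.sqrt(quad) for c in w]  # rescale so that w_n^H V_n w_n = 1
        W[n] = [c.conjugate() for c in w]
    return W
```

In a full method this sweep would alternate with NMF updates of the variances `r` at every frequency bin; only the demixing step is shown here.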
