
Publication


Featured research published by Takuya Nishimoto.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

A Multipitch Analyzer Based on Harmonic Temporal Structured Clustering

Hirokazu Kameoka; Takuya Nishimoto; Shigeki Sagayama

This paper proposes a multipitch analyzer called the harmonic temporal structured clustering (HTC) method, which jointly estimates the pitch, intensity, onset, duration, etc., of each underlying source in a multipitch audio signal. HTC decomposes the energy patterns diffused in time-frequency space, i.e., the power spectrum time series, into distinct clusters such that each originates from a single source. The problem is equivalent to approximating the observed power spectrum time series by superimposed HTC source models, whose parameters are associated with the acoustic features that we wish to extract. The update equations of HTC are explicitly derived by formulating the HTC source model with a Gaussian kernel representation. We verified the potential of the HTC method through experiments.
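The HTC source model can be pictured as a small spectrogram "template" built from Gaussian kernels: peaks at harmonic positions in log-frequency, each modulated by a chain of Gaussian temporal-envelope kernels. A toy sketch of the generative side only, with illustrative parameter names and weights (not the paper's exact parameterization):

```python
import numpy as np

def htc_source_model(freqs, times, f0, onset, duration, power,
                     n_harmonics=4, n_env=3, sigma_f=0.02):
    """Toy HTC-style source model: Gaussian kernels at harmonic
    log-frequencies, each scaled by Gaussian temporal-envelope kernels
    spaced along the note. All names and weights here are illustrative."""
    log_f = np.log(freqs)
    model = np.zeros((len(freqs), len(times)))
    for k in range(1, n_harmonics + 1):
        # the k-th harmonic sits at log(k * f0) on the log-frequency axis
        spec = np.exp(-(log_f - np.log(k * f0)) ** 2 / (2 * sigma_f ** 2))
        for j in range(n_env):
            # temporal-envelope kernels spaced along the note duration
            mu_t = onset + j * duration / n_env
            sigma_t = duration / n_env
            env = np.exp(-(times - mu_t) ** 2 / (2 * sigma_t ** 2))
            # weights decay over harmonics and envelope kernels
            model += power / (k * (j + 1)) * np.outer(spec, env)
    return model

freqs = np.linspace(100.0, 2000.0, 200)   # Hz
times = np.linspace(0.0, 1.0, 100)        # s
S = htc_source_model(freqs, times, f0=220.0, onset=0.2,
                     duration=0.4, power=1.0)
# strongest energy appears near 220 Hz around the onset
```

A full HTC analyzer fits a sum of such source models to an observed power spectrogram by iteratively updating the parameters; the sketch above shows only what one source model looks like.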


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

HMM-based approach for automatic chord detection using refined acoustic features

Yushi Ueda; Yuuki Uchiyama; Takuya Nishimoto; Nobutaka Ono; Shigeki Sagayama

We discuss an HMM-based method for detecting the chord sequence in musical acoustic signals using percussion-suppressed, Fourier-transformed chroma and delta-chroma features. To reduce the interference often caused by percussive sounds in popular music, we use the Harmonic/Percussive Sound Separation (HPSS) technique to suppress percussive sounds and emphasize harmonic components. We also use the Fourier transform of the chroma to approximately diagonalize the covariance matrix of the feature parameters, reducing the number of model parameters without degrading performance. An HMM with the new features is shown to yield higher recognition rates (the best in the MIREX 2008 audio chord detection task) than one with conventional features.
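The "Fourier transform of chroma" idea can be illustrated in a few lines (a conceptual sketch, not the authors' feature extractor): chroma is a 12-dimensional circular feature, so a circular shift (a key transposition) only changes the phase of its DFT bins, and the DFT-domain representation is better decorrelated across chord types:

```python
import numpy as np

def fourier_chroma(chroma):
    """Decorrelate a 12-bin chroma vector by taking its DFT.

    A circular shift of the chroma (a key transposition) changes only
    the phase of the DFT bins. Bins 0..6 suffice: the remaining bins
    are conjugate-symmetric for real-valued input.
    """
    return np.fft.fft(chroma)[:7]

# toy chroma for a C major triad (pitch classes C, E, G)
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
d_major = np.roll(c_major, 2)  # the same triad transposed up two semitones

# transposition preserves the DFT magnitudes
print(np.allclose(np.abs(fourier_chroma(c_major)),
                  np.abs(fourier_chroma(d_major))))  # True
```

This (approximate) decorrelation is what makes diagonal covariance matrices viable in the HMM without degrading performance.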


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Specmurt Analysis of Polyphonic Music Signals

Shoichiro Saito; Hirokazu Kameoka; Keigo Takahashi; Takuya Nishimoto; Shigeki Sagayama

This paper introduces a new music signal processing method, which we call specmurt analysis, to extract multiple fundamental frequencies. In contrast with the cepstrum, which is the inverse Fourier transform of the log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of the linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound share a common harmonic pattern, the sound spectrum can be regarded as a sum of common harmonic structures linearly stretched along frequency. In the log-frequency domain, this is formulated as the convolution of a common harmonic structure with the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, which is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is demonstrated experimentally through the generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated on several polyphonic music signals and compared with manually annotated MIDI data.
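The core deconvolution step can be sketched with FFTs (a toy sketch with assumed bin positions, not the paper's implementation). On a semitone log-frequency axis, harmonic k of a note lies roughly 12·log2(k) semitones above its fundamental, so a common harmonic pattern is a fixed spike train and the observed spectrum is its circular convolution with the F0 distribution:

```python
import numpy as np

N = 128  # log-frequency (semitone) bins

# common harmonic pattern: spikes near 12*log2(k) semitones, k = 1..4
h = np.zeros(N)
for offset, amp in [(0, 1.0), (12, 0.4), (19, 0.25), (24, 0.15)]:
    h[offset] = amp

# true fundamental-frequency distribution: two notes
u_true = np.zeros(N)
u_true[10], u_true[22] = 1.0, 0.7

# observed log-frequency spectrum = circular convolution u * h
v = np.real(np.fft.ifft(np.fft.fft(u_true) * np.fft.fft(h)))

# specmurt deconvolution: divide in the inverse-Fourier domain
U = np.fft.fft(v) / (np.fft.fft(h) + 1e-8)
u_est = np.clip(np.real(np.fft.ifft(U)), 0.0, None)

# the two strongest peaks land back on bins 10 and 22
print(sorted(np.argsort(u_est)[-2:].tolist()))  # [10, 22]
```

Real spectra only approximate this convolution model, which is why the paper quasi-optimizes the common harmonic pattern iteratively rather than fixing it.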


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2004

Separation of harmonic structures based on tied Gaussian mixture model and information criterion for concurrent sounds

Hirokazu Kameoka; Takuya Nishimoto; Shigeki Sagayama

A method for separating the harmonic structures of co-channel concurrent sounds is described. A model for multiple harmonic structures is constructed as a mixture of tied Gaussian mixtures, each of which models a single harmonic structure. Our algorithm estimates both the number and the shape of the underlying harmonic structures, based on maximum likelihood estimation of the model parameters using the EM algorithm and an information criterion. It operates without restrictions on the number of mixed sounds or the variety of sound sources, and extracts accurate fundamental frequencies continuously with simple procedures in the spectral domain. Experiments showed high performance of the algorithm for both simultaneous speech and polyphonic music.


Life-Like Characters | 2004

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Shinichi Kawamoto; Hiroshi Shimodaira; Tsuneo Nitta; Takuya Nishimoto; Satoshi Nakamura; Katsunobu Itou; Shigeo Morishima; Tatsuo Yotsukura; Atsuhiko Kai; Akinobu Lee; Yoichi Yamashita; Takao Kobayashi; Keiichi Tokuda; Keikichi Hirose; Nobuaki Minematsu; Atsushi Yamada; Yasuharu Den; Takehito Utsuro; Shigeki Sagayama

Galatea is a software toolkit to develop a human-like spoken dialog agent. In order to easily integrate the modules of different characteristics including speech recognizer, speech synthesizer, facial animation synthesizer, and dialog controller, each module is modeled as a virtual machine having a simple common interface and connected to each other through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters are adapted easily to those for an existing person if his or her training data is given. The software toolkit that runs on both UNIX/Linux and Windows operating systems will be publicly available in the middle of 2003 [7, 6].
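The module-as-virtual-machine-plus-broker architecture can be sketched as a tiny message loop (an illustrative toy: the module names and message format here are invented, and Galatea's actual protocol differs):

```python
import queue

class Broker:
    """Minimal message broker: modules register a handler and exchange
    tagged messages through a shared queue. A toy sketch of the
    'virtual machine + broker' idea, not Galatea's real interface."""
    def __init__(self):
        self.handlers = {}
        self.mailbox = queue.Queue()

    def register(self, name, handler):
        self.handlers[name] = handler

    def send(self, to, message):
        self.mailbox.put((to, message))

    def run(self):
        # deliver messages until the mailbox drains
        while not self.mailbox.empty():
            to, message = self.mailbox.get()
            reply = self.handlers[to](message)
            if reply is not None:
                self.send(*reply)

broker = Broker()
log = []
# toy 'speech recognizer': emits recognized text to the dialog controller
broker.register("asr", lambda msg: ("dialog", "hello"))
# toy dialog controller: forwards a response to the synthesizer
broker.register("dialog", lambda msg: ("tts", f"you said: {msg}"))
# toy 'speech synthesizer': consumes the final message
broker.register("tts", lambda msg: log.append(msg))

broker.send("asr", b"audio-frame")
broker.run()
print(log)  # ['you said: hello']
```

Because every module speaks only the broker's common interface, a recognizer or synthesizer can be swapped out without touching the other modules, which is the point of the design.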


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Model Adaptation for Long Convolutional Distortion by Maximum Likelihood Based State Filtering Approach

Chandra Kant Raut; Takuya Nishimoto; Shigeki Sagayama

In environments with considerably long reverberation times, each frame of speech is affected by energy from the preceding frames. To adapt the parameters of an HMM state, it is therefore necessary to consider these frames and compute their contributions to the current state. However, the speech frames preceding a state are not known during model adaptation. In this paper, we propose to use the preceding states as units of the preceding speech segments, estimate their contributions to the current state in a maximum likelihood manner, and adapt the models accordingly. When clean models were adapted by the proposed method for a speaker-dependent isolated word recognition task, word accuracy typically increased from 67.6% to 83.2% and from 44.8% to 72.5% for channel-distorted speech simulated by linear convolution of clean speech with impulse responses having reverberation times (T60) of 310 ms and 780 ms, respectively.


IEEE Journal of Selected Topics in Signal Processing | 2011

Polyphonic Pitch Estimation and Instrument Identification by Joint Modeling of Sustained and Attack Sounds

Jun Wu; Emmanuel Vincent; Stanislaw Andrzej Raczynski; Takuya Nishimoto; Nobutaka Ono; Shigeki Sagayama

Polyphonic pitch estimation and musical instrument identification are some of the most challenging tasks in the field of music information retrieval (MIR). While existing approaches have focused on the modeling of harmonic partials, we design a joint Gaussian mixture model of the harmonic partials and the inharmonic attack of each note. This model encodes the power of each partial over time as well as the spectral envelope of the attack part. We derive an expectation-maximization (EM) algorithm to estimate the pitch and the parameters of the notes. We then extract timbre features both from the harmonic and the attack part via principal component analysis (PCA) over the estimated model parameters. Musical instrument recognition for each estimated note is finally carried out with a support vector machine (SVM) classifier. Experiments conducted on mixtures of isolated notes as well as real-world polyphonic music show higher accuracy over state-of-the-art approaches based on the modeling of harmonic partials only.
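The timbre-feature step can be sketched as PCA over a matrix of per-note model parameters (one row per estimated note). A minimal numpy sketch under assumed shapes; in the paper these features then go to an SVM classifier:

```python
import numpy as np

def pca_timbre_features(params, n_components=3):
    """Project per-note parameter vectors onto their top principal
    components via SVD. `params` has one row per estimated note and
    one column per model parameter (shapes assumed for illustration)."""
    X = params - params.mean(axis=0)          # center each parameter
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T            # (n_notes, n_components)

rng = np.random.default_rng(0)
params = rng.normal(size=(20, 10))            # 20 notes, 10 parameters each
feats = pca_timbre_features(params)           # compact timbre descriptors
```

Projecting onto the leading components keeps the directions of greatest parameter variance, which is what makes the compressed features informative for instrument identification.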


Advances in Music Information Retrieval | 2010

Harmonic and Percussive Sound Separation and Its Application to MIR-Related Tasks

Nobutaka Ono; Kenichi Miyamoto; Hirokazu Kameoka; Jonathan Le Roux; Yuuki Uchiyama; Emiru Tsunoo; Takuya Nishimoto; Shigeki Sagayama

In this chapter, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which leads to a useful pre-processing for MIR-related tasks. Exploiting the anisotropies of the power spectrograms of harmonic and percussive components, we define objective functions based on spectrogram gradients, and, applying to them the auxiliary function approach, we derive simple and fast update equations which guarantee the decrease of the objective function at each iteration. We show experimental results for sound separation on popular and jazz music pieces, and also present the application of the proposed technique to automatic chord recognition and rhythm-pattern extraction.
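The anisotropy intuition can be conveyed with a cruder stand-in: median-filter the power spectrogram along time (favoring harmonic ridges) and along frequency (favoring percussive ridges), then soft-mask. This is explicitly NOT the chapter's method, which minimizes spectrogram-gradient objectives with auxiliary-function updates; it only illustrates the same anisotropy idea:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _median_along(x, axis, k):
    """Running median of window k along one axis (edge-padded)."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k // 2)
    padded = np.pad(x, pad, mode="edge")
    windows = sliding_window_view(padded, k, axis=axis)
    return np.median(windows, axis=-1)

def hpss_masks(power_spec, k=9):
    """Crude harmonic/percussive split of a (n_freq, n_time) power
    spectrogram: harmonic energy is smooth along time, percussive
    energy along frequency, so smooth in each direction and soft-mask."""
    harm = _median_along(power_spec, axis=1, k=k)  # smooth over time
    perc = _median_along(power_spec, axis=0, k=k)  # smooth over frequency
    total = harm + perc + 1e-12
    return power_spec * harm / total, power_spec * perc / total

S = np.zeros((64, 64))
S[20, :] += 1.0   # a steady tone: horizontal ridge
S[:, 30] += 1.0   # a drum hit: vertical ridge
H, P = hpss_masks(S)
# the tone row ends up mostly in H, the hit column mostly in P
```

The chapter's gradient-based formulation achieves the same kind of split with update equations that provably decrease the objective at each iteration.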


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Harmonic-Temporal-Timbral Clustering (HTTC) for the analysis of multi-instrument polyphonic music signals

Kenichi Miyamoto; Hirokazu Kameoka; Takuya Nishimoto; Nobutaka Ono; Shigeki Sagayama

In this paper, we discuss a new approach named Harmonic-Temporal-Timbral Clustering (HTTC) for the analysis of single-channel audio signals of multi-instrument polyphonic music, which estimates the pitch, onset timing, power, and duration of all acoustic events and simultaneously classifies them into timbre categories. Each acoustic event is modeled by a harmonic structure and a smooth envelope, both represented by Gaussian mixtures. Based on the similarity between these spectro-temporal structures, timbres are clustered to form timbre categories. The entire process is mathematically formulated as a minimization problem for the I-divergence between the HTTC parametric model and the observed spectrogram of the music audio signal, with the harmonic, temporal, and timbral model parameters updated simultaneously through the EM algorithm. Experimental results are presented to discuss the performance of the algorithm.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Multipitch estimation by joint modeling of harmonic and transient sounds

Jun Wu; Emmanuel Vincent; Stanislaw Andrzej Raczynski; Takuya Nishimoto; Nobutaka Ono; Shigeki Sagayama

Multipitch estimation techniques are widely used for music transcription and the acquisition of musical data from digital signals. In this paper, we propose a flexible harmonic temporal timbre model that decomposes the spectral energy of the signal in the time-frequency domain into individual pitched notes. Each note is modeled with a 2-dimensional Gaussian mixture. Unlike previous approaches, the proposed model represents not only the harmonic partials but also the inharmonic attack of each note. We derive an Expectation-Maximization (EM) algorithm to estimate the parameters of this model and show that it outperforms the NMF algorithm [9] and the HTC algorithm [10] on the task of multipitch estimation over synthetic and real-world data.

Collaboration


Top co-authors of Takuya Nishimoto:

- Nobutaka Ono (National Institute of Informatics)
- Noboru Harada (Nippon Telegraph and Telephone)
- Takayuki Watanabe (Tokyo Woman's Christian University)
- Yasuhisa Niimi (Kyoto Institute of Technology)
- Masahiro Araki (Kyoto Institute of Technology)