
Publication


Featured research published by Katsutoshi Itoyama.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Integration and Adaptation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Signals

Katsutoshi Itoyama; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper describes a sound source separation method for polyphonic sound mixtures of music, used to build an instrument equalizer that remixes multiple tracks separated from compact-disc recordings by changing the volume level of each track. Although such mixtures usually include both harmonic and inharmonic sounds, the difficulty of dealing with both types together has not been addressed by most previous methods, which focused on one of the two types alone. We therefore developed an integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models of the power spectrogram). On the basis of MAP estimation using the EM algorithm, we estimated all parameters of this integrated model under several original constraints for preventing overtraining and maintaining intra-instrument consistency. Using standard MIDI files as prior information on the model parameters, we applied this model to compact-disc recordings and realized the instrument equalizer.
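
To make the modeling idea concrete, here is a minimal, self-contained sketch (not the authors' implementation) of the core decomposition: one power-spectrum frame is explained as a weighted mixture of a harmonic comb and a smooth inharmonic floor, with the two weights estimated by EM-style responsibility updates. The fundamental frequency, peak width, and inharmonic envelope below are illustrative assumptions.

```python
import numpy as np

# Toy example: decompose one power-spectrum frame into a harmonic comb
# plus a smooth inharmonic floor with EM-style weight updates.
f = np.linspace(0.0, 4000.0, 2048)           # frequency grid [Hz]
f0, sigma = 220.0, 12.0                      # assumed fundamental and peak width

# Normalized basis spectra, treated as distributions over frequency bins.
harmonic = sum(np.exp(-(f - k * f0) ** 2 / (2 * sigma ** 2)) for k in range(1, 11))
inharmonic = 1.0 / (f + 200.0)               # smooth broadband floor
harmonic /= harmonic.sum()
inharmonic /= inharmonic.sum()

# Synthetic observed power spectrum: 70% harmonic, 30% inharmonic energy.
v = 0.7 * harmonic + 0.3 * inharmonic

w = np.array([0.5, 0.5])                     # mixture weights to estimate
for _ in range(50):
    total = w[0] * harmonic + w[1] * inharmonic + 1e-12
    gamma_h = w[0] * harmonic / total        # E-step: bin-wise responsibilities
    # M-step: reweight by the energy each component explains.
    w = np.array([(v * gamma_h).sum(), (v * (1 - gamma_h)).sum()])
    w /= w.sum()

print("estimated harmonic/inharmonic weights:", w)  # converges to ~[0.7, 0.3]
```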


International Conference on Intelligent Robots and Systems (IROS) | 2013

Noise correlation matrix estimation for improving sound source localization by multirotor UAV

Koutarou Furukawa; Keita Okutani; Kohei Nagira; Takuma Otsuka; Katsutoshi Itoyama; Kazuhiro Nakadai; Hiroshi G. Okuno

A method has been developed for improving sound source localization (SSL) using a microphone array on an unmanned aerial vehicle with multiple rotors, a "multirotor UAV". One of the main problems in SSL from a multirotor UAV is that the ego noise of the rotors interferes with the audio observation and degrades SSL performance. We employ a generalized eigenvalue decomposition-based multiple signal classification (GEVD-MUSIC) algorithm to reduce the effect of ego noise. While the GEVD-MUSIC algorithm requires a noise correlation matrix corresponding to the auto-correlation of the multichannel observation of the rotor noise, this matrix is nonstationary due to the aerodynamic control of the UAV. An adaptive estimation method for the noise correlation matrix is therefore needed for robust SSL with GEVD-MUSIC. Our method uses Gaussian process regression to estimate the noise correlation matrix in each time period from the measurements of self-monitoring sensors attached to the UAV, such as the pitch-roll-yaw tilt angles, xyz speeds, and motor control values. Experiments compared our method with existing SSL methods in terms of the precision and recall rates of SSL. The results demonstrate that our method outperforms existing methods, especially under high signal-to-noise-ratio conditions.
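
As an illustration of the GEVD-MUSIC step only (the paper's contribution, the Gaussian-process estimator of the noise correlation matrix, is replaced here by a known matrix), the following sketch simulates a uniform linear array, whitens the observation correlation against the noise correlation via a generalized eigendecomposition, and scans a MUSIC pseudospectrum for the direction of arrival. The array geometry and signal parameters are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

M, c, freq, d = 8, 343.0, 1000.0, 0.05   # mics, speed of sound [m/s], Hz, spacing [m]

def steering(theta):
    """Plane-wave steering vector for a uniform linear array at angle theta (rad)."""
    delays = d * np.arange(M) * np.sin(theta) / c
    return np.exp(-2j * np.pi * freq * delays)

rng = np.random.default_rng(0)
# Simulate one source at 30 degrees plus diffuse noise standing in for rotor noise.
a = steering(np.radians(30.0))
S = rng.standard_normal(200) + 1j * rng.standard_normal(200)
N = 0.5 * (rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200)))
X = np.outer(a, S) + N

R = X @ X.conj().T / X.shape[1]           # observation correlation matrix
K = N @ N.conj().T / N.shape[1]           # noise correlation matrix (assumed known)

# Generalized eigendecomposition R e = lambda K e; the eigenvectors with the
# smallest eigenvalues span the noise-whitened noise subspace.
vals, vecs = eigh(R, K)
En = vecs[:, :-1]                          # drop the largest eigenvector (1 source)

angles = np.radians(np.linspace(-90, 90, 361))
music = [1.0 / np.linalg.norm(En.conj().T @ steering(t)) ** 2 for t in angles]
print("estimated DoA [deg]:", np.degrees(angles[int(np.argmax(music))]))
```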


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Singing voice analysis and editing based on mutually dependent F0 estimation and source separation

Yukara Ikemiya; Kazuyoshi Yoshii; Katsutoshi Itoyama

This paper presents a novel framework that improves both vocal fundamental frequency (F0) estimation and singing voice separation by making effective use of the mutual dependency between those two tasks. A typical approach to singing voice separation is to estimate the vocal F0 contour from a target music signal and then extract the singing voice by using a time-frequency mask that passes only the harmonic components of the vocal F0s and their overtones. Conversely, vocal F0 estimation is considered to become easier if the singing voice can be accurately extracted from the target signal. Such mutual dependency has scarcely been exploited in conventional studies. To overcome this limitation, our framework alternates the two tasks while using the results of each in the other. More specifically, we first extract the singing voice by using robust principal component analysis (RPCA). The F0 contour is then estimated from the separated singing voice by finding the optimal path over an F0-saliency spectrogram based on subharmonic summation (SHS). This enables us to improve singing voice separation by combining a time-frequency mask based on RPCA with a mask based on harmonic structures. Experimental results obtained when we used the proposed technique to directly edit vocal F0s in popular-music audio signals showed that it significantly improved both vocal F0 estimation and singing voice separation.
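
The SHS idea is easy to show in isolation: the salience of a candidate F0 is a decaying weighted sum of spectral magnitude at its harmonics, and the paper tracks the optimal path over such a saliency spectrogram across frames. Below is a single-frame toy sketch; the harmonic count and decay weight are illustrative, not the paper's settings.

```python
import numpy as np

# Minimal subharmonic-summation (SHS) salience sketch on one frame.
sr, n_fft = 16000, 2048
t = np.arange(sr) / sr
x = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

spec = np.abs(np.fft.rfft(x[:n_fft] * np.hanning(n_fft)))

candidates = np.arange(80.0, 1000.0, 1.0)        # candidate F0s [Hz]
salience = np.zeros_like(candidates)
for i, f0 in enumerate(candidates):
    for h in range(1, 9):                         # first 8 harmonics
        bin_ = int(round(f0 * h * n_fft / sr))    # harmonic's FFT bin
        if bin_ < len(spec):
            salience[i] += 0.84 ** (h - 1) * spec[bin_]  # decaying weights

print("estimated F0 [Hz]:", candidates[int(np.argmax(salience))])  # ~440
```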


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Automatic transcription of guitar tablature from audio signals in accordance with player's proficiency

Kazuki Yazawa; Katsutoshi Itoyama; Hiroshi G. Okuno

We describe a method for automatically transcribing guitar tablatures from audio signals in accordance with the player's proficiency, for use in supporting a guitar player's practice. The system estimates the multiple pitches in each time frame and the optimal fingering, considering playability and the player's proficiency. It combines a conventional multipitch estimation method with a basic dynamic programming method. The difficulty of the fingerings can be changed by tuning a parameter that represents the relative weights of acoustical reproducibility and fingering easiness. Experiments conducted using synthesized guitar audio signals to evaluate the transcribed tablatures in terms of multipitch estimation accuracy and fingering easiness demonstrated that the system can simplify fingerings while achieving higher multipitch estimation precision than the conventional method.
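
A minimal sketch of the reproducibility/easiness trade-off follows: per frame, each fingering candidate carries an acoustic cost, frame-to-frame transitions are penalized by hand movement, and a weight alpha (a stand-in for the paper's proficiency parameter) balances the two inside a standard dynamic program. The cost definitions and candidate format are hypothetical.

```python
# Dynamic-programming sketch: frames is a list of frames, each frame a list
# of (fret_position, acoustic_cost) fingering candidates.
def transcribe(frames, alpha=0.5):
    # paths[t][i] = (best total cost ending at candidate i, backpointer)
    paths = [[((1 - alpha) * cost, None) for _, cost in frames[0]]]
    for t in range(1, len(frames)):
        cur = []
        for pos, cost in frames[t]:
            # Movement penalty = distance between consecutive fret positions.
            trans = [paths[-1][j][0] + alpha * abs(pos - frames[t - 1][j][0])
                     for j in range(len(frames[t - 1]))]
            j = min(range(len(trans)), key=trans.__getitem__)
            cur.append((trans[j] + (1 - alpha) * cost, j))
        paths.append(cur)
    # Backtrack the cheapest path.
    i = min(range(len(paths[-1])), key=lambda k: paths[-1][k][0])
    seq = []
    for t in range(len(frames) - 1, -1, -1):
        seq.append(frames[t][i][0])
        bp = paths[t][i][1]
        i = bp if bp is not None else i
    return seq[::-1]

# Toy usage: two candidates per frame.
frames = [[(3, 0.1), (8, 0.0)], [(3, 0.2), (9, 0.0)], [(4, 0.1), (2, 0.3)]]
print(transcribe(frames, alpha=0.7))  # high alpha favors small hand movement
```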


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Simultaneous processing of sound source separation and musical instrument identification using Bayesian spectral modeling

Katsutoshi Itoyama; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents a method that both separates audio mixtures into sound sources and identifies the musical instruments of the sources. A statistical tone model of the power spectrogram, called an integrated model, is defined, and source separation and instrument identification are carried out on the basis of Bayesian inference. Since the parameter distributions of the integrated model depend on each instrument, the instrument name is identified by selecting the one that has the maximum relative instrument weight. Experimental results showed that correct instrument identification enables precise source separation even when many overtones overlap.
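
The "maximum relative instrument weight" decision can be illustrated with a toy weight-fitting loop: fixed per-instrument template spectra stand in for the paper's learned tone models, their mixture weights are fitted to an observed spectrum with a standard KL-NMF update, and the instrument with the dominant weight is reported. This is a simple stand-in for the Bayesian inference in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
templates = rng.random((3, 256))                    # toy "piano/violin/flute" spectra
templates /= templates.sum(axis=1, keepdims=True)   # normalize each template

v = 0.8 * templates[1] + 0.2 * templates[2]         # observed spectrum: mostly "violin"

w = np.ones(3) / 3                                  # instrument weights to estimate
for _ in range(200):
    recon = w @ templates + 1e-12
    w *= templates @ (v / recon)                    # KL-NMF weight update (fixed templates)
    w /= w.sum()

names = ["piano", "violin", "flute"]
print(dict(zip(names, w.round(3))), "->", names[int(np.argmax(w))])
```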


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Student's t nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation

Kazuyoshi Yoshii; Katsutoshi Itoyama; Masataka Goto

This paper presents a robust variant of nonnegative matrix factorization (NMF) based on complex Student's t distributions (t-NMF) for source separation of single-channel audio signals. Itakura-Saito divergence NMF (Gaussian NMF) is justified for this purpose under the assumption that the complex spectra of the source signals and those of the mixture signal are complex Gaussian distributed (the additivity of power spectra holds). In fact, however, the source spectra are often heavy-tailed. When the source spectra are complex Cauchy distributed, for example, the mixture spectra are also complex Cauchy distributed (the additivity of amplitude spectra holds). Using the complex t distribution, which includes the complex Gaussian and Cauchy distributions as special cases, we propose t-NMF as a unified extension of Gaussian NMF and Cauchy NMF. Furthermore, we propose the corresponding variant of positive semidefinite tensor factorization based on multivariate complex t distributions (t-PSDTF). Experimental results showed that while t-NMF and t-PSDTF were comparable to their Gaussian counterparts in terms of peak performance, they worked much better on average because they are insensitive to initialization and tend to avoid local optima.
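
For reference, the Gaussian (Itakura-Saito) NMF baseline that t-NMF generalizes can be written in a few lines with its standard multiplicative updates; the paper's t-NMF replaces the underlying complex Gaussian likelihood with a complex Student's t likelihood. The toy data and iteration count below are arbitrary.

```python
import numpy as np

# Itakura-Saito NMF with standard multiplicative updates.
rng = np.random.default_rng(0)
F, T, K = 128, 200, 4
V = rng.gamma(2.0, 1.0, (F, T))          # toy power spectrogram

W = rng.random((F, K)) + 0.1             # spectral bases
H = rng.random((K, T)) + 0.1             # activations
for _ in range(100):
    WH = W @ H + 1e-12
    W *= ((V / WH ** 2) @ H.T) / ((1.0 / WH) @ H.T)
    WH = W @ H + 1e-12
    H *= (W.T @ (V / WH ** 2)) / (W.T @ (1.0 / WH))

WH = W @ H
is_div = np.sum(V / WH - np.log(V / WH) - 1)   # Itakura-Saito divergence
print("final IS divergence:", float(is_div))
```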


International Conference on Intelligent Robots and Systems (IROS) | 2013

Posture estimation of hose-shaped robot using microphone array localization

Yoshiaki Bando; Takeshi Mizumoto; Katsutoshi Itoyama; Kazuhiro Nakadai; Hiroshi G. Okuno

This paper presents a posture estimation method for a hose-shaped robot based on microphone array localization. Hose-shaped robots, a major class of rescue robots, are difficult to navigate because their posture is too flexible for a remote operator to control as intended. Estimating the robot's posture is therefore essential for navigation and mission usability. We developed a posture estimation method using a microphone array and small loudspeakers mounted on the hose-shaped robot. Our method consists of two steps: (1) playing a known sound from each loudspeaker one by one, and (2) estimating the microphone positions on the hose-shaped robot instead of estimating the posture directly. We designed a time difference of arrival (TDOA) estimation method to be robust against directional noise and implemented a prototype system using a posture model of the hose-shaped robot and an extended Kalman filter (EKF). The validity of our approach was evaluated in experiments with both signals recorded in an anechoic chamber and simulated data.
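
A standard way to estimate a TDOA between a known played sound and a microphone recording is GCC-PHAT, sketched below on synthetic data; the paper's method adds robustness to directional noise and feeds such TDOAs into an EKF over a posture model, neither of which is shown here.

```python
import numpy as np

def gcc_phat(sig, ref, sr):
    """Estimate the delay of sig relative to ref [s] via GCC-PHAT."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.abs(S) + 1e-12                   # phase transform whitening
    cc = np.fft.irfft(S, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max..+max
    return (np.argmax(np.abs(cc)) - max_shift) / sr

sr = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(sr // 4)           # known played sound
delay = 23                                   # true delay in samples
sig = np.concatenate((np.zeros(delay), ref))[: len(ref)]
sig += 0.1 * rng.standard_normal(len(sig))   # measurement noise

print("estimated TDOA [s]:", gcc_phat(sig, ref, sr), "truth:", delay / sr)
```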


EURASIP Journal on Advances in Signal Processing | 2011

Query-by-Example Music Information Retrieval by Score-Informed Source Separation and Remixing Technologies

Katsutoshi Itoyama; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We describe a novel query-by-example (QBE) approach in music information retrieval that allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis of this approach is that the musical mood of retrieved results changes in relation to the volume balance of different instruments. On the basis of this hypothesis, we aim to clarify the relationship between a change in the volume balance of a query and the genre of the retrieved pieces, called the genre classification shift. Such an understanding would allow us to instruct users in how to generate alternative queries without having to find other appropriate pieces. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then it allows users to remix these parts to change the acoustic features that represent the musical mood of the piece. Experimental results showed that the genre classification shift was actually caused by volume changes in the vocal, guitar, and drum parts.
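
The query-customization step reduces to remixing separated stems with per-part gains and recomputing acoustic features. The sketch below uses synthetic stems and the spectral centroid as stand-ins for the score-informed separated tracks and the mood-related features used in the paper.

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
# Synthetic stand-ins for separated instrument parts.
stems = {"vocal": np.sin(2 * np.pi * 440 * t),
         "guitar": np.sin(2 * np.pi * 196 * t),
         "drums": rng.standard_normal(sr)}

def remix(gains):
    """Mix stems with user-chosen per-part volume gains."""
    return sum(g * stems[name] for name, g in gains.items())

def spectral_centroid(x):
    """A simple brightness feature standing in for the paper's mood features."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return float((freqs * spec).sum() / (spec.sum() + 1e-12))

original = remix({"vocal": 1.0, "guitar": 1.0, "drums": 1.0})
drum_heavy = remix({"vocal": 0.5, "guitar": 0.5, "drums": 2.0})
# Boosting the drums raises the centroid, i.e., shifts the "mood" feature.
print(spectral_centroid(original), spectral_centroid(drum_heavy))
```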


International Symposium on Safety, Security, and Rescue Robotics (SSRR) | 2015

Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array

Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Hiroshi G. Okuno

This paper presents an online real-time method that enhances human voices included in severely noisy audio signals captured by the microphones of a hose-shaped rescue robot. To help a remote operator of such a robot pick up the weak voice of a human buried under rubble, it is crucial to suppress in real time the loud ego noise caused by the movements of the robot. We tackle this task by using online robust principal component analysis (ORPCA) to decompose the spectrogram of an observed noisy signal into the sum of low-rank and sparse spectrograms, which are expected to correspond to periodic ego noise and human voices, respectively. Using a microphone array distributed on the long body of a hose-shaped robot, ego-noise suppression can be further improved by combining the results of ORPCA applied to the signal captured by each microphone. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method improves the performance of conventional ego-noise suppression using only one microphone by 7.4 dB in signal-to-distortion ratio (SDR) and 17.2 dB in signal-to-interference ratio (SIR).
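
As an offline stand-in for ORPCA, the following sketch runs batch robust PCA (principal component pursuit via the common inexact-ALM iteration) on a toy magnitude spectrogram, splitting it into a low-rank part (periodic ego noise) and a sparse part (voice). The parameter choices follow common defaults; this is not the paper's online algorithm.

```python
import numpy as np

def rpca(V, n_iter=100):
    """Batch robust PCA: V ~ L (low-rank) + S (sparse), inexact ALM."""
    lam = 1.0 / np.sqrt(max(V.shape))
    mu = 0.25 * V.size / (np.abs(V).sum() + 1e-12)
    Y = np.zeros_like(V)                     # Lagrange multipliers
    S = np.zeros_like(V)
    for _ in range(n_iter):
        # Low-rank step: singular-value thresholding.
        U, sv, Vt = np.linalg.svd(V - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sv - 1.0 / mu, 0.0)) @ Vt
        # Sparse step: entry-wise soft-thresholding.
        R = V - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (V - L - S)                # dual update
    return L, S

rng = np.random.default_rng(0)
noise = np.outer(rng.random(64), rng.random(100))   # rank-1 "ego-noise" spectrogram
voice = (rng.random((64, 100)) < 0.05) * 5.0        # sparse "voice" bins
L, S = rpca(noise + voice)
print("low-rank err:", np.abs(L - noise).mean(), "sparse err:", np.abs(S - voice).mean())
```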


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Audio-based guitar tablature transcription using multipitch analysis and playability constraints

Kazuki Yazawa; Daichi Sakaue; Kohei Nagira; Katsutoshi Itoyama; Hiroshi G. Okuno

This paper proposes a method for transcribing guitar tablatures from audio signals. Multipitch estimation and fingering configuration estimation are essential for transcribing tablatures. Conventional multipitch estimation methods, including latent harmonic allocation (LHA), often estimate combinations of pitches that people cannot play due to inherent physical constraints. Unplayable combinations of pitches are eliminated by filtering the results of LHA with three constraints. We first enumerate playable fingering configurations and use them to suppress undesirable combinations of pitches. The fingering configuration in each time frame is then optimized to satisfy temporal continuity by using dynamic programming. We used guitar sounds synthesized from MIDI data (ground truth) for evaluation. Experiments demonstrated that multipitch estimation improved by 5.9 points on average in F-measure and that the transcribed tablatures are playable.
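
One of the constraints can be illustrated as a simple playability filter: a pitch combination survives only if it can be assigned to distinct strings with a bounded fret span. Standard tuning, a 15-fret limit, and a 4-fret span below are illustrative assumptions, not the paper's exact constraints.

```python
from itertools import permutations

OPEN_STRINGS = [40, 45, 50, 55, 59, 64]   # E2 A2 D3 G3 B3 E4 (MIDI note numbers)

def playable(pitches, max_fret=15, max_span=4):
    """Return True if the pitch set fits on distinct strings with a small span."""
    for strings in permutations(range(6), len(pitches)):
        frets = [p - OPEN_STRINGS[s] for p, s in zip(pitches, strings)]
        if any(f < 0 or f > max_fret for f in frets):
            continue                          # pitch unreachable on this string
        fingered = [f for f in frets if f > 0]   # open strings need no finger
        if not fingered or max(fingered) - min(fingered) < max_span:
            return True
    return False

print(playable([40, 47, 52, 56]))   # E-major-style voicing -> True
print(playable([41, 66, 42]))       # two pitches compete for one string -> False
```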

Collaboration


Dive into Katsutoshi Itoyama's collaboration.

Top Co-Authors

Masataka Goto

National Institute of Advanced Industrial Science and Technology
