Publication


Featured research published by Tetsuro Kitahara.


EURASIP Journal on Advances in Signal Processing | 2007

Instrument identification in polyphonic music: feature weighting to minimize influence of sound overlaps

Tetsuro Kitahara; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We provide a new solution to the problem of feature variations caused by the overlapping of sounds in instrument identification in polyphonic music. When multiple instruments play simultaneously, partials (harmonic components) of their sounds overlap and interfere, which makes the acoustic features different from those of monophonic sounds. To cope with this, we weight features based on how much they are affected by overlapping. First, we quantitatively evaluate the influence of overlapping on each feature as the ratio of the within-class variance to the between-class variance in the distribution of training data obtained from polyphonic sounds. Then, we generate feature axes using a weighted mixture that minimizes the influence via linear discriminant analysis. In addition, we improve instrument identification using musical context. Experimental results showed that the recognition rates using both feature weighting and musical context were 84.1% for duo, 77.6% for trio, and 72.3% for quartet; those without using either were 53.4%, 49.6%, and 46.5%, respectively.
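
The feature-weighting step might be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix, labels, and the use of scikit-learn's LDA are stand-ins for the quantities described in the abstract.

```python
# Sketch (not the authors' code): rate each feature by how badly sound overlaps
# disturb it, then build feature axes with linear discriminant analysis.
# The synthetic data below stands in for features of polyphonic training notes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_instruments, n_features = 4, 20
X = rng.normal(size=(400, n_features))          # hypothetical feature vectors
y = rng.integers(0, n_instruments, size=400)    # hypothetical instrument labels

def overlap_influence(X, y):
    """Within-class / between-class variance ratio per feature.

    A large ratio means the feature is strongly disturbed by overlapping
    partials relative to how well it separates instruments."""
    grand_mean = X.mean(axis=0)
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
    return within / between

influence = overlap_influence(X, y)             # per-feature influence of overlaps

# LDA yields linear combinations of features that minimize within-class scatter
# relative to between-class scatter, i.e. axes least affected by overlaps.
lda = LinearDiscriminantAnalysis(n_components=n_instruments - 1)
X_proj = lda.fit_transform(X, y)
```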


IEEE Transactions on Audio, Speech, and Language Processing | 2010

A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval

Hiromasa Fujihara; Masataka Goto; Tetsuro Kitahara; Hiroshi G. Okuno

This paper describes a method of modeling the characteristics of a singing voice from polyphonic musical audio signals that include the sounds of various musical instruments. Because singing voices play an important role in musical pieces with vocals, such a representation is useful for music information retrieval (MIR) systems. The main problem in modeling the characteristics of a singing voice is the negative influence of accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former makes it possible to calculate feature vectors that represent the spectral envelope of a singing voice after reducing accompaniment sounds. It first extracts the harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody using a sinusoidal model driven by these components. The latter method then estimates the reliability of each frame of the obtained melody (i.e., the influence of accompaniment sounds) using two Gaussian mixture models (GMMs) for vocal and nonvocal frames, in order to select the reliable vocal portions of musical pieces. Finally, each song is represented by a GMM trained on its reliable frames. This new representation of the singing voice is demonstrated to improve the performance of an automatic singer identification system and to enable an MIR system based on vocal timbre similarity.
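
A minimal sketch of the reliable-frame-selection idea, assuming per-frame feature vectors of the resynthesized melody are already available; the data, GMM sizes, and threshold below are placeholders rather than the paper's settings.

```python
# Sketch of reliable frame selection: two GMMs (vocal / non-vocal) are fitted on
# labeled frames; frames with a large vocal log-likelihood margin are kept and
# used to train the per-song GMM.  Not the authors' implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
vocal_train = rng.normal(0.0, 1.0, size=(500, 12))     # placeholder labeled frames
nonvocal_train = rng.normal(2.0, 1.0, size=(500, 12))
song_frames = rng.normal(0.5, 1.0, size=(300, 12))     # frames of one song's melody

gmm_vocal = GaussianMixture(n_components=8, random_state=0).fit(vocal_train)
gmm_nonvocal = GaussianMixture(n_components=8, random_state=0).fit(nonvocal_train)

# Reliability of each frame: vocal vs. non-vocal log-likelihood margin.
margin = gmm_vocal.score_samples(song_frames) - gmm_nonvocal.score_samples(song_frames)
reliable = song_frames[margin > 0.0]                    # threshold is a free parameter

# Represent the song by a GMM trained only on its reliable vocal frames.
song_model = GaussianMixture(n_components=4, random_state=0).fit(reliable)
```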


international conference on acoustics, speech, and signal processing | 2003

Musical instrument identification based on F0-dependent multivariate normal distribution

Tetsuro Kitahara; Masataka Goto; Hiroshi G. Okuno

The pitch dependency of timbres has not been fully exploited in musical instrument identification. In this paper, we present a method using an F0-dependent multivariate normal distribution whose mean is represented as a function of the fundamental frequency (F0). This F0-dependent mean function represents the pitch dependency of each feature, while the F0-normalized covariance represents the non-pitch dependency. Musical instrument sounds are first modeled by the F0-dependent multivariate normal distribution and then identified using a discriminant function based on the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments by 10-fold cross validation showed that the proposed method improved the recognition rate at the individual-instrument level from 75.73% to 79.73%, and the recognition rate at the category level from 88.20% to 90.65%.
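
A rough sketch of an F0-dependent Gaussian classifier in the spirit of this method; the polynomial order of the mean function, the synthetic data, and the equal-prior Bayes rule are assumptions made for illustration.

```python
# Sketch: for each instrument the mean of every feature is a polynomial function
# of F0, and an F0-normalized covariance is estimated from the residuals; a tone
# is assigned to the instrument with the highest Gaussian likelihood.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
DEGREE = 3  # order of the F0-dependent mean function (an assumption)

def fit_instrument(f0, X, degree=DEGREE):
    """Fit per-feature polynomial mean functions and an F0-normalized covariance."""
    coeffs = [np.polyfit(f0, X[:, j], degree) for j in range(X.shape[1])]
    residual = X - np.column_stack([np.polyval(c, f0) for c in coeffs])
    return coeffs, np.cov(residual, rowvar=False)

def log_likelihood(x, f0, coeffs, cov):
    mean = np.array([np.polyval(c, f0) for c in coeffs])
    return multivariate_normal.logpdf(x, mean=mean, cov=cov)

# Toy training data: two "instruments" whose features drift with F0.
models = {}
for name, slope in [("flute", 0.01), ("violin", -0.01)]:
    f0 = rng.uniform(200, 1000, size=300)
    X = slope * f0[:, None] + rng.normal(size=(300, 5))
    models[name] = fit_instrument(f0, X)

# Bayes decision rule (equal priors): pick the instrument maximizing the likelihood.
x_test, f0_test = 0.01 * 440 + rng.normal(size=5), 440.0
best = max(models, key=lambda m: log_likelihood(x_test, f0_test, *models[m]))
print(best)
```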


international conference on acoustics, speech, and signal processing | 2006

F0 Estimation Method for Singing Voice in Polyphonic Audio Signal Based on Statistical Vocal Model and Viterbi Search

Hiromasa Fujihara; Tetsuro Kitahara; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper describes a method for estimating the F0 of the vocal part in polyphonic audio signals. Because the melody of many musical pieces is sung by a singer, estimating the F0 of the vocal part is useful for many applications. Building on an existing multiple-F0 estimation method, we evaluate the vocal probability of the harmonic structure of each F0 candidate. To calculate this probability, we extract and resynthesize the harmonic structure using a sinusoidal model and then extract feature vectors. We then evaluate the vocal probability using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track the F0 trajectory using these probabilities with a Viterbi search. Experimental results show that our method improves estimation accuracy from 78.1% to 84.3%, a 28.3% reduction in misestimation.
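
The Viterbi tracking stage could look roughly like the sketch below, where the F0 candidates, their vocal probabilities, and the jump penalty are placeholders standing in for the outputs of the GMM step.

```python
# Sketch of Viterbi tracking over per-frame F0 candidates: emissions are the
# vocal probabilities, transitions penalize large F0 jumps between frames.
import numpy as np

def track_f0(candidates, vocal_prob, jump_penalty=0.01):
    """candidates: (T, K) F0 candidates in Hz; vocal_prob: (T, K) probabilities."""
    T, K = candidates.shape
    log_emit = np.log(vocal_prob + 1e-12)
    score = log_emit[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # transition cost: penalize jumps between consecutive F0 candidates (in Hz)
        trans = -jump_penalty * np.abs(candidates[t - 1][:, None] - candidates[t][None, :])
        total = score[:, None] + trans
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0) + log_emit[t]
    # backtrack the best path
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    path.reverse()
    return candidates[np.arange(T), path]

rng = np.random.default_rng(3)
cands = rng.uniform(100, 400, size=(50, 5))   # 5 placeholder F0 candidates per frame
probs = rng.uniform(0, 1, size=(50, 5))       # their (placeholder) vocal probabilities
print(track_f0(cands, probs)[:10])
```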


Applied Intelligence | 2005

Pitch-Dependent Identification of Musical Instrument Sounds

Tetsuro Kitahara; Masataka Goto; Hiroshi G. Okuno

This paper describes a musical instrument identification method that takes into consideration the pitch dependency of the timbres of musical instruments. The difficulty in musical instrument identification resides in this pitch dependency: the acoustic features of most musical instruments vary according to the pitch (fundamental frequency, F0). To cope with this difficulty, we propose an F0-dependent multivariate normal distribution, where each element of the mean vector is represented by a function of F0. Our method first extracts 129 features (e.g., the spectral centroid and the gradient of the straight line approximating the power envelope) from a musical instrument sound and then reduces the dimensionality of the feature space to 18 dimensions. In the 18-dimensional feature space, it calculates an F0-dependent mean function and an F0-normalized covariance, and finally applies the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments show that the proposed method improved the recognition rate from 75.73% to 79.73%.
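
Two of the named features and a generic reduction to 18 dimensions can be illustrated as follows; the synthetic tone, the framing parameters, and the use of PCA are assumptions, since the paper's own 129-feature set and reduction scheme are not reproduced here.

```python
# Sketch: spectral centroid and power-envelope slope on a synthetic tone,
# followed by a generic 18-dimensional reduction of a placeholder feature matrix.
import numpy as np
from sklearn.decomposition import PCA

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)   # decaying 440 Hz placeholder tone

spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1 / sr)
spectral_centroid = (freqs * spectrum).sum() / spectrum.sum()

# Slope of the straight line approximating the power envelope (in dB per frame).
frame = 512
power = np.array([np.mean(tone[i:i + frame] ** 2)
                  for i in range(0, len(tone) - frame, frame)])
envelope_slope = np.polyfit(np.arange(len(power)), 10 * np.log10(power + 1e-12), 1)[0]
print(spectral_centroid, envelope_slope)

# With a full feature matrix (one row per tone), the space can be reduced to
# 18 dimensions, e.g. with PCA (the paper's actual reduction may differ).
X = np.random.default_rng(4).normal(size=(200, 129))  # placeholder 129-dim features
X_18 = PCA(n_components=18).fit_transform(X)
```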


international symposium on multimedia | 2006

Musical Instrument Recognizer "Instrogram" and Its Application to Music Retrieval Based on Instrumentation Similarity

Tetsuro Kitahara; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

Instrumentation is an important cue in retrieving musical content. Conventional methods for instrument recognition, which operate note-wise, require accurate estimation of the onset time and fundamental frequency (F0) of each note, which is not easy in polyphonic music. This paper presents a non-note-wise method for instrument recognition in polyphonic musical audio signals. Instead of such note-wise estimation, our method calculates the temporal trajectory of instrument existence probabilities for every F0 and visualizes it as a spectrogram-like graphical representation, called an instrogram. This method avoids the influence of onset-detection and F0-estimation errors because it does not rely on either. We also present methods for MPEG-7-based instrument annotation and music information retrieval based on the similarity between instrograms. Experimental results with realistic music show an average accuracy of 76.2% for instrument annotation, and that the instrogram-based similarity measure represents the actual instrumentation similarity better than an MFCC-based one.
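
One way an instrogram-based similarity could be computed is sketched below; treating an instrogram as a time-by-instrument probability array and comparing time-averaged profiles is an illustrative assumption, not necessarily the measure used in the paper.

```python
# Sketch of a similarity measure between instrograms (here already marginalized
# over F0 into a time x instrument array of existence probabilities).
import numpy as np

def instrogram_similarity(ig_a, ig_b):
    """ig_a, ig_b: (T, n_instruments) instrument existence probabilities."""
    profile_a = ig_a.mean(axis=0)      # average instrumentation of piece A
    profile_b = ig_b.mean(axis=0)      # average instrumentation of piece B
    return -np.linalg.norm(profile_a - profile_b)   # higher = more similar

rng = np.random.default_rng(5)
piece_a = rng.uniform(0, 1, size=(1000, 6))   # placeholder instrograms
piece_b = rng.uniform(0, 1, size=(800, 6))
print(instrogram_similarity(piece_a, piece_b))
```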


international conference on acoustics, speech, and signal processing | 2006

Instrogram: A New Musical Instrument Recognition Technique Without Using Onset Detection nor F0 Estimation

Tetsuro Kitahara; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno; Masataka Goto

This paper describes a new technique for recognizing musical instruments in polyphonic music. Because the conventional framework for musical instrument recognition in polyphonic music had to estimate the onset time and fundamental frequency (F0) of each note, instrument recognition inevitably suffered from errors in onset detection and F0 estimation. Unlike such a note-based processing framework, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0, and the results are visualized with a spectrogram-like graphical representation called an instrogram. The instrument existence probability is defined as the product of a nonspecific instrument existence probability calculated using PreFEst and a conditional instrument existence probability calculated using a hidden Markov model. Experimental results show that the obtained instrograms reflect the actual instrumentations and facilitate instrument recognition.
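
The core definition, the product of a nonspecific and a conditional instrument existence probability, can be written compactly; both factors below are random placeholders standing in for the PreFEst and HMM outputs.

```python
# Sketch: combine a nonspecific existence probability p(F0 active at time t)
# with a conditional probability p(instrument | F0, t) into an instrogram.
import numpy as np

rng = np.random.default_rng(6)
T, n_f0, n_instruments = 200, 60, 5

p_nonspecific = rng.uniform(0, 1, size=(T, n_f0))                       # placeholder PreFEst output
p_conditional = rng.dirichlet(np.ones(n_instruments), size=(T, n_f0))   # placeholder HMM output

# Instrument existence probability for every instrument, F0, and time frame.
instrogram = p_nonspecific[:, :, None] * p_conditional                  # shape (T, n_f0, n_instruments)
```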


international conference on acoustics, speech, and signal processing | 2004

Comparing features for forming music streams in automatic music transcription

Y. Sakuraba; Tetsuro Kitahara; Hiroshi G. Okuno

In forming temporal sequences of notes played by the same instrument (referred to as music streams), the timbre of musical instruments may be a predominant feature. In polyphonic music, however, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we previously integrated timbre similarity and direction proximity with success, leaving other features for future work. In this paper, we investigate four features, timbre similarity, direction proximity, pitch transition, and pitch relation consistency, to clarify the precedence among them in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature and pitch transition the second. In addition, the performance of music stream formation was improved from 63.3% using timbre similarity alone to 84.9% by integrating the four features.
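
Integrating the four cues into a single stream-assignment score might look like the toy sketch below; the similarity values and weights are invented for illustration, with direction proximity weighted highest in line with the reported finding.

```python
# Sketch: assign a new note to the stream maximizing a weighted sum of the
# per-feature similarity scores.  All numbers are hypothetical.
import numpy as np

# Rows: candidate streams; columns: timbre similarity, direction proximity,
# pitch transition, pitch relation consistency.
scores = np.array([
    [0.8, 0.6, 0.5, 0.7],
    [0.4, 0.9, 0.8, 0.6],
    [0.3, 0.2, 0.4, 0.5],
])

# Direction proximity weighted highest, pitch transition second (illustrative weights).
weights = np.array([0.2, 0.4, 0.3, 0.1])

best_stream = int((scores @ weights).argmax())
print(best_stream)   # index of the stream the note is assigned to
```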


international conference on acoustics, speech, and signal processing | 2004

Category-level identification of non-registered musical instrument sounds

Tetsuro Kitahara; Masataka Goto; Hiroshi G. Okuno

This paper describes a method that identifies sounds of non-registered musical instruments (i.e., musical instruments that are not contained in the training data) at a category level. Although the problem of how to deal with non-registered musical instruments is essential in musical instrument identification, it has not been addressed in previous studies. Our method solves this problem by distinguishing between registered and non-registered instruments and identifying the category name of the non-registered instruments. When a given sound is registered, its instrument name, e.g., violin, is identified. Even if it is not registered, its category name, e.g., strings, can be identified. The key issue in achieving such identification is to adopt a musical instrument hierarchy that reflects acoustic similarity. We present a method for acquiring such a hierarchy from a musical instrument sound database. Experimental results show that around 77% of non-registered instrument sounds, on average, were correctly identified at the category level.
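
Acquiring a category hierarchy from acoustic similarity could be sketched with standard hierarchical clustering; the instrument list, feature vectors, and the three-cluster cut below are placeholders, and the paper's actual procedure may differ.

```python
# Sketch: cluster per-instrument mean feature vectors hierarchically and use the
# resulting clusters as category nodes of the instrument hierarchy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
instruments = ["violin", "viola", "cello", "flute", "clarinet", "trumpet"]
mean_features = rng.normal(size=(len(instruments), 18))   # placeholder 18-dim means

Z = linkage(mean_features, method="average")              # acoustic-similarity tree
categories = fcluster(Z, t=3, criterion="maxclust")       # cut into 3 category nodes

for name, cat in zip(instruments, categories):
    print(name, "-> category", cat)

# At identification time, a sound that matches no registered instrument well can
# still be reported by the category of its nearest cluster.
```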


international conference on industrial, engineering and other applications of applied intelligent systems | 2011

Environmental sound recognition for robot audition using matching-pursuit

Nobuhide Yamakawa; Toru Takahashi; Tetsuro Kitahara; Tetsuya Ogata; Hiroshi G. Okuno

Our goal is to achieve a robot audition system that is capable of recognizing multiple environmental sounds and making use of them in human-robot interaction. The main problems in environmental sound recognition for robot audition are (1) recognition under a large amount of background noise, including noise from the robot itself, and (2) the need for feature extraction that is robust against the spectral distortion caused by separating multiple sound sources. This paper presents the recognition of two environmental sound sources occurring simultaneously, using matching pursuit (MP) with the Gabor wavelet, which extracts salient audio features from a signal. The two environmental sounds come from different directions; they are localized by multiple signal classification (MUSIC) and, using this geometric information, separated by geometric source separation with the aid of measured head-related transfer functions. Experimental results show the noise robustness of MP, although performance depends on the properties of the sound sources.
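
A generic matching-pursuit decomposition over a small Gabor dictionary is sketched below; the dictionary grid, test signal, and iteration count are arbitrary choices for illustration, not the configuration used in the robot audition system.

```python
# Sketch of matching pursuit: at each step the atom most correlated with the
# residual is selected and subtracted; the selected atoms' parameters and
# coefficients can serve as audio features.
import numpy as np

def gabor_atom(n, center, freq, width, sr):
    t = (np.arange(n) - center) / sr
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / (np.linalg.norm(atom) + 1e-12)

def matching_pursuit(signal, dictionary, n_iter=10):
    residual = signal.astype(float).copy()
    decomposition = []
    for _ in range(n_iter):
        corr = dictionary @ residual                 # correlation with every atom
        k = int(np.abs(corr).argmax())
        decomposition.append((k, corr[k]))           # (atom index, coefficient)
        residual -= corr[k] * dictionary[k]
    return decomposition, residual

sr, n = 8000, 1024
# Small Gabor dictionary: a grid of centers, frequencies, and widths (arbitrary).
atoms = np.array([gabor_atom(n, c, f, w, sr)
                  for c in range(0, n, 128)
                  for f in (200, 400, 800, 1600)
                  for w in (0.005, 0.02)])

rng = np.random.default_rng(8)
signal = gabor_atom(n, 512, 400, 0.02, sr) * 3 + rng.normal(0, 0.1, n)
decomp, residual = matching_pursuit(signal, atoms, n_iter=5)
print(decomp[0])   # the dominant atom should sit near center 512, 400 Hz
```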

Collaboration


Dive into Tetsuro Kitahara's collaborations.

Top Co-Authors

Masataka Goto (National Institute of Advanced Industrial Science and Technology)

Hiromasa Fujihara (National Institute of Advanced Industrial Science and Technology)

Katsuhisa Ishida (Tokyo University of Science)

Masayuki Takeda (Tokyo University of Science)