Masataka Goto
National Institute of Advanced Industrial Science and Technology
Publication
Featured research published by Masataka Goto.
Proceedings of the IEEE | 2008
Michael A. Casey; Remco C. Veltkamp; Masataka Goto; Marc Leman; Christophe Rhodes; Malcolm Slaney
The steep rise in music downloading over CD sales has created a major shift in the music industry away from physical media formats and towards online products and services. Music is one of the most popular types of online information and there are now hundreds of music streaming and download services operating on the World-Wide Web. Some of the music collections available are approaching the scale of ten million tracks and this has posed a major challenge for searching, retrieving, and organizing music content. Research efforts in music information retrieval have involved experts from music perception, cognition, musicology, engineering, and computer science engaged in truly interdisciplinary activity that has resulted in many proposed algorithmic and methodological solutions to music search using content-based methods. This paper outlines the problems of content-based music information retrieval and explores the state-of-the-art methods using audio cues (e.g., query by humming, audio fingerprinting, content-based music retrieval) and other cues (e.g., music notation and symbolic representation), and identifies some of the major challenges for the coming years.
Speech Communication | 2004
Masataka Goto
In this paper, we describe the concept of music scene description and address the problem of detecting melody and bass lines in real-world audio signals containing the sounds of various instruments. Most previous pitch-estimation methods have had difficulty dealing with such complex music signals because these methods were designed to deal with mixtures of only a few sounds. To enable estimation of the fundamental frequency (F0) of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the unreliable fundamental component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. This method estimates the relative dominance of every possible F0 (represented as a probability density function of the F0) by using MAP (maximum a posteriori probability) estimation and considers the F0's temporal continuity by using a multiple-agent architecture. Experimental results with a set of ten music excerpts from compact-disc recordings showed that a real-time system implementing this method was able to detect melody and bass lines about 80% of the time these existed.
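The core idea of PreFEst, scoring every candidate F0 by the strength of its harmonics rather than by energy at the fundamental itself, can be sketched as follows. This is an illustrative toy, not the published MAP formulation; the function name and parameters are invented for the example.

```python
import numpy as np

def f0_salience(spectrum, freqs, f0_candidates, n_harmonics=8):
    """Toy predominant-F0 salience in the spirit of PreFEst:
    score each candidate F0 by the energy found at its harmonics,
    so a missing or weak fundamental does not ruin the estimate."""
    scores = []
    for f0 in f0_candidates:
        s = 0.0
        for h in range(1, n_harmonics + 1):
            # nearest spectral bin to the h-th harmonic
            idx = int(np.argmin(np.abs(freqs - h * f0)))
            s += spectrum[idx] / h  # weight lower harmonics more
        scores.append(s)
    return np.array(scores)
```

Picking the argmax candidate frame by frame, and smoothing the trajectory over time, would crudely stand in for the paper's multiple-agent tracking of the F0 probability density.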
Journal of New Music Research | 2001
Masataka Goto
This paper describes a real-time beat tracking system that recognizes a hierarchical beat structure comprising the quarter-note, half-note, and measure levels in real-world audio signals sampled from popular-music compact discs. Most previous beat-tracking systems dealt with MIDI signals and had difficulty in processing, in real time, audio signals containing sounds of various instruments and in tracking beats above the quarter-note level. The system described here can process music with drums and music without drums and can recognize the hierarchical beat structure by using three kinds of musical knowledge: of onset times, of chord changes, and of drum patterns. This paper also describes several applications of beat tracking, such as beat-driven real-time computer graphics and lighting control.
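As a minimal sketch of the tempo-estimation step that underlies beat tracking, the periodicity of an onset-strength envelope can be read off its autocorrelation. This is a far simpler stand-in for the paper's real-time, multiple-hypothesis system; the function and its parameters are assumptions for illustration only.

```python
import numpy as np

def estimate_tempo(onset_envelope, frame_rate):
    """Estimate a tempo (BPM) from an onset-strength envelope via
    autocorrelation, searching only lags in a plausible tempo range."""
    env = onset_envelope - onset_envelope.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    # search lags corresponding to 60-180 BPM
    min_lag = int(frame_rate * 60 / 180)
    max_lag = int(frame_rate * 60 / 60)
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag
```

A real tracker would, as the paper does, maintain several competing beat-position hypotheses and use musical knowledge (onsets, chord changes, drum patterns) to pick among them.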
international conference on acoustics, speech, and signal processing | 2003
Masataka Goto
This paper describes a method for obtaining a list of chorus (refrain) sections in compact-disc recordings of popular music. The detection of chorus sections is essential for the computational modeling of music understanding and is useful in various applications, such as automatic chorus-preview functions in music browsers or retrieval systems. Most previous methods detected as a chorus a repeated section of a given length and had difficulty in identifying both ends of a chorus section and in dealing with modulations (key changes). By analyzing relationships between various repeated sections, our method called RefraiD can detect all the chorus sections in a song and estimate both ends of each section. It can also detect modulated chorus sections by introducing a similarity that enables modulated repetition to be judged correctly. Experimental results with a popular-music database show that this method detects the correct chorus sections in 80 of 100 songs.
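The repetition analysis at the heart of chorus detection can be sketched with a chroma self-similarity matrix: each lag is scored by how strongly the song repeats at that offset. This toy omits RefraiD's grouping of repeated sections, boundary estimation, and modulation-invariant similarity; the function name is invented for the example.

```python
import numpy as np

def repetition_scores(chroma):
    """Toy repetition analysis in the spirit of RefraiD: build a
    chroma self-similarity matrix (cosine similarity between frames)
    and score each lag by the mean similarity along its diagonal."""
    n = chroma.shape[1]
    norms = np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9
    c = chroma / norms
    sim = c.T @ c  # sim[i, j]: similarity of frames i and j
    scores = np.array([np.mean(np.diag(sim, k=-lag)) for lag in range(1, n)])
    return scores  # scores[lag-1]: repetition strength at that lag
```

Handling key changes, as RefraiD does, would additionally require comparing chroma vectors under cyclic shifts of the twelve pitch classes.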
Speech Communication | 1999
Masataka Goto; Yoichi Muraoka
This paper describes a real-time beat-tracking system that detects a hierarchical beat structure in musical audio signals without drum-sounds. Most previous systems have dealt with MIDI signals and had difficulty in applying, in real time, musical heuristics to audio signals containing sounds of various instruments and in tracking beats above the quarter-note level. Our system not only tracks beats at the quarter-note level but also detects beat structure at the half-note and measure levels. To make musical decisions about the audio signals, we propose a method of detecting chord changes that does not require chord names to be identified. The method enables the system to track beats at different rhythmic levels – for example, to find the beginnings of half notes and measures – and to select the best of various hypotheses about beat positions. Experimental results show that the proposed method was effective in detecting the beat structure in real-world audio signals sampled from compact discs of popular music.
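The idea of detecting chord changes without naming the chords can be sketched as measuring how much the pitch-class content differs between adjacent windows; a large cosine distance suggests a harmonic change. This is a rough proxy, not the paper's method, and the window size is an arbitrary choice for the example.

```python
import numpy as np

def chord_change_strength(chroma, hop=4):
    """Rough chord-change detector: compare the average chroma of the
    window just before each frame with the window just after it.
    Peaks in the returned curve suggest harmonic changes."""
    n = chroma.shape[1]
    changes = []
    for t in range(hop, n - hop):
        a = chroma[:, t - hop:t].mean(axis=1)
        b = chroma[:, t:t + hop].mean(axis=1)
        a /= np.linalg.norm(a) + 1e-9
        b /= np.linalg.norm(b) + 1e-9
        changes.append(1.0 - float(a @ b))  # cosine distance
    return np.array(changes)  # changes[t - hop] corresponds to frame t
```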
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Kazuyoshi Yoshii; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno
This paper presents a hybrid music recommender system that ranks musical pieces while efficiently maintaining collaborative and content-based data, i.e., rating scores given by users and acoustic features of audio signals. This hybrid approach overcomes the conventional tradeoff between recommendation accuracy and variety of recommended artists. Collaborative filtering, which is used on e-commerce sites, cannot recommend nonrated pieces and provides a narrow variety of artists. Content-based filtering does not have satisfactory accuracy because it is based on the heuristic that a user's favorite pieces will have similar musical content, despite there being exceptions. To attain a higher recommendation accuracy along with a wider variety of artists, we use a probabilistic generative model that unifies the collaborative and content-based data in a principled way. This model can explain the generative mechanism of the observed data in terms of probability theory. The probability distribution over users, pieces, and features is decomposed into three conditionally independent ones by introducing latent variables. This decomposition enables us to efficiently and incrementally adapt the model for increasing numbers of users and rating scores. We evaluated our system by using audio signals of commercial CDs and their corresponding rating scores obtained from an e-commerce site. The results revealed that our system accurately recommended pieces including nonrated ones from a wide variety of artists and maintained a high degree of accuracy even when new users and rating scores were added.
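The abstract does not give the exact factorization, but a decomposition over users, pieces, and features via a latent variable can be written in aspect-model style, which is assumed here purely for illustration: P(u, p, f) = Σ_z P(z) P(u|z) P(p|z) P(f|z).

```python
import numpy as np

def joint_prob(pz, pu_z, pp_z, pf_z):
    """Aspect-model-style decomposition (an assumption, not the paper's
    exact model): P(u, p, f) = sum_z P(z) P(u|z) P(p|z) P(f|z).
    Returns the full joint tensor over users, pieces, and features."""
    return np.einsum("z,zu,zp,zf->upf", pz, pu_z, pp_z, pf_z)
```

Because users, pieces, and features are conditionally independent given the latent variable z, new users or ratings only require updating the small conditional tables, which is what makes incremental adaptation cheap.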
acm multimedia | 1994
Masataka Goto; Yoichi Muraoka
This paper presents a beat tracking system that processes acoustic signals of music and recognizes temporal positions of beats in time. Musical beat tracking is needed by various multimedia applications such as video editing, audio editing, and stage lighting control. Previous systems were not able to deal with acoustic signals that contained sounds of various instruments, especially drums. They dealt with either MIDI signals or acoustic signals played on a few instruments, and in the latter case, did not work in real time. Our system deals with popular music in which drums maintain the beat. Because our system examines multiple hypotheses in parallel, it can follow beats without losing track of them, even if some hypotheses become wrong. Our system has been implemented on a parallel computer, the Fujitsu AP1000. In our experiment, the system correctly tracked beats in 27 out of 30 commercially distributed popular songs.
international conference on acoustics, speech, and signal processing | 2000
Masataka Goto
This paper describes a robust method for estimating the fundamental frequency (F0) of melody and bass lines in monaural real-world musical audio signals containing sounds of various instruments. Most previous F0-estimation methods had great difficulty dealing with such complex audio signals because they were designed to deal with mixtures of only a few sounds. To make it possible to estimate the F0 of the melody and bass lines, we propose a predominant-F0 estimation method called PreFEst that does not rely on the F0s unreliable frequency component and obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It evaluates the relative dominance of every possible F0 by using the expectation-maximization algorithm and considers the temporal continuity of F0s by using a multiple-agent architecture. Experimental results show that our real-time system can detect the melody and bass lines in audio signals sampled from commercially distributed compact discs.
EURASIP Journal on Advances in Signal Processing | 2007
Tetsuro Kitahara; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno
We provide a new solution to the problem of feature variations caused by the overlapping of sounds in instrument identification in polyphonic music. When multiple instruments play simultaneously, partials (harmonic components) of their sounds overlap and interfere, which makes the acoustic features different from those of monophonic sounds. To cope with this, we weight features based on how much they are affected by overlapping. First, we quantitatively evaluate the influence of overlapping on each feature as the ratio of the within-class variance to the between-class variance in the distribution of training data obtained from polyphonic sounds. Then, we generate feature axes using a weighted mixture that minimizes the influence via linear discriminant analysis. In addition, we improve instrument identification using musical context. Experimental results showed that the recognition rates using both feature weighting and musical context were 84.1% for duo, 77.6% for trio, and 72.3% for quartet; those without using either were 53.4%, 49.6%, and 46.5%, respectively.
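The within-to-between class variance ratio used to quantify how badly overlapping disturbs each feature can be computed per feature as follows (a small sketch; the function name is invented, and the paper's subsequent LDA step is omitted).

```python
import numpy as np

def variance_ratio(features, labels):
    """Per-feature ratio of within-class to between-class variance;
    a smaller ratio means the feature separates instrument classes
    well and is less disturbed by overlapping sounds."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    within = np.zeros(features.shape[1])
    between = np.zeros(features.shape[1])
    for c in classes:
        x = features[labels == c]
        within += ((x - x.mean(axis=0)) ** 2).sum(axis=0)
        between += len(x) * (x.mean(axis=0) - overall_mean) ** 2
    return within / (between + 1e-12)
```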
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Kazuyoshi Yoshii; Masataka Goto; Hiroshi G. Okuno
This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
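The basic matching step, sliding a drum-sound template over the song spectrogram and looking for frames where the distance dips, can be sketched as below. A plain squared distance stands in for Goto's distance measure, and the template-adaptation and harmonic-suppression stages are omitted.

```python
import numpy as np

def match_template(song_spec, template):
    """Slide a drum-sound template (freq x time power spectrogram)
    over a song spectrogram and return a per-frame distance curve;
    drum onsets correspond to local minima of this curve."""
    n_frames = song_spec.shape[1] - template.shape[1] + 1
    dists = np.empty(n_frames)
    for t in range(n_frames):
        seg = song_spec[:, t:t + template.shape[1]]
        dists[t] = np.sum((seg - template) ** 2)
    return dists
```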
Collaboration
Dive into Masataka Goto's collaborations.
National Institute of Advanced Industrial Science and Technology
View shared research outputs