Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kazuyoshi Yoshii is active.

Publication


Featured research published by Kazuyoshi Yoshii.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

An Efficient Hybrid Music Recommender System Using an Incrementally Trainable Probabilistic Generative Model

Kazuyoshi Yoshii; Masataka Goto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents a hybrid music recommender system that ranks musical pieces while efficiently maintaining collaborative and content-based data, i.e., rating scores given by users and acoustic features of audio signals. This hybrid approach overcomes the conventional tradeoff between recommendation accuracy and variety of recommended artists. Collaborative filtering, which is used on e-commerce sites, cannot recommend nonrated pieces and provides a narrow variety of artists. Content-based filtering does not have satisfactory accuracy because it is based on the heuristic that a user's favorite pieces will have similar musical content, although there are exceptions. To attain a higher recommendation accuracy along with a wider variety of artists, we use a probabilistic generative model that unifies the collaborative and content-based data in a principled way. This model explains the generative mechanism of the observed data in terms of probability theory. The probability distribution over users, pieces, and features is decomposed into three conditionally independent ones by introducing latent variables. This decomposition enables us to efficiently and incrementally adapt the model to increasing numbers of users and rating scores. We evaluated our system by using audio signals of commercial CDs and their corresponding rating scores obtained from an e-commerce site. The results revealed that our system accurately recommended pieces, including nonrated ones, from a wide variety of artists and maintained a high degree of accuracy even when new users and rating scores were added.
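The latent-variable decomposition can be illustrated with a small aspect-model sketch in Python; the toy rating matrix, the EM updates, and the omission of acoustic features are assumptions made for brevity, not the authors' exact incremental model.

```python
# Toy aspect-model sketch: decompose P(user, piece) into P(z) P(user|z) P(piece|z)
# with latent topics z, fitted by EM on a random rating matrix. This only
# illustrates the decomposition idea; the paper's model also unifies acoustic
# features and supports incremental adaptation.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_pieces, n_topics = 5, 8, 3
R = rng.integers(0, 3, size=(n_users, n_pieces)).astype(float)  # toy rating counts

p_z = np.full(n_topics, 1.0 / n_topics)
p_u_z = rng.random((n_topics, n_users)); p_u_z /= p_u_z.sum(1, keepdims=True)
p_p_z = rng.random((n_topics, n_pieces)); p_p_z /= p_p_z.sum(1, keepdims=True)

for _ in range(50):  # EM iterations
    # E-step: posterior responsibility of each topic for each (user, piece) pair.
    joint = p_z[:, None, None] * p_u_z[:, :, None] * p_p_z[:, None, :]
    resp = joint / (joint.sum(0, keepdims=True) + 1e-12)
    # M-step: re-estimate the three factors from rating-weighted responsibilities.
    weighted = resp * R[None, :, :]
    p_z = weighted.sum((1, 2)); p_z /= p_z.sum() + 1e-12
    p_u_z = weighted.sum(2); p_u_z /= p_u_z.sum(1, keepdims=True) + 1e-12
    p_p_z = weighted.sum(1); p_p_z /= p_p_z.sum(1, keepdims=True) + 1e-12

# Rank pieces for user 0 by reconstructed preference probability; nonrated
# pieces receive scores too, which is what enables their recommendation.
scores = (p_z[:, None] * p_u_z[:, [0]] * p_p_z).sum(0)
print(np.argsort(-scores))
```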


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates With Harmonic Structure Suppression

Kazuyoshi Yoshii; Masataka Goto; Hiroshi G. Okuno

This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect onsets in drums-only signals. However, there are two main problems. The first is that appropriate templates are unknown for each song. The second is that detecting drum-sound onsets is more difficult in sound mixtures that include various sounds other than drums. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to the actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
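A minimal sketch of the matching stage is given below; the random spectrograms, the asymmetric distance, and the percentile threshold are assumptions for illustration, not the adapted templates or the exact distance measure used in the paper.

```python
# Sketch of spectrogram template matching for drum onset detection: slide a
# drum template over a song spectrogram and flag frames where the distance
# drops below a threshold.
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_frames, tpl_len = 64, 400, 8
song = rng.random((n_bins, n_frames))        # stand-in power spectrogram
template = rng.random((n_bins, tpl_len))     # stand-in drum-sound template

def segment_distance(seg, tpl):
    # Only penalize template power missing from the segment, so that other
    # instruments overlapping the drum do not inflate the distance.
    return np.sum(np.maximum(tpl - seg, 0.0))

distances = np.array([
    segment_distance(song[:, t:t + tpl_len], template)
    for t in range(n_frames - tpl_len)
])
onsets = np.where(distances < np.percentile(distances, 5))[0]
print(onsets[:10])
```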


Intelligent Robots and Systems (IROS) | 2007

A biped robot that keeps steps in time with musical beats while listening to music with its own ears

Kazuyoshi Yoshii; Kazuhiro Nakadai; Toyotaka Torii; Yuji Hasegawa; Hiroshi Tsujino; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We aim at enabling a biped robot to interact with humans through real-world music in daily-life environments, e.g., to autonomously keep its steps (stamps) in time with musical beats. To achieve this, the robot should be able to robustly predict the beat times in real time while listening to a musical performance with its own ears (head-embedded microphones). However, this has scarcely been addressed in previous studies on music-synchronized robots because of the difficulty of predicting beat times in real-world music. To solve this problem, we implemented a beat-tracking method developed in the field of music information processing. The predicted beat times are then used by a feedback-control method that adjusts the robot's step intervals to synchronize its steps with the beats. The experimental results show that the robot can adjust its steps in time with the beat times as the tempo changes. The robot needed about 25 s after a tempo change to recognize it and resynchronize its steps.
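The feedback idea can be sketched as a simple simulation; the gains, the fixed predicted beat interval, and the update rule are illustrative assumptions rather than the robot's actual controller.

```python
# Toy feedback loop: nudge the robot's step interval and phase toward the
# predicted beat times so the steps converge onto the beats.
beat_interval = 0.5                    # predicted inter-beat interval [s] (120 BPM)
next_beat = 0.5                        # predicted time of the next beat [s]
step_interval, next_step = 0.6, 0.6    # robot starts out of sync
k_interval, k_phase = 0.3, 0.3         # feedback gains (assumed)

for _ in range(20):
    # Error between when the robot will step and when the beat will occur.
    phase_error = next_beat - next_step
    step_interval += k_interval * (beat_interval - step_interval)
    next_step += step_interval + k_phase * phase_error
    next_beat += beat_interval
    print(f"step at {next_step:5.2f}s, beat at {next_beat:5.2f}s")
```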


Intelligent Robots and Systems (IROS) | 2008

A robot uses its own microphone to synchronize its steps to musical beats while scatting and singing

Kazumasa Murata; Kazuhiro Nakadai; Kazuyoshi Yoshii; Ryu Takeda; Toyotaka Torii; Hiroshi G. Okuno; Yuji Hasegawa; Hiroshi Tsujino

Musical beat tracking is one of the key technologies for human-robot interaction such as musical sessions. Since such interaction should be performed naturally in various environments, musical beat tracking for a robot should cope with noise sources such as environmental noise, its own motor noise, and its own voice, by using its own microphone. This paper addresses a musical beat-tracking robot that can step, scat, and sing in time with musical beats by using its own microphone. To realize such a robot, we propose a robust beat-tracking method that introduces two key techniques: spectro-temporal pattern matching and echo cancellation. The former realizes robust tempo estimation with a shorter window length and thus adapts quickly to tempo changes. The latter cancels self-generated noise such as stepping, scatting, and singing. We implemented the proposed beat-tracking method on Honda ASIMO. Experimental results showed ten-times-faster adaptation to tempo changes and high robustness of beat tracking against stepping, scatting, and singing noise. We also demonstrated that the robot can time its steps to musical beats while scatting or singing.
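The self-voice cancellation idea can be sketched with a normalized LMS adaptive filter that subtracts a filtered copy of the known self-generated signal from the microphone signal; the toy signals, filter length, and step size are assumptions, and the paper's actual echo canceller may differ.

```python
# NLMS adaptive filter sketch: remove the echoed self-voice from the mic
# signal using the known reference waveform, before beat tracking.
import numpy as np

rng = np.random.default_rng(2)
n = 4000
music = rng.standard_normal(n)                 # stand-in for the external music
self_voice = rng.standard_normal(n)            # known self-generated signal (scat/singing)
room = np.array([0.6, 0.3, 0.1])               # unknown toy path from mouth to ear
echoed = np.convolve(self_voice, room)[:n]
mic = music + echoed                           # what the robot's ear records

taps, mu, eps = 8, 0.5, 1e-6
w = np.zeros(taps)
cleaned = np.zeros(n)
for t in range(taps - 1, n):
    x = self_voice[t - taps + 1:t + 1][::-1]   # most recent reference samples
    err = mic[t] - w @ x                       # echo-cancelled sample
    cleaned[t] = err
    w += mu * err * x / (x @ x + eps)          # NLMS weight update

print("residual power vs. clean music:", np.mean((cleaned[taps:] - music[taps:]) ** 2))
```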


IEEE Transactions on Audio, Speech, and Language Processing | 2012

A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation

Kazuyoshi Yoshii; Masataka Goto

The statistical multipitch analyzer described in this paper estimates multiple fundamental frequencies (F0s) in polyphonic music audio signals produced by pitched instruments. It is based on hierarchical nonparametric Bayesian models that can deal with uncertainty of unknown random variables such as model complexities (e.g., the number of F0s and the number of harmonic partials), model parameters (e.g., the values of F0s and the relative weights of harmonic partials), and hyperparameters (i.e., prior knowledge on complexities and parameters). Using these models, we propose a statistical method called infinite latent harmonic allocation (iLHA). To avoid model-complexity control, we allow the observed spectra to contain an unbounded number of sound sources (F0s), each of which is allowed to contain an unbounded number of harmonic partials. More specifically, to model a set of time-sliced spectra, we formulated nested infinite Gaussian mixture models based on hierarchical and generalized Dirichlet processes. To avoid manual tuning of influential hyperparameters, we put noninformative hyperprior distributions on them in a hierarchical manner. For efficient Bayesian inference, we used a modern technique called collapsed variational Bayes. In comparative experiments using audio recordings of piano and guitar solo performances, iLHA yielded promising results, and we found room for improvement through modeling of temporal continuity and spectral smoothness.
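The harmonic mixture idea at the core of the model can be illustrated generatively: each source contributes Gaussian lobes at integer multiples of its F0, and the observed spectrum is their weighted sum. The fixed numbers of sources and partials and the decaying partial weights below are simplifying assumptions; the actual model places Dirichlet-process priors on both and infers them.

```python
# Generative sketch of a nested harmonic Gaussian mixture over a spectrum.
import numpy as np

freqs = np.linspace(0.0, 2000.0, 2000)           # frequency axis [Hz]
f0s = [220.0, 330.0]                             # two sources (toy F0 values)
n_partials, width = 8, 5.0                       # partials per source, lobe std [Hz]

spectrum = np.zeros_like(freqs)
for f0 in f0s:
    for k in range(1, n_partials + 1):
        weight = 1.0 / k                         # assumed decaying partial weight
        spectrum += weight * np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)

# The strongest bins of this synthetic spectrum lie near the two fundamentals.
peak_bins = np.argsort(spectrum)[-5:]
print(np.sort(freqs[peak_bins]))
```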


IEEE Transactions on Audio, Speech, and Language Processing | 2014

AutoMashUpper: automatic creation of multi-song music mashups

Matthew E. P. Davies; Philippe Hamel; Kazuyoshi Yoshii; Masataka Goto

In this paper we present a system, AutoMashUpper, for making multi-song music mashups. Central to our system is a measure of “mashability” calculated between phrase sections of an input song and songs in a music collection. We define mashability in terms of harmonic and rhythmic similarity and a measure of spectral balance. The principal novelty in our approach centres on determining how elements of songs can be made to fit together using key transposition and tempo modification, rather than relying on their unaltered properties. In this way, the properties of two songs used to model their mashability can be altered by the transformations performed to maximize their perceptual compatibility. AutoMashUpper has a user interface that allows users to control the parameterization of the mashability estimation. It allows users to define ranges for key shifts and tempo as well as to add, change, or remove elements from the created mashups. We evaluate AutoMashUpper by its ability to reliably segment music signals into phrase sections, and also via a listening test examining the relationship between estimated mashability and user enjoyment.
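The key-transposition part of mashability can be sketched by comparing beat-synchronous chroma of two phrase sections under all twelve semitone rotations and keeping the best match; the random chroma, the cosine similarity, and the omission of rhythmic similarity, tempo modification, and spectral balance are assumptions made to keep the example small.

```python
# Harmonic-match sketch: best chroma similarity over all 12 key shifts.
import numpy as np

rng = np.random.default_rng(3)
chroma_a = rng.random((12, 16))                  # 12 pitch classes x 16 beats
chroma_b = rng.random((12, 16))

def cosine(a, b):
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Rotating the pitch-class axis simulates transposing section B by `shift` semitones.
scores = [cosine(chroma_a, np.roll(chroma_b, shift, axis=0)) for shift in range(12)]
best_shift = int(np.argmax(scores))
print(f"best key shift: {best_shift} semitones, similarity {scores[best_shift]:.3f}")
```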


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Singing voice analysis and editing based on mutually dependent F0 estimation and source separation

Yukara Ikemiya; Kazuyoshi Yoshii; Katsutoshi Itoyama

This paper presents a novel framework that improves both vocal fundamental frequency (F0) estimation and singing voice separation by making effective use of the mutual dependency between those two tasks. A typical approach to singing voice separation is to estimate the vocal F0 contour from a target music signal and then extract the singing voice by using a time-frequency mask that passes only the harmonic components of the vocal F0s and their overtones. Vocal F0 estimation, on the other hand, becomes easier if the singing voice can be accurately extracted from the target signal. Such mutual dependency has scarcely been exploited in conventional studies. To overcome this limitation, our framework alternates between the two tasks, using the results of each in the other. More specifically, we first extract the singing voice by using robust principal component analysis (RPCA). The F0 contour is then estimated from the separated singing voice by finding the optimal path over an F0-saliency spectrogram based on subharmonic summation (SHS). This enables us to improve singing voice separation by combining a time-frequency mask based on RPCA with a mask based on harmonic structures. Experimental results obtained when using the proposed technique to directly edit vocal F0s in popular-music audio signals showed that it significantly improved both vocal F0 estimation and singing voice separation.
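A minimal sketch of the subharmonic summation step is shown below: the salience of a candidate F0 is a weighted sum of spectral magnitudes at its integer multiples. The synthetic single-frame spectrum, the harmonic weights, and the candidate grid are assumptions; the paper computes such saliences on the RPCA-separated vocal and finds an optimal F0 path over time.

```python
# Subharmonic summation (SHS) on one synthetic frame with an F0 of 200 Hz.
import numpy as np

sr, n_fft = 16000, 2048
freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
mag = np.zeros_like(freqs)
for k in range(1, 9):                            # synthetic voiced frame at 200 Hz
    mag[np.argmin(np.abs(freqs - 200.0 * k))] = 1.0 / k

candidates = np.arange(80.0, 400.0, 1.0)         # candidate F0 grid [Hz]
n_harm = 8
salience = np.zeros_like(candidates)
for i, f0 in enumerate(candidates):
    for h in range(1, n_harm + 1):
        bin_idx = np.argmin(np.abs(freqs - h * f0))
        salience[i] += (0.84 ** (h - 1)) * mag[bin_idx]   # decaying harmonic weight

print("estimated F0:", candidates[np.argmax(salience)], "Hz")
```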


Intelligent Robots and Systems (IROS) | 2008

A robot listens to music and counts its beats aloud by separating music from counting voice

Takeshi Mizumoto; Ryu Takeda; Kazuyoshi Yoshii; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents a beat-counting robot that can count musical beats aloud, i.e., speak “one, two, three, four, one, two, ...” along with the music while listening to it with its own ears. Music-understanding robots that interact with humans should be able not only to recognize music internally but also to express their own internal states. To develop our beat-counting robot, we tackled three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of the counting voice (a self-generated sound) in the sound mixtures recorded by the ears. The main issue is (3), because interference between the counting voice and the music degrades beat recognition accuracy. We therefore designed an architecture for a music-understanding robot that can deal with self-generated sounds. To solve these issues, we took the following approaches: (1) beat structure prediction based on musical knowledge of chords and drums, (2) speed control of the counting voice according to the music tempo via a vocoder called STRAIGHT, and (3) semi-blind separation of the sound mixtures into music and counting voice via an adaptive filter based on ICA (independent component analysis) that uses the waveform of the counting voice as prior knowledge. Experimental results showed that suppressing the robot's own voice improved music recognition capability.
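The counting behaviour can be sketched by scheduling the count words at predicted beat times and computing a time-stretch ratio so that each spoken count fits the current inter-beat interval; the toy beat times and the nominal word duration are assumptions, and the paper performs the actual speed control with the STRAIGHT vocoder.

```python
# Toy scheduler: assign "one..four" to predicted beats and stretch each count
# so it fits the local inter-beat interval.
import numpy as np

beat_times = np.array([0.0, 0.52, 1.03, 1.52, 2.00, 2.47, 2.93, 3.39])  # toy predictions [s]
words = ["one", "two", "three", "four"]
nominal_word_dur = 0.40                          # assumed duration of the recorded count [s]

for i in range(len(beat_times) - 1):
    interval = beat_times[i + 1] - beat_times[i]
    stretch = interval / nominal_word_dur        # >1 slows the word down, <1 speeds it up
    print(f"t={beat_times[i]:.2f}s say '{words[i % 4]}' stretched x{stretch:.2f}")
```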


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Vocal timbre analysis using latent Dirichlet allocation and cross-gender vocal timbre similarity

Tomoyasu Nakano; Kazuyoshi Yoshii; Masataka Goto

This paper presents a vocal timbre analysis method based on topic modeling using latent Dirichlet allocation (LDA). Although many works have focused on analyzing characteristics of singing voices, none have dealt with “latent” characteristics (topics) of vocal timbre that are shared by multiple singing voices. In the work described in this paper, we first automatically extracted vocal timbre features from polyphonic musical audio signals including vocal sounds. The extracted features were used as observed data, and the mixing weights of multiple topics were estimated by LDA. Finally, the semantics of each topic were visualized by using a word-cloud-based approach. Experimental results for a singer identification task using 36 songs sung by 12 singers showed that our method achieved a mean reciprocal rank of 0.86. We also proposed a method for estimating cross-gender vocal timbre similarity by generating pitch-shifted (frequency-warped) signals of every singing voice. Experimental results for a cross-gender singer retrieval task showed that our method discovered interesting pairs of similar pitch-shifted singers.
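The topic-modeling step can be sketched by treating quantized vocal-timbre features as words, each song as a bag of those words, and estimating per-song topic mixtures with LDA; the random count matrix and the use of scikit-learn's LatentDirichletAllocation are assumptions standing in for the paper's feature extraction and inference.

```python
# LDA sketch: per-song topic mixtures over "timbre words", then a simple
# cosine ranking of songs with similar topic mixtures.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(4)
n_songs, vocab_size, n_topics = 36, 100, 5
counts = rng.poisson(2.0, size=(n_songs, vocab_size))   # toy timbre-word counts

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
theta = lda.fit_transform(counts)                # per-song topic mixing weights

# Rank songs by cosine similarity of their topic mixtures to song 0.
sims = theta @ theta[0] / (np.linalg.norm(theta, axis=1) * np.linalg.norm(theta[0]) + 1e-12)
print(np.argsort(-sims)[:5])
```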


Intelligent Robots and Systems (IROS) | 2013

Nested iGMM recognition and multiple hypothesis tracking of moving sound sources for mobile robot audition

Yoko Sasaki; Naotaka Hatao; Kazuyoshi Yoshii; Satoshi Kagami

The paper proposes two modules for a mobile robot audition system: 1) recognition of surrounding acoustic events and 2) tracking of moving sound sources. We propose a nested infinite Gaussian mixture model (iGMM) for recognizing frame-based feature vectors. Its main advantage is that the number of classes is allowed to increase without bound, if necessary, to represent unknown audio input. The multiple hypothesis tracking module provides time series of separated audio streams using the localized directions and recognition results at each frame. In addition to handling continuous sounds, the proposed tracker automatically detects the appearance and disappearance points of each stream from multiple hypotheses. These two modules are connected to microphone-array-based sound localization and separation, and the combined robot audition system achieved tracking of multiple moving sounds, including intermittent sound sources.
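A greatly simplified sketch of the stream-tracking idea follows: frame-wise direction estimates are greedily associated with existing tracks within an angular gate, unmatched detections start new tracks, and tracks unseen for too long are closed. This greedy single-hypothesis scheme and the toy azimuth sequence are assumptions; the paper maintains multiple hypotheses rather than a single greedy assignment.

```python
# Greedy direction tracker over frame-wise localization results.
frames = [[10.0], [12.0, 80.0], [14.5, 78.0], [79.0], [16.0, 77.5]]  # toy azimuths [deg]
gate_deg, max_missed = 10.0, 1
tracks = []                                       # each: {"dir", "missed", "history"}

for t, detections in enumerate(frames):
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            track["missed"] += 1
            continue
        nearest = min(unmatched, key=lambda d: abs(d - track["dir"]))
        if abs(nearest - track["dir"]) <= gate_deg:
            track["dir"] = nearest                # update the stream's direction
            track["history"].append((t, nearest))
            track["missed"] = 0
            unmatched.remove(nearest)
        else:
            track["missed"] += 1
    for d in unmatched:                           # birth of new streams
        tracks.append({"dir": d, "missed": 0, "history": [(t, d)]})
    tracks = [tr for tr in tracks if tr["missed"] <= max_missed]  # stream death

for tr in tracks:
    print(tr["history"])
```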

Collaboration


Dive into Kazuyoshi Yoshii's collaborations.

Top Co-Authors

Masataka Goto

National Institute of Advanced Industrial Science and Technology

Tomoyasu Nakano

National Institute of Advanced Industrial Science and Technology
