Publication


Featured research published by Eita Nakamura.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

Eita Nakamura; Kazuyoshi Yoshii; Shigeki Sagayama

In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods, which could not properly describe the nature of multiple voices, as in polyrhythmic scores or in the phenomenon of loose synchrony between voices. In this paper, we present a complete description of the proposed model and develop an inference technique that is valid for any merged-output HMM in which output probabilities depend on past events. We also examine the influence of the architecture and parameters of the method on the accuracy of rhythm transcription and voice separation, and perform comparative evaluations with six other algorithms. Using MIDI recordings of classical piano pieces, we found that the proposed model outperformed the other methods by more than 12 points in accuracy for polyrhythmic performances and performed almost as well as the best method for non-polyrhythmic performances. These results clarify the state of the art of rhythm transcription for the first time in the literature. Publicly available source code is also provided for future comparisons.
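
The merged-output construction can be illustrated with a toy generative process. The following sketch is an illustration under an assumed note-value alphabet and hypothetical transition probabilities, not the released source code: two voices are sampled from independent Markov chains with slightly noisy onset times, and their outputs are merged into one observed stream, which is the latent structure a merged-output HMM must invert.

import numpy as np

rng = np.random.default_rng(0)

NOTE_VALUES = np.array([0.25, 0.5, 1.0])   # hypothetical note-value alphabet (in beats)
TRANS = np.array([[0.6, 0.3, 0.1],         # hypothetical per-voice transition matrix
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]])

def sample_voice(n_notes, tempo=0.5, timing_std=0.02):
    """Sample one voice: note-value states and noisy onset times (seconds)."""
    states, onsets, t, s = [], [], 0.0, rng.integers(3)
    for _ in range(n_notes):
        states.append(s)
        onsets.append(t + rng.normal(0.0, timing_std))  # loose synchrony between voices
        t += NOTE_VALUES[s] * tempo
        s = rng.choice(3, p=TRANS[s])
    return np.array(states), np.array(onsets)

v1_states, v1_onsets = sample_voice(8)
v2_states, v2_onsets = sample_voice(8)

# Merge the two output streams into one observed performance, remembering
# which voice produced each onset: the latent assignment that a
# merged-output HMM has to infer jointly with the rhythm.
onsets = np.concatenate([v1_onsets, v2_onsets])
voices = np.concatenate([np.zeros(len(v1_onsets)), np.ones(len(v2_onsets))])
order = np.argsort(onsets)
print(np.round(onsets[order], 3))
print(voices[order].astype(int))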


Journal of New Music Research | 2015

A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments

Eita Nakamura; Nobutaka Ono; Shigeki Sagayama; Kenji Watanabe

We study indeterminacies in the realization of ornaments and how they can be incorporated into a stochastic performance model applicable to music information processing tasks such as score-performance matching. We point out the importance of temporal information and propose a hidden Markov model that describes it explicitly and represents ornaments with several state types. After reviewing the indeterminacies, we carefully incorporate them into the model through its topology and parameters, and explain in detail the state construction for quite general polyphonic scores. By analysing piano performance data, we find significant overlaps in the inter-onset-interval distributions of chordal notes, ornaments, and inter-chord events, and the data are used to determine details of the model. The model is applied to score following and offline score-performance matching, yielding highly accurate matching for performances with many ornaments and relatively frequent errors, repeats, and skips.
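
To make the role of ornament-specific temporal modelling concrete, here is a small sketch under assumed parameters; the IOI means, standard deviations, score, and strictly left-to-right topology are hypothetical and much simpler than the published model. A Viterbi pass aligns performed inter-onset intervals to score events whose type is either a chord or an ornament.

import numpy as np

def log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

# Score: event types in performed order ("chord" vs "ornament", e.g. grace notes).
score_types = ["chord", "ornament", "ornament", "chord", "chord"]
ioi_mean = {"chord": 0.50, "ornament": 0.05}   # hypothetical IOI means (seconds)
ioi_std = {"chord": 0.10, "ornament": 0.03}

perf_ioi = np.array([0.52, 0.04, 0.07, 0.47, 0.55])  # observed performance IOIs

# Viterbi over a strictly left-to-right topology (one state per score event).
n = len(score_types)
delta = np.full((len(perf_ioi), n), -np.inf)
delta[0, 0] = log_gauss(perf_ioi[0], ioi_mean[score_types[0]], ioi_std[score_types[0]])
for t in range(1, len(perf_ioi)):
    for j in range(n):
        prev = delta[t - 1, max(j - 1, 0):j + 1].max()  # stay on the event or advance by one
        delta[t, j] = prev + log_gauss(perf_ioi[t], ioi_mean[score_types[j]], ioi_std[score_types[j]])
print("most likely final score event:", int(np.argmax(delta[-1])))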


International Conference on Acoustics, Speech, and Signal Processing | 2016

Tree-structured probabilistic model of monophonic written music based on the generative theory of tonal music

Eita Nakamura; Masatoshi Hamanaka; Keiji Hirata; Kazuyoshi Yoshii

This paper presents a probabilistic formulation of music language modelling based on the generative theory of tonal music (GTTM), named probabilistic GTTM (PGTTM). GTTM is a well-known music theory that describes the tree structure of written music in analogy with the phrase-structure grammar of natural language. To develop a computational music language model incorporating GTTM and a machine-learning framework for data-driven music grammar induction, we construct a generative model of monophonic music based on a probabilistic context-free grammar, in which the time-span tree proposed in GTTM corresponds to the parse tree. Applying techniques from natural language processing, we also derive supervised and unsupervised learning algorithms based on maximum-likelihood estimation, and a Bayesian inference algorithm based on Gibbs sampling. Despite the conceptual simplicity of the model, we find that it automatically acquires music grammar from data and reproduces time-span trees of written music as accurately as an analyser that required elaborate manual parameter tuning.
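
The correspondence between a parse tree and a note sequence can be sketched with the standard inside algorithm for a PCFG in Chomsky normal form. The single-nonterminal grammar below is a hypothetical example chosen for brevity, not the grammar used in PGTTM.

from collections import defaultdict

# Hypothetical grammar: a single nonterminal S either branches or emits a note symbol.
binary_rules = {("S", "S", "S"): 0.4}                                 # S -> S S
lexical_rules = {("S", "C"): 0.3, ("S", "E"): 0.2, ("S", "G"): 0.1}   # S -> note

notes = ["C", "E", "G", "C"]
n = len(notes)
inside = defaultdict(float)   # inside[(i, j, A)] = P(A derives notes[i:j])

for i, w in enumerate(notes):
    for (A, term), p in lexical_rules.items():
        if term == w:
            inside[(i, i + 1, A)] += p

for span in range(2, n + 1):
    for i in range(n - span + 1):
        j = i + span
        for k in range(i + 1, j):
            for (A, B, C), p in binary_rules.items():
                inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k, j, C)]

print("P(notes | grammar) =", inside[(0, n, "S")])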


European Signal Processing Conference | 2016

Rhythm transcription of MIDI performances based on hierarchical Bayesian modelling of repetition and modification of musical note patterns

Eita Nakamura; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents a method of rhythm transcription (i.e., automatic recognition of note values in music performance signals) based on a Bayesian music language model that describes the repetitive structure of musical notes. Conventionally, music language models for music transcription are trained with a dataset of musical pieces. Because typical musical pieces have repetitions consisting of a limited number of note patterns, better models fitting individual pieces could be obtained by inducing compact grammars. The main challenges are inducing an appropriate grammar for a score that is observed only indirectly through a performance, and capturing incomplete repetitions, which can be represented as repetitions with modifications. We propose a hierarchical Bayesian model in which the generation of a language model is described with a Dirichlet process and the production of musical notes is described with a hierarchical hidden Markov model (HMM) that incorporates the process of modifying note patterns. We derive an efficient algorithm based on Gibbs sampling for simultaneously inferring from a performance signal the score and the piece-specific language model behind it. Evaluations showed that the proposed model outperformed previously studied HMM-based models.
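
The Dirichlet-process ingredient can be illustrated with a Chinese restaurant process over bar-level note-value patterns. The sketch below uses an assumed concentration parameter and base measure, and omits the hierarchical HMM and the pattern-modification process; it only shows how a piece-specific model comes to concentrate on a few repeated patterns.

import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
alpha = 1.0   # hypothetical Dirichlet-process concentration parameter

def base_pattern():
    """Base measure: draw a brand-new random pattern of note values for one bar."""
    return tuple(rng.choice([0.25, 0.5, 1.0], size=4))

bars = []
for _ in range(32):
    counts = Counter(bars)
    patterns = list(counts)
    probs = np.array([counts[p] for p in patterns] + [alpha], dtype=float)
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    bars.append(patterns[k] if k < len(patterns) else base_pattern())

print(Counter(bars).most_common(3))   # a handful of patterns dominate the piece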


Journal of New Music Research | 2018

Generative statistical models with self-emergent grammar of chord sequences

Hiroaki Tsushima; Eita Nakamura; Katsutoshi Itoyama; Kazuyoshi Yoshii

Generative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g., in the key of C major, between G and G7 or between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols, together with unsupervised learning techniques for inducing the latent grammar from data. Surprisingly, we find that these models often outperform conventional Markov models in predictive power, and that the self-emergent categories often correspond to traditional harmonic functions. This indicates, from an informatics perspective, the need for chord categories in harmony models.
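
A minimal way to see how latent categories tie together syntactically similar chords is a categorical-emission HMM whose hidden states play the role of harmonic functions. The parameters below are set by hand for illustration and are not the ones learned by the unsupervised procedure described above.

import numpy as np

states = ["T", "S", "D"]   # hypothetical tonic / subdominant / dominant categories
chords = ["C", "Am", "F", "Dm", "G", "G7"]
emit = np.array([[0.6, 0.4, 0.0, 0.0, 0.0, 0.0],    # T emits C or Am
                 [0.0, 0.0, 0.6, 0.4, 0.0, 0.0],    # S emits F or Dm
                 [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]])   # D emits G or G7
trans = np.array([[0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6],
                  [0.7, 0.1, 0.2]])
init = np.array([0.8, 0.1, 0.1])

def log_likelihood(sequence):
    """Scaled forward algorithm over a chord-symbol sequence."""
    idx = [chords.index(c) for c in sequence]
    alpha = init * emit[:, idx[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in idx[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

print(log_likelihood(["C", "F", "G7", "C"]))    # functionally plausible progression
print(log_likelihood(["G7", "Dm", "C", "F"]))   # less typical ordering scores lower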


European Signal Processing Conference | 2016

A unified Bayesian model of time-frequency clustering and low-rank approximation for multi-channel source separation

Kousuke Itakura; Yoshiaki Bando; Eita Nakamura; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents a statistical method of multichannel source separation, called NMF-LDA, that unifies nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA) in a hierarchical Bayesian manner. If the frequency components of sources are sparsely distributed, the source spectrograms can be considered mutually disjoint in most time-frequency bins. Under this assumption, LDA has been used for clustering time-frequency bins into individual sources using spatial information. One way to improve LDA-based source separation is to exploit the empirical fact that source spectrograms tend to have low-rank structure. To leverage both the sparseness and the low-rankness of source spectrograms, our method alternates an LDA step (hard clustering of time-frequency bins), which gives deficient source spectrograms, and an NMF step (low-rank matrix approximation), which completes the deficient bins of those spectrograms. Experimental results showed that the proposed method outperformed conventional methods.
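
The alternation between a clustering step and a completion step can be mimicked in a few lines. In the sketch below, a hard time-frequency mask stands in for the LDA clustering result, and a low-rank NMF is fitted to the observed bins so that its reconstruction fills in the deficient ones. The dimensions, rank, and data are stand-ins, and the updates are plain Euclidean multiplicative updates rather than the paper's Bayesian inference.

import numpy as np

rng = np.random.default_rng(2)
F, T, K = 64, 100, 4                    # frequency bins, frames, NMF rank (all assumed)
X = rng.gamma(1.0, 1.0, size=(F, T))    # stand-in mixture power spectrogram
mask = rng.random((F, T)) < 0.5         # stand-in hard time-frequency assignment to one source

V = np.where(mask, X, 0.0)              # deficient source spectrogram from the clustering step
W = rng.random((F, K)) + 0.1
H = rng.random((K, T)) + 0.1
eps = 1e-9
for _ in range(100):
    # Euclidean multiplicative updates; unobserved bins are imputed with the
    # current reconstruction so that only the masked bins drive the fit.
    R = np.where(mask, V, W @ H)
    W *= (R @ H.T) / ((W @ H) @ H.T + eps)
    R = np.where(mask, V, W @ H)
    H *= (W.T @ R) / (W.T @ (W @ H) + eps)

completed = np.where(mask, V, W @ H)    # NMF step fills in the deficient bins
print(completed.shape)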


International Conference on Acoustics, Speech, and Signal Processing | 2017

Bayesian multichannel nonnegative matrix factorization for audio source separation and localization

Kousuke Itakura; Yoshiaki Bando; Eita Nakamura; Katsutoshi Itoyama; Kazuyoshi Yoshii; Tatsuya Kawahara

This paper presents a Bayesian extension of multichannel nonnegative matrix factorization (MNMF) that decomposes the complex spectrograms of mixture signals recorded by a microphone array into basis spectra, their temporal activations, and the spatial correlation matrices of sources (directions) in the time-frequency-channel domain. Although the original MNMF can be used in a blind setting, prior knowledge of the microphone array is useful for improving source separation. The impulse response (spatial correlation matrix) of each direction can be measured in an anechoic room; however, it differs from that in the real environment where the microphone array is used. To solve this problem, we propose a unified Bayesian model of source separation and localization by placing, for each direction, a prior distribution determined by the anechoic spatial correlation matrix on the corresponding real spatial correlation matrix. This enables us to adaptively estimate the real spatial correlation matrix and the direction of each source. Experimental results showed that our method outperformed the original MNMF and state-of-the-art methods that use prior knowledge in terms of the signal-to-distortion ratio (SDR), even when the method was used in an unknown environment with acoustic characteristics different from those of the anechoic room.
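
One way to picture the effect of such a prior is shrinkage of a noisy in-room spatial correlation matrix toward an anechoic template for the same direction. The sketch below assumes a uniform linear array, a single frequency, and a fixed interpolation weight; it is a simplification of the Bayesian formulation, not its actual update rules.

import numpy as np

rng = np.random.default_rng(3)
M = 4   # microphones

def steering_vector(doa_deg, freq=1000.0, spacing=0.05, c=343.0):
    """Free-field plane-wave steering vector for a uniform linear array."""
    mics = np.arange(M) * spacing
    delays = mics * np.cos(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq * delays)

a = steering_vector(60.0)
G_anechoic = np.outer(a, a.conj())      # anechoic spatial correlation matrix for this direction

# Stand-in "real room" estimate: the anechoic matrix plus a reverberant perturbation.
noise = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
G_observed = G_anechoic + 0.3 * (noise @ noise.conj().T) / M

w = 0.7                                 # hypothetical prior weight
G_regularized = w * G_anechoic + (1.0 - w) * G_observed   # shrinkage toward the anechoic template
print(np.round(np.abs(G_regularized), 2))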


Journal of Robotics and Mechatronics | 2017

Audio-Visual Beat Tracking Based on a State-Space Model for a Robot Dancer Performing with a Human Dancer

Misato Ohkita; Yoshiaki Bando; Eita Nakamura; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents a real-time beat-tracking method that integrates audio and visual information in a probabilistic manner to enable a humanoid robot to dance in synchronization with music and human dancers. Most conventional music robots have focused on either music audio signals or the movements of human dancers to detect and predict beat times in real time. Since a robot needs to record music audio signals with its own microphones, however, the signals are severely contaminated with loud environmental noise. To solve this problem, we propose a state-space model that encodes a pair consisting of a tempo and a beat time in a state space and represents how acoustic and visual features are generated from a given state. The acoustic features consist of tempo likelihoods and onset likelihoods obtained from music audio signals, and the visual features are tempo likelihoods obtained from dance movements. The current tempo and the next beat time are estimated in an online manner from a history of observed features by using a particle filter. Experimental results show that the proposed multi-modal method, which uses a depth sensor (Kinect) to extract skeleton features, outperformed conventional mono-modal methods in terms of beat-tracking accuracy in a noisy and reverberant environment.
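
A stripped-down version of the particle-filter idea looks as follows: each particle holds a beat period and a predicted beat time, is propagated with noise, and is reweighted by how close observed onsets fall to its predicted beats. The onset sequence, noise levels, and weighting function are stand-ins, and the visual (skeleton) features are omitted.

import numpy as np

rng = np.random.default_rng(4)
N = 500                                   # number of particles
period = rng.uniform(0.4, 0.7, N)         # beat period (seconds) per particle
next_beat = rng.uniform(0.0, 0.7, N)      # time of the next predicted beat per particle

onsets = np.arange(0.0, 8.0, 0.5)         # stand-in detected onsets (a steady 120 BPM)
for t_obs in onsets:
    # Propagate: advance each particle's predicted beat past the current time, with noise.
    steps = np.ceil(np.maximum(t_obs - next_beat, 0.0) / period)
    next_beat = next_beat + steps * period + rng.normal(0.0, 0.01, N)
    period = np.abs(period + rng.normal(0.0, 0.005, N))
    # Weight: the observed onset should fall close to a predicted beat (wrap to nearest).
    err = np.abs((t_obs - next_beat + period / 2) % period - period / 2)
    w = np.exp(-0.5 * (err / 0.03) ** 2) + 1e-12
    w /= w.sum()
    resampled = rng.choice(N, size=N, p=w)
    period, next_beat = period[resampled], next_beat[resampled]

print("estimated beat period:", round(float(period.mean()), 3), "s")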


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Real-time audio-to-score alignment of music performances containing errors and arbitrary repeats and skips

Tomohiko Nakamura; Eita Nakamura; Shigeki Sagayama

This paper discusses real-time alignment of audio signals of music performances to the corresponding score (a.k.a. score following) that can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are frequent. Simple extensions of algorithms previously proposed in the literature are not applicable in these situations for scores of practical length because of their large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10,000 notes) on a modern laptop, and found that they recover tracking of the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and an extension to polyphonic signals are also discussed.
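
The independence assumption on score positions before and after repeats/skips is what keeps the computation tractable: the transition kernel factorises into a local move plus a jump drawn from a shared prior, so one forward step costs O(N) rather than O(N^2) over N score positions. Below is a sketch with assumed parameter values; the repeat/skip probability, local-move distribution, and uniform jump prior are hypothetical, not the published settings.

import numpy as np

N = 10000                                 # number of score positions (notes)
q = 0.01                                  # hypothetical repeat/skip probability
jump_prior = np.full(N, 1.0 / N)          # where a repeat/skip lands, independent of the origin
local = np.array([0.1, 0.7, 0.2])         # stay / advance one note / advance two notes

def forward_step(alpha, obs_like):
    """One forward-algorithm step in O(N) instead of O(N^2)."""
    local_part = (local[0] * alpha
                  + local[1] * np.roll(alpha, 1)
                  + local[2] * np.roll(alpha, 2))
    local_part[0] = local[0] * alpha[0]                        # no wrap-around at the score start
    local_part[1] = local[0] * alpha[1] + local[1] * alpha[0]
    jump_part = alpha.sum() * jump_prior                       # one shared term, thanks to independence
    alpha_new = ((1.0 - q) * local_part + q * jump_part) * obs_like
    return alpha_new / alpha_new.sum()

alpha = np.full(N, 1.0 / N)
alpha = forward_step(alpha, obs_like=np.random.default_rng(5).random(N))
print(alpha.shape)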


International Conference on Mathematics and Computation in Music | 2015

Characteristics of Polyphonic Music Style and Markov Model of Pitch-Class Intervals

Eita Nakamura; Shinji Takaki

For the purpose of quantitatively characterising polyphonic music styles, we study the computational analysis of some traditionally recognised harmonic and melodic features and their statistics. While direct computational analysis is not easy because it requires chord and key analysis, we develop a method for statistical analysis based on relations between these features and successions of pitch-class (pc) intervals extracted from polyphonic music data. With these relations, we can explain some patterns seen in the model parameters obtained from classical pieces and reduce the number of model parameters substantially (from 110 to five) without severely degrading the accuracy of discriminating composers in and around the common-practice period, showing the significance of these features. The method can be applied to polyphonic music style analysis of both typed score data and performed MIDI data, and could improve state-of-the-art music style classification algorithms.
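
Extracting pitch-class-interval statistics is straightforward to sketch. The melody below, and the choice of a first-order (bigram) model over pc intervals, are illustrative stand-ins for the corpus analysis described above.

from collections import Counter

midi_pitches = [60, 64, 67, 65, 64, 62, 60, 67, 69, 67, 65, 64]   # stand-in melody
pc_intervals = [(b - a) % 12 for a, b in zip(midi_pitches, midi_pitches[1:])]

# First-order (bigram) Markov statistics of pc intervals, usable as style features.
bigrams = Counter(zip(pc_intervals, pc_intervals[1:]))
total = sum(bigrams.values())
probs = {bg: c / total for bg, c in bigrams.items()}
print(sorted(probs.items(), key=lambda kv: -kv[1])[:5])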

Collaboration


Eita Nakamura's top co-authors.

Nobutaka Ono

National Institute of Informatics

Masataka Goto

National Institute of Advanced Industrial Science and Technology
