Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Akira Maezawa is active.

Publication


Featured research published by Akira Maezawa.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Polyphonic audio-to-score alignment based on Bayesian Latent Harmonic Allocation Hidden Markov Model

Akira Maezawa; Hiroshi G. Okuno; Tetsuya Ogata; Masataka Goto

This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is dealing with the wide variety of timbre and volume in audio renditions. In contrast to existing work, which achieves this through ad hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method that models a music performance as a Bayesian hidden Markov model, each state of which emits a Bayesian signal model based on latent harmonic allocation. After attenuating reverberation, the variational Bayes method is used to iteratively adapt the alignment, the instrument tone model, and the volume balance at each position in the score. The method is evaluated on sixty works of classical music with instrumentation ranging from solo piano to full orchestra. We verify that, for orchestral music, our method improves alignment accuracy compared to dynamic time warping based on chroma vectors and compared to our method run in a maximum-likelihood setting.
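The chroma-based baseline that the paper compares against can be sketched with a few lines of dynamic time warping. The feature vectors below are toy data standing in for per-frame chroma vectors; a real system would extract them from audio.

```python
# Minimal dynamic-time-warping sketch of a chroma-based alignment baseline.
# The 2-D "feature" tuples here are toy stand-ins for 12-D chroma vectors.

def dtw_cost(seq_a, seq_b, dist):
    """Return the cumulative alignment cost between two feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame in seq_b
                                 cost[i][j - 1],      # skip a frame in seq_a
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m]

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Two renditions of the same 3-frame passage; the second holds one frame longer.
score_feats = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
audio_feats = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
print(dtw_cost(score_feats, audio_feats, euclidean))  # → 0.0 (frames align exactly)
```

The Bayesian method in the paper replaces this fixed feature-distance with a learned, adaptive emission model, which is where the robustness to timbre and volume comes from.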


Computer Music Journal | 2012

Automated violin fingering transcription through analysis of an audio recording

Akira Maezawa; Katsutoshi Itoyama; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We present a method to recover the fingering for a given piece of violin music in order to recreate the timbre of a given audio recording of the piece. This is achieved by first analyzing the audio signal to determine the most likely sequence of two-dimensional fingerboard locations (string number and location along the string), which recovers the elements of violin fingering relevant to timbre. This sequence is then used as a constraint for finding an ergonomic sequence of finger placements that satisfies both the sequence of notated pitches and the given fingerboard-location sequence. Fingerboard-location-sequence estimation is based on a hidden Markov model, each state of which represents a particular fingerboard location and emits a Gaussian mixture model of the relative strengths of harmonics. The relative strengths of harmonics are estimated from a polyphonic mixture using score-informed source segregation, and discrepancies between the observed data and the training data are compensated for through mean normalization. Fingering estimation is based on modeling a cost function for a sequence of finger placements, which we tailor to incorporate the playing practices of the violin. We evaluate the performance of the fingerboard-location estimator on a polyphonic mixture and on recordings of a violin whose timbral characteristics differ significantly from those of the training data. We subjectively evaluate the fingering estimator and validate the effectiveness of tailoring the fingering model to the violin.
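The fingerboard-location decoding step is a standard Viterbi search over HMM states. Below is a toy sketch under invented assumptions: just two states standing in for (string, position) pairs, and a scalar "harmonic strength" feature with a Gaussian log-likelihood per state, rather than the paper's Gaussian mixtures over full harmonic profiles.

```python
# Toy Viterbi sketch of fingerboard-location decoding. The two states and all
# probabilities below are invented for illustration; each real state would be
# a (string, position) pair emitting a GMM over relative harmonic strengths.

import math

def viterbi(states, log_init, log_trans, log_emit, observations):
    """Return the most likely hidden-state sequence for the observations."""
    # best[s] = (log-probability, path) of the best sequence ending in state s
    best = {s: (log_init[s] + log_emit[s](observations[0]), [s]) for s in states}
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            score, path = max(
                ((best[p][0] + log_trans[p][s], best[p][1]) for p in states),
                key=lambda t: t[0])
            nxt[s] = (score + log_emit[s](obs), path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]

def gauss_loglik(mu, sigma=0.3):
    """Unnormalized Gaussian log-likelihood over one scalar feature."""
    return lambda x: -((x - mu) ** 2) / (2 * sigma ** 2)

states = ["G-string", "D-string"]
log_init = {s: math.log(0.5) for s in states}
stay, move = math.log(0.8), math.log(0.2)  # staying on a string is likelier
log_trans = {"G-string": {"G-string": stay, "D-string": move},
             "D-string": {"G-string": move, "D-string": stay}}
log_emit = {"G-string": gauss_loglik(0.0), "D-string": gauss_loglik(1.0)}

print(viterbi(states, log_init, log_trans, log_emit, [0.1, 0.0, 0.9, 1.0]))
# → ['G-string', 'G-string', 'D-string', 'D-string']
```

The transition bias toward staying on the same string is what makes the decoded sequence smooth even when individual frames are ambiguous.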


International Symposium on Multimedia | 2009

Bowed String Sequence Estimation of a Violin Based on Adaptive Audio Signal Classification and Context-Dependent Error Correction

Akira Maezawa; Katsutoshi Itoyama; Toru Takahashi; Tetsuya Ogata; Hiroshi G. Okuno

The sequence of strings played on a bowed string instrument is essential to understanding the fingering. Its estimation is therefore required for machine understanding of violin playing, and audio-based identification is the only viable way to realize this goal for existing music recordings. A naive implementation using audio classification alone, however, is inaccurate and is not robust against variations in strings or instruments. We develop a bowed string sequence estimation method that combines audio-based bowed string classification with context-dependent error correction. Robustness against different instrument setups is improved by normalizing the F0-dependent features using the average feature of a recording. The performance of error correction is evaluated using an electric violin with two different brands of strings and an acoustic violin. Incorporating mean normalization reduces the recognition error due to changing strings by 8 points, and that due to changing instruments by 12 points. Error correction decreases the error due to changing strings by 8 points and that due to a different instrument by 9 points.
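The mean-normalization step described above can be illustrated in a few lines: subtracting a recording-wide average feature removes a per-instrument or per-string bias before classification. The feature vectors below are toy values.

```python
# Sketch of recording-level mean normalization: subtracting the average
# feature of a recording cancels an additive per-setup bias, so two
# recordings of the same playing on different instruments look alike.
# Feature vectors are toy 2-D values.

def mean_normalize(frames):
    """Subtract the per-dimension average of a recording from every frame."""
    dims = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - mean[d] for d in range(dims)] for f in frames]

# Two "recordings" of identical playing, offset by an instrument-dependent bias:
rec_a = [[1.0, 2.0], [3.0, 4.0]]
rec_b = [[11.0, 22.0], [13.0, 24.0]]  # same shape, shifted by (10, 20)
print(mean_normalize(rec_a) == mean_normalize(rec_b))  # → True: bias removed
```

This only cancels additive offsets; the point reductions reported in the abstract come from applying the idea to the actual F0-dependent features.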


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Nonparametric Bayesian dereverberation of power spectrograms based on infinite-order autoregressive processes

Akira Maezawa; Katsutoshi Itoyama; Kazuyoshi Yoshii; Hiroshi G. Okuno

This paper describes a monaural audio dereverberation method that operates in the power spectrogram domain. The method is robust to different kinds of source signals, such as speech or music, and requires little manual intervention, such as specifying the complexity of the room acoustics. It is based on a non-conjugate Bayesian model of the power spectrogram that extends the idea of multi-channel linear prediction to the power spectrogram domain and formulates reverberation as a non-negative, infinite-order autoregressive process. To this end, the power spectrogram is interpreted as histogram count data, which allows a nonparametric Bayesian model to be used as the prior for the autoregressive process, letting the effective number of active components grow without bound with the complexity of the data. To approximate the marginal posterior distribution, we formulate an iterative, convergent algorithm inspired by the variational Bayes method that employs the minorization-maximization technique. Both objective and subjective evaluations show an advantage over other methods based on the power spectrum. We also apply the method to a music information retrieval task and demonstrate its effectiveness.
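The core modeling idea can be shown in its simplest possible form: treat each reverberant power-spectrogram frame as the dry frame plus a non-negative weighted sum of past frames, and recover the dry frame by subtraction clamped at zero. This is a fixed-order, fixed-weight caricature; the paper instead learns the (effectively infinite-order) weights nonparametrically, and the weights below are invented.

```python
# Simplest-case sketch of power-domain autoregressive dereverberation:
# wet[t] ≈ dry[t] + sum_lag w[lag] * wet[t - lag], so the dry frame is
# estimated by clamped subtraction. Order and weights are hand-picked here;
# the paper's method learns them with a nonparametric Bayesian prior.

def dereverberate(power_frames, ar_weights):
    """Subtract a non-negative AR combination of past frames from each frame."""
    dry = []
    for t, frame in enumerate(power_frames):
        est = list(frame)
        for lag, w in enumerate(ar_weights, start=1):
            if t - lag >= 0:
                past = power_frames[t - lag]
                est = [max(0.0, e - w * p) for e, p in zip(est, past)]
        dry.append(est)
    return dry

# A single one-bin impulse smeared by a reverb tail decaying by 0.5 per frame:
wet = [[1.0], [0.5], [0.25]]
print(dereverberate(wet, ar_weights=[0.5]))  # → [[1.0], [0.0], [0.0]]
```

Working in the power domain is what lets the method stay blind to phase and source type, at the cost of the non-negativity constraint handled by the clamp above.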


Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2015

Unified inter- and intra-recording duration model for multiple music audio alignment

Akira Maezawa; Katsutoshi Itoyama; Kazuyoshi Yoshii; Hiroshi G. Okuno

This paper presents a probabilistic audio-to-audio alignment method that focuses on the relationship among the note durations of different performances of a piece of music. A key issue in probabilistic audio alignment is expressing how interrelated the durations of notes in the underlying piece of music are. Existing studies focus either on the durations of adjacent notes within a recording (intra-recording duration model) or on the duration of a given note across different recordings (inter-recording duration model). This paper unifies these approaches through a simple modification, and further extends the unified model to allow the dynamics of the note durations to change sporadically. Experimental evaluation demonstrates that the proposed models decrease the alignment error.


Computer Music Journal | 2015

Bayesian audio-to-score alignment based on joint inference of timbre, volume, tempo, and note onset timings

Akira Maezawa; Hiroshi G. Okuno

This article presents an offline method for aligning an audio signal to the individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one musical instrument's part, and the state duration probability is conditioned on a linear dynamical system (LDS) tempo model. Variational Bayesian inference is used to jointly infer the LHA, HSMM, and LDS models. We evaluate the capability of the method to align musical audio to its score under reverberation, structural variations, and fluctuations in onset timing among different parts.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Audio part mixture alignment based on hierarchical nonparametric Bayesian model of musical audio sequence collection

Akira Maezawa; Hiroshi G. Okuno

This paper proposes “audio part mixture alignment,” a method for temporally aligning multiple audio signals, each of which is a rendition of a non-disjoint subset of a common piece of music. The method decomposes each audio signal into components shared across renditions and components unique to each rendition, while aligning each signal based on the shared components. The decomposition is modeled using a hierarchical Dirichlet process (HDP), and sequence alignment is modeled as a left-to-right hidden Markov model (HMM). Variational Bayesian inference is used to jointly infer the alignment and the component decomposition. Compared with a classic audio-to-audio alignment method, the proposed method is more robust to discrepancies in parts between two audio signals.
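The shared-vs-unique decomposition can be conveyed with an extremely simplified toy: for two aligned power frames, take the elementwise minimum as the shared part and the residuals as rendition-specific parts. This is only the intuition; the paper models the decomposition properly with a hierarchical Dirichlet process and infers it jointly with the alignment.

```python
# Toy "shared vs. unique" split for two aligned power-spectrum frames:
# shared energy is the elementwise minimum, residuals are rendition-specific.
# The paper's HDP model replaces this hard min with learned components.

def split_shared(frame_a, frame_b):
    """Split two aligned frames into (shared, unique_a, unique_b) parts."""
    shared = [min(a, b) for a, b in zip(frame_a, frame_b)]
    unique_a = [a - s for a, s in zip(frame_a, shared)]
    unique_b = [b - s for b, s in zip(frame_b, shared)]
    return shared, unique_a, unique_b

# Rendition A plays melody plus an extra accompaniment bin; B plays melody only:
a = [4.0, 2.0, 1.0]
b = [4.0, 0.0, 1.0]
print(split_shared(a, b))  # → ([4.0, 0.0, 1.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0])
```

Aligning on the shared part only is what makes the method tolerant of one rendition containing parts the other omits.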


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Probabilistic transcription of sung melody using a pitch dynamic model

Luwei Yang; Akira Maezawa; Jordan B. L. Smith; Elaine Chew

Transcribing the singing voice into music notes is challenging due to pitch fluctuations such as portamenti and vibratos. This paper presents a probabilistic transcription method for monophonic sung melodies that explicitly accounts for these local pitch fluctuations. In the hierarchical hidden Markov model (HMM), an upper-level ergodic HMM handles the transitions between notes, and a lower-level left-to-right HMM handles the intra- and inter-note pitch fluctuations. The lower-level HMM employs a pitch dynamic model, which explicitly expresses the pitch-curve characteristics as an observation likelihood over f0 and Δf0 using a compact parametric distribution. A histogram-based tuning-frequency estimation method and post-processing heuristics that separate merged notes and reallocate spuriously detected short notes improve the note recognition performance. With model parameters that support intuitions about singing behavior, the proposed method obtained encouraging results when evaluated on a published monophonic sung-melody dataset and compared with state-of-the-art methods.
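The intuition behind the pitch dynamic model is that stable notes have small Δf0 while portamenti have large Δf0. A crude sketch of that intuition is to threshold the pitch derivative and collapse stable runs into notes; the threshold and pitch track below are invented, and the paper replaces the hard threshold with a hierarchical HMM and a parametric likelihood over (f0, Δf0).

```python
# Crude note segmentation from a pitch track: split wherever the frame-to-
# frame pitch change (Δf0) exceeds a threshold, then report each segment's
# mean pitch. Toy data; the paper uses a probabilistic model instead.

def segment_notes(f0_track, max_delta=0.5):
    """Group consecutive frames whose pitch change stays below max_delta."""
    notes, current = [], [f0_track[0]]
    for prev, cur in zip(f0_track, f0_track[1:]):
        if abs(cur - prev) <= max_delta:
            current.append(cur)
        else:
            notes.append(sum(current) / len(current))  # note pitch = mean f0
            current = [cur]
    notes.append(sum(current) / len(current))
    return notes

# Pitch in semitones: a held C4 (60) with slight vibrato gliding up to D4 (62).
track = [60.0, 60.1, 60.0, 62.0, 62.1, 61.9]
print(segment_notes(track))  # two notes, near 60 and 62
```

A hard threshold like this confuses wide vibrato with note changes, which is exactly the failure mode the probabilistic (f0, Δf0) likelihood is designed to avoid.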


International Society for Music Information Retrieval Conference (ISMIR) | 2010

Query-by-conducting: An interface to retrieve classical-music interpretations by real-time tempo input

Akira Maezawa; Masataka Goto; Hiroshi G. Okuno


International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE) | 2010

Violin fingering estimation based on violin pedagogical fingering model constrained by bowed sequence estimation from audio input

Akira Maezawa; Katsutoshi Itoyama; Toru Takahashi; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

Collaboration


Dive into Akira Maezawa's collaborations.
