Masuzo Yanagida
Doshisha University
Publications
Featured research published by Masuzo Yanagida.
Signal Processing | 2008
Leandro E. Di Persia; Diego H. Milone; Hugo Leonardo Rufiner; Masuzo Yanagida
In a previous article, an evaluation of several objective quality measures as predictors of the recognition rate obtained after applying a blind source separation algorithm was reported. In this work, the experiments were repeated using some new measures based on the perceptual evaluation of speech quality (PESQ), which is part of the ITU-T P.862 standard for the evaluation of communication systems. The raw PESQ and a nonlinearly transformed PESQ were evaluated, together with several composite measures. The results show that the PESQ-based measures outperformed all the measures reported in the previous work. Based on these results, we recommend the use of PESQ-based measures to evaluate blind source separation algorithms for automatic speech recognition.
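As a rough illustration of how such an evaluation can be set up, the sketch below scores a separated signal with PESQ and correlates the scores with recognition rates. It assumes the third-party Python pesq package (an implementation of ITU-T P.862); the logistic transform and all function names are illustrative, not the exact measures used in the paper.

```python
# Minimal sketch: scoring separated speech with PESQ and checking how well
# the scores predict ASR accuracy. Uses the third-party "pesq" package
# (pip install pesq); the transform below is an illustrative assumption.
import numpy as np
from pesq import pesq

def pesq_score(fs, reference, separated):
    """Raw PESQ MOS between clean reference and separated signal."""
    return pesq(fs, reference, separated, 'nb')  # 'nb' needs fs=8000, 'wb' fs=16000

def transformed_pesq(raw):
    """An example nonlinear (logistic) remapping of the raw PESQ score."""
    return 1.0 / (1.0 + np.exp(-1.5 * (raw - 2.5)))

def predictive_correlation(scores, recognition_rates):
    """Predictive power of a measure = correlation with recognition rates
    over a set of test conditions."""
    return np.corrcoef(scores, recognition_rates)[0, 1]
```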
Signal Processing | 2007
Leandro E. Di Persia; Masuzo Yanagida; Hugo Leonardo Rufiner; Diego H. Milone
Determining the quality of the signals obtained by blind source separation is very important for the development and evaluation of such algorithms. When this approach is used as a pre-processing stage for automatic speech recognition, the separation quality measure applied for assessment should be related to the recognition rates of the system. Many measures have been used for quality evaluation, but in general they have been applied without prior study of their suitability as quality measures in the context of blind source separation, and they often require experimentation under unrealistic conditions. Moreover, these measures only try to evaluate the amount of separation, and this value may not be directly related to recognition rates. Presented in this work is a study of several objective quality measures evaluated as predictors of the recognition rate of a continuous speech recognizer. The correlation between quality measures and recognition rates is analyzed for a separation algorithm applied to signals recorded in a real room with different reverberation times and different kinds and levels of noise. A very good correlation between the weighted spectral slope measure and the recognition rate has been verified from the results of this analysis. Furthermore, a good performance of the total relative distortion and cepstral measures for rooms with relatively long reverberation times has been observed.
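The weighted spectral slope measure that correlated best is, in essence, a comparison of spectral slopes in critical bands between reference and degraded frames. The sketch below is a heavily simplified rendering of that idea, with a crude uniform band split and an ad hoc peak-based weighting; Klatt's original formulation is considerably more detailed.

```python
# Simplified sketch of a weighted-spectral-slope-style distance. Band edges,
# the weighting, and all constants are illustrative assumptions, not the
# measure as used in the study.
import numpy as np

def band_log_spectrum(frame, n_bands=25):
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)          # crude critical-band split
    return np.log(np.array([b.mean() for b in bands]) + 1e-12)

def wss_distance(ref_frame, deg_frame):
    ref_b = band_log_spectrum(ref_frame)
    deg_b = band_log_spectrum(deg_frame)
    ref_slope = np.diff(ref_b)                     # per-band spectral slopes
    deg_slope = np.diff(deg_b)
    # Weight slope differences more heavily near the reference spectral peak.
    peak = ref_b.max()
    w = 1.0 / (1.0 + (peak - ref_b[:-1]))
    return np.sum(w * (ref_slope - deg_slope) ** 2) / np.sum(w)
```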
IEEE Transactions on Audio, Speech, and Language Processing | 2009
L. Di Persia; Diego H. Milone; Masuzo Yanagida
Blind separation of convolutive mixtures is a very complicated task with applications in many fields of speech and audio processing, such as hearing aids and man-machine interfaces. One of the proposed solutions is frequency-domain independent component analysis. The main disadvantage of this method is the presence of permutation ambiguities among consecutive frequency bins, a problem that worsens as the reverberation time increases. Presented in this paper is a new frequency-domain method that uses a simplified mixing model, in which the impulse responses from one source to each microphone are expressed as scaled and delayed versions of one of these impulse responses. This assumption, based on the similarity among the waveforms of the impulse responses, is valid for small microphone spacings. Under this model, separation is performed without any permutation or amplitude ambiguity among consecutive frequency bins. The new method is aimed mainly at obtaining separation, with a small reduction of reverberation. Nevertheless, as reverberation is included in the model, the method is capable of performing separation for a wide range of reverberant conditions at very high speed. The separation quality is evaluated using a perceptually designed objective measure. An automatic speech recognition system is also used to test the advantages of the algorithm in a real application. Very good results are obtained for both artificial and real mixtures, significantly better than those obtained by other standard blind source separation algorithms.
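The core of the model can be sketched compactly: once a gain and a delay per source are identified (for instance by ICA at a single frequency bin), the mixing matrix at every bin is determined, so per-bin inversion carries no permutation or scaling ambiguity. The 2x2 setup and parameter names below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of separation under a pseudoanechoic-style mixing model for
# 2 microphones / 2 sources: each source reaches the second microphone as a
# scaled (gain) and delayed (delay) version of its first-microphone response.
import numpy as np

def separation_matrices(freqs, gains, delays):
    """Per-bin unmixing matrices W(f) = A(f)^{-1}, with A(f) built from one
    (gain, delay) pair per source."""
    W = np.empty((len(freqs), 2, 2), dtype=complex)
    for i, f in enumerate(freqs):
        phase = np.exp(-2j * np.pi * f * np.asarray(delays))
        A = np.array([[1.0, 1.0],
                      [gains[0] * phase[0], gains[1] * phase[1]]])
        W[i] = np.linalg.inv(A)
    return W

def separate(stft_mix, W):
    """Apply W(f) to a mixture STFT of shape (bins, frames, 2 mics)."""
    return np.einsum('fij,ftj->fti', W, stft_mix)
```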
Neuroscience Letters | 2010
Ryosuke Tachibana; Masuzo Yanagida; Hiroshi Riquimaroux
Music performance and speech production require neural circuits to integrate auditory information and motor commands to achieve rapid and accurate control of sound properties. This article proposes a novel approach for investigating neural substrates related to audiomotor integration. An experiment examined the brain activity involved in sensorimotor integration in a simplified audiomotor task: pitch regulation using finger-pinching force. The brain activity of the participants was measured using functional magnetic resonance imaging (fMRI) while they performed the task. Two additional tasks were performed: an auditory-only task in which subjects listened to sound stimuli without any motor action, and a motor-only task in which they applied finger force to the sensor in the absence of auditory feedback. The fMRI results showed brain activity related to online pitch regulation in the dorsal premotor cortex (dPMC), planum temporale (PT), primary auditory cortex, and part of the midbrain. The involvement of the dPMC and PT is consistent with findings from previous studies on other audiomotor systems, suggesting that these regions are important for connecting auditory feedback to motor actions.
international conference on acoustics, speech, and signal processing | 2008
Kenko Ota; Emmanuel Duflos; Philippe Vanheeghe; Masuzo Yanagida
This paper presents a method for modeling speech signal distributions based on Dirichlet process mixtures (DPM) and for estimating noise sequences based on particle filtering. In real situations, the speech recognition rate degrades miserably because of the effects of environmental noises, reflected waves, and so on. To improve the speech recognition rate, a technique for estimating noise sequences is necessary. In this paper, the distribution of clean speech is modeled using a DPM instead of the traditional Gaussian mixture model (GMM). Speech signal sequences are generated according to the mean and covariance generated from the DPM, and noise signal sequences are then estimated with a particle filter. The proposed method using an extended Kalman filter (EKF) can improve the speech recognition rate significantly in the low SNR region; applying an unscented Kalman filter (UKF), better results can also be obtained in the high SNR region.
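As a hedged sketch of the noise-tracking stage, the bootstrap particle filter below follows a slowly varying noise value behind noisy observations. The random-walk dynamics, the single Gaussian standing in for the DPM speech model, and all constants are illustrative simplifications of the paper's setup.

```python
# Bootstrap particle filter tracking a noise sequence, assuming an additive
# observation y = speech + noise with speech ~ N(speech_mean, speech_var).
# All parameters are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_noise(observations, n_particles=500,
                          speech_mean=0.0, speech_var=1.0, drift=0.05):
    particles = rng.normal(0.0, 1.0, n_particles)   # initial noise hypotheses
    estimates = []
    for y in observations:
        particles += rng.normal(0.0, drift, n_particles)  # random-walk dynamics
        resid = y - particles - speech_mean
        w = np.exp(-0.5 * resid**2 / speech_var) + 1e-12  # observation likelihood
        w /= w.sum()
        estimates.append(np.sum(w * particles))     # posterior-mean estimate
        idx = rng.choice(n_particles, n_particles, p=w)   # resampling step
        particles = particles[idx]
    return np.array(estimates)
```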
ieee automatic speech recognition and understanding workshop | 2003
Kunio Aono; Keiji Yasuda; Toshiyuki Takezawa; Seiichi Yamamoto; Masuzo Yanagida
This paper analyzes acoustic likelihoods calculated from two acoustic models, a spontaneous speech acoustic model and a read speech acoustic model, from the viewpoint of linguistic information such as word category and language likelihood. Experimental results show a significant tendency in the relationship between speaking style and linguistic information. According to the analysis results, a word's acoustic likelihood calculated from the spontaneous speech acoustic model is higher, or more suitable, than that from the read speech acoustic model when the word is an interjection or an auxiliary verb. On the other hand, even in human-to-human conversation, a word's acoustic likelihood calculated from the read speech acoustic model can be higher than that from the spontaneous speech acoustic model when the word is a noun. Applying this knowledge along with machine learning, post-processing experiments on the results of ASR using these two acoustic models are carried out. In this set of experiments, post-processing based on a support vector machine is applied. The experimental results show that the selection scheme based on word category reduces the word error rate by 1.62 points over the single system.
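A minimal sketch of such a selection scheme follows, assuming per-word features built from the two acoustic likelihoods and a word-category identifier; the feature layout, toy training rows, and labels are illustrative, not the paper's exact configuration.

```python
# Hedged sketch: an SVM picks, per recognized word, the hypothesis from the
# spontaneous-speech or read-speech acoustic model. Feature layout, category
# ids, and the toy training rows are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

# Each row: [likelihood_spontaneous, likelihood_read, category_id]
# Label: 1 if the spontaneous-model hypothesis was correct, else 0.
X_train = np.array([[-120.3, -131.0, 2],    # e.g., interjection
                    [-98.5,  -95.1,  0],    # e.g., noun
                    [-110.2, -118.9, 3]])   # e.g., auxiliary verb
y_train = np.array([1, 0, 1])

clf = SVC(kernel='rbf').fit(X_train, y_train)

def select_hypothesis(lik_spont, lik_read, category_id):
    """Return which model's hypothesis to keep for one recognized word."""
    label = clf.predict([[lik_spont, lik_read, category_id]])[0]
    return 'spontaneous' if label == 1 else 'read'
```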
Journal of the Acoustical Society of America | 1996
Tamotsu Shirado; Masuzo Yanagida
An algorithm for extracting fundamental frequencies from the sound of a violin duet is presented. The proposed algorithm is composed of two processes: (1) framewise enumeration of candidates for fundamental frequencies using the cepstrum method, and (2) grouping of candidates across adjacent frames using a hierarchical clustering method. Process (1) includes a subprocess that discriminates between the fundamental component of one tone and the harmonic components of the other, in particular for duet tones in an octave relation. This algorithm differs from most traditional methods for extracting fundamental frequencies in that it introduces only loose constraints on the target sounds and assumes no acoustical model for the instruments concerned. The algorithm is therefore expected to be applicable to many more types of musical instruments than conventional methods. The correct identification rate, evaluated by musical notes, was 93% for violin duet sounds in the best case.
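The two processes might be sketched as follows, with cepstral peak picking for stage (1) and hierarchical clustering of (time, log-F0) candidate points for stage (2); the search range, thresholds, and clustering criterion are illustrative assumptions.

```python
# Sketch of the two stages. Stage 1: per-frame F0 candidates from cepstral
# peaks in a plausible quefrency range. Stage 2: single-linkage clustering
# of candidates across frames. Constants are illustrative placeholders.
import numpy as np
from scipy.signal import find_peaks
from scipy.cluster.hierarchy import fcluster, linkage

def f0_candidates(frame, fs, n_cand=3, fmin=180.0, fmax=1500.0):
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    cepstrum = np.fft.irfft(np.log(spectrum))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # quefrency search range
    peaks, props = find_peaks(cepstrum[qmin:qmax], height=0)
    order = np.argsort(props['peak_heights'])[::-1][:n_cand]
    return fs / (peaks[order] + qmin)             # quefrency -> frequency (Hz)

def group_candidates(times, freqs, max_gap=30.0):
    """Cluster (time, log-F0) candidate points so that candidates belonging
    to one voice join up across adjacent frames."""
    pts = np.column_stack([times, 1200 * np.log2(freqs)])  # F0 in cents
    return fcluster(linkage(pts, method='single'), t=max_gap,
                    criterion='distance')
```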
Journal of the Acoustical Society of America | 2016
Mayuko Yamashita; Masuzo Yanagida; Ichiro Umata; Tsuneo Kato; Seiichi Yamamoto
Tempo is one of the basic factors in music expression and perception. Although there have been studies on the perception of tempo change, little is known about how the type of musical experience affects sensitivity to this change. We analyze the effects of musical experience on the perception of tempo change as a contribution to music education. Our analysis focuses on sensitivity to tempo change. Participants were classified into three groups according to their musical experience: (A) inexperienced in any musical instrument, (B) players majoring in piano, and (C) amateur players belonging to brass bands. We performed experiments using monotone piano sequences that gradually change tempo from an initial inter-onset interval (IOI) to a target IOI. We manipulated three tempo change patterns, namely, (I) linear, (II) exponential, and (III) the average of (I) and (II). We compared the point of tempo change perception among the three groups with the assumption that the sensitivity would be high...
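The three patterns can be written down directly as interpolations between the initial and target IOI; the sketch below generates onset times for each pattern, with the note count and IOI values as illustrative placeholders.

```python
# Stimulus timing for the three tempo-change patterns: onset n gets an
# inter-onset interval interpolated from the initial to the target IOI
# (I) linearly, (II) exponentially, or (III) as the average of the two.
import numpy as np

def ioi_sequence(ioi_start, ioi_target, n_notes, pattern='linear'):
    t = np.linspace(0.0, 1.0, n_notes)
    linear = ioi_start + (ioi_target - ioi_start) * t
    exponential = ioi_start * (ioi_target / ioi_start) ** t
    if pattern == 'linear':
        return linear                              # pattern (I)
    if pattern == 'exponential':
        return exponential                         # pattern (II)
    return 0.5 * (linear + exponential)            # pattern (III)

# Example: 40 notes accelerating from a 0.5 s IOI to a 0.4 s IOI.
onsets = np.cumsum(ioi_sequence(0.5, 0.4, 40, 'exponential'))
```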
171st Meeting of the Acoustical Society of America | 2016
Masuzo Yanagida; Seiichi Yamamoto; Ichiro Umata
Tempo is one of the basic factors in musical expression. Although there are studies on the perception of tempo change, little is known about how the mode of tempo change affects sensitivity to the change. In this paper, we analyze the effects of the mode of tempo change on the perception of tempo change. Our analysis focuses on sensitivity to tempo change. Forty-six participants were divided into three groups according to their musical experience and the type of playing they are used to: (A) 15 inexperienced listeners, (B) 21 pianists mostly playing solo, and (C) 10 players of musical instruments other than piano, mostly playing in groups. As stimuli, we used synthetic single-tone piano sequences that change tempo gradually from a common initial value to various target values. We also manipulated the mode of tempo change: linear, exponential, and their average. Participants were asked to indicate the time point of perception by pressing a key as soon as they perceived the tempo change. Contrary to our presumptio...
signal processing systems | 2011
Leandro E. Di Persia; Diego H. Milone; Masuzo Yanagida
In a recent publication, the pseudoanechoic mixing model for closely spaced microphones was proposed and a blind audio source separation algorithm based on this model was developed. This method uses frequency-domain independent component analysis to identify the mixing parameters. These parameters are used to synthesize the separation matrices, and a time-frequency Wiener postfilter is then applied to improve the separation. In this contribution, key aspects of the separation algorithm are optimized with two novel methods. A deeper analysis of the working principles of the Wiener postfilter is presented, which gives insight into its reverberation-reduction capabilities. A variation of this postfilter that uses information from previous frames to improve performance is also introduced. The basic method uses a fixed central frequency bin for the estimation of the mixture parameters; here, an automatic selection of the central bin, based on the separability of the sources, is introduced. The improvements obtained through these methods are evaluated in an automatic speech recognition task and with the PESQ objective quality measure. The results show increased robustness and stability of the proposed method, enhancing the separation quality and improving the recognition rate of an automatic speech recognition system.
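A minimal sketch of a time-frequency Wiener postfilter that recursively reuses the gains of previous frames, in the spirit of the variation described above, follows; the smoothing constant and gain floor are illustrative choices.

```python
# Time-frequency Wiener postfilter with recursive gain smoothing across
# frames. S_hat and interference are STFTs (bins x frames) of the target
# estimate and the estimated interference for one separated channel.
# alpha and floor are illustrative assumptions.
import numpy as np

def wiener_postfilter(S_hat, interference, alpha=0.8, floor=0.1):
    gain_prev = np.ones(S_hat.shape[0])
    out = np.empty_like(S_hat)
    for t in range(S_hat.shape[1]):
        ps = np.abs(S_hat[:, t]) ** 2
        pn = np.abs(interference[:, t]) ** 2
        gain = ps / (ps + pn + 1e-12)              # instantaneous Wiener gain
        gain = alpha * gain_prev + (1 - alpha) * gain  # reuse previous frames
        gain_prev = gain
        out[:, t] = np.maximum(gain, floor) * S_hat[:, t]
    return out
```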
Collaboration
National Institute of Information and Communications Technology
National Institute of Advanced Industrial Science and Technology