
Publication


Featured research published by Minoru Tsuzaki.


Journal of the Acoustical Society of America | 1997

Concurrent vowel identification. I. Effects of relative amplitude and F0 difference

Alain de Cheveigné; Hideki Kawahara; Minoru Tsuzaki; Kiyoaki Aikawa

Subjects identified concurrent synthetic vowel pairs that differed in relative amplitude and fundamental frequency (F0). Subjects were allowed to report one or two vowels for each stimulus, rather than forced to report two vowels as was the case in previously reported experiments of the same type. At all relative amplitudes, identification was better at a fundamental frequency difference (ΔF0) of 6% than at 0%, but the effect was larger when the target vowel amplitude was below that of the competing vowel (−10 or −20 dB). The existence of a ΔF0 effect when the target is weak relative to the competing vowel is interpreted as evidence that segregation occurs according to a mechanism of cancellation based on the harmonic structure of the competing vowel. Enhancement of the target based on its own harmonic structure is unlikely, given the difficulty of estimating the fundamental frequency of a weak target. Details of the pattern of identification as a function of amplitude and vowel pair were found to be inco...
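The cancellation account can be illustrated with a minimal time-domain sketch (my illustration, not the authors' model): delaying the signal by one period of the competing vowel's F0 and subtracting nulls every harmonic of the competitor, so the residue is dominated by the target even when the target is weak.

```python
import numpy as np

def cancel_harmonics(x, f0, fs):
    """Suppress a periodic competitor by subtracting a one-period-delayed
    copy of the signal (time-domain comb cancellation). Illustrative sketch
    of the cancellation idea, not the authors' implementation."""
    T = int(round(fs / f0))          # competitor period in samples
    y = np.zeros_like(x)
    y[T:] = x[T:] - x[:-T]           # nulls all harmonics of f0
    return y

# Hypothetical demo: competitor at 200 Hz, target ~6% higher and 20 dB weaker
fs = 16000
t = np.arange(fs) / fs
competitor = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
target = 0.1 * sum(np.sin(2 * np.pi * 212 * k * t) for k in range(1, 6))
mix = competitor + target

residue = cancel_harmonics(mix, 200.0, fs)
# the competitor's harmonics are cancelled; the weak target passes through
```

Note that the filter only needs the competitor's period, not the target's, which is why the mechanism remains plausible when the target is too weak for its own F0 to be estimated.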


Journal of the Acoustical Society of America | 1997

Acceptability for temporal modification of consecutive segments in isolated words

Hiroaki Kato; Minoru Tsuzaki; Yoshinori Sagisaka

Perceptual sensitivity to temporal modification in two consecutive speech segments was measured in word contexts to explore two questions: (1) whether there is an interaction between multiple segmental durations, and (2) what aspect of the stimulus context determines the perceptually salient temporal markers. Experiment 1 obtained acceptability ratings for words with temporal modifications. The results showed that a compensatory change in the durations of a vowel (V) and its adjacent consonant (C) is not as perceptually salient as would be expected from the simultaneous modification of the two segments. This finding suggests the presence of a time-perception range wider than a single segment (V or C). The results of experiment 1 also showed that rating scores for compensatory modification between V and C do not depend on the temporal order of the modified pair (VC or CV), but rather on the loudness difference between V and C; acceptability decreased as the loudness difference between V and C grew. This suggests that perceptually salient markers are located around major jumps in loudness. The second finding, the dependence on loudness jumps, was replicated in experiment 2, which used a detection task for temporal modifications of nonspeech stimuli modeling the time-loudness features of the speech stimuli. Experiment 3 further investigated the influence of the temporal order of V and C by applying the detection task to the speech stimuli instead of the acceptability ratings.


international conference on acoustics, speech, and signal processing | 2002

Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit

Tomoki Toda; Hisashi Kawai; Minoru Tsuzaki; Kiyohiro Shikano

This paper proposes a novel unit selection algorithm for Japanese Text-To-Speech (TTS) systems. Since Japanese syllables consist of CV (C: consonant, V: vowel) or V, except when a vowel is devoiced, CV units are basic to concatenative TTS systems for Japanese. However, speech synthesized with CV units sometimes has discontinuities due to V-V concatenation. To alleviate such discontinuities, longer units (CV* or non-uniform units) have been proposed, but concatenation between V and V remains unavoidable. To address this problem, we propose a novel unit selection algorithm that incorporates not only phoneme units but also diphone units. In the proposed algorithm, concatenation is performed at the vowel center as well as at the phoneme boundary. Results of evaluation experiments show that the proposed algorithm outperforms the conventional one.


Speech Communication | 2006

An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis

Tomoki Toda; Hisashi Kawai; Minoru Tsuzaki; Kiyohiro Shikano

In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores for the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which reflects the degradation of naturalness over the entire synthetic utterance, corresponds better to the perceptual scores than the maximum cost, which reflects the worst local degradation of naturalness. Furthermore, it is shown that the root mean square (RMS) cost, which takes into account both the average cost and the maximum cost, has the best correspondence. We also show that the naturalness of synthetic speech can be improved by using the RMS cost for segment selection. We then investigate the effects of applying the RMS cost to segment selection in comparison with those of applying the average cost. Experimental results show that segment selection based on the RMS cost performs a larger number of concatenations causing slight local degradation so that concatenations causing greater local degradation are avoided.
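The three cost-integration functions compared above can be sketched as follows. This is a plausible reading of the definitions with hypothetical local cost values, not the authors' implementation:

```python
import numpy as np

def average_cost(local_costs):
    # overall degradation across the whole utterance
    return float(np.mean(local_costs))

def maximum_cost(local_costs):
    # worst local degradation only
    return float(np.max(local_costs))

def rms_cost(local_costs):
    # sensitive to both the overall level and large local outliers
    return float(np.sqrt(np.mean(np.square(local_costs))))

# Two hypothetical segment sequences with identical average cost:
flat = [0.5] * 10             # many slight local degradations
spiky = [0.4] * 9 + [1.4]     # one severe local degradation

# average_cost cannot distinguish the two sequences,
# but rms_cost penalizes the spiky one, as maximum_cost does.
```

Minimizing the RMS cost therefore tends to trade one severe concatenation artifact for several mild ones, which matches the behavior reported in the abstract.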


Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. | 2002

Perceptual evaluation of cost for segment selection in concatenative speech synthesis

Tomoki Toda; Hisashi Kawai; Minoru Tsuzaki; Kiyohiro Shikano

In segment selection for concatenative text-to-speech (TTS) synthesis, it is important to use a cost that corresponds to perceptual characteristics. We first clarify the correspondence of the cost to perceptual scores, and then evaluate various functions for integrating the costs. The perceptual scores are determined from the results of perceptual experiments on the naturalness of synthetic speech. The results show that the average cost, which reflects the naturalness degradation over the entire synthetic utterance, corresponds better to the perceptual scores than the maximum cost, which reflects the worst local naturalness degradation. Furthermore, the RMS (root mean square) cost, which is affected by both the average cost and the maximum cost, has the best correspondence.


international conference on acoustics, speech, and signal processing | 2009

Objective evaluation of English learners' timing control based on a measure reflecting perceptual characteristics

Shizuka Nakamura; Shigeki Matsuda; Hiroaki Kato; Minoru Tsuzaki; Yoshinori Sagisaka

Automatic evaluation of English timing-control proficiency is carried out by comparing segmental duration differences between learners and reference native speakers. To obtain an objective measure matched to human subjective evaluation, we introduced a measure reflecting perceptual characteristics. The proposed measure evaluates duration differences weighted by the loudness of the corresponding speech segment and by the differences, or jumps, in loudness relative to the two adjacent speech segments. Experiments showed that scores estimated using the new perception-based measure yielded a correlation coefficient of 0.72 with subjective evaluation scores given by native English speakers on the basis of naturalness of timing control. This correlation is significantly higher than the 0.54 obtained with a simple duration-difference measure.
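As a rough sketch of such a perception-based measure, each segment's duration error can be weighted by its loudness and by the loudness jumps to its neighbours. The weighting form below is my assumption for illustration; the paper's exact formula is not reproduced here:

```python
def weighted_timing_distance(learner_durs, native_durs, loudness):
    """Hypothetical loudness-weighted duration-difference measure.

    Each segment's absolute duration error is weighted by the segment's
    own loudness plus the absolute loudness jumps to adjacent segments,
    so errors near loud segments and loudness jumps count more.
    """
    total = 0.0
    n = len(learner_durs)
    for i in range(n):
        jump = 0.0
        if i > 0:
            jump += abs(loudness[i] - loudness[i - 1])
        if i < n - 1:
            jump += abs(loudness[i + 1] - loudness[i])
        weight = loudness[i] + jump
        total += weight * abs(learner_durs[i] - native_durs[i])
    return total

# Hypothetical data: the same 20 ms error is penalized more on a loud
# segment next to a loudness jump than on a quieter final segment.
native = [100.0, 80.0, 120.0]      # native durations (ms)
loud = [10.0, 2.0, 6.0]            # per-segment loudness (arbitrary units)
```

A plain duration-difference measure would score both errors identically, which is the behavior the perception-based measure is designed to improve on.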


natural language processing and knowledge engineering | 2009

Communicative prosody generation using language common features provided by input lexicons

Yoko Greenberg; Minoru Tsuzaki; Hiroaki Kato; Yoshinori Sagisaka

We previously examined language-independent control characteristics of communicative prosody generation using multi-dimensional impressions of input lexicons. In this paper, we synthesized English single-phrase utterances using prosodic characteristics of Japanese speech, aiming at language-independent applications. The reading-style prosodies of English phrases were modified by prosodic characteristics derived from one-word utterances of the Japanese interjection “n”. Modifications were carried out based on lexical impressions along six dimensions: confident, doubtful, allowable, unacceptable, positive, and negative. A perceptual evaluation experiment confirmed the naturalness of speech whose communicative prosody was modified by the impressions of the input lexicons. These experimental results support the usefulness of communicative prosody control based on the impressions of input lexicons and suggest the possibility of language-independent applications.


Journal of the Acoustical Society of America | 1995

Pitch ringing induced by frequency‐modulated tones

Kiyoaki Aikawa; Minoru Tsuzaki; Hideki Kawahara; Yoh’ichi Tohkura

It was discovered that specific FM (frequency-modulated) tones induce a ringing of the perceived pitch, and a mathematical model was derived to explain this phenomenon. An abrupt change of slope in a unidirectional FM tone induces ringing of the perceived pitch. The typical ring-inducing FM tone had a piecewise-linear frequency trajectory on the log-frequency axis with three parts: (1) frequency onset at 1 kHz rising to 1.732 kHz in 200 ms; (2) constant frequency at 1.732 kHz for 200 ms; and (3) frequency rising to 3 kHz in 200 ms. Several listeners reported one to three ringings around the middle part of the piecewise-linear sweep. Frequent repetition of the listening test decreased listeners' sensitivity to the ringing percept. The ringing phenomenon can be explained by a second-order system acting as a functional model of sweep tracking. To provide experimental evidence for the second-order tracking system, specific values for the natural frequency and the damping factor were estimated. An inverse filter of this system was then constructed, called the antiringing filter. In a psychophysical experiment, subjects compared the original piecewise sweep tone with the same tone processed by the antiringing filter. Results demonstrated that pitch ringing was significantly suppressed by the antiringing filter.
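The second-order tracking account can be sketched numerically. This is a toy simulation with arbitrary parameter values; the paper's estimated natural frequency and damping factor are not reproduced here. An underdamped second-order system following a ramp-then-plateau trajectory overshoots and rings after the slope change, while a critically damped one does not:

```python
import numpy as np

def track(x, fs, fn, zeta):
    """Simulate a second-order tracker y'' + 2*zeta*w*y' + w^2*y = w^2*x
    following the trajectory x (e.g. log-frequency), via forward Euler."""
    w = 2.0 * np.pi * fn          # natural frequency (rad/s)
    y = np.zeros_like(x)
    v = 0.0                       # tracker velocity
    dt = 1.0 / fs
    y[0] = x[0]
    for i in range(1, len(x)):
        a = w * w * (x[i - 1] - y[i - 1]) - 2.0 * zeta * w * v
        v += a * dt
        y[i] = y[i - 1] + v * dt
    return y

# Ramp for 200 ms, then constant: an abrupt slope change, as in the stimuli
fs = 1000
ramp = np.linspace(0.0, 1.0, int(0.2 * fs))
plateau = np.ones(fs)
x = np.concatenate([ramp, plateau])

y_ring = track(x, fs, fn=5.0, zeta=0.3)   # underdamped: overshoots and rings
y_flat = track(x, fs, fn=5.0, zeta=1.0)   # critically damped: no ringing
```

Applying the inverse of such a system to the stimulus (the antiringing filter of the abstract) would pre-distort the trajectory so that the tracker's output, rather than its input, follows the intended sweep.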


PLOS ONE | 2016

A Role of Medial Olivocochlear Reflex as a Protection Mechanism from Noise-Induced Hearing Loss Revealed in Short-Practicing Violinists

Sho Otsuka; Minoru Tsuzaki; Junko Sonoda; Satomi Tanaka; Shigeto Furukawa

Previous studies have indicated that extended exposure to high sound levels might increase the risk of hearing loss among professional symphony orchestra musicians. One of the major problems associated with musicians' hearing loss is the difficulty of estimating its risk simply from the physical amount of exposure, i.e. the exposure level and duration. The aim of this study was to examine whether measurement of the medial olivocochlear reflex (MOCR), which is assumed to protect the cochlea from acoustic damage, could enable us to assess the risk of hearing loss among musicians. To test this, we compared MOCR strength with the hearing deterioration caused by one hour of instrument practice. The participants were music university students majoring in the violin, whose left ear is exposed to intense violin sounds (broadband sounds containing a significant number of high-frequency components) during regular practice. Audiograms and click-evoked otoacoustic emissions (CEOAEs) were measured before and after a one-hour violin practice session. Exposure was larger for the left ear than for the right, and we observed a left-ear-specific temporary threshold shift (TTS) after the violin practice. Left-ear CEOAEs decreased in proportion to the TTS. The exposure level, however, could not entirely explain the inter-individual variation in the TTS and the decrease in CEOAEs. On the other hand, MOCR strength could predict the size of the TTS and the CEOAE decrease. Our findings imply that, among other factors, the MOCR is a promising measure for assessing the risk of hearing loss among musicians.


Neuroreport | 2013

Activation of the left superior temporal gyrus of musicians by music-derived sounds.

Toshie Matsui; Satomi Tanaka; Koji Kazai; Minoru Tsuzaki; Haruhiro Katayose

Previous studies have suggested that professional musicians comprehend features of music-derived sound even if the sound sequence lacks the traditional temporal structure of music. We tested this hypothesis through behavioral and functional brain imaging experiments. Musicians were better than nonmusicians at identifying scrambled pieces of piano music in which the original temporal structure had been destroyed. Bilateral superior temporal gyri (STG) activity was observed while musicians listened to the scrambled stimuli, whereas this activity was present only in the right STG of nonmusicians under the same experimental conditions. We suggest that left STG activation is related to the processing of deviants, which appears to be enhanced in musicians. This may be because of the superior knowledge of musical temporal structure held by this population.

Collaboration


Top co-authors of Minoru Tsuzaki and their affiliations:

Hisashi Kawai (National Institute of Information and Communications Technology)
Satomi Tanaka (Kyoto City University of Arts)
Chihiro Takeshima (Kyoto City University of Arts)
Tomoki Toda (Nara Institute of Science and Technology)
Eriko Aiba (Kyoto City University of Arts)