Publication


Featured research published by Hiroaki Kato.


Journal of the Acoustical Society of America | 2008

Training English listeners to perceive phonemic length contrasts in Japanese.

Keiichi Tajima; Hiroaki Kato; Amanda Rothwell; Reiko Akahane-Yamada; Kevin G. Munhall

The present study investigated the extent to which native English listeners' perception of Japanese length contrasts can be modified with perceptual training, and how their performance is affected by factors that influence segment duration, which is a primary correlate of Japanese length contrasts. Listeners were trained in a minimal-pair identification paradigm with feedback, using isolated words contrasting in vowel length, produced at a normal speaking rate. Experiment 1 tested listeners using stimuli varying in speaking rate, presentation context (in isolation versus embedded in carrier sentences), and type of length contrast. Experiment 2 examined whether performance varied by the position of the contrast within the word, and by whether the test talkers were professionally trained or not. Results showed that trained listeners improved overall performance to a greater extent than untrained control participants. Training improved perception of trained contrast types, generalized to nonprofessional talkers' productions, and improved performance in difficult within-word positions. However, training did not enable listeners to cope with speaking rate variation, and did not generalize to untrained contrast types. These results suggest that perceptual training improves non-native listeners' perception of Japanese length contrasts only to a limited extent.


Speech Communication | 2009

Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling

Yoko Greenberg; Nagisa Shibuya; Minoru Tsuzaki; Hiroaki Kato; Yoshinori Sagisaka

A multi-dimensional perceptual space for communicative speech prosodies was derived using a psychometric method from multi-dimensional expressions of impressions to characterize paralinguistic information conveyed by prosody in communication. Single word utterances of "n" were employed to allow freedom from lexical effects and to cover communicative prosodic variations as much as possible. The analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of "n" as differences of average height (high-low) and dynamic patterns (rise, fall, gradual fall, and rise & fall). Using controlled single utterances of "n", multiple dimensional scaling analysis was applied to a mutual distance matrix obtained from 26-dimensional vectors expressing perceptual impressions. The result showed the three-dimensional structure of a perceptual impression space, and each dimension corresponded to different F0 control characteristics. The positive-negative impression can be controlled by average F0 height while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike conventional categorical classification of prosodic patterns frequently observed in studies of emotional prosody, this control characterization enables us to flexibly and quantitatively describe prosodic impressions. These experimental results allow the possibility of input specifications for communicative prosody generation using impression vectors and control through average F0 height and F0 dynamic patterns. Instead of the generation of speech with categorical prototypical prosody, more adequate communicative speech synthesis can be approached through input specification and its correspondence with control characteristics.


international conference on acoustics, speech, and signal processing | 2009

Objective evaluation of English learners' timing control based on a measure reflecting perceptual characteristics

Shizuka Nakamura; Shigeki Matsuda; Hiroaki Kato; Minoru Tsuzaki; Yoshinori Sagisaka

Automatic evaluation of English timing control proficiency is carried out by comparing segmental duration differences between learners and reference native speakers. To obtain an objective measure matched to human subjective evaluation, we introduced a measure reflecting perceptual characteristics. The proposed measure evaluates duration differences weighted by the loudness of the corresponding speech segment and the differences or jumps in loudness from the two adjacent speech segments. Experiments showed that estimated scores using the new perception-based measure provided a correlation coefficient of 0.72 with subjective evaluation scores given by native English speakers on the basis of naturalness in timing control. This correlation turned out to be significantly higher than that of 0.54 obtained when using a simple duration difference measure.
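The idea of weighting duration differences by loudness and by loudness jumps can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the function name `weighted_duration_difference`, the additive weight (segment loudness plus the absolute loudness jumps to the two adjacent segments), and the hypothetical scaling factor `alpha`.

```python
def weighted_duration_difference(learner, reference, loudness, alpha=1.0):
    """Loudness-weighted mean absolute duration difference (illustrative only).

    learner, reference: segment durations, aligned one-to-one.
    loudness: one loudness value per segment (arbitrary units).
    alpha: hypothetical factor scaling the contribution of loudness jumps.
    """
    if not (len(learner) == len(reference) == len(loudness)):
        raise ValueError("all sequences must have the same length")
    n = len(loudness)
    weights = []
    for i in range(n):
        # Loudness jumps from the two adjacent segments; edge segments
        # have only one neighbour, so the missing jump counts as zero.
        jump_prev = abs(loudness[i] - loudness[i - 1]) if i > 0 else 0.0
        jump_next = abs(loudness[i + 1] - loudness[i]) if i < n - 1 else 0.0
        # Assumed weighting: own loudness plus scaled neighbouring jumps.
        weights.append(loudness[i] + alpha * (jump_prev + jump_next))
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    # Weighted average of absolute duration differences.
    return sum(w * abs(l - r) for w, l, r in zip(weights, learner, reference)) / total
```

With uniform loudness the weights are all equal and the value reduces to a plain mean absolute duration difference, which corresponds to the kind of simple duration difference measure the abstract uses as its comparison point.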


international conference on acoustics, speech, and signal processing | 2011

Pinna sensitivity patterns reveal reflecting and diffracting surfaces that generate the first spectral notch in the front median plane

Parham Mokhtari; Hironori Takemoto; Ryouichi Nishimura; Hiroaki Kato

Finite-Difference Time Domain (FDTD) acoustic simulation was used to calculate Pinna-Related Transfer Functions (PRTFs) of the KEMAR manikin's DB60 pinna. A baseline set of 25 PRTFs was first calculated at regular intervals of elevation angle in the front median plane. The simulation was then repeated 1784 times, corresponding to every unique, single-voxel perturbation of the pinna's outer surface geometry. All perturbed PRTFs were compared with the baseline set, in order to precisely quantify the frequency shifts in all the spectral peaks and notches up to 14 kHz. This paper focuses on the pinna sensitivity patterns for the first spectral notch N1, known to be an auditory cue to elevation in the median plane. In particular, N1 sensitivity patterns revealed the elevation dependence of broad areas of the pinna's upper structures that are involved in reflection, and the role of the tragus region involved in diffraction.


international conference on acoustics, speech, and signal processing | 2009

Interpolation of head-related transfer functions by spatial linear prediction

Ryouichi Nishimura; Hiroaki Kato; Naomi Inoue

Head-related transfer functions (HRTFs) are essential for creating a virtual sound source when sound waves are transmitted through headphones. The measurement of HRTFs is a complicated and time-consuming task. Therefore, the interpolation of HRTFs is crucial for virtual auditory display systems in which both listeners and sound objects are likely to move in a virtual auditory space. In this study, HRTFs are measured with a high spatial resolution in order to develop an effective interpolation method. An analysis of HRTFs indicates that the functions exhibit periodicity in amplitude along the azimuthal angle. The optimum filter coefficients required to interpolate HRTFs from several functions measured along multiple azimuthal directions are derived. Computer simulations indicate that in comparison to conventional methods, the proposed method yields less estimation error in the interpolation of HRTFs. Listening tests indicate that the proposed method can provide better perception of virtual sound.


international universal communication symposium | 2010

On the human ability to auditorily perceive human speaker's facing angle

Hiroaki Kato; Hironori Takemoto; Ryouichi Nishimura; Parham Mokhtari

This paper summarizes an empirical study exploring whether or not human listeners can tell the facing direction of a human speaker solely by the auditory sense, and if so, how accurately they are able to do it. The purpose is to find the sound information necessary for ultimately realistic and human-centered telecommunications. The study consists of two parts: listeners' performance and acoustic analysis. The first part describes the methodologies of assessing human perception and the results obtained. The second part provides the candidates of sound information that convey a speaker's facing angle. Twelve blindfolded listeners were tested on their ability to sense the facing angle of a male who spoke a sentence in an anechoic chamber. The listeners were able to indicate the speaker's facing angles with a reasonably high confidence. The average response errors were 23.5 degrees for horizontal angles and 12.9 degrees for vertical angles. To estimate acoustic cues that the listeners exploited, the acoustic transfer characteristics from the speaker's mouth to the listeners' ears were measured for all the facing angles tested. The results suggested that the overall level and spectral tilt were used for the listeners' judgment along the front-back axis, and the level difference between left and right ears was used for the judgment along the left-right axis.


international universal communication symposium | 2009

Headphone calibration for 3D-audio listening

Ryouichi Nishimura; Parham Mokhtari; Hironori Takemoto; Hiroaki Kato

This paper proposes a new headphone calibration function for precise reproduction of 3D audio generated using simulated head-related transfer functions (HRTFs) or binaural recordings. In order to compensate for individual characteristics of the earcanal transfer functions and the eardrum impedance, which are generally different from person to person, the method consists of two steps: measuring sound pressure with blocked earcanals and that with open earcanals. The vibration of the eardrum can thereby be precisely reproduced as if the listener were in the original sound scene. Results of experiments using a head and torso simulator (HATS) revealed that sound pressure is correctly reproduced at the position of the eardrum as well as at the entrance of the earcanal within a certain wide frequency range.


Journal of the Acoustical Society of America | 2016

Contextual analysis on geminate/singleton identification difficulties for L2 learners of Japanese based on perceptual features

Yanlong Zhang; Hideharu Nakajima; Mee Sonu; Hiroaki Kato; Yoshinori Sagisaka

It is widely known that Japanese geminate/singleton consonant identification is one of the biggest problems for L2 learners. We have been analyzing identification error characteristics based on their perceptually motivated features. In this presentation, after briefly introducing notable identification error characteristics observed across phonetic contexts and speech rates, we analyze and quantify the difficulties using identification error rates of native Korean speakers who are beginning learners of Japanese. Thirty-six geminate/singleton pairs of 2-5 morae at three speech rates were used for identification experiments. To understand the difficulties quantitatively, objectively measurable acoustic variables such as duration, loudness, and loudness jumps of the corresponding geminate/singleton consonants were employed to carry out prediction analysis for L2 learners' identification error rates. Correlation between predicted error rates and observed ones has turned out to be around 0.6 with the speech rates w...


2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) | 2016

L2 speech timing analysis based on L1 timing characteristics

Yanlong Zhang; Hiroaki Kato; Mee Sonu; Yoshinori Sagisaka

Aiming at a better understanding of L2 learning problems resulting from L1 and L2 timing control differences, we analyzed the differences between Japanese natives and Chinese learners in the production of geminate and singleton consonants, which is known to be an L2 learning problem. These differences were analyzed by applying tempo normalization to raw segmental duration measurements. To get a systematic understanding of L2 learners' timing control characteristics, speech data from learners with different proficiency levels and with different syllable structures were employed. Analyses clearly showed that (1) timing control differences between natives and L2 learners gradually decrease as the learners' language proficiency increases, (2) the differences between natives and L2 speakers vary widely depending on the syllable structure, and (3) timing control differences can be understood based on timing unit differences between morae and syllables. Based on those findings, preliminary results of a novel training method that provides additional moraic context to isolated speech samples are presented and show its effectiveness.
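Tempo normalization of segmental durations can be sketched as follows. The abstract does not specify the normalization used, so this is only one simple possibility, assumed here for illustration: each segment duration is divided by the mean segment duration of its utterance. The function name `tempo_normalize` and the sample values are hypothetical.

```python
def tempo_normalize(durations):
    """Divide each segment duration by the utterance's mean segment duration.

    This is an assumed, simple form of tempo normalization: the result is
    unitless and no longer depends on the overall speaking rate.
    """
    if not durations:
        raise ValueError("durations must not be empty")
    mean = sum(durations) / len(durations)
    if mean <= 0:
        raise ValueError("mean duration must be positive")
    return [d / mean for d in durations]


# Hypothetical segment durations in milliseconds for the same utterance
# spoken slowly and at twice the speed.
slow = [100, 200, 300]
fast = [50, 100, 150]
print(tempo_normalize(slow))
print(tempo_normalize(fast))
```

Because both utterances differ only by a uniform change of tempo, their normalized durations coincide, so any remaining difference between a learner and a native speaker after normalization reflects relative timing rather than overall speed.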


international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2015

Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters

Yanlong Zhang; Mee Sonu; Hiroaki Kato; Yoshinori Sagisaka

For a better understanding of the difficulties that second language (L2) learners have in identifying Japanese geminate/singleton consonants, a perceptual factor is newly introduced to compensate for the shortcomings of conventional explanations based solely on acoustic duration differences. To systematically explain the serious speech-rate-related errors of geminate/singleton identification in fast/slow speech, loudness-related parameters are used to reflect perceptual characteristics of duration. Correlation analyses have shown that these parameters can provide a reasonable explanation for error characteristics obtained in the perceptual experiment on Japanese geminate/singleton consonants by Korean learners. A new possibility is suggested to design L2 teaching materials for Japanese geminate/singleton consonants with different phonetic contexts based on expected difficulties resulting from their loudness differences.

Collaboration


Dive into Hiroaki Kato's collaborations.

Top Co-Authors

Ryouichi Nishimura

National Institute of Information and Communications Technology

Chatchawarn Hansakunbuntheung

Thailand National Science and Technology Development Agency

Minoru Tsuzaki

Kyoto City University of Arts

Hironori Takemoto

National Institute of Information and Communications Technology

Parham Mokhtari

National Institute of Information and Communications Technology
