Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Tatsuya Hirahara is active.

Publication


Featured research published by Tatsuya Hirahara.


Speech Communication | 2010

Silent-speech enhancement using body-conducted vocal-tract resonance signals

Tatsuya Hirahara; Makoto Otani; Shota Shimizu; Tomoki Toda; Keigo Nakamura; Yoshitaka Nakajima; Kiyohiro Shikano

The physical characteristics of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) and the acoustic characteristics of three sensors developed for detecting these signals have been investigated. NAM signals attenuate 50 dB at 1 kHz; this attenuation consists of 30-dB full-range attenuation due to air-to-body transmission loss and -10 dB/octave spectral decay due to sound propagation loss within the body. These characteristics agree with the spectral characteristics of measured NAM signals. The sensors have a sensitivity of between -41 and -58 dB [V/Pa] at 1 kHz, and the mean signal-to-noise ratio of the detected signals was 15 dB. On the basis of these investigations, three types of silent-speech enhancement systems were developed: (1) simple, direct amplification of weak vocal-tract resonance signals using a wired urethane-elastomer NAM microphone; (2) simple, direct amplification using a wireless urethane-elastomer-duplex NAM microphone; and (3) transformation of the weak vocal-tract resonance signals sensed by a soft-silicone NAM microphone into whispered speech using statistical conversion. Field testing of the systems showed that they enable voice-impaired people to communicate verbally using body-conducted vocal-tract resonance signals. Listening tests demonstrated that weak body-conducted vocal-tract resonance sounds can be transformed into intelligible whispered speech sounds. Using these systems, people with voice impairments can re-acquire speech communication with less effort.
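The two loss components quoted above (a flat 30 dB air-to-body transmission loss plus a -10 dB/octave in-body propagation decay) can be combined in a small sketch. The 250 Hz reference frequency below is an assumption, chosen so that the total matches the reported 50 dB at 1 kHz; the abstract itself states no reference frequency.

```python
import math

def nam_attenuation_db(freq_hz, flat_loss_db=30.0, decay_db_per_octave=10.0,
                       ref_hz=250.0):
    """Total NAM attenuation: flat air-to-body loss plus per-octave in-body
    decay above ref_hz. Only the 30 dB + (-10 dB/octave) structure comes
    from the abstract; ref_hz is a hypothetical fitting choice."""
    octaves_above_ref = math.log2(freq_hz / ref_hz)
    return flat_loss_db + decay_db_per_octave * max(octaves_above_ref, 0.0)

print(nam_attenuation_db(1000.0))  # two octaves above 250 Hz: 30 + 20 = 50 dB
```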


Journal of the Acoustical Society of America | 2009

Numerical study on source-distance dependency of head-related transfer functions

Makoto Otani; Tatsuya Hirahara; Shiro Ise

This paper investigates the source-distance dependency of head-related transfer functions (HRTFs) on the horizontal and median sagittal planes using the boundary-element method and a dummy head scanned with laser and computed tomography scanners. First, the HRTF spectra are compared among various source positions in a head-centered coordinate system, confirming that the major HRTF spectral features vary with source distance, as stated in previous works. Furthermore, the HRTF spectra are compared in an ear-centered coordinate system, revealing how the angle of incidence at the outer ear affects the source-distance dependency of the HRTFs. Next, the comparison across coordinate systems reveals that the source-distance dependency of the ipsilateral HRTFs on the horizontal plane is mainly attributable to the angle of incidence at the outer ear, whereas the contralateral HRTFs vary with source distance mainly due to the head's presence. Finally, results also show that, in an ear-centered coordinate system, the ipsilateral HRTFs do not depend strongly on source distance beyond 0.2 m from the center of the head, whereas the contralateral HRTFs depend on source distance up to 1.8 m. Results also show that HRTFs on the median sagittal plane depend on source distance up to 0.4 m.


Presence: Teleoperators & Virtual Environments | 2008

Sound localization using an acoustical telepresence robot: TeleHead II

Iwaki Toshima; Shigeaki Aoki; Tatsuya Hirahara

TeleHead is an acoustical telepresence robot that we built on the basis of the concept that remote sound localization could be best achieved by using a user-like dummy head whose movement synchronizes with the user's head movement in real time. We clarified the characteristics of the latest version of TeleHead, TeleHead II, and verified the validity of this concept by sound localization experiments. TeleHead II can synchronize stably with the user's head movement with a 120-ms delay. The driving noise level measured through headphones is below 24 dB SPL from 1 to 4 kHz. The shape difference between the dummy head and the user is about 3 in head width and 5 in head length. An overall measurement metric indicated that the difference between the head-related transfer functions (HRTFs) of the dummy head and the modeled listener is about 5 dB. The results of the sound localization experiments using TeleHead II clarified that head movement improves horizontal-plane sound localization performance even when the dummy head shape differs from the user's head shape. In contrast, the results for head movement when the dummy head shape and user head shape are different were inconsistent in the median plane. The accuracy of sound localization when using the same-shape dummy head with movement tethered to the user's head movement was always good. These results show that the TeleHead concept is acceptable for building an acoustical telepresence robot. They also show that the physical characteristics of TeleHead II are sufficient for conducting sound localization experiments.


Journal of the Acoustical Society of America | 1996

One, two, many—Judging the number of concurrent talkers.

Makio Kashino; Tatsuya Hirahara

The ability of listeners to judge the number of concurrent talkers was examined. Ten female and 11 male Japanese talkers each recorded 20 familiar Japanese words consisting of four consonant–vowel syllables each. In each trial, words from a number of different talkers, chosen randomly from the same-sex group, were presented synchronously to four native Japanese listeners, who were asked to judge how many talkers were present. The range of talker numbers was unknown to the listeners. To eliminate cues associated with level, the overall sound pressure level was varied randomly in each trial, with the RMS levels of the individual words equalized. It was found that judgments were nearly perfect for up to two talkers, but deteriorated abruptly for three or more talkers. In the latter case, the number of talkers was underestimated, although estimates increased slightly as the number of talkers increased. Factors that may promote sound source separation, such as lexicality (e.g., forward versus reverse speech) and spatial s...
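The level-cue controls described above (per-word RMS equalization plus random roving of the overall level) can be sketched as follows. The target RMS and the +/-6 dB roving range are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def equalize_rms(signal, target_rms=0.1):
    """Scale a waveform so its RMS matches target_rms (a hypothetical target)."""
    rms = np.sqrt(np.mean(signal ** 2))
    return signal * (target_rms / rms)

def mix_talkers(words, level_rove_db=(-6.0, 6.0)):
    """Mix RMS-equalized words synchronously, then rove the overall level
    at random so overall level carries no talker-count cue. The +/-6 dB
    roving range is an illustrative assumption."""
    mixture = np.sum([equalize_rms(w) for w in words], axis=0)
    gain_db = rng.uniform(*level_rove_db)
    return mixture * 10 ** (gain_db / 20.0)

# two synthetic "words": sine tones standing in for recorded speech
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
words = [np.sin(2 * np.pi * f * t) for f in (220.0, 330.0)]
mix = mix_talkers(words)
print(mix.shape)  # (16000,)
```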


IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2008

Auditory Artifacts due to Switching Head-Related Transfer Functions of a Dynamic Virtual Auditory Display

Makoto Otani; Tatsuya Hirahara

Auditory artifacts due to switching head-related transfer functions (HRTFs) are investigated, using a software-implemented dynamic virtual auditory display (DVAD) developed by the authors. The DVAD responds to a listener's head rotation using a head-tracking device and switching HRTFs to present a highly realistic 3D virtual auditory space to the listener. The DVAD operates on Windows XP and does not require high-performance computers. The total system latency (TSL), which is the delay between head motion and the corresponding change of the ear input signal, is a significant factor for DVADs. The measured TSL of our DVAD is about 50 ms, which is sufficient for practical applications and localization experiments. Another matter of concern is the auditory artifact in DVADs caused by switching HRTFs. Switching HRTFs gives rise to waveform discontinuities in the synthesized binaural signals, which can be perceived as click noises that degrade the quality of the presented sound image. A subjective test and excitation pattern (EPN) analysis using an auditory filter are performed with various source signals and HRTF spatial resolutions. The results of the subjective test reveal that click noise perception depends on the source signal and the HRTF spatial resolution. Furthermore, EPN analysis reveals that switching HRTFs significantly distorts the EPNs at off-signal frequencies. Such distortions, however, are masked perceptually by broad-bandwidth source signals, whereas they are not masked by narrow-bandwidth source signals, thereby making the click noise more detectable. A higher HRTF spatial resolution leads to smaller distortions. However, depending on the source signal, perceivable click noises still remain even with 0.5-degree spatial resolution, which is smaller than the minimum audible angle (1 degree in front).
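A standard mitigation for the click artifact described above is to crossfade between the signals rendered with the old and new HRTFs rather than switching instantaneously; this is a common technique in dynamic binaural synthesis, not the paper's own proposal. A minimal sketch, assuming time-domain head-related impulse responses (HRIRs) and a linear ramp:

```python
import numpy as np

def switch_hrtf_with_crossfade(signal, hrir_old, hrir_new, fade_len=64):
    """Render the signal through both HRIRs and linearly crossfade from the
    old filter to the new one over fade_len samples, so the binaural output
    has no waveform discontinuity at the switch."""
    y_old = np.convolve(signal, hrir_old)[: len(signal)]
    y_new = np.convolve(signal, hrir_new)[: len(signal)]
    ramp = np.clip(np.arange(len(signal)) / fade_len, 0.0, 1.0)
    return (1.0 - ramp) * y_old + ramp * y_new

x = np.random.default_rng(1).standard_normal(1024)
h_old = np.array([1.0, 0.0, 0.0])   # toy 3-tap HRIRs, not measured data
h_new = np.array([0.0, 0.5, 0.25])
y = switch_hrtf_with_crossfade(x, h_old, h_new)
print(y.shape)  # (1024,)
```

After the fade completes (here, 64 samples) the output is exactly the new-HRIR rendering, so only the transition region differs from a hard switch.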


International Conference on Acoustics, Speech, and Signal Processing | 1989

A computational cochlear nonlinear preprocessing model with adaptive Q circuits

Tatsuya Hirahara; Takashi Komakine

A computational nonlinear cochlear filter model with adaptive Q circuits is described. The model is built by introducing adaptive Q circuits into a linear cascade/parallel cochlear filter bank. The adaptive Q circuit is composed of two parts: a second-order low-pass filter (LPF) and a Q decision circuit that calculates the LPF's Q in every time frame according to the input spectrum level. This model functionally simulates three level-dependent characteristics observed in basilar membrane motion: level-dependent selectivity, level-dependent sensitivity, and level-dependent resonance frequency shift. The model output gives a better internal speech spectrum representation than that of a linear cochlear filter bank. Weak consonants and higher formants are enhanced, both the temporal and the harmonic structure are consistent in the same time frame, and the spectra are spread and enhanced where the spectrum changes abruptly. These advantages, which are phenomena observed in real auditory frequency analysis, support the effectiveness of the model.
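The structure described above, a second-order low-pass section whose Q is recomputed every frame from the input level, can be sketched as follows. The biquad coefficients use the standard RBJ cookbook low-pass form, and the level-to-Q mapping is a hypothetical placeholder, since the abstract does not give the actual decision rule.

```python
import math

def lowpass_biquad(fc, q, fs):
    """Second-order low-pass biquad coefficients (RBJ cookbook form),
    returned as (b, a) with a[0] normalized to 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0 = (1.0 - math.cos(w0)) / 2.0
    a0 = 1.0 + alpha
    b = [b0 / a0, 2.0 * b0 / a0, b0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def decide_q(level_db, q_min=0.7, q_max=8.0):
    """Hypothetical Q decision rule: louder input -> lower Q (broader tuning),
    mimicking the level-dependent selectivity the model reproduces. The
    actual mapping in the paper is not given in the abstract."""
    t = min(max((level_db - 20.0) / 60.0, 0.0), 1.0)  # map 20..80 dB to 0..1
    return q_max - t * (q_max - q_min)

# one frame: pick Q from the frame's input level, then build the filter
b, a = lowpass_biquad(fc=1000.0, q=decide_q(40.0), fs=16000.0)
print(len(b), len(a))  # 3 3
```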


Journal of the Acoustical Society of America | 1988

On the role of the fundamental frequency in vowel perception

Tatsuya Hirahara

Vowel identification tests were carried out using 200 synthesized vowel-like stimuli to examine the role of the fundamental frequency F0 in vowel perception. These stimuli were synthetic versions of the five Japanese vowels, /i/, /e/, /a/, /o/, and /u/, of which the F0 and/or the formant frequencies Fi (i = 1,2,3,4) were modified: ten F0 values were formed by adding n/3 Bark (n = 0,1,…,9) to the original F0, and four formant frequency sets were formed by adding m Bark (m = 0,1,2,3) to the original formant frequencies for each vowel. The results are the following: (1) the perceived vowel height shifts upward when the F0 shifts upward while all formant frequencies remain the same; (2) this shift in vowel height is more distinct for mid and low vowels than for high vowels; and (3) vowel height does not change when the F0 as well as all formant frequencies are shifted upward by the same amount along the Bark scale. Further results, along with the hypothesis that a high F0 is regarded as the first formant ...
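The stimulus construction above shifts F0 and the formants in fixed Bark steps, which requires a Hz-to-Bark conversion. A common closed-form approximation is Traunmüller's, sketched below; the paper may have used a different Bark formula.

```python
def hz_to_bark(f):
    """Traunmüller's (1990) closed-form approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Inverse of the approximation above."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def shift_by_bark(f, delta_bark):
    """Shift a frequency by a fixed Bark step, as in the n/3-Bark F0 shifts
    and m-Bark formant shifts described in the abstract."""
    return bark_to_hz(hz_to_bark(f) + delta_bark)

print(shift_by_bark(125.0, 1.0))  # a 125-Hz F0 raised by one Bark
```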


Journal of the Acoustical Society of America | 2013

Impact of dynamic binaural signal associated with listener's voluntary movement in auditory spatial perception

Tatsuya Hirahara; Daisuke Yoshisaki; Daisuke Morikawa

The effect of a listener's voluntary movement on horizontal sound localization was investigated using a binaural recording/reproduction system with TeleHead, a steerable dummy head. Stimuli were static binaural signals recorded with a still dummy head in the head-still condition, dynamic binaural signals recorded with a dummy head that followed the listener's precise or modified head rotation, dynamic binaural signals produced by steering-wheel rotation with the listener's hands in the head-still condition, and dynamic binaural signals produced by an experimenter in the head-still condition. For the static binaural signals, some were localized within the head, and front-back errors often occurred. For the dynamic binaural signals, none were localized within the head, and front-back confusions seldom occurred. Sound images of the dynamic binaural stimuli produced by head rotation were localized out of the head, while those produced by the steering-wheel rotation or by an experimenter were moving around the listene...


Journal of the Acoustical Society of America | 1990

Investigation of headphones suitable for psychophysical experiments

Tatsuya Hirahara; Kazuo Ueda

To find headphones appropriate for use in psychophysical experiments, the frequency responses of 12 headphones were measured by three physical methods: on an IEC coupler (B&K 4134), on a C coupler attached to a head and torso simulator (Kohken SAMRAI) (Okabe et al., J. Acoust. Soc. Jpn. (E) 5, 95–104), and using a probe microphone in real ears. The results showed that a few electrostatic circumaural headphones (e.g., the STAX SR-Lambda Pro) have relatively flat frequency characteristics, with excellent invariance among the measuring methods. In contrast, many dynamic supra-aural headphones (e.g., Rion AD02, Beyer DT48, Elega DR831, etc.) have poor frequency characteristics, especially at lower frequencies, with many differences occurring between the three measuring methods. For these headphones, energy leakage in the lower frequency region is inevitable, since the fit of the headphone pad to the pinna is usually incomplete and the acoustic impedance of the diaphragm is very high. These undesira...


Journal of the Acoustical Society of America | 1990

A glottal waveform model for high‐quality speech synthesis

Seiichi Tenpaku; Tatsuya Hirahara

A new glottal waveform model for high-quality speech synthesis is proposed, and the results of perceptual evaluations of speech synthesized using the proposed model and other models are compared. The proposed glottal waveform model consists of two parts: a waveform generator and a spectrum-shaping filter. A third-order polynomial, whose coefficients are determined by combinations of OQ (open quotient), SQ (speed quotient), AV (amplitude of voicing), and F0, is used for the waveform generator. A second-order IIR filter, which is designed to control the spectral tilt and the relative amplitudes of the lower harmonic components by two parameters, serves as the spectrum-shaping filter. Thus the parameters have a direct effect on the waveform and its spectral shape. Using three kinds of information (F0, power, and formants) extracted from eight different Japanese words pronounced by a male and a female announcer, 80 synthesized speech stimuli were prepared for the preference test. The stimuli were g...
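A pulse of the general shape described above can be sketched with a simple cubic: flow rises during the open phase and returns to zero at closure. The details below are illustrative assumptions; the paper determines its polynomial coefficients from OQ, SQ, AV, and F0 (SQ is ignored in this simplification) and adds a second-order IIR spectrum-shaping filter that is not reproduced here.

```python
import numpy as np

def glottal_pulse(f0, oq=0.6, av=1.0, fs=16000):
    """One period of a cubic glottal flow pulse: u(t) = t^2 - t^3/t_c during
    the open phase, zero during the closed phase. The cubic is zero at both
    t = 0 and the closure time t_c, and the peak is scaled to av."""
    period = int(round(fs / f0))
    open_len = int(round(oq * period))
    t = np.arange(open_len) / fs
    t_c = open_len / fs                 # closure time in seconds
    u = t ** 2 - t ** 3 / t_c           # zero at t = 0 and t = t_c
    u = av * u / u.max()                # scale peak amplitude to av
    return np.concatenate([u, np.zeros(period - open_len)])

pulse = glottal_pulse(f0=100.0)
print(len(pulse))  # one period at 16 kHz: 160 samples
```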

Collaboration


Dive into Tatsuya Hirahara's collaboration.

Top Co-Authors

Daisuke Morikawa
Japan Advanced Institute of Science and Technology

Makio Kashino
Tokyo Institute of Technology

Shota Shimizu
Toyama Prefectural University

Noriyuki Matsunaga
Toyama Prefectural University