Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yoh’ichi Tohkura is active.

Publication


Featured research published by Yoh’ichi Tohkura.


Attention Perception & Psychophysics | 1999

Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production

Ann R. Bradlow; Reiko Akahane-Yamada; David B. Pisoni; Yoh’ichi Tohkura

Previous work from our laboratories has shown that monolingual Japanese adults who were given intensive high-variability perceptual training improved in both perception and production of English /r/-/l/ minimal pairs. In this study, we extended those findings by investigating the long-term retention of learning in both perception and production of this difficult non-native contrast. Results showed that 3 months after completion of the perceptual training procedure, the Japanese trainees maintained their improved levels of performance on the perceptual identification task. Furthermore, perceptual evaluations by native American English listeners of the Japanese trainees’ pretest, posttest, and 3-month follow-up speech productions showed that the trainees retained their long-term improvements in the general quality, identifiability, and overall intelligibility of their English /r/-/l/ word productions. Taken together, the results provide further support for the efficacy of high-variability laboratory speech sound training procedures, and suggest an optimistic outlook for the application of such procedures for a wide range of “special populations.” This work was supported by NIDCD Training Grant DC-00012 and by NIDCD Research Grant DC-00111 to Indiana University.


Attention Perception & Psychophysics | 1992

The effects of experimental variables on the perception of American English /r/ and /l/ by Japanese listeners.

Reiko A. Yamada; Yoh’ichi Tohkura

The effects of variations in response categories, subjects’ perception of natural speech, and stimulus range on the identification of American English /r/ and /l/ by native speakers of Japanese were investigated. Three experiments using a synthesized /rait/-/lait/ series showed that all these variables affected identification and discrimination performance by Japanese subjects. Furthermore, some of the perceptual characteristics of /r/ and /l/ for Japanese listeners were clarified: (1) Japanese listeners identified some of the stimuli of the series as /w/. (2) A positive correlation between the perception of synthesized stimuli and naturally spoken stimuli was found. Japanese listeners who were able to easily identify naturally spoken stimuli perceived the synthetic series categorically but still perceived a /w/ category on the series. (3) The stimulus range had a striking effect on identification consistency; identification of /r/ and /l/ was strongly affected by the stimulus range, the /w/ identification less so. This indicates that Japanese listeners tend to make relative judgments between /r/ and /l/.


Journal of the Acoustical Society of America | 1996

Cepstral representation of speech motivated by time–frequency masking: An application to speech recognition

Kiyoaki Aikawa; Harald Singer; Hideki Kawahara; Yoh’ichi Tohkura

A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transition, which provides relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or the speaker-dependent time-invariant spectral feature. These features are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition by using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients for the classification experiment of phonemes contaminated by noises.
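The matrix-lifter construction described above can be sketched in Python. This is a minimal illustration, not the paper’s implementation: the Gaussian lifter shape, the `n_back` window, and the `spread` constant are assumptions standing in for the paper’s masking-interval-dependent smoothing characteristics.

```python
import numpy as np

def dynamic_cepstrum(cepstra, n_back=3, spread=0.5):
    # cepstra: (T, Q) array, one cepstral vector per frame.
    # For each lag k, past frames are liftered (smoothed in the
    # spectral domain) and subtracted as a masking level; the lifter
    # narrows in quefrency (i.e., smooths more) as the lag grows,
    # mimicking forward masking that spreads over time.
    T, Q = cepstra.shape
    q = np.arange(Q)
    dyc = cepstra.copy()
    for k in range(1, n_back + 1):
        lifter = np.exp(-((spread * k * q) ** 2))  # wider smoothing at larger lag
        masking = cepstra[:-k] * lifter            # smoothed preceding frames
        dyc[k:] -= masking / n_back                # subtract summed masking level
    return dyc
```

Note that the output vector has the same length as the input cepstrum, matching the abstract’s point that DyC adds no extra coefficients over a conventional cepstrum.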


Journal of the Acoustical Society of America | 1995

Testing the importance of talker variability in non-native speech contrast training

James S. Magnuson; Reiko A. Yamada; Yoh’ichi Tohkura; Ann R. Bradlow

In contrast to results of training with stimuli produced by 5 talkers, Lively et al. [1242–1255 (1993)] reported that Japanese adults trained to perceive English /r/ and /l/ with stimuli produced by a single talker failed to improve from pretest to post‐test, or to generalize to novel stimuli. That study was extended by training 5 groups of subjects each with a different talker, and by examining the retention of learning after 3 and 6 months. The previous results were partially replicated: Although all subjects showed significant learning during training, subjects in 3 of the 5 groups did not show significant improvement in a pretest–post‐test comparison, did not generalize well to new stimuli, and did not show good retention in 3‐ and 6‐month follow‐up tests. Subjects in two of the five groups improved significantly from pretest to post‐test, generalized well to new stimuli, and showed retention comparable to that of subjects trained with multiple talkers. The results indicate that while multiple‐talker...


Journal of the Acoustical Society of America | 1993

Training Japanese listeners to identify English /r/ and /l/: A replication and extension

David B. Pisoni; Scott E. Lively; Reiko A. Yamada; Yoh’ichi Tohkura; Tsuneo Yamada

Monolingual native speakers of Japanese were trained to identify English /r/ and /l/ using a modified version of Logan, Lively, and Pisoni’s [J. S. Logan et al., J. Acoust. Soc. Am. 89, 874–886 (1991)] high variability training procedure. Both the talker’s voice and the phonetic environment were varied during training. Subjects improved in their ability to identify /r/ and /l/ from the pre‐test to the post‐test and during training. Generalization accuracy depended on the voice of the talker producing the /r/–/l/ contrasts: Subjects were significantly more accurate when words were produced by a familiar talker than when they were produced by an unfamiliar talker. Three months after the conclusion of training, subjects were given the post‐test and the tests of generalization again. Surprisingly, accuracy decreased only slightly on each test, even though no training or exposure to /r/ and /l/ occurred during the 3‐month interval. These results demonstrate that the high variability training paradigm is effect...


Journal of the Acoustical Society of America | 1995

Pitch ringing induced by frequency‐modulated tones

Kiyoaki Aikawa; Minoru Tsuzaki; Hideki Kawahara; Yoh’ichi Tohkura

It was discovered that specific FM (frequency‐modulated) tones induce ringing of the perceived pitch. A mathematical model was derived to explain this phenomenon. An abrupt change of slope in a unidirectional FM tone induces ringing of the perceived pitch. The typical ring‐inducing FM tone had a piecewise‐linear frequency trajectory on the log‐frequency axis with three parts: (1) frequency onset at 1 kHz rising to 1.732 kHz in 200 ms; (2) constant frequency at 1.732 kHz for 200 ms; and (3) frequency rising to 3 kHz in 200 ms. Several listeners reported one to three ringings around the middle part of the piecewise‐linear sweep. Frequent repetition of the listening test decreased the sensitivity of ringing perception. The ringing phenomenon can be explained by a second‐order system as a functional model of sweep tracking. In order to provide experimental evidence for the second‐order tracking system, specific values for the natural frequency and the damping factor were estimated. An inverse filter of this system was then constructed and was called the antiringing filter. In a psychophysical experiment, subjects compared the original piecewise sweep tone to that tone processed by the antiringing filter. Results demonstrated that pitch ringing was significantly suppressed by the antiringing filter.
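The second‐order tracking model can be sketched as follows. The natural frequency and damping values here are illustrative assumptions, not the estimates reported in the paper, and the integration scheme is a simple semi‐implicit Euler step.

```python
import numpy as np

def second_order_track(target, fn=6.0, zeta=0.3, fs=1000.0):
    # Second-order tracker y'' = wn^2 (x - y) - 2 zeta wn y',
    # integrated with semi-implicit Euler. fn (natural frequency, Hz)
    # and zeta (damping factor) are illustrative values only.
    wn = 2.0 * np.pi * fn
    dt = 1.0 / fs
    y = np.empty_like(target)
    y[0] = target[0]
    v = 0.0
    for i in range(1, len(target)):
        a = wn**2 * (target[i - 1] - y[i - 1]) - 2.0 * zeta * wn * v
        v += a * dt
        y[i] = y[i - 1] + v * dt
    return y

# Piecewise-linear log-frequency sweep like the stimulus described:
# rise to 1.732 kHz, a 200-ms plateau, then a rise to 3 kHz.
fs = 1000.0
n = int(0.2 * fs)
x = np.concatenate([
    np.linspace(np.log(1000.0), np.log(1732.0), n),
    np.full(n, np.log(1732.0)),
    np.linspace(np.log(1732.0), np.log(3000.0), n),
])
y = second_order_track(x, fs=fs)
# An underdamped tracker (zeta < 1) overshoots the plateau and rings
# after the abrupt slope change, matching the reported percept.
```

Inverting such a system over the sweep would give the kind of antiringing filter the abstract mentions; the sketch only shows the forward tracking behavior.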


Journal of the Acoustical Society of America | 1996

Three converging tests of improvement in speech production after perceptual identification training on a non‐native phonetic contrast

Ann R. Bradlow; Reiko Akahane-Yamada; David B. Pisoni; Yoh’ichi Tohkura

Previous work from these laboratories has shown that monolingual Japanese adults who were subjected to intensive perceptual identification training improved in both perception and production of English /r/‐/l/ minimal pairs. The present study expanded on this finding by comparing the results from three different perceptual analysis procedures. The initial test was a direct comparison by American English listeners of the Japanese trainees’ pre‐test and post‐test utterances. This test assessed the extent to which AE listeners could discriminate pre‐test and post‐test productions. The second test used a two‐alternative forced‐choice identification paradigm to assess the intelligibility of the pre‐test and post‐test /r/–/l/ words. The final test employed an open‐set transcription task in which the AE listeners simply transcribed the Japanese productions. Each of these perceptual analysis tests submitted the Japanese productions to increasingly stringent judgment criteria, and thus provided different informati...


Journal of the Acoustical Society of America | 1992

Dynamic cepstral parameter incorporating time‐frequency masking and its application to speech recognition

Kiyoaki Aikawa; Hideki Kawahara; Yoh’ichi Tohkura

A ‘‘dynamic cepstrum’’ is proposed as a new spectral parameter that outperforms conventional cepstrum in speech recognition. The dynamic cepstrum incorporates both the static and dynamic aspects of speech spectral sequences by implementing forward masking, which is one of the most important mechanisms for extracting the spectral dynamics that provide acoustic cues in speech perception. Recent research on auditory perception [E. Miyasaka, J. Acoust. Soc. Jpn. 39, 614–623 (1983) (in Japanese)] reports that forward masking becomes more widespread over the frequency axis as the masker‐signal interval increases. This masking characteristic facilitates the novel filtering methodology of time‐dependent spectral filtering. The new dynamic cepstrum spectral parameter can simulate this function. The parameter is obtained by subtracting the masking level from the current cepstrum. The masking level at the current time is calculated as the sum of the masking levels obtained by filtering the preceding spectral sequenc...


Journal of the Acoustical Society of America | 1997

Effects of audio‐visual training on the identification of English /r/ and /l/ by Japanese speakers

Reiko Akahane-Yamada; Ann R Bradlow; David B. Pisoni; Yoh’ichi Tohkura

Two groups of Japanese speakers were trained to identify AE /r/ and /l/ using two different types of training: audiovisual and audio‐only. In audiovisual training, a movie of the talker’s face was presented together with the auditory stimuli, whereas only auditory stimuli were presented in audio‐only training. Improvement in /r/–/l/ identification from pretest to post‐test on three types of tests (audio‐only, visual‐only, and audiovisual) did not differ substantially across the two training groups. Interestingly, the audio‐only group showed improved identification in the visual‐only tests, suggesting that training in the auditory domain transferred to the visual domain. A McGurk‐type test using /r/ and /l/ stimuli with conflicting audio and visual information was also conducted. Identification accuracies on this test showed a greater effect of conflicting visual information at post‐test than at pretest for the audiovisual training group, but not for the audio‐only training group, suggesting that audiovisual training facilitated integration of auditory and visual information. Taken together, these results suggest that the internal representation of second‐language phonetic categories incorporates both auditory and visual information. Implications for theories of perceptual learning and phonological development will be discussed.


Wiley Encyclopedia of Electrical and Electronics Engineering | 1999

Cepstral Analysis of Speech

Yoh’ichi Tohkura; Kiyoaki Aikawa

The sections in this article are: (1) Cepstral Analysis; (2) Liftering; (3) Temporal Analysis of Cepstrum.
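As a compact illustration of the first two topics, a real cepstrum and a low‐quefrency lifter can be sketched in Python. The frame length, cutoff, and test signal are arbitrary choices for the example, not values from the article.

```python
import numpy as np

def real_cepstrum(frame):
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # guard against log(0)
    return np.fft.irfft(log_mag, n=len(frame))

def lifter(cepstrum, cutoff):
    # Low-quefrency liftering: zero coefficients above `cutoff` to keep
    # the smooth spectral envelope and drop fine excitation structure.
    # The real cepstrum is symmetric, so both ends are retained.
    out = cepstrum.copy()
    out[cutoff:-cutoff] = 0.0
    return out

# Example: cepstrum of a windowed 200-Hz sinusoid (arbitrary test signal).
fs = 16000
t = np.arange(512) / fs
frame = np.hanning(512) * np.sin(2 * np.pi * 200.0 * t)
c = real_cepstrum(frame)
envelope = lifter(c, 30)
```

Transforming the liftered cepstrum back to the frequency domain yields a smoothed spectral envelope, which is the usual point of liftering in speech analysis.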

Collaboration


Dive into Yoh’ichi Tohkura’s collaborations.

Top Co-Authors

David B. Pisoni, Indiana University Bloomington

Reiko A. Yamada, Indiana University Bloomington

Kiyoaki Aikawa, Tokyo University of Technology

Scott E. Lively, Indiana University Bloomington

Tsuneo Yamada, Indiana University Bloomington

Ann R. Bradlow, Indiana University Bloomington