
Takayuki Arai

University of California, Berkeley


Publication

Featured research published by Takayuki Arai.

International Conference on Spoken Language Processing | 1996

Intelligibility of speech with filtered time trajectories of spectral envelopes

Takayuki Arai; Misha Pavel; Hynek Hermansky; Carlos Avendano

The effect of filtering the time trajectories of spectral envelopes on speech intelligibility was investigated. Since the LPC cepstrum forms the basis of many automatic speech recognition systems, the authors filtered time trajectories of the LPC cepstrum of speech sounds, and the modified speech was reconstructed after the filtering. For processing, they applied low-pass, high-pass and band-pass filters. The accuracy results from the perceptual experiments for Japanese syllables show that speech intelligibility is not severely impaired as long as the filtered spectral components have 1) a rate of change faster than 1 Hz when high-pass filtered, 2) a rate of change slower than 24 Hz when low-pass filtered, and 3) a rate of change between 1 and 16 Hz when band-pass filtered.
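The band-pass case described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a frame rate of 100 Hz (one cepstral vector every 10 ms), a 101-tap Hamming-windowed sinc design, and the hypothetical helper names `lowpass_taps`, `bandpass_taps`, and `filter_trajectory`. Each cepstral coefficient's time trajectory would be passed through `filter_trajectory` separately.

```python
import math

def lowpass_taps(cutoff_hz, frame_rate_hz, num_taps):
    """Hamming-windowed sinc low-pass FIR taps, scaled to unit DC gain."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / frame_rate_hz  # cutoff in cycles per frame
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def bandpass_taps(low_hz, high_hz, frame_rate_hz, num_taps=101):
    """Band-pass taps as the difference of two low-pass designs (num_taps odd)."""
    upper = lowpass_taps(high_hz, frame_rate_hz, num_taps)
    lower = lowpass_taps(low_hz, frame_rate_hz, num_taps)
    return [u - l for u, l in zip(upper, lower)]

def filter_trajectory(trajectory, taps):
    """Zero-delay, same-length convolution of one coefficient's trajectory."""
    half = len(taps) // 2
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = t + half - j
            if 0 <= idx < len(trajectory):  # samples outside the signal count as zero
                acc += h * trajectory[idx]
        out.append(acc)
    return out

# Assumed setup: keep modulation components between 1 and 16 Hz at 100 frames/s.
taps = bandpass_taps(1.0, 16.0, 100.0)
```

Because the taps are symmetric and the convolution is centered, the filter adds no delay to the trajectory; a constant (0 Hz) trajectory is removed while a 4 Hz component passes nearly unchanged.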

International Conference on Acoustics, Speech, and Signal Processing | 1998

Speech intelligibility in the presence of cross-channel spectral asynchrony

Takayuki Arai; Steven Greenberg

The spectrum of spoken sentences was partitioned into quarter-octave channels and the onset of each channel shifted in time relative to the others so as to desynchronize spectral information across the frequency axis. Human listeners are remarkably tolerant of cross-channel spectral asynchrony induced in this fashion. Speech intelligibility remains relatively unimpaired until the average asynchrony spans three or more phonetic segments. Such perceptual robustness is correlated with the magnitude of the low-frequency (3-6 Hz) modulation spectrum and thus highlights the importance of syllabic segmentation and analysis for robust processing of spoken language. High-frequency channels (>1.5 kHz) play a particularly important role when the spectral asynchrony is sufficiently large as to significantly reduce the power in the low-frequency modulation spectrum (analogous to acoustic reverberation) and may thereby account for the deterioration of speech intelligibility among the hearing impaired under conditions of acoustic interference (such as background noise and reverberation) characteristic of the real world.
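The channel-shifting idea can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be already split into band-limited channels of equal length, the delays are drawn uniformly in whole samples, and `desynchronize` and `mix` are hypothetical names. The abstract does not specify how the shifts were distributed, so the uniform draw here is an assumption.

```python
import random

def desynchronize(channels, max_shift, seed=0):
    """Delay each channel by a random number of samples in [0, max_shift].

    Each channel keeps its original length: zeros are prepended and the
    tail that would overflow is dropped.
    """
    rng = random.Random(seed)
    shifted, delays = [], []
    for ch in channels:
        d = rng.randint(0, max_shift)
        keep = max(len(ch) - d, 0)
        shifted.append([0.0] * (len(ch) - keep) + list(ch[:keep]))
        delays.append(d)
    return shifted, delays

def mix(channels):
    """Sum equal-length channels sample by sample into one waveform."""
    return [sum(samples) for samples in zip(*channels)]

# Toy example with two short "channels"; real channels would be band-pass outputs.
channels = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
shifted, delays = desynchronize(channels, max_shift=2, seed=1)
jittered = mix(shifted)
```

To express a shift in milliseconds, `max_shift` would be computed as the sampling rate times the desired duration; that conversion is left out to keep the sketch independent of any particular sampling rate.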

International Conference on Acoustics, Speech, and Signal Processing | 1994

Analysis of phoneme-based features for language identification

Kay M. Berkling; Takayuki Arai; Etienne Barnard

This paper presents an analysis of the phonemic language identification system introduced previously (see Eurospeech, vol.2, p.1307, 1993), now extended to recognize German in addition to English and Japanese. In this system language identification is based on features derived from a superset of phonemes of all three languages. As we increase the number of languages, the need to reduce the feature space becomes apparent. Practical analysis of single-feature statistics in conjunction with linguistic knowledge leads to 90% reduction of the feature space with only a 5% loss in performance. Thus, the system discriminates between Japanese and English with 84.1% accuracy based on only 15 features compared to 84.6% based on the complete set of 318 phonemic features (or 83.6% using 333 broad-category features). Results indicate that a language identification system may be designed based on linguistic knowledge and then implemented with a neural network of appropriate complexity.

International Conference on Acoustics, Speech, and Signal Processing | 1998

On properties of modulation spectrum for robust automatic speech recognition

Noboru Kanedera; Hynek Hermansky; Takayuki Arai

We report on the effect of band-pass filtering of the time trajectories of spectral envelopes on speech recognition. Several types of filter (linear-phase FIR, DCT, and DFT) are studied. Results indicate the relative importance of different components of the modulation spectrum of speech for ASR. General conclusions are: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, (2) it is important to preserve the phase information in the modulation frequency domain, (3) the features which include components at around 4 Hz in the modulation spectrum outperform the conventional delta features, (4) the features which represent the several modulation frequency bands with appropriate center frequency and bandwidth increase recognition performance.
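For readers unfamiliar with the term, the modulation spectrum of one spectral-envelope trajectory can be estimated with a plain DFT, as in the sketch below. The frame rate of 100 Hz, the synthetic envelope, and the function names `modulation_spectrum` and `dominant_modulation_hz` are assumptions made for illustration; this is not the feature extraction used in the paper.

```python
import cmath
import math

def modulation_spectrum(envelope, frame_rate_hz):
    """Return (frequencies in Hz, magnitudes) of the mean-removed envelope's DFT."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [v - mean for v in envelope]  # drop the DC component
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        freqs.append(k * frame_rate_hz / n)
        mags.append(abs(coeff) / n)
    return freqs, mags

def dominant_modulation_hz(envelope, frame_rate_hz):
    """Modulation frequency with the largest magnitude."""
    freqs, mags = modulation_spectrum(envelope, frame_rate_hz)
    peak = max(range(len(mags)), key=mags.__getitem__)
    return freqs[peak]

# Synthetic envelope fluctuating at 4 Hz, sampled at an assumed 100 frames/s.
envelope = [1.0 + 0.5 * math.cos(2 * math.pi * 4 * t / 100) for t in range(200)]
peak_hz = dominant_modulation_hz(envelope, 100.0)
```

With 200 frames at 100 frames per second the frequency resolution is 0.5 Hz, so the 4 Hz fluctuation of the synthetic envelope falls exactly on a DFT bin.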

Journal of the Acoustical Society of America | 1998

Speech intelligibility is highly tolerant of cross‐channel spectral asynchrony

Steven Greenberg; Takayuki Arai

A detailed auditory analysis of the short‐term acoustic spectrum is generally considered essential for understanding spoken language. This assumption is called into question by the results of an experiment in which the spectrum of spoken sentences (from the TIMIT corpus) was partitioned into quarter‐octave channels and the onset of each channel shifted in time relative to the others so as to desynchronize spectral information across the frequency plane. Intelligibility of sentential material (as measured in terms of word accuracy) is unaffected by a (maximum) onset jitter of 80 ms or less and remains high (>75%) even for jitter intervals of 140 ms. Only when the jitter imposed across channels exceeds 220 ms does intelligibility fall below 50%. These results imply that the cues required to understand spoken language are not optimally specified in the short‐term spectral domain, but may rather be based on some other set of representational cues such as the modulation spectrogram [S. Greenberg and B. Kingsbury, Proc. IEEE ICASSP (1997), pp. 1647–1650]. Consistent with this hypothesis is the fact that intelligibility (as a function of onset‐jitter interval) is highly correlated with the magnitude of the modulation spectrum between 3 and 8 Hz.

IEEE Transactions on Signal Processing | 1997

Reconstruction of a signal using the spectrum-reversal technique

Takayuki Arai; Yuichi Yoshida

Our procedure of real-zero conversion uses a spectrum-reversal technique to convert the information of a bandlimited signal to real zeros. We conducted a simple reconstruction experiment and showed that our proposed method is essentially equivalent to the conventional technique of sine-wave crossings.

Conference of the International Speech Communication Association | 1997