Publication


Featured research published by Kiyoaki Aikawa.


Journal of the Acoustical Society of America | 1997

Concurrent vowel identification. I. Effects of relative amplitude and F0 difference

Alain de Cheveigné; Hideki Kawahara; Minoru Tsuzaki; Kiyoaki Aikawa

Subjects identified concurrent synthetic vowel pairs that differed in relative amplitude and fundamental frequency (F0). Subjects were allowed to report one or two vowels for each stimulus, rather than forced to report two vowels as was the case in previously reported experiments of the same type. At all relative amplitudes, identification was better at a fundamental frequency difference (ΔF0) of 6% than at 0%, but the effect was larger when the target vowel amplitude was below that of the competing vowel (−10 or −20 dB). The existence of a ΔF0 effect when the target is weak relative to the competing vowel is interpreted as evidence that segregation occurs according to a mechanism of cancellation based on the harmonic structure of the competing vowel. Enhancement of the target based on its own harmonic structure is unlikely, given the difficulty of estimating the fundamental frequency of a weak target. Details of the pattern of identification as a function of amplitude and vowel pair were found to be inco...
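To make the cancellation account concrete: a harmonic canceller can be realized as a time-domain comb filter that subtracts the mixture from itself delayed by one period of the competing vowel, nulling the competitor's harmonics while leaving a target at a different F0 largely intact. The sketch below is a minimal illustration under that assumption, not the authors' implementation.

import numpy as np

def cancel_competitor(x, f0_competitor, fs=16000):
    # Comb filter y[n] = x[n] - x[n - T], where T is one period of the
    # competing vowel's F0.  Components at multiples of f0_competitor
    # cancel; a target vowel at, say, a 6% higher F0 mostly survives.
    T = int(round(fs / f0_competitor))
    y = np.zeros_like(x)
    y[T:] = x[T:] - x[:-T]
    return y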


Meeting of the Association for Computational Linguistics | 2003

Corpus-Based Discourse Understanding in Spoken Dialogue Systems

Ryuichiro Higashinaka; Mikio Nakano; Kiyoaki Aikawa

This paper concerns the discourse understanding process in spoken dialogue systems. This process enables the system to understand user utterances based on the context of a dialogue. Since multiple candidates for the understanding result can be obtained for a user utterance due to the ambiguity of speech understanding, it is not appropriate to decide on a single understanding result after each user utterance. By holding multiple candidates for understanding results and resolving the ambiguity as the dialogue progresses, the discourse understanding accuracy can be improved. This paper proposes a method for resolving this ambiguity based on statistical information obtained from dialogue corpora. Unlike conventional methods that use hand-crafted rules, the proposed method enables easy design of the discourse understanding process. Experimental results show that a system exploiting the proposed method performs well and that holding multiple candidates for understanding results is effective.
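As a rough sketch of holding multiple understanding candidates and rescoring them with corpus statistics (the data structures and the corpus_logprob scorer below are illustrative assumptions, not the paper's formulation):

from dataclasses import dataclass

@dataclass
class Candidate:
    state: dict    # hypothesized dialogue state, e.g. slot-value pairs
    score: float   # cumulative log-score

def update(discourse_cands, utterance_hyps, corpus_logprob, beam=10):
    # Expand every discourse candidate with every ambiguous speech
    # understanding hypothesis, score the merged state against
    # dialogue-corpus statistics, and keep only the best `beam`.
    expanded = []
    for c in discourse_cands:
        for h in utterance_hyps:
            merged = {**c.state, **h.state}
            expanded.append(Candidate(merged,
                            c.score + h.score + corpus_logprob(merged)))
    expanded.sort(key=lambda c: c.score, reverse=True)
    return expanded[:beam]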


International Conference on Acoustics, Speech, and Signal Processing | 1993

A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition

Kiyoaki Aikawa; Harald Singer; Hideki Kawahara; Yoh'ichi Tohkura

A dynamic cepstrum parameter that incorporates the time-frequency characteristics of auditory forward masking is proposed. A masking model is derived from psychological experimental results. A novel operational method using a lifter array is derived to perform the time-frequency masking. The parameter simulates the effective input spectrum at the front-end of the auditory system and can enhance the spectral dynamics. The parameter represents both the instantaneous and transitional aspects of a spectral time series. Phoneme and continuous speech recognition experiments demonstrated that the dynamic cepstrum outperforms the conventional cepstrum individually and in various combinations with other spectral parameters. The phoneme recognition results were improved for ten male and ten female speakers. The masking lifter with a Gaussian window provided better performance than that with a square window.
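A minimal sketch of the masking operation, assuming Gaussian lifters whose width and attenuation grow with the masker-signal interval (the constants are illustrative, not the paper's values):

import numpy as np

def dynamic_cepstrum(cep, n_maskers=5, decay=0.5):
    # cep: (frames x quefrency) cepstrum time sequence.  Each preceding
    # frame is smoothed by a Gaussian lifter (a multiplication in the
    # quefrency domain), attenuated according to its time lag, and
    # subtracted from the current frame as forward masking.
    T, Q = cep.shape
    q = np.arange(Q)
    dyc = cep.copy()
    for t in range(T):
        for k in range(1, min(n_maskers, t) + 1):
            width = 4.0 + 2.0 * k                     # smoothing broadens with lag
            lifter = np.exp(-0.5 * (q / width) ** 2)  # Gaussian lifter
            dyc[t] -= (decay ** k) * lifter * cep[t - k]
    return dyc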


International Conference on Acoustics, Speech, and Signal Processing | 1998

Rejection of out-of-vocabulary words using phoneme confidence likelihood

Takatoshi Jitsuhiro; Satoshi Takahashi; Kiyoaki Aikawa

The rejection of unknown words is important in improving the performance of speech recognition. The anti-keyword model method can reject unknown words with high accuracy in small-vocabulary, task-specific settings. Unfortunately, it is inconvenient or impossible to apply when the words in the vocabulary change frequently. We propose a new method for task-independent rejection of unknown words, in which a new phoneme confidence measure is used to verify partial utterances: each phoneme is verified while candidates are being located, and the whole utterance is then verified by a phonetic typewriter. This method improves the accuracy of verification for each phoneme and the speed of the candidate search. Tests show that the proposed method improves the recognition rate by 4% over the conventional algorithm at equal error rates; a further 3% improvement is obtained by training the acoustic models with the MCE algorithm.
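The phoneme confidence measure can be pictured as a log-likelihood ratio between the hypothesized phoneme model and the unconstrained phonetic typewriter over the same segment; the sketch below assumes precomputed log-likelihoods and an illustrative threshold.

def phoneme_confidence(ll_phoneme, ll_typewriter):
    # Log-likelihood ratio of the hypothesized phoneme against the best
    # unconstrained phone sequence for the same speech segment.
    return ll_phoneme - ll_typewriter

def accept_word(segment_scores, threshold=-2.0):
    # Reject as out-of-vocabulary if any phoneme segment falls below
    # the confidence threshold; otherwise accept the word hypothesis.
    # segment_scores: (log P(X|phoneme HMM), log P(X|typewriter)) pairs.
    return all(phoneme_confidence(p, t) >= threshold
               for p, t in segment_scores)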


International Conference on Acoustics, Speech, and Signal Processing | 1997

Discrete mixture HMM

Satoshi Takahashi; Kiyoaki Aikawa; Shigeki Sagayama

This paper proposes a new type of acoustic model called the discrete mixture HMM (DMHMM). As large-scale speech databases have been constructed for speaker-independent HMMs, continuous mixture HMMs (CMHMMs) must increase their number of mixture components to represent complex distributions, which leads to a high computational cost for calculating the output probabilities. The DMHMM represents the feature parameter space using mixtures of multivariate distributions in the same way as the diagonal-covariance CMHMM. However, instead of using Gaussian mixtures to represent the feature distributions in each dimension, the DMHMM uses mixtures of discrete distributions based on scalar quantization (SQ). Since a discrete distribution has more degrees of freedom of representation, the DMHMM can represent the feature distributions efficiently with fewer mixture components. In isolated word recognition experiments on telephone speech, the DMHMM outperformed CMHMMs with the same number of mixture components.
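The DMHMM output probability can be sketched as a mixture over components of per-dimension discrete distributions indexed by scalar quantization (the table layout and quantizer below are illustrative assumptions, not the paper's exact formulation):

import numpy as np

def dmhmm_output_prob(x, boundaries, tables, weights):
    # boundaries[d]: sorted SQ bin edges for feature dimension d
    # tables[m][d]:  discrete distribution over bin indices for
    #                mixture component m, dimension d
    # weights[m]:    mixture weight of component m
    # p(x) = sum_m w_m * prod_d P_{m,d}[ q_d(x_d) ]
    idx = [int(np.searchsorted(boundaries[d], x[d])) for d in range(len(x))]
    return sum(w * np.prod([tables[m][d][q] for d, q in enumerate(idx)])
               for m, w in enumerate(weights))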


Journal of the Acoustical Society of America | 1996

Speech recognition method using time-frequency masking mechanism

Kiyoaki Aikawa; Hideki Kawahara; Yoh'ichi Tohkura

A speech recognition method in which input speech signals are converted to digital signals and then time-sequentially converted to cepstrum coefficients or logarithmic spectra. A dynamic cepstrum time sequence is obtained by time-frequency filtering of the cepstrum coefficients, or a masked spectrum time sequence is obtained by time-frequency masking of the logarithmic spectrum time sequence. Speech is then recognized based on the dynamic cepstrum time sequence or masked spectrum time sequence obtained in this manner.


ACM Transactions on Speech and Language Processing | 2004

Evaluating discourse understanding in spoken dialogue systems

Ryuichiro Higashinaka; Noboru Miyazaki; Mikio Nakano; Kiyoaki Aikawa

This article describes a method for creating an evaluation measure for discourse understanding in spoken dialogue systems. No well-established measure has yet been proposed for evaluating discourse understanding, which has made it necessary to evaluate it only on the basis of the system's total performance. Such evaluations, however, are greatly influenced by task domains and dialogue strategies. To find a measure that enables good estimation of system performance from discourse understanding results alone, we enumerated possible discourse-understanding-related metrics and calculated their correlation with the system's total performance through dialogue experiments.
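The core of the method reduces to measuring, over many dialogue sessions, how well each candidate metric predicts total performance; here is a minimal sketch using Pearson correlation (the metric names below are made up for illustration):

import numpy as np

def rank_metrics(metric_values, total_performance):
    # metric_values: {"slot accuracy": [per-session scores], ...}
    # total_performance: one overall score per dialogue session
    # Rank candidate discourse-understanding metrics by the magnitude
    # of their correlation with the system's total performance.
    ranked = [(name, np.corrcoef(vals, total_performance)[0, 1])
              for name, vals in metric_values.items()]
    return sorted(ranked, key=lambda nr: abs(nr[1]), reverse=True)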


Journal of the Acoustical Society of America | 1996

Cepstral representation of speech motivated by time–frequency masking: An application to speech recognition

Kiyoaki Aikawa; Harald Singer; Hideki Kawahara; Yoh’ichi Tohkura

A new spectral representation incorporating time-frequency forward masking is proposed. This masked spectral representation is efficiently expressed by a quefrency-domain parameter called the dynamic cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC substantially improves performance in phoneme classification and phrase recognition. The new representation simulates a perceived spectrum: it enhances formant transitions, which provide relevant cues for phoneme perception, while suppressing temporally stationary spectral properties such as the effect of microphone frequency characteristics or speaker-dependent time-invariant spectral features. These properties are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter, each column vector of which performs spectral smoothing; the smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent recognition. Furthermore, DyC with only 16 coefficients exhibited higher recognition performance than a combination of the cepstrum and delta-cepstrum with 32 coefficients in the classification of phonemes contaminated by noise.
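The matrix-lifter formulation mentioned above can be sketched as a single weighted sum over a window of recent cepstrum frames, one smoothing lifter per lag (the window length and lifter values below are placeholders, not the published coefficients):

import numpy as np

def dyc_frame(cep_window, lifter_matrix):
    # cep_window[k]: cepstrum vector k frames in the past (k = 0 is the
    # current frame); lifter_matrix[:, k] is the lifter applied to it.
    # Column 0 passes the current frame; later columns implement the
    # lag-dependent smoothing and subtraction of forward masking.
    K = lifter_matrix.shape[1]
    return sum(lifter_matrix[:, k] * cep_window[k] for k in range(K))

The result is a single vector of the same size as a conventional cepstrum, which is what lets DyC carry both instantaneous and transitional information without enlarging the feature vector.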


Journal of Information Processing | 2009

Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data

Tomoyosi Akiba; Kiyoaki Aikawa; Yoshiaki Itoh; Tatsuya Kawahara; Hiroaki Nanjo; Hiromitsu Nishizaki; Norihito Yasuda; Yoichi Yamashita; Katunobu Itou

Lectures are one of the most valuable genres of audiovisual data. Though spoken document processing is a promising technology for utilizing lectures in various ways, it is difficult to evaluate because the evaluation requires subjective judgment and/or the verification of large quantities of evaluation data. In this paper, a test collection for the evaluation of spoken lecture retrieval is reported. The test collection consists of target spoken documents of about 2,700 lectures (604 hours) taken from the Corpus of Spontaneous Japanese (CSJ), 39 retrieval queries, the relevant passages in the target documents for each query, and automatic transcriptions of the target speech data. This paper also reports the retrieval performance obtained on the constructed test collection by applying a standard spoken document retrieval (SDR) method, which serves as a baseline for forthcoming SDR studies using the test collection.
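A standard SDR baseline of the kind reported can be sketched as TF-IDF ranking of automatically transcribed passages (the weighting below is generic, not the paper's exact configuration):

import math
from collections import Counter

def tfidf_rank(query_terms, passages, top_n=10):
    # passages: token lists from ASR transcripts of the CSJ lectures
    # Rank passages by a simple TF-IDF score against the query terms.
    N = len(passages)
    df = Counter(t for p in passages for t in set(p))
    def score(p):
        tf = Counter(p)
        return sum(tf[t] * math.log(N / df[t])
                   for t in query_terms if t in df)
    return sorted(range(N), key=lambda i: score(passages[i]),
                  reverse=True)[:top_n]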


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2000

WIT: A Toolkit for Building Robust and Real-Time Spoken Dialogue Systems

Mikio Nakano; Noboru Miyazaki; Norihito Yasuda; Akira Sugiyama; Jun-ichi Hirasawa; Kohji Dohsaka; Kiyoaki Aikawa

This paper describes WIT, a toolkit for building spoken dialogue systems. WIT features an incremental understanding mechanism that enables robust utterance understanding and real-time responses. WIT's ability to compile domain-dependent system specifications into internal knowledge sources makes building spoken dialogue systems much easier than building them from scratch.
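A toy sketch of incremental understanding, the mechanism that lets responses be prepared before an utterance ends (the extend function below is a stand-in for WIT's compiled, domain-dependent knowledge sources):

def incremental_understand(words, extend):
    # Feed each recognized word to the understander as it arrives.
    # `extend(parses, word)` returns the partial parses still viable
    # after the word; an empty result triggers a robust restart.
    parses = {()}
    for word in words:
        parses = extend(parses, word) or {(word,)}
        yield parses   # partial understanding available in real time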

Collaboration


Kiyoaki Aikawa's top co-authors and their affiliations:

Noboru Miyazaki (Nippon Telegraph and Telephone)
Hiromitsu Nishizaki (Toyohashi University of Technology)
Tomoko Matsui (International Christian University)
Tomoyosi Akiba (National Institute of Advanced Industrial Science and Technology)
Yoshiaki Itoh (Iwate Prefectural University)
Minoru Tsuzaki (Kyoto City University of Arts)