
Publication


Featured research published by Sadaoki Furui.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1986

Speaker-independent isolated word recognition using dynamic features of speech spectrum

Sadaoki Furui

This paper proposes a new isolated word recognition technique based on a combination of instantaneous and dynamic features of the speech spectrum. This technique is shown to be highly effective in speaker-independent speech recognition. Spoken utterances are represented by time sequences of cepstrum coefficients and energy. Regression coefficients for these time functions are extracted for every frame over an approximately 50 ms period. Time functions of regression coefficients extracted for cepstrum and energy are combined with time functions of the original cepstrum coefficients, and used with a staggered array DP matching algorithm to compare multiple templates and input speech. Speaker-independent isolated word recognition experiments using a vocabulary of 100 Japanese city names indicate that a recognition error rate of 2.4 percent can be obtained with this method. Using only the original cepstrum coefficients the error rate is 6.2 percent.
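The dynamic features described above are regression (delta) coefficients: the least-squares slope of each cepstral dimension over a short window of frames. The sketch below illustrates that idea; the window size, padding choice, and function name are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def delta_features(frames, half_window=2):
    """Regression (delta) coefficients for a sequence of feature vectors,
    computed as a least-squares slope over a short window.

    frames: (T, D) array, e.g. cepstrum coefficients per frame.
    half_window: frames on each side; with 10 ms frames, half_window=2
    spans roughly 50 ms, comparable to the window the paper mentions.
    """
    T, D = frames.shape
    # Repeat edge frames so every frame gets a full regression window.
    padded = np.concatenate(
        [np.repeat(frames[:1], half_window, axis=0),
         frames,
         np.repeat(frames[-1:], half_window, axis=0)], axis=0)
    k = np.arange(-half_window, half_window + 1)  # window offsets
    denom = np.sum(k ** 2)
    deltas = np.zeros_like(frames)
    for t in range(T):
        window = padded[t:t + 2 * half_window + 1]  # (2H+1, D)
        deltas[t] = k @ window / denom  # regression slope per dimension
    return deltas
```

The combined representation then simply concatenates the original cepstrum with its deltas (and an energy delta), which is what makes the recognizer sensitive to spectral movement as well as the instantaneous spectrum.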


Pattern Recognition Letters | 1997

Recent advances in speaker recognition

Sadaoki Furui

This paper introduces recent advances in speaker recognition technology. The first part discusses general topics and issues. The second part is devoted to more specific topics of recent interest that have led to interesting new approaches and techniques, including VQ- and ergodic-HMM-based text-independent recognition methods, a text-prompted recognition method, parameter/distance normalization and model adaptation techniques, and methods of updating models and a priori thresholds in speaker verification. Although many recent advances and successes have been achieved in speaker recognition, there are still many problems for which good solutions remain to be found. The last part of this paper describes 16 open questions about speaker recognition. The paper concludes with a short discussion assessing the current status and future possibilities.


Archive | 1996

An Overview of Speaker Recognition Technology

Sadaoki Furui

This chapter overviews recent advances in speaker recognition technology. The first part of the chapter discusses general topics and issues. Speaker recognition can be divided in two ways: (a) into speaker identification and verification, and (b) into text-dependent and text-independent methods. The second part of the chapter is devoted to more specific topics of recent interest that have led to interesting new approaches and techniques, including parameter/distance normalization techniques, model adaptation techniques, VQ-/ergodic-HMM-based text-independent recognition methods, and a text-prompted recognition method. The chapter concludes with a short discussion assessing the current status and possibilities for the future.


international conference on acoustics, speech, and signal processing | 1992

Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMMs

Tomoko Matsui; Sadaoki Furui

A VQ (vector quantization)-distortion-based speaker recognition method and discrete/continuous ergodic HMM (hidden Markov model)-based methods are compared, especially from the viewpoint of robustness against utterance variations. It is shown that a continuous ergodic HMM is far superior to a discrete ergodic HMM, and that the information on transitions between different states is ineffective for text-independent speaker recognition; consequently, the speaker identification rates using a continuous ergodic HMM are strongly correlated with the total number of mixtures, irrespective of the number of states. It is also found that, for continuous ergodic HMM-based speaker recognition, the distortion-intersection measure (DIM), which was introduced as a VQ-distortion measure to increase robustness against utterance variations, is effective.
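A minimal sketch of the VQ-distortion side of this comparison, assuming plain k-means codebooks (a stand-in for the LBG training usually used in this line of work) and average nearest-codeword distortion as the score. The DIM measure and the HMM-based methods from the paper are not implemented here, and all names are illustrative:

```python
import numpy as np

def train_codebook(frames, n_codes=4, iters=20, seed=0):
    """Train a per-speaker VQ codebook with plain k-means."""
    rng = np.random.default_rng(seed)
    codes = frames[rng.choice(len(frames), n_codes, replace=False)]
    for _ in range(iters):
        # Assign each training frame to its nearest codeword.
        d = ((frames[:, None] - codes[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(n_codes):
            if np.any(labels == c):
                codes[c] = frames[labels == c].mean(0)
    return codes

def vq_distortion(frames, codes):
    """Average distance from each test frame to its nearest codeword;
    low distortion means the frames fit the speaker's codebook well."""
    d = ((frames[:, None] - codes[None]) ** 2).sum(-1)
    return d.min(1).mean()

def identify(test_frames, codebooks):
    """Pick the speaker whose codebook gives minimum distortion."""
    scores = {s: vq_distortion(test_frames, cb) for s, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

Because the score is an average over frames and ignores frame order, this family of methods is inherently text-independent, which is the property the paper compares against ergodic HMMs.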


international conference on acoustics, speech, and signal processing | 1993

Concatenated phoneme models for text-variable speaker recognition

Tomoko Matsui; Sadaoki Furui

Methods that create models to specify both speaker and phonetic information accurately, using only a small amount of training data for each speaker, are investigated. For a speaker recognition method in which arbitrary key texts are prompted by the recognizer, speaker-specific phoneme models are necessary both to identify the key text and to recognize the speaker. Two methods of making speaker-specific phoneme models are discussed: phoneme adaptation of a phoneme-independent speaker model, and speaker adaptation of universal phoneme models. The authors also investigate supplementing these methods with a phoneme-independent speaker model to make up for the lack of speaker information. This combination achieves a rejection rate as high as 98.5% for speech that differs from the key text and a speaker verification rate of 100.0%.


IEEE Transactions on Speech and Audio Processing | 2004

Speech-to-text and speech-to-speech summarization of spontaneous speech

Sadaoki Furui; Tomonori Kikuchi; Yousuke Shinnaka; Chiori Hori

This paper presents techniques for speech-to-text and speech-to-speech automatic summarization based on speech unit extraction and concatenation. For the former case, a two-stage summarization method consisting of important sentence extraction and word-based sentence compaction is investigated. Sentence and word units that maximize the weighted sum of linguistic likelihood, amount of information, confidence measure, and grammatical likelihood of concatenated units are extracted from the speech recognition results and concatenated to produce summaries. For the latter case, sentences, words, and between-filler units are investigated as units to be extracted from the original speech. These methods are applied to the summarization of unrestricted-domain spontaneous presentations and evaluated by objective and subjective measures. It was confirmed that the proposed methods are effective for spontaneous speech summarization.
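The sentence-extraction stage can be sketched as scoring each recognized sentence with a weighted sum of component scores and keeping the top fraction. The field names, weights, and score values below are illustrative placeholders, and the word-based compaction stage (a search over word units) is not shown:

```python
def summarize(sentences, ratio=0.5, weights=(1.0, 1.0, 1.0)):
    """Toy important-sentence extraction: rank recognized sentences by a
    weighted sum of per-sentence scores and keep the top `ratio`
    fraction, returned in original order.

    Each sentence is a dict with 'text', 'lm_score' (linguistic
    likelihood), 'info' (amount of information, e.g. an idf-style sum),
    and 'conf' (recognizer confidence). These are hypothetical stand-ins
    for the paper's component scores.
    """
    w_lm, w_info, w_conf = weights
    scored = [(w_lm * s["lm_score"] + w_info * s["info"] + w_conf * s["conf"], i)
              for i, s in enumerate(sentences)]
    keep = max(1, int(len(sentences) * ratio))
    # Take the top-scoring sentences, then restore document order.
    top = sorted(sorted(scored, reverse=True)[:keep], key=lambda x: x[1])
    return [sentences[i]["text"] for _, i in top]
```

The confidence term is what ties this to speech rather than text summarization: sentences the recognizer is unsure about are penalized, so recognition errors are less likely to survive into the summary.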


international conference on acoustics, speech, and signal processing | 1989

Unsupervised speaker adaptation method based on hierarchical spectral clustering

Sadaoki Furui

An automatic speaker adaptation method is proposed for speech recognition in which a small amount of training material of unspecified text can be used. This method is easily applicable to vector-quantization-based speech recognition systems where each word is represented as multiple sequences of codebook entries. In the adaptation algorithm, either the codebook is modified for each new speaker or input speech spectra are adapted to the codebook, thereby using codebook sequences universally for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the deviation vectors between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. Results of recognition experiments indicate that the proposed adaptation method is highly effective. Possible variations using this method are presented.
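A single-level sketch of the codebook-shift idea above, assuming direct frame-to-codeword assignment in place of the paper's hierarchical clustering. Codewords with no assigned adaptation frames move by the average deviation, a crude stand-in for the coarse-to-fine propagation the paper performs; all names are illustrative:

```python
import numpy as np

def adapt_codebook(codebook, adaptation_frames):
    """Shift each codeword by the deviation vector between the centroid
    of its assigned adaptation frames and the codeword itself.

    codebook: (C, D) array of codewords.
    adaptation_frames: (T, D) spectra from the new speaker.
    """
    d = ((adaptation_frames[:, None] - codebook[None]) ** 2).sum(-1)
    labels = d.argmin(1)  # nearest codeword per adaptation frame
    adapted = codebook.copy()
    deviations = []
    for c in range(len(codebook)):
        if np.any(labels == c):
            dev = adaptation_frames[labels == c].mean(0) - codebook[c]
            adapted[c] = codebook[c] + dev
            deviations.append(dev)
    # Unassigned codewords follow the average observed deviation,
    # approximating what a coarser cluster level would supply.
    if deviations:
        mean_dev = np.mean(deviations, axis=0)
        for c in range(len(codebook)):
            if not np.any(labels == c):
                adapted[c] = codebook[c] + mean_dev
    return adapted
```

Because only the codebook moves, the stored codebook-entry sequences for each word remain valid for all speakers, which is the property the abstract highlights.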


Journal of the Acoustical Society of America | 2004

Fifty years of progress in speech and speaker recognition

Sadaoki Furui

Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMMs and n-grams; (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum); (3) from heuristic time-normalization to DTW/DP matching; (4) from "distance"-based to likelihood-based methods; (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI; (6) from isolated word to continuous speech recognition; (7) from small vocabulary to large vocabulary recognition; (8) from context-independent units to context-dependent units for recognition; (9) from clean speech to noisy/telephone speech recognition; (10) from single speaker to speaker-independent/adaptive recognition; (11) from monologue to dialogue/conversation recognition; (12) from read speech to spontaneous speech recognition; (13) from recognition to understanding; (14) from single-modality (audio signal only) to multimodal (audio/visual) speech recognition; (15) from hardware recognizer to software recognizer; and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes, including many other important techniques not noted above, have been directed toward increasing the robustness of recognition.


Speech Communication | 1986

Research on individuality features in speech waves and automatic speaker recognition techniques

Sadaoki Furui

This paper presents an overview of Japanese research on individuality information in speech waves, which has been performed from various points of view. Whereas physical correlates of perceptual voice individuality have been investigated from the psychological viewpoint, research from the engineering viewpoint is related to automatic speaker recognition, speaker-independent speech recognition, and training algorithms in speech recognition. Speaker recognition research can be classified into two classes, depending on whether or not the text is predetermined. However, it has been shown that even if the text is not predetermined, text-dependent individual information can be used based on explicit or implicit phoneme recognition. Various speaker recognition methods are classified into these categories, and their performances are presented in this paper. In particular, this paper focuses on the long-term intra-speaker variability of feature parameters as one of the most crucial problems in speaker recognition, and presents an investigation into methods for reducing the effects of long-term spectral variability on recognition accuracy.


Speech Communication | 1991

Speaker-dependent-feature extraction, recognition and processing techniques

Sadaoki Furui

This paper discusses recent advances in and perspectives of research on speaker-dependent-feature extraction from speech waves, automatic speaker identification and verification, speaker adaptation in speech recognition, and voice conversion techniques. Speaker-dependent information exists both in the spectral envelope and in the supra-segmental features of speech. This individual information can be further classified into temporal and dynamic features. Speaker identification/verification methods can be divided into text-dependent and text-independent methods. Although text-dependent speaker verification techniques have almost reached the level suitable for practical implementation, text-independent techniques are still in the fundamental research stage. Both supervised and unsupervised speaker adaptation algorithms for speech recognition have recently been proposed, and remarkable progress has been achieved in this field. Improving synthesized speech quality by adding natural characteristics of voice individuality, and converting synthesized voice individuality from one speaker to another, are as yet little-exploited research fields to be studied in the near future. Research on speaker-dependent information is one of the most important future directions for achieving advanced speech information processing systems.

Collaboration


Frequent co-authors of Sadaoki Furui and their affiliations:

- Koichi Shinoda (Tokyo Institute of Technology)
- Takahiro Shinozaki (Tokyo Institute of Technology)
- Chiori Hori (Tokyo Institute of Technology)
- Paul R. Dixon (Tokyo Institute of Technology)
- Tasuku Oonishi (Tokyo Institute of Technology)
- Chai Wutiwiwatchai (Tokyo Institute of Technology)
- Yasuhiro Minami (Nippon Telegraph and Telephone)