Publication


Featured research published by Hiroaki Sakoe.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1978

Dynamic programming algorithm optimization for spoken word recognition

Hiroaki Sakoe; Seibi Chiba

This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies. The symmetric form algorithm superiority is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. The effective slope constraint characteristic is qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP-algorithms, previously applied to spoken word recognition by different research groups. The experiment shows that the present algorithm gives no more than about two-thirds errors, even compared to the best conventional algorithm.
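
The symmetric-form time-normalized distance can be illustrated with a minimal sketch. It assumes one-dimensional feature values, an absolute-difference local distance, and the symmetric recurrence as it is commonly stated without a slope constraint (weight 2 on the diagonal step, weight 1 on horizontal and vertical steps, normalization by I + J). The function name `symmetric_dtw` and the `window` parameter are illustrative, not taken from the paper, and the slope constraint discussed in the abstract is not implemented here.

```python
import math

def symmetric_dtw(a, b, window=None):
    """Time-normalized distance between sequences a and b (symmetric form).

    Assumptions for illustration: scalar features, |x - y| as local distance,
    no slope constraint, optional adjustment window |i - j| <= window.
    """
    I, J = len(a), len(b)
    if window is None:
        window = max(I, J)
    # The window must be at least |I - J| or the end point is unreachable.
    window = max(window, abs(I - J))
    INF = math.inf
    g = [[INF] * J for _ in range(I)]
    for i in range(I):
        for j in range(max(0, i - window), min(J, i + window + 1)):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                g[i][j] = 2 * d  # initial condition of the symmetric form
                continue
            best = INF
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # horizontal step
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # vertical step
            g[i][j] = best
    # Normalize by the sum of path weights, which is I + J for this form.
    return g[I - 1][J - 1] / (I + J)
```

For example, `symmetric_dtw([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated value can be absorbed by the warping path, while `symmetric_dtw([0, 0], [1, 1])` is 1.0.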


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1979

Two-level DP-matching--A dynamic programming-based pattern matching algorithm for connected word recognition

Hiroaki Sakoe

This paper reports a pattern matching approach to connected word recognition. First, a general principle of connected word recognition is given based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Time-normalization capability is allowed by use of dynamic programming-based time-warping technique (DP-matching). Then, it is shown that the matching process is efficiently carried out by breaking it down into two steps. The derived algorithm is extensively subjected to recognition experiments. It is shown in a talker-adapted recognition experiment that digit data (one to four digits) connectedly spoken by five persons are recognized with as high as 99.6 percent accuracy. Computation time and memory requirement are both proved to be within reasonable limits.
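
The two-step breakdown mentioned in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: features are scalars, the word-level matcher `dtw` is a plain unnormalized DTW, the number of words is unconstrained, and the names `dtw` and `two_level_dp` are hypothetical. The first level scores every partial pattern of the input against every reference word; the second level runs a DP over word boundaries.

```python
import math

def dtw(a, b):
    """Accumulated |x - y| distance along the best warping path (illustrative)."""
    INF = math.inf
    g = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    g[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            g[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[len(a)][len(b)]

def two_level_dp(x, refs):
    """Return (word sequence, total distance) for input x and reference dict refs."""
    n = len(x)
    # Level 1: best (distance, word) for every partial pattern x[p:q].
    best = {}
    for p in range(n):
        for q in range(p + 1, n + 1):
            best[(p, q)] = min((dtw(x[p:q], r), w) for w, r in refs.items())
    # Level 2: DP over boundaries, T[q] = min over p of T[p] + best distance of x[p:q].
    T = [math.inf] * (n + 1)
    back = [None] * (n + 1)
    T[0] = 0.0
    for q in range(1, n + 1):
        for p in range(q):
            dist, w = best[(p, q)]
            if T[p] + dist < T[q]:
                T[q] = T[p] + dist
                back[q] = (p, w)
    # Trace the boundaries back to recover the word sequence.
    words, q = [], n
    while q > 0:
        p, w = back[q]
        words.append(w)
        q = p
    return words[::-1], T[n]

refs = {"a": [1, 1], "b": [5, 5, 5]}
words, total = two_level_dp([1, 1, 5, 5, 5], refs)
```

The first level dominates the cost, since it evaluates every (start, end, word) combination; the second level only combines the precomputed partial results.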


Journal of the Acoustical Society of America | 1983

Speech recognition system with delayed output

Hiroaki Sakoe; Seibi Chiba

In a speech recognition system of the type including a recognition unit responsive to a voice input and a conditioning input for recognizing the voice input to produce a recognition output, a start signal is produced whenever a voice input exceeds a threshold level and a pause interval detection signal is produced whenever a voice input falls below a threshold level. An output timing signal is produced when the detection signal lasts a preselected interval of time that may be either about 250 milliseconds or about 250 milliseconds plus a delay. The recognition output from the recognition unit produced in response to the detection signal is displayed in response also to the detection signal. The result is delivered to a utilization device in response to the output timing signal. The delay may be given either by a predetermined duration or an interval between those instants at which the above-mentioned 250 milliseconds have just elapsed after production of the detection signal and after production of another pause interval detection signal for a next following voice input. During the delay, it is possible either by a manually operable switch or a cancel voice input to cancel delivery of the recognition result displayed to be incorrect.


International Conference on Acoustics, Speech, and Signal Processing | 1989

Speaker-independent word recognition using dynamic programming neural networks

Hiroaki Sakoe; Ryosuke Isotani; Kazunaga Yoshida; Ken-ichi Iso; Takao Watanabe

A description is given of speaker-independent word recognition based on a new neural network model called the dynamic programming neural network (DNN), which can treat time-sequence patterns. DNN is based on the integration of a multilayer neural network and dynamic-programming-based matching. Speaker-independent isolated Japanese digit recognition experiments were carried out using data uttered by 107 speakers (50 speakers for training and 57 speakers for testing). The recognition accuracy was 99.3%, suggesting that the model can be effective for speech recognition.


Journal of the Acoustical Society of America | 1985

Pattern recognition with a warping function decided for each reference pattern by the use of feature vector components of a few channels

Hiroaki Sakoe

In a pattern recognition device according to pattern matching, one or more specific dimensions of vector components are memorized for each reference pattern feature vector sequence in a reference pattern memory for the reference pattern feature vector sequences. A warping function for time-normalizing input pattern feature vectors of a sequence and the vectors of each reference pattern feature vector sequence is determined so as to minimize the difference between a pattern represented by the specific vector components of the specific dimension or dimensions and another pattern represented by the vector components corresponding in the input pattern feature vector sequence to the specific reference pattern feature vector components as regards the dimensions of a space in which each input or reference pattern feature vector is defined. The input pattern feature vector sequence and each reference pattern feature vector sequence are subjected to nonlinear pattern matching with reference to the warping function. The pattern matching may be between the vector components of all dimensions or those of several dimensions including the specific dimension or dimensions. Preferably, one or more dimensions are specified as the specific one or ones by selecting each dimension for which a variation with time of a pattern represented by the reference pattern feature vector components is a maximum of similar variations of patterns represented by the vector components of other dimensions.


Journal of the Acoustical Society of America | 1986

System for recognizing words continuously spoken according to a format

Hiroaki Sakoe

A continuous speech recognition system utilizes a format memory (14) which specifies a sequence of word sets and a plurality of words, or reference patterns, which may be included in each word set. The input pattern sequence is divided into all possible partial patterns having start points p and end points q, and each of these partial patterns is compared with all reference patterns to derive elementary similarity measures. The elementary similarity measures for each combination of a partial pattern and a permitted word in a word set under the specified format are then examined to determine the optimum input pattern segmentation points and corresponding sequence of reference patterns which will yield a maximum similarity result. The maximum similarity is represented by $\max_{p(x),\, n(x)} \sum_{x=1}^{K} S(p(x-1), p(x), n(x))$, where $S(p(x-1), p(x), n(x))$ indicates the degree of similarity between an input partial pattern having a start point $p(x-1)$ and an end point $p(x)$ and a reference word unit $n(x)$ within a word set $f_x$, and $K$ represents the number of word sets permitted according to the specified format.
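
The search for the optimum segmentation points under a format can be sketched as a DP over (word-set index, end point). This is an illustrative sketch only: the function name `recognize_with_format`, the callable `similarity(p, q, word)`, and the toy data are assumptions, not part of the patent. It maximizes the sum of elementary similarity measures over K word sets, choosing one permitted word per set.

```python
import math

def recognize_with_format(n_frames, fmt, similarity):
    """fmt: list of K word sets (sequences of permitted words), in order.
    similarity(p, q, word): elementary similarity of frames p..q to word.
    Returns (words, segmentation points, total similarity)."""
    K = len(fmt)
    if n_frames < K:
        raise ValueError("need at least one frame per word set")
    NEG = -math.inf
    T = [[NEG] * (n_frames + 1) for _ in range(K + 1)]
    back = [[None] * (n_frames + 1) for _ in range(K + 1)]
    T[0][0] = 0.0
    for x in range(1, K + 1):
        for q in range(x, n_frames + 1):
            for p in range(x - 1, q):
                if T[x - 1][p] == NEG:
                    continue
                for word in fmt[x - 1]:
                    s = T[x - 1][p] + similarity(p, q, word)
                    if s > T[x][q]:
                        T[x][q] = s
                        back[x][q] = (p, word)
    # Trace back from the last word set ending at the final frame.
    words, points, q = [], [n_frames], n_frames
    for x in range(K, 0, -1):
        p, word = back[x][q]
        words.append(word)
        points.append(p)
        q = p
    return words[::-1], points[::-1], T[K][n_frames]

# Toy data: scalar frames and one-value templates (hypothetical).
frames = [1, 1, 5, 5, 5]
templates = {"one": 1, "five": 5, "nine": 9}

def toy_similarity(p, q, word):
    # Negative total absolute error, so a perfect match scores 0.
    return -sum(abs(f - templates[word]) for f in frames[p:q])

fmt = [("one", "nine"), ("five", "nine")]
words, points, score = recognize_with_format(len(frames), fmt, toy_similarity)
```

In this toy case the first word set permits "one" or "nine" and the second permits "five" or "nine", so the format rules out sequences such as "five, five" even if they scored well.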


Journal of the Acoustical Society of America | 1986

System for recognizing a word sequence by dynamic programming and by the use of a state transition diagram

Hiroaki Sakoe

Operation of a continuous speech recognition system operable according to the dynamic programming technique is controlled by a state transition diagram in compliance with which word sequences to be recognized by the system with reference to a predetermined number of reference words Bns are pronounced. The system comprises a state transition table accessed by the reference words Bns to successively produce particular states ys in the diagram and previous states zs for each particular state y. In cooperation with a recurrence value and an optimum parameter table, a matching unit determines a recurrence value $T_y(m)$ and an optimum parameter set $ZUN_y(m)$ according to: $T_y(m) = \min_{z,u,n}\ \text{or}\ \max_{z,u,n}\, [T_z(u) + D(u, m, n)]$ and $ZUN_y(m) = \arg\min_{z,u,n}\ \text{or}\ \arg\max_{z,u,n}\, [T_z(u) + D(u, m, n)]$, where u and m represent an end and a start point of a fragmentary pattern A(u, m) of an input pattern A representative of a word sequence and D(u, m, n), a similarity measure between the fragmentary pattern A(u, m) and a reference word Bn assigned to a permutation of the previous and the particular states z and y. By referring to the optimum parameter table and, as the case may be, to the recurrence value table, a decision unit recognizes the word sequence as a concatenation of optimum ones of the reference words Bns.
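
The recurrence over states can be sketched as below, using the minimizing variant with a distance measure. The sketch makes several assumptions that are not in the abstract: the fragment is taken to run from boundary u to boundary m, transitions are given as (previous state, word, particular state) triples, and the names `recognize_with_transitions`, `distance`, and the toy data are hypothetical.

```python
import math

def recognize_with_transitions(n_frames, transitions, start, finals, distance):
    """transitions: list of (z, word, y) triples of the state transition diagram.
    distance(u, m, word): dissimilarity between frames u..m and the word.
    Returns (word sequence, accumulated distance)."""
    INF = math.inf
    states = {start} | {z for z, _, _ in transitions} | {y for _, _, y in transitions}
    T = {s: [INF] * (n_frames + 1) for s in states}       # recurrence values
    back = {s: [None] * (n_frames + 1) for s in states}   # optimum parameters
    T[start][0] = 0.0
    for m in range(1, n_frames + 1):
        for z, word, y in transitions:
            for u in range(m):
                if T[z][u] == INF:
                    continue
                cost = T[z][u] + distance(u, m, word)
                if cost < T[y][m]:
                    T[y][m] = cost
                    back[y][m] = (z, u, word)
    end = min(finals, key=lambda s: T[s][n_frames])
    if T[end][n_frames] == INF:
        raise ValueError("no word sequence permitted by the diagram fits the input")
    # Follow the stored (z, u, word) triples back to the start state.
    words, y, m = [], end, n_frames
    while m > 0:
        z, u, word = back[y][m]
        words.append(word)
        y, m = z, u
    return words[::-1], T[end][n_frames]

# Toy data: scalar frames and one-value templates (hypothetical).
frames = [1, 1, 5, 5, 5]
templates = {"one": 1, "five": 5, "nine": 9}

def toy_distance(u, m, word):
    return sum(abs(f - templates[word]) for f in frames[u:m])

transitions = [("s0", "one", "s1"), ("s0", "nine", "s1"),
               ("s1", "five", "s2"), ("s1", "one", "s2")]
words, total = recognize_with_transitions(
    len(frames), transitions, "s0", ["s2"], toy_distance)
```

Because all values for boundaries earlier than m are final before boundary m is processed, the table `back` plays the role of the optimum parameter table and a single backward pass recovers the word sequence.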


Systems and Computers in Japan | 1989

A high-speed DP-matching algorithm based on frame synchronization, beam search and vector quantization

Hiroaki Sakoe; Hiromi Fujii; Kazunaga Yoshida; Masao Watari

This paper discusses the high-speed DP-matching as the speech recognition algorithm including connected word sequence recognition. The first improvement is the frame synchronization. By this elaboration, an improvement of the speed by approximately one order of magnitude is achieved, compared with the consecutive word recognition of two-level DP-matching type, where DP-matching is iterated by assuming that any time in the input speech can be the word boundary. The second improvement is the introduction of the beam search. This paper discusses the practical aspects of combining the beam search and DP-matching. The discussion includes the construction of the work area, control of DP recursive expression and other problems, aiming at an effective reduction of the computational complexity for the recursive expression. The third improvement is the built-in vector quantization. It is shown that an effective reduction of the computational complexity for the local distance can be produced through a skillful integration of the beam search and the vector quantization. Through an evaluation experiment for the discrete word, it is seen that there is a possibility of achieving the speed improvement by a factor of 30. This corresponds to the speed improvement of two or more orders of magnitude, compared with the two-level DP-matching for the consecutive word sequence recognition algorithm.
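
How frame synchronization and beam search combine can be illustrated with a small isolated-word sketch. It is an assumption-laden illustration rather than the paper's algorithm: features are scalars, the local path allows staying on a reference frame or advancing by one or two frames, the beam is a fixed additive threshold, and vector quantization is left out. The name `beam_dp_match` and the `beam` parameter are hypothetical.

```python
import math

def beam_dp_match(x, refs, beam):
    """Frame-synchronous DP matching with beam pruning.

    All reference words are advanced together, one input frame at a time.
    Grid points whose accumulated distance exceeds the best one by more than
    `beam` are pruned, so no local distance is computed from them later.
    Returns (best word, dict of final accumulated distances)."""
    INF = math.inf
    prev = {w: [INF] * len(r) for w, r in refs.items()}
    for t, frame in enumerate(x):
        cur = {}
        for w, r in refs.items():
            col = [INF] * len(r)
            for j in range(len(r)):
                if t == 0:
                    pred = 0.0 if j == 0 else INF  # paths start at the first frame
                else:
                    pred = prev[w][j]
                    if j >= 1:
                        pred = min(pred, prev[w][j - 1])
                    if j >= 2:
                        pred = min(pred, prev[w][j - 2])
                if pred < INF:  # pruned predecessors cost nothing to skip
                    col[j] = pred + abs(frame - r[j])
            cur[w] = col
        # Beam pruning relative to the best grid point at this input frame.
        best = min(min(col) for col in cur.values())
        for col in cur.values():
            for j, c in enumerate(col):
                if c > best + beam:
                    col[j] = INF
        prev = cur
    scores = {w: col[-1] for w, col in prev.items()}
    return min(scores, key=scores.get), scores

refs = {"lo": [1, 2, 3], "hi": [7, 8, 9]}
word, scores = beam_dp_match([1, 2, 2, 3], refs, beam=5.0)
```

In this toy run the word "hi" falls outside the beam at the first input frame and is never evaluated again, which is where the reduction in recursive-expression computations comes from.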


International Conference on Acoustics, Speech, and Signal Processing | 1983

A microprocessor for speech recognition

Hisao Ishizuka; Masao Watari; Hiroaki Sakoe; Seibi Chiba; Toshiki Iwata; Tomoko Matsuki; Yuichi Kawakami

A new single-chip microprocessor for speech recognition has been developed utilizing multi-processor architecture and pipelined structure. By DP-matching algorithm, the processor recognizes up to 340 isolated words or 40 connected words in realtime.


IEEE Journal on Selected Areas in Communications | 1985

A Microprocessor for Speech Recognition

Yuichi Kawakami; Hisao Ishizuka; Masao Watari; Hiroaki Sakoe; Toshiaki Hoshi; Toshiki Iwata

A new single-chip microprocessor for speech recognition, the SRP, has been developed, utilizing a multiprocessor architecture and a pipelined structure. It can recognize up to 340 isolated words or 40 connected words in real time. The SRP contains a vector distance calculator, a DP-equation calculator, and an I/O controller operating in a pipelined manner. Algorithm variations and operation parameters are user programmable, and the total size of the SRP program for a typical speech recognition system is about 700 words. The device has been fabricated with n-channel Si-gate E/D MOS technology with 2.5 μm design rules and employs 7296 three-transistor dynamic RAM cells for a total of more than 40 000 transistors.
