Publication


Featured research published by Shinta Kimura.


Journal of the Acoustical Society of America | 1994

Voice recognition system having word frequency and intermediate result display features

Shinta Kimura

A voice recognition system includes a microphone for converting a voice to an electrical voice signal comprising voice and non-voice sound portions. An acoustic processing unit detects the power and spectrum of the electrical voice signal and outputs power time-series data and spectrum time-series data. A voice section detection unit uses the power time-series data to detect a start point and an end point of the voice sound portion, and outputs an end decision signal indicative of the end point. A word dictionary stores word labels ordered in accordance with frequency of use, together with word numbers and word templates. A recognition unit receives the feature time-series data and calculates a degree of similarity between the voice and the word templates. A sorting unit sorts the data calculated in the recognition unit in accordance with the degree of similarity. A selection unit selects one or more words having a higher degree of similarity from the words sorted in the sorting unit and outputs these words to a display unit. A word frequency dictionary stores word labels, word numbers, word templates, and frequency data attached to each word label. Finally, a word dictionary sorting unit, coupled between the word dictionary and the word frequency dictionary, sorts the word labels of the word frequency dictionary in descending order of frequency and outputs the sorted words to the word dictionary.
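
The pipeline above reduces to: keep a dictionary whose entries are pre-ordered by usage frequency, score every template against the input features, sort by similarity, and display the best candidates. A minimal sketch of the sorting and selection steps, assuming Euclidean distance as the dissimilarity measure and a hypothetical WordEntry record standing in for one dictionary unit:

```python
import numpy as np

# Hypothetical stand-in for one word dictionary entry: label, number,
# template (a fixed-length feature vector here), and usage frequency.
class WordEntry:
    def __init__(self, label, number, template, frequency):
        self.label, self.number = label, number
        self.template = np.asarray(template, dtype=float)
        self.frequency = frequency

def sort_dictionary_by_frequency(entries):
    """Word dictionary sorting unit: order entries by descending frequency."""
    return sorted(entries, key=lambda e: e.frequency, reverse=True)

def recognize(features, entries, n_best=5):
    """Recognition + sorting + selection units: score each template
    against the input features and keep the n_best closest words."""
    scored = [(np.linalg.norm(features - e.template), e) for e in entries]
    scored.sort(key=lambda pair: pair[0])          # sorting unit
    return [e.label for _, e in scored[:n_best]]   # selection unit

dictionary = sort_dictionary_by_frequency([
    WordEntry("hello", 1, [0.2, 0.9], frequency=120),
    WordEntry("world", 2, [0.8, 0.1], frequency=45),
    WordEntry("voice", 3, [0.3, 0.8], frequency=300),
])
print(recognize(np.array([0.25, 0.85]), dictionary, n_best=2))
```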


Journal of the Acoustical Society of America | 1995

Speaker adapted speech recognition system

Toru Sanada; Shinta Kimura

A speaker adapted speech recognition system achieving a high recognition rate for an unknown speaker comprises: a plurality of acoustic templates of speakers, which manage the correspondence between an acoustic feature of speech and the content of that speech; a converting portion for converting the acoustic feature of the speech managed by the acoustic templates according to a set parameter; a learning portion for learning the parameter at which the acoustic feature of the acoustic template, as converted by the converting portion, approximately coincides with the acoustic feature of a corresponding speech input for learning, when such an input is provided; and a selection portion for selecting one or more of the acoustic templates whose converted acoustic features are closest to that of a speech input for selection, by comparing the acoustic feature of the speech input for selection with the acoustic features converted by the converting portion. An acoustic template for the unknown speaker is created by converting, via the converting portion, the acoustic features of the acoustic templates selected by the selection portion; the content of the unknown speaker's speech input is then recognized using this created acoustic template.
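
One way to read the adaptation loop: learn a conversion parameter that maps a reference speaker's template features onto the new speaker's learning utterance, then rank templates by how close their converted features come to a selection utterance. A sketch under the simplifying assumption that the conversion is a single global scale-and-shift fit by least squares (the abstract leaves the parametric form open):

```python
import numpy as np

def learn_parameter(template_feats, learning_feats):
    """Learning portion: fit scale a and shift b so that
    a * template + b approximately coincides with the learning input."""
    a, b = np.polyfit(template_feats.ravel(), learning_feats.ravel(), 1)
    return a, b

def convert(feats, param):
    """Converting portion: apply the learned parameter."""
    a, b = param
    return a * feats + b

def select_templates(templates, selection_feats, param, k=1):
    """Selection portion: pick the k templates whose converted
    features are closest to the selection utterance."""
    dists = [np.linalg.norm(convert(t, param) - selection_feats) for t in templates]
    order = np.argsort(dists)
    return [templates[i] for i in order[:k]]

templates = [np.array([1.0, 2.0, 3.0]), np.array([0.5, 1.0, 1.5])]
learning_input = np.array([2.1, 4.2, 6.1])   # unknown speaker, known content
param = learn_parameter(templates[0], learning_input)
adapted = [convert(t, param) for t in select_templates(templates, learning_input, param)]
print(adapted)
```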


Journal of the Acoustical Society of America | 2000

Apparatus and method for changing reproduction speed of speech sound and recording medium

Hideki Kojima; Shinta Kimura

A reproduction speed changing apparatus reproduces speech data at a speed at which the essential parts can be caught, so that the outline of the speech sound can be grasped even when the reproduction speed is changed, while markedly reducing the total reproduction time. A reproduction speed for each predetermined period of the speech data is calculated from a parameter value in that period, exploiting the fact that the voice is louder, or its pitch higher, in the parts containing important content. A part with a high parameter value, such as high power or high pitch, is judged to contain important content and is reproduced at a speed at which the content can be caught; the remaining parts are reproduced either at a speed that allows the whole reproduction of the speech data to be completed within a required time, or, if the reproduced speech could not be caught at that speed, by skipping over those parts.
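
The scheme reduces to: measure a salience parameter (power or pitch) per fixed-length period, play high-salience periods slowly enough to remain intelligible, and speed up or skip the rest so the whole file fits a target duration. A rough sketch using short-time power as the parameter; the thresholds and rates below are illustrative, not from the patent:

```python
import numpy as np

def plan_playback(samples, sr, period_s=0.5, power_thresh=0.01,
                  slow_rate=1.0, fast_rate=3.0, skip_above_rate=4.0):
    """Assign a playback rate to each fixed period of speech data.
    Periods whose mean power exceeds power_thresh are treated as
    important and played at slow_rate; the rest at fast_rate, or
    skipped entirely if fast_rate would exceed skip_above_rate."""
    hop = int(period_s * sr)
    plan = []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + hop]
        power = float(np.mean(frame ** 2))
        if power > power_thresh:
            plan.append((start, slow_rate))   # important: keep intelligible
        elif fast_rate <= skip_above_rate:
            plan.append((start, fast_rate))   # unimportant: compress
        # else: skip the period altogether
    return plan

sr = 16000
t = np.arange(sr * 2) / sr
speech = np.sin(2 * np.pi * 220 * t) * (t > 1.0)   # silent, then loud
print(plan_playback(speech, sr))
```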


International Conference on Acoustics, Speech, and Signal Processing | 1990

100000-word recognition using acoustic-segment networks

Shinta Kimura

Speech recognition for a vocabulary of 100000 words is described. Acoustic-segment networks are used as word templates in recognition. The acoustic-segment networks are automatically generated from orthographic strings of the words using rules that account for several kinds of variations in speech. To reduce the amount of computation in recognition, a tree representation of the networks and a preselection method based on input-frame sampling are used. It is confirmed that 98.75% of the computation can be eliminated without a significant increase in error when using the preselection, which outputs 500 candidates for main matching. Top-20 recognition accuracy is 93.5% for 10000 test utterances from five males and five females.
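
The two cost-cutting ideas, sharing common prefixes of the segment networks in a tree and preselecting candidates by matching only sampled input frames, can be illustrated with a toy prefix tree over phoneme-like segment labels. The construction below is a generic trie plus a generic sampled-frame preselection, not the paper's exact network representation:

```python
def build_tree(words):
    """Share common prefixes of segment-label sequences in a trie,
    so matching a shared prefix is done once for all words under it."""
    root = {}
    for word, segments in words.items():
        node = root
        for seg in segments:
            node = node.setdefault(seg, {})
        node.setdefault("#", []).append(word)   # word ends here
    return root

def preselect(candidates, input_frames, score_fn, step=4, n_keep=500):
    """Preselection by input-frame sampling: score every candidate on
    every step-th frame only, and keep the n_keep best for main matching."""
    sampled = input_frames[::step]
    scored = sorted(candidates, key=lambda w: score_fn(w, sampled))
    return scored[:n_keep]

tree = build_tree({"kana": ["k", "a", "n", "a"], "kata": ["k", "a", "t", "a"]})
# the "k" -> "a" prefix is stored once and shared by both words:
print(list(tree), list(tree["k"]))
```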


International Conference on Acoustics, Speech, and Signal Processing | 1987

Extraction of phonemic variation rules in continuous speech spoken by multiple speakers

Shinta Kimura; Y. Nara

This paper describes an interactive extraction of phonemic variation rules in continuous speech spoken by multiple speakers. To realize a continuous speech recognizer, we must first develop a highly accurate phoneme recognizer. The major problem related to phoneme recognizers is the phonemic variations in continuous speech. Our work focuses on the interactive analysis of phonemic variations in continuous speech and the extraction of the phonemic variation rules for many speakers. We extracted 317 rules related to 21 kinds of phonemic variation phenomena from 10,000 Japanese-language phrases spoken by 10 male speakers. With these rules, 97.6% of 36,000 Japanese-language phrases spoken by 36 test speakers (30 males and 6 females) were correctly segmented by our top-down phoneme segmentation system. Furthermore, a subset of the rules for each speaker was automatically obtained. On average, each subset contains 53.2% of the rules.
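
A phonemic variation rule can be viewed as a context-dependent rewrite that maps a canonical phoneme sequence onto the forms actually observed in continuous speech. A toy illustration of applying such rules to expand one canonical form into its variants; the rules shown are invented for the example, not among the 317 extracted ones:

```python
import itertools

# Hypothetical rules: canonical phoneme -> possible realizations.
# Real rules are context-dependent; this sketch applies them per phoneme.
RULES = {
    "g": ["g", "ng"],   # e.g. intervocalic nasalization
    "u": ["u", ""],     # e.g. vowel devoicing/deletion
}

def expand_variants(phonemes):
    """Enumerate all pronunciation variants licensed by the rules."""
    options = [RULES.get(p, [p]) for p in phonemes]
    return ["".join(v) for v in itertools.product(*options)]

print(expand_variants(list("kagu")))   # ['kagu', 'kag', 'kangu', 'kang']
```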


International Conference on Acoustics, Speech, and Signal Processing | 1986

Interactive extraction of phonemic variation rules in continuous speech

Shinta Kimura; Y. Nara

An approach to continuous speech recognition is introduced, and an extraction method for phonemic variation rules and the extracted rules are reported. To realize a continuous speech recognizer, we must solve the problem of phonemic variations in continuous speech. We use a top-down method. Our current effort is focused on interactive analysis of phonemic variations in continuous speech and extraction of the phonemic variation rules, to construct a database of these rules. We analyzed phonemic variations in 1000 Japanese-language phrases spoken by a male speaker, and confirmed that all the phonemic variations in the 1000 phrases can be represented by about 160 rules. In addition, we obtained the occurrence probability of each phonemic variation from the frequency of use of each rule.
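
The last step, turning the frequency of use of each rule into an occurrence probability, is a simple relative-frequency estimate. A sketch with made-up counts; the rule names and denominators are invented for the example:

```python
from collections import Counter

# Hypothetical counts of how often each variation rule fired in the
# analyzed phrases (the paper derives these from the 1000 phrases).
rule_counts = Counter({"rule_A": 120, "rule_B": 45, "rule_C": 15})

# Contexts where each rule *could* have applied; the probability is
# (times applied) / (times applicable). Denominators invented here.
applicable = {"rule_A": 200, "rule_B": 150, "rule_C": 100}

occurrence_prob = {r: rule_counts[r] / applicable[r] for r in rule_counts}
print(occurrence_prob)   # {'rule_A': 0.6, 'rule_B': 0.3, 'rule_C': 0.15}
```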


International Conference on Acoustics, Speech, and Signal Processing | 1982

Large-vocabulary spoken word recognition using simplified time-warping patterns

Yasuhiro Nara; K. Iwata; Yuji Kijima; Atsuhito Kobayashi; Shinta Kimura; S. Sasaki; J. Tanahashi

We propose a new matching algorithm for large-vocabulary spoken word recognition, which gives a recognition score comparable to that of the traditional DP matching algorithm but requires less than 1/10 as much calculation. In a computer simulation of 1,000 categories in speaker-dependent recognition of speech samples uttered by five male adult speakers, an average recognition score of 95.8% was obtained. We have constructed a real-time speaker-dependent speech recognizer using our algorithm. We are now examining the application of this recognizer to Japanese text input.
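
For reference, the baseline the authors compare against is dynamic-programming (DP) time warping between an input feature sequence and a word template. A compact standard DP matching sketch; this is the traditional algorithm, not the paper's simplified time-warping patterns, which approximate it at under a tenth of the cost:

```python
import numpy as np

def dp_match(input_seq, template):
    """Standard DP time-warping distance between two feature sequences."""
    n, m = len(input_seq), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(input_seq[i - 1] - template[j - 1])
            # best of insertion, deletion, and match transitions
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.array([[0.0], [1.0], [2.0], [2.0]])
b = np.array([[0.0], [2.0]])
print(dp_match(a, b))   # small distance despite different lengths
```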


International Conference on Acoustics, Speech, and Signal Processing | 1989

Extraction and evaluation of phonetic-acoustic rules for continuous speech recognition

Shinta Kimura; H. Iwamida; T. Sanada

The automatic extraction and evaluation of phonetic-acoustic rules for continuous speech recognition are described. The rules account for phonemic variations, segment spectra, and segment durations in continuous speech. The authors previously reported that the phonemic variations in 10000 words spoken by ten males can be represented using 317 phonemic variation rules, and they reported the evaluation results of the rules on top-down phoneme segmentation. The authors introduce the automatic extraction of rules for segment spectra and segment duration and report the evaluation results for the phonetic-acoustic rules used for recognizing a 1000-word vocabulary.
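
A segment-duration rule amounts to statistics of how long each phoneme's segment lasts in labeled training speech, later usable to score candidate segmentations. A sketch that estimates mean and spread of duration per phoneme from hypothetical labeled segments; the data and the simple Gaussian summary are assumptions, not the paper's extraction procedure:

```python
import statistics
from collections import defaultdict

# Hypothetical labeled segments: (phoneme, duration in ms).
segments = [("a", 90), ("a", 110), ("k", 40), ("k", 55), ("a", 95), ("k", 50)]

durations = defaultdict(list)
for phoneme, dur in segments:
    durations[phoneme].append(dur)

# Duration "rule" per phoneme: mean and standard deviation, usable to
# score whether a candidate segment has a plausible length.
duration_rules = {
    p: (statistics.mean(ds), statistics.stdev(ds)) for p, ds in durations.items()
}
print(duration_rules)
```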


Journal of the Acoustical Society of America | 1988

Discrimination of stop consonants using a data‐driven analysis

Hitoshi Iwamida; Shinta Kimura

The characteristics of stop consonants are time‐varying. In traditional running spectrum analysis, the frames are not always synchronized with the events of speech. In this presentation, a new data‐driven analysis method is proposed in which the frames are synchronized with the events to extract the features of the stop consonants accurately. In this method, a feature vector is extracted from four or five frames that are equally spaced in a consonant segment. A voiced stop segment is defined as the segment between the release and the point where power exceeds a threshold. A voiceless stop segment is defined as the segment between the release and the voice onset. In these discrimination experiments, 94.0% of the voiced and 96.3% of the voiceless stops were correctly discriminated. The speech database used for these experiments was the Japanese monosyllables (/b,d,g,p,t,k/ + /a,i,u,e,o/) uttered by 20 speakers. It was confirmed that analysis synchronizing with consonant segments is effective for stop consonant discrimination.
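
The core of the method is frame placement: instead of a fixed analysis hop, take four or five frames equally spaced between the release and the segment end (power threshold crossing or voice onset), so the frames track the consonant's events. A sketch of that frame placement, with the segment endpoints assumed to be given and the frame length chosen arbitrarily:

```python
import numpy as np

def segment_frames(samples, release, seg_end, n_frames=5, frame_len=256):
    """Extract n_frames analysis frames equally spaced between the
    release point and the segment end, so that analysis is
    synchronized with the consonant segment rather than a fixed hop."""
    centers = np.linspace(release, seg_end, n_frames).astype(int)
    half = frame_len // 2
    frames = []
    for c in centers:
        lo, hi = max(0, c - half), min(len(samples), c + half)
        frames.append(samples[lo:hi])
    return frames   # the feature vector would be computed from these frames

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)   # stand-in waveform
release, seg_end = 4000, 6400                      # assumed endpoints (samples)
print([len(f) for f in segment_frames(x, release, seg_end)])
```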


Journal of the Acoustical Society of America | 2006

Development project for screen reader interface dynamically hastens speech while giving emphasized information to the tactile sense

Tohru Ifukube; Shinta Kimura

In present screen readers for blind people, users are still required to listen continuously to synthesized voices, and they can easily miss the important points of a document. A development project for a new screen reader interface has been promoted in cooperation with Japan's Ministry of Economy, Trade and Industry (METI). The interface, named TAJODA, can dynamically control the speech rate while presenting rich text to the tactile sense. The speech rate can be changed word by word by clicking a button of the interface with the thumb. From evaluation tests with sightless people, the maximum speech rate was determined to be around 2000 morae/minute. Seven vibro‐tactile patterns were selected for presenting rich text to an index finger using a tactile matrix display (2×8). Speech and tactile information are synchronized automatically through a USB interface. It was also ascertained that users can read some documents two to three times faster using the TAJODA interface [Asakawa et al., IEICE Trans. E87‐D(6) (2004)].
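
The described interface boils down to a per-word schedule: each word unit carries a duration at the current speech rate and, where the text is rich (headings, emphasis, links), a vibro‐tactile pattern to fire in sync. A sketch of such a schedule; the pattern names, mora counts, and data layout are illustrative, not the seven patterns chosen in the project:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WordUnit:
    text: str
    morae: int                      # used to compute word duration
    tactile: Optional[str] = None   # rich-text pattern, if any

def schedule(words, rate_morae_per_min=2000.0):
    """Pair each word with its start time at the current speech rate
    and the tactile pattern to present simultaneously."""
    sec_per_mora = 60.0 / rate_morae_per_min
    t = 0.0
    events = []
    for w in words:
        events.append((round(t, 3), w.text, w.tactile))
        t += w.morae * sec_per_mora
    return events

doc = [WordUnit("kyou", 2, tactile="heading"), WordUnit("no", 1),
       WordUnit("nyuusu", 3)]
for start, text, pattern in schedule(doc):
    print(f"{start:6.3f}s  speak {text!r}  tactile={pattern}")
```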
