
Publication


Featured research published by Tsuneo Nitta.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Feature extraction for speech recognition based on orthogonal acoustic-feature planes and LDA

Tsuneo Nitta

This paper describes an attempt to extract multiple topological structures hidden in time-spectrum (TS) patterns by using multiple mapping operators, and to incorporate these operators into the feature extractor of a speech recognition system. In previous work, the author proposed a novel feature extraction method based on MAFP/KLT (MAFP: multiple acoustic-feature planes), in which 3×3 derivative filters were used as mapping operators, and showed that the method achieved significant improvement in preliminary experiments. In this paper, the mapping operators are first extracted directly, in the form of a 3×3 orthogonal basis, from a speech database. Next, the operators are evaluated together with simplified 3×3 operators modeled on the orthogonal basis. Finally, after comparing the experimental results, the author proposes an effective feature extraction method based on MAFP/LDA, in which a Sobel filter is used as the mapping operator.
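The mapping stage described above can be illustrated with a small sketch: a 3×3 derivative operator (here a Sobel filter, the operator the paper settles on) is convolved with a time-spectrum pattern to produce one acoustic-feature plane. The toy TS pattern and the unpadded "valid" border handling are assumptions for illustration, not details from the paper.

```python
# Sketch: mapping a time-spectrum (TS) pattern onto acoustic-feature
# planes with 3x3 derivative operators (here, Sobel filters).
# Illustrative only -- the toy TS pattern and border handling are assumptions.

SOBEL_T = [[-1, 0, 1],    # emphasizes change along one axis of the TS pattern
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_F = [[-1, -2, -1],  # emphasizes change along the other axis
           [ 0,  0,  0],
           [ 1,  2,  1]]

def convolve3x3(ts, kernel):
    """Apply a 3x3 operator to a TS pattern; 'valid' output (no padding)."""
    rows, cols = len(ts), len(ts[0])
    out = []
    for i in range(rows - 2):
        row = []
        for j in range(cols - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * ts[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out

# Toy 4x4 TS pattern with a spectral edge along one axis.
ts = [[0, 0, 1, 1],
      [0, 0, 1, 1],
      [0, 0, 1, 1],
      [0, 0, 1, 1]]

plane_t = convolve3x3(ts, SOBEL_T)  # responds strongly to the edge
plane_f = convolve3x3(ts, SOBEL_F)  # pattern is flat along this axis -> zeros
```

Each operator yields one feature plane; stacking the planes gives the MAFP representation that the dimensionality reduction (KLT or LDA) then compresses.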


Journal of the Acoustical Society of America | 1995

Speech detection apparatus not affected by input energy or background noise levels

Hideki Satoh; Tsuneo Nitta

A speech detection apparatus capable of reliably detecting speech segments in audio signals regardless of the levels of the input audio signals and background noise. In the apparatus, a parameter of the input audio signal is calculated frame by frame and compared with a threshold in order to judge each input frame as either a speech segment or a noise segment; the parameters of frames judged to be noise segments are stored in a buffer, and the threshold is updated according to the parameters stored in the buffer. The apparatus may utilize a transformed parameter, derived from the original parameter, in which the difference between speech and noise is emphasized, with noise standard patterns constructed from the parameters of input frames pre-estimated as noise segments.
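The adaptive loop described above can be sketched as follows: a per-frame parameter (frame energy here) is compared with a threshold, frames judged as noise refresh a buffer, and the threshold is re-estimated from that buffer. The margin factor, buffer size, and energy parameter are illustrative assumptions, not values from the patent.

```python
# Sketch of adaptive speech/noise segmentation: the threshold tracks the
# background-noise level estimated from frames previously judged as noise.
# Buffer size, margin, and the energy parameter are illustrative assumptions.
from collections import deque

class SpeechDetector:
    def __init__(self, buffer_size=8, margin=4.0, initial_threshold=1.0):
        self.noise_buffer = deque(maxlen=buffer_size)
        self.margin = margin
        self.threshold = initial_threshold

    def classify(self, frame):
        energy = sum(s * s for s in frame) / len(frame)
        is_speech = energy > self.threshold
        if not is_speech:
            # Store the noise parameter and re-estimate the threshold,
            # so detection adapts to the current background-noise level.
            self.noise_buffer.append(energy)
            noise_level = sum(self.noise_buffer) / len(self.noise_buffer)
            self.threshold = self.margin * noise_level
        return "speech" if is_speech else "noise"

det = SpeechDetector()
quiet = [0.1] * 160   # low-energy background frame
loud = [2.0] * 160    # high-energy voiced frame
labels = [det.classify(f) for f in (quiet, quiet, loud, quiet)]
```

Because only noise-judged frames update the buffer, loud speech never inflates the threshold, which is what makes the detector insensitive to the absolute input level.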


Journal of the Acoustical Society of America | 1997

Text-to-speech synthesis with controllable processing time and speech quality

Yoshiyuki Hara; Tsuneo Nitta

Synthesized speech is generated by a software-implemented system with a programmed central processing unit. Phonetic parameters are generated from a series of phonetic symbols of an input text to be converted into synthesized speech, and prosodic parameters are generated from prosodic information of the input text. The activity ratio of the central processing unit is measured, and the order of the phonetic parameters, or the arrangement of a synthesis unit or filter for speech synthesis, is selected depending on that activity ratio. Synthesized speech sounds are generated and filtered based on the phonetic and prosodic parameters according to the selected order of phonetic parameters or the selected arrangement of the filter.
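The core trade-off above is that synthesis quality (filter order) is scaled down when the CPU is busy so synthesis still completes in time, and scaled up when the CPU is idle. A minimal sketch; the load thresholds and filter orders are illustrative assumptions, not the patent's values.

```python
# Sketch of CPU-load-dependent quality control: map the measured CPU
# activity ratio to a synthesis-filter order. Thresholds and orders are
# illustrative assumptions.

def choose_filter_order(cpu_activity_ratio):
    """Map CPU load (0.0 idle .. 1.0 saturated) to an LPC-style filter order."""
    if cpu_activity_ratio < 0.3:
        return 24   # light load: high-order filter, best speech quality
    elif cpu_activity_ratio < 0.7:
        return 16   # moderate load: medium order
    return 10       # heavy load: low order, fastest synthesis

orders = [choose_filter_order(r) for r in (0.1, 0.5, 0.9)]
```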


Journal of the Acoustical Society of America | 1994

Pattern recognition system and method using neural network

Tsuneo Nitta

An inner product computing unit computes inner products between an input pattern whose category is unknown and orthogonalized dictionary sets of a plurality of reference patterns whose categories are known. A nonlinear converting unit nonlinearly converts the inner products in accordance with a positive-negative symmetric nonlinear function. A neural network unit or a statistical discriminant function computing unit then performs predetermined computations on the nonlinearly converted values, on the basis of coefficients preset for each category. A determining section compares the values calculated for each category with each other to discriminate the category to which the input pattern belongs.
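The pipeline above can be sketched with squaring as the positive-negative symmetric nonlinearity (as in the multiple similarity method) and a simple argmax as the determining section. The toy dictionaries and the equal-weight combination stand in for the preset per-category coefficients and are assumptions.

```python
# Sketch: inner products against each category's orthogonalized dictionary
# axes, a sign-symmetric nonlinearity (squaring), and an argmax decision.
# The dictionaries and equal weighting are illustrative assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similarity(x, axes):
    # Squaring is symmetric in sign, so an axis and its negation
    # contribute identically -- the property the symmetric nonlinearity provides.
    return sum(dot(x, phi) ** 2 for phi in axes)

dictionaries = {
    "A": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # orthogonal axes for category A
    "B": [(0.0, 0.0, 1.0)],                   # axis for category B
}

x = (0.9, 0.1, 0.2)  # input pattern of unknown category
scores = {k: similarity(x, axes) for k, axes in dictionaries.items()}
best = max(scores, key=scores.get)
```

In the patented system the squared projections would instead feed a neural network or statistical discriminant function with trained per-category coefficients; the argmax here replaces that stage for brevity.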


Journal of the Acoustical Society of America | 1998

Speech recognition system and method which permits a speaker's utterance to be recognized using a hidden Markov model with subsequent calculation reduction

Tsuneo Nitta

A sound analyzer analyzes an input speech signal to obtain feature vectors. A matrix quantizer performs a matrix quantization process between the feature vectors obtained by the sound analyzer and a phonetic segment dictionary prepared in phonetic segment units, to obtain a phonetic segment similarity sequence. A PS-phoneme integrating section integrates the phonetic segment similarity sequence into a phonemic feature vector. An HMM recognizer checks the phonemic feature vector using an HMM prepared in certain units, thereby performing the recognition process.
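The front half of this pipeline can be sketched as a similarity sequence between input frames and phonetic-segment (PS) templates, collapsed into a phonemic feature vector. The negative-squared-distance similarity and the max-pooling integration are illustrative assumptions, not the patent's actual measures.

```python
# Sketch: PS similarity sequence + integration into a phonemic feature
# vector. Distance measure and max-pooling integration are assumptions.

def neg_sqdist(a, b):
    """Similarity as negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

ps_dictionary = {"a1": (1.0, 0.0), "i1": (0.0, 1.0)}  # toy PS templates

frames = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]  # analyzed feature vectors

# Similarity sequence: one score per phonetic segment, per frame.
seq = [{ps: neg_sqdist(f, tpl) for ps, tpl in ps_dictionary.items()}
       for f in frames]

# Integration: collapse the sequence into a single phonemic feature vector
# (here, the peak similarity each segment reached anywhere in the utterance).
phonemic_vector = {ps: max(s[ps] for s in seq) for ps in ps_dictionary}
```

The resulting compact vector, rather than the raw frame sequence, is what the HMM stage scores, which is the source of the calculation reduction the title mentions.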


Journal of the Acoustical Society of America | 1997

Speech recognition using continuous density hidden Markov models and the orthogonalizing Karhunen-Loève transformation

Tsuneo Nitta

A recognition system comprises a feature extractor for extracting a feature vector x from an input speech signal, and a recognizing section that defines continuous density hidden Markov models of predetermined categories k as transition network models, each having transition probabilities p(k,i,j) that a state Si transits to a next state Sj, and output probabilities g(k,s) that a feature vector x is output in a transition from the state Si to one of the states Si and Sj. The recognizing section recognizes the input signal on the basis of similarity between a sequence X of feature vectors extracted by the feature extractor and the continuous density HMMs. In particular, the recognizing section includes a memory section for storing a set of orthogonal vectors φm(k,s) provided for the continuous density HMMs, and a modified CDHMM processor for obtaining each of the output probabilities g(k,s) in accordance with the corresponding orthogonal vectors φm(k,s).
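Recognition over such a transition network is a Viterbi-style dynamic program combining transition probabilities p(i,j) and per-state output scores g(s,x). A minimal sketch in the log domain; the two-state model and the Gaussian-shaped stand-in for the orthogonal-vector output probability are assumptions.

```python
# Sketch: Viterbi decoding over a 2-state left-to-right transition network.
# Transition probs and the output-score function are toy assumptions;
# g() stands in for the orthogonal-vector-based output probability.
import math

p = {(0, 0): 0.6, (0, 1): 0.4, (1, 1): 1.0}  # allowed transitions only

def g(state, x):
    # Each state prefers a different region of a 1-D feature space.
    mean = [0.0, 1.0][state]
    return math.exp(-(x - mean) ** 2)

def viterbi(X):
    # Log-domain scores; decoding must start in state 0 and end in state 1.
    score = {0: math.log(g(0, X[0])), 1: float("-inf")}
    for x in X[1:]:
        new = {}
        for j in (0, 1):
            best = max(score[i] + math.log(p[(i, j)])
                       for i in (0, 1) if (i, j) in p)
            new[j] = best + math.log(g(j, x))
        score = new
    return score[1]

loglik = viterbi([0.1, 0.2, 0.9, 1.0])  # low values first, then high: fits the model
```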


Journal of the Acoustical Society of America | 1992

Orthogonalized dictionary speech recognition apparatus and method thereof

Tsuneo Nitta

Speech pattern data representing the speech of a plurality of speakers are stored in a pattern storage section in advance. Averaged pattern data are obtained by averaging a plurality of speech pattern data of the first of the speakers. Data obtained by blurring and differentiating the averaged pattern data are stored in an orthogonalized dictionary as basic orthogonalized dictionary data of the first and second axes, respectively. Blurred data and differentiated data obtained for the second and subsequent speakers are selectively stored in the orthogonalized dictionary as additional dictionary data having new axes. The speech of the speakers is then recognized by computing a similarity between the orthogonalized dictionary formed in this manner and the input speech.
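The dictionary construction above can be sketched in one dimension: average the first speaker's patterns, then derive a blurred version and a differentiated version of that average as the first two dictionary axes. The 1-D patterns and the particular blur/difference operators are illustrative assumptions.

```python
# Sketch: build the first two axes of an orthogonalized dictionary from
# averaged, blurred, and differentiated speech patterns. The 1-D toy
# patterns and operator choices are assumptions.

def average(patterns):
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]

def blur(p):
    """Three-point moving average (edge values copied through)."""
    out = [p[0]]
    for i in range(1, len(p) - 1):
        out.append((p[i - 1] + p[i] + p[i + 1]) / 3.0)
    out.append(p[-1])
    return out

def differentiate(p):
    """Central difference (zero at the edges)."""
    return ([0.0]
            + [(p[i + 1] - p[i - 1]) / 2.0 for i in range(1, len(p) - 1)]
            + [0.0])

speaker1 = [[0.0, 1.0, 2.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
avg = average(speaker1)
axis1 = blur(avg)           # first dictionary axis (blurred average)
axis2 = differentiate(avg)  # second dictionary axis (differentiated average)
```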


International Conference on Acoustics, Speech, and Signal Processing | 1998

A novel feature-extraction for speech recognition based on multiple acoustic-feature planes

Tsuneo Nitta

This paper describes an attempt to incorporate functions of the auditory nerve system into the feature extractor of a speech recognizer. The functions include four types of well-known responses to sound stimuli: the local peaks of the steady sound spectrum, ascending FM sound, descending FM sound, and sharply rising and falling sound. Each function is realized in the form of a three-level derivative operator and is applied to a time-spectrum (TS) pattern X(t,f) taken from the output of a 26-channel band-pass filter (BPF) bank. The resultant acoustic cue of an input speech segment, represented by multiple acoustic-feature planes (MAFP), is compressed by using the Karhunen-Loève transform (KLT) and then classified. In experiments performed on a Japanese E-set (12 consonantal parts of /Ci/) extracted from continuous speech, the MAFP significantly improved the error rate for unknown speakers from 34.5% and 29.6%, obtained with X(t,f) and X(t,f)+Δ_t X(t,f) respectively, to 17.0% (dimension = 64).
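The KLT compression step above amounts to projecting the stacked MAFP features onto the leading eigenvectors of their covariance. A minimal sketch using power iteration for the top component; the toy data, start vector, and iteration count are assumptions.

```python
# Sketch: Karhunen-Loeve transform (KLT) compression as projection onto
# the leading covariance eigenvector, found by power iteration.
# Toy data and iteration count are illustrative assumptions.

def top_klt_axis(samples, iters=200):
    """Leading covariance eigenvector via power iteration."""
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    v = [1.0] * dim
    for _ in range(iters):
        # w = C v, with C the (unnormalized) covariance matrix.
        w = [sum(c[i] * sum(ci * vi for ci, vi in zip(c, v)) for c in centered)
             for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy "MAFP" vectors whose variance lies almost entirely along one direction.
data = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
axis = top_klt_axis(data)  # converges to the dominant direction (1, 0.1)/|.|
```

Projecting each MAFP vector onto the first few such axes yields the reduced (e.g. 64-dimensional) features used for classification.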


Journal of the Acoustical Society of America | 1981

Method and apparatus for measuring characteristics of a loudspeaker

Tsuneo Nitta; Masatoshi Tanaka

A loudspeaker and a microphone are arranged in a normal room, the loudspeaker being supplied with an impulse signal. A direct response sound from the loudspeaker and reflected sounds from wall surfaces in three directions of the normal room are converted into a response signal by the microphone. The response signal is A/D-converted, and then Fourier-transformed. The Fourier-transformed response signal is converted into a response signal with an absolute value, and then into a logarithmic response signal. The logarithmic response signal is filtered to eliminate signal components corresponding to the reflected sound. The filtered logarithmic response signal is A/D-converted, and supplied to a recorder.
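The processing chain above (Fourier transform, absolute value, logarithm, then filtering to suppress the reflection components) can be sketched on a synthetic response. The DFT size, the synthetic direct-plus-reflection response, and the moving-average stand-in for the reflection-removing filter are illustrative assumptions.

```python
# Sketch of the measurement chain: DFT -> magnitude -> log -> smoothing
# that suppresses the spectral ripple contributed by a wall reflection.
# The synthetic response and smoother are illustrative assumptions.
import cmath, math

def dft(signal):
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def log_magnitude(spectrum, floor=1e-12):
    return [math.log(max(abs(c), floor)) for c in spectrum]

def smooth(x, radius=2):
    """Moving average; stands in for the reflection-removing filter."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

# Synthetic microphone response: direct sound plus one delayed reflection,
# which shows up as periodic ripple in the log-magnitude spectrum.
n = 32
response = [0.0] * n
response[0] = 1.0   # direct sound from the loudspeaker
response[8] = 0.3   # reflection from a wall surface

raw = log_magnitude(dft(response))
curve = smooth(raw)  # smoothed curve approximates the loudspeaker alone
```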


IEEE Transactions on Signal Processing | 1992

A speaker-independent connected digit recognition system concatenating statistically discriminated words

Teruhiko Ukita; Etsuo Saito; Tsuneo Nitta; Sadakazu Watanabe

A recognition system for connected digits, which uses a statistical classifier to identify words in speaker-independent continuous speech, is described. The system uses the multiple similarity method, a statistical pattern recognition technique. For evaluating word strings, the system uses a scoring method that is independent of the number of words in a string; it is derived from the a posteriori probability that a subinterval corresponds to a correct word position, giving a word similarity value. The system evaluates a word string using dynamic programming and a parallel search procedure. Experiments were performed on the contextual effect of the training data set, on validation of the search algorithm, and with a large set of unspecified speakers comprising 40 males and 40 females. For connected digits with unknown string lengths, the string recognition rates were 90.1%-95.1% for two, three, or four connected digits, while the equivalent word (digit) rates were 97.4%-98.4%.
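The length-independent scoring idea above can be sketched as a dynamic program over word-level similarity values, with each candidate string scored by its average per-word similarity so strings of different lengths compete fairly. The similarity table, segmentation grid, and averaging rule are toy assumptions, not the paper's data or exact score.

```python
# Sketch: decode a connected-digit string by DP over subinterval word
# similarities, normalizing by word count. Toy similarity grid assumed.

# sim[(start, end)] -> {digit: similarity of that subinterval to the digit}
sim = {
    (0, 3): {"1": 0.9, "7": 0.4},
    (3, 6): {"0": 0.8, "8": 0.5},
    (0, 6): {"1": 0.3},  # the whole utterance treated as a single word
}
T = 6  # total number of frames

def best_string(t=0):
    """Best decoding of frames t..T as a list of (digit, similarity)."""
    if t == T:
        return []
    candidates = []
    for (s, e), scores in sim.items():
        if s != t:
            continue
        for digit, score in scores.items():
            candidates.append([(digit, score)] + best_string(e))
    # Average similarity per word: independent of the number of words,
    # so two-word and one-word hypotheses are directly comparable.
    return max(candidates, key=lambda ws: sum(sc for _, sc in ws) / len(ws))

decoded = "".join(d for d, _ in best_string())
```

A production decoder would memoize the recursion and search segment boundaries rather than enumerate a fixed grid; the sketch keeps only the length-normalized scoring that the paper highlights.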
