Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Takatoshi Jitsuhiro is active.

Publication


Featured research published by Takatoshi Jitsuhiro.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

The ATR Multilingual Speech-to-Speech Translation System

Satoshi Nakamura; Konstantin Markov; Hiromi Nakaiwa; Genichiro Kikui; Hisashi Kawai; Takatoshi Jitsuhiro; Jin-Song Zhang; Hirofumi Yamamoto; Eiichiro Sumita; Seiichi Yamamoto

In this paper, we describe the ATR multilingual speech-to-speech translation (S2ST) system, which is mainly focused on translation between English and Asian languages (Japanese and Chinese). There are three main modules of our S2ST system: large-vocabulary continuous speech recognition, machine text-to-text (T2T) translation, and text-to-speech synthesis. All of them are multilingual and are designed using state-of-the-art technologies developed at ATR. A corpus-based statistical machine learning framework forms the basis of our system design. We use a parallel multilingual database consisting of over 600 000 sentences that cover a broad range of travel-related conversations. Recent evaluation of the overall system showed that speech-to-speech translation quality is high, being at the level of a person having a Test of English for International Communication (TOEIC) score of 750 out of the perfect score of 990.
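The three-module cascade described above (recognition, translation, synthesis) can be sketched as a simple pipeline. The component functions below are hypothetical stand-ins, not the ATR implementations:

```python
# Minimal sketch of a cascaded speech-to-speech translation pipeline,
# mirroring the three modules named in the abstract. All component
# functions are illustrative placeholders.

def recognize(audio):
    # Large-vocabulary continuous speech recognition: audio -> source text.
    return "stand-in transcription"

def translate(text, src="ja", tgt="en"):
    # Text-to-text translation: source-language text -> target-language text.
    return f"[{src}->{tgt}] {text}"

def synthesize(text):
    # Text-to-speech synthesis: target text -> waveform (placeholder tuple).
    return ("waveform", text)

def speech_to_speech(audio, src="ja", tgt="en"):
    # Chain the three modules, exactly as in a cascaded S2ST system.
    return synthesize(translate(recognize(audio), src=src, tgt=tgt))

out = speech_to_speech(b"\x00\x01", src="ja", tgt="en")
```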


international conference on acoustics, speech, and signal processing | 1998

Rejection of out-of-vocabulary words using phoneme confidence likelihood

Takatoshi Jitsuhiro; Satoshi Takahashi; Kiyoaki Aikawa

The rejection of unknown words is important in improving the performance of speech recognition. The anti-keyword model method can reject unknown words with high accuracy in a small vocabulary and specified task. Unfortunately, it is either inconvenient or impossible to apply if words in the vocabulary change frequently. We propose a new method for task independent rejection of unknown words, where a new phoneme confidence measure is used to verify partial utterances. It is used to verify each phoneme while locating candidates. Furthermore, the whole utterance is verified by a phonetic typewriter. This method can improve the accuracy of verification in each phoneme, and improve the speed of candidate search. Tests show that the proposed method improves the recognition rate by 4% compared to the conventional algorithm at equal error rates. Furthermore, a 3% improvement is obtained by training acoustic models with the MCE algorithm.
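The idea of normalizing a phoneme hypothesis score against an unconstrained phone loop (the "phonetic typewriter") can be sketched as a log-likelihood ratio test. The threshold and log-likelihood values below are illustrative assumptions, not figures from the paper:

```python
# Hedged sketch of a phoneme-level confidence measure for rejecting
# out-of-vocabulary words: the hypothesized phoneme's score is compared
# against the best score from a free phone loop.

def phoneme_confidence(loglik_hypothesis, logliks_phone_loop):
    # Log-likelihood ratio against the best unconstrained alternative.
    return loglik_hypothesis - max(logliks_phone_loop)

def reject_as_oov(phoneme_confidences, threshold=-2.0):
    # Reject the word if the average per-phoneme confidence is too low.
    return sum(phoneme_confidences) / len(phoneme_confidences) < threshold

confs = [phoneme_confidence(-10.0, [-9.5, -11.0]),   # -0.5: plausible match
         phoneme_confidence(-15.0, [-9.0, -10.0])]   # -6.0: poor match
```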


ieee automatic speech recognition and understanding workshop | 2005

Hands-free speech recognition and communication on PDAs using microphone array technology

W. Herbordt; T. Horiuchi; M. Fujimoto; Takatoshi Jitsuhiro; Satoshi Nakamura

In this paper, a personal digital assistant (PDA) for hands-free speech recognition and communication, with a microphone array mounted on the PDA, is presented. An outlier-robust generalized sidelobe canceller (RGSC) and a minimum mean-squared error (MMSE) estimator for log Mel-spectral energy coefficients, using a Gaussian mixture model (GMM) for clean speech, are implemented in real time and evaluated for speech recognition on a small experimental multichannel database. It is shown that the joint system of beamformer and single-channel noise suppression greatly improves the noise robustness of a large-vocabulary speech recognizer, so that more than 91% word accuracy is obtained down to an SNR of 5 dB.
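GMM-based MMSE feature enhancement of the kind mentioned above typically forms a posterior-weighted correction of the noisy observation. The sketch below illustrates that structure in one dimension; the GMM parameters and per-component noise biases are invented for illustration and are not from the paper:

```python
import math

# Illustrative MMSE estimate of a clean log-Mel coefficient under a
# Gaussian mixture model: a posterior-weighted sum of per-component
# corrections to the noisy observation y.

def gaussian_loglik(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def mmse_estimate(y, weights, means, variances, biases):
    # Posterior responsibilities p(k | y) under the noisy-speech GMM.
    logs = [math.log(w) + gaussian_loglik(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    mx = max(logs)
    post = [math.exp(l - mx) for l in logs]
    z = sum(post)
    post = [p / z for p in post]
    # MMSE estimate: subtract the expected noise bias of each component,
    # weighted by its posterior probability.
    return sum(p * (y - b) for p, b in zip(post, biases))

clean = mmse_estimate(5.0, weights=[0.5, 0.5], means=[4.0, 10.0],
                      variances=[1.0, 1.0], biases=[1.5, 0.2])
```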


IEICE Transactions on Information and Systems | 2006

ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

Shigeki Matsuda; Takatoshi Jitsuhiro; Konstantin Markov; Satoshi Nakamura

It is difficult to recognize speech distorted by various factors, especially when an ASR system contains only a single acoustic model. One solution is to use multiple acoustic models, one for each condition. In this paper, we discuss a parallel decoding-based ASR system that is robust to noise type, SNR, speaker gender, and speaking style. Our system consists of two recognition channels based on MFCC and differential MFCC (DMFCC) features. Each channel has several acoustic models depending on SNR, speaker gender, and speaking style, and each acoustic model is adapted by fast noise adaptation. From each channel, one hypothesis is selected based on its likelihood. The final recognition result is obtained by combining the hypotheses from the two channels. We evaluate the performance of our system on normal and hyper-articulated test speech data contaminated by various types of noise at different SNR levels. Experiments demonstrate that the system achieves recognition accuracy in excess of 80% for the normal speaking style at an SNR of 0 dB. For hyper-articulated speech, recognition accuracy improved from about 10% to over 45% compared to a system without acoustic models for hyper-articulated speech.
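The per-channel selection and cross-channel combination described above can be sketched as a likelihood-based argmax. The hypotheses and scores are invented, and the paper's actual combination scheme may be more elaborate:

```python
# Sketch of likelihood-based hypothesis selection across two parallel
# decoding channels (e.g., MFCC and DMFCC feature streams).

def select_best(hypotheses):
    # Pick the (text, log-likelihood) pair with the highest likelihood.
    return max(hypotheses, key=lambda h: h[1])

def combine_channels(channel_a, channel_b):
    # One hypothesis survives per channel; the final result is the
    # higher-scoring of the two channel winners.
    return select_best([select_best(channel_a), select_best(channel_b)])

mfcc_channel = [("turn left", -120.5), ("turn loft", -131.2)]
dmfcc_channel = [("turn left", -118.9), ("burn left", -140.0)]
result = combine_channels(mfcc_channel, dmfcc_channel)
```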


international conference on acoustics, speech, and signal processing | 2012

Physical characteristics of vocal folds during speech under stress

Xiao Yao; Takatoshi Jitsuhiro; Chiyomi Miyajima; Norihide Kitaoka; Kazuya Takeda

We focus on variations in the glottal source of speech production, which is essential for understanding the generation of speech under psychological stress. In this paper, a two-mass vocal fold model is fitted to estimate the stiffness parameters of vocal folds during speech, and the stiffness parameters are then analyzed in order to classify recorded samples into neutral and stressed speech. Mechanisms of vocal folds under stress are derived from the experimental results. We propose using a Muscle Tension Ratio (MTR) to identify speech under stress. Our results show that MTR is more effective than a conventional method of stress measurement.
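Classification via a Muscle Tension Ratio reduces to a scalar decision on a ratio of vocal fold stiffness parameters. The ratio definition and threshold below are illustrative assumptions, not the formula from the paper:

```python
# Hedged sketch of threshold-based stress classification from a Muscle
# Tension Ratio (MTR). The stiffness parameters, the ratio used, and the
# threshold are all invented for illustration.

def muscle_tension_ratio(stiffness_coupling, stiffness_body):
    # Ratio of two stiffness parameters estimated from the two-mass model.
    return stiffness_coupling / stiffness_body

def classify_stress(mtr, threshold=1.2):
    # Higher tension ratio -> classify the utterance as stressed.
    return "stressed" if mtr > threshold else "neutral"

label = classify_stress(muscle_tension_ratio(3.0, 2.0))  # ratio 1.5
```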


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Classification of speech under stress based on modeling of the vocal folds and vocal tract

Xiao Yao; Takatoshi Jitsuhiro; Chiyomi Miyajima; Norihide Kitaoka; Kazuya Takeda

In this study, we focus on the classification of neutral and stressed speech based on a physical model. In order to represent the characteristics of the vocal folds and vocal tract during the process of speech production and to explore the physical parameters involved, we propose a method using the two-mass model. As feature parameters, we focus on stiffness parameters of the vocal folds, vocal tract length, and cross-sectional areas of the vocal tract. The stiffness parameters and the area of the entrance to the vocal tract are extracted from the two-mass model after we fit the model to real data using our proposed algorithm. These parameters are related to the velocity of glottal airflow and acoustic interaction between the vocal folds and the vocal tract and can precisely represent features of speech under stress because they are affected by the speaker’s psychological state during speech production. In our experiments, the physical features generated using the proposed approach are compared with traditionally used features, and the results demonstrate a clear improvement of up to 10% to 15% in average stress classification performance, which shows that our proposed method is more effective than conventional methods.


international conference on acoustics, speech, and signal processing | 2004

Automatic generation of non-uniform HMM structures based on variational Bayesian approach

Takatoshi Jitsuhiro; Satoshi Nakamura

We propose using the variational Bayesian (VB) approach for automatically creating nonuniform, context-dependent HMM topologies in speech recognition. The maximum likelihood (ML) criterion is generally used to create HMM topologies. However, it has an over-fitting problem. Information criteria have been used to overcome this problem, but theoretically they cannot be applied to complicated models like HMM. Recently, to avoid these problems, the VB approach has been developed in the machine-learning field. We introduce the VB approach to the successive state splitting (SSS) algorithm, which can create both contextual and temporal variations for HMM. We define the prior and posterior probability densities and free energy with latent variables as split and stop criteria. Experimental results show that the proposed method can automatically create a more efficient model and obtain better performance, especially for vowels, than the original method.
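The free-energy-based split/stop criterion described above can be sketched as a greedy loop that accepts a candidate split only when it raises the variational free energy. The free-energy values below are stubs; a real system computes them from the posterior over HMM parameters and latent alignments:

```python
# Minimal sketch of the split/stop decision in a successive-state-splitting
# loop driven by a variational free energy (a lower bound on the model
# evidence). Candidate names and free-energy values are illustrative.

def should_split(free_energy_before, free_energy_after):
    # Accept a candidate split only if it increases the free energy.
    return free_energy_after > free_energy_before

def grow_model(candidates, base_free_energy):
    # Greedily keep splits that improve the free energy; skip the rest.
    accepted = []
    current = base_free_energy
    for name, fe in candidates:
        if should_split(current, fe):
            accepted.append(name)
            current = fe
    return accepted, current

splits, fe = grow_model([("split-A", -980.0), ("split-B", -990.0),
                         ("split-C", -975.0)], base_free_energy=-1000.0)
```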


ieee automatic speech recognition and understanding workshop | 2007

Robust speech recognition using noise suppression based on multiple composite models and multi-pass search

Takatoshi Jitsuhiro; Tomoji Toriyama; Kiyoshi Kogure

This paper presents robust speech recognition using a noise suppression method based on multiple model compositions and multi-pass search. In real environments, many kinds of noise signals exist, and input speech to speech recognition systems includes them. Our task in the E-Nightingale project is speech recognition of voice memoranda spoken by nurses during actual work at hospitals. To obtain good recognition candidates, it is important to suppress many kinds of noise signals at once and find the target speech. First, before noise suppression, multi-pass search is used to find speech and noise label sequences, with acoustic models that include many kinds of noise models and their compositions, their n-gram models, and their lexicon. Second, model-based noise suppression is performed using the multiple composite models selected by the recognized label sequences with time alignments. We evaluated this approach on the E-Nightingale task, and the proposed method outperformed the conventional method.


international conference on acoustics, speech, and signal processing | 2013

Estimation of vocal tract parameters for the classification of speech under stress

Xiao Yao; Takatoshi Jitsuhiro; Chiyomi Miyajima; Norihide Kitaoka; Kazuya Takeda

In this work, we propose a method for the classification of speech under stress that is based on a physical model. Using this method, the characteristics of the vocal folds and the vocal tract are taken into consideration, based on the process of speech production. In addition to vocal fold parameters, we estimate parameters of the vocal tract representing cross-sectional areas and vocal tract length, by fitting a two-mass model to real speech. Results show that calculation of vocal tract length for each speaker can improve the accuracy of the estimation of other physical parameters. Analysis is performed under vowel-dependent and vowel-independent conditions, showing that the proposed physical features are effective for the classification of neutral and stressed speech.


ieee automatic speech recognition and understanding workshop | 2003

Variational Bayesian approach for automatic generation of HMM topologies

Takatoshi Jitsuhiro; Satoshi Nakamura

We propose a new method of automatically creating non-uniform, context-dependent HMM topologies by using the variational Bayesian (VB) approach. The maximum likelihood (ML) criterion is generally used to create HMM topologies. However, it has an overfitting problem. Information criteria have been used to overcome this problem, but, theoretically, they cannot be applied to complicated models like HMMs. Recently, to avoid these problems, a VB approach has been developed in the machine-learning field. The successive state splitting (SSS) algorithm is a method of creating contextual and temporal variations for HMMs. We introduce the VB approach to the SSS algorithm, and define the prior and posterior probability densities and free energy as split and stop criteria. Experimental results show that the proposed method can automatically create the proper model and obtain better performance, especially for vowels, than the original method.

Collaboration


Dive into Takatoshi Jitsuhiro's collaborations.

Top Co-Authors


Satoshi Nakamura

Nara Institute of Science and Technology


Shigeki Matsuda

National Institute of Information and Communications Technology


Hirofumi Yamamoto

National Institute of Information and Communications Technology


Kiyoshi Kogure

Kanazawa Institute of Technology


Tomoji Toriyama

Toyama Prefectural University
