
Publication


Featured research published by Fumitada Itakura.


IEICE Transactions on Information and Systems | 2006

Driver Identification Using Driving Behavior Signals

Toshihiro Wakita; Koji Ozawa; Chiyomi Miyajima; Kei Igarashi; Katunobu Itou; Kazuya Takeda; Fumitada Itakura

In this paper, we propose a driver identification method based on the driving behavior signals observed while the driver is following another vehicle. Driving behavior signals, such as the use of the accelerator pedal, brake pedal, vehicle velocity, and distance from the vehicle in front, are measured using a driving simulator. We compared the identification rates obtained using different identification models and different features. We found that nonparametric models performed better than parametric models, and that the driver's operation signals were more discriminative than road environment signals and car behavior signals. The identification rate for thirty drivers in actual vehicle driving in a city area was 73%.
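
As a rough illustration of the nonparametric approach the paper favors, the sketch below identifies a driver by a k-nearest-neighbor vote over simple segment-mean features of the four behavior signals. The feature set and the value of k are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def extract_features(accel, brake, velocity, distance, n_seg=8):
    """Summarize one car-following episode as segment means of each
    behavior signal (an illustrative feature set, not the paper's)."""
    feats = []
    for sig in (accel, brake, velocity, distance):
        feats.extend(seg.mean() for seg in np.array_split(np.asarray(sig), n_seg))
    return np.array(feats)

def knn_identify(query, train_feats, train_labels, k=5):
    """Nonparametric identification: majority vote among the k nearest
    training episodes in feature space."""
    train_labels = np.asarray(train_labels)
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    drivers, counts = np.unique(nearest, return_counts=True)
    return drivers[np.argmax(counts)]
```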


IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2005

Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition

Weifeng Li; Chiyomi Miyajima; Takanori Nishino; Katsunobu Itou; Kazuya Takeda; Fumitada Itakura

In this paper, we address issues in improving hands-free speech recognition performance in different car environments using multiple spatially distributed microphones. In previous work, we proposed multiple linear regression of the log spectra (MRLS) for estimating the log spectra of speech at a close-talking microphone. In this paper, the concept is extended to nonlinear regressions, and regressions in the cepstrum domain are also investigated. An effective algorithm is developed to adapt the regression weights automatically to different noise environments. Compared to the nearest distant microphone and an adaptive beamformer (generalized sidelobe canceller), the proposed adaptive nonlinear regression approach achieves average relative word error rate (WER) reductions of 58.5% and 10.3%, respectively, for isolated word recognition in 15 real car environments.
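
The linear baseline (MRLS) can be sketched as an independent least-squares regression per frequency bin, mapping the distant microphones' log spectra to the close-talking target. This is a minimal sketch of the regression idea only; the paper's nonlinear extension and automatic weight adaptation are not reproduced here.

```python
import numpy as np

def fit_mrls_weights(distant_logspec, close_logspec):
    """Per-bin multiple linear regression of log spectra (MRLS sketch).

    distant_logspec : (n_mics, n_frames, n_bins) distant-mic log spectra
    close_logspec   : (n_frames, n_bins) close-talking target log spectra
    Returns weights of shape (n_mics + 1, n_bins), including a bias term.
    """
    n_mics, n_frames, n_bins = distant_logspec.shape
    weights = np.zeros((n_mics + 1, n_bins))
    for b in range(n_bins):
        X = np.column_stack([distant_logspec[m, :, b] for m in range(n_mics)])
        X = np.column_stack([X, np.ones(n_frames)])  # bias column
        w, *_ = np.linalg.lstsq(X, close_logspec[:, b], rcond=None)
        weights[:, b] = w
    return weights

def apply_mrls(distant_logspec, weights):
    """Estimate the close-talking log spectrum from distant-mic log spectra."""
    n_mics, n_frames, n_bins = distant_logspec.shape
    est = np.empty((n_frames, n_bins))
    for b in range(n_bins):
        X = np.column_stack([distant_logspec[m, :, b] for m in range(n_mics)])
        X = np.column_stack([X, np.ones(n_frames)])
        est[:, b] = X @ weights[:, b]
    return est
```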


International Conference on Acoustics, Speech, and Signal Processing | 2010

High-quality voice manipulation method based on the vocal tract area function obtained from sub-band LSP of STRAIGHT spectrum

Ayanori Arakawa; Yoshinori Uchimura; Hideki Banno; Fumitada Itakura; Hideki Kawahara

This paper describes a high-quality voice manipulation method based on the vocal tract area function (VTAF) obtained from the sub-band LSP of the STRAIGHT spectrum. Our research group previously developed a VTAF-based voice quality manipulation technique that can generate natural formant transitions; however, the generated sound is sometimes degraded when the input signal has a high sampling frequency. We therefore develop a new method that extracts the VTAF properly from such input signals. The method first divides the input spectral envelope, represented by the STRAIGHT spectrum, into lower and higher frequency bands; second, it extracts the line spectral pair (LSP) coefficients in each band after spectral flattening appropriate to that band; third, it concatenates the pair of sub-band LSPs; and finally, it obtains the VTAF from the PARCOR coefficients converted from the concatenated LSP. A subjective experiment confirmed that the proposed method achieves sufficiently high quality.
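
The final step, recovering an area function from PARCOR coefficients, follows the standard Kelly-Lochbaum relation between reflection coefficients and adjacent tube-section areas. The sketch below assumes a plain time-domain Levinson-Durbin analysis rather than the paper's STRAIGHT-based sub-band LSP procedure, and sign conventions for the reflection coefficients vary across texts.

```python
import numpy as np

def parcor_from_signal(x, order=12):
    """Levinson-Durbin recursion: autocorrelation -> PARCOR (reflection)
    coefficients of an all-pole fit to the signal x."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / e
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + ki * a_prev[i - j]
        a[i] = ki
        k[i - 1] = ki
        e *= 1.0 - ki * ki
    return k

def area_function(parcor, lip_area=1.0):
    """Kelly-Lochbaum relation: successive section areas from reflection
    coefficients, working from the lips toward the glottis."""
    areas = [lip_area]
    for ki in parcor:
        areas.append(areas[-1] * (1.0 + ki) / (1.0 - ki))
    return np.array(areas)
```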


EURASIP Journal on Advances in Signal Processing | 2007

Robust in-car speech recognition based on nonlinear multiple regressions

Weifeng Li; Kazuya Takeda; Fumitada Itakura

We address issues in improving hands-free speech recognition performance in different car environments using a single distant microphone. In this paper, we propose a nonlinear multiple-regression-based enhancement method for in-car speech recognition. To build a data-driven in-car recognition system, we develop an effective algorithm for adapting the regression parameters to different driving conditions. We also devise a model compensation scheme that synthesizes training data using the optimal regression parameters and selects the optimal HMM for the test speech. In isolated word recognition experiments conducted in 15 real car environments, the proposed adaptive regression approach achieves average relative word error rate (WER) reductions of 52.5% and 14.8% compared to the original noisy speech and the ETSI advanced front end, respectively.
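
A nonlinear regression from noisy to clean log-spectral frames can be sketched with a one-hidden-layer network trained by plain gradient descent, as below. The architecture, nonlinearity, and hyperparameters are illustrative assumptions; the paper's particular regression form and adaptation scheme are not reproduced here.

```python
import numpy as np

def train_mlp_regression(X, Y, hidden=32, lr=1e-3, epochs=200, seed=0):
    """Minimal nonlinear regressor: one tanh hidden layer, trained with
    full-batch gradient descent on mean squared error.

    X : (n, d_in) noisy log-spectral frames
    Y : (n, d_out) target clean log-spectral frames
    Returns a function mapping new frames to enhanced frames."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # forward pass
        err = (H @ W2 + b2) - Y             # prediction error
        gW2 = H.T @ err / n; gb2 = err.mean(0)
        dH = (err @ W2.T) * (1 - H ** 2)    # backprop through tanh
        gW1 = X.T @ dH / n; gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1      # gradient step
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: np.tanh(Xn @ W1 + b1) @ W2 + b2
```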


Nonlinear Speech Processing | 2005

Maximum a posteriori probability and cumulative distribution function equalization methods for speech spectral estimation with application in noise suppression filtering

Tran Huy Dat; Kazuya Takeda; Fumitada Itakura

In this work, we develop and compare noise suppression filtering systems based on maximum a posteriori probability (MAP) and cumulative distribution function equalization (CDFE) estimation of the speech spectrum. In these systems, we use double-gamma modeling for both the speech and noise spectral components, in which the distributions are adapted to the actual parameters in each frequency bin. The proposed systems are tested on the Aurora database and shown to outperform conventional systems derived from the MMSE method. Whereas the MAP-based method performs best in SNR improvement, the CDFE-based system produces less musical noise and achieves a higher recognition rate.
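
The CDFE idea reduces, per frequency bin, to mapping each noisy observation through the empirical noisy CDF and then through the inverse empirical clean CDF. A minimal sketch, assuming paired clean and noisy training data for the two empirical distributions:

```python
import numpy as np

def cdf_equalize(noisy, clean_ref, noisy_ref):
    """CDF equalization for one frequency bin: push each noisy value
    through the empirical noisy CDF, then through the inverse empirical
    clean CDF (the empirical quantile function)."""
    noisy_sorted = np.sort(np.asarray(noisy_ref))
    clean_sorted = np.sort(np.asarray(clean_ref))
    # empirical CDF position of each observation among the noisy data
    u = np.searchsorted(noisy_sorted, noisy) / len(noisy_sorted)
    # inverse clean CDF evaluated at those positions
    return np.quantile(clean_sorted, np.clip(u, 0.0, 1.0))
```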


Journal of New Music Research | 2004

Sho-So-In: Control of a Physical Model of the Sho by Means of Automatic Feature Extraction from Real Sounds

Takafumi Hikichi; Naotoshi Osaka; Fumitada Itakura

This paper proposes a synthesis framework for sound hybridization that creates sho-like sounds with articulations the same as those of a given input signal. The approach has three components: acoustic feature extraction, physical parameter estimation, and waveform synthesis. During acoustic feature extraction, the amplitude and fundamental frequency of the input signal are extracted; in the parameter estimation stage, these values are converted to control parameters for the physical model. Then, using these control parameters, a sound waveform is calculated during the synthesis stage. Based on the proposed method, a mapping function between acoustical parameters and physical parameters was determined using recorded sho sounds, and sounds with various articulations were synthesized from several kinds of instrumental tones. As a result, sounds with natural frequency and amplitude variations, such as vibrato and portamento, were created. The proposed method was used in music composition and proved to be effective.
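
The acoustic feature extraction stage can be sketched as frame-wise RMS amplitude plus autocorrelation-based fundamental frequency estimation. The frame size, hop, and f0 search range below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def frame_features(x, sr, frame=1024, hop=256, f0_lo=100.0, f0_hi=1000.0):
    """Per-frame RMS amplitude and fundamental frequency (autocorrelation
    peak picking), usable as control inputs for a physical model."""
    x = np.asarray(x, dtype=float)
    amps, f0s = [], []
    lag_lo, lag_hi = int(sr / f0_hi), int(sr / f0_lo)
    win = np.hanning(frame)
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * win
        amps.append(np.sqrt(np.mean(seg ** 2)))       # RMS amplitude
        ac = np.correlate(seg, seg, mode="full")[frame - 1:]
        lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])   # strongest period
        f0s.append(sr / lag)
    return np.array(amps), np.array(f0s)
```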


International Conference on Data Engineering | 2005

Improved Noise Spectra Estimation and Log-spectral Regression for In-car Speech Recognition

Weifeng Li; Katunobu Itou; Kazuya Takeda; Fumitada Itakura

In this paper, we present a two-stage noise spectra estimation approach. After first-stage noise estimation using the improved minima controlled recursive averaging (IMCRA) method, second-stage noise estimation is performed using a maximum a posteriori (MAP) noise amplitude estimator. We also develop a regression-based speech enhancement system that approximates the clean speech from the estimated noise and the original noisy speech. Evaluation experiments show that the proposed two-stage noise estimation method yields lower estimation error for all tested noise types. Compared to the original noisy speech, the proposed regression-based approach obtains an average relative word error rate (WER) reduction of 65% in our isolated word recognition experiments conducted in 12 real car environments.
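
A much-simplified stand-in for the first-stage tracker, in the spirit of minima-controlled recursive averaging: follow a running spectral minimum per bin, gate updates with a crude speech-presence decision, and recursively average elsewhere. The full IMCRA procedure and the paper's second-stage MAP amplitude refinement are not reproduced; the window length and gate are assumptions.

```python
import numpy as np

def recursive_noise_estimate(power_spec, alpha=0.95, min_win=50, snr_gate=3.0):
    """Minima-controlled recursive noise tracking (simplified sketch).

    power_spec : (n_frames, n_bins) noisy power spectrogram
    Returns a per-frame noise power estimate of the same shape."""
    n_frames, _ = power_spec.shape
    noise = power_spec[0].copy()
    est = np.empty_like(power_spec)
    for t in range(n_frames):
        lo = max(0, t - min_win)
        floor = power_spec[lo:t + 1].min(axis=0)    # running minimum per bin
        speech = power_spec[t] > snr_gate * floor   # crude presence decision
        upd = alpha * noise + (1 - alpha) * power_spec[t]
        noise = np.where(speech, noise, upd)        # freeze during speech
        est[t] = noise
    return est
```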


IEICE Transactions on Information and Systems | 2008

Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

Tran Huy Dat; Kazuya Takeda; Fumitada Itakura

We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. Using a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multichannel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in a better-performing speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air conditioning, and open windows.
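
One simple way to adapt a gamma-family prior frame by frame is the method of moments: recursively track the first two moments of the spectral magnitude and solve for shape and scale at each frame. This is a sketch of the adaptation idea for a plain gamma prior, not the paper's generalized gamma estimator or its cross-channel statistics.

```python
import numpy as np

def adapt_gamma_prior(magnitudes, rho=0.98, shape0=1.0, scale0=1.0):
    """Online method-of-moments adaptation of a gamma prior on the speech
    spectral magnitude in one frequency bin.

    For a gamma(k, theta): mean = k*theta, var = k*theta**2, so
    k = mean**2 / var and theta = var / mean."""
    m1 = shape0 * scale0                        # initial mean
    m2 = scale0 ** 2 * shape0 * (shape0 + 1)    # initial second moment
    params = []
    for a in magnitudes:
        m1 = rho * m1 + (1 - rho) * a           # running mean
        m2 = rho * m2 + (1 - rho) * a ** 2      # running second moment
        var = max(m2 - m1 ** 2, 1e-12)
        params.append((m1 ** 2 / var, var / m1))  # (shape, scale) per frame
    return params
```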


IEICE Transactions on Information and Systems | 2006

Single-Channel Multiple Regression for In-Car Speech Enhancement

Weifeng Li; Katsunobu Itou; Kazuya Takeda; Fumitada Itakura

We address issues in improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone by nonlinear regression on the log spectra of the noisy signal captured by a distant microphone and of the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation of the regression-enhanced speech and performs best on most objective measures. In isolated word recognition experiments conducted in 15 real car environments, the proposed adaptive nonlinear regression approach achieves average relative word error rate (WER) reductions of 50.8% and 13.1% compared to the original noisy speech and the ETSI advanced front end (ETSI ES 202 050), respectively.
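
Here the regression has two inputs per bin: the distant-microphone log spectrum and an estimated noise log spectrum. A quadratic basis gives one simple nonlinearity with a closed-form least-squares fit; the basis is an illustrative assumption, not the paper's regression form.

```python
import numpy as np

def fit_bin_regression(noisy_log, noise_log, clean_log):
    """Per-bin nonlinear regression of the close-talking log spectrum on
    the noisy log spectrum and the estimated noise log spectrum, using a
    quadratic feature basis fitted by least squares.

    All inputs have shape (n_frames, n_bins); returns (6, n_bins) weights."""
    def design(x, n):
        return np.column_stack([x, n, x**2, n**2, x*n, np.ones_like(x)])
    n_bins = noisy_log.shape[1]
    W = np.empty((6, n_bins))
    for b in range(n_bins):
        X = design(noisy_log[:, b], noise_log[:, b])
        W[:, b], *_ = np.linalg.lstsq(X, clean_log[:, b], rcond=None)
    return W
```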


International Conference on Data Engineering | 2005

A speech enhancement system based on data clustering and cumulative histogram equalization

Tran Huy Dat; Kazuya Takeda; Fumitada Itakura

We present a data-driven noise suppression filtering system that combines data clustering and cumulative histogram equalization. The method uses the SNRGMM index, developed in our previous work, to cluster speech data into subsets sharing the same index. For each subset, cumulative histogram equalization filtering is learned in each subband log-spectral magnitude domain. The case where noisy speech data are not available is also considered: the SNRGMM can then be used for very quick and flexible simulation of noisy speech data without any loss of quality in the final system. Experimental evaluation on the Japanese version of AURORA2 shows that the proposed system improves both SNR and ASR performance.
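
The clustering step can be approximated with a one-dimensional k-means over per-utterance SNR indices, standing in for the paper's SNRGMM-based index; a separate equalization mapping would then be learned for each resulting cluster. The cluster count and initialization are assumptions.

```python
import numpy as np

def cluster_by_snr(snr_indices, n_clusters=4, iters=50):
    """1-D k-means over per-utterance SNR indices: assign each utterance
    to a noise-condition cluster so that a separate histogram-equalization
    mapping can be trained per cluster."""
    snr = np.asarray(snr_indices, dtype=float)
    # spread the initial centers across the observed SNR range
    centers = np.quantile(snr, np.linspace(0.1, 0.9, n_clusters))
    for _ in range(iters):
        labels = np.argmin(np.abs(snr[:, None] - centers[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = snr[labels == c].mean()
    return labels, centers
```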

Collaboration


An overview of Fumitada Itakura's collaborations.

Top Co-Authors


Weifeng Li

École Polytechnique Fédérale de Lausanne
