
Publication


Featured research published by Yasunari Obuchi.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Elastic spectral distortion for low resource speech recognition with deep neural networks

Naoyuki Kanda; Ryu Takeda; Yasunari Obuchi

An acoustic model based on hidden Markov models with deep neural networks (DNN-HMM) has recently been proposed and achieved high recognition accuracy. In this paper, we investigated an elastic spectral distortion method to artificially augment training samples to help DNN-HMMs acquire enough robustness even when there are a limited number of training samples. We investigated three distortion methods - vocal tract length distortion, speech rate distortion, and frequency-axis random distortion - and evaluated those methods with Japanese lecture recordings. In a large vocabulary continuous speech recognition task with only 10 hours of training samples, a DNN-HMM trained with the elastic spectral distortion method achieved a 10.1% relative word error reduction compared with a normally trained DNN-HMM.
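As a rough illustration of the idea (a sketch, not the authors' implementation), one of the three distortions, a frequency-axis warp, can be applied to a magnitude spectrum with a random factor around 1.0 to generate extra training variants; the deviation range and linear-interpolation resampling below are illustrative assumptions:

```python
import numpy as np

def warp_frequency_axis(spectrum, alpha):
    """Resample a magnitude spectrum along a warped frequency axis.
    alpha != 1.0 shifts spectral structure along the frequency axis,
    a crude stand-in for a vocal-tract-length change."""
    n = len(spectrum)
    src = np.arange(n)                    # original bin positions
    dst = np.clip(src * alpha, 0, n - 1)  # warped positions to sample at
    return np.interp(dst, src, spectrum)

def random_elastic_distortion(spectrum, rng, max_dev=0.1):
    """Draw a random warp factor around 1.0 to create one augmented
    variant of a training frame."""
    alpha = 1.0 + rng.uniform(-max_dev, max_dev)
    return warp_frequency_axis(spectrum, alpha)

rng = np.random.default_rng(0)
frame = np.abs(np.fft.rfft(np.sin(2 * np.pi * 0.05 * np.arange(256))))
augmented = random_elastic_distortion(frame, rng)  # same shape as frame
```

In a real pipeline each training utterance would be distorted several times with independently drawn factors before feature extraction, multiplying the effective amount of training data.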


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Optimized Speech Dereverberation From Probabilistic Perspective for Time Varying Acoustic Transfer Function

Masahito Togami; Yohei Kawaguchi; Ryu Takeda; Yasunari Obuchi; Nobuo Nukaga

A dereverberation technique has been developed that optimally combines multichannel inverse filtering (MIF), beamforming (BF), and non-linear reverberation suppression (NRS). It is robust against acoustic transfer function (ATF) fluctuations and creates less distortion than the NRS alone. The three components are optimally combined from a probabilistic perspective using a unified likelihood function incorporating two probabilistic models. A multichannel probabilistic source model based on a recently proposed local Gaussian model (LGM) provides robustness against ATF fluctuations of the early reflection. A probabilistic reverberant transfer function model (PRTFM) provides robustness against ATF fluctuations of the late reverberation. The MIF and multichannel under-determined source separation (MUSS) are optimized in an iterative manner. The MIF is designed to reduce the time-invariant part of the late reverberation by using optimal time-weighting with reference to the PRTFM and the LGM. The MUSS separates the dereverberated speech signal and the residual reverberation after the MIF, which can be interpreted as an optimized combination of the BF and the NRS. The parameters of the PRTFM and the LGM are optimized based on the MUSS output. Experimental results show that the proposed method is robust against the ATF fluctuations under both single and multiple source conditions.


International Conference on Acoustics, Speech, and Signal Processing | 1998

Development of robust speech recognition middleware on microprocessor

Nobuo Hataoka; Hiroaki Kokubo; Yasunari Obuchi; Akio Amano

We have developed speech recognition middleware on a RISC microprocessor that has robust processing functions against environmental noise and speaker differences. The speech recognition middleware enables developers and users to use a speech recognition process in many possible speech applications, such as car navigation systems and handheld PCs. We report implementation issues of the speech recognition process as middleware on microprocessors and propose robust noise handling functions using ANC (adaptive noise cancellation) and noise-adaptive models. We also propose a new speaker adaptation algorithm in which the relationships among HMM (hidden Markov model) transfer vectors are provided as a set of pre-trained interpolation coefficients. Experimental evaluations on 1000-word vocabulary speech recognition showed promising results for both the proposed noise handling methods and the proposed speaker adaptation method.
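The ANC component can be sketched as follows, under assumptions not taken from the paper: LMS adaptation, a noise-only reference channel, and illustrative tap count and step size.

```python
import numpy as np

def lms_anc(primary, reference, taps=8, mu=0.01):
    """ANC sketch: an LMS filter learns to predict the noise component of
    `primary` from a noise-only `reference` channel; the prediction error
    is the cleaned output."""
    w = np.zeros(taps)
    out = np.zeros_like(primary)
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]  # newest sample first
        e = primary[n] - w @ x                   # error = cleaned sample
        w += mu * e * x                          # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(1)
noise = rng.standard_normal(4000)
speech = np.sin(2 * np.pi * 0.01 * np.arange(4000))
# the primary microphone picks up speech plus a filtered copy of the noise
primary = speech + np.convolve(noise, [0.5, 0.3], mode="same")
cleaned = lms_anc(primary, noise)
```

Because the speech is uncorrelated with the reference noise, the filter converges toward the noise path and the residual error retains mostly the speech.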


Multimedia Signal Processing | 2008

Open-vocabulary keyword detection from super-large scale speech database

Naoyuki Kanda; Hirohiko Sagawa; Takashi Sumiyoshi; Yasunari Obuchi

This paper presents our recent attempt to make a super-large scale spoken-term detection system, which can detect any keyword uttered in a 2,000-hour speech database within a few seconds. There are three problems that must be solved to achieve such a system. The system must be able to detect out-of-vocabulary (OOV) terms (OOV problem). The system has to respond to the user quickly without sacrificing search accuracy (search speed and accuracy problem). The pre-stored index database should be sufficiently small (index size problem). We introduced a phoneme-based search method to detect the OOV terms, and combined it with the LVCSR-based method. To search for a keyword from large-scale speech databases accurately and quickly, we introduced a multistage rescoring strategy which uses several search methods to reduce the search space in a stepwise fashion. Furthermore, we constructed an out-of-vocabulary/in-vocabulary region classifier, which allows us to reduce the size of the index database for OOVs. We describe the prototype system and present some evaluation results.
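One common way to realize phoneme-based search for OOV terms (a minimal sketch, not the authors' multistage system) is a substring edit-distance DP between the keyword's phoneme sequence and the phoneme transcript of the audio:

```python
def best_match_cost(query, transcript):
    """Minimum edit distance between the phoneme sequence `query` and any
    substring of `transcript` (start and end positions are free), so a
    keyword can be spotted even if it is absent from the recognizer's
    word vocabulary."""
    n, m = len(query), len(transcript)
    prev = [0] * (m + 1)           # row i=0: a match may start anywhere
    for i in range(1, n + 1):
        cur = [i] + [0] * m        # deleting i query phonemes costs i
        for j in range(1, m + 1):
            sub = prev[j - 1] + (query[i - 1] != transcript[j - 1])
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)               # a match may end anywhere

phones = "k a w a s a k i".split()
exact = best_match_cost("s a k".split(), phones)    # → 0 (exact substring)
fuzzy = best_match_cost("o s a k".split(), phones)  # → 1 (one substitution)
```

A detection threshold on this cost (typically normalized by keyword length) then trades precision against recall; a multistage system would use a cheap index to shortlist regions before running such scoring.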


International Conference on Acoustics, Speech, and Signal Processing | 2005

Language identification using phonetic and prosodic HMMs with feature normalization

Yasunari Obuchi; Nobuo Sato

Phonetics and prosody are two important factors in automatic language identification. Prosodic HMMs enable language identification systems to use prosodic information in a similar manner to phonetic HMMs. The paper describes how to create prosodic HMMs and implement them in language identification systems. Linear discriminant analysis of the likelihood and N-gram scores of prosodic segment recognition realizes fast and reliable language identification. Moreover, combining prosodic HMMs with phonetic HMMs improves system performance. In this framework, feature normalization techniques that were originally developed for robust speech recognition can be applied to phonetic and prosodic features. Language identification accuracy increases using these techniques in clean and noisy environments.


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Mixture weight optimization for dual-microphone MFCC combination

Yasunari Obuchi

Feature combination in the MFCC domain can improve the speech recognition accuracy of dual-microphone systems if the two microphones have different characteristics. If we can take advantage of the recognition hypotheses given by single-channel decoding, cross-domain feature combination between the MFCC domain and the hypothesis domain provides better results than a simple weighted average of MFCCs. However, the optimal mixture weight is unknown. In this paper, we propose to use the channel selection algorithm for mixture weight optimization, regarding various parameter values as separate channels. Evaluation experiments show that recognition performance can be greatly improved when we use hypothesis-based feature combination and decoder-based channel selection.
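The weight-selection idea can be sketched as below; the candidate grid and the scoring function are illustrative stand-ins (a real system would score each candidate with decoder-based confidence, and could combine features in the hypothesis domain rather than by plain averaging):

```python
import numpy as np

def combine(mfcc_a, mfcc_b, w):
    """Weighted average of two channels' MFCC streams."""
    return w * mfcc_a + (1.0 - w) * mfcc_b

def select_weight(mfcc_a, mfcc_b, score_fn,
                  candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Treat every candidate mixture weight as a separate 'channel' and
    keep the one whose combined features score best under `score_fn`."""
    return max(candidates, key=lambda w: score_fn(combine(mfcc_a, mfcc_b, w)))

rng = np.random.default_rng(2)
clean = rng.standard_normal((100, 13))                 # stand-in clean MFCCs
chan_a = clean + 0.1 * rng.standard_normal((100, 13))  # low-noise channel
chan_b = clean + 1.0 * rng.standard_normal((100, 13))  # high-noise channel
# stand-in score: closeness to the clean features; a real system would
# use the decoder's acoustic score or confidence instead
score = lambda feats: -np.mean((feats - clean) ** 2)
best_w = select_weight(chan_a, chan_b, score)
```

With one channel much noisier than the other, the selection here favors the clean channel outright; intermediate weights win when both channels carry comparable, partly independent noise.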


Archive | 2005

Robust Dialog Management Architecture Using VoiceXML for Car Telematics Systems

Yasunari Obuchi; Eric Nyberg; Teruko Mitamura; Scott Judy; Michael Duggan; Nobuo Hataoka

This chapter describes a dialog management architecture for car telematics systems. The system supports spontaneous user utterances and variable communication conditions between the in-car client and the remote server. The communication is based on VoiceXML over HTTP, and the design of the server-side application is based on DialogXML and ScenarioXML, which are layered extensions of VoiceXML. These extensions provide support for state-and-transition dialog programming, access to dynamic external databases, and sharing of commonly-used dialogs via templates. The client system includes a set of small grammars and lexicons for various tasks; only relevant grammars and lexicons are activated under the control of the dialog manager. The server-side applications are integrated via an abstract interface, and the client system may include compact versions of the same applications. The VoiceXML interpreter can switch between applications on both sides intelligently. This helps to reduce bandwidth utilization, and allows the system to continue even if the communication channel is lost.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering

Masahito Togami; Yohei Kawaguchi; Ryu Takeda; Yasunari Obuchi; Nobuo Nukaga

In this paper, we propose a multichannel speech dereverberation and separation technique that is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to movement of the corresponding speaker's head. For robustness against such fluctuation, the proposed method optimizes linear filtering and non-linear filtering simultaneously from a probabilistic perspective based on a probabilistic reverberant transfer-function model, PRTFM. PRTFM is an extension of the conventional time-invariant transfer-function model to uncertain conditions, and it can also be regarded as an extension of the recently proposed blind local Gaussian modeling. The linear filtering and the non-linear filtering are optimized in the MMSE (minimum mean square error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room and shown to be effective.


Multimedia Signal Processing | 2002

Compact and robust speech recognition for embedded use on microprocessors

Nobuo Hataoka; Hiroaki Kokubo; Yasunari Obuchi; Akio Amano

We propose a compact and noise-robust embedded speech recognition system implemented on microprocessors, aimed at sophisticated HMIs (human machine interfaces) for car information systems. Compactness is essential for embedded systems because of strict restrictions on CPU (central processing unit) power and available memory capacity. In this paper, we first report noise-robust acoustic HMMs (hidden Markov models) and a compact spectral subtraction (SS) method, selected after exhaustive evaluation using real speech data recorded in running cars. Next, we propose a novel memory assignment for acoustic models based on a product-code (sub-vector quantization) technique, which reduces memory use to one fourth for the 2000-word vocabulary.
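A toy version of the product-code (sub-vector quantization) idea is sketched below; the sub-vector count, codebook size, and plain k-means training are illustrative assumptions, not details from the paper:

```python
import numpy as np

def subvector_quantize(vectors, n_sub, n_codes, iters=10, seed=0):
    """Split each vector into n_sub sub-vectors and run a small k-means in
    each sub-space; per vector we then store only n_sub codebook indices
    instead of the full float vector."""
    rng = np.random.default_rng(seed)
    codebooks, codes = [], []
    for part in np.split(vectors, n_sub, axis=1):
        centers = part[rng.choice(len(part), n_codes, replace=False)]
        for _ in range(iters):
            idx = np.argmin(((part[:, None] - centers) ** 2).sum(-1), axis=1)
            for k in range(n_codes):
                if np.any(idx == k):
                    centers[k] = part[idx == k].mean(axis=0)
        codebooks.append(centers)
        codes.append(idx)
    return codebooks, np.stack(codes, axis=1)

def reconstruct(codebooks, codes):
    """Look each index up in its codebook and re-join the sub-vectors."""
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes.T)], axis=1)

rng = np.random.default_rng(3)
means = rng.standard_normal((200, 16))   # stand-in HMM mean vectors
cbs, codes = subvector_quantize(means, n_sub=4, n_codes=16)
approx = reconstruct(cbs, codes)
```

Here each 16-dimensional float vector is replaced by four 4-bit indices plus small shared codebooks, trading a controlled approximation error for a large memory reduction.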


International Conference on Acoustics, Speech, and Signal Processing | 2009

DOA estimation method based on sparseness of speech sources for human symbiotic robots

Masahito Togami; Akio Amano; Takashi Sumiyoshi; Yasunari Obuchi

In this paper, direction of arrival (DOA) estimation methods (for both azimuth and elevation) based on the sparseness of human speech, the “modified delay-and-sum beamformer based on sparseness (MDSBF)” and “stepwise phase difference restoration (SPIRE)”, are introduced for human symbiotic robots. MDSBF achieves good DOA estimation, but its computational cost is proportional to the resolution of the azimuth-elevation space. SPIRE's DOA estimates are less accurate than MDSBF's, but its computational cost is independent of the resolution. To achieve more accurate DOA estimation than SPIRE at small computational cost, we propose a novel DOA estimation method that combines MDSBF and SPIRE. In the proposed method, MDSBF is first performed at coarse resolution, and SPIRE then refines the DOA estimates of the sources. Experimental results show that the sparseness-based methods are superior to conventional methods, and that the proposed combination achieves more accurate DOA estimation than SPIRE at a smaller computational cost than MDSBF.
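The delay-and-sum search that underlies MDSBF can be sketched for a two-microphone, azimuth-only, far-field case; the array spacing, 2-degree grid, and fractional-delay steering via a frequency-domain phase shift are illustrative assumptions:

```python
import numpy as np

def delay_and_sum_doa(sig_a, sig_b, spacing, fs, c=343.0):
    """Scan candidate azimuths: delay channel B (fractional delay via a
    frequency-domain phase shift) so it would align with channel A for a
    source at that angle, and return the angle whose delay-and-sum output
    has the most power."""
    n = len(sig_a)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec_a, spec_b = np.fft.rfft(sig_a), np.fft.rfft(sig_b)
    best_angle, best_power = None, -np.inf
    for angle in range(0, 181, 2):                     # azimuth grid (deg)
        tau = spacing * np.cos(np.radians(angle)) / c  # inter-mic delay (s)
        steered = spec_b * np.exp(2j * np.pi * freqs * tau)
        power = np.sum(np.abs(spec_a + steered) ** 2)
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle

fs, spacing = 16000, 0.2
rng = np.random.default_rng(4)
sig_a = rng.standard_normal(2048)
# simulate channel B as channel A delayed for a source at 60 degrees
true_tau = spacing * np.cos(np.radians(60.0)) / 343.0
freqs = np.fft.rfftfreq(2048, d=1.0 / fs)
sig_b = np.fft.irfft(np.fft.rfft(sig_a) * np.exp(-2j * np.pi * freqs * true_tau), 2048)
estimated = delay_and_sum_doa(sig_a, sig_b, spacing, fs)  # → 60
```

The cost of this scan grows linearly with the number of grid points, which is exactly the resolution dependence the combined MDSBF-plus-SPIRE method is designed to avoid.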
