Publication


Featured research published by Masahito Togami.


intelligent robots and systems | 2006

Basic Design of Human-Symbiotic Robot EMIEW

Yuji Hosoda; Saku Egawa; Junichi Tamamoto; Kenjiro Yamamoto; Ryousuke Nakamura; Masahito Togami

We are developing a robot that supports people in their daily lives: a human-symbiotic robot. Such a robot must share space with its users, be user-friendly, and be able to assist them. We have developed a prototype autonomous mobile robot that uses a self-balancing two-wheeled mobile system and a body-swing mechanism to shift its center of gravity, allowing it to move nimbly at up to 6 km per hour. It can also avoid collisions with obstacles, enabling it to move safely through complex environments. Distant-speech-recognition and high-quality speech-synthesis technologies enable it to communicate with people naturally (i.e., without special tools). These capabilities were demonstrated at the 2005 World Exposition in Aichi, Japan.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Optimized Speech Dereverberation From Probabilistic Perspective for Time Varying Acoustic Transfer Function

Masahito Togami; Yohei Kawaguchi; Ryu Takeda; Yasunari Obuchi; Nobuo Nukaga

A dereverberation technique has been developed that optimally combines multichannel inverse filtering (MIF), beamforming (BF), and non-linear reverberation suppression (NRS). It is robust against acoustic transfer function (ATF) fluctuations and creates less distortion than the NRS alone. The three components are optimally combined from a probabilistic perspective using a unified likelihood function incorporating two probabilistic models. A multichannel probabilistic source model based on a recently proposed local Gaussian model (LGM) provides robustness against ATF fluctuations of the early reflection. A probabilistic reverberant transfer function model (PRTFM) provides robustness against ATF fluctuations of the late reverberation. The MIF and multichannel under-determined source separation (MUSS) are optimized in an iterative manner. The MIF is designed to reduce the time-invariant part of the late reverberation by using optimal time-weighting with reference to the PRTFM and the LGM. The MUSS separates the dereverberated speech signal and the residual reverberation after the MIF, which can be interpreted as an optimized combination of the BF and the NRS. The parameters of the PRTFM and the LGM are optimized based on the MUSS output. Experimental results show that the proposed method is robust against the ATF fluctuations under both single and multiple source conditions.


ieee sensors | 2010

Use of water cluster detector for preventing drunk and drowsy driving

Minoru Sakairi; Masahito Togami

Implementing safety measures to prevent drunk and drowsy driving is a major technical challenge for the car industry. We have developed a system with a non-contact breath sensor to address it. The sensor detects breath by measuring the electric currents of positively or negatively charged water clusters in breath, which are separated by an electric field. Our device couples the breath sensor with an alcohol sensor and simultaneously detects the electrical signals of both breath and alcohol in the breath. This ensures that the sample comes from a person's breath, not an artificial source. Furthermore, the breath sensor can detect breath from about 50 cm away and can also test the alertness of a subject sitting in the driver's seat. This is done by detecting the moment breathing changes from conscious, such as pursed-lip breathing, to unconscious as the driver becomes drowsy. This is the first time one device has been used to detect both drunk and drowsy driving.


international conference on acoustics, speech, and signal processing | 2013

Noise robust speech dereverberation with Kalman smoother

Masahito Togami; Yohei Kawaguchi

A speech dereverberation method is proposed that is robust against background noise. In contrast to conventional methods based on linear prediction of the given microphone input signal, in which the linear prediction coefficients are not fully optimized when there is background noise, the proposed method optimizes the coefficients by linear prediction of the noiseless reverberant speech signal even when background noise is present. The noiseless reverberant speech signal and the parameters are iteratively updated on the basis of the expectation-maximization algorithm. In the expectation step, sufficient statistics of the latent variables, which include the noiseless reverberant speech signal, are estimated using the Kalman smoother. Unlike the standard Kalman smoother, which uses a time-invariant covariance matrix as the state-transition covariance matrix, the proposed method utilizes a time-varying covariance matrix, enabling it to track the time-varying characteristics of speech. The parameters are updated in the maximization step so that the Q function increases. Experimental results show that the proposed method is superior to conventional methods under noisy conditions.
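
As a rough illustration of the expectation step, the sketch below runs a scalar Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass with a time-varying state-noise variance q[t]. This is a generic smoother on a toy AR(1) signal, not the paper's multichannel STFT-domain model; all variable names and parameter values are our own.

```python
import numpy as np

def kalman_rts(y, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter + RTS smoother; q[t] is a time-varying
    state-noise variance, echoing the paper's time-varying covariance idea."""
    T = len(y)
    xp = np.zeros(T); pp = np.zeros(T)   # predicted mean / variance
    xf = np.zeros(T); pf = np.zeros(T)   # filtered mean / variance
    x, p = x0, p0
    for t in range(T):
        xp[t] = a * x
        pp[t] = a * a * p + q[t]
        k = pp[t] / (pp[t] + r)          # Kalman gain
        x = xp[t] + k * (y[t] - xp[t])
        p = (1.0 - k) * pp[t]
        xf[t], pf[t] = x, p
    xs = xf.copy()                        # backward (smoothing) pass
    for t in range(T - 2, -1, -1):
        g = a * pf[t] / pp[t + 1]
        xs[t] = xf[t] + g * (xs[t + 1] - xp[t + 1])
    return xs

# toy data: AR(1) "speech-like" state observed in noise
rng = np.random.default_rng(0)
T, a, r = 500, 0.95, 1.0
q = 0.05 * (1.0 + np.sin(np.linspace(0, 6, T)) ** 2)  # time-varying
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + rng.normal(0, np.sqrt(q[t]))
y = x_true + rng.normal(0, np.sqrt(r), T)
xs = kalman_rts(y, a, q, r)
```

On this toy signal the smoothed estimate has a markedly lower mean squared error than the raw noisy observations, which is the role the smoother plays inside the E-step.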


international conference on acoustics, speech, and signal processing | 2011

Online speech source separation based on maximum likelihood of local Gaussian modeling

Masahito Togami

We propose an online speech source separation method that can separate sources under underdetermined conditions. The proposed method is based on local Gaussian modeling (LGM). We first derive an online extension of conventional offline LGM-based speech source separation methods. The likelihood function of the online LGM-based approach (OLGM) is approximately maximized with an incremental EM algorithm. Additionally, we propose a least-squares initialization of OLGM to shorten the convergence time. Experimental results show that the proposed method can separate sources effectively even when the number of iterations is small.
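
The core operation behind LGM-based separation, the per time-frequency multichannel Wiener filter, can be sketched as follows. Here the source variances v and spatial covariances R are assumed to be given, whereas the method itself estimates them with (incremental) EM; the array shapes and names are our own.

```python
import numpy as np

def lgm_wiener(x, v, R, eps=1e-9):
    """Separate J sources from an M-channel STFT mixture under a local
    Gaussian model: y_j = v_j R_j (sum_k v_k R_k)^{-1} x per (f, t) bin.
    x: (F, T, M) mixture, v: (J, F, T) variances, R: (J, F, M, M)."""
    J, F, T = v.shape
    M = x.shape[-1]
    y = np.zeros((J, F, T, M), dtype=complex)
    for f in range(F):
        for t in range(T):
            Cx = sum(v[j, f, t] * R[j, f] for j in range(J))  # mixture covariance
            Ci = np.linalg.inv(Cx + eps * np.eye(M))
            for j in range(J):
                y[j, f, t] = (v[j, f, t] * R[j, f] @ Ci) @ x[f, t]
    return y
```

Because the source-wise Wiener filters sum to the identity, the separated source images add back up to the mixture, which is a handy sanity check on an implementation.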


ieee automatic speech recognition and understanding workshop | 2015

Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection

Yusuke Fujita; Ryoichi Takashima; Takeshi Homma; Rintaro Ikeshita; Yohei Kawaguchi; Takashi Sumiyoshi; Takashi Endo; Masahito Togami

In this paper, we propose a unified system that incorporates speech source separation and automatic speech recognition for various noise environments. The proposed system has three features. The first is LGM (local Gaussian modeling) based source separation with an efficient permutation alignment method that integrates a power-spectrum-correlation-based method and a direction-of-arrival (DOA) based method. Evaluation results show that using the separated speech with the baseline acoustic modeling method reduces the word error rate (WER) significantly. The second is multi-condition training with per-utterance normalized features and noise-aware features in the acoustic modeling step. We show that the proposed training method is effective even when an input signal has been distorted by the source separation step. The third is a word hypothesis selection method for integrating multiple recognition results. The proposed selection method estimates correct words based on a recognizer's confidence and co-occurrence characteristics, and the evaluation results show that it outperforms the conventional recognizer output voting error reduction (ROVER) method. The proposed system is evaluated on the third CHiME challenge dataset, where it achieves an improvement of 66.1% over the baseline system.
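
The word-selection step can be illustrated with a simplified, ROVER-style confidence vote over aligned recognizer outputs. The actual method additionally exploits word co-occurrence statistics, and the hypotheses below are made-up example data.

```python
from collections import defaultdict

def select_words(hyps):
    """hyps: one aligned list of (word, confidence) pairs per recognizer.
    For each word slot, pick the word with the highest summed confidence."""
    selected = []
    for slot in zip(*hyps):
        score = defaultdict(float)
        for word, conf in slot:
            score[word] += conf
        selected.append(max(score, key=score.get))
    return selected

# hypothetical outputs of three recognizers for the same utterance
h1 = [("turn", 0.9), ("on", 0.8), ("the", 0.9), ("light", 0.4)]
h2 = [("turn", 0.8), ("on", 0.7), ("the", 0.9), ("right", 0.5)]
h3 = [("burn", 0.3), ("on", 0.9), ("the", 0.9), ("light", 0.7)]
result = select_words([h1, h2, h3])
```

In the example, "light" wins its slot despite a low score from one recognizer, because its summed confidence across systems exceeds that of "right".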


international conference on acoustics, speech, and signal processing | 2012

Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering

Masahito Togami; Yohei Kawaguchi; Ryu Takeda; Yasunari Obuchi; Nobuo Nukaga

In this paper, we propose a multichannel speech dereverberation and separation technique that is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to movements of the speaker's head. For robustness against such fluctuation, the proposed method optimizes linear and non-linear filtering simultaneously from a probabilistic perspective based on a probabilistic reverberant transfer-function model (PRTFM). PRTFM is an extension of the conventional time-invariant transfer-function model to uncertain conditions, and it can also be regarded as an extension of the recently proposed blind local Gaussian modeling. The linear and non-linear filtering are optimized in the MMSE (minimum mean square error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room and shown to be effective.


international conference on acoustics, speech, and signal processing | 2010

Head orientation estimation of a speaker by utilizing kurtosis of a DOA histogram with restoration of distance effect

Masahito Togami; Yohei Kawaguchi

In this paper, we propose a head-orientation estimation method based on multichannel acoustic signals. The sharpness of a DOA histogram extracted with a sparseness-based DOA estimation method varies with the head orientation of a speaker, and the proposed method exploits this phenomenon to estimate the head orientation. The proposed method uses more than two microphone arrays. In addition to estimating the speaker location, it estimates the kurtosis of the DOA histogram of each array, where kurtosis serves as the measure of histogram sharpness. However, kurtosis also depends on the distance between the speaker and the microphone array (the distance effect). This distance effect is revealed experimentally by regression analysis, and the head orientation is estimated from the restored kurtosis, which is free of the distance effect. Experimental results in a reverberant environment show that the proposed method estimates the head orientation of a speaker more accurately than a conventional head-orientation estimation method.
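
The sharpness measure can be illustrated as follows: the kurtosis of a DOA histogram is high when the histogram is sharply peaked (speaker facing the array) and low when it is flat. This toy version omits the paper's distance-effect regression correction, and the sample directions are synthetic.

```python
import numpy as np

def doa_hist_kurtosis(doa_deg, bins=72):
    """Kurtosis of a DOA histogram, used as a sharpness measure."""
    hist, edges = np.histogram(doa_deg, bins=bins, range=(-180.0, 180.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                        # normalized histogram
    mu = np.sum(p * centers)                     # mean direction
    var = np.sum(p * (centers - mu) ** 2)        # spread
    return np.sum(p * (centers - mu) ** 4) / var ** 2

rng = np.random.default_rng(0)
frontal = rng.normal(30.0, 5.0, 5000)        # peaked histogram around 30 deg
away = rng.uniform(-180.0, 180.0, 5000)      # diffuse, nearly flat histogram
```

Here the peaked histogram yields a kurtosis near 3 (Gaussian-like) while the flat one falls toward the uniform distribution's value of about 1.8, so the frontal case scores strictly higher.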


international conference on acoustics, speech, and signal processing | 2009

DOA estimation method based on sparseness of speech sources for human symbiotic robots

Masahito Togami; Akio Amano; Takashi Sumiyoshi; Yasunari Obuchi

In this paper, direction-of-arrival (DOA) estimation methods (for both azimuth and elevation) based on the sparseness of human speech, "modified delay-and-sum beamformer based on sparseness (MDSBF)" and "stepwise phase difference restoration (SPIRE)", are introduced for human-symbiotic robots. MDSBF achieves good DOA estimation, but its computational cost is proportional to the resolution of the azimuth-elevation space. SPIRE's DOA estimates are less accurate than MDSBF's, but its computational cost is independent of the resolution. To achieve more accurate DOA estimation than SPIRE at small computational cost, we propose a novel DOA estimation method that combines MDSBF and SPIRE: MDSBF with rough resolution is performed first, and SPIRE then precisely estimates the DOA of the sources. Experimental results show that the sparseness-based methods are superior to conventional methods, and that the proposed combination achieves more accurate DOA estimation than SPIRE with smaller computational cost than MDSBF.
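
The coarse-then-fine idea can be sketched with a single-frequency delay-and-sum power scan: a rough azimuth grid stands in for low-resolution MDSBF, and a dense grid around its peak stands in for the refinement stage. The array geometry, frequency, and source direction are all made-up test values, and real MDSBF operates on broadband speech rather than one noiseless snapshot.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def srp(x, mic_pos, f, grid_deg):
    """Delay-and-sum output power steered over an azimuth grid
    (single narrowband snapshot; a toy stand-in for MDSBF)."""
    powers = []
    for theta in np.deg2rad(grid_deg):
        look = np.array([np.cos(theta), np.sin(theta)])
        steer = np.exp(2j * np.pi * f * (mic_pos @ look) / C)
        powers.append(np.abs(np.vdot(steer, x)) ** 2)
    return np.array(powers)

# 4-mic linear array on the x-axis, 5 cm spacing, source at 25 degrees
mic_pos = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
f, theta_true = 2000.0, np.deg2rad(25.0)
d_true = np.array([np.cos(theta_true), np.sin(theta_true)])
x = np.exp(2j * np.pi * f * (mic_pos @ d_true) / C)  # noiseless snapshot

coarse = np.arange(0.0, 181.0, 10.0)                 # cheap rough scan
theta0 = coarse[np.argmax(srp(x, mic_pos, f, coarse))]
fine = np.arange(theta0 - 10.0, theta0 + 10.5, 0.5)  # refine near the peak
theta_hat = fine[np.argmax(srp(x, mic_pos, f, fine))]
```

The dense scan is confined to a 20-degree window around the coarse peak, so the total number of steering evaluations stays far below a full fine-grid search.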


IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2008

Stepwise Phase Difference Restoration Method for DOA Estimation of Multiple Sources

Masahito Togami; Yasunari Obuchi

We propose a new DOA (direction of arrival) estimation methodology named SPIRE (Stepwise Phase dIfference REstoration) that can estimate sound source directions even when there is more than one source in a reverberant environment. DOA estimation in reverberant environments is difficult because reverberation increases the variance of the estimated source direction. We therefore want the distance between microphones to be long; however, because of the spatial aliasing problem, the distance cannot exceed half the wavelength of the maximum frequency of a source. The DOA estimation performance of SPIRE is not limited by spatial aliasing. The major feature of SPIRE is restoration of the phase difference of a microphone pair (M1) by using the phase difference of another microphone pair (M2), where the distance between the M1 microphones is longer than that between the M2 microphones. This restoration reduces the variance of the estimated source direction and alleviates the spatial aliasing that would otherwise affect the M1 phase difference, by exploiting the direction estimate obtained from the M2 microphones. Experimental results in a reverberant environment (reverberation time of about 300 ms) indicate that even when there are multiple sources, the proposed method estimates source directions more accurately than conventional methods. In addition, the DOA estimation performance of SPIRE with a 0.2 m array is shown to be almost equivalent to that of GCC-PHAT with a 0.5 m array, so SPIRE can perform DOA estimation with a smaller microphone array than GCC-PHAT. Since the array length should be as small as possible from the viewpoint of hardware size and the coherence problem, this feature of SPIRE is preferable.
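
The restoration step can be sketched numerically for a single frequency and a single restoration stage: the short pair M2 gives a coarse but alias-free DOA, which is used to unwrap the long pair M1's wrapped phase difference. The geometry, frequency, and source angle below are made-up illustration values, and real SPIRE works on noisy broadband phase differences.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def doa_from_phase(phi, d, f):
    """Invert phi = 2*pi*f*d*sin(theta)/C for theta."""
    s = np.clip(phi * C / (2.0 * np.pi * f * d), -1.0, 1.0)
    return np.arcsin(s)

def restore_phase(phi_wrapped, d_long, theta_coarse, f):
    """Unwrap the long pair's phase difference using the coarse DOA
    from the short pair (the core SPIRE step, reduced to one stage)."""
    expected = 2.0 * np.pi * f * d_long * np.sin(theta_coarse) / C
    k = np.round((expected - phi_wrapped) / (2.0 * np.pi))
    return phi_wrapped + 2.0 * np.pi * k

# made-up setup: source at 40 degrees, f = 2 kHz
f, theta_true = 2000.0, np.deg2rad(40.0)
d_short, d_long = 0.04, 0.20     # the 4 cm pair is alias-free at 2 kHz
phi_short = 2.0 * np.pi * f * d_short * np.sin(theta_true) / C
phi_long = 2.0 * np.pi * f * d_long * np.sin(theta_true) / C
phi_long_obs = np.angle(np.exp(1j * phi_long))   # observed modulo 2*pi

theta_coarse = doa_from_phase(phi_short, d_short, f)  # noisy in practice
phi_restored = restore_phase(phi_long_obs, d_long, theta_coarse, f)
theta_fine = doa_from_phase(phi_restored, d_long, f)  # aliasing resolved
```

Reading the DOA directly off the wrapped long-pair phase would give a badly aliased angle; after restoration, the long baseline's lower variance is obtained without the aliasing ambiguity.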
