Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tomohiro Nakatani is active.

Publication


Featured research published by Tomohiro Nakatani.


Workshop on Applications of Signal Processing to Audio and Acoustics | 2013

The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech

Keisuke Kinoshita; Marc Delcroix; Takuya Yoshioka; Tomohiro Nakatani; Armin Sehr; Walter Kellermann; Roland Maas

Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques, and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques. The proposed framework will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. This paper describes the rationale behind the challenge, and provides a detailed description of the evaluation framework and benchmark results.


Archive | 2005

Single-Microphone Blind Dereverberation

Tomohiro Nakatani; Masato Miyoshi; Keisuke Kinoshita

Although a number of dereverberation methods have been studied, dereverberation remains a challenging problem, especially when only a single microphone is available. An important aspect of single-channel speech enhancement is the characteristic feature of speech signals that allows us to restore the quality of source signals. In this chapter, we describe a single-channel speech dereverberation method based on the harmonicity of speech signals. We show that a filter that enhances the harmonic structure of reverberant speech signals approximates the inverse filter of the reverberation process, thus enabling us to achieve high-quality blind dereverberation. The presented method is referred to as the harmonicity-based dereverberation method (HERB). Simulation experiments show that HERB dereverberates speech signals effectively, in terms of both the energy decay curves of room impulse responses and automatic speech recognition performance, even when the reverberation time is as long as 1.0 s, provided that a sufficiently large number of observed signals are available. We also discuss several future directions for extending HERB so that it can cope with more realistic situations.


2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

A microphone array system integrating beamforming, feature enhancement, and spectral mask-based noise estimation

Takuya Yoshioka; Tomohiro Nakatani

This paper proposes a microphone array system that integrates beamforming, feature enhancement, and highly accurate noise feature model estimation based on spectral masking. Previously proposed methods for combining beamformers and single-channel post-filters estimate noise power spectra or noise features based only on spatial information acquired from multiple microphones. These methods suffer from low noise estimation accuracy when few microphones are available or when there are array calibration or steering vector estimation errors. By contrast, the proposed method estimates a noise feature model accurately and in a highly adaptive way by capitalizing on both spatial information and the characteristics of speech. Specifically, the method leverages an inter-microphone phase difference model, a clean feature model, and a harmonicity-based spectral mask model for the accurate estimation of spectral masks, each of which indicates the presence or absence of speech at a particular frequency bin. The estimated spectral masks are used to obtain the time-varying noise feature model. Results of a digit recognition experiment show that the proposed system significantly outperforms an existing microphone array system combining a beamformer and a post-filter.
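As a rough illustration of the final step described above (turning estimated spectral masks into a time-varying noise feature model), the sketch below shows one simple way this could be done. It is hypothetical code, not the paper's implementation; the function name and the smoothing constant `alpha` are assumptions:

```python
# Hypothetical sketch: a recursive noise power estimate that is updated only
# at time-frequency bins where the spectral mask marks speech as absent.
import numpy as np

def masked_noise_psd(power_spec, speech_mask, alpha=0.9):
    """power_spec: (frames, bins) magnitude-squared spectrogram.
    speech_mask: (frames, bins), 1 where speech is present, 0 where absent.
    alpha: smoothing constant for recursive averaging (assumed value).
    Returns a (frames, bins) time-varying noise power estimate."""
    frames, _ = power_spec.shape
    noise = np.zeros_like(power_spec)
    est = power_spec[0].copy()            # initialize from the first frame
    for t in range(frames):
        absent = speech_mask[t] == 0
        # update the estimate only where the mask says "no speech"
        est[absent] = alpha * est[absent] + (1 - alpha) * power_spec[t, absent]
        noise[t] = est
    return noise
```

Because the estimate evolves frame by frame, it can track noise whose spectrum changes over time, which fixed noise-floor estimates cannot.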


2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) | 2014

Spectrogram patch based acoustic event detection and classification in speech overlapping conditions

Miquel Espi; Masakiyo Fujimoto; Yotaro Kubo; Tomohiro Nakatani

Speech does not always contain all the information needed to understand a conversation scene. Non-speech events can reveal aspects of the scene that speakers miss or neglect to mention, and could further support speech enhancement and recognition systems with information about the surrounding noise. This paper focuses on the task of detecting and classifying acoustic events in a conversation scene where they often overlap with speech. State-of-the-art techniques are based on derived features (e.g., MFCCs or Mel-filter-bank features), which parameterize speech spectrograms successfully but sacrifice resolution and detail when other kinds of events are targeted. In this paper, we propose a method that learns hidden features directly from spectrogram patches and integrates them within the deep neural network framework to detect and classify acoustic events. The result is a model that performs feature extraction and classification simultaneously. Experiments confirm that the proposed method outperforms deep neural networks with derived features as well as related work on the CHIL2007-AED task, showing that there is room for further improvement.


2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

Discriminative approach to dynamic variance adaptation for noisy speech recognition

Marc Delcroix; Shinji Watanabe; Tomohiro Nakatani; Atsushi Nakamura

The performance of automatic speech recognition degrades severely in the presence of noise or reverberation. One conventional approach to handling such acoustic distortions is to use a speech enhancement technique prior to recognition. However, most speech enhancement techniques introduce artifacts that create a mismatch between the enhanced speech features and the acoustic model used for recognition, thereby limiting the improvement in recognition performance. Recently, there has been increased interest in methods that compensate for such a mismatch by accounting for the feature variance during decoding. In this paper, we propose to estimate the feature variance using an adaptation technique based on a discriminative criterion. In an experiment using the Aurora2 database, the proposed method achieved a significant digit error rate reduction compared with a spectral subtraction pre-processor, and using a discriminative criterion for adaptation provided further improvement compared with maximum likelihood estimation.


Archive | 2010

Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information

Masato Miyoshi; Marc Delcroix; Keisuke Kinoshita; Takuya Yoshioka; Tomohiro Nakatani; Takafumi Hikichi

This chapter discusses multi-microphone inverse filtering that does not use a priori information about room acoustics, such as the room impulse responses between the target speaker and the microphones. One major problem in achieving this type of processing is the degradation of the recovered speech caused by excessive equalization of the speech characteristics. To overcome this problem, several approaches have been studied based on a multichannel linear prediction framework, since this framework can perform speech dereverberation as well as noise attenuation. Here, we first discuss the relationship between optimal filtering and linear prediction. Then, we review our four approaches, which differ in their treatment of the statistical properties of a speech signal.


Archive | 2010

Speech Dereverberation and Denoising Based on Time Varying Speech Model and Autoregressive Reverberation Model

Takuya Yoshioka; Tomohiro Nakatani; Keisuke Kinoshita; Masato Miyoshi

Speech dereverberation and denoising have been important problems in the speech processing field for decades. For denoising, model-based approaches have been studied intensively and many practical methods have been developed. In contrast, research on dereverberation has been relatively limited, and only in recent years have studies on model-based dereverberation made rapid progress. This chapter reviews a model-based dereverberation method developed by the authors. This dereverberation method is effectively combined with a traditional denoising technique, specifically a multichannel Wiener filter. The combined method is derived by solving a dereverberation and denoising problem with a model-based approach. Both the combined method and the original dereverberation method are developed using a multichannel autoregressive model of room acoustics and a time-varying power spectrum model of clean speech signals.
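A minimal sketch of dereverberation with an autoregressive reverberation model and a time-varying speech power model might look as follows. This is a simplified, hypothetical single-channel, per-frequency version, not the authors' multichannel algorithm; `delay`, `order`, and `eps` are assumed parameters:

```python
# Hypothetical sketch: late reverberation is modeled as a linear combination
# of delayed past STFT frames; the prediction filter is estimated by
# variance-weighted least squares, with weights from the current estimate of
# the time-varying clean-speech power, and the prediction is subtracted.
import numpy as np

def ar_dereverb(X, delay=3, order=10, iters=3, eps=1e-4):
    """X: (frames, bins) complex STFT of the reverberant observation.
    Returns a dereverberated STFT of the same shape."""
    T, F = X.shape
    out = np.empty_like(X)
    for f in range(F):
        x = X[:, f]
        # regression matrix of delayed past frames: A[t, k] = x[t - delay - k]
        A = np.zeros((T, order), dtype=complex)
        for k in range(order):
            d = delay + k
            A[d:, k] = x[:T - d]
        y = x.copy()
        for _ in range(iters):
            # weights = inverse of the current speech power estimate
            w = 1.0 / np.maximum(np.abs(y) ** 2, eps)
            G = A.conj().T @ (A * w[:, None])
            b = A.conj().T @ (w * x)
            g = np.linalg.solve(G + eps * np.eye(order), b)
            y = x - A @ g          # subtract the predicted late reverberation
        out[:, f] = y
    return out
```

The delay keeps the earliest frames (direct sound and early reflections) out of the regression, so only the late reverberant tail is predicted and removed.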


Archive | 2018

Recent Advances in Multichannel Source Separation and Denoising Based on Source Sparseness

Nobutaka Ito; Shoko Araki; Tomohiro Nakatani

This chapter deals with multichannel source separation and denoising based on the sparseness of source signals in the time-frequency domain. In this approach, time-frequency masks are typically estimated based on clustering of source location features, such as time and level differences between microphones. In this chapter, we describe the approach and its recent advances. In particular, we introduce a recently proposed clustering method, observation vector clustering, which has attracted attention for its effectiveness. We introduce algorithms for observation vector clustering based on a complex Watson mixture model (cWMM), a complex Bingham mixture model (cBMM), and a complex Gaussian mixture model (cGMM), and show through experiments the effectiveness of observation vector clustering in source separation and denoising.
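A toy sketch can make the sparseness-based masking idea concrete. The code below is hypothetical and deliberately simplified: it clusters an interchannel phase-difference feature with plain k-means, standing in for the cWMM/cBMM/cGMM clustering described in the chapter, and then applies binary time-frequency masks:

```python
# Hypothetical sketch: if each time-frequency bin is dominated by one source,
# clustering a per-bin spatial feature assigns each bin to a source, and
# binary masks carve the mixture into the separated signals.
import numpy as np

def mask_separate(X1, X2, n_src=2, iters=20, seed=0):
    """X1, X2: (frames, bins) complex STFTs from two microphones.
    Returns n_src masked copies of X1, one per estimated source."""
    # spatial feature per TF bin: interchannel phase difference
    feat = np.angle(X2 * np.conj(X1)).ravel()[:, None]
    rng = np.random.default_rng(seed)
    centers = feat[rng.choice(len(feat), n_src, replace=False)]
    for _ in range(iters):                   # plain k-means clustering
        d = np.abs(feat - centers.T)         # distance to each center
        labels = d.argmin(axis=1)
        for k in range(n_src):
            if np.any(labels == k):
                centers[k] = feat[labels == k].mean()
    masks = [(labels == k).reshape(X1.shape) for k in range(n_src)]
    return [X1 * m for m in masks]
```

Because the masks are binary and exhaustive, the separated outputs always sum back to the observed mixture; the mixture-model methods in the chapter replace the hard k-means assignment with probabilistic (soft) masks.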


Archive | 2015

Maximum A Posteriori Spectral Estimation with Source Log-Spectral Priors for Multichannel Speech Enhancement

Yasuaki Iwata; Tomohiro Nakatani; Takuya Yoshioka; Masakiyo Fujimoto; Hirofumi Saito

When speech signals are captured in real acoustical environments, the captured signals are distorted by certain types of interference, such as ambient noise, reverberation, and extraneous speakers’ utterances. There are two important approaches to speech enhancement that reduce such interference in the captured signals. One approach is based on the spatial features of the signals, such as direction of arrival and acoustic transfer functions, and enhances speech using multichannel audio signal processing. The other approach is based on speech spectral models that represent the probability density function of the speech spectra, and it enhances speech by distinguishing between speech and noise based on the spectral models. In this chapter, we propose a new approach that integrates the above two approaches. The proposed approach uses the spatial and spectral features of signals in a complementary manner to achieve reliable and accurate speech enhancement. The approach can be applied to various speech enhancement problems, including denoising, dereverberation, and blind source separation (BSS). In particular, we focus on applying the approach to BSS. We show experimentally that the proposed integration can improve the performance of BSS compared with a conventional approach.


Archive | 2003

Implementation and Effects of Single-Channel Dereverberation Based on the Harmonic Structure of Speech

Tomohiro Nakatani; Masato Miyoshi; Keisuke Kinoshita

Collaboration


Dive into Tomohiro Nakatani's collaboration.

Top Co-Authors

Keisuke Kinoshita
Nippon Telegraph and Telephone

Shoko Araki
Nippon Telegraph and Telephone

Hiroshi Sawada
Nippon Telegraph and Telephone

Kentaro Ishizuka
Nippon Telegraph and Telephone