Publication


Featured research published by Hideki Banno.


international conference on acoustics, speech, and signal processing | 2008

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Hideki Kawahara; Masanori Morise; Toru Takahashi; Ryuichi Nisimura; Toshio Irino; Hideki Banno

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from temporally stable power and interference-free spectra, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates ad-hoc parameter tuning and the heavy demand on computational power, from which STRAIGHT has suffered in the past.
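The abstract does not spell out how the temporally stable power spectrum is obtained. A common description of the TANDEM idea is to average two power spectra taken half a fundamental period apart, so that the time-varying interference between adjacent harmonics cancels. The sketch below illustrates only that cancellation on a toy signal made of two harmonics; the sampling rate, F0, window choice, and all function names are assumptions for illustration, not the authors' implementation.

```python
import cmath
import math

FS = 8000                # sampling rate in Hz (illustrative assumption)
F0 = 100.0               # fundamental frequency in Hz (illustrative assumption)
PERIOD = int(FS / F0)    # fundamental period in samples
HALF = PERIOD // 2       # half a fundamental period in samples


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def power_at(signal, start, window, freq_hz):
    """Windowed power at one frequency for the frame beginning at `start`."""
    acc = 0j
    for n, w in enumerate(window):
        acc += w * signal[start + n] * cmath.exp(-2j * math.pi * freq_hz * n / FS)
    return abs(acc) ** 2


def ripple(values):
    """Peak-to-peak variation relative to the mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


window = hann(2 * PERIOD)
n_samples = len(window) + PERIOD + HALF

# Toy periodic signal: the first two harmonics as complex exponentials.
signal = [
    cmath.exp(2j * math.pi * F0 * n / FS) + 0.7 * cmath.exp(2j * math.pi * 2 * F0 * n / FS)
    for n in range(n_samples)
]

probe_hz = 150.0  # halfway between the two harmonics, where interference is strongest

# A single windowed power spectrum fluctuates as the frame position moves.
single = [power_at(signal, t, window, probe_hz) for t in range(PERIOD)]

# Averaging two frames half a period apart cancels the fluctuation.
tandem = [
    0.5 * (power_at(signal, t, window, probe_hz)
           + power_at(signal, t + HALF, window, probe_hz))
    for t in range(PERIOD)
]

single_ripple = ripple(single)
tandem_ripple = ripple(tandem)
```

In this toy setting `single_ripple` is large while `tandem_ripple` is at the level of floating-point error. Real speech has many harmonics, so the window must be chosen so that only adjacent harmonics interact.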


international conference on acoustics, speech, and signal processing | 2009

Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown

Hideki Kawahara; Ryuichi Nisimura; Toshio Irino; Masanori Morise; Toru Takahashi; Hideki Banno

A generalized framework of auditory morphing based on the speech analysis, modification and resynthesis system STRAIGHT is proposed that enables each morphing rate of representational aspects to be a function of time, including the temporal axis itself. Two types of algorithms were derived: an incremental algorithm for real-time manipulation of morphing rates and a batch processing algorithm for off-line post-production applications. By defining morphing in terms of the derivative of mapping functions in the logarithmic domain, breakdown of morphing resynthesis found in the previous formulation in the case of extrapolations was eliminated. A method to alleviate perceptual defects in extrapolation is also introduced.


international conference on acoustics, speech, and signal processing | 1998

Efficient representation of short-time phase based on group delay

Hideki Banno; Jinlin Lu; Satoshi Nakamura; Kiyohiro Shikano; Hideki Kawahara

An efficient representation of the short-time phase characteristics of speech sounds is proposed, based on findings which suggest the perceptual importance of phase characteristics. Subjective tests indicated that speech sounds synthesized by the proposed method are indistinguishable from the original speech sounds under moderate data compression. The proposed representation uses lower-order coefficients of the inverse Fourier transform of the group delay of speech. It also reduces reliance on the voiced/unvoiced decision, which is an indispensable part of conventional speech coding algorithms. These features make the method potentially very useful in many applications, such as speech morphing.
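The representation described above (low-order coefficients of the inverse Fourier transform of the group delay) can be sketched roughly as follows. The group delay is computed with the standard identity that uses the DFT of the frame and the DFT of the frame multiplied by its sample index. The frame length, the truncation order, and every function name here are assumptions for illustration; the paper's actual analysis conditions are not given in the abstract.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Plain O(N^2) inverse discrete Fourier transform."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay in samples at each DFT bin: Re(DFT(n * x[n]) / DFT(x[n]))."""
    spec = dft(frame)
    ramp = dft([t * v for t, v in enumerate(frame)])
    return [
        (x.real * y.real + x.imag * y.imag) / (abs(x) ** 2 + eps)
        for x, y in zip(spec, ramp)
    ]


def low_order_coeffs(frame, order):
    """Keep the first `order` coefficients of the inverse DFT of the group delay."""
    coeffs = idft(group_delay(frame))
    return [c.real for c in coeffs[:order]]


def smoothed_group_delay(coeffs, n):
    """Rebuild a smoothed group delay of length n from the kept coefficients.

    The group delay of a real frame is even in frequency, so its inverse DFT
    is symmetric and the kept coefficients can be mirrored.
    """
    full = [0.0] * n
    for i, c in enumerate(coeffs):
        full[i] = c
        if i > 0:
            full[n - i] = c
    return [v.real for v in dft(full)]


# An impulse delayed by 3 samples has a constant group delay of 3.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
coeffs = low_order_coeffs(impulse, 3)
rebuilt = smoothed_group_delay(coeffs, len(impulse))
```

For the delayed impulse, only the zeroth coefficient is non-zero, which shows how a smooth group delay compresses into a few low-order terms.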


asia-pacific signal and information processing association annual summit and conference | 2013

Temporally variable multi-aspect N-way morphing based on interference-free speech representations

Hideki Kawahara; Masanori Morise; Hideki Banno; Verena G. Skuk

Voice morphing is a powerful tool for exploratory research and various applications. Temporally variable multi-aspect morphing is extended to enable morphing of arbitrarily many voices in a single-step procedure. The proposed method is implemented based on interference-free representations of periodic signals and is found to yield highly natural-sounding manipulated voices, which are useful for investigating human perception of voice. The formulation of the proposed method is general enough to be applicable to other representations and can be easily modified depending on application needs.
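One simple way to picture morphing of many voices in a single step is a weighted combination of per-voice parameters in the logarithmic domain, with weights that sum to one. The sketch below applies that idea to spectral envelopes only. The log-domain weighting, the sum-to-one constraint, and the function name `morph_n_way` are assumptions for illustration; the abstract does not state the exact formulation.

```python
import math


def morph_n_way(envelopes, weights):
    """Blend N positive spectral envelopes by a weighted sum of their logarithms.

    envelopes: list of N envelopes, each a list of positive values per frequency bin.
    weights:   list of N morphing weights, assumed here to sum to one.
               Weights outside [0, 1] give extrapolation.
    """
    if len(envelopes) != len(weights):
        raise ValueError("one weight per envelope is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to one")
    n_bins = len(envelopes[0])
    return [
        math.exp(sum(w * math.log(env[k]) for env, w in zip(envelopes, weights)))
        for k in range(n_bins)
    ]


voices = [[1.0, 4.0], [4.0, 1.0], [2.0, 2.0]]

# Equal blend of the first two voices; the third voice is switched off.
blend = morph_n_way(voices, [0.5, 0.5, 0.0])

# Extrapolation beyond the first voice, away from the second.
extrapolated = morph_n_way(voices, [1.5, -0.5, 0.0])
```

Because the blend is formed on logarithms, the result stays positive even when a weight is negative or greater than one, which is one reason log-domain formulations are attractive for extrapolation.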


international conference on acoustics, speech, and signal processing | 2002

Synthesis of car noise based on a composition of engine noise and friction noise

Yoshihide Ban; Hideki Banno; Kazuya Takeda; Fumitada Itakura

This paper describes a method for generating car noise based on the engine noise and the “friction noise.” The engine noise is modeled as a composition of a stationary background noise that depends on the size of the engine and a nonstationary noise that depends on the rotational speed of the engine. The friction noise is modeled as white noise with varying power. Based on these models, methods for synthesizing these components are developed. Subjective assessment of the car noise synthesis method shows that the synthesized noise is fairly similar to the actual noise.


international conference on acoustics, speech, and signal processing | 2004

Algorithm amalgam: morphing waveform based methods, sinusoidal models and STRAIGHT

Hideki Kawahara; Hideki Banno; Toshio Irino; Parham Zolfaghari

A tool for investigating an important fundamental question in speech processing is proposed, aiming to promote research on voice quality and on para- and non-linguistic aspects of speech. The proposed method effectively emulates waveform-based methods, sinusoidal models and the high-quality source-filter model STRAIGHT. The key idea that enables blending these seemingly disjoint algorithms is a group-delay-based representation of signal excitation. By using a STRAIGHT-based smoothed time-frequency representation that is shared by these three types of speech processing methods, a unified source representation is used to implement the proposed system. Informal listening tests using the proposed system indicated that phase manipulation introduces different timbre, but that the exact waveform does not need to be reproduced to obtain the same timbre. This may suggest that further information reduction is possible when synthesizing speech of close-to-natural quality.


Electronics and Communications in Japan, Part III: Fundamental Electronic Science | 1999

Speech morphing by independent interpolation of a spectral envelope and source excitation

Hideki Banno; Kazuya Takeda; Kiyohiro Shikano; Fumitada Itakura

A speech morphing algorithm based on progressive interpolation of spectral envelopes and source signals is proposed. The basic morphing scheme is: 1) determine the time correspondence for unit waveforms of the original and target speech, 2) separate speech spectra into an envelope and a sound source, 3) obtain the frequency correspondence for spectral channels of the original and target speech for each envelope, 4) interpolate both the source signal and the envelope, 5) construct a unit waveform, and 6) generate morphed speech by PSOLA. In the objective test, the proposed method reduces the spectral distortion by 1.9 dB in comparison with the method based on progressive substitution of spectra, when it is used for interpolating two vowels in real speech. The effectiveness of the method is also confirmed by a subjective test in which 89% (male to female) or 93% (female to male) of subjects preferred the proposed method.


international conference on acoustics, speech, and signal processing | 2003

In-car speech recognition using distributed microphones: adapting to automatically detected driving conditions

Hideki Banno; Tetsuya Shinde; Kazuya Takeda; Fumitada Itakura

In this paper, we describe a multichannel method of noisy speech recognition that can adapt to various in-car noise situations during driving. The method allows us to estimate the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by multiple distributed microphones. Through clustering of the spatial noise distributions under various driving conditions, the regression weights for MRLS are effectively adapted to the driving conditions. The experimental evaluation shows an average error rate reduction of 43% in isolated word recognition under 15 different driving conditions.
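The regression step described above can be sketched as an ordinary least-squares fit for a single frequency bin: the close-talking log-spectral value is predicted as a weighted sum of the log-spectral values at the distributed microphones. The bias term, the use of normal equations, the synthetic data, and the names `fit_mrls` and `predict` are assumptions for illustration and are not taken from the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mrls(mic_logs, target_logs):
    """Fit regression weights for one frequency bin.

    mic_logs:    one list per frame with the log-spectral value at each distant microphone.
    target_logs: the close-talking log-spectral value for each frame.
    Returns [w_1, ..., w_M, bias].
    """
    rows = [frame + [1.0] for frame in mic_logs]  # append a constant for the bias
    dim = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    atb = [sum(r[i] * t for r, t in zip(rows, target_logs)) for i in range(dim)]
    return solve(ata, atb)


def predict(weights, frame):
    """Estimate the close-talking log-spectral value for one frame."""
    return sum(w * v for w, v in zip(weights, frame)) + weights[-1]


# Synthetic training data for two microphones (made-up numbers).
frames = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.5], [3.0, 3.0], [1.5, 0.2]]
targets = [0.6 * a + 0.3 * b + 0.1 for a, b in frames]

weights = fit_mrls(frames, targets)
estimate = predict(weights, [2.0, 2.0])
```

To mimic the adaptation to driving conditions, one set of weights could be fitted per noise cluster and selected at run time according to the detected condition.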


international conference on acoustics, speech, and signal processing | 2010

High quality voice manipulation method based on the vocal tract area function obtained from sub-band LSP of STRAIGHT spectrum

Ayanori Arakawa; Yoshinori Uchimura; Hideki Banno; Fumitada Itakura; Hideki Kawahara

This paper describes a high-quality method for manipulating voice quality based on the vocal tract area function (VTAF) obtained from the sub-band LSP of the STRAIGHT spectrum. Our research group had developed a VTAF-based technique for manipulating voice quality that can generate natural formant transitions. However, the generated sound is sometimes degraded when the input signal has a high sampling frequency. Therefore, we develop a new method that extracts the VTAF properly from such input signals. This method first divides the input spectral envelope represented by the STRAIGHT spectrum into lower and higher frequency bands, second extracts the line spectrum pair (LSP) in each frequency band after spectral flattening that is appropriate for the frequency band, third concatenates the pair of sub-band LSPs, and finally obtains the VTAF from PARCOR coefficients converted from the concatenated LSP. A subjective experiment confirmed that the proposed method achieves sufficiently high quality.


ieee intelligent vehicles symposium | 2011

Development and evaluation of a scheme for detecting multiple approaching vehicles through acoustic sensing

Kensaku Asahi; Hideki Banno; Osami Yamamoto; Akira Ogawa; Keiichi Yamada

We propose a robust scheme for detecting vehicles approaching from several directions to prevent crossing collisions. The scheme consists of three sequential operations. The first operation computes cross-power spectrum phase (CSP) coefficients of sound signals from a microphone array; the second operation classifies the information obtained from the CSP coefficients into multiple ranges of directions so as to separate multiple approaching vehicles. The last one detects approaching vehicles through a threshold operation on the clustered information. In order to evaluate the performance of the scheme, we constructed a large-scale database of acoustic and related information about approaching vehicles. Using the database, we show that the scheme is robust in detecting not only a single but also multiple approaching vehicles.
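The first operation, computing CSP coefficients, can be sketched for one microphone pair as follows: the cross-power spectrum is normalized to unit magnitude so that only phase remains, and the inverse transform peaks at the inter-microphone delay. The test signal, the sign convention for the delay, and the function names are assumptions for illustration; the direction classification and thresholding stages are not shown.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def csp(x1, x2, eps=1e-12):
    """Cross-power spectrum phase coefficients for two equal-length frames."""
    s1 = dft(x1)
    s2 = dft(x2)
    cross = [a.conjugate() * b for a, b in zip(s1, s2)]
    whitened = [c / (abs(c) + eps) for c in cross]  # keep phase, discard magnitude
    n = len(whitened)
    return [
        sum(whitened[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def estimate_delay(x1, x2):
    """Delay of x2 relative to x1 in samples (positive when x2 lags x1)."""
    coeffs = csp(x1, x2)
    n = len(coeffs)
    peak = max(range(n), key=lambda t: coeffs[t])
    return peak if peak <= n // 2 else peak - n


# Made-up reference frame and a copy circularly delayed by 3 samples.
reference = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1, -1.7, 0.4,
             0.9, -0.6, 1.5, -2.1, 0.2, 0.7, -0.9, 1.3]
delayed = reference[-3:] + reference[:-3]

lag = estimate_delay(reference, delayed)
```

Given the microphone spacing and the speed of sound, such a delay can be mapped to an arrival direction, which is the kind of information the later stages would group into direction ranges.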

Collaboration


Dive into Hideki Banno's collaboration.

Top Co-Authors

Akira Ogawa

Iwate Medical University
