Publication


Featured research published by Hideki Kawahara.


Journal of the Acoustical Society of America | 2002

YIN, a fundamental frequency estimator for speech and music

Alain de Cheveigné; Hideki Kawahara

An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
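
For readers who want a concrete picture of the method, the core steps described above (difference function, cumulative mean normalization, absolute threshold) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: parameter names and defaults are assumptions, and the paper's parabolic-interpolation and best-local-estimate refinements are omitted.

```python
import numpy as np

def yin_f0(frame, fs, fmin=60.0, fmax=600.0, threshold=0.1):
    """Rough sketch of YIN-style F0 estimation on a single analysis frame.
    Parameter names and defaults are illustrative, not the paper's."""
    x = np.asarray(frame, dtype=float)
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    w = tau_max                       # integration window length in samples
    assert len(x) >= 2 * tau_max, "frame must span at least two maximum periods"

    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])

    # Cumulative mean normalized difference function d'(tau)
    dprime = np.ones_like(d)
    dprime[1:] = d[1:] * np.arange(1, len(d)) / np.maximum(np.cumsum(d[1:]), 1e-12)

    # Absolute threshold: the first dip below the threshold wins;
    # fall back to the global minimum if no dip qualifies.
    search = dprime[tau_min:tau_max]
    below = np.where(search < threshold)[0]
    tau = tau_min + (below[0] if below.size else int(np.argmin(search)))
    return fs / tau
```

In practice a parabolic fit around the selected lag refines the estimate below one-sample resolution, as the paper describes.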


international conference on acoustics, speech, and signal processing | 1997

Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited

Hideki Kawahara

A simple new procedure called STRAIGHT (speech transformation and representation using adaptive interpolation of weighted spectrum) has been developed. STRAIGHT uses pitch-adaptive spectral analysis combined with a surface reconstruction method in the time-frequency region, and an excitation source design based on phase manipulation. It preserves the bilinear surface in the time-frequency region and allows for over 600% manipulation of such speech parameters as pitch, vocal tract length, and speaking rate, without further degradation due to the parameter manipulation.
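
The pitch-adaptive analysis mentioned above can be pictured, very roughly, as tying the analysis window length to the local pitch period. The sketch below shows only that idea, with assumed names and window choices; it does not reproduce STRAIGHT's surface reconstruction in the time-frequency region or its phase-manipulated excitation design.

```python
import numpy as np

def pitch_adaptive_spectrum(x, fs, f0, t_center, n_fft=2048, n_periods=3.0):
    """Illustrative sketch only: a short-time power spectrum whose window
    length is tied to the local pitch period (here `n_periods` periods wide,
    an assumed value). Assumes `t_center` lies away from the signal edges."""
    period = fs / f0                          # local pitch period in samples
    half = int(n_periods * period / 2)
    c = int(t_center * fs)
    seg = np.asarray(x[c - half:c + half], dtype=float)
    win = np.hanning(len(seg))
    return np.abs(np.fft.rfft(seg * win, n_fft)) ** 2
```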


international conference on acoustics, speech, and signal processing | 2008

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Hideki Kawahara; Masanori Morise; Toru Takahashi; Ryuichi Nisimura; Toshio Irino; Hideki Banno

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from temporally stable power and interference-free spectra, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates ad-hoc parameter tuning and the heavy demand on computational power, from which STRAIGHT has suffered in the past.
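
One way to picture the temporally stable power spectrum is the "tandem" averaging of two power spectra taken half a fundamental period apart, which cancels most of the periodic temporal fluctuation of the short-time spectrum. The sketch below illustrates only that averaging step; the window type and length are assumed for illustration, and the F0-adaptive spectral smoothing and cepstral liftering stages are omitted.

```python
import numpy as np

def tandem_power_spectrum(x, fs, f0, t_center, n_fft=4096):
    """Illustrative sketch: average two windowed power spectra taken half a
    fundamental period apart. Window choices are assumptions, and `t_center`
    is assumed to lie away from the signal edges."""
    x = np.asarray(x, dtype=float)
    period = fs / f0                              # fundamental period in samples
    win_len = int(round(2 * period))              # two periods, an assumed choice
    win = np.hanning(win_len)

    def power_spec(center):
        start = int(round(center)) - win_len // 2
        return np.abs(np.fft.rfft(x[start:start + win_len] * win, n_fft)) ** 2

    c = t_center * fs
    return 0.5 * (power_spec(c) + power_spec(c + period / 2))
```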


Journal of the Acoustical Society of America | 2005

The processing and perception of size information in speech sounds.

David R. R. Smith; Roy D. Patterson; Richard E. Turner; Hideki Kawahara; Toshio Irino

There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.


international conference on acoustics, speech, and signal processing | 2003

Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation

Hideki Kawahara; Hisami Matsui

An elastic spectral distance measure, based on F0-adaptive pitch-synchronous spectral estimation and selective elimination of periodicity interference, which was developed for the high-quality speech modification procedure STRAIGHT [1], is introduced to provide a basis for auditory morphing. The proposed measure is implemented on a low-dimensional piecewise bilinear time-frequency mapping between the target and the original speech representations. Preliminary test results of morphing emotional speech samples indicated that the proposed procedure provides perceptually monotonic and high-quality interpolation and extrapolation of CD-quality speech samples.
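
Once two utterances have been mapped onto a common time-frequency grid, the morphing itself reduces to interpolating (or, for rates outside [0, 1], extrapolating) the aligned parameters. The sketch below shows only that final step with hypothetical names; it does not implement the piecewise bilinear time-frequency alignment or the elastic distance metric described in the paper.

```python
import numpy as np

def morph_frame(log_spec_a, log_spec_b, f0_a, f0_b, rate):
    """Illustrative sketch: linear interpolation of already-aligned
    log-spectral envelopes and log-F0. rate=0 gives utterance A, rate=1
    gives utterance B; values outside [0, 1] extrapolate."""
    log_spec = (1.0 - rate) * np.asarray(log_spec_a) + rate * np.asarray(log_spec_b)
    f0 = np.exp((1.0 - rate) * np.log(f0_a) + rate * np.log(f0_b))
    return log_spec, f0
```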


Current Biology | 2008

Auditory Adaptation in Voice Perception

Stefan R. Schweinberger; Christoph Casper; Nadine Hauthal; Jürgen M. Kaufmann; Hideki Kawahara; Nadine Kloth; David M.C. Robertson; Adrian P. Simpson; Romi Zäske

Perceptual aftereffects following adaptation to simple stimulus attributes (e.g., motion, color) have been studied for hundreds of years. A striking recent discovery was that adaptation also elicits contrastive aftereffects in visual perception of complex stimuli and faces [1-6]. Here, we show for the first time that adaptation to nonlinguistic information in voices elicits systematic auditory aftereffects. Prior adaptation to male voices causes a voice to be perceived as more female (and vice versa), and these auditory aftereffects were measurable even minutes after adaptation. By contrast, crossmodal adaptation effects were absent, both when male or female first names and when silently articulating male or female faces were used as adaptors. When sinusoidal tones (with frequencies matched to male and female voice fundamental frequencies) were used as adaptors, no aftereffects on voice perception were observed. This excludes explanations for the voice aftereffect in terms of both pitch adaptation and postperceptual adaptation to gender concepts and suggests that contrastive voice-coding mechanisms may routinely influence voice perception. The role of adaptation in calibrating properties of high-level voice representations indicates that adaptation is not confined to vision but is a ubiquitous mechanism in the perception of nonlinguistic social information from both faces and voices.


Speech Communication | 1999

Multiple period estimation and pitch perception model

Alain de Cheveigné; Hideki Kawahara

The pitch of a periodic sound is strongly correlated with its period. To perceive the multiple pitches evoked by several simultaneous sounds, the auditory system must estimate their periods. This paper proposes a process in which the periodic sounds are canceled in turn (multistep cancellation model) or simultaneously (joint cancellation model). As an example of multistep cancellation, the pitch perception model of Meddis and Hewitt (1991a, 1991b) can be associated with the concurrent vowel identification model of Meddis and Hewitt (1992). A first period estimate is used to suppress correlates of the dominant sound, and a second period is then estimated from the remainder. The process may be repeated to estimate further pitches, or else to recursively refine the initial estimates. Meddis and Hewitt's models are spectrotemporal (filter channel selection based on temporal cues), but multistep cancellation can also be performed in the spectral or time domain. In the joint cancellation model, estimation and cancellation are performed together in the time domain: the parameter space of several cascaded cancellation filters is searched exhaustively for a minimum output. The parameters that yield this minimum are the period estimates. Joint cancellation is guaranteed to find all periods, except in certain situations for which the stimulus is inherently ambiguous.
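
The cancellation filters behind both models are simple time-domain comb filters of the form y[n] = x[n] - x[n - T]. A brute-force sketch of the joint cancellation idea for two concurrent periods might look as follows; the period range and the exhaustive grid search are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def cancel(x, period):
    """Single-period cancellation filter: y[n] = x[n] - x[n - period]."""
    y = x.copy()
    y[period:] -= x[:-period]
    y[:period] = 0.0                 # ignore the filter's warm-up region
    return y

def joint_cancellation(x, fs, fmin=80.0, fmax=400.0):
    """Illustrative sketch: cascade two cancellation filters and search all
    period pairs for the minimum residual power; the minimizing pair gives
    the two period (hence F0) estimates. Assumes `x` spans several periods
    of the lowest candidate F0."""
    x = np.asarray(x, dtype=float)
    taus = range(int(fs / fmax), int(fs / fmin) + 1)
    best = None
    for t1 in taus:
        y1 = cancel(x, t1)
        for t2 in taus:
            residual = cancel(y1, t2)[t1 + t2:]
            power = np.mean(residual ** 2)
            if best is None or power < best[0]:
                best = (power, t1, t2)
    _, t1, t2 = best
    return fs / t1, fs / t2
```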


Journal of the Acoustical Society of America | 1997

Concurrent vowel identification. I. Effects of relative amplitude and F0 difference

Alain de Cheveigné; Hideki Kawahara; Minoru Tsuzaki; Kiyoaki Aikawa

Subjects identified concurrent synthetic vowel pairs that differed in relative amplitude and fundamental frequency (F0). Subjects were allowed to report one or two vowels for each stimulus, rather than forced to report two vowels as was the case in previously reported experiments of the same type. At all relative amplitudes, identification was better at a fundamental frequency difference (ΔF0) of 6% than at 0%, but the effect was larger when the target vowel amplitude was below that of the competing vowel (−10 or −20 dB). The existence of a ΔF0 effect when the target is weak relative to the competing vowel is interpreted as evidence that segregation occurs according to a mechanism of cancellation based on the harmonic structure of the competing vowel. Enhancement of the target based on its own harmonic structure is unlikely, given the difficulty of estimating the fundamental frequency of a weak target. Details of the pattern of identification as a function of amplitude and vowel pair were found to be inco...


Archive | 2005

Underlying Principles of a High-quality Speech Manipulation System STRAIGHT and Its Application to Speech Segregation

Hideki Kawahara; Toshio Irino

Testing human performance using ecologically relevant stimuli is crucial, and STRAIGHT provides powerful means and strategies for doing this. This article outlines the underlying principles of STRAIGHT and the morphing procedure to give potential users a general understanding of a new research strategy, “systematic downgrading.” The strategy appears to open up new possibilities for testing human performance without disturbing natural listening conditions.


IEEE Transactions on Signal Processing | 1993

Signal reconstruction from modified auditory wavelet transform

Toshio Irino; Hideki Kawahara

The authors propose a new method for signal modification in an auditory peripheral representation: an auditory wavelet transform and algorithms for reconstructing a signal from a modified wavelet transform. They present the characteristics of signal analysis, synthesis, and reconstruction, as well as the data reduction criteria for signal modification.
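
The general flavor of reconstructing a signal from a modified filterbank representation can be sketched as follows. A generic log-spaced Gaussian filterbank and a least-squares style resynthesis are used here purely as stand-ins for the auditory wavelet transform and the reconstruction algorithms of the paper, so every function name and parameter below is an illustrative assumption.

```python
import numpy as np

def gaussian_filterbank(n, fs, n_channels=32, fmin=100.0, fmax=6000.0, q=10.0):
    """Log-spaced Gaussian bandpass responses sampled on the rfft grid.
    All parameter values are assumptions for illustration."""
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    centers = np.geomspace(fmin, fmax, n_channels)
    return np.array([np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2)
                     for fc in centers])

def analyze(x, filters):
    """Channel signals: each row is the input filtered by one band."""
    X = np.fft.rfft(x)
    return np.fft.irfft(filters * X, len(x))

def resynthesize(channels, filters):
    """Least-squares style reconstruction: refilter each (possibly modified)
    channel, sum across bands, and normalize by the overall filterbank
    response. With unmodified channels this recovers the input wherever the
    filterbank has support."""
    n = channels.shape[1]
    C = np.fft.rfft(channels, axis=1)
    Y = np.sum(filters * C, axis=0) / np.maximum(np.sum(filters ** 2, axis=0), 1e-8)
    return np.fft.irfft(Y, n)
```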

Collaboration


Dive into Hideki Kawahara's collaboration.

Top Co-Authors

Kiyoaki Aikawa

Tokyo University of Technology
