
Publication


Featured research published by Toshio Irino.


Journal of the Acoustical Society of America | 1997

A time-domain, level-dependent auditory filter: The gammachirp

Toshio Irino; Roy D. Patterson

A frequency-modulation term has been added to the gammatone auditory filter to produce a filter with an asymmetric amplitude spectrum. When the degree of asymmetry in this “gammachirp” auditory filter is associated with stimulus level, the gammachirp is found to provide an excellent fit to 12 sets of notched-noise masking data from three different studies. The gammachirp has a well-defined impulse response, unlike the conventional roex auditory filter, and so it is an excellent candidate for an asymmetric, level-dependent auditory filterbank in time-domain models of auditory processing.
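The gammachirp described above has a closed-form impulse response: a gammatone envelope whose carrier phase carries an added logarithmic frequency-modulation term. A minimal NumPy sketch follows; the parameter values (n = 4, b = 1.019, a fixed chirp coefficient c) and the Glasberg-Moore ERB expression are illustrative defaults, not fits to the masking data in the paper.

```python
import numpy as np

def gammachirp(fr, fs=16000, n=4, b=1.019, c=-1.0, dur=0.025):
    """Gammachirp impulse response:
    g(t) = t**(n-1) * exp(-2*pi*b*ERB(fr)*t) * cos(2*pi*fr*t + c*ln(t)).
    The c*ln(t) term is the frequency-modulation ("chirp") term that makes
    the amplitude spectrum asymmetric; with c = 0 it reduces to a gammatone.
    """
    erb = 24.7 * (4.37 * fr / 1000.0 + 1.0)   # Glasberg & Moore ERB (Hz)
    t = np.arange(1, int(dur * fs) + 1) / fs  # start one sample in: ln(0) undefined
    g = (t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb * t)
         * np.cos(2.0 * np.pi * fr * t + c * np.log(t)))
    return g / np.max(np.abs(g))              # normalise peak amplitude to 1
```

In the level-dependent filter of the paper, c is tied to stimulus level rather than fixed, which is what produces the level-dependent asymmetry.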


International Conference on Acoustics, Speech, and Signal Processing | 2008

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Hideki Kawahara; Masanori Morise; Toru Takahashi; Ryuichi Nisimura; Toshio Irino; Hideki Banno

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from temporally stable power and interference-free spectra, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates ad-hoc parameter tuning and the heavy demand on computational power, from which STRAIGHT has suffered in the past.
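The "temporally stable" part rests on the TANDEM principle: for a periodic signal, the short-time power spectrum fluctuates with the analysis-window position at the rate of the fundamental period T0, and averaging two power spectra taken T0/2 apart largely cancels that fluctuation. A minimal sketch of that one idea (window length and placement are illustrative choices, not the paper's exact design):

```python
import numpy as np

def tandem_power_spectrum(x, fs, t_center, f0):
    """Average the power spectra of two windows centred T0/2 apart
    (i.e. at t_center -/+ T0/4).  For a periodic signal this cancels the
    window-position-dependent fluctuation of the short-time power spectrum.
    Assumes t_center lies well inside the signal."""
    t0 = 1.0 / f0
    win_len = int(round(2 * t0 * fs))          # window spans ~2 pitch periods
    w = np.hanning(win_len)
    half = win_len // 2
    spectra = []
    for dt in (-t0 / 4.0, t0 / 4.0):           # two centres, T0/2 apart
        c = int(round((t_center + dt) * fs))
        seg = x[c - half:c - half + win_len] * w
        spectra.append(np.abs(np.fft.rfft(seg)) ** 2)
    return 0.5 * (spectra[0] + spectra[1])
```

Note that the sketch presupposes a known F0; in Tandem-STRAIGHT itself the representation also serves as the basis for estimating F0 and aperiodicity.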


Journal of the Acoustical Society of America | 2005

The processing and perception of size information in speech sounds.

David R. R. Smith; Roy D. Patterson; Richard E. Turner; Hideki Kawahara; Toshio Irino

There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.


Speech Communication | 2002

Segregating information about the size and shape of the vocal tract using a time-domain auditory model: the stabilised wavelet-Mellin transform

Toshio Irino; Roy D. Patterson

We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex.
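The size-invariance claim rests on the dilation property of the Mellin transform. A brief sketch in standard one-dimensional notation (the paper works with a two-dimensional version applied to auditory images):

```latex
% Mellin transform of f along the complex line s = p + jc:
M_f(s) = \int_0^\infty f(t)\, t^{s-1}\, dt .
% For a time-dilated copy f_a(t) = f(at), substituting u = at gives
M_{f_a}(s) = \int_0^\infty f(at)\, t^{s-1}\, dt
           = a^{-s} \int_0^\infty f(u)\, u^{s-1}\, du
           = a^{-s} M_f(s) .
% Hence |M_{f_a}(p + jc)| = a^{-p}\,|M_f(p + jc)|: the distribution over c
% (the "shape" information) is unchanged, and the scale factor a (the
% "size" information) appears only as the scalar constant a^{-p}.
```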


Journal of the Acoustical Society of America | 2004

Robust and accurate fundamental frequency estimation based on dominant harmonic components

Tomohiro Nakatani; Toshio Irino

This paper presents a new method for robust and accurate fundamental frequency (F0) estimation in the presence of background noise and spectral distortion. Degree of dominance and dominance spectrum are defined based on instantaneous frequencies. The degree of dominance allows one to evaluate the magnitude of individual harmonic components of the speech signals relative to background noise while reducing the influence of spectral distortion. The fundamental frequency is more accurately estimated from reliable harmonic components which are easy to select given the dominance spectra. Experiments are performed using white and babble background noise with and without spectral distortion as produced by a SRAEN filter. The results show that the present method is better than previously reported methods in terms of both gross and fine F0 errors.
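The core move is to estimate F0 from a subset of reliable harmonic components rather than from the whole spectrum. That idea can be illustrated with a deliberately simplified harmonic-sum estimator; this is a stand-in for exposition, not the paper's dominance spectrum, which additionally weights components by their instantaneous-frequency behaviour relative to noise.

```python
import numpy as np

def f0_by_harmonic_sum(x, fs, f0_min=60.0, f0_max=400.0, n_harm=5):
    """Toy F0 estimator: score each candidate F0 by summing spectral power
    at its first few harmonics, then pick the best-scoring candidate.
    A candidate supported by several strong harmonics wins even when
    individual components are degraded."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    candidates = np.arange(f0_min, f0_max, 1.0)
    scores = []
    for f0 in candidates:
        idx = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harm + 1)]
        scores.append(spec[idx].sum())
    return candidates[int(np.argmax(scores))]
```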


Journal of the Acoustical Society of America | 2006

Comparison of the roex and gammachirp filters as representations of the auditory filter

Masashi Unoki; Toshio Irino; Brian R. Glasberg; Brian C. J. Moore; Roy D. Patterson

Although the rounded-exponential (roex) filter has been successfully used to represent the magnitude response of the auditory filter, recent studies with the roex(p, w, t) filter reveal two serious problems: the fits to notched-noise masking data are somewhat unstable unless the filter is reduced to a physically unrealizable form, and there is no time-domain version of the roex(p, w, t) filter to support modeling of the perception of complex sounds. This paper describes a compressive gammachirp (cGC) filter with the same architecture as the roex(p, w, t) which can be implemented in the time domain. The gain and asymmetry of this parallel cGC filter are shown to be comparable to those of the roex(p, w, t) filter, but the fits to masking data are still somewhat unstable. The roex(p, w, t) and parallel cGC filters were also compared with the cascade cGC filter [Patterson et al., J. Acoust. Soc. Am. 114, 1529-1542 (2003)], which was found to provide an equivalent fit with 25% fewer coefficients. Moreover, the fits were stable. The advantage of the cascade cGC filter appears to derive from its parsimonious representation of the high-frequency side of the filter. It is concluded that cGC filters offer better prospects than roex filters for the representation of the auditory filter.
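For reference, the rounded-exponential family is defined as a weighting function of the normalised frequency deviation from the filter's centre frequency; in its commonly cited form (given here for orientation, not as the exact parameterisation fitted in the paper):

```latex
% Basic roex(p) weighting, with g the normalised deviation from centre
% frequency f_c and p controlling the slope:
W(g) = (1 + p\,g)\, e^{-p g}, \qquad g = \frac{|f - f_c|}{f_c} .
% The roex(p, w, t) form adds a second, shallower exponential "tail",
% mixed in with weight w:
W(g) = (1 - w)(1 + p\,g)\, e^{-p g} + w\,(1 + t\,g)\, e^{-t g} .
```

Because W(g) specifies only a magnitude response, it has no unique impulse response, which is the time-domain limitation the paper addresses with the gammachirp.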


Archive | 2005

Underlying Principles of a High-quality Speech Manipulation System STRAIGHT and Its Application to Speech Segregation

Hideki Kawahara; Toshio Irino

Testing human performance with ecologically relevant stimuli is crucial, and STRAIGHT provides powerful means and strategies for doing so. This article outlines the underlying principles of STRAIGHT and the morphing procedure, to give potential users a general understanding of a new research strategy, "systematic downgrading." The strategy appears to open up new possibilities for testing human performance without disturbing natural listening conditions.


IEEE Transactions on Signal Processing | 1993

Signal reconstruction from modified auditory wavelet transform

Toshio Irino; Hideki Kawahara

The authors propose a new method for signal modification in auditory peripheral representation: an auditory wavelet transform and algorithms for reconstructing a signal from a modified wavelet transform. They present the characteristics of signal analysis, synthesis, and reconstruction and also the data reduction criteria for signal modification.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown

Hideki Kawahara; Ryuichi Nisimura; Toshio Irino; Masanori Morise; Toru Takahashi; Hideki Banno

A generalized framework of auditory morphing based on the speech analysis, modification and resynthesis system STRAIGHT is proposed that enables each morphing rate of representational aspects to be a function of time, including the temporal axis itself. Two types of algorithms were derived: an incremental algorithm for real-time manipulation of morphing rates and a batch processing algorithm for off-line post-production applications. By defining morphing in terms of the derivative of mapping functions in the logarithmic domain, breakdown of morphing resynthesis found in the previous formulation in the case of extrapolations was eliminated. A method to alleviate perceptual defects in extrapolation is also introduced.
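Working in the logarithmic domain is what makes extrapolation behave: a morphing rate acts linearly on the logarithms of positive-valued parameters (F0, spectral level, time-axis stretch), so rates outside [0, 1] extrapolate smoothly instead of driving the parameters into invalid values. A minimal sketch of that one idea (not the full STRAIGHT framework, which defines morphing via derivatives of time-frequency mapping functions):

```python
import numpy as np

def log_domain_morph(a, b, rate):
    """Morph two positive-valued parameter tracks (e.g. F0 contours) in the
    logarithmic domain.  rate may vary with time and may lie outside [0, 1],
    in which case the result is an extrapolation beyond a or b; the output
    stays positive for any real rate."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rate = np.asarray(rate, dtype=float)
    return np.exp((1.0 - rate) * np.log(a) + rate * np.log(b))
```

For example, morphing F0 values of 100 Hz and 200 Hz at rate 0.5 gives their geometric mean (about 141 Hz), while rate 2.0 extrapolates past the second speaker to 400 Hz.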


Speech Communication | 2008

A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

Tomohiro Nakatani; Shigeaki Amano; Toshio Irino; Kentaro Ishizuka; Tadahisa Kondo

This paper proposes a method for fundamental frequency (F0) estimation and voicing decision that can handle wide-ranging speech signals including adult and infant utterances recorded in real noisy environments. In particular, infant utterances have unique characteristics that are different from those of adults, such as a wide F0 range, F0 abrupt transitions, and unique energy distribution patterns over frequencies. Therefore, conventional methods that were developed mainly for adult utterances do not necessarily work well for infant utterances especially when the signals are contaminated by background noise. Several techniques are introduced into the proposed method to cope with this problem. We show that the ripple-enhanced power spectrum based method (REPS) can estimate the F0s robustly, and that the use of instantaneous frequency (IF) enables us to refine the accuracy of the F0 estimates. In addition, the degree of dominance defined based on the IF is introduced as a robust voicing decision measure. The effectiveness of the proposed method is confirmed in terms of gross pitch errors and voicing decision errors in comparison with the recently proposed methods, Praat and YIN, using both longitudinal recordings of Japanese infant utterances and adult utterances.

Collaboration



Top Co-Authors

Tomohiro Nakatani

Nippon Telegraph and Telephone


Minoru Tsuzaki

Kyoto City University of Arts


Yasutaka Shimizu

Tokyo Institute of Technology
