Publication


Featured research published by Tetsuo Funada.


Signal Processing | 1987

A method for the extraction of spectral peaks and its application to fundamental frequency estimation of speech signals

Tetsuo Funada

This paper describes a method to estimate the temporal patterns of the fundamental frequency of speech signals. This method makes use of the first and the second derivatives of a short-time power spectrum with respect to frequency. The time window of the spectrum is designated as i exp(−BiT) for i⩾0, and an effective algorithm to calculate the derivatives is proposed. An algorithm to determine voiced intervals is also proposed. The results of the application to the speech signals of Japanese vowels and the ten digit utterances show the effectiveness of the proposed method for obtaining a smoothed pitch contour.


International Symposium on Communications and Information Technologies | 2007

Noisy word recognition using a feature based on ternarized spectral slope

Megumi Umeno; Tetsuo Funada; Hideyuki Nomura

In a previous paper, we proposed the feature FTTSS (Fourier transform of ternarized spectral slope), based on derivatives of the power spectrum with respect to frequency, to develop a word recognition system that is robust in noisy environments, and we confirmed its noise robustness relative to MFCC by applying it to word recognition with HMMs. In general, word recognition with HMMs improves when features that express temporal variations, such as DeltaMFCC or DeltaFTTSS, are added, because an HMM can deal only with piecewise stationary signals. We have in fact examined the effectiveness of DeltaFTTSS in word recognition. We suppose that features showing raw temporal variations of spectral power are effective in speech recognition, and that ternary conversion of features may reduce the deterioration of recognition performance caused by noise corruption. In this research, we therefore propose a new feature, FTTTS (Fourier transform of ternarized temporal slope), in place of DeltaFTTSS. FTTTS is defined as the Fourier transform, along frequency, of the smoothed ternarized temporal variations of spectral power at a specific frequency. By applying the features to word recognition with HMMs, we confirmed experimentally that the proposed feature FTTTS is more robust to noise at SNRs of 0-20 dB than FTTSS+DeltaFTTSS or the conventional feature MFCC+DeltaMFCC.
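
The ternarization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold value, and the use of a plain frame-to-frame difference as the "temporal slope" are all assumptions, and the smoothing step mentioned in the abstract is omitted.

```python
import cmath


def ternarize(values, threshold):
    """Map each value to +1, 0, or -1 using a symmetric threshold (assumed form)."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]


def temporal_slope(prev_frame, next_frame):
    """Per-frequency-bin change in spectral power between two frames (assumption)."""
    return [b - a for a, b in zip(prev_frame, next_frame)]


def dft_magnitudes(seq):
    """Magnitude of the discrete Fourier transform taken along the frequency axis."""
    n = len(seq)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * m / n) for m, x in enumerate(seq)))
        for k in range(n)
    ]


def ternarized_temporal_feature(prev_frame, next_frame, threshold=1.0):
    """Hypothetical FTTTS-like feature: ternarize the temporal slope, then transform."""
    slope = temporal_slope(prev_frame, next_frame)
    return dft_magnitudes(ternarize(slope, threshold))
```

Because every bin is reduced to one of three levels, small noise-induced changes in spectral power fall inside the threshold band and map to 0, which is one plausible reading of why ternary conversion might limit noise sensitivity.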


Systems and Computers in Japan | 2006

Subtopic segmentation in lecture speech for the creation of lecture video contents

Noboru Kanedera; Asuka Sumida; Takao Ikehata; Tetsuo Funada

Although still rare, video instructional materials which can be used over a network are on the increase. One of the reasons for the rarity of video instructional materials is thought to be the time and effort necessary for video editing. In this paper, the authors examine a method for automatically estimating subtopic segmentation positions from the speech information of an unedited lecture video, with the purpose of supporting the preparation of video instructional materials. Subtopic segmentation positions were estimated with comparisons of successive indexes using dynamic programming. The indexes were obtained by independent component analysis of text information attained from the speech recognition processing of the video. Through an experiment using unedited lecture video from five instructors, the proposed method was found to have a segmentation capacity equal to or better than the Hearst method, while allowing the number of segments to be set freely. It was also confirmed that the subtopic segmentation capacity using speech recognition output was equivalent to the use of transcribed text.


Electronics and Communications in Japan, Part III: Fundamental Electronic Science | 1997

Pitch extraction and voiced/unvoiced detection of speech by cross‐coupling multi‐layered neural network with feedback architecture

Hideo Miyabayashi; Tetsuo Funada

Pitch frequency is one of the most important voice characteristics, and its accurate extraction is important not only in speech analysis and synthesis, but also in speech coding, speech recognition, speaker recognition, and the like. Existing methods of improving extraction accuracy include waveform processing, correlative processing, and spectral processing. This paper describes the use of a neural network to extract pitch from voice features delivered from the bandpass filter pairs (BPFPs) proposed by Funada et al. Three types of multi-layered neural networks with a recurrent structure, able to learn time continuity and high-accuracy discrimination functions, are tested. The cross-coupling multi-layered neural network with feedback architecture gives the best improvement over conventional neural networks, and exhibits superior ability for learning time continuity of pitch and U/V information.


Speech Communication | 1990

A pitch extraction method using a bank of bandpass filter-pairs

Tetsuo Funada; Tatsuya Suzuki; Long Yu

Pitch detection remains one of the most difficult problems in speech analysis. Therefore, to detect pitch contours of speech, we have developed a new method which is quite different from conventional ones. The method utilizes a bank of bandpass filter-pairs; it is a fully continuous method in the time domain. This paper describes the parameter optimization of the filter-pair for a database of enlarged vocabulary and the integration of the filter-pair method by adding a voicing detector. Compared with conventional pitch detection methods, the proposed method produced a low Gross Pitch Error rate.
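
The abstract does not define how its Gross Pitch Error rate is computed. A minimal sketch of one common convention follows, assuming that a frame counts as a gross error when the estimate deviates from the reference by more than 20% and that unvoiced reference frames are marked with 0 Hz; the function name, the tolerance, and the voicing convention are assumptions, not details from the paper.

```python
def gross_pitch_error_rate(reference_hz, estimated_hz, tolerance=0.2):
    """Fraction of voiced reference frames whose estimate is off by more than
    `tolerance` times the reference pitch (assumed convention)."""
    # Keep only frames that the reference marks as voiced (pitch > 0).
    voiced = [(r, e) for r, e in zip(reference_hz, estimated_hz) if r > 0]
    if not voiced:
        return 0.0
    gross = sum(1 for r, e in voiced if abs(e - r) > tolerance * r)
    return gross / len(voiced)


# Example: three voiced frames, two of which deviate by more than 20%.
rate = gross_pitch_error_rate([100, 100, 0, 200], [105, 150, 120, 100])
```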


Journal of the Acoustical Society of America | 2008

A numerical analysis of fluctuations in pressure wave within the larynx using two‐dimensional asymmetrical vocal folds model

Hideyuki Nomura; Tomoo Kamakura; Tetsuo Funada

Numerical simulations of pathological voice production and estimations of pressure wave fluctuations are performed based on a two‐dimensional asymmetrical vocal folds (VFs) model. The asymmetrical VFs model takes into account geometrical asymmetries (the thickness, effective depth of vibration region, and lateral rest position) and mechanical asymmetries (the Young's modulus, density, and viscosity of VF tissues). Simulation results based on the asymmetrical VFs model show that the left and right VFs vibrate with a phase difference. The pressure waves obtained within the larynx and vocal tract show fluctuations of fundamental frequency, amplitude, and waveform. To evaluate the fluctuations quantitatively, the coefficient of variation of the fundamental frequency, the coefficient of variation of the amplitude, and the harmonic‐to‐noise ratio are estimated. With increasing VF asymmetries, especially in the effective depth and the density of VF elements, remarkable fluctuations are observed n...
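
The coefficient of variation used above as a fluctuation measure is the standard deviation divided by the mean. A minimal sketch, assuming a list of per-cycle fundamental frequency (or amplitude) values and the population standard deviation; the abstract does not say which estimator the authors used, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed estimator)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / m


# Example: per-cycle fundamental frequencies in Hz (illustrative numbers only).
cv_f0 = coefficient_of_variation([90.0, 110.0])
```

A perfectly periodic signal gives a value of 0, and larger values indicate stronger cycle-to-cycle fluctuation.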


Journal of the Acoustical Society of America | 2008

Dependency of recognition rate on number of words for text‐independent speaker recognition using vector quantization

Hidenori Shimizu; Tetsuo Funada

In this research, we discuss speaker recognition using the Kohonen feature map. A map is constructed for each speaker and is trained using log‐power and fourteenth‐order mel‐frequency cepstral coefficients (MFCC) and their temporal differences. The quantization distortion is computed between the input speech and a specific vector on the feature map of each speaker. We conduct a speaker recognition experiment based on VQ distortion. Utterances of Japanese prefecture names are used as speech data. In particular, we examine the dependency of the recognition rate on the number of words used for recognition. In our speaker identification experiments, the system achieves a correct recognition rate of 98.9% using a single word for 40 male speakers, and attains 100% using more than three words. Moreover, we confirmed the superiority of VQ over HMM under the same experimental conditions.
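
Identification by VQ distortion, as described above, can be sketched as follows. This is an illustration under stated assumptions: each speaker's codebook is taken as a given list of vectors (training the Kohonen feature map is not shown), the distortion is the mean squared Euclidean distance to the nearest code vector, and all function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(frames, codebook):
    """Mean distance from each input frame to its nearest code vector."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook yields the lowest distortion."""
    return min(codebooks, key=lambda name: vq_distortion(frames, codebooks[name]))


# Toy two-dimensional example; real features would be log-power and MFCC vectors.
codebooks = {"a": [[0.0, 0.0], [1.0, 1.0]], "b": [[5.0, 5.0], [6.0, 6.0]]}
best = identify_speaker([[0.1, 0.0], [0.9, 1.1]], codebooks)
```

Averaging over more frames, for example by using several words, gives a more stable distortion estimate, which is in line with the reported improvement when more words are used.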


Journal of the Acoustical Society of America | 2006

Lung pressure dependence of glottal sound source

Hideyuki Nomura; Tetsuo Funada

The present study investigates the lung pressure dependence of the vibration of vocal cords using numerical experiments based on our proposed glottal sound source model. The glottal sound source model is described as a coupled problem between unsteady glottal jets and mechanical vocal cords. The vocal cord is assumed to be an elastic cover with the effective mass of the vocal cord. To simulate the mechanical properties of vocal cords, the elastic cover is supported by distributed small mechanical elements, each consisting of a spring and a damper. The speech production process can be predicted by alternately solving the motion of glottal jets and the vocal cords’ vibration. Results of this simulation show that the fundamental frequency of vocal cords’ vibration and the propagation velocity of the mucosal wave first increase and then remain constant with lung pressure. The threshold lung pressure is 200–400 Pa and the propagation velocity of mucosal waves is of the order of 1 m/s, both of which are consistent with measured values. These res...


Systems and Computers in Japan | 1991

Phoneme recognition with elliptic discrimination neural units

Noboru Kanedera; Tetsuo Funada

Many researchers have achieved high phoneme recognition rates with multilayered neural networks using linear discrimination neural (LDN) units. However, it is difficult to analyze which components of the input are important to each unit in those LDN networks. This paper proposes a multilayer neural network with elliptic discrimination neural (EDN) units so that the function of each unit in the network can be interpreted more definitely. The center of the elliptic discrimination boundary of a neural unit corresponds to a typical point in the input space. The radii of the ellipse express the extent of the corresponding components in the input space, so it becomes clear which components of the input space are important to each unit in the EDN network. To compare the performance of EDN and LDN networks, recognition experiments on the phonemes /b, d, g/ in 5240 tokens of a Japanese speech database were carried out. In the experiments, the EDN networks achieved recognition rates as high as that of an LDN network. It was also confirmed which components of the input space are important to each unit in the EDN network.


Journal of the Acoustical Society of America | 1988

Speech analysis using a time‐varying ARX model for separating the source‐tract coupling of vowels

Tetsuo Funada

The purpose of this research is to extract formant frequencies precisely and to classify voiced/unvoiced intervals accurately based on a source‐tract model. A sequential estimation of the source wave (i.e., the glottal volume flow) and the vocal tract (VT) characteristics is achieved by using a time‐varying “ARX model,” where the term ARX model refers to an AR (autoregressive) model with an auxiliary nonwhite input (X input). This X input indicates the glottal volume flow in the present research. Applications to synthetic vowels generated by the two‐mass model demonstrated the following results: (1) Much information on the glottal closure and opening was obtained from the X input; and (2) compared to the conventional (autocorrelation) LP method, formant frequencies (especially the first formant) during the open period of the glottis were estimated more accurately. It has also been observed from real vowels uttered by a male speaker that the phase of the X input agrees with the phase of the glottal movemen...

Collaboration

Top Co-Authors

Hideyuki Nomura
University of Electro-Communications

Noboru Kanedera
International Computer Science Institute

Hideo Miyabayashi
Toyama National College of Maritime Technology

Asuka Sumida
Ishikawa National College of Technology

Takao Ikehata
Ishikawa National College of Technology

Tomoo Kamakura
University of Electro-Communications

Sukeyasu Kanno
Industrial Research Institute