Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Satoru Hayamizu is active.

Publication


Featured research published by Satoru Hayamizu.


IEEE Transactions on Speech and Audio Processing | 2000

Speech enhancement based on the subspace method

Futoshi Asano; Satoru Hayamizu; Takeshi Yamada; Satoshi Nakamura

A method of speech enhancement using microphone-array signal processing based on the subspace method is proposed and evaluated. The method consists of two stages corresponding to different types of noise. In the first stage, less-directional ambient noise is reduced by eliminating the noise-dominant subspace. This is achieved by weighting the eigenvalues of the spatial correlation matrix, based on the fact that the energy of less-directional noise spreads over all eigenvalues while that of directional components is concentrated in a few dominant eigenvalues. In the second stage, the spectrum of the target source is extracted from the mixture of spectra of the multiple directional components remaining in the modified spatial correlation matrix by using a minimum variance beamformer. Finally, the proposed method is evaluated in both a simulated model environment and a real environment.
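As a rough sketch of these two stages, the following NumPy snippet weights the eigenvalues of a spatial correlation matrix and then forms a minimum variance beamformer from the modified matrix. The hard eigenvalue floor, the floor parameter, and the diagonal loading are illustrative assumptions of the sketch, not the paper's exact weighting rule.

import numpy as np

def subspace_enhance(R, a, floor=0.1):
    # R: M x M Hermitian spatial correlation matrix for one frequency bin
    # a: steering vector (M,) toward the target source
    w, V = np.linalg.eigh(R)                      # eigenvalues in ascending order
    # Stage 1: less-directional noise spreads energy over all eigenvalues,
    # so shrink the non-dominant ones toward zero (illustrative hard floor).
    w_mod = np.where(w >= floor * w.max(), w, 0.0)
    R_mod = (V * w_mod) @ V.conj().T              # rebuild V diag(w_mod) V^H
    # Stage 2: minimum variance beamformer on the modified matrix,
    # h = R^-1 a / (a^H R^-1 a); small diagonal loading keeps R_mod invertible.
    M = R.shape[0]
    R_inv = np.linalg.inv(R_mod + 1e-6 * np.trace(R).real / M * np.eye(M))
    h = R_inv @ a / (a.conj() @ R_inv @ a)
    return h                                      # per-bin output: y = h^H x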


IEEE International Conference on Automatic Face and Gesture Recognition | 1998

Gesture recognition using HLAC features of PARCOR images and HMM based recognizer

Takio Kurita; Satoru Hayamizu

The paper proposes a gesture recognition method which uses higher order local autocorrelation (HLAC) features extracted from PARCOR images. To extract dominant information from a sequence of images, the authors apply a linear prediction coding technique to the sequences of pixel values, and PARCOR images are constructed from the resulting PARCOR coefficients. From the PARCOR images, HLAC features are extracted, and the sequences of these features are used as the input vectors of a hidden Markov model (HMM) based recognizer. Since HLAC features are inherently shift-invariant and computationally inexpensive, the proposed method is robust to shifts in the person's position and makes real-time gesture recognition possible. Experimental results of gesture recognition are shown to evaluate the performance of the proposed method.
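For illustration, here is a minimal NumPy sketch of HLAC feature extraction from a single 2-D array such as a PARCOR image. Only a handful of the standard 3x3 displacement masks are listed (the full second-order set has 25 patterns); this particular mask selection is an assumption of the sketch.

import numpy as np

# Each HLAC feature is sum over positions r of I(r) * I(r+a_1) * ... * I(r+a_N)
# for a fixed set of displacements (a_1..a_N) within a 3x3 neighborhood.
MASKS = [
    [],                    # order 0: plain sum of pixel values
    [(0, 1)],              # order 1: horizontal pair
    [(1, 0)],              # order 1: vertical pair
    [(1, 1)],              # order 1: diagonal pair
    [(0, -1), (0, 1)],     # order 2: horizontal triple
    [(-1, 0), (1, 0)],     # order 2: vertical triple
]

def hlac(img):
    """HLAC features of a 2-D array such as a PARCOR image.
    Summing over all positions makes each feature shift-invariant."""
    h, w = img.shape
    feats = []
    for disps in MASKS:
        # restrict to the interior so every shifted index stays in bounds
        prod = img[1:h-1, 1:w-1].astype(float)
        for dy, dx in disps:
            prod = prod * img[1+dy:h-1+dy, 1+dx:w-1+dx]
        feats.append(prod.sum())
    return np.array(feats)

The per-frame feature vectors would then be concatenated over time and fed to the HMM recognizer.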


Hawaii International Conference on System Sciences | 1993

HMM with protein structure grammar

Kiyoshi Asai; Satoru Hayamizu; Kentaro Onizuka

The authors propose a structure-prediction framework for proteins that uses hidden Markov models (HMMs) with a protein structure grammar. By adopting a protein structure grammar, the HMM makes it possible to treat global interactions, i.e., interactions between two secondary structures that are far apart in the sequence. In this framework, predictions of local and global structures are treated in a unified way through the local and global interactions expressed by the protein structure grammar. The relations between some of the previous methods for secondary structure prediction and HMMs are discussed, the learning algorithms for the HMMs are presented, and some experimental results on secondary structure prediction are included.
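Such models build on the standard HMM recursions. As a reference point only, here is a minimal forward-algorithm sketch over three secondary-structure states (helix, sheet, coil); the transition and initial probabilities below are placeholders rather than learned values, and the grammar constraints that capture long-range interactions are beyond this sketch.

import numpy as np

# Three secondary-structure states; all probabilities are placeholders.
A = np.array([[0.90, 0.02, 0.08],    # helix -> helix / sheet / coil
              [0.02, 0.85, 0.13],    # sheet -> ...
              [0.10, 0.10, 0.80]])   # coil  -> ...
pi = np.array([0.3, 0.2, 0.5])       # initial state distribution

def forward(obs_probs):
    """Forward algorithm: likelihood of a residue sequence under the HMM.
    obs_probs[t, i] = P(residue observed at t | state i), from the emission model."""
    alpha = pi * obs_probs[0]
    for t in range(1, len(obs_probs)):
        alpha = (alpha @ A) * obs_probs[t]
    return alpha.sum()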


Proceedings of the International Gesture Workshop on Gesture and Sign Language in Human-Computer Interaction | 1997

Are Listeners Paying Attention to the Hand Gestures of an Anthropomorphic Agent? An Evaluation Using a Gaze Tracking Method

Shuichi Nobe; Satoru Hayamizu; Osamu Hasegawa; Hideaki Takahashi

Knowing what listeners are looking at and paying attention to is important in evaluating human-anthropomorphic-agent interaction systems. A pilot study was conducted, using a gaze tracking method, on relevant aspects of an anthropomorphic agent's hand gestures in a real-time setting. It revealed that a highly informative, one-handed gesture with seemingly-interactive speech attracted attention when it had a slower stroke and/or a long post-stroke hold at the Center-Center space and upper position.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading

Satoshi Tamura; Hiroshi Ninomiya; Norihide Kitaoka; Shin Osuga; Yurie Iribe; Kazuya Takeda; Satoru Hayamizu

This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features by deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition, and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. We found that VAD is useful in both the audio and visual modalities for better lipreading and AVSR.
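As a rough sketch of the deep bottleneck feature idea, the following PyTorch module is a deep MLP with one narrow hidden layer: it is trained as a classifier, and the bottleneck activations are then reused as features. All layer sizes and the class count are illustrative, not the paper's configuration.

import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, in_dim=120, bottleneck=40, n_classes=500):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, bottleneck),          # narrow bottleneck layer
        )
        self.back = nn.Sequential(
            nn.ReLU(),
            nn.Linear(bottleneck, 1024), nn.ReLU(),
            nn.Linear(1024, n_classes),           # classification head, training only
        )

    def forward(self, x):
        return self.back(self.front(x))           # logits for cross-entropy training

    def extract(self, x):
        return self.front(x)                      # bottleneck activations as features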


International Conference on Acoustics, Speech, and Signal Processing | 2005

An auto-regressive, non-stationary excited signal parameter estimation method and an evaluation of a singing-voice recognition

Akira Sasou; Masataka Goto; Satoru Hayamizu; Kazuyo Tanaka

We have previously described an auto-regressive hidden Markov model (AR-HMM) and an accompanying parameter estimation method. The AR-HMM was obtained by combining an AR process with an HMM introduced as a non-stationary excitation model. We demonstrated that the AR-HMM can accurately estimate the characteristics of both articulatory systems and excitation signals from high-pitched speech. In this paper, we apply the AR-HMM to feature extraction from singing voices and evaluate the recognition accuracy of the AR-HMM-based approach.
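For context, linear-prediction analysis classically estimates AR and PARCOR (reflection) coefficients from frame autocorrelations with the Levinson-Durbin recursion, sketched below. This is the standard stationary estimate, shown only as a reference point; the AR-HMM replaces it with joint estimation under an HMM-based non-stationary excitation model, which this sketch does not attempt.

import numpy as np

def levinson_durbin(r, order):
    """AR coefficients a (with a[0] = 1) and PARCOR coefficients
    from autocorrelation values r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                         # prediction error energy
    parcor = np.zeros(order)
    for m in range(1, order + 1):
        acc = r[m] + a[1:m] @ r[m-1:0:-1]
        k = -acc / err                 # reflection (PARCOR) coefficient
        parcor[m - 1] = k
        a[1:m] += k * a[m-1:0:-1]      # update lower-order coefficients
        a[m] = k
        err *= 1.0 - k * k
    return a, parcor, err

# usage on one analysis frame (placeholder signal)
x = np.random.randn(400)
r = np.correlate(x, x, mode="full")[len(x) - 1:]
a, parcor, err = levinson_durbin(r, order=12)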


International Conference on Spoken Language Processing | 1996

RWC multimodal database for interactions by integration of spoken language and visual information

Satoru Hayamizu; Osamu Hasegawa; Katunobu Itou; Katsuhiko Sakaue; Kazuyo Tanaka; Shigeki Nagaya; Masayuki Nakazawa; Takehiro Endoh; Fumio Togawa; Kenji Sakamoto; Kazuhiko Yamamoto

The paper describes the design policy and prototype data collection of the RWC (Real World Computing Program) multimodal database. The database is intended for research and development on the integration of spoken language and visual information for human-computer interaction. The interactions are supposed to use image recognition, image synthesis, speech recognition, and speech synthesis. Visual information also includes non-verbal communication, such as interactions using hand gestures and facial expressions between humans and a human-like CG (computer graphics) agent with a face and hands. Based on experiments with these interaction modes, specifications of the database are discussed from the viewpoint of controlling the variability and cost of the collection.


Journal of the Acoustical Society of America | 1996

Design and data collection for a spoken dialog database in the Real World Computing (RWC) program

Kazuyo Tanaka; Satoru Hayamizu; Yoichi Yamashita; Kiyohiro Shikano; Shuichi Itahashi; Ryu-ichi Oka

The RWC program is constructing substantial databases for advancing and evaluating research and development conducted under the program and related domains. In this presentation, the motivation for this effort, the basic design of the spoken dialog databases, and the current status of the data collection work are described. In the first stage, some fundamental data collection was carried out to determine several environmental conditions and data-filing specifications. Two topics were selected for the dialogs: one was dialogs between car dealers and customers, and the other was dialogs between travel agents and customers. Professional dealers and agents were employed to make the conversations realistic. To date, 60 samples of dialogs have been recorded, and 48 of them have been filed onto CD-ROMs, which include about 10 h of speech waveforms with transcriptions and labeling-related information. The speech data are almost completely spontaneous but are of good quality in the acoustic-phonetic sense. The CD-ROMs are r...


International Conference on Acoustics, Speech, and Signal Processing | 1997

Speech enhancement using CSS-based array processing

Futoshi Asano; Satoru Hayamizu

A method for recovering the LPC spectrum from a microphone array input signal corrupted by ambient noise is proposed. This method is based on the CSS (coherent subspace) method, which is designed for DOA (direction of arrival) estimation of broadband array input signals. The noise energy is reduced in the subspace domain by the maximum likelihood method. To enhance the performance of noise reduction, elimination of the noise-dominant subspace using projection is further employed, which is effective when the SNR is low and classification of noise and signals in the subspace domain is difficult. The results of the simulation show that some small formants, which cannot be estimated by the conventional delay-and-sum beamformer, were well estimated by the proposed method.
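A minimal sketch of the projection step alone, assuming a correlation matrix already focused into a common subspace; the CSS focusing matrices and the maximum likelihood noise reduction are omitted, and the known source count is an assumption.

import numpy as np

def project_out_noise(R, n_sources):
    """Keep the n_sources dominant eigenvectors of the (focused) spatial
    correlation matrix R and project out the noise-dominant subspace."""
    w, V = np.linalg.eigh(R)           # eigenvalues in ascending order
    Es = V[:, -n_sources:]             # dominant (signal) subspace
    P = Es @ Es.conj().T               # orthogonal projector onto it
    return P @ R @ P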


Journal of the Acoustical Society of America | 1996

Array signal processing applicable to hearing aids

Futoshi Asano; Satoru Hayamizu; Yôiti Suzuki; Shinji Tsukui; Toshio Sone

For the use of hearing aids in noisy environments, noise reduction and speech enhancement techniques are being investigated by many researchers. In this report, array signal-processing techniques and systems applicable to hearing aids, which are being developed by the authors, are introduced. A delay-and-sum beamformer, in which the microphone-array outputs are summed with appropriate delays and weights, can focus on a signal coming from a certain direction and can reduce environmental noise. Moreover, by scanning the beam through control of the delays and weights, the directions of the sound sources can also be determined. By combining these techniques, a noise reduction system, which tracks the movement of a talker and suppresses the environmental noise, can be constructed. In this report, hardware equipped with two DSPs (TMS320C40), which realizes the above method, is introduced. Moreover, by extending the beamformer to stereo outputs, a listener can perceive the direction from which the sound is coming. In ...
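A minimal NumPy sketch of such a delay-and-sum beamformer, applying per-channel steering delays in the frequency domain and averaging the channels; the far-field linear-array geometry and all parameter names are assumptions of the sketch.

import numpy as np

def delay_and_sum(x, fs, mic_pos, theta, c=343.0):
    """Steer a linear array toward direction theta (radians from broadside).
    x: (n_mics, n_samples) channel signals; mic_pos: mic coordinates along
    the array axis in meters. Fractional delays are applied as phase shifts."""
    n = x.shape[1]
    delays = mic_pos * np.sin(theta) / c            # steering delay per channel
    X = np.fft.rfft(x, axis=1)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    phase = np.exp(-2j * np.pi * f[None, :] * delays[:, None])
    return np.fft.irfft((X * phase).mean(axis=0), n=n)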

Collaboration


Dive into Satoru Hayamizu's collaborations.

Top Co-Authors

Osamu Hasegawa

National Institute of Advanced Industrial Science and Technology

Futoshi Asano

National Institute of Advanced Industrial Science and Technology

Takeshi Kurata

National Institute of Advanced Industrial Science and Technology
