Publication


Featured research published by Nobuhiko Kitawaki.


IEEE Transactions on Speech and Audio Processing | 2003

Combined approach of array processing and independent component analysis for blind separation of acoustic signals

Futoshi Asano; Shiro Ikeda; Michiaki Ogawa; Hideki Asoh; Nobuhiko Kitawaki

Two array signal processing techniques are combined with independent component analysis (ICA) to enhance the performance of blind separation of acoustic signals in a reflective environment. The first technique is the subspace method, which reduces the effect of room reflection when the system is used in a room. Room reflection is one of the biggest problems in blind source separation (BSS) in acoustic environments. The second technique is a method of solving permutation. To employ the subspace method, ICA must be used in the frequency domain, and precise permutation is necessary for all frequencies. In this method, a physical property of the mixing matrix, i.e., the coherency in adjacent frequencies, is utilized to solve the permutation. The experiments in a meeting room showed that the subspace method improved the rate of automatic speech recognition from 50% to 68% and that the method of solving permutation achieved performance that closely approaches that of the correct permutation, differing by only 4% in recognition rate.
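The permutation step described above can be illustrated with a minimal sketch. It assumes the estimated mixing-matrix columns are available per frequency bin as lists of complex numbers, and it uses a normalized inner product as the coherency score; the function names `coherence` and `align_permutations` are hypothetical, and this is a simplified illustration, not the procedure from the paper.

```python
from itertools import permutations

def coherence(u, v):
    """Magnitude of the normalized inner product of two complex vectors."""
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    norm_u = sum(abs(a) ** 2 for a in u) ** 0.5
    norm_v = sum(abs(b) ** 2 for b in v) ** 0.5
    return abs(inner) / (norm_u * norm_v)

def align_permutations(columns_per_bin):
    """For each frequency bin, pick the column order that is most coherent
    with the already aligned columns of the previous (adjacent) bin.

    columns_per_bin[f] is a list of mixing-matrix columns, one per source.
    Returns one tuple of column indices per bin.
    """
    n_src = len(columns_per_bin[0])
    orders = [tuple(range(n_src))]          # the first bin is the reference
    aligned = [list(columns_per_bin[0])]
    for cols in columns_per_bin[1:]:
        prev = aligned[-1]
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(coherence(prev[i], cols[p[i]]) for i in range(n_src)),
        )
        orders.append(best)
        aligned.append([cols[j] for j in best])
    return orders

# Two sources, two microphones; the middle bin arrives with swapped columns.
a, b = [1 + 0j, 1j], [1 + 0j, -1j]
a_near, b_near = [1 + 0j, 0.9j], [1 + 0j, -0.9j]
orders = align_permutations([[a, b], [b_near, a_near], [a, b]])
```

The exhaustive search over permutations is only practical for a small number of sources, which is an assumption of this sketch.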


IEEE Journal on Selected Areas in Communications | 1991

Pure delay effects on speech quality in telecommunications

Nobuhiko Kitawaki; Kenzo Itoh

The effect of transmission delay on speech quality in telecommunications is described, with human factors such as conversational mode and the talker's knowledge of the cause of delay taken into account. Objective quality estimation methods for delay effects are proposed, and these methods are applied in an actual communications network. In connection with delay perception in a telephone conversation, the assumption was verified that a talker expects a particular response time from a partner, and that delay that is outside this expectation time window is noticed. Taking this information into account, a subjective conversational experiment is controlled using six kinds of tasks that vary the temporal characteristics. Thus, a subjective assessment of delay effects is obtained by laboratory tests in relation to the detectability threshold, opinion rating, and conversational efficiency. Objective quality measures for each test were defined as a linear combination of temporal parameters that correspond closely to subjective qualities.


IEEE Journal on Selected Areas in Communications | 1988

Objective quality evaluation for low-bit-rate speech coding systems

Nobuhiko Kitawaki; Hiromi Nagabuchi; Kenzo Itoh

An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals. Good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown. Good repeatability of objective quality evaluation using LPC CD is also shown. A method for generating an artificial voice signal that reflects the characteristics of real speech signals is described. The LPC CD values calculated using this artificial voice are almost the same as those calculated using real speech signals. The speaker-dependency of the coded-speech quality is shown to be an important factor in low-bit-rate speech coding. Even taking this factor into consideration, LPC CD is shown to be effective for estimating the subjective quality.
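As an illustration of how an LPC cepstrum distance can be computed for one frame, here is a minimal sketch. It assumes the predictor polynomial is written as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and uses the commonly cited form CD = (10 / ln 10) * sqrt(2 * sum (c_ref - c_test)^2) in dB; the abstract does not give the formula, so both the sign convention and the function names are assumptions.

```python
import math

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1/A(z).

    a holds [a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k (assumed convention).
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def cepstral_distance_db(a_ref, a_test, n_ceps=None):
    """LPC cepstrum distance in dB between two frames' LPC coefficient sets."""
    if n_ceps is None:
        n_ceps = max(len(a_ref), len(a_test))
    c_ref = lpc_to_cepstrum(a_ref, n_ceps)
    c_test = lpc_to_cepstrum(a_test, n_ceps)
    squared = sum((x - y) ** 2 for x, y in zip(c_ref, c_test))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)

# Hypothetical coefficient sets for an original frame and a coded frame.
cd = cepstral_distance_db([-0.9, 0.4], [-0.85, 0.35])
```

In practice the per-frame values would be averaged over an utterance; that averaging is left out here.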


IEEE Communications Magazine | 2004

Perceptual QoS assessment technologies for VoIP

Akira Takahashi; Hideaki Yoshino; Nobuhiko Kitawaki

Since quality is not generally guaranteed in an IP network, the proper design and management of networks and/or terminals for high-quality voice over IP services and maintenance of service levels is important. In terms of quality design and management, methodologies for appropriately and effectively evaluating the perceptual QoS of VoIP are indispensable. This article gives an overview of the state of the art of quality assessment technologies for VoIP, including recent work on improving their accuracy.


IEEE Transactions on Speech and Audio Processing | 1999

Common-acoustical-pole and zero modeling of head-related transfer functions

Youichi Haneda; Shoji Makino; Yutaka Kaneda; Nobuhiko Kitawaki

Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTFs) for various directions of sound incidence. The HRTFs are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do. The common acoustical poles are estimated as they are common to HRTFs for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer parameters (the order of the zeros) that depend on the source direction than do the conventional all-zero or pole/zero models. Furthermore, the proposed model can extract the zeros that are missed in the conventional models because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTFs on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model. The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTFs.
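The model structure described above can be written compactly as follows. This is a sketch of the general pole/zero form only; the symbols (direction index θ, orders P and Q, coefficients a_k and b_k) are notation assumed here and are not taken from the paper.

```latex
% Common poles a_k are shared by all directions; only the zeros b_k depend on \theta.
H(z;\theta) = \frac{\sum_{k=0}^{Q} b_k(\theta)\, z^{-k}}{1 + \sum_{k=1}^{P} a_k\, z^{-k}}
```

Under this notation, only the Q + 1 numerator coefficients have to be stored per direction, while the P denominator coefficients are stored once.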


International Conference on Acoustics, Speech, and Signal Processing | 2001

A combined approach of array processing and independent component analysis for blind separation of acoustic signals

Futoshi Asano; Shiro Ikeda; Michiaki Ogawa; Hideki Asoh; Nobuhiko Kitawaki

Two array signal processing techniques are combined with independent component analysis to enhance the performance of blind separation of acoustic signals in a reflective environment such as a room. The first technique is the subspace method, which reduces the effect of room reflection. The second technique is a method of solving the permutation, in which the coherency of the mixing matrix in adjacent frequencies is utilized.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Impairment Factor Framework for Wide-Band Speech Codecs

Sebastian Möller; Alexander Raake; Nobuhiko Kitawaki; Akira Takahashi; Marcel Wältermann

A new method is described for quantifying the quality degradation introduced by wide-band speech codecs via a one-dimensional impairment factor. The method is based on auditory listening-only tests, but the resulting impairment factors may be used for predicting speech quality in an instrumental way, e.g., for network planning purposes. Following the method, auditory test results are first transformed to an overall quality rating scale, and then adjusted to rule out test-specific effects. The derived impairment factors fit into the common framework which is defined by the E-model for narrow-band telephone networks, and which is hereby extended towards wide-band speech transmission. This paper presents the necessary auditory test data, describes the derivation and adjustment methodology, and provides numerical values for a range of wide-band speech codecs. The values are tested for their robustness in case of codec tandems and adjusted to represent the effects of packet loss.
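The first step mentioned above, transforming listening-test scores to a rating scale and reading off an impairment, can be sketched as follows. The sketch assumes the narrow-band rating-to-MOS mapping of the E-model (ITU-T G.107) and inverts it numerically; the wide-band extension and the adjustment for test-specific effects described in the paper are not reproduced, and the function names are hypothetical.

```python
def r_to_mos(r):
    """Narrow-band E-model mapping from transmission rating R to MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

def mos_to_r(mos, lo=6.5, hi=100.0):
    """Invert r_to_mos by bisection on the range where it is increasing."""
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def impairment_factor(mos_codec, mos_reference):
    """Rating-scale loss of a codec relative to a clean reference condition."""
    return mos_to_r(mos_reference) - mos_to_r(mos_codec)

# Hypothetical test scores: reference condition 4.0, codec under test 3.0.
ie = impairment_factor(3.0, 4.0)
```

The bisection starts at R = 6.5 because the mapping dips slightly below 1 for very small R and is not monotonic there.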


IEEE Communications Magazine | 1988

Quality assessment of speech coding and speech synthesis systems

Nobuhiko Kitawaki; Hiromi Nagabuchi

The concept of speech quality assessment is examined. Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed. Both subjective and objective measures are considered.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Estimation of the number of sound sources using support vector machines and its application to sound source separation

Kiyoshi Yamamoto; F. Asano; W.F.G. van Rooijen; E.Y.L. Ling; Takeshi Yamada; Nobuhiko Kitawaki

A method of estimating the number of sound sources in a reverberant sound field is proposed in this paper. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple microphone input reflects information on the number of sources. However, in a reverberant sound field, the feature of the number of sources in the eigenvalue distribution is degraded by the room reverberation. In this paper, support vector machines are applied to classify the eigenvalue distributions which are not clearly separable. The proposed method is then applied to the source separation system and is evaluated via automatic speech recognition.
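To make the eigenvalue feature concrete, the following minimal sketch computes the spatial correlation matrix and its eigenvalues for the simplest case of two microphones at a single frequency bin. The two-channel restriction, the closed-form eigenvalue solution, and the function names are assumptions made for illustration; the classification stage with support vector machines is not shown.

```python
import math

def spatial_correlation_2ch(x1, x2):
    """Entries (r11, r12, r22) of the 2x2 spatial correlation matrix,
    averaged over the frames of one frequency bin (complex spectra)."""
    n = len(x1)
    r11 = sum(abs(a) ** 2 for a in x1) / n
    r22 = sum(abs(b) ** 2 for b in x2) / n
    r12 = sum(a * b.conjugate() for a, b in zip(x1, x2)) / n
    return r11, r12, r22

def eigenvalues_2x2_hermitian(r11, r12, r22):
    """Eigenvalues of [[r11, r12], [conj(r12), r22]], largest first."""
    mean = (r11 + r22) / 2.0
    radius = math.sqrt(((r11 - r22) / 2.0) ** 2 + abs(r12) ** 2)
    return mean + radius, mean - radius

# One source without reverberation: channel 2 is channel 1 with a phase shift,
# so the matrix has rank one and the second eigenvalue vanishes.
x1 = [1 + 0j, -1 + 0j, 1j, -1j]
x2 = [v * 1j for v in x1]
lam = eigenvalues_2x2_hermitian(*spatial_correlation_2ch(x1, x2))
```

With reverberation or additional sources the smaller eigenvalue grows, which is why the abstract describes the distributions as not clearly separable.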


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice

Takeshi Yamada; Masakazu Kumakura; Nobuhiko Kitawaki

It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case when a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We, therefore, evaluated the effectiveness of these distortion measures by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of artificial voice of several seconds duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained by using the artificial voice.
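Of the distortion measures named above, the segmental SNR is the simplest to sketch. The version below assumes non-overlapping frames and the common practice of clamping each frame's value to a fixed dB range; the frame length, the clamp limits, and the function name are assumptions, not values taken from the paper.

```python
import math

def segmental_snr_db(clean, noisy, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNR values (dB) between clean and noisy samples."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = noisy[start:start + frame_len]
        signal = sum(v * v for v in s)
        noise = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal == 0.0:
            continue  # skip silent frames, which carry no usable ratio
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        values.append(min(max(snr, floor_db), ceil_db))  # clamp (assumed range)
    if not values:
        raise ValueError("no non-silent full frame to evaluate")
    return sum(values) / len(values)

# Toy signals: a constant offset of 0.1 on unit-amplitude samples gives 20 dB.
snr = segmental_snr_db([1.0] * 8, [1.1] * 8, frame_len=4)
```

An estimator of word accuracy would then be fitted on such distortion values; that mapping is specific to the experiments and is not reproduced here.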

Collaboration


Top co-authors of Nobuhiko Kitawaki.

Futoshi Asano

National Institute of Advanced Industrial Science and Technology

Hiromi Nagabuchi

Nippon Telegraph and Telephone

Hideki Asoh

National Institute of Advanced Industrial Science and Technology