Publication


Featured research published by Akitoshi Kataoka.


IEEE Transactions on Speech and Audio Processing | 1998

Design and description of CS-ACELP: a toll quality 8 kb/s speech coder

Redwan Salami; Claude Laflamme; Jean-Pierre Adoul; Akitoshi Kataoka; Shinji Hayashi; Takehiro Moriya; Claude Lamblin; Dominique Massaloux; Stéphane Proust; Peter Kroon; Yair Shoham

This paper describes the 8 kb/s speech coding algorithm G.729 which has been standardized by ITU-T. The algorithm is based on a conjugate-structure algebraic CELP (CS-ACELP) coding technique and uses 10 ms speech frames. The codec delivers toll-quality speech (equivalent to 32 kb/s ADPCM) for most operating conditions. This paper describes the coder structure in detail and discusses the reasons behind certain design choices. A 16-b fixed-point version has been developed as part of Recommendation G.729 and a summary of the subjective test results based on a real-time implementation of this version is presented.
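As a quick check of the figures quoted in this abstract, the number of bits per frame follows directly from the bit rate and the frame length. The sketch below is plain arithmetic; the helper name `bits_per_frame` is illustrative and does not come from the paper.

```python
def bits_per_frame(bit_rate_bps: int, frame_ms: int) -> int:
    """Number of bits available per frame at a given bit rate."""
    return bit_rate_bps * frame_ms // 1000


g729_bits = bits_per_frame(8_000, 10)    # 8 kb/s with 10 ms frames
adpcm_bits = bits_per_frame(32_000, 10)  # 32 kb/s ADPCM over the same 10 ms
print(g729_bits, adpcm_bits)
```

Under these figures the coder has a quarter of the bits that 32 kb/s ADPCM spends on the same 10 ms of speech.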


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Robust Speech Dereverberation Using Multichannel Blind Deconvolution With Spectral Subtraction

Ken'ichi Furuya; Akitoshi Kataoka

A robust dereverberation method is presented for speech enhancement in a situation requiring adaptation where a speaker shifts his/her head under reverberant conditions causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of inverse filters obtained in practice. Our method computes inverse filters by using the correlation matrix between input signals that can be observed without measuring room impulse responses. Inverse filtering reduces early reflection, which has most of the power of the reverberation, and then, spectral subtraction suppresses the tail of the inverse-filtered reverberation. The performance of our method in adaptation is demonstrated by experiments using measured room impulse responses. The subjective results indicated that this method provides superior speech quality to each of the individual methods: blind deconvolution and spectral subtraction.
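The abstract does not spell out the modified spectral subtraction it uses, so the following is only a generic power-domain spectral subtraction with a spectral floor, applied to one frame of per-bin power values. The function name, the `over_subtraction` and `floor` parameters, and the example numbers are all assumptions made for illustration.

```python
def spectral_subtract(power, interference_power, over_subtraction=1.0, floor=0.01):
    """Subtract an interference power estimate from each frequency bin.

    Bins that would fall below floor * original power are clamped there,
    which keeps every output bin positive.
    """
    cleaned = []
    for p, n in zip(power, interference_power):
        cleaned.append(max(p - over_subtraction * n, floor * p))
    return cleaned


frame_power = [4.0, 1.0, 0.5, 2.0]     # hypothetical inverse-filtered frame
tail_estimate = [1.0, 1.0, 0.25, 0.5]  # hypothetical reverberation-tail estimate
result = spectral_subtract(frame_power, tail_estimate)
```

In the setting described above, the interference estimate would stand for the reverberation tail left after inverse filtering.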


International Conference on Acoustics, Speech, and Signal Processing | 1993

An 8-kbit/s speech coder based on conjugate structure CELP

Akitoshi Kataoka; Takehiro Moriya; Shinji Hayashi

A high-quality 8-kbit/s speech coder based on CS-CELP (conjugate structure code excited linear prediction) with 10 ms frame length is presented. To provide high quality in both error-free and error conditions, it uses four schemes: LSP (line spectrum pair) quantization using interframe correlation, preselection of codebook search, a conjugate structure, and backward adaptation of the VQ (vector quantization) gain. LSP parameters are quantized by multistage VQ with MA prediction. The preselection of the codebook reduces computational complexity and improves robustness. The CS improves the ability to handle random bit errors and reduces memory requirements. The backward adaptation of the VQ gain provides high quality and robustness without having to transmit input speech power information. Subjective testing indicates that the quality of the proposed coder is equivalent to that of the 32 kbit/s ADPCM (adaptive differential pulse code modulation) under error-free conditions. It is also found that the proposed coder is robust against random bit errors.


IEICE Transactions on Information and Systems | 2006

A G.711 Embedded Wideband Speech Coding for VoIP Conferences

Yusuke Hiwasaki; Hitoshi Ohmuro; Takeshi Mori; Sachiko Kurihara; Akitoshi Kataoka

This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it has a high interoperability with existing terminals so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multiple-point remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected among remote locations and is redistributed with the mixed lower-band signal. By subjective evaluations, we show that the speech quality can be maintained even when the speech signals are partially mixed.
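The partial mixing idea can be sketched roughly as follows: the lower band of every other site is summed, while the higher band is taken from a single "main speaker" site and forwarded unmixed. The abstract does not say how the main speaker is chosen, so picking the site with the largest lower-band energy is an assumption, as are the data layout and the name `partial_mix`.

```python
def partial_mix(sites, listener):
    """Return (mixed_low_band, selected_high_band) for one listener.

    sites maps a site name to {"low": [...], "high": [...]}, one frame each.
    """
    others = {name: s for name, s in sites.items() if name != listener}
    frame_len = len(next(iter(others.values()))["low"])
    # Mix only the lower band: sample-wise sum over the other sites.
    mixed_low = [sum(s["low"][i] for s in others.values()) for i in range(frame_len)]
    # Assumption: the main speaker is the site with the most lower-band energy.
    main = max(others, key=lambda name: sum(x * x for x in others[name]["low"]))
    # The higher band is forwarded from the main speaker only, not mixed.
    return mixed_low, others[main]["high"]


sites = {
    "A": {"low": [1, 2], "high": [10, 11]},
    "B": {"low": [3, -4], "high": [20, 21]},
    "C": {"low": [0, 1], "high": [30, 31]},
}
low, high = partial_mix(sites, listener="C")
```

Only one band is summed per listener, which is where the reduction in mixing complexity would come from.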


IEEE Transactions on Speech and Audio Processing | 1996

An 8-kb/s conjugate structure CELP (CS-CELP) speech coder

Akitoshi Kataoka; Takehiro Moriya; Shinji Hayashi

This paper describes a high-quality 8-kb/s speech coder called conjugate structure code-excited linear prediction (CS-CELP) with a 10-ms frame length. To provide a short delay and high quality under both error-free and channel error conditions, it uses three new schemes: line spectrum pair (LSP) quantization using interframe prediction, preselection in the codebook search, and gain vector quantization (VQ) with backward prediction. The LSP parameters are quantized by using multistage VQ with moving-average (MA) prediction. This scheme can operate efficiently with various frequency responses of speech. The preselection of the codebook reduces the computational complexity and improves the robustness to channel errors. The gain VQ with backward prediction can provide a high quality and robustness without transmission of input speech power information. A conjugate structure for both random codebook and gain codebook is introduced to improve the ability to handle random bit errors and to reduce codebook storage memory requirements. Subjective testing indicates that the quality of this coder is equivalent to that of 32-kb/s adaptive differential pulse code modulation (ADPCM) under error-free conditions. Testing has further demonstrated that the coder is robust against random bit errors.
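To illustrate why a conjugate structure saves memory, the sketch below assumes the output vector is the sum of one vector from each of two small sub-codebooks. It uses a full search over all index pairs for clarity, whereas the coder described above relies on preselection; the codebook contents and the name `conjugate_search` are made up for the example.

```python
from itertools import product


def conjugate_search(target, codebook_a, codebook_b):
    """Find indices (i, j) minimizing ||target - (a_i + b_j)||^2 by full search."""
    def error(i, j):
        return sum((t - (a + b)) ** 2
                   for t, a, b in zip(target, codebook_a[i], codebook_b[j]))
    return min(product(range(len(codebook_a)), range(len(codebook_b))),
               key=lambda ij: error(*ij))


codebook_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
codebook_b = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]
best = conjugate_search([1.4, 0.6], codebook_a, codebook_b)

stored_vectors = len(codebook_a) + len(codebook_b)     # vectors kept in memory
reachable_vectors = len(codebook_a) * len(codebook_b)  # index pairs available
```

Two 4-entry sub-codebooks store 8 vectors but offer 16 index pairs, and a bit error in one index leaves the other sub-codebook's contribution intact.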


Signal Processing | 2006

Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector

Satoru Emura; Youichi Haneda; Akitoshi Kataoka; Shoji Makino

Stereo echo cancellation requires a fast converging adaptive algorithm because the stereo input signals are highly cross correlated and the convergence rate of the misalignment is slow even after preprocessing for unique identification of stereo echo paths. To speed up the convergence, we propose enhancing the contribution of the decorrelated components in the preprocessed input-signal vector to adaptive updates. The adaptive filter coefficients are updated on the basis of either a single or multiple past enhanced input-signal vectors. For a single-vector update, we show how this enhancement improves the convergence rate by analyzing the behavior of the filter coefficient error in the mean. For a two-past-vector update, simulation showed that the proposed enhancement leads to a faster decrease in misalignment than the corresponding conventional second-order affine projection algorithm while computational complexities are almost the same.


International Conference on Acoustics, Speech, and Signal Processing | 1995

Improved CS-CELP speech coding in a noisy environment using a trained sparse conjugate codebook

Akitoshi Kataoka; Sachiko Hosaka; Jotaro Ikedo; Takehiro Moriya; Shinji Hayashi

A high-quality 8-kbit/s speech coder based on conjugate structure CELP (CS-CELP) is proposed that uses a trained sparse conjugate codebook. The trained sparse conjugate codebook improves speech quality for noisy speech. This codebook consists of two sub-codebooks and each sub-codebook consists of a random component and a trained component. Each component has excitation vectors consisting of a few pulses. In the random component, pulse position and amplitude are determined randomly. The trained component is determined by training. Subjective tests (differential mean opinion score, DMOS and mean opinion score, MOS) indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech. The MOS showed that the quality of improved CS-CELP is equivalent to that of the 32-kbit/s ADPCM for clean speech.
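The random component described above, where each excitation vector holds only a few pulses with random position and amplitude, might be generated along these lines. The vector length, pulse count, amplitude range, and function name are illustrative assumptions, and the trained component is not covered.

```python
import random


def sparse_random_codebook(num_vectors, dim, pulses, seed=0):
    """Build vectors that are zero except for a few randomly placed pulses."""
    rng = random.Random(seed)
    codebook = []
    for _ in range(num_vectors):
        vector = [0.0] * dim
        for position in rng.sample(range(dim), pulses):  # distinct positions
            vector[position] = rng.uniform(-1.0, 1.0)      # random amplitude
        codebook.append(vector)
    return codebook


codebook = sparse_random_codebook(num_vectors=4, dim=40, pulses=4)
```

Because most entries are zero, only the pulse positions and amplitudes contribute when such a vector is filtered or correlated.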


International Conference on Consumer Electronics | 2008

A Hands-Free Unit with Noise Reduction by Using Adaptive Beamformer

Kazunori Kobayashi; Yoichi Haneda; Ken’ichi Furuya; Akitoshi Kataoka

This paper presents an implementation of adaptive beamformer in a hands-free unit. The proposed adaptive beamformer suppresses stationary noise and interference sound. The adaptive beamformer and acoustic echo canceller were implemented in a compact hands-free unit with a low-cost DSP. The experimental results demonstrate the noise reduction performance of the hands-free unit.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Dereverberation by Combining MINT-Based Blind Deconvolution and Modified Spectral Subtraction

Ken'ichi Furuya; Sumitaka Sakauchi; Akitoshi Kataoka

A dereverberation technique is developed to provide an alternative means of reducing reverberation in speech signals. The conventional MINT (the multiple-input/output inverse-filtering theorem) method uses the room impulse responses to calculate the inverse filters, so it cannot recover speech signals in practice, where the room impulse responses are unknown in advance. Our method blindly estimates the inverse filters by computing the correlation matrix between input signals that can be observed, instead of room impulse responses. We also combine the inverse filtering with modified spectral subtraction against the estimation error of inverse filters used in the field. The performance of the proposed method is demonstrated using actual room impulse responses.


International Conference on Acoustics, Speech, and Signal Processing | 1994

Implementation and performance of an 8-kbit/s conjugate structure CELP speech coder

Akitoshi Kataoka; Takehiro Moriya; Shinji Hayashi

This paper presents a high-quality 8-kbit/s speech coder (conjugate structure CELP: CS-CELP) that is a candidate for standardization by the ITU-T (formerly CCITT). To achieve high quality for two types of speech (IRS and non-IRS (flat) speech) and real-time implementation, CS-CELP has been revised by two novel schemes. To handle two types of speech, the LSP parameters are quantized by multistage VQ with fourth-order interframe MA prediction. This scheme has little spectrum distortion, even if the two types of speech have many variations of the LSP parameters. The computational complexity of the implementation is reduced for adaptive and fixed-shape codebooks without degrading the speech quality. Multistage selection is adopted in the adaptive codebook; this selection uses a truncated impulse response. Improved pre-selection is proposed in the fixed-shape codebook. Subjective testing indicates that the quality of CS-CELP is equivalent to that of the 32-kbit/s ADPCM under error-free conditions for IRS and non-IRS speech. It also operates in real time using fixed-point DSP chips.
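Interframe MA prediction can be illustrated with a toy quantizer for a single parameter track: the prediction is a weighted sum of the last four quantized residuals, and only the new residual is quantized. A uniform scalar quantizer stands in for the multistage VQ, and the coefficients, step size, and class name are assumptions rather than values from the paper.

```python
from collections import deque


class MAPredictiveQuantizer:
    """Quantize a parameter track using MA prediction from past quantized residuals."""

    def __init__(self, coeffs=(0.4, 0.3, 0.2, 0.1), step=0.05):
        self.coeffs = coeffs
        self.step = step
        # Past quantized residuals, newest first; one slot per coefficient.
        self.history = deque([0.0] * len(coeffs), maxlen=len(coeffs))

    def _predict(self):
        return sum(c * e for c, e in zip(self.coeffs, self.history))

    def encode(self, value):
        """Return (index, reconstructed value) and update the predictor memory."""
        prediction = self._predict()
        index = round((value - prediction) / self.step)
        residual_q = index * self.step
        self.history.appendleft(residual_q)  # oldest residual drops off the end
        return index, prediction + residual_q


quantizer = MAPredictiveQuantizer()
track = [0.30, 0.32, 0.31, 0.35, 0.33]  # hypothetical parameter values per frame
reconstructed = [quantizer.encode(v)[1] for v in track]
```

Since the predictor memory holds only four past residuals, a corrupted residual stops affecting the prediction after four frames.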

Collaboration


Akitoshi Kataoka's collaborations.

Top Co-Authors

Yoichi Haneda
University of Electro-Communications

Shinji Hayashi
Carnegie Mellon University

Yusuke Hioka
University of Canterbury

Jotaro Ikedo
Carnegie Mellon University

Kazunori Mano
Carnegie Mellon University