Publication


Featured research published by Hiroyuki Ehara.


International Conference on Acoustics, Speech, and Signal Processing | 2007

ITU-T G.729.1: An 8-32 kbit/s Scalable Coder Interoperable with G.729 for Wideband Telephony and Voice over IP

Stéphane Ragot; Balazs Kovesi; Romain Trilling; David Virette; Nicolas Duc; Dominique Massaloux; Stéphane Proust; Bernd Geiser; Martin Gartner; Stefan Schandl; Hervé Taddei; Yang Gao; Eyal Shlomot; Hiroyuki Ehara; Koji Yoshida; Tommy Vaillancourt; Redwan Salami; Mi Suk Lee; Do Young Kim

This paper describes the scalable coder - G.729.1 - which has been recently standardized by ITU-T for wideband telephony and voice over IP (VoIP) applications. G.729.1 can operate at 12 different bit rates from 32 down to 8 kbit/s with wideband quality starting at 14 kbit/s. This coder is a bitstream interoperable extension of ITU-T G.729 based on three embedded stages: narrowband cascaded CELP coding at 8 and 12 kbit/s, time-domain bandwidth extension (TDBWE) at 14 kbit/s, and split-band MDCT coding with spherical vector quantization (VQ) and pre-echo reduction from 16 to 32 kbit/s. Side information - consisting of signal class, phase, and energy - is transmitted at 12, 14 and 16 kbit/s to improve the resilience and recovery of the decoder in case of frame erasures. The quality, delay, and complexity of G.729.1 are summarized based on ITU-T results.
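The layer structure described in this abstract can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and the uniform 2 kbit/s spacing of the MDCT layers is an assumption chosen because it reproduces the 12 bit rates the abstract mentions.

```python
def g7291_bitrates(mdct_step_kbps=2):
    """Return the bit rates in kbit/s, lowest first, as described in the abstract."""
    celp = [8, 12]   # narrowband cascaded CELP layers
    tdbwe = [14]     # time-domain bandwidth extension layer
    # Assumption: MDCT layers advance in equal steps from 16 to 32 kbit/s.
    mdct = list(range(16, 33, mdct_step_kbps))
    return celp + tdbwe + mdct


def stage_for(bitrate_kbps):
    """Name the embedded stage that adds the topmost layer at this bit rate."""
    if bitrate_kbps <= 12:
        return "CELP"
    if bitrate_kbps == 14:
        return "TDBWE"
    return "MDCT"


def is_wideband(bitrate_kbps):
    """Wideband quality starts at 14 kbit/s according to the abstract."""
    return bitrate_kbps >= 14


rates = g7291_bitrates()
```

Because the bitstream is embedded, a decoder that receives only the first layers of a frame can still decode at the corresponding lower rate.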


International Conference on Acoustics, Speech, and Signal Processing | 2015

Overview of the EVS codec architecture

Martin Dietz; Markus Multrus; Vaclav Eksler; Vladimir Malenovsky; Erik Norvell; Harald Pobloth; Lei Miao; Zhe Wang; Lasse Laaksonen; Adriana Vasilache; Yutaka Kamamoto; Kei Kikuiri; Stephane Ragot; Julien Faure; Hiroyuki Ehara; Vivek Rajendran; Venkatraman S. Atti; Ho-Sang Sung; Eunmi Oh; Hao Yuan; Changbao Zhu

The recently standardized 3GPP codec for Enhanced Voice Services (EVS) offers new features and improvements for low-delay real-time communication systems. Based on a novel, switched low-delay speech/audio codec, the EVS codec contains various tools for better compression efficiency and higher quality for clean/noisy speech, mixed content and music, including support for wideband, super-wideband and full-band content. The EVS codec operates in a broad range of bitrates, is highly robust against packet loss and provides an AMR-WB interoperable mode for compatibility with existing systems. This paper gives an overview of the underlying architecture as well as the novel technologies in the EVS codec and presents listening test results showing the performance of the new codec in terms of compression and speech/audio quality.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Efficient spectrum coding for super-wideband speech and its application to 7/10/15 kHz bandwidth scalable coders

Masahiro Oshikiri; Hiroyuki Ehara; Koji Yoshida

The paper presents an efficient spectrum coding method for super-wideband (beyond 7 kHz, e.g. 10 kHz or 15 kHz bandwidth) speech signals based on a bandwidth expansion technique. Starting from a 7 kHz bandwidth speech signal, the expansion technique generates the frequency band above 7 kHz without violating the harmonic structure of the speech signal. The bandwidth expansion is performed by pitch filtering in the frequency domain: the 7 kHz bandwidth spectrum is used as the pitch filter state, and pitch filtering is performed toward the frequency band above 7 kHz. We adopted this pitch-filtering-based spectrum coding (PFSC) in our proposed 7/10/15 kHz bandwidth scalable coder. The scalable coder consists of an existing standard wideband coder as the base layer and two PFSC coders as the enhancement layers. One PFSC coder encodes the 7-10 kHz band spectrum at 4.4 kbit/s and the other the 10-15 kHz band spectrum at 2.2 kbit/s. When the AMR-WB coder at 15.85 kbit/s is used as the base layer, the total bitrate of the scalable coder is 22.45 kbit/s and the total algorithmic delay is 30 ms. We conducted degradation category rating (DCR) tests for both 10 kHz and 15 kHz bandwidth signals. The results show that the DMOS score of the proposed coder is better than that of the 7 kHz bandwidth original signals in clean speech conditions at both bandwidths. In addition, when G.722 at 56 kbit/s is used as the base layer instead of the AMR-WB coder, the DMOS score of this scalable coder is close to that of the 7 kHz bandwidth original signals in audio conditions at both bandwidths.
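The idea of pitch filtering in the frequency domain can be sketched roughly as follows. This is a simplified one-tap illustration, not the filter from the paper: the function name, the single `lag` and `gain` parameters, and the toy spectrum are all assumptions. The last lines only check the bitrate arithmetic stated in the abstract.

```python
def extend_spectrum(base, lag, n_high, gain=1.0):
    """Extend a spectrum upward by n_high bins.

    The existing low-band bins act as the filter state; each new bin k is
    generated from the bin `lag` positions below it (k - lag), scaled by gain.
    """
    if not 0 < lag <= len(base):
        raise ValueError("lag must be between 1 and len(base)")
    spec = list(base)
    for _ in range(n_high):
        k = len(spec)                      # index of the bin being generated
        spec.append(gain * spec[k - lag])  # one-tap recursion toward high frequencies
    return spec


# Toy low-band spectrum with a period-2 pattern; the extension keeps that pattern.
extended = extend_spectrum([1.0, 2.0, 3.0, 4.0], lag=2, n_high=3)

# Bitrate check using the figures from the abstract (kbit/s).
total_kbps = 15.85 + 4.4 + 2.2  # AMR-WB base layer + two PFSC enhancement layers
```

Copying with a fixed lag is what lets the generated band continue a periodic pattern already present in the low band, which is the sense in which the harmonic structure is preserved in this sketch.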


International Conference on Acoustics, Speech, and Signal Processing | 2001

A candidate for the ITU-T 4 kbit/s speech coding standard

Jes Thyssen; Yang Gao; Adil Benyassine; Eyal Shlomot; Carlo Murgia; Huan-Yu Su; Kazunori Mano; Yusuke Hiwasaki; Hiroyuki Ehara; Kazutoshi Yasunaga; Claude Lamblin; Balazs Kovesi; Joachim Stegmann; Hong-Goo Kang

This paper presents the 4 kbit/s speech coding candidate submitted by AT&T, Conexant, Deutsche Telekom, France Telecom, Matsushita, and NTT for the ITU-T 4 kbit/s selection phase. The algorithm was developed jointly based on the qualification version of Conexant. This paper focuses on the development carried out during the collaboration in order to narrow the gap to the requirements in an attempt to provide toll quality at 4 kbit/s. This objective is currently being verified in independent subjective tests coordinated by ITU-T and carried out in multiple languages. Subjective tests carried out during the development indicate that the collaboration work has been successful in improving the quality, and that meeting a majority of the requirements in the extensive selection phase test is a realistic goal.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Dispersed-pulse codebook and its application to a 4 kb/s speech coder

Kazutoshi Yasunaga; Hiroyuki Ehara; Koji Yoshida; Toshiyuki Morii

This paper presents a dispersed-pulse codebook for a CELP coder. This codebook generates an excitation vector by convolving dispersion vectors with signed pulses in an algebraic codevector. The dispersion vectors are obtained through training so that the coding distortion is reduced. An objective evaluation result shows that the coding distortion with this codebook is smaller than that with an algebraic codebook. The dispersed-pulse codebook is applied to a 4 kb/s CELP coder. Subjective evaluation results show that: (1) the fundamental performance of the 4 kb/s coder is equivalent to that of the G.726 32 kb/s coder, and (2) the performance of the 4 kb/s coder under some error and background noise conditions is equivalent to that of the G.729 8 kb/s coder.
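A minimal sketch of how an excitation vector could be built from signed pulses and dispersion vectors is shown below. The data layout (one dispersion vector per pulse channel), the truncation at the frame boundary, and the example values are assumptions made for illustration; the trained dispersion vectors from the paper are not reproduced here.

```python
def dispersed_excitation(length, pulses, dispersion):
    """Build an excitation vector of the given length.

    pulses:     list of (position, sign, channel) tuples, sign being +1 or -1
    dispersion: list of dispersion vectors, indexed by channel
    Convolving a unit pulse with a dispersion vector places a signed copy of
    that vector at the pulse position; overlapping copies add up.
    """
    excitation = [0.0] * length
    for position, sign, channel in pulses:
        for offset, value in enumerate(dispersion[channel]):
            index = position + offset
            if index < length:  # assumption: samples past the frame end are dropped
                excitation[index] += sign * value
    return excitation


# Hypothetical dispersion vectors and pulse positions.
example = dispersed_excitation(
    length=8,
    pulses=[(1, +1, 0), (5, -1, 1)],
    dispersion=[[1.0, 0.5], [1.0, -0.25, 0.1]],
)
```

With a dispersion vector of `[1.0]` for every channel, this reduces to a plain algebraic codevector of signed unit pulses.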


Journal of the Acoustical Society of America | 2009

Speech signal transmission apparatus and method that multiplex and packetize coded information

Hiroyuki Ehara

A speech signal transmission apparatus multiplexes, packetizes, and sends first coded information coded in a normal state and second coded information used for improving the quality of decoded speech when a frame loss occurs. A first error calculating section calculates a first error signal between a target signal and a synthesized signal generated by an adaptive codebook, and a second error calculating section calculates a second error signal between the target signal and a synthesized signal generated by a fixed codebook. An error signal ratio calculating section calculates the ratio of the first error signal to the second error signal. A speech frame classifying section classifies a speech frame according to the magnitude of the ratio, and a decision section decides whether or not to multiplex the second coded information based on the classification result.
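The classification step can be sketched as follows. The abstract does not say how the error signals are measured or which way the decision goes, so the energy-based ratio, the threshold value, and the rule that a large ratio triggers the second coded information are all assumptions; the function names are hypothetical.

```python
def energy(signal):
    """Sum of squared samples."""
    return sum(sample * sample for sample in signal)


def error_ratio(target, adaptive_synth, fixed_synth, floor=1e-12):
    """Ratio of the adaptive-codebook error energy to the fixed-codebook error energy."""
    first_error = energy([t - a for t, a in zip(target, adaptive_synth)])
    second_error = energy([t - f for t, f in zip(target, fixed_synth)])
    return first_error / max(second_error, floor)  # floor avoids division by zero


def should_multiplex_second_info(ratio, threshold=1.0):
    """Assumed rule: send the second coded information when the adaptive
    codebook models the frame poorly compared with the fixed codebook."""
    return ratio > threshold


target = [1.0, 0.0, -1.0, 0.0]
onset_like = error_ratio(target, [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, -1.0, 0.5])
steady_like = error_ratio(target, [1.0, 0.0, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0])
```

The intuition behind the assumed rule is that frames the adaptive codebook cannot predict from past excitation are the ones for which the decoder has the least to fall back on after a frame loss.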


IEEE Transactions on Multimedia | 2008

Decoder Initializing Technique for Improving Frame-Erasure Resilience of a CELP Speech Codec

Hiroyuki Ehara; Koji Yoshida

The authors present and evaluate a technique for synchronizing the internal states of a code-excited linear prediction (CELP) encoder and decoder after the occurrence of frame erasure. The designed technique, called "duplicated transmission (DT)," uses some redundant information for realizing synchronization. The encoder performs the encoding process twice and sends two codes for each frame. One code is produced by an encoder whose state has been initialized; this code is used in cases where the previous frame is erased. An onset detector is combined with the DT technique to select the frames to which DT should be applied. Subjective test results suggest that, by introducing DT selectively, the number of DT frames is reducible by about 80% without degrading the subjective quality. Results demonstrate that synchronization of the internal states is effective in cases of erasure of onset. The DT technique requires no additional algorithmic delay. For that reason, it would be a better choice for particular applications in which the delay has a significant impact.
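The receiver-side choice between the two codes can be sketched as below. This is an assumed reading of the abstract, not the authors' implementation: the frame dictionary layout, the `"conceal"` label for lost frames, and the function name are hypothetical.

```python
def choose_codes(frames, received):
    """Decide, frame by frame, which code the decoder would use.

    frames:   list of dicts with a "normal" code and an optional "dt" code
              (the DT code is only present on frames the encoder selected)
    received: list of booleans, False meaning the frame was erased
    """
    decisions = []
    previous_erased = False
    for frame, arrived in zip(frames, received):
        if not arrived:
            decisions.append("conceal")  # nothing to decode for an erased frame
        elif previous_erased and frame.get("dt") is not None:
            decisions.append("dt")       # code made by the initialized encoder
        else:
            decisions.append("normal")
        previous_erased = not arrived
    return decisions


frames = [
    {"normal": "n0"},
    {"normal": "n1", "dt": "d1"},
    {"normal": "n2", "dt": "d2"},
    {"normal": "n3"},
]
after_loss = choose_codes(frames, [False, True, True, False])
```

In this sketch the DT code matters only on the first good frame after an erasure, which is why attaching it to a subset of frames, such as detected onsets, can cut the redundancy.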


International Conference on Acoustics, Speech, and Signal Processing | 2007

An 8-12 kbit/s Embedded CELP Coder Interoperable with ITU-T G.729 Coder: First Stage of the New G.729.1 Standard

Dominique Massaloux; Romain Trilling; Claude Lamblin; Stéphane Ragot; Hiroyuki Ehara; Mi Suk Lee; Do Young Kim; Bruno Bessette

ITU-T G.729.1 is a scalable coder recently standardized in ITU-T for wideband telephony and voice over IP (VoIP) applications. Composed of three stages, this codec provides a scalable bitstream between 8 and 32 kbit/s in both narrowband and wideband. This paper describes the first stage, which is a narrowband embedded CELP coder at 8 and 12 kbit/s. The 8 kbit/s layer ensures interoperability with the ITU-T G.729 standard with reduced complexity and with a quality better than G.729 Annex A. At 12 kbit/s, G.729.1 reaches the quality level of the 11.8 kbit/s G.729 Annex E in spite of the embedded structure. The modifications made to the original G.729 scheme to achieve this performance are explained, and formal test results are provided.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Low bit rate high-quality MDCT audio coding of the 3GPP EVS standard

Srikanth Nagisetty; Zongxian Liu; Takuya Kawashima; Hiroyuki Ehara; Xuan Zhou; Bin Wang; Zexin Liu; Lei Miao; Jon Gibbs; Lasse Laaksonen; Venkatraman S. Atti; Vivek Rajendran; Venkatesh Krishnan; Ho-Sang Sung; Ki-hyun Choo

This paper presents a low bit-rate MDCT coder, which has been adopted as part of the recently standardized codec for Enhanced Voice Services. To maximize codec performance for NB to SWB input signals at low bit rates (7.2 to 16.4 kbps), new adaptive bit-allocation and spectrum quantization schemes, which emphasize the perceptually important spectrum while efficiently coding the full spectrum, were introduced into the low bit-rate MDCT coder. Further, small-symbol switched Huffman coding is exploited to reduce the bit consumption for quantizing the band energies of the spectrum. Finally, the performance of the coder is illustrated with some listening test results.


Speech Communication | 2007

Predictive vector quantization of wideband LSF using narrowband LSF for bandwidth scalable coders

Hiroyuki Ehara; Toshiyuki Morii; Koji Yoshida

For implementing a bandwidth-scalable coder, a wideband line spectral frequency (LSF) quantizer was developed. It works in combination with a narrowband LSF quantizer. A new predictive vector quantization was introduced to the wideband LSF quantizer. The predictive vector quantizer is based on the use of several predictive contributions, which include first-order autoregressive (AR) prediction and vector quantization (VQ) codebook mapping. One feature of the new predictive vector quantizer is its exploitation of the correlation between wideband and narrowband LSFs quantized in the previous frame for estimating the wideband LSF in the current frame. A 16-bit switched predictive three-stage vector quantizer was used to encode the estimation residues. Results showed that introducing the predictor improved performance by 0.3 dB in spectral distortion. This paper describes the procedures for designing the predictor and the three-stage codebook, as well as simulation results.
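The general shape of a predictive multi-stage quantizer can be sketched as below. The weighted combination in `predict`, the tiny two-entry codebooks, and all names are assumptions for illustration; the paper's trained predictor, its switching logic, and the actual 16-bit codebook split are not reproduced.

```python
def predict(previous_wideband, mapped_from_narrowband, alpha=0.5):
    """Assumed predictor: blend an AR term from the previous frame's quantized
    wideband LSF with an estimate mapped from the narrowband LSF."""
    return [alpha * p + (1.0 - alpha) * m
            for p, m in zip(previous_wideband, mapped_from_narrowband)]


def nearest(codebook, vector):
    """Index of the codevector with the smallest squared error."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))


def msvq_encode(residual, stages):
    """Quantize the residual stage by stage; each stage codes what is left over."""
    indices, remaining = [], list(residual)
    for codebook in stages:
        index = nearest(codebook, remaining)
        indices.append(index)
        remaining = [r - c for r, c in zip(remaining, codebook[index])]
    return indices


def msvq_decode(indices, stages):
    """Sum the selected codevector from every stage."""
    output = [0.0] * len(stages[0][0])
    for index, codebook in zip(indices, stages):
        output = [o + c for o, c in zip(output, codebook[index])]
    return output


# Hypothetical three-stage codebooks with two entries each.
stages = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, -0.5]],
    [[0.0, 0.0], [0.25, 0.25]],
]
indices = msvq_encode([1.75, 0.75], stages)
decoded = msvq_decode(indices, stages)
estimate = predict([2.0, 4.0], [4.0, 6.0])
```

In use, the encoder would quantize the difference between the true wideband LSF and the estimate, and the decoder would add the decoded residual back to the same estimate.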

Collaboration


Dive into Hiroyuki Ehara's collaboration.

Top Co-Authors

Kazunori Mano

Carnegie Mellon University
