Ari Heikkinen | Researchain

Publication

Featured research published by Ari Heikkinen.

International Conference on Acoustics, Speech, and Signal Processing | 2002

A 4 kb/s hybrid MELP/CELP speech coding candidate for ITU standardization

Alan V. McCree; Jacek Stachurski; Takahiro Unno; Erdem Ertan; Erdal Paksoy; Vishu R. Viswanathan; Ari Heikkinen; Anssi Rämö; Sakari Himanen; Peter Blöcher; Oliver Dressler

This paper presents an improved 4 kb/s hybrid MELP/CELP speech coder submitted as a candidate for ITU standardization. The coder uses three modes: a high-quality MELP coder for strongly voiced speech frames, an ACELP coder with pitch prediction for weakly voiced frames, and a stochastic CELP coder for unvoiced frames. We present recent enhancements to this coder, both to improve speech quality and to reduce coder complexity. Previous ITU Selection Testing results on an earlier version of this coder showed that it met nearly all requirements for toll-quality speech, more than any other candidate. Our internal testing shows that the current reduced-complexity fixed-point coder maintains this high performance.

International Conference on Acoustics, Speech, and Signal Processing | 2003

Hybrid MELP/CELP coding at bit rates from 6.4 to 2.4 kb/s

Jacek Stachurski; Alan V. McCree; Vishu R. Viswanathan; Ari Heikkinen; Anssi Rämö; Sakari Himanen; Peter Blöcher

This paper describes extensions of the 4 kb/s hybrid MELP/CELP coder, up to 6.4 kb/s and down to 2.4 kb/s. The baseline 4 kb/s coder uses three coding modes: MELP in strongly voiced speech frames, CELP with pitch prediction in weakly voiced frames, and CELP with stochastic excitation in unvoiced frames. To minimize switching artifacts between parametric MELP and waveform CELP coding, an alignment phase is encoded in MELP and zero-phase equalization is applied to the CELP target signal. The 6.4 kb/s extension uses the same three modes as the 4 kb/s coder, with improved MELP and CELP coders. The 2.4 kb/s extension uses only two modes: MELP for voiced frames and CELP synthesis with random excitation for unvoiced frames. The alignment phase is encoded in MELP frames for all bit rates so that time synchrony with input speech is always maintained. Alignment phase and zero-phase equalization enable smooth switching between coders at different bit rates. The hybrid MELP/CELP coding structure leads to coders that perform better at a given bit rate than MELP or CELP separately, and better than or equivalent to higher bit-rate ITU standards. Formal subjective tests show that in all but one of the tested conditions, the 6.4 kb/s hybrid coder is better than 8 kb/s G.729 and the 2.4 kb/s coder is equivalent to, or better than, 6.4 kb/s G.729 Annex D.
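
The per-frame mode switching described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' classifier: the voicing score in [0, 1], the two thresholds, and the names `Mode` and `select_mode` are all assumptions introduced here. The abstract states only which coder handles which kind of frame at each bit rate.

```python
from enum import Enum


class Mode(Enum):
    MELP = "MELP"
    CELP_PITCH = "CELP with pitch prediction"
    CELP_STOCHASTIC = "CELP with stochastic excitation"
    CELP_RANDOM = "CELP synthesis with random excitation"


# Hypothetical thresholds on a hypothetical voicing score in [0, 1].
# The abstract does not say how voicing strength is measured.
STRONG_VOICING = 0.7
WEAK_VOICING = 0.3


def select_mode(voicing: float, bit_rate_kbps: float) -> Mode:
    """Pick a coding mode for one frame from its voicing score and the bit rate."""
    if not 0.0 <= voicing <= 1.0:
        raise ValueError("voicing score must lie in [0, 1]")

    if bit_rate_kbps == 2.4:
        # Two modes only: MELP for voiced frames, random-excitation CELP otherwise.
        return Mode.MELP if voicing >= WEAK_VOICING else Mode.CELP_RANDOM

    if bit_rate_kbps in (4.0, 6.4):
        # Three modes: strongly voiced, weakly voiced, unvoiced.
        if voicing >= STRONG_VOICING:
            return Mode.MELP
        if voicing >= WEAK_VOICING:
            return Mode.CELP_PITCH
        return Mode.CELP_STOCHASTIC

    raise ValueError("bit rate not covered by the abstract: use 2.4, 4.0 or 6.4")
```

A real coder would derive the decision from signal features and would also handle the alignment phase and zero-phase equalization that make the switches inaudible; neither is modelled in this sketch.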

International Conference on Acoustics, Speech, and Signal Processing | 2002

A novel quantization scheme for the noise-like component in waveform interpolation speech coding

Jani Nurminen; Ari Heikkinen; Jukka Saarinen

This paper presents a novel and efficient coding scheme for the noise-like component in waveform interpolation speech coding. The techniques employed in the task are described in detail and the advantages gained by using them are discussed. These techniques are shown to enable perceptually transparent coding of the magnitude spectra of the noise-like component with only ten bits per frame and to offer speech quality improvements at higher bit rates. The most significant features of the proposed scheme are smoothed representation and matrix quantization of the magnitude spectra. The proposed coding techniques can be used for enhancement of speech coders based on the waveform interpolation approach.

Speech Communication | 2004

Development of a 4 kbit/s hybrid sinusoidal/CELP speech coder

Ari Heikkinen

A comprehensive performance analysis of sinusoidal and code excited linear prediction (CELP) speech coding is given around 4 kbit/s, using both subjective and objective measurements. Based on the observations made, justification for the multi-modal hybrid coding approach employing both sinusoidal and CELP coding is given, and an implementation of such a coder is described. This 4 kbit/s sinusoidal/CELP speech coder utilizes four modes to classify the input speech segment: voiced, jittery-voiced, plosive and unvoiced. For voiced segments sinusoidal coding is used whereas different CELP versions are employed for the other modes. The quality of the implemented 4 kbit/s sinusoidal/CELP speech coder in clean speech conditions is finally verified by a listening test. In the test, the 4 kbit/s coder performed almost as well as the high-quality references used, but it still needs improvements to be classified as a high-quality 4 kbit/s speech coder.

Speech Communication | 2002

On methods for perfect reconstruction WI Speech coding with preprocessing

Mikko Tammi; Ari Heikkinen; Jukka Saarinen

The waveform interpolation (WI) speech coding algorithm has been shown to be an efficient method to describe the evolution of periodic voiced components in the speech signal. However, conventional WI coding does not provide the perfect reconstruction property, i.e. the decoded signal does not converge to the original signal with decreasing quantization error. Therefore, errors in the coding model cannot be fixed by quantization. In this paper we discuss the characteristics of the WI coding model and modifications to the model which enable the perfect reconstruction property. The new requirements and features are examined and discussed in detail. While the perfect reconstruction property brings many benefits, it also places new demands on the operation of the coder. Particularly high requirements are set on the exactness of the pitch estimate; inaccuracies quickly hamper the ability to quantize the parameters efficiently. To overcome this, we introduce a preprocessing method which slightly modifies the pitch structure of the residual signal before waveform extraction. The modifications to the signal are minor, and therefore the quality of the preprocessed signal is very close to that of the input speech. In the proposed method the perfect reconstruction property is maintained in relation to the preprocessed signal.
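
The perfect reconstruction property defined in this abstract can be illustrated with a toy comparison. This is a sketch under loud assumptions and is not the WI coder: a uniform scalar quantizer stands in for parameter quantization, and a three-tap moving average (`smooth`) stands in for a coding model that discards information. All function names and the test signal are hypothetical.

```python
import math


def quantize(samples, step):
    """Uniform scalar quantizer: round each sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]


def smooth(samples):
    """Three-tap moving average with replicated edges (toy stand-in for a lossy model)."""
    n = len(samples)
    return [
        (samples[max(i - 1, 0)] + samples[i] + samples[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]


def max_error(a, b):
    """Largest absolute sample difference between two equal-length signals."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical test signal and quantizer step sizes, from coarse to fine.
signal = [math.sin(0.9 * n) for n in range(64)]
steps = [0.5, 0.05, 0.005]

# With perfect reconstruction, the error is bounded by step / 2 and shrinks with the step.
pr_errors = [max_error(signal, quantize(signal, s)) for s in steps]

# Without it, the model error remains no matter how fine the quantizer becomes.
model_errors = [max_error(signal, quantize(smooth(signal), s)) for s in steps]
```

In the first case refining the quantizer drives the error toward zero; in the second the error levels off at whatever the model itself loses, which is the sense in which errors in the coding model cannot be fixed by quantization.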

Archive | 1995