Publication


Featured research published by Jan Skoglund.


IEEE Transactions on Speech and Audio Processing | 2000

Vector quantization based on Gaussian mixture models

Per Hedelin; Jan Skoglund

We model the underlying probability density function of vectors in a database as a Gaussian mixture (GM) model. The model is employed for high rate vector quantization analysis and for design of vector quantizers. It is shown that the high rate formulas accurately predict the performance of model-based quantizers. We propose a novel method for optimizing GM model parameters for high rate performance, and an extension to the EM algorithm for densities having bounded support is also presented. The methods are applied to quantization of LPC parameters in speech coding and we present new high rate analysis results for band-limited spectral distortion and outlier statistics. In practical terms, we find that an optimal single-stage VQ can operate at approximately 3 bits less than a state-of-the-art LSF-based 2-split VQ.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Predictive VQ for noisy channel spectrum coding: AR or MA?

Jan Skoglund; Jan Linden

In this paper, the performance of different predictive vector quantization (PVQ) structures is studied and compared for different degrees of channel noise. Predictive quantization schemes with an auto-regressive (AR) decoder structure are compared with schemes that employ a moving average (MA) decoder. For noisy channels MA prediction performs better than AR. It is shown here that a combination of a PVQ scheme (AR or MA) and a memoryless VQ outperforms both types of traditional predictive quantizer schemes in noiseless as well as noisy channels.
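One commonly cited reason MA prediction tolerates channel noise better is that a corrupted residual affects only a finite number of decoded frames, whereas an AR decoder feeds the error back through its own output. The following is a minimal scalar sketch of that effect, not the paper's codec: the first-order predictors, the coefficient 0.8, and the function names are all illustrative assumptions.

```python
def ar_decode(residuals, a=0.8):
    """AR decoder: predicts from the previous *reconstructed* value (assumed 1st order)."""
    out, prev = [], 0.0
    for e in residuals:
        prev = a * prev + e
        out.append(prev)
    return out


def ma_decode(residuals, b=0.8):
    """MA decoder: predicts from the previous *received residual* (assumed 1st order)."""
    out, prev_e = [], 0.0
    for e in residuals:
        out.append(b * prev_e + e)
        prev_e = e
    return out


def affected_frames(decode, n=20, hit=5, bump=1.0):
    """Indices of frames whose output changes when one residual is corrupted."""
    clean = [0.0] * n
    corrupted = list(clean)
    corrupted[hit] += bump  # a single channel error at frame `hit`
    ref, bad = decode(clean), decode(corrupted)
    return [i for i, (x, y) in enumerate(zip(ref, bad)) if abs(x - y) > 1e-12]


if __name__ == "__main__":
    print("MA frames affected:", affected_frames(ma_decode))
    print("AR frames affected:", affected_frames(ar_decode))
```

In this toy setting the MA decoder recovers after as many frames as the predictor has taps, while the AR error decays geometrically and never fully disappears within the simulated window.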


International Conference on Acoustics, Speech, and Signal Processing | 1996

Exploiting interframe correlation in spectral quantization: a study of different memory VQ schemes

Thomas Eriksson; Jan Linden; Jan Skoglund

This paper addresses the problem of efficient transmission of the LSF parameters in speech coding using vector quantization (VQ). By comparing several memory VQ methods on the same database, we investigate what gains can be achieved by exploiting interframe correlation. The memory VQ methods studied are finite-state VQ and linear predictive VQ. By combining the memory VQ with a fixed memoryless VQ, called the safety-net, further improvements in performance can be obtained. It is found that memory VQ can improve the performance by 3-5 bits compared to memoryless VQ for error-free transmission. The best method in this study is a safety-net extended predictive VQ. For noisy channels, most memory methods perform worse than memoryless VQ, but the safety-net predictive VQ outperforms memoryless VQ for all tested channel error rates while using 4 fewer bits.
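The safety-net idea can be sketched as an encoder that tries both a predictive codebook and a fixed memoryless codebook and keeps whichever candidate is closer to the input. This is a rough illustration under stated assumptions, not the authors' implementation: the first-order predictor with coefficient `a`, the squared-error criterion, the codebook contents, and the name `safety_net_encode` are all hypothetical.

```python
def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def safety_net_encode(x, prev_hat, residual_cb, memoryless_cb, a=0.7):
    """Return (mode, index, reconstruction) with the lowest squared error.

    mode is "pred" when the predictive branch wins, "safety" when the
    memoryless safety-net codebook wins. `a` is an assumed scalar predictor.
    """
    pred = [a * p for p in prev_hat]  # prediction from the previous reconstruction
    candidates = []
    for i, c in enumerate(residual_cb):
        rec = [p + ci for p, ci in zip(pred, c)]
        candidates.append((sqdist(x, rec), "pred", i, rec))
    for i, c in enumerate(memoryless_cb):
        candidates.append((sqdist(x, c), "safety", i, list(c)))
    best = min(candidates, key=lambda t: t[0])
    return best[1], best[2], best[3]


if __name__ == "__main__":
    residual_cb = [(-0.1, -0.1), (0.0, 0.0), (0.1, 0.1)]
    memoryless_cb = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    prev_hat = (1.0, 1.0)
    # A frame close to the prediction, then an outlier frame.
    print(safety_net_encode((0.72, 0.72), prev_hat, residual_cb, memoryless_cb))
    print(safety_net_encode((2.1, 1.9), prev_hat, residual_cb, memoryless_cb))
```

Because the memoryless branch does not depend on earlier frames, selecting it resets the decoder state, which is one plausible reading of why the combination holds up on noisy channels. How the branch choice is signalled to the decoder is left out of this sketch.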


IEEE Transactions on Speech and Audio Processing | 2000

On time-frequency masking in voiced speech

Jan Skoglund; W.B. Kleijn

This paper addresses the issue of masking of noise in voiced speech. First, we examine the audibility of cyclostationary narrow-band noise bursts added to voiced speech generated by synthetic excitation. Varying the temporal location of noise within a pitch cycle corresponds to varying its phase spectrum. Using this fact, we found that a change of phase of the noise in the high frequency region is more perceptible for a low-pitched sound than for a high-pitched sound. We then propose a pitch-dependent temporal weighting function which can be employed in quantization of pitch cycle waveforms. In a second experiment, we found that the audibility of high-frequency noise added to natural speech can be significantly reduced using this weighting function.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Robustness of speech quality metrics to background noise and network degradations: Comparing ViSQOL, PESQ and POLQA

Andrew Hines; Jan Skoglund; Anil C. Kokaram; Naomi Harte

The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal-based, full-reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener, whether the speech quality degradation is caused by ambient noise or by transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics is also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.


IEEE Workshop on Speech Coding for Telecommunications | 1997

Audibility of pitch-synchronously modulated noise

Jan Skoglund; W. Bastiaan Kleijn; Per Hedelin

This paper examines the audibility of stationary and cyclostationary narrow-band noise added to voiced speech generated by natural and synthetic excitation. Varying the temporal location of noise within a pitch cycle corresponds to varying its phase spectrum. Exploiting this fact, we find that a change of phase is more perceptible for a low-pitched sound than for a high-pitched sound. Our results support the notion that sinusoidal coders work well for female speech and that CELP coders work well for male speech.


Asilomar Conference on Signals, Systems and Computers | 1997

Controlling spectral dynamics in LPC quantization for perceptual enhancement

Jonas Samuelsson; Jan Skoglund; J. Linden

Taking the evolution of spectral parameters into consideration in speech coding has been shown to enhance the perceptual performance. In this study we examine and compare two methods that are designed for explicit control of spectral dynamics. One method operates on the encoder part of the coding system by incorporating a constraint in the distortion measure, and the other method smooths the trajectory of output vectors at the decoder side. The decoder method, however, requires an additional coding delay of one frame. Listening experiments with three different vector quantizer structures demonstrate that the decoder method in particular gives significant improvements. For noisy channels, the preference for this method is even more pronounced.


1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351) | 1999

SD optimization of spectral coders

Per Hedelin; Fredrik Nordén; Jan Skoglund

In spectral coding of speech, several different criteria are in use for designing and evaluating quantizers. One measure, spectral distortion (SD), has become dominant for comparisons between coders. At run-time, a coder normally quantizes vectors according to other measures, e.g. line spectrum frequency (LSF) distance, in order to keep computational complexity down. In this study, we adopt the SD criterion both in coder design and for quantizer operation. The quantizer is optimized to give minimal average SD scores. This allows us to address the question of whether the average SD measure is really a good criterion that matches subjective ratings. We perform a few objective and subjective tests based on SD optimized coding and some versions thereof. Our tests imply that minimizing average SD may not lead to the best subjective scoring.
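For readers unfamiliar with the measure, spectral distortion is usually defined as the root-mean-square difference, in dB, between the log power spectra of the original and quantized LPC filters. The sketch below follows that common textbook definition on a uniform full-band frequency grid; the grid size, the full-band integration range, and the function names are assumptions for illustration and are not taken from the paper.

```python
import cmath
import math


def lpc_log_spectrum_db(a, n_freq=256):
    """Log power spectrum (dB) of the all-pole filter 1/A(z), a = [1, a1, ..., ap]."""
    spectrum = []
    for i in range(n_freq):
        w = math.pi * (i + 0.5) / n_freq  # frequencies in (0, pi)
        A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        # 10*log10(1/|A|^2) = -20*log10(|A|)
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion_db(a_ref, a_quant, n_freq=256):
    """RMS log-spectral difference in dB between two LPC polynomials."""
    s_ref = lpc_log_spectrum_db(a_ref, n_freq)
    s_q = lpc_log_spectrum_db(a_quant, n_freq)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s_ref, s_q)) / n_freq)


if __name__ == "__main__":
    a_ref = [1.0, -0.9]     # hypothetical first-order LPC polynomial
    a_quant = [1.0, -0.85]  # hypothetical quantized version
    print(f"SD = {spectral_distortion_db(a_ref, a_quant):.3f} dB")
```

An average SD score, as discussed in the abstract, would be the mean of this per-frame value over a test set.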


Speech Communication | 1998

Analysis and quantization of glottal pulse shapes

Jan Skoglund

In source-filter based speech coding for low bit rates, an efficient representation of excitation pulses is required to attain high quality of the synthetic speech. In this paper, we discuss a pulse waveform representation by a codebook populated with pulse shapes. The codebook is designed from glottal derivative pulses obtained by a linear predictive inverse filtering technique. Pulses are extracted and normalized in time and amplitude to form prototype pulses. Design methods and performance evaluation of the codebooks are investigated in a vector quantization (VQ) framework. The quantization gains obtained by exploiting the correlation between pulses are studied by theoretical calculations, which suggest that about 2 bits per vector (in a budget of 7–10 bits) can be gained when exploiting the correlation. Memory based VQ is a generic term for quantization schemes which utilize previously quantized pulses. We study traditional memory based VQ methods and an extension of memory based VQ with memoryless VQ, denoted a safety-net extension. The experiments show that performance improves when extending memory based VQ with a safety-net. It is found that, at the designated bit rates, a safety-net extended memory based VQ can gain about 1.5–2 bits in comparison with memoryless VQ.


International Conference on Acoustics, Speech, and Signal Processing | 1998

On nonlinear utilization of intervector dependency in vector quantization

Mikael Skoglund; Jan Skoglund

This paper presents an approach to speech vector quantization of sources exhibiting intervector dependency. We present the optimal decoder based on a collection of received indices. We also present the optimal encoder for such decoding. The optimal decoder can be implemented as a table look-up decoder; however, the size of the decoder codebook grows very fast with the size of the collection of utilized indices. This leads us to introduce a method for storing an approximation to the set of optimal decoder vectors, based on linear mapping of a block code vector quantization. In this approach a heavily reduced set of parameters is employed to represent the codebook. Furthermore, we illustrate that the proposed scheme has an interpretation as nonlinear predictive quantization. Numerical results indicate high gain over memoryless coding and memory quantization based on linear predictive coding. The results also show that the sub-optimal approach performs close to the optimal.

Collaboration


Dive into Jan Skoglund's collaborations.

Top Co-Authors

Jan Linden

Chalmers University of Technology

Per Hedelin

Chalmers University of Technology

Thomas Eriksson

Chalmers University of Technology
