Tomas Bäckström
Helsinki University of Technology
Publication
Featured research published by Tomas Bäckström.
Speech Communication | 2009
Carlo Magi; Jouni Pohjalainen; Tomas Bäckström; Paavo Alku
Weighted linear prediction (WLP) is a method to compute all-pole models of speech by applying temporal weighting of the square of the residual signal. By using short-time energy (STE) as a weighting function, this algorithm was originally proposed as an improved linear predictive (LP) method based on emphasising those samples that fit the underlying speech production model well. The original formulation of WLP, however, did not guarantee stability of all-pole models. Therefore, the current work revisits the concept of WLP by introducing a modified short-time energy function leading always to stable all-pole models. This new method, stabilised weighted linear prediction (SWLP), is shown to yield all-pole models whose general performance can be adjusted by properly choosing the length of the STE window, a parameter denoted by M. The study compares the performances of SWLP, minimum variance distortionless response (MVDR), and conventional LP in spectral modelling of speech corrupted by additive noise. The comparisons were performed by computing, for each method, the logarithmic spectral differences between the all-pole spectra extracted from clean and noisy speech in different segmental signal-to-noise ratio (SNR) categories. The results showed that the proposed SWLP algorithm was the most robust method against zero-mean Gaussian noise and the robustness was largest for SWLP with a small M-value. These findings were corroborated by a small listening test in which the majority of the listeners assessed the quality of impulse-train-excited SWLP filters, extracted from noisy speech, to be perceptually closer to original clean speech than the corresponding all-pole responses computed by MVDR. Finally, SWLP was compared to other short-time spectral estimation methods (FFT, LP, MVDR) in isolated word recognition experiments. 
Compared with the other short-time spectral estimation methods, SWLP improved recognition accuracy already at moderate segmental SNR values for sounds corrupted by zero-mean Gaussian noise. For realistic factory noise with low-pass characteristics, the SWLP method improved the recognition results at segmental SNR levels below 0 dB.
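As a rough illustration of the weighting idea described in this abstract, the sketch below implements plain weighted linear prediction with a short-time-energy weight. It is not the stabilised SWLP variant: the abstract does not give the modified weighting function, so none is reproduced here, and this basic form carries no stability guarantee. The function name `weighted_lp`, the choice of the M preceding samples as the STE window, and the noise-free test signal are all assumptions made for the example.

```python
import numpy as np

def weighted_lp(x, order, M):
    """Weighted linear prediction with a short-time-energy (STE) weight.

    Minimises sum_n w[n] * (x[n] - sum_k a[k] * x[n-k])**2, where w[n] is
    the energy of the M samples preceding x[n] (an assumed STE definition).
    Returns the predictor coefficients a[1..order].
    """
    x = np.asarray(x, dtype=float)
    start = max(order, M)
    n = np.arange(start, len(x))
    # Each row holds the past samples x[n-1], ..., x[n-order].
    past = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)
    # STE weight per sample; the small floor keeps the weights positive.
    w = np.array([np.sum(x[i - M:i] ** 2) for i in n]) + 1e-12
    # Weighted normal equations: (X^T W X) a = X^T W x
    A = past.T @ (w[:, None] * past)
    b = past.T @ (w * x[n])
    return np.linalg.solve(A, b)

# Hypothetical test signal: impulse response of a stable second-order
# all-pole filter, x[n] = 1.5 x[n-1] - 0.8 x[n-2].
x = np.zeros(64)
x[0] = 1.0
x[1] = 1.5 * x[0]
for i in range(2, len(x)):
    x[i] = 1.5 * x[i - 1] - 0.8 * x[i - 2]

a = weighted_lp(x, order=2, M=8)
```

Because the test signal follows the recursion exactly, any positive weighting recovers the same coefficients. The weighting only changes the result once noise or model mismatch is present, which is the situation the abstract studies.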
IEEE Transactions on Speech and Audio Processing | 2002
Tomas Bäckström; Paavo Alku; Erkki Vilkman
The aim of this paper is to analyze and compare two time-domain parameterization methods of the glottal flow waveform on a large intensity range. The first parameter is the classical closing quotient which indicates the portion of a period where the glottis is closing. The second parameter is the normalized amplitude quotient which is defined using the ratio between the maximum flow amplitude and the negative peak amplitude of the differentiated glottal flow. The parameters are shown to be strongly correlated, and the normalized amplitude quotient to be a more accurate, consistent and robust measure than the closing quotient. The subjects, five female and six male, produced sustained phonations on a large intensity range. On this material, the normalized amplitude quotient is shown to vary systematically with sound pressure level, and it reveals information that for the closing quotient is hidden in the local variance.
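The two parameters can be compared on a toy pulse. The sketch below assumes the normalized amplitude quotient is computed as the AC flow amplitude divided by the product of the magnitude of the negative derivative peak and the period length. The period normalisation is how the measure is commonly defined, but it is not spelled out in the abstract. The triangular pulse, the sampling rate, and the function name `naq` are hypothetical.

```python
import numpy as np

def naq(flow, fs, f0):
    """Normalized amplitude quotient of one glottal flow period.

    flow: samples of one period, fs: sampling rate in Hz, f0: fundamental
    frequency in Hz. Assumes NAQ = f_ac / (d_peak * T).
    """
    flow = np.asarray(flow, dtype=float)
    f_ac = flow.max() - flow.min()     # AC flow amplitude
    d_flow = np.diff(flow) * fs        # first difference as a derivative estimate
    d_peak = -d_flow.min()             # magnitude of the negative peak
    period = 1.0 / f0
    return f_ac / (d_peak * period)

# Hypothetical triangular pulse: 40 samples opening, 20 closing, 40 closed.
fs, f0 = 10_000, 100                   # one period = 100 samples
t = np.arange(100)
flow = np.interp(t, [0, 40, 60, 100], [0.0, 1.0, 0.0, 0.0])

value = naq(flow, fs, f0)
closing_quotient = 20 / 100            # closing time divided by period length
```

For a triangular pulse the two measures coincide, since the closing slope equals the flow amplitude divided by the closing time. On real flow estimates the closing quotient needs explicit time instants, whereas the amplitude-based quotient needs only two peak values.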
IEEE Transactions on Speech and Audio Processing | 2004
Paavo Alku; Tomas Bäckström
An all-pole modeling technique, Linear Prediction with Low-frequency Emphasis (LPLE), which emphasizes the lower frequency range of the input signal, is presented. The method is based on first interpreting conventional linear predictive (LP) analyses of successive prediction orders with parallel structures using the concept of symmetric linear prediction. In these implementations, symmetric linear prediction is preceded by simple pre-filters, which have either low-frequency or high-frequency characteristics. Combining those symmetric linear predictors that are not preceded by high-frequency pre-filters yields the proposed LPLE predictor. It is proved that the all-pole filters computed by LPLE are always stable. The results achieved with vowels show that the proposed method is well suited to applications that need low-order all-pole models with improved modeling of the lowest formants.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Tomas Bäckström; Matti Airas; Laura Lehto; Paavo Alku
Glottal inverse filtering is a process where the effects of the vocal tract are cancelled from the speech signal in order to estimate the voice source. Traditionally, inverse filtering methods have involved a high level of manual tuning of parameters, such as the vocal tract model order. We present objective heuristics for the measurement of the quality of the resulting glottal flow estimate. In addition, we propose an automatic method for determining the order of the vocal tract all-pole model in inverse filtering based on phase-plane analysis and estimation of the glottal flow kurtosis.
IEEE Signal Processing Letters | 2007
Tomas Bäckström; Carlo Magi
White-noise correction is a technique used in speech coders using linear predictive coding (LPC). This technique generates an artificial noise floor in order to avoid stability problems caused by numerical round-off errors. In this letter, we study the effect of white-noise correction on the roots of the LPC model. The results demonstrate in analytic form the relation between the noise floor level and the stability radius of the LPC model.
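A small numerical sketch of the effect, assuming the correction is applied by scaling the zero-lag autocorrelation by (1 + noise floor), which is one common implementation. The letter's analytic relation is not reproduced here. The function name `lpc_root_radius` and the sinusoidal test autocorrelation are assumptions for illustration.

```python
import numpy as np

def lpc_root_radius(r, order, noise_floor=0.0):
    """Largest root magnitude of the LPC polynomial built from autocorrelation r."""
    r = np.asarray(r, dtype=float).copy()
    r[0] *= 1.0 + noise_floor                  # white-noise correction (assumed form)
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]                                # symmetric Toeplitz autocorrelation matrix
    a = np.linalg.solve(R, r[1:order + 1])     # normal equations R a = r
    poly = np.concatenate(([1.0], -a))         # A(z) = 1 - sum_k a_k z^-k
    return np.max(np.abs(np.roots(poly)))

# Autocorrelation of a pure sinusoid: the uncorrected second-order model
# places its roots exactly on the unit circle.
omega = 0.3 * np.pi
r = np.cos(omega * np.arange(3))

plain = lpc_root_radius(r, order=2)
corrected = lpc_root_radius(r, order=2, noise_floor=1e-2)
```

In this example the uncorrected roots have radius 1, and the corrected model pulls them strictly inside the unit circle. A larger `noise_floor` moves them further in, at the cost of a flatter spectral envelope.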
International Conference on Acoustics, Speech, and Signal Processing | 2002
Paavo Alku; Tomas Bäckström
This study presents a new technique called weighted-sum line spectrum pair (WLSP) where an all-pole filter is defined by using a sum of weighted line spectrum pair polynomials. The WLSP yields a stable all-pole filter of order m, whose autocorrelation function coincides with that of the input signal between indices 0 and m-1. By sacrificing the exact matching at index m, the WLSP models the autocorrelation of the input signal at the indices above m more accurately than conventional linear prediction (LP). Experiments with vowels show that, in comparison to the conventional LP, WLSP yields all-pole spectra that model formants with an increased dynamic range between formant peaks and spectral valleys.
IEEE Transactions on Speech and Audio Processing | 2004
Tomas Bäckström; Paavo Alku; Tuomas Paatero; W. Bastiaan Kleijn
The line spectrum pair (LSP) decomposition is a widely used method in speech coding. In this article, we will show that the LSP polynomials, whose trivial zeros have been removed, are equivalent to two optimal (in the mean square sense) predictors in which a sample is predicted from linear combinations of its previous averaged and differentiated values.
Signal Processing | 2003
Tomas Bäckström; Paavo Alku
A constrained linear predictive model, where the degrees of freedom of the predictor are reduced, is presented. It is shown that this model has the minimum-phase property and forms the basis for a convex space of polynomials with the minimum-phase property. Models presented have a wide range of potential applications, such as spectral modeling of speech and audio signals.
Signal Processing | 2008
Carlo Magi; Tomas Bäckström; Paavo Alku
This paper gives simple proofs of the root locations of two linear predictive methods: the symmetric linear prediction model and the eigenfilter model corresponding to the minimal or maximal simple eigenvalues of an autocorrelation matrix. The roots of both symmetric models are proved to lie on the unit circle. Differently from previous proofs, the approach used in the present study also shows, based on the properties of the autocorrelation sequence, that the root angles of the symmetric linear prediction model are limited to occur within a certain interval. Moreover, eigenfilters corresponding to the minimum or maximum eigenvalue of an autocorrelation matrix that have multiplicity greater than unity are also studied. It turns out that it is possible to characterise the whole space spanned by the eigenvectors corresponding to the multiple eigenvalues by a single symmetric/antisymmetric eigenvector of the principal diagonal sub-block of the autocorrelation matrix having all the roots on the unit circle.
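The unit-circle property of the eigenfilters can be checked numerically on a small case. The sketch below uses a hypothetical three-lag autocorrelation sequence whose Toeplitz matrix has simple eigenvalues. It only illustrates the statement and is not the paper's proof.

```python
import numpy as np

# Hypothetical autocorrelation sequence (lags 0, 1, 2).
r = np.array([1.0, 0.5, 0.2])
lags = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
R = r[lags]                               # symmetric Toeplitz autocorrelation matrix

# eigh returns eigenvalues in ascending order with matching eigenvector columns.
eigvals, eigvecs = np.linalg.eigh(R)
v_min = eigvecs[:, 0]                     # eigenfilter of the minimal eigenvalue
v_max = eigvecs[:, -1]                    # eigenfilter of the maximal eigenvalue

# Treat each eigenvector as polynomial coefficients and inspect the root radii.
radii_min = np.abs(np.roots(v_min))
radii_max = np.abs(np.roots(v_max))
```

Both eigenvectors here are symmetric, so their polynomials are self-reciprocal, and the computed root radii equal 1 up to rounding.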
IEEE Signal Processing Letters | 2007
Tomas Bäckström; Carlo Magi; Paavo Alku
We provide a theoretical lower limit on the distance of line spectral frequencies for both the line spectrum pair decomposition and the immittance spectrum pair decomposition. The result applies to line spectral frequencies computed from linear predictive polynomials with all roots within a zero-centered circle of radius r<1.
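To make the quantity concrete, the sketch below computes line spectral frequencies from a predictor polynomial using the standard line spectrum pair construction, P(z) = A(z) + z^-(m+1) A(1/z) and Q(z) = A(z) - z^-(m+1) A(1/z), and measures the smallest spacing between neighbouring frequencies. The lower bound from the letter is not reproduced. The example poles and the function name `line_spectral_frequencies` are assumptions.

```python
import numpy as np

def line_spectral_frequencies(a):
    """Return sorted LSF angles in (0, pi) of the P and Q polynomials of A(z).

    a: coefficients [1, a_1, ..., a_m] of A(z) = sum_k a_k z^-k.
    """
    a_ext = np.concatenate((np.asarray(a, dtype=float), [0.0]))
    a_rev = a_ext[::-1]                 # coefficients of z^-(m+1) A(1/z)
    p_poly = a_ext + a_rev              # symmetric polynomial P
    q_poly = a_ext - a_rev              # antisymmetric polynomial Q

    def angles(poly):
        w = np.angle(np.roots(poly))
        # Keep the upper half circle and drop the trivial roots at z = 1 and z = -1.
        return np.sort(w[(w > 1e-6) & (w < np.pi - 1e-6)])

    return angles(p_poly), angles(q_poly)

# Hypothetical fourth-order predictor with all roots inside radius 0.9.
poles = [0.9 * np.exp(1j * 0.4 * np.pi), 0.9 * np.exp(-1j * 0.4 * np.pi),
         0.8 * np.exp(1j * 0.7 * np.pi), 0.8 * np.exp(-1j * 0.7 * np.pi)]
a = np.real(np.poly(poles))

p_lsf, q_lsf = line_spectral_frequencies(a)
merged = sorted([(w, "P") for w in p_lsf] + [(w, "Q") for w in q_lsf])
labels = [name for _, name in merged]
min_gap = np.min(np.diff([w for w, _ in merged]))
```

With all predictor roots inside the unit circle, the frequencies of P and Q alternate along the unit circle, so `min_gap` is strictly positive. The letter concerns how small that gap can become for a given root radius.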