Mats Blomberg | Researchain

Publication

Featured research published by Mats Blomberg.

Speech Communication | 2000

An overview of the CAVE project research activities in speaker verification

Frédéric Bimbot; Mats Blomberg; Lou Boves; Hans-Peter Hutter; Cédric Jaboulet; Johan Koolwaaij; Johan Lindberg; Jean-Benoît Pierrot

This article presents an overview of the research activities carried out in the European CAVE project, which focused on text-dependent speaker verification on the telephone network using whole word Hidden Markov Models. It documents in detail various aspects of the technology and the methodology used within the project. In particular, it addresses the issue of model estimation in the context of limited enrollment data and the problem of a posteriori decision threshold setting. Experiments are carried out on the realistic telephone speech database SESP. State-of-the-art performance levels are obtained, which validates the technical approaches developed and assessed during the project as well as the working infrastructure which facilitated cooperation between the partners.

International Conference on Acoustics, Speech, and Signal Processing | 1982

Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system

Kjell Elenius; Mats Blomberg

A dynamic programming pattern matching isolated word recognition system has been modified in order to emphasize the transient parts of speech in the similarity measure. The technique is to weight the word distances with a normalized spectral change function. A small positive effect is measured. Emphasizing the stationary parts is shown to substantially decrease the performance. Adding the time derivative of the speech parameters to the word patterns improves performance significantly. This is probably a consequence of an improvement in the description of the transient segments.
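The weighting idea can be sketched as a dynamic time warping (DTW) distance in which each input frame's local distance is scaled by a spectral-change weight. This is a minimal illustration under stated assumptions, not the original 1982 system: the Euclidean frame distance, the mean-1 normalization of the change function, the path normalization by `n + m`, and the helper names `spectral_change`, `add_deltas`, and `weighted_dtw` are all hypothetical.

```python
import numpy as np

def spectral_change(frames: np.ndarray) -> np.ndarray:
    """Frame-to-frame spectral change, normalized to mean 1 (normalization is an assumption)."""
    change = np.zeros(len(frames))
    change[1:] = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    if len(frames) > 1:
        change[0] = change[1]  # first frame has no predecessor; reuse its neighbour's value
    mean = change.mean()
    return change / mean if mean > 0 else np.ones(len(frames))

def add_deltas(frames: np.ndarray) -> np.ndarray:
    """Append a first-order time derivative of each parameter to every frame."""
    return np.hstack([frames, np.gradient(frames, axis=0)])

def weighted_dtw(test: np.ndarray, ref: np.ndarray, weights: np.ndarray) -> float:
    """DTW distance in which the local distance at test frame i is scaled by weights[i]."""
    n, m = len(test), len(ref)
    local = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2) * weights[:, None]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))
```

A recognizer built on this would compute `weighted_dtw` against every stored word pattern and pick the smallest distance. Swapping in weights that are large where spectral change is small would emphasize stationary parts instead, and running the same comparison on `add_deltas` output corresponds to adding time derivatives to the word patterns.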

International Conference on Acoustics, Speech, and Signal Processing | 1998

A comparison of a priori threshold setting procedures for speaker verification in the CAVE project

Jean-Benoît Pierrot; Johan Lindberg; Johan Koolwaaij; Hans-Peter Hutter; Mats Blomberg; Frédéric Bimbot

The issue of a priori threshold setting in speaker verification is a key problem for field applications. In the context of the Caller Verification in Banking and Telecommunications (CAVE) project, we compared several methods for estimating speaker-independent and speaker-dependent decision thresholds. Relevant parameters are estimated from development data only, i.e. without resorting to additional client data. The various approaches are tested on the Dutch SESP database.
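The abstract does not list the compared procedures, so the following is only a sketch of two common ways to set thresholds from development data alone: a speaker-independent threshold at the equal error rate (EER) of pooled development scores, and a speaker-dependent threshold derived from impostor-score statistics for one model. The function names and the `mean + alpha * std` rule are assumptions, not the CAVE methods.

```python
import numpy as np

def eer_threshold(client_scores, impostor_scores) -> float:
    """Speaker-independent threshold where false rejection and false acceptance rates are closest."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_t, best_gap = None, np.inf
    for t in np.unique(np.concatenate([client, impostor])):
        frr = np.mean(client < t)       # fraction of genuine trials rejected
        far = np.mean(impostor >= t)    # fraction of impostor trials accepted
        if abs(frr - far) < best_gap:
            best_t, best_gap = float(t), abs(frr - far)
    return best_t

def speaker_dependent_threshold(impostor_scores_for_model, alpha: float) -> float:
    """Hypothetical rule: impostor mean plus alpha impostor standard deviations for one model."""
    s = np.asarray(impostor_scores_for_model, dtype=float)
    return float(s.mean() + alpha * s.std())
```

Both rules use only development speakers' scores, which matches the constraint of not requiring additional client data; `alpha` would itself be tuned on development data.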

Speech Communication | 1991

Adaptation to a speaker's voice in a speech recognition system based on synthetic phoneme references

Mats Blomberg

A speech recognition system based on synthetic generation of reference prototypes is described. The vocabulary and grammar are described in a finite-state phoneme network. In the transformation from symbolic to spectral representation, reduction rules modify the initial phoneme target values and a coarticulation module inserts interpolated transition states at phoneme boundaries. The phoneme templates are specified in terms of control parameters to a serial formant synthesiser. At each state, a 16-channel filter bank section is computed from the synthesis parameters. The recognition process uses a time-synchronous dynamic programming technique to find the path in the network that minimises the accumulated spectral distance to the input utterance. Dynamic adaptation to the speaker's voice source spectrum is performed during recognition. Without adaptation, the average recognition rate for ten male speakers was 88% on an isolated-word task using a 26-word vocabulary. Adding voice source adaptation raised the performance to 96%. On a vocabulary of 3 connected digits, the adaptation technique improved the recognition rate for six male speakers from 87.7% to 92.8%. The improvement was largest for subjects with a low initial recognition rate, indicating the benefit of the voice source adaptation technique for certain voices. Changing the voice source model and optimising the adaptation time constant raised the recognition rate further to 96.1%. Current work is directed towards speaker adaptation of phoneme parameters and modelling of the variability of the parameter dynamics at phoneme boundaries.
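The recognition search and an adaptation loop of this kind can be sketched as follows. This is not the published system: one reference spectrum per network state stands in for the synthesized filter-bank sections, and the Euclidean distance, the update driven by the single best-scoring state, and modelling the voice source difference as an additive spectral offset smoothed with a time constant `tau` (in frames) are all assumptions.

```python
import numpy as np

def recognize(frames, templates, successors, start, finals, tau=None):
    """Time-synchronous DP over a finite-state network with optional spectral adaptation.

    frames:        (T, C) input spectra, e.g. C = 16 filter-bank channels
    templates:     (S, C) one reference spectrum per network state
    successors:    successors[s] lists states reachable from s; self-loops are implicit
    start, finals: lists of entry and accepting states
    tau:           hypothetical adaptation time constant in frames (None = no adaptation)
    Returns (best accepting state, accumulated distance).
    """
    n_states = len(templates)
    cost = np.full(n_states, np.inf)
    correction = np.zeros(frames.shape[1])  # running estimate of a speaker-specific spectral offset
    rate = 0.0 if tau is None else 1.0 / tau
    for t, frame in enumerate(frames):
        adapted = frame - correction
        local = np.linalg.norm(templates - adapted, axis=1)
        if t == 0:
            new_cost = np.full(n_states, np.inf)
            new_cost[start] = local[start]
        else:
            entry = cost.copy()                  # stay in the same state
            for s in range(n_states):
                for nxt in successors[s]:        # or advance along a network arc
                    entry[nxt] = min(entry[nxt], cost[s])
            new_cost = entry + local
        cost = new_cost
        best = int(np.argmin(cost))
        # Exponential smoothing of the offset toward the residual of the best-scoring state
        correction += rate * (adapted - templates[best])
    final_costs = cost[finals]
    k = int(np.argmin(final_costs))
    return finals[k], float(final_costs[k])
```

A larger `tau` adapts more slowly but is less disturbed by momentary mismatches, which is the trade-off an optimised time constant would balance.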

International Conference on Acoustics, Speech, and Signal Processing | 1989

Synthetic phoneme prototypes in a connected-word speech recognition system

Mats Blomberg

A recognition system based on a reference library of synthetic phoneme prototypes is described. The phoneme templates are specified in terms of formant synthesis parameters. The vocabulary and grammar are described in a finite-state network where each node represents a phoneme. A transition between two phonemes in the net is expanded to a number of new nodes using interpolation on the synthesis parameters or at the spectrum level. For each node, a 16-channel filter bank section is computed from the synthesis parameters. Adaptation to each speaker's individual voice source spectrum is performed during recognition. Auditory forward masking is incorporated. Speaker-independent recognition results are given for male speakers on isolated words and connected digits. Future improvements include coarticulation and reduction rules and speaker adaptation of phoneme parameters. The method could also be used in combination with hidden Markov models to provide reference data in cases not covered by the training material.
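Expanding a phoneme-to-phoneme transition by interpolating synthesis parameters can be sketched as below. Linear interpolation, the evenly spaced nodes, and the formant values for the hypothetical /a/ and /i/ targets are illustrative assumptions; the abstract also mentions interpolating at the spectrum level, which this sketch does not cover.

```python
from typing import Dict, List

Params = Dict[str, float]  # synthesis parameter name -> value (e.g. formant frequency in Hz)

def expand_transition(left: Params, right: Params, n_nodes: int) -> List[Params]:
    """Insert n_nodes states between two phoneme targets by linear parameter interpolation."""
    if left.keys() != right.keys():
        raise ValueError("both phoneme targets must define the same parameters")
    nodes = []
    for k in range(1, n_nodes + 1):
        w = k / (n_nodes + 1)  # fraction of the way from the left to the right target
        nodes.append({name: (1 - w) * left[name] + w * right[name] for name in left})
    return nodes

# Hypothetical targets for an /a/ -> /i/ transition (illustrative values only)
a_target = {"F1": 700.0, "F2": 1200.0}
i_target = {"F1": 300.0, "F2": 2200.0}
for node in expand_transition(a_target, i_target, 3):
    print(node)
```

Each generated node would then be passed through the synthesiser to obtain its filter-bank section, in the same way as the phoneme target nodes.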

Recent Research Towards Advanced Man-Machine Interface Through Spoken Language | 1996

Word recognition using synthesized templates

Mats Blomberg; Rolf Carlson; Kjell Elenius; Björn Granström; Sheri Hunnicutt

This chapter reports on experiments that are part of a long-term project towards a knowledge-based speech recognition system, NEBULA. The experiments take the extreme position of comparing human speech with predicted pronunciations at the acoustic level, using a straightforward pattern-matching technique. The significantly better results obtained with human references were not a surprise, since it is well known that text-to-speech systems still need more work before they reach human quality. Nevertheless, the results can be regarded as encouraging. The low levels of NEBULA explore the descriptive power of cues and use multiple cues to analyze, classify, and segment the speech wave. The middle portion of NEBULA is currently represented by the syntactic component and the morphological decomposition of the text-to-speech system, together with a concept-to-speech system.

Journal of the Acoustical Society of America | 1988

Word recognition using synthesized reference templates

Mats Blomberg; Rolf Carlson; Kjell Elenius; Björn Granström; Sheri Hunnicutt

A major problem in large‐vocabulary speech recognition is the collection of reference data and speaker normalization. In this paper, the use of synthetic speech is proposed as a means of handling this problem. An experimental scheme for such a speech recognition system will be described. A rule‐based speech synthesis procedure is used for generating the reference data. Ten male subjects participated in an experiment using a 26‐word test vocabulary recorded in a normal office room. The subjects were asked to read the words from a list with little instruction except to pronounce each word separately. The synthesis was used to build the references, and no adjustments were made to the synthesis at this first stage. Every human speaker served as a better reference than the synthesis. Differences between natural and synthetic speech have been analyzed in detail at the segmental level. Methods for updating the synthetic speech parameters from natural speech templates will be described. [This work has been supported by the Swedish Board of Technical Development.]

International Conference on Spoken Language Processing | 1996

Creation of unseen triphones from diphones and monophones using a speech production approach

Mats Blomberg; Kjell Elenius

With limited training data, infrequent triphones are not observed in sufficient numbers to train reliable models. In this paper, a speech production approach is used to predict the characteristics of unseen triphones by concatenating diphones and/or monophones in the parametric representation of a formant speech synthesiser. The parameter trajectories are estimated by interpolation between the endpoints of the original units. The spectral states of the created triphone are generated by the speech synthesiser. The proposed technique has been evaluated using spectral error measurements and recognition candidate rescoring of N-best lists. In both cases, the created triphones are shown to perform better than the shorter units from which they were constructed.
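The N-best rescoring step used in the evaluation can be sketched generically. Here `new_score` is a hypothetical stand-in for a second-pass acoustic score computed with the created triphone models, and combining it additively with the first-pass score through a `weight` is an assumption rather than the paper's stated procedure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    words: str
    first_pass_score: float  # log score from the first recognition pass (higher is better)

def rescore_nbest(nbest: List[Hypothesis],
                  new_score: Callable[[Hypothesis], float],
                  weight: float = 1.0) -> List[Hypothesis]:
    """Re-rank an N-best list by first-pass score plus a weighted second-pass score."""
    return sorted(nbest,
                  key=lambda h: h.first_pass_score + weight * new_score(h),
                  reverse=True)
```

Comparing how often the correct hypothesis moves to the top when `new_score` uses created triphones versus the shorter diphone or monophone units would give the kind of comparison the abstract describes.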

Journal of the Acoustical Society of America | 1978

A phonetically based isolated word recognition system

Mats Blomberg; Kjell Elenius

The principal objective of a phonetic approach is to reduce the influence of intra‐ and inter‐speaker speech variation on the recognition rate. The system is implemented on a 16‐K minicomputer and uses a filter bank delivering spectral sections, 0–5 kHz, every 10 ms. Estimates of the first three formants are calculated, and energies in different spectral bands are used to segment the speech signal into broad classes. The following measures are calculated depending on the segmental class and the speech parameter: mean values, steady‐state values, durations, transition rates, and some distances between formants. In a learning phase, the statistics of the measures for the vocabulary are automatically calculated by a program given the quasiphonetic spelling of the input words. The statistics are based on phoneme pairs, i.e., diphones. In the recognition phase, the program uses the statistics and the quasiphonetic spelling to recognize the input words. Six male speakers were used for calculating the statistics of a 41‐word vocabulary. Their mean recognition rate was 98% on a new recording. The rate decreased to 96.3% for four male talkers unknown to the system. [Work supported by STU, Sweden.]
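The learning and recognition phases can be sketched as per-diphone statistics followed by a best-fit search over the vocabulary. This is a hypothetical illustration, not the 1978 program: it assumes the input word has already been segmented into one set of measures per diphone, that spellings use one character per phoneme, and that words are scored by a sum of squared z-scores.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Stats = Dict[Tuple[str, str], Tuple[float, float]]

def train(samples: List[Tuple[str, str, float]]) -> Stats:
    """Learning phase: mean and standard deviation of each measure for each diphone."""
    groups = defaultdict(list)
    for diphone, measure, value in samples:
        groups[(diphone, measure)].append(value)
    stats = {}
    for key, values in groups.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats[key] = (mean, math.sqrt(var) or 1.0)  # fall back to 1.0 if all values were equal
    return stats

def word_distance(observed: List[Dict[str, float]], spelling: str, stats: Stats) -> float:
    """Sum of squared z-scores of observed measures against the word's diphone statistics."""
    diphones = [spelling[i:i + 2] for i in range(len(spelling) - 1)]
    if len(observed) != len(diphones):
        return math.inf  # segmentation does not match this word's diphone count
    total = 0.0
    for diphone, measures in zip(diphones, observed):
        for name, value in measures.items():
            if (diphone, name) not in stats:
                return math.inf  # no statistics learned for this diphone and measure
            mean, std = stats[(diphone, name)]
            total += ((value - mean) / std) ** 2
    return total

def recognize(observed, vocabulary: List[str], stats: Stats) -> str:
    """Recognition phase: choose the vocabulary word whose statistics fit best."""
    return min(vocabulary, key=lambda w: word_distance(observed, w, stats))
```

Because the statistics are keyed by diphone rather than by word, a new vocabulary word can be scored from its spelling as long as its diphones were seen during learning.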

International Conference on Acoustics, Speech, and Signal Processing | 1986

Nonlinear frequency warp for speech recognition

Mats Blomberg; Kjell Elenius

A technique of nonlinear frequency warping has been investigated for recognition of Swedish vowels. A frequency warp between two spectra is computed using a standard dynamic programming algorithm. The frequency distance, defined as the area between the obtained warping function and the diagonal, contributes to the spectral distance. The distance between two spectra is a weighted sum of the warped amplitude distance and the frequency distance. By varying the two weights, the measure shifts gradually between non-warped amplitude distance, warped amplitude distance, and frequency distance. In recognition experiments on natural and synthetic vowel spectra, a metric combining the frequency and amplitude distances gave better results than using only amplitude or frequency deviation. Analysis of the results for the synthetic vowels shows a reduced sensitivity to voice source and pitch variation. For the natural vowels, the recognition improvement is larger when male and female speakers are treated separately than when the groups are combined.
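A dynamic programming frequency warp with a combined amplitude and frequency cost can be sketched as below. Two choices here are assumptions, not taken from the paper: the DP minimizes the weighted local cost `w_amp * |a[i] - b[j]| + w_freq * |i - j|`, so a large `w_freq` pins the path to the diagonal (non-warped amplitude distance), and the area to the diagonal is approximated by the mean channel offset along the path.

```python
import numpy as np

def warp_distance(a, b, w_amp=1.0, w_freq=0.1):
    """Frequency-warped distance between two spectra sampled on channel indices.

    Returns (total, amp_dist, freq_dist, path), where amp_dist and freq_dist are
    averages over the warping path and total = w_amp * amp_dist + w_freq * freq_dist.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    amp = np.abs(a[:, None] - b[None, :])                       # amplitude mismatch
    off = np.abs(np.arange(n)[:, None] - np.arange(m)[None, :])  # distance from the diagonal
    local = w_amp * amp + w_freq * off
    acc = np.full((n, m), np.inf)
    back = {}
    acc[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best, prev = np.inf, None
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):  # diagonal preferred on ties
                if pi >= 0 and pj >= 0 and acc[pi, pj] < best:
                    best, prev = acc[pi, pj], (pi, pj)
            acc[i, j] = best + local[i, j]
            back[(i, j)] = prev
    path = [(n - 1, m - 1)]
    while path[-1] != (0, 0):
        path.append(back[path[-1]])
    path.reverse()
    amp_dist = float(sum(amp[i, j] for i, j in path) / len(path))
    freq_dist = float(sum(off[i, j] for i, j in path) / len(path))
    return w_amp * amp_dist + w_freq * freq_dist, amp_dist, freq_dist, path
```

For two spectra whose peak differs by one channel, a small `w_freq` lets the warp align the peaks (zero amplitude distance, nonzero frequency distance), while a very large `w_freq` forces the diagonal path and reproduces the ordinary channel-by-channel amplitude distance.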