Publication


Featured research published by Eric Thelen.


Journal of the Acoustical Society of America | 2002

User model-improvement-data-driven selection and update of user-oriented recognition model of a given type for word recognition at network server

Stefan Besling; Eric Thelen

A distributed pattern recognition system includes at least one user station and a server station, connected via a network such as the Internet. The server station holds different recognition models of the same type. As part of recognition enrolment, the user station transfers model-improvement data associated with a user of the user station to the server station, and the server station selects a recognition model from the different models of that type based on the model-improvement data. For each recognition session, the user station transfers an input pattern, representative of time-sequential input generated by the user, to the server station. The server station retrieves the recognition model selected for the user and provides it to a recognition unit for recognising the input pattern.
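The enrolment-time selection step can be pictured as a small server-side routine. Everything below (class and method names, the pitch-based scoring) is an illustrative sketch, not taken from the patent:

```python
# Illustrative sketch of server-side model selection at enrolment.
# Class, method, and data names are invented for this example.

class RecognitionServer:
    def __init__(self, models):
        # models: model id -> scoring function rating how well that
        # recognition model fits a user's model-improvement data
        self.models = models
        self.selected = {}  # user id -> id of the model chosen for them

    def enroll(self, user_id, improvement_data):
        # Select the model of the given type that best matches the
        # improvement data transferred by the user station.
        best = max(self.models, key=lambda m: self.models[m](improvement_data))
        self.selected[user_id] = best
        return best

    def recognition_model(self, user_id):
        # Per recognition session: retrieve the model selected earlier.
        return self.selected[user_id]

# Two dummy models scored by closeness to a mean-pitch statistic.
server = RecognitionServer({
    "low_pitch":  lambda d: -abs(d["mean_pitch"] - 120),
    "high_pitch": lambda d: -abs(d["mean_pitch"] - 220),
})
chosen = server.enroll("alice", {"mean_pitch": 210})
```

The point of the split is that heavy models live only on the server, while the user station sends just enrolment data and, later, input patterns.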


Journal of the Acoustical Society of America | 2004

Method of speech recognition

Stefan Besling; Eric Thelen; Meinhard Ullrich

In this method, an information unit (4) enabling speech input is stored on a server (5) and can be retrieved by a client (1, 2, 3), and the client can be coupled to one or more speech recognizers (7, 8, 9) through a communications network (6). The information unit (4) is assigned additional information (12) used to determine a combination of a client (1, 2, 3) and at least one of the speech recognizers (7, 8, 9) for recognizing an uttered speech signal. The speech recognizers (7, 8, 9) are thus dynamically assigned to the information units (4) in the communications network (6), ensuring an acceptable processing time for the recognition of a speech input at high recognition quality.
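The dynamic assignment could look roughly like the following sketch, where the additional information carries invented requirement fields (vocabulary size, maximum latency) and recognizers advertise capacity and load; none of these field names come from the patent:

```python
# Sketch of dynamically assigning a recognizer to an information unit.
# The requirement fields (vocab, max_latency_ms) and the recognizer
# attributes are invented for illustration.

def assign_recognizer(info, recognizers):
    # Keep only recognizers meeting the unit's quality and latency
    # requirements, then prefer the least-loaded one so processing
    # time stays acceptable.
    eligible = [r for r in recognizers
                if r["vocab"] >= info["vocab"]
                and r["latency_ms"] <= info["max_latency_ms"]]
    return min(eligible, key=lambda r: r["load"]) if eligible else None

recognizers = [
    {"name": "r1", "vocab": 10000, "latency_ms": 100, "load": 1},
    {"name": "r2", "vocab": 50000, "latency_ms": 80,  "load": 4},
]
pick = assign_recognizer({"vocab": 20000, "max_latency_ms": 90}, recognizers)
```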


International Conference on Acoustics, Speech, and Signal Processing | 1995

Automatic transcription of unknown words in a speech recognition system

Peter Beyerlein; Eric Thelen

We address the problem of automatically finding an acoustic representation (i.e. a transcription) of unknown words as a sequence of subword units, given a few sample utterances of the unknown words, and an inventory of speaker-independent subword units. The problem arises if a user wants to add his own vocabulary to a speaker-independent recognition system simply by speaking the words a few times. Two methods are investigated which are both based on a maximum-likelihood formulation of the problem. The experimental results show that both automatic transcription methods provide a good estimate of the acoustic models of unknown words. The recognition error rates obtained with such models in a speaker-independent recognition task are clearly better than those resulting from separate whole-word models. They are comparable with the performance of transcriptions drawn from a dictionary.
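A toy version of the maximum-likelihood idea: enumerate candidate subword sequences and keep the one with the highest summed score over the sample utterances. Here characters stand in for subword units and negative edit distance stands in for acoustic log-likelihood, both our own simplifications of the paper's formulation:

```python
import itertools
import math

# Toy maximum-likelihood transcription of an unknown word from a few
# sample utterances, using characters as "subword units".

def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def best_transcription(utterances, units, score, max_len=3):
    # Exhaustively score every unit sequence up to max_len and keep
    # the one maximising the summed score over all utterances.
    best_seq, best_ll = None, -math.inf
    for n in range(1, max_len + 1):
        for seq in itertools.product(units, repeat=n):
            ll = sum(score(u, seq) for u in utterances)
            if ll > best_ll:
                best_seq, best_ll = seq, ll
    return best_seq

# Three sample utterances of the same unknown word, one of them noisy.
seq = best_transcription(
    ["kat", "kat", "cat"], ["k", "a", "t"],
    lambda u, s: -edit_distance(u, "".join(s)))
```

A real system would of course search with dynamic programming over acoustic scores rather than enumerate sequences, but the objective has the same shape: one transcription that jointly explains all sample utterances.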


Journal of the Acoustical Society of America | 2000

Method for constructing a model of a new word for addition to a word model database of a speech recognition system

Reinhold Häb-Umbach; Peter Beyerlein; Eric Thelen

For speech recognition, a new word is represented in terms of a stored inventory of models of sub-word units. A plurality of utterances, all of which should conform to the word, is presented; for building a word model, the utterances are represented by sequences of feature vectors. First, the utterances are used to train a whole-word model that is independent of the sub-word unit models, its length equalling the average length of the utterances. Next, the sequence of Markov states and associated probability densities of acoustic events of the whole-word model is interpreted as a reference template represented by a string of averaged feature vectors. Finally, the string is recognized by matching against models in the inventory, and the recognition result is stored as a model of the utterances.
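The final matching step can be sketched as nearest-prototype decoding of the averaged template; the 1-D features, the tiny inventory, and the repeat-collapsing rule are illustrative simplifications, not the paper's actual recognizer:

```python
# Nearest-prototype decoding of an averaged-template "string of
# feature vectors". 1-D features and the tiny inventory are toy
# stand-ins for real acoustic models.

def transcribe_template(template, inventory):
    # inventory: subword unit name -> prototype feature value.
    # Each template vector is mapped to its nearest unit; consecutive
    # repeats are collapsed into a single unit.
    seq = []
    for vec in template:
        unit = min(inventory, key=lambda u: abs(inventory[u] - vec))
        if not seq or seq[-1] != unit:
            seq.append(unit)
    return seq

result = transcribe_template([0.1, 0.9, 0.95, 2.1],
                             {"a": 0.0, "b": 1.0, "c": 2.0})
```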


International Conference on Acoustics, Speech, and Signal Processing | 1997

Speaker adaptation in the Philips system for large vocabulary continuous speech recognition

Eric Thelen; Xavier L. Aubert; Peter Beyerlein

The combination of maximum likelihood linear regression (MLLR) with maximum a posteriori (MAP) adaptation has been investigated, both for the enrollment of a new speaker and for the asymptotic recognition rate after several hours of dictation. We show that a least-mean-square approach to MLLR is quite effective in conjunction with phonetically derived regression classes. Results are presented for both ARPA read-speech test sets and real-life dictation, and significant improvements are reported. While MLLR achieves a faster adaptation rate when only little data is available, MAP has desirable asymptotic properties, and the combination of both methods provides the best results. Both incremental and iterative batch modes are studied and compared to the performance of speaker-dependent training.
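A 1-D illustration of the two ingredients, with our own toy data: a least-squares estimate of a single shared MLLR scale per regression class, and MAP interpolation that moves from the prior mean towards the sample mean as adaptation data accumulates:

```python
# 1-D toy versions of the two adaptation styles the paper combines;
# tau and the data in the tests are invented for illustration.

def mllr_scale(means, adapted):
    # Least-mean-square estimate of one shared scale w such that
    # w * mean approximates the adapted mean for every Gaussian in a
    # regression class (a scalar stand-in for the MLLR matrix).
    return sum(m * a for m, a in zip(means, adapted)) / sum(m * m for m in means)

def map_mean(prior, samples, tau=10.0):
    # MAP interpolation: the prior dominates with little data, and
    # the estimate converges to the sample mean as data accumulates,
    # giving the desirable asymptotic behaviour noted in the abstract.
    return (tau * prior + sum(samples)) / (tau + len(samples))
```

The complementarity is visible directly: `mllr_scale` adapts all means of a class at once from a handful of frames, while `map_mean` needs more data per Gaussian but eventually tracks the speaker exactly.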


International Conference on Spoken Language Processing | 1996

Long term on-line speaker adaptation for large vocabulary dictation

Eric Thelen

Online speaker adaptation is desirable for dictation applications of speech recognition because it offers the possibility of improving the system with speaker-specific data obtained from the user. Since the user will work with such a device over a long period, long-term adaptation performance is more important for a dictation system than adaptation speed. In contrast to speaker-dependent retraining, online speaker adaptation does not require the speaker-specific speech data to be stored, and each adaptation step requires little computational effort. We describe our way of performing online Bayesian speaker adaptation using partial traceback. We compare supervised with unsupervised adaptation, and speaker adaptation with speaker-dependent training on the adaptation material. Compared to the speaker-independent start-up models, the error rate was halved after five hours of supervised adaptation in our experiments. In the long-term experiments, supervised online adaptation performed similarly to speaker-dependent training on the adaptation material.
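The "no stored speech data" property follows from keeping only sufficient statistics; a minimal sketch of such an incremental MAP mean update (scalar features, an invented prior weight tau) might look like:

```python
# Incremental MAP adaptation of a (scalar) Gaussian mean: only the
# frame count and running sum are kept, so the speaker's audio never
# needs to be stored. Names and tau are invented for this sketch.

class OnlineMapAdapter:
    def __init__(self, prior_mean, tau=10.0):
        self.prior = prior_mean
        self.tau = tau          # weight of the speaker-independent prior
        self.n = 0              # frames seen so far
        self.total = 0.0        # running sum of observed features

    def observe(self, frame):
        # Called once per frame confirmed by the partial traceback.
        self.n += 1
        self.total += frame

    @property
    def mean(self):
        # MAP estimate: prior-dominated early, data-dominated later.
        return (self.tau * self.prior + self.total) / (self.tau + self.n)

adapter = OnlineMapAdapter(prior_mean=0.0)
for frame in [1.0] * 1000:
    adapter.observe(frame)
```

Each update is O(1), which is why each adaptation step avoids the large computational effort of full retraining.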


Archive | 2000

Distributed client-server speech recognition system

Eric Thelen; Stefan Besling


Archive | 1999

Speech recognition system having parallel large vocabulary recognition engines

Eric Thelen; Stefan Besling; Meinhard Ullrich


Archive | 1998

Vocabulary and/or language model training

Eric Thelen; Stefan Besling; Steven DeJarnett


Archive | 2004

Program recommendation system

Holger Scholl; Eric Thelen; Jan Kneissler; Andreas Kellner
