Publication


Featured research published by Judith M. Kessens.


Speech Communication | 2003

A data-driven method for modeling pronunciation variation

Judith M. Kessens; Catia Cucchiarini; Helmer Strik

This paper describes a rule-based data-driven (DD) method to model pronunciation variation in automatic speech recognition (ASR). The DD method consists of the following steps. First, the possible pronunciation variants are generated by making each phone in the canonical transcription of the word optional. Next, forced recognition is performed in order to determine which variant best matches the acoustic signal. Finally, the rules are derived by aligning the best-matching variant with the canonical transcription of the word. Error analysis is performed in order to gain insight into the process of pronunciation modeling. This analysis shows that although modeling pronunciation variation brings about improvements, deteriorations are also introduced. A strong correlation is found between the number of improvements and deteriorations per rule. This result indicates that it is not possible to improve ASR performance by excluding the rules that cause deteriorations, because these rules also produce a considerable number of improvements. Finally, we compare three different criteria for rule selection. This comparison indicates that the absolute frequency of rule application (Fabs) is the most suitable criterion for rule selection. For the best testing condition, a statistically significant reduction in word error rate (WER) of 1.4% absolute, or 8% relative, is found.
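
The variant-generation and rule-derivation steps described above can be sketched in a few lines of code. The sketch below is illustrative only: it replaces the forced-recognition step with a placeholder scoring function (a full ASR system would be needed for the real step), and all function and variable names are invented rather than taken from the paper.

# Minimal sketch of the DD method's variant generation and rule derivation.
# Forced recognition is stood in for by a toy scoring function.

from itertools import product
from typing import Callable, List, Tuple

def generate_variants(canonical: List[str]) -> List[List[str]]:
    """Make every phone in the canonical transcription optional,
    yielding all pronunciation variants (including the canonical one)."""
    variants = []
    for keep_mask in product([True, False], repeat=len(canonical)):
        variant = [p for p, keep in zip(canonical, keep_mask) if keep]
        if variant:                      # skip the empty pronunciation
            variants.append(variant)
    return variants

def best_variant(variants: List[List[str]],
                 acoustic_score: Callable[[List[str]], float]) -> List[str]:
    """Stand-in for forced recognition: pick the variant that best
    matches the acoustic signal according to some scoring function."""
    return max(variants, key=acoustic_score)

def derive_deletion_rules(canonical: List[str],
                          best: List[str]) -> List[Tuple[str, str]]:
    """Align the best-matching variant with the canonical transcription
    and emit one deletion rule per phone that was not realized.
    Because variants only drop phones, a left-to-right scan suffices."""
    rules, j = [], 0
    for phone in canonical:
        if j < len(best) and best[j] == phone:
            j += 1                       # phone was realized
        else:
            rules.append((phone, ""))    # phone -> deleted
    return rules

if __name__ == "__main__":
    canonical = ["n", "a", "t", "y", "r", "l", "@", "k"]   # invented example
    variants = generate_variants(canonical)
    # Toy score: prefer the variant closest to an assumed realized pronunciation.
    realized = ["n", "a", "t", "y", "l", "k"]
    score = lambda v: -abs(len(v) - len(realized)) - sum(a != b for a, b in zip(v, realized))
    best = best_variant(variants, score)
    print("best variant:", best)
    print("deletion rules:", derive_deletion_rules(canonical, best))

In the paper, rule selection is then applied to the derived rules; following the abstract, keeping only rules with a high absolute application frequency (Fabs) would be the criterion of choice.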


Language and Speech | 2001

Obtaining Phonetic Transcriptions: A Comparison between Expert Listeners and a Continuous Speech Recognizer

Mirjam Wester; Judith M. Kessens; Catia Cucchiarini; Helmer Strik

In this article, we address the issue of using a continuous speech recognition tool to obtain phonetic or phonological representations of speech. Two experiments were carried out in which the performance of a continuous speech recognizer (CSR) was compared to the performance of expert listeners in a task of judging whether a number of prespecified phones had been realized in an utterance. In the first experiment, nine expert listeners and the CSR carried out exactly the same task: deciding whether a segment was present or not in 467 cases. In the second experiment, we expanded on the first experiment by focusing on two phonological processes: schwa-deletion and schwa-insertion. The results of these experiments show that differences in performance were found between the CSR and the listeners, but also between individual listeners. Although some of these differences appeared to be statistically significant, their magnitude is such that they may very well be acceptable depending on what the transcriptions are needed for. In other words, although the CSR is not infallible, it makes it possible to explore large data sets, which might outweigh the errors the CSR introduces. For these reasons, we conclude that the CSR can be used instead of a listener to carry out this type of task: deciding whether a phone is present or not.
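
The kind of comparison behind these experiments can be sketched by treating every judge, human or machine, as a sequence of binary present/absent decisions and measuring how often pairs of judges agree. The data and judge labels below are invented for illustration; in the study each judge decided 467 cases.

# Minimal sketch: pairwise agreement between listeners and a CSR on
# binary "phone present / absent" judgments. Toy data only.

from itertools import combinations

def pairwise_agreement(judgments: dict) -> dict:
    """Fraction of cases on which each pair of judges gives the same
    present/absent decision."""
    agreement = {}
    for a, b in combinations(judgments, 2):
        same = sum(x == y for x, y in zip(judgments[a], judgments[b]))
        agreement[(a, b)] = same / len(judgments[a])
    return agreement

if __name__ == "__main__":
    # 1 = phone realized, 0 = phone deleted; invented data for 10 cases.
    judgments = {
        "listener_1": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
        "listener_2": [1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
        "CSR":        [1, 0, 1, 1, 0, 1, 1, 1, 0, 0],
    }
    for (a, b), agr in pairwise_agreement(judgments).items():
        print(f"{a} vs {b}: {agr:.0%} agreement")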


Computer Speech & Language | 2004

On automatic phonetic transcription quality: Lower word error rates do not guarantee better transcriptions

Judith M. Kessens; Helmer Strik

The first goal of this study was to investigate the effect of changing several properties of a continuous speech recognizer (CSR) on the automatic phonetic transcriptions generated by that same CSR. Our results show that the quality of the automatic transcriptions can be improved by using ‘short’ hidden Markov models (HMMs) and by reducing the amount of contamination in the HMMs. The amount of contamination can be reduced by training the HMMs on the basis of a transcription that better matches the actual pronunciation, e.g., by modeling pronunciation variation or by training the HMMs on read speech. Furthermore, we found that context-dependent HMMs should preferably not be trained on baseline transcriptions if there is a mismatch between these baseline transcriptions of the speech material and the realized pronunciations. Finally, we found that by combining the changes in the properties of the CSR, the quality of the automatic transcriptions can be further improved. The second goal of this study was to find out whether a relationship exists between word error rate (WER) and transcription quality. As no clear relationship was found, we conclude that choosing the CSR with the lowest WER does not necessarily yield the best automatic transcriptions.
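
Why the two measures can diverge is easy to illustrate: WER is computed on the recognizer's word output, while transcription quality is measured at the phone level against a reference phonetic transcription, so one can be perfect while the other is not. The sketch below uses the same edit-distance-based error rate for both levels; the word sequence and phone transcriptions are invented examples, not data from the study.

# Minimal sketch: word error rate vs. phone-level transcription error,
# both as Levenshtein distance normalized by reference length.

def error_rate(hyp, ref):
    """Substitutions + insertions + deletions, divided by the length
    of the reference sequence."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[n][m] / n

if __name__ == "__main__":
    # Word-level recognizer output vs. the reference word string.
    ref_words = "dat is natuurlijk niet waar".split()
    hyp_words = "dat is natuurlijk niet waar".split()      # WER = 0
    # Phone-level automatic transcription vs. the realized pronunciation.
    ref_phones = ["n", "a", "t", "y", "l", "k"]
    hyp_phones = ["n", "a", "t", "y", "r", "l", "@", "k"]  # canonical, not realized
    print("WER:", error_rate(hyp_words, ref_words))
    print("phone error rate:", error_rate(hyp_phones, ref_phones))

Here the word output is recognized perfectly (WER of 0) while the phone transcription still deviates from the realized pronunciation, mirroring the finding that a lower WER does not guarantee better transcriptions.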


Speech Communication | 1999

Improving the performance of a Dutch CSR by modeling within-word and cross-word pronunciation variation

Judith M. Kessens; Mirjam Wester; Helmer Strik


Journal of the Acoustical Society of America | 1998

Improving the performance of a Dutch CSR by modeling pronunciation variation

Mirjam Wester; Judith M. Kessens; Helmer Strik


Conference of the International Speech Communication Association | 2009

Results of the N-Best 2008 Dutch Speech Recognition Evaluation

David A. van Leeuwen; Judith M. Kessens; Eric Sanders; Henk van den Heuvel


Conference of the International Speech Communication Association | 2000

Pronunciation variation in ASR: Which variation to model?

Mirjam Wester; Judith M. Kessens; Helmer Strik


Conference of the International Speech Communication Association | 1998

The selection of pronunciation variants: comparing the performance of man and machine

Judith M. Kessens; Mirjam Wester; Catia Cucchiarini; Helmer Strik


Conference of the International Speech Communication Association | 2000

Comparing the recognition performance of CSRs: In search of an adequate metric and statistical significance test

Helmer Strik; Catia Cucchiarini; Judith M. Kessens


Conference of the International Speech Communication Association | 2001

Comparing the performance of two CSRs: How to determine the significance level of the differences

Helmer Strik; Catia Cucchiarini; Judith M. Kessens

Collaboration


Dive into Judith M. Kessens's collaborations.

Top Co-Authors

Helmer Strik
Radboud University Nijmegen

Catia Cucchiarini
Radboud University Nijmegen

Eric Sanders
Radboud University Nijmegen

Janienke Sturm
Radboud University Nijmegen

F. de Wet
Stellenbosch University

Febe de Wet
Radboud University Nijmegen