Thomas Niesler
Stellenbosch University
Publications
Featured research published by Thomas Niesler.
international conference on acoustics speech and signal processing | 1996
Thomas Niesler; Philip C. Woodland
A language model based on word-category n-grams and ambiguous category membership, with n increased selectively to trade compactness for performance, is presented. The use of categories leads intrinsically to a compact model with the ability to generalise to unseen word sequences, and diminishes the sparseness of the training data, thereby making larger n feasible. The language model implicitly involves a statistical tagging operation, which may be used explicitly to assign categories to untagged text. Experiments on the LOB corpus show the optimal model-building strategy to yield improved results with respect to conventional n-gram methods, and when used as a tagger, the model is seen to perform well in relation to a standard benchmark.
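The core idea above can be illustrated with a toy sketch: because a word may belong to several categories, its probability is a sum over categories, weighted by a category n-gram. All words, categories, counts and bigram probabilities below are invented for illustration and are not from the paper.

```python
# Toy category-based bigram model with ambiguous category membership.
# "run" belongs to both NOUN and VERB, so P(run | prev category) sums
# over both memberships.
word_cat_counts = {   # hypothetical (word, category) counts
    ("the", "DET"): 4,
    ("run", "NOUN"): 1,
    ("run", "VERB"): 2,
    ("dogs", "NOUN"): 3,
}
cat_bigram = {        # toy P(c_i | c_{i-1}) values
    ("DET", "NOUN"): 0.7,
    ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.5,
}

def p_word_given_cat(word, cat):
    total = sum(n for (w, c), n in word_cat_counts.items() if c == cat)
    return word_cat_counts.get((word, cat), 0) / total if total else 0.0

def p_next(word, prev_cat):
    # Sum over every possible category of the next word.
    cats = {c for (_, c) in word_cat_counts}
    return sum(cat_bigram.get((prev_cat, c), 0.0) * p_word_given_cat(word, c)
               for c in cats)

p = p_next("run", "DET")  # 0.7 * P(run|NOUN) + 0.1 * P(run|VERB)
```

The per-category posteriors computed inside `p_next` are also what makes the implicit tagging operation possible: the category with the largest contribution is the model's tag for the word.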
international conference on acoustics speech and signal processing | 1999
Thomas Hain; Philip C. Woodland; Thomas Niesler; Edward W. D. Whittaker
This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built using speaker adaptive training (SAT), maximum likelihood linear regression (MLLR) speaker adaptation and a confidence score based system combination. A detailed description of the complete system together with experimental results for each stage of our multi-pass decoding scheme is presented. The word error rate obtained is almost 20% better than our 1997 system on the development set.
international conference on acoustics speech and signal processing | 1998
Thomas Niesler; Edward W. D. Whittaker; Philip C. Woodland
This paper compares various category-based language models when used in conjunction with a word-based trigram by means of linear interpolation. Categories corresponding to parts-of-speech as well as automatically clustered groupings are considered. The category-based model employs variable-length n-grams and permits each word to belong to multiple categories. Relative word error rate reductions of between 2 and 7% over the baseline are achieved in N-best rescoring experiments on the Wall Street Journal corpus. The largest improvement is obtained with a model using automatically determined categories. Perplexities continue to decrease as the number of different categories is increased, but improvements in the word error rate reach an optimum.
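The linear interpolation used to combine the two models is simple to state: the final probability is a weighted sum of the word-trigram and category-model estimates. A minimal sketch, with an interpolation weight and probabilities that are illustrative only, not values from the paper:

```python
# Linear interpolation of a word-based trigram and a category-based
# language model. lam is a hypothetical interpolation weight.
def interpolate(p_word, p_cat, lam=0.7):
    """P(w|h) = lam * P_word(w|h) + (1 - lam) * P_cat(w|h)."""
    return lam * p_word + (1 - lam) * p_cat

# A word sequence unseen by the trigram still receives probability
# mass from the generalising category model:
p_seen   = interpolate(0.02, 0.008)  # both models have an estimate
p_unseen = interpolate(0.0, 0.005)   # trigram assigns zero here
```

In practice the weight would be tuned on held-out data, e.g. by expectation-maximisation.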
Physiological Measurement | 2006
W D Duckitt; S K Tuomi; Thomas Niesler
Snoring is a prevalent condition with a variety of negative social effects and associated health problems. Treatments, both surgical and therapeutic, have been developed, but the objective non-invasive monitoring of their success remains problematic. We present a method which allows the automatic monitoring of snoring characteristics, such as intensity and frequency, from audio data captured via a freestanding microphone. This represents a simple and portable diagnostic alternative to polysomnography. Our system is based on methods that have proved effective in the field of speech recognition. Hidden Markov models (HMMs) were employed as basic elements with which to model different types of sound by means of spectrally based features. This allows periods of snoring to be identified, while rejecting silence, breathing and other sounds. Training and test data were gathered from six subjects, and annotated appropriately. The system was tested by requiring it to automatically classify snoring sounds in new audio recordings and then comparing the result with manually obtained annotations. We found that our system was able to correctly identify snores with 82-89% accuracy, despite the small size of the training set. We could further demonstrate how this segmentation can be used to measure the snoring intensity, snoring frequency and snoring index. We conclude that a system based on hidden Markov models and spectrally based features is effective in the automatic detection and monitoring of snoring from audio data.
international conference on acoustics speech and signal processing | 1998
Philip C. Woodland; Thomas Hain; Sue E. Johnson; Thomas Niesler; Andreas Tuerk; Steve J. Young
This paper presents the development of the HTK broadcast news transcription system. Previously we have used data type specific modelling based on adapted Wall Street Journal trained HMMs. However, we are now experimenting with data for which no manual pre-classification or segmentation is available and therefore automatic techniques are required and compatible acoustic modelling strategies adopted. An approach for automatic audio segmentation and classification is described and evaluated as well as extensions to our previous work on segment clustering. A number of recognition experiments are presented that compare datatype specific and non-specific models; differing amounts of training data; the use of gender-dependent modelling and the effects of automatic data-type classification. It is shown that robust segmentation into a small number of audio types is possible and that models trained on a wide variety of data types can yield good performance.
international conference on spoken language processing | 1996
Thomas Niesler; Philip C. Woodland
A language model combining word-based and category-based n-grams within a backoff framework is presented. Word n-grams conveniently capture sequential relations between particular words, while the category model, which is based on part-of-speech classifications and allows ambiguous category membership, is able to generalise to unseen word sequences and is therefore appropriate in backoff situations. Experiments on the LOB, Switchboard and WSJ0 corpora demonstrate that the technique greatly improves language model perplexities for sparse training sets, and offers significantly improved complexity-versus-performance tradeoffs when compared with standard trigram models.
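The backoff scheme can be sketched as follows: use the word n-gram when the history was seen in training, otherwise fall back to the category-based estimate. The counts, discount and backoff weight below are toy stand-ins, not values from the paper.

```python
# Backoff between a word bigram and a category-based model.
word_bigram = {("new", "york"): 0.4}   # hypothetical seen word bigrams

def p_category(word, history):
    # Stand-in for the generalising category-based estimate.
    return 0.01

def p_backoff(word, history, discount=0.9, alpha=0.5):
    if (history, word) in word_bigram:
        return discount * word_bigram[(history, word)]  # discounted ML estimate
    return alpha * p_category(word, history)            # back off to categories

p1 = p_backoff("york", "new")    # seen: discounted word estimate
p2 = p_backoff("tokyo", "new")   # unseen: backed-off category estimate
```

A real implementation would compute `discount` and `alpha` per history so that the distribution sums to one, e.g. via Katz backoff.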
Speech Communication | 2007
Thomas Niesler
The need to compile annotated speech databases remains an impediment to the development of automatic speech recognition (ASR) systems in under-resourced multilingual environments. We investigate whether it is possible to combine speech data from different languages spoken within the same multilingual population to improve the overall performance of a speech recognition system. For our investigation, we use recently collected Afrikaans, South African English, Xhosa and Zulu speech databases. Each consists of between 6 and 7 hours of speech that has been annotated at the phonetic and the orthographic level using a common IPA-based phone set. We compare the performance of separate language-specific systems with that of multilingual systems based on straightforward pooling of training data as well as on a data-driven alternative. For the latter, we extend the decision-tree clustering process normally used to construct tied-state hidden Markov models to allow the inclusion of language-specific questions, and compare the performance of systems that allow sharing between languages with those that do not. We find that multilingual acoustic models obtained in this way show a small but consistent improvement over separate-language systems as well as systems based on IPA-based data pooling.
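The extension of decision-tree clustering with language questions can be sketched in miniature: at each node the question giving the largest likelihood gain is chosen, and language-membership questions compete with the usual phonetic ones. The states, the variance-reduction criterion standing in for the likelihood gain, and the questions below are all toy values for illustration.

```python
# Toy decision-tree split selection with a language question.
# Each "state" is (language, phone, mean statistic).
states = [
    ("afr", "a", 1.0), ("eng", "a", 1.1),
    ("xho", "a", 3.0), ("zul", "a", 3.1),
]

def split_gain(group, predicate):
    """Variance reduction achieved by splitting on a yes/no question."""
    def var(g):
        if not g:
            return 0.0
        m = sum(s[2] for s in g) / len(g)
        return sum((s[2] - m) ** 2 for s in g)
    yes = [s for s in group if predicate(s)]
    no = [s for s in group if not predicate(s)]
    return var(group) - var(yes) - var(no)

questions = {
    "is_nguni_language": lambda s: s[0] in ("xho", "zul"),
    "is_afrikaans":      lambda s: s[0] == "afr",
}
best = max(questions, key=lambda q: split_gain(states, questions[q]))
```

Here the language question wins because the Xhosa and Zulu states pattern together, which is the mechanism that lets the tree share states across languages only where the data support it.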
Computer Speech & Language | 1999
Thomas Niesler; Philip C. Woodland
This paper presents a language model based on n-grams of word groups (categories). The length of each n-gram is increased selectively according to an estimate of the resulting improvement in predictive quality. This allows the model size to be controlled while including longer-range dependencies when these benefit performance. The categories are chosen to correspond to part-of-speech classifications in a bid to exploit a priori grammatical information. To account for different grammatical functions, the language model allows words to belong to multiple categories, and implicitly involves a statistical tagging operation which may be used to label new text. Intrinsic generalization by the category-based model leads to good performance with sparse data sets. However, word-based n-grams deliver superior average performance as the amount of training material increases. Nevertheless, the category model continues to supply better predictions for word n-tuples not present in the training set. Consequently, a method allowing the two approaches to be combined within a backoff framework is presented. Experiments with the LOB, Switchboard and Wall Street Journal corpora demonstrate that this technique greatly improves language model perplexities for sparse training sets, and offers significantly improved size vs. performance tradeoffs when compared with standard trigram models.
Southern African Linguistics and Applied Language Studies | 2005
Thomas Niesler; Philippa H. Louw; J. C. Roux
We present a corpus-based analysis of the Afrikaans, English, Xhosa and Zulu languages, comparing these in terms of phonetic content, diversity and mutual overlap. Our aim is to shed light on the fundamental phonetic interrelationships between these languages, with a view to furthering progress in multilingual automatic speech recognition in general, and in the South African region in particular.
Speech Communication | 2009
F. de Wet; C. Van der Walt; Thomas Niesler
This paper describes an attempt to automate the large-scale assessment of oral language proficiency and listening comprehension for fairly advanced students of English as a second language. The automatic test is implemented as a spoken dialogue system and consists of a reading as well as a repeating task. Two experiments are described in which different rating criteria were used by human judges. In the first experiment, proficiency was scored globally for each of the two test components. In the second experiment, various aspects of proficiency were evaluated for each section of the test. In both experiments, rate of speech (ROS), goodness of pronunciation (GOP) and repeat accuracy were calculated for the spoken utterances. The correlation between scores assigned by human raters and these three automatically derived measures was determined to assess their suitability as proficiency indicators. Results show that the more specific rating instructions used in the second experiment improved intra-rater agreement, but made little difference to inter-rater agreement. In addition, the more specific rating criteria resulted in a better correlation between the human and the automatic scores for the repeating task, but had almost no impact in the reading task. Overall, the results indicate that, even for the narrow range of proficiency levels observed in the test population, the automatically derived ROS and accuracy scores give a fair indication of oral proficiency.