Publication


Featured research published by Catherine Middag.


EURASIP Journal on Advances in Signal Processing | 2009

Automated intelligibility assessment of pathological speech using phonological features

Catherine Middag; Jean-Pierre Martens; Gwen Van Nuffelen; Marc De Bodt

It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort into the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that builds on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
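
As a rough illustration of the prediction stage described in this abstract, the sketch below (not the authors' code) fits a small regularized linear model on per-speaker feature vectors and reports the root mean squared error against perceptual scores on a 0-100 scale. All data, feature dimensions and model settings are made-up placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical data: 50 pathological speakers, 20 phonological speaker
# features each, plus perceptual intelligibility scores on a 0-100 scale.
X = rng.normal(size=(50, 20))
y = np.clip(60 + 10 * X[:, 0] + rng.normal(scale=5, size=50), 0, 100)

# A small regularized linear model that can be trained on few pathological samples.
model = Ridge(alpha=1.0)

# Leave-one-out predictions mimic scoring speakers that were not seen in training.
pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

rmse = np.sqrt(mean_squared_error(y, pred))
print(f"RMSE between perceived and computed intelligibility: {rmse:.1f} (0-100 scale)")
```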


International Journal of Language & Communication Disorders | 2009

Speech technology-based assessment of phoneme intelligibility in dysarthria

Gwen Van Nuffelen; Catherine Middag; Marc De Bodt; Jean-Pierre Martens

BACKGROUND: Currently, clinicians mainly rely on perceptual judgements to assess the intelligibility of dysarthric speech. Although often highly reliable, this procedure is subjective and affected by many intrinsic variables. Therefore, certain benefits can be expected from a speech technology-based intelligibility assessment. Previous attempts to develop an automated intelligibility assessment mainly relied on automatic speech recognition (ASR) systems that were trained to recognize the speech of persons without known impairments. In this paper, automatic speech alignment (ASA) systems are used instead. In addition, previous attempts only made use of phonemic features (PMF). However, since articulation is an important contributing factor to the intelligibility of dysarthric speech and since phonological features (PLF) are shared by multiple phonemes, phonological features may be more appropriate to characterize and identify dysarthric phonemes.

AIMS: To investigate the reliability of objective phoneme intelligibility scores obtained by three types of intelligibility models: models using only phonemic features (yielded by an automated speech aligner) (PMF models), models using only phonological features (PLF models), and models using a combination of phonemic and phonological features (PMF + PLF models).

METHODS & PROCEDURES: Correlations were calculated between the objective phoneme intelligibility scores of 60 dysarthric speakers and the corresponding perceptual phoneme intelligibility scores obtained by a standardized perceptual phoneme intelligibility assessment.

OUTCOMES & RESULTS: The correlations between the objective and perceptual intelligibility scores range from 0.793 for the PMF models, through 0.828 for the PLF models, to 0.943 for the PMF + PLF models. The features selected to obtain such high correlations can be divided into six main subgroups: (1) vowel-related phonemic and phonological features, (2) lateral-related features, (3) silence-related features, (4) fricative-related features, (5) velar-related features and (6) plosive-related features.

CONCLUSIONS & IMPLICATIONS: The phoneme intelligibility scores of dysarthric speakers obtained by the three investigated intelligibility model types are reliable. The highest correlation between the perceptual and objective intelligibility scores was found for models combining phonemic and phonological features. The intelligibility scoring system is now ready to be implemented in a clinical tool.
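
A minimal sketch of the evaluation reported above: Pearson correlations between objective and perceptual phoneme intelligibility scores for the three model variants. The scores are synthetic placeholders, not the study's data, and the noise levels are chosen arbitrarily so that the three variants differ.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_speakers = 60

# Hypothetical perceptual phoneme intelligibility scores for 60 speakers.
perceptual = rng.uniform(40, 100, size=n_speakers)

# Hypothetical objective scores for the three model types; the noise levels
# are arbitrary and only serve to make the three correlations differ.
objective = {
    "PMF":       perceptual + rng.normal(scale=14, size=n_speakers),
    "PLF":       perceptual + rng.normal(scale=11, size=n_speakers),
    "PMF + PLF": perceptual + rng.normal(scale=6, size=n_speakers),
}

for name, scores in objective.items():
    r, p = pearsonr(scores, perceptual)
    print(f"{name:9s} r = {r:.3f} (p = {p:.1e})")
```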


Computer Speech & Language | 2014

Robust automatic intelligibility assessment techniques evaluated on speakers treated for head and neck cancer

Catherine Middag; R.P. Clapham; Rob van Son; Jean-Pierre Martens

It is generally acknowledged that an unbiased and objective assessment of the communication deficiency caused by a speech disorder calls for automatic speech processing tools. In this paper, a new automatic intelligibility assessment method is presented. The method can predict running speech intelligibility in a way that is robust against changes in the text and against differences in the accent of the speaker. It is evaluated on a Dutch corpus comprising longitudinal data of several speakers who have been treated for cancer of the head and the neck. The results show that the method is as accurate as a human listener in detecting trends in the intelligibility over time. By evaluating the intelligibility predictions made with different models trained on distinct texts and accented speech data, evidence for the robustness of the method against text and accent factors is offered.
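
The trend-detection idea mentioned above can be illustrated with a short sketch: per speaker, the change in computed intelligibility between two recording sessions is compared with the change perceived by a listener. The session scores below are invented placeholders, not corpus data.

```python
import numpy as np

# Hypothetical per-speaker intelligibility scores (0-100) at two sessions, as
# pairs of (computed_before, computed_after) and (perceived_before, perceived_after).
sessions = [
    ((72.0, 78.5), (70.0, 80.0)),   # both computed and perceived scores improve
    ((65.0, 60.0), (66.0, 58.0)),   # both degrade
    ((80.0, 81.0), (82.0, 79.0)),   # disagreement on a small change
]

agree = 0
for (c0, c1), (p0, p1) in sessions:
    computed_trend = np.sign(c1 - c0)
    perceived_trend = np.sign(p1 - p0)
    agree += int(computed_trend == perceived_trend)

print(f"trend agreement: {agree}/{len(sessions)} session pairs")
```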


Speech Communication | 2014

Developing automatic articulation, phonation and accent assessment techniques for speakers treated for advanced head and neck cancer

R.P. Clapham; Catherine Middag; Frans J. M. Hilgers; Jean-Pierre Martens; Michiel W. M. van den Brekel; Rob van Son

Purpose: To develop automatic assessment models for assessing the articulation, phonation and accent of speakers with head and neck cancer (Experiment 1) and to investigate whether the models can track changes over time (Experiment 2).

Method: Several speech analysis methods for extracting a compact acoustic feature set that characterizes a speaker's speech are investigated. The effectiveness of a feature set for assessing a variable is assessed by feeding it to a linear regression model and by measuring the mean difference between the outputs of that model for a set of recordings and the corresponding perceptual scores for the assessed variable (Experiment 1). The models are trained and tested on recordings of 55 speakers treated non-surgically for advanced oral cavity, pharynx and larynx cancer. The perceptual scores are average unscaled ratings of a group of 13 raters. The ability of the models to track changes in perceptual scores over time is also investigated (Experiment 2).

Results: Experiment 1 demonstrated that combinations of feature sets generally result in better models, that the best articulation model outperforms the average human rater's performance, and that the best accent and phonation models are deemed competitive. Scatter plots of computed and observed scores show, however, that especially low perceptual scores are difficult to assess automatically. Experiment 2 showed that the articulation and phonation models have only variable success in tracking trends over time, and that only for one of the time pairs are they deemed to compete with the average human rater. Nevertheless, there is a significant level of agreement between computed and observed trends when considering only a coarse classification of the trend into three classes: clearly positive, clearly negative and minor differences.

Conclusions: A baseline tool to support the multi-dimensional evaluation of speakers treated non-surgically for advanced head and neck cancer now exists. More work is required to further improve the models, particularly with respect to their ability to assess low-quality speech.
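
The coarse three-class trend classification mentioned in the results can be sketched as follows. The threshold separating a "clear" change from a "minor difference" is an assumption made here for illustration; it is not specified in the abstract.

```python
def classify_trend(delta: float, threshold: float = 5.0) -> str:
    """Map a score change (later minus earlier) to a coarse trend class.

    The 5-point threshold is an illustrative assumption, not a value
    taken from the paper.
    """
    if delta > threshold:
        return "clearly positive"
    if delta < -threshold:
        return "clearly negative"
    return "minor difference"

# Example: computed vs. observed score changes for a few hypothetical speakers.
computed_deltas = [8.2, -6.5, 1.3]
observed_deltas = [6.0, -9.0, -2.1]

for c, o in zip(computed_deltas, observed_deltas):
    print(f"computed: {classify_trend(c):17s} observed: {classify_trend(o)}")
```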


Computer Speech & Language | 2016

Computing scores of voice quality and speech intelligibility in tracheoesophageal speech for speech stimuli of varying lengths

R.P. Clapham; Jean-Pierre Martens; Rob van Son; Frans J. M. Hilgers; Michiel W. M. van den Brekel; Catherine Middag

In this paper, automatic assessment models are developed for two perceptual variables: speech intelligibility and voice quality. The models are developed and tested on a corpus of Dutch tracheoesophageal (TE) speakers. In this corpus, each speaker read a text passage of approximately 300 syllables and two speech therapists provided consensus scores for the two perceptual variables. Model accuracy and stability are investigated as a function of the amount of speech that is made available for speaker assessment (clinical setting). Five sets of automatically generated acoustic-phonetic speaker features are employed as model inputs. In Part I, models taking complete feature sets as inputs are compared to models taking only the features which are expected to have sufficient support in the speech available for assessment. In Part II, the impact of phonetic content and stimulus length on the computer-generated scores is investigated. Our general finding is that a text encompassing circa 100 syllables is long enough to achieve close to asymptotic accuracy.
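
A toy illustration of the stimulus-length question studied here: a speaker score is re-estimated from progressively longer portions of a passage and compared with the score obtained from the full passage. The per-syllable values are random placeholders, so this only illustrates how averaging over more speech stabilizes a score, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(2)

full_length = 300                                          # syllables in the full passage
per_syllable = 75 + rng.normal(scale=8, size=full_length)  # noisy local evidence per syllable
full_score = per_syllable.mean()                           # score from the complete passage

for n in (25, 50, 100, 200, 300):
    partial_score = per_syllable[:n].mean()
    print(f"{n:3d} syllables: score {partial_score:5.1f} "
          f"(deviation from full passage: {abs(partial_score - full_score):.2f})")
```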


Text, Speech and Dialogue | 2014

Visualization of Intelligibility Measured by Language-Independent Features

Tino Haderlein; Catherine Middag; Andreas K. Maier; Jean-Pierre Martens; Michael Döllinger; Elmar Nöth

Automatic intelligibility assessment using automatic speech recognition is usually language-specific. In this study, a language-independent approach based on alignment-free phonological and phonemic features is proposed. It utilizes models that are trained with Flemish speech, and it is applied to assess dysphonic German speakers. In order to visualize the results, two techniques were tested: a plain selection of the most relevant features emerging from Ensemble Linear Regression with feature selection, and a Sammon transform of all the features to a 3-D space. The test data comprised recordings of 73 hoarse persons (48.3 ± 16.8 years) who read the German version of the text “The North Wind and the Sun”. The reference evaluation was obtained from five speech therapists and physicians who rated intelligibility on a 5-point Likert scale. In the 3-D visualization, the different levels of intelligibility were clearly separated. This could provide the basis for objective support of diagnostics in voice and speech rehabilitation.
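
A sketch of the 3-D visualization idea, using scikit-learn's metric MDS as a stand-in for the Sammon transform (both embed high-dimensional feature vectors into 3-D while trying to preserve pairwise distances); the feature vectors and intelligibility ratings below are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (needed for older matplotlib)
from sklearn.manifold import MDS

rng = np.random.default_rng(3)

# Hypothetical language-independent feature vectors for 73 speakers, plus a
# 5-point intelligibility rating per speaker used only for coloring the plot.
features = rng.normal(size=(73, 30))
ratings = rng.integers(1, 6, size=73)

# Metric MDS embeds the speakers into 3-D while approximately preserving
# pairwise Euclidean distances between their feature vectors.
embedding = MDS(n_components=3, random_state=0).fit_transform(features)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
scatter = ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2],
                     c=ratings, cmap="viridis")
fig.colorbar(scatter, label="intelligibility rating (1-5)")
plt.show()
```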


Archive | 2012

Automatic analysis of pathological speech

Catherine Middag


Conference of the International Speech Communication Association | 2011

Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment

Catherine Middag; Tobias Bocklet; Jean-Pierre Martens; Elmar Nöth


Conference of the International Speech Communication Association | 2010

Towards an ASR-free objective analysis of pathological speech

Catherine Middag; Yvan Saeys; Jean-Pierre Martens


Conference of the International Speech Communication Association | 2008

Objective intelligibility assessment of pathological speakers

Catherine Middag; Gwen Van Nuffelen; Jean-Pierre Martens; Marc De Bodt

Collaboration


Dive into Catherine Middag's collaborations.

Top Co-Authors

Elmar Nöth
University of Erlangen-Nuremberg

Tino Haderlein
University of Erlangen-Nuremberg

Rob van Son
Netherlands Cancer Institute

R.P. Clapham
Netherlands Cancer Institute

Michael Döllinger
Pacific Lutheran University

Frans J. M. Hilgers
Netherlands Cancer Institute