Stuart P. Cunningham
University of Sheffield
Publications
Featured research published by Stuart P. Cunningham.
Clinical Linguistics & Phonetics | 2006
Mark Parker; Stuart P. Cunningham; Pam Enderby; Mark Hawley; Phil D. Green
The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Speaker-independent computer speech recognizers trained on typical speech are of limited functional use to those with severe dysarthria because of their limited and inconsistent proximity to “normal” articulatory patterns. Severe dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using hidden Markov models was achieved by identifying robust phonetic elements within each speaker's output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
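As a rough illustration of the speaker-dependent, small-vocabulary HMM approach described above, the sketch below trains one whole-word Gaussian HMM per target word from a handful of a single speaker's recordings and recognises by maximum log-likelihood. It is not the STARDUST system: `wav_files_by_word` is an assumed mapping from words to recording paths, and the topology and features are generic choices.

```python
# Illustrative sketch only: a tiny speaker-dependent, whole-word HMM recogniser.
# Assumes hmmlearn and librosa are installed.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCC features for one recording."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_word_models(wav_files_by_word, n_states=5):
    """Fit one Gaussian HMM per vocabulary word (real systems use left-to-right topologies)."""
    models = {}
    for word, paths in wav_files_by_word.items():
        feats = [mfcc_frames(p) for p in paths]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        models[word] = model
    return models

def recognise(models, path):
    """Pick the word whose HMM gives the highest log-likelihood for the recording."""
    X = mfcc_frames(path)
    return max(models, key=lambda w: models[w].score(X))
```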
IEEE Transactions on Neural Systems and Rehabilitation Engineering | 2013
Mark Hawley; Stuart P. Cunningham; Phil D. Green; Pam Enderby; Rebecca Palmer; Siddharth Sehgal; Peter O'Neill
A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out using user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small-vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors, including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria, which confirmed that they can use the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues that limit the performance and usability of the device in real usage situations, where mean recognition accuracy was 67%. These limitations will be addressed in future work.
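The message-building step can be pictured as expanding a few recognised input tokens into a longer output utterance for synthesis. The fragment below is a minimal, hypothetical illustration of that idea only; the phrase table, build_message(), and the token names are invented for the example and do not reflect the actual VIVOCA design.

```python
# Minimal sketch of token-to-message expansion (hypothetical phrase table).
PHRASES = {
    "drink": "Please could I have a drink",
    "tea": "of tea",
    "nurse": "Please call the nurse",
    "thanks": "Thank you very much",
}

def build_message(tokens):
    """Concatenate the phrases selected by the recognised tokens."""
    parts = [PHRASES[t] for t in tokens if t in PHRASES]
    return " ".join(parts) + "." if parts else ""

if __name__ == "__main__":
    # e.g. disordered speech recognised as the tokens "drink" and "tea"
    message = build_message(["drink", "tea"])
    print(message)  # this string would then be passed to a speech synthesiser
```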
Augmentative and Alternative Communication | 2011
Zahoor Ahmad Khan; Phil D. Green; Sarah Creer; Stuart P. Cunningham
This case study describes the generation of a synthetic voice resembling that of an individual before she underwent a laryngectomy. Recordings (6–7 min) of this person speaking prior to the operation were used to create the voice. Synthesis was based on statistical speech models; this method allows models pre-trained on many speakers to be adapted to resemble an individual voice. The results of a listening test in which participants were asked to judge the similarity of the synthetic voice to the pre-operation (target) voice are reported. Members of the patient's family were asked to make a similar judgment. These experiments show that, for most listeners, the voice is quite convincing despite the low quality and small quantity of adaptation data.
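One way to picture adapting models pre-trained on many speakers with only a few minutes of target speech is MAP adaptation of the Gaussian means, sketched below with numpy. The study used HMM-based statistical parametric synthesis; this shows only the interpolation arithmetic, and all array names are placeholders.

```python
# Sketch of MAP-style mean adaptation with a small amount of target-speaker data.
import numpy as np

def map_adapt_means(prior_means, frames, state_ids, tau=10.0):
    """Interpolate prior (pre-trained) means with target-speaker statistics.

    prior_means : (n_states, dim) means from the pre-trained model
    frames      : (n_frames, dim) adaptation data from the target speaker
    state_ids   : (n_frames,) hard state alignment of each frame
    tau         : prior weight; larger tau trusts the pre-trained model more
    """
    adapted = prior_means.copy()
    for s in np.unique(state_ids):
        obs = frames[state_ids == s]
        n = len(obs)
        adapted[s] = (tau * prior_means[s] + obs.sum(axis=0)) / (tau + n)
    return adapted
```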
Computer Speech & Language | 2013
Sarah Creer; Stuart P. Cunningham; Phil D. Green; Junichi Yamagishi
For individuals with severe speech impairment, accurate spoken communication can be difficult and require considerable effort. Some may choose to use a voice output communication aid (VOCA) to support their spoken communication needs. A VOCA typically takes input from the user through a keyboard or switch-based interface and produces spoken output using either synthesised or recorded speech. The type and number of synthetic voices that can be accessed with a VOCA is often limited, and this has been implicated as a factor in rejection of the devices. There is therefore a need to provide voices that are more appropriate and acceptable for users. This paper reports on a study that uses recent advances in speech synthesis to produce personalised synthetic voices for three speakers with mild to severe dysarthria, one of the most common speech disorders. Using a statistical parametric approach to synthesis, an average voice trained on data from several unimpaired speakers was adapted using recordings of the impaired speech of the three dysarthric speakers. By careful selection of the speech data and the model parameters, several exemplar voices were produced for each speaker. A qualitative evaluation was conducted with the speakers and with listeners who were familiar with the speaker. The evaluation showed that for one of the three speakers a voice could be created which conveyed many of his personal characteristics, such as regional identity, sex and age.
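The "average voice" stage can be caricatured as pooling per-state Gaussian parameters across several unimpaired speakers before adapting them to the dysarthric speaker. The paper's approach uses speaker-adaptive training in an HMM synthesis framework rather than this simple averaging; the sketch below is only a structural stand-in with assumed array shapes.

```python
# Rough stand-in for building an "average voice" from several speakers' models.
import numpy as np

def average_voice(speaker_means, speaker_vars):
    """speaker_means, speaker_vars: lists of (n_states, dim) arrays, one per speaker."""
    mean_stack = np.stack(speaker_means)   # (n_speakers, n_states, dim)
    var_stack = np.stack(speaker_vars)
    avg_mean = mean_stack.mean(axis=0)
    # total variance = average within-speaker variance + spread of speaker means
    avg_var = var_stack.mean(axis=0) + mean_stack.var(axis=0)
    return avg_mean, avg_var
```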
conference of the international speech communication association | 2015
Siddharth Sehgal; Stuart P. Cunningham
Dysarthria is a neurological speech disorder which exhibits multi-fold disturbances in the speech production system of an individual and can have a detrimental effect on speech output. In addition to problems of data sparseness, dysarthric speech is characterised by inconsistencies in the acoustic space, making it extremely challenging to model. This paper investigates a variety of baseline speaker-independent (SI) systems and their suitability for adaptation. The study also explores the usefulness of speaker adaptive training (SAT) for implicitly removing inter-speaker variations in a dysarthric corpus. The paper implements a hybrid MLLR-MAP approach to adapt the SI and SAT systems. All reported results use the UASPEECH dysarthric corpus. Our best adapted systems gave a significant absolute gain of 11.05% (20.42% relative) over the best previously published result in the literature. A statistical analysis across the various systems, and of their behaviour in modelling different dysarthric severity sub-groups, showed that SAT-adapted systems were better suited to handling the disfluencies of more severe speech, while SI systems built from typical speech were more apt for modelling speech with a low level of severity. Index Terms: speech recognition, dysarthric speech, speaker adaptation, speaker adaptive training
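In simplified form, the hybrid adaptation named in the abstract can be sketched as a global MLLR-style affine transform of the Gaussian means estimated by least squares, followed by a MAP-style interpolation for states with enough target-speaker frames. Real MLLR weights the regression by state posteriors and covariances; the alignment and arrays below are assumed inputs, not the paper's implementation.

```python
# Simplified MLLR-then-MAP mean adaptation sketch (hypothetical inputs).
import numpy as np

def mllr_transform(prior_means, frames, state_ids):
    """Estimate W so that [1, mu_s] @ W approximates the adaptation frames."""
    X = np.hstack([np.ones((len(frames), 1)), prior_means[state_ids]])  # (T, dim+1)
    W, *_ = np.linalg.lstsq(X, frames, rcond=None)                      # (dim+1, dim)
    ext = np.hstack([np.ones((len(prior_means), 1)), prior_means])
    return ext @ W                                                      # transformed means

def map_refine(transformed, frames, state_ids, tau=10.0):
    """MAP-interpolate the MLLR-transformed means with per-state sample statistics."""
    out = transformed.copy()
    for s in np.unique(state_ids):
        obs = frames[state_ids == s]
        out[s] = (tau * transformed[s] + obs.sum(axis=0)) / (tau + len(obs))
    return out
```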
Applied Artificial Intelligence | 2011
Sarah Creer; Stuart P. Cunningham; Mark Hawley; Peter Wallis
The Social Engagement with Robots and Agents (SERA) project conducts research into making robots and agents more sociable. A robot setup was deployed for ten days in the homes of users to generate audio-visual data for analysis of the nature of the evolving human-robot relationship. This paper details the setup developed to provide opportunities for human-robot interaction and to yield the quantity of data required for analysis. The robot's function was not to exist as part of an experiment but to exist in the user's home, fulfilling a role in his or her existing routine to ensure interaction. The system acted as an exercise monitor to encourage older people to lead a healthy lifestyle. The assumption made was that increased engagement and usefulness of the system lead to increased use, providing more data for analysis. This paper describes the SERA robot setup for each of three iterations of deployment, with particular reference to maximizing the amount of data collected.
spoken language technology workshop | 2014
Jort F. Gemmeke; Siddharth Sehgal; Stuart P. Cunningham; Hugo Van hamme
Over the past decade, several speech-based electronic assistive technologies (EATs) have been developed that target users with dysarthric speech. These EATs include vocal command-and-control systems, but also voice-input voice-output communication aids (VIVOCAs). In these systems the vocal interfaces are based on automatic speech recognition (ASR), but this approach requires substantial training data and detailed annotation. In this work we evaluate an alternative approach, which works by mining utterance-based representations of speech for recurrent acoustic patterns, with the goal of achieving usable recognition accuracies with less speaker-specific training data. Comparisons with a conventional ASR system on dysarthric speech databases show that the proposed approach offers a substantial reduction in the amount of training data needed to achieve the same recognition accuracies.
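Under the assumption that each utterance is summarised as a non-negative "bag of acoustic events" vector (for example, co-occurrence counts of vector-quantised frames), the pattern-mining idea can be sketched with an off-the-shelf non-negative matrix factorisation followed by a simple classifier over the pattern activations. This is not the authors' implementation; `utterance_vectors` and `labels` are assumed to be supplied.

```python
# Sketch: discover recurrent acoustic patterns with NMF, then label utterances
# by their pattern activations (assumes pre-computed non-negative features).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression

def train_pattern_recogniser(utterance_vectors, labels, n_patterns=20):
    """Factorise utterance vectors into recurrent patterns, then classify activations."""
    nmf = NMF(n_components=n_patterns, init="nndsvda", max_iter=500)
    activations = nmf.fit_transform(utterance_vectors)   # (n_utterances, n_patterns)
    clf = LogisticRegression(max_iter=1000).fit(activations, labels)
    return nmf, clf

def recognise(nmf, clf, utterance_vector):
    """Project one new utterance onto the learned patterns and predict its command."""
    act = nmf.transform(utterance_vector.reshape(1, -1))
    return clf.predict(act)[0]
```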
Journal of the Acoustical Society of America | 2008
Frank Herrmann; Sandra P. Whiteside; Stuart P. Cunningham
Psycholinguistic research from the mid-1990s suggests that articulatory routines for high-frequency syllables are stored in the form of gestural scores in a library. Syllable frequency effects on naming latency and utterance duration have been interpreted as supporting evidence for such a syllabary. This paper presents a data subset from a project investigating speech motor learning as a function of syllable type. Fourteen native speakers of English were asked to listen to and repeat 16 monosyllabic stimuli belonging to one of two categories: high- or low-frequency syllables (as defined by CELEX). Acoustic coarticulation measures, i.e. F2 locus equations and absolute formant changes, were used to indirectly determine the degree of gestural overlap in articulatory movements. In addition, utterance durations were measured to determine speed of articulation. Significant syllable frequency effects were found for both the F2 locus equations (slope and R²) and utterance duration. High-frequency syllables exhibited greater degrees of coarticulation (steeper slopes), greater overall consistency in their production (greater R²) and shorter utterance durations than low-frequency syllables. These data provide further supporting evidence that different syllable categories may be encoded differently during speech production.
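An F2 locus equation is simply a linear regression of F2 at vowel onset on F2 at the vowel midpoint across tokens of one consonant context, with the slope and R² read as indices of the degree and consistency of coarticulation. The worked sketch below uses hypothetical formant values.

```python
# Worked sketch of an F2 locus equation (hypothetical formant measurements, Hz).
import numpy as np
from scipy.stats import linregress

f2_midpoint = np.array([2250.0, 1800.0, 1400.0, 1100.0, 2100.0, 1600.0])  # vowel midpoint
f2_onset    = np.array([2050.0, 1750.0, 1500.0, 1350.0, 1950.0, 1650.0])  # vowel onset

fit = linregress(f2_midpoint, f2_onset)
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.1f} Hz, "
      f"R^2 = {fit.rvalue**2:.2f}")
```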
Journal of Voice | 2017
Tracy Jeffery; Stuart P. Cunningham; Sandra P. Whiteside
OBJECTIVES: Automatic acoustic measures of voice quality in people with Down syndrome (DS) do not reliably reflect perceived voice qualities. This study used acoustic data and visual spectral data to investigate the relationship between perceived voice qualities and acoustic measures.
STUDY DESIGN: Participants were four young adults (two males, two females; mean age 23.8 years) with DS and severe learning disabilities, at least one of whom had a hearing impairment.
METHODS: Participants imitated sustained /i/, /u/, and /a/ vowels at predetermined target pitches within their vocal range. Medial portions of vowels were analyzed, using Praat, for fundamental frequency, harmonics-to-noise ratio, jitter, and shimmer. Spectrograms were used to identify the presence and duration of subharmonics at onset, offset, and mid-vowel. The presence of diplophonia was assessed by auditory evaluation.
RESULTS: Perturbation data were highest for /a/ vowels and lowest for /u/ vowels. Intermittent productions of subharmonics were evident in spectrograms, some of which coincided with perceived diplophonia. The incidence, location, duration, and intensity of subharmonics differed between the four participants.
CONCLUSIONS: Although the acoustic data do not clearly indicate atypical phonation, diplophonia and subharmonics reflect nonmodal phonation. The findings suggest that these may contribute to different perceived voice qualities in the study group and that these qualities may result from intermittent involvement of supraglottal structures. Further research is required to confirm the findings in the wider DS population, and to assess the relationships between voice quality, vowel type, and physiological measures.
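For readers unfamiliar with these measures, the sketch below shows how fundamental frequency, harmonics-to-noise ratio, jitter, and shimmer of a sustained vowel segment might be computed with parselmouth, a Python interface to Praat. The analysis settings are common defaults rather than those of the study, and "vowel.wav" is a placeholder for a mid-vowel segment.

```python
# Sketch of standard Praat voice measures via parselmouth (generic settings).
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("vowel.wav")

pitch = call(snd, "To Pitch", 0.0, 75, 500)                 # time step, floor, ceiling (Hz)
f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")

harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)                   # harmonics-to-noise ratio (dB)

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

print(f"F0 {f0_mean:.1f} Hz, HNR {hnr:.1f} dB, "
      f"jitter {jitter_local:.4f}, shimmer {shimmer_local:.4f}")
```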
conference of the international speech communication association | 2015
Phil D. Green; Ricard Marxer; Stuart P. Cunningham; Heidi Christensen; Frank Rudzicz; Maria Yancheva; André Coy; Massimiliano Malavasi; Lorenzo Desideri
Clinical applications of speech technology face two challenges. The first is data sparsity. There is little data available to underpin techniques which are based on machine learning and, because it is difficult to collect disordered speech corpora, the only way to address this problem is by pooling what is produced from systems which are already in use. The second is personalisation. This field demands individual solutions, technology which adapts to its user rather than demanding that the user adapt to it. Here we introduce a project, CloudCAST, which addresses these two problems by making remote, adaptive technology available to professionals who work with speech: therapists, educators and clinicians. Index Terms: assistive technology, clinical applications of speech technology