
Publications


Featured research published by Frank Rudzicz.


Language Resources and Evaluation | 2012

The TORGO database of acoustic and articulatory speech from speakers with dysarthria

Frank Rudzicz; Aravind Kumar Namasivayam; Talya Wolff

This paper describes the acquisition of a new database of dysarthric speech in terms of aligned acoustic and articulatory data. The database currently includes data from seven individuals with speech impediments caused by cerebral palsy or amyotrophic lateral sclerosis, along with age- and gender-matched control subjects. Each of the individuals with speech impediments is given standardized assessments of speech-motor function by a speech-language pathologist. Acoustic data are obtained with one head-mounted and one directional microphone. Articulatory data are obtained by electromagnetic articulography, which allows the measurement of the tongue and other articulators during speech, and by 3D reconstruction from binocular video sequences. The stimuli are drawn from a variety of sources, including the TIMIT database, lists of identified phonetic contrasts, and assessments of speech intelligibility. The paper also includes some analysis of how dysarthric speech differs from non-dysarthric speech with respect to features such as phoneme length and pronunciation errors.
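
A minimal sketch of the kind of duration analysis mentioned above, assuming phone-level alignments are available as (phone, start, end) tuples; the alignment data and helper below are hypothetical stand-ins, not part of the TORGO distribution.

```python
# Hypothetical phone-level alignments: (phone, start_sec, end_sec).
from collections import defaultdict
from statistics import mean

def mean_phone_durations(alignments):
    """Average duration per phone label from (phone, start, end) tuples."""
    durations = defaultdict(list)
    for phone, start, end in alignments:
        durations[phone].append(end - start)
    return {phone: mean(ds) for phone, ds in durations.items()}

# Toy alignments standing in for one dysarthric and one control speaker.
dysarthric = [("ah", 0.00, 0.21), ("s", 0.21, 0.48), ("ah", 0.48, 0.70)]
control = [("ah", 0.00, 0.09), ("s", 0.09, 0.22), ("ah", 0.22, 0.33)]

dys, ctl = mean_phone_durations(dysarthric), mean_phone_durations(control)
for phone in sorted(set(dys) & set(ctl)):
    print(f"{phone}: dysarthric {dys[phone]:.2f}s vs control {ctl[phone]:.2f}s")
```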


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Articulatory Knowledge in the Recognition of Dysarthric Speech

Frank Rudzicz

Disabled speech is not compatible with modern generative, acoustic-only models of automatic speech recognition (ASR). This work considers the use of theoretical and empirical knowledge of the vocal tract for atypical speech in labeling segmented and unsegmented sequences. These combined models are compared against discriminative models such as neural networks, support vector machines, and conditional random fields. Results show significant improvements in accuracy over the baseline through the use of production knowledge. Furthermore, although the statistics of vocal tract movement do not appear to be transferable between typical and disabled speakers, transforming the space of the former given knowledge of the latter before retraining gives high accuracy. This work may be applied within components of assistive software for speakers with dysarthria.
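
The comparison against discriminative models can be pictured with a minimal sketch: a support vector machine labeling segmented frames from acoustic features concatenated with articulatory features. The data below are synthetic stand-ins; only the general modeling pattern, not the paper's actual setup, is shown.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in data: 200 frames, 13 acoustic dims + 6 articulatory dims,
# each frame labeled with one of 5 phone classes.
X_acoustic = rng.normal(size=(200, 13))
X_artic = rng.normal(size=(200, 6))
X = np.hstack([X_acoustic, X_artic])  # concatenate the two feature streams
y = rng.integers(0, 5, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
print("frame accuracy:", clf.score(X[150:], y[150:]))
```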


International Conference on Acoustics, Speech, and Signal Processing | 2011

Adapting acoustic and lexical models to dysarthric speech

Kinfe Tadesse Mengistu; Frank Rudzicz

Dysarthria is a motor speech disorder resulting from neurological damage to the part of the brain that controls the physical production of speech. It is characterized in part by pronunciation errors that include deletions, substitutions, insertions, and distortions of phonemes. These errors follow consistent intra-speaker patterns, which we exploit through acoustic and lexical model adaptation to improve automatic speech recognition (ASR) of dysarthric speech. We show that acoustic model adaptation yields an average relative word error rate (WER) reduction of 36.99%, and that pronunciation lexicon adaptation (PLA) reduces WER by a further 8.29% relative, on a large-vocabulary task of over 1500 words for six speakers with moderate to severe dysarthria. PLA also shows an average relative WER reduction of 7.11% on speaker-dependent models evaluated using 5-fold cross-validation.
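
As a worked example of the relative-reduction arithmetic used above (the 60% baseline WER below is hypothetical; the paper reports only the relative figures):

```python
def relative_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction: (baseline - adapted) / baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer

baseline = 0.60                      # hypothetical baseline WER of 60%
after_am = baseline * (1 - 0.3699)   # 36.99% relative reduction
after_pla = after_am * (1 - 0.0829)  # a further 8.29% relative reduction
print(f"after acoustic adaptation: {after_am:.1%}")   # 37.8%
print(f"after lexicon adaptation:  {after_pla:.1%}")  # 34.7%
print(f"check: {relative_reduction(baseline, after_am):.2%}")  # 36.99%
```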


International Joint Conference on Natural Language Processing | 2009

Summarizing multiple spoken documents: finding evidence from untranscribed audio

Xiaodan Zhu; Gerald Penn; Frank Rudzicz

This paper presents a model for summarizing multiple untranscribed spoken documents. Without assuming the availability of transcripts, the model modifies a recently proposed unsupervised algorithm to detect recurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies. This model is of interest because it does not depend on spoken-language transcription, an error-prone and resource-intensive process; it can integrate multiple sources of information on the same topic; and its novel use of acoustic patterns extends previous work on low-level prosodic feature detection. We compare the performance of this model with that achieved using manual and automatic transcripts, and find that the new approach is roughly equivalent to having access to ASR transcripts with word error rates in the 33-37% range, without actually running ASR, while better handling utterances with out-of-vocabulary words.
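
The salience-and-redundancy stage can be sketched generically: given pairwise utterance similarities (however estimated, e.g. from matched acoustic patterns), rank utterances by average similarity to the rest and skip near-duplicates. This greedy scheme is an illustration, not the paper's exact algorithm, and the similarity matrix below is a stand-in.

```python
import numpy as np

def select_salient(sim, k, redundancy_threshold=0.8):
    """Pick k utterances: high average similarity to the rest (salience),
    skipping any too similar to an already selected one (redundancy)."""
    salience = sim.mean(axis=1)
    selected = []
    for i in np.argsort(-salience):
        if all(sim[i, j] < redundancy_threshold for j in selected):
            selected.append(int(i))
        if len(selected) == k:
            break
    return selected

# Stand-in pairwise similarities for four utterances.
sim = np.array([[1.0, 0.9, 0.2, 0.3],
                [0.9, 1.0, 0.1, 0.2],
                [0.2, 0.1, 1.0, 0.4],
                [0.3, 0.2, 0.4, 1.0]])
print(select_salient(sim, k=2))  # [0, 3]: utterance 1 is redundant with 0
```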


Pattern Recognition | 2015

Fast incremental LDA feature extraction

Youness Aliyari Ghassabeh; Frank Rudzicz; Hamid Abrishami Moghaddam

Linear discriminant analysis (LDA) is a traditional statistical technique that reduces dimensionality while preserving as much of the class-discriminatory information as possible. The conventional form of LDA assumes that all the data are available in advance, and the LDA feature space is computed by finding the eigendecomposition of an appropriate matrix. However, there are situations where the data arrive in a sequence and the LDA features must be updated incrementally as new samples are observed. Chatterjee and Roychowdhury proposed an algorithm for incrementally computing the LDA features, and Moghaddam et al. later accelerated its convergence rate. The algorithms of Moghaddam et al. were derived by applying the chain rule to an implicit cost function; because the authors did not have access to the cost function itself, they could not analyze the convergence of these algorithms, and the convergence of the accelerated techniques was not guaranteed. In this paper, we briefly review the previously proposed algorithms and then derive new algorithms that accelerate the convergence rate of the incremental LDA algorithm of Chatterjee and Roychowdhury. The proposed algorithms are derived by optimizing the step size in each iteration using steepest-descent and conjugate-direction methods. We test the performance of the proposed algorithms for incremental LDA on synthetic and real data sets. The simulation results confirm that the proposed algorithms estimate the LDA features faster than both the gradient-descent-based algorithm of Moghaddam et al. and the algorithm of Chatterjee and Roychowdhury.

Highlights: the previous algorithms for fast incremental LDA are discussed; new algorithms for fast incremental LDA based on steepest descent and on conjugate directions are proposed; the performance of the proposed algorithms is tested on real data sets.
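
For reference, a minimal sketch of the batch LDA computation that such incremental algorithms approximate: the LDA features are the leading eigenvectors of inv(Sw) @ Sb, where Sw and Sb are the within- and between-class scatter matrices; an incremental method instead refines an estimate of these directions with each new sample. The data below are synthetic.

```python
import numpy as np

def lda_features(X, y, n_components):
    """Batch LDA: leading eigenvectors of inv(Sw) @ Sb."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:n_components]].real

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
W = lda_features(X, y, n_components=1)
print(W.shape)  # (4, 1): one discriminant direction for two classes
```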


International Conference on Acoustics, Speech, and Signal Processing | 2012

Sentence recognition from articulatory movements for silent speech interfaces

Jun Wang; Ashok Samal; Jordan R. Green; Frank Rudzicz

Recent research has demonstrated the potential of articulation-based silent speech interfaces for command-and-control systems. Such an interface converts articulation to words that can then drive a text-to-speech synthesizer. In this paper, we propose a novel near-time algorithm to recognize whole sentences from continuous tongue and lip movements. Our goal is to assist persons who are aphonic or have a severe motor speech impairment in producing functional speech using their tongue and lips. The algorithm was tested on a functional-sentence data set collected from ten speakers (3012 utterances). The average accuracy was 94.89%, with an average latency of 3.11 seconds per sentence prediction. The results indicate the effectiveness of our approach and its potential for building a real-time articulation-based silent speech interface for clinical applications.
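
One common baseline for this kind of task, shown here only as a hedged illustration (the paper's actual algorithm may differ), is nearest-neighbour dynamic time warping (DTW) over articulatory trajectories; the template data below are synthetic.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two (T, d) articulatory trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(trajectory, templates):
    """Return the sentence label of the closest template under DTW."""
    return min(templates, key=lambda s: dtw_distance(trajectory, templates[s]))

rng = np.random.default_rng(0)
# Synthetic sentence templates: (frames, 6 tongue/lip coordinates).
templates = {"how are you": rng.normal(size=(30, 6)),
             "i need help": rng.normal(size=(25, 6))}
test = templates["i need help"] + rng.normal(scale=0.1, size=(25, 6))
print(recognize(test, templates))  # expected: "i need help"
```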


Conference on Computers and Accessibility | 2007

Comparing speaker-dependent and speaker-adaptive acoustic models for recognizing dysarthric speech

Frank Rudzicz

Acoustic modeling of dysarthric speech is complicated by its increased intra- and inter-speaker variability. The accuracy of speaker-dependent and speaker-adaptive models is compared for this task, with the latter prevailing across varying levels of speaker intelligibility.


Canadian Conference on Artificial Intelligence | 2011

Comparing humans and automatic speech recognition systems in recognizing dysarthric speech

Kinfe Tadesse Mengistu; Frank Rudzicz

Speech is a complex process that requires control and coordination of articulation, breathing, voicing, and prosody. Dysarthria is a manifestation of an inability to control and coordinate one or more of these aspects, resulting in poorly articulated and barely intelligible speech. Hence, individuals with dysarthria are rarely understood by human listeners. In this paper, we compare and evaluate how well dysarthric speech can be recognized by an automatic speech recognition (ASR) system and by naive adult human listeners. The results show that, despite the encouraging performance of ASR systems, and contrary to the claims of other studies, human listeners on average perform better at recognizing single-word dysarthric speech. In particular, the mean word recognition accuracy of speaker-adapted monophone ASR systems on stimuli produced by six dysarthric speakers is 68.39%, while the mean percentage of correct responses of 14 naive human listeners on the same speech is 79.78%, as evaluated using a single-word multiple-choice intelligibility test.


Virtual Reality Software and Technology | 2004

A framework for 3D visualisation and manipulation in an immersive space using an untethered bimanual gestural interface

Yves Boussemart; François Rioux; Frank Rudzicz; Michael Wozniewski; Jeremy R. Cooperstock

Immersive environments offer users the experience of being submerged in a virtual space, effectively transcending the boundary between the real and virtual worlds. We present a framework for visualization and manipulation of 3D virtual environments in which users need not resort to the awkward command vocabulary of traditional keyboard-and-mouse interaction. We have adapted the transparent toolglass paradigm as a gestural interface widget for a spatially immersive environment. To serve that purpose, we have implemented a bimanual gesture interpreter that recognizes and translates a user's actions into commands for controlling these widgets. To satisfy a primary design goal of keeping the user completely untethered, we use purely video-based tracking techniques.


Speech Communication | 2012

Using articulatory likelihoods in the recognition of dysarthric speech

Frank Rudzicz

Millions of individuals have congenital or acquired neuro-motor conditions that limit control of their muscles, including those that manipulate the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand, both for human listeners and for traditional automatic speech recognition (ASR), which in some cases is rendered completely unusable. In this work we first introduce a new method for acoustic-to-articulatory inversion that estimates the positions of the vocal tract given acoustics, using a nonlinear Hammerstein system. This is accomplished within the theory of task-dynamics, using the TORGO database of dysarthric articulation. Our approach uses adaptive kernel canonical correlation analysis and is found to be significantly more accurate than mixture density networks, at or above the 95% level of confidence for most vocal tract variables. Next, we introduce a new method for ASR in which acoustic-based hypotheses are re-evaluated according to the likelihoods of their articulatory realizations under task-dynamics. This approach incorporates high-level, long-term aspects of speech production and is found to be significantly more accurate than hidden Markov models, dynamic Bayesian networks, and switching Kalman filters.
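
The re-evaluation step can be sketched abstractly: combine each hypothesis's acoustic score with the log-likelihood of its articulatory realization and re-rank. The weight, the scores, and the stand-in articulatory model below are all hypothetical; the paper derives its likelihoods from task-dynamics.

```python
def rescore(hypotheses, articulatory_loglik, lam=0.5):
    """Re-rank (text, acoustic_logprob) pairs by a combined score."""
    scored = [(text, ac + lam * articulatory_loglik(text))
              for text, ac in hypotheses]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical 2-best list where the acoustic model prefers the wrong word.
nbest = [("pat", -11.2), ("bat", -11.5)]
loglik = {"pat": -9.0, "bat": -4.0}.get  # stand-in articulatory model
print(rescore(nbest, loglik))  # ("bat", -13.5): articulatory evidence wins
```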

Collaboration


Frank Rudzicz's top co-authors include Hisham Alshaer (Toronto Rehabilitation Institute).