
Publication


Featured research published by Colleen Richey.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2008

Learning diagnostic models using speech and language measures

Bart Peintner; William Jarrold; Dimitra Vergyri; Colleen Richey; Maria Gorno Tempini; Jennifer M. Ogar

We describe results that show the effectiveness of machine learning in the automatic diagnosis of certain neurodegenerative diseases, several of which alter speech and language production. We analyzed audio from 9 control subjects and 30 patients diagnosed with one of three subtypes of Frontotemporal Lobar Degeneration. From this data, we extracted features of the audio signal and the words the patient used, which were obtained using our automated transcription technologies. We then automatically learned models that predict the diagnosis of the patient using these features. Our results show that learned models over these features predict diagnosis with accuracy significantly better than random. Future studies using higher quality recordings will likely improve these results.
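
The abstract gives no implementation details, so the following is only a minimal sketch of the kind of pipeline it describes: learning a diagnostic classifier from precomputed speech and language features, with leave-one-subject-out cross-validation as a sensible choice for a 39-subject dataset. The feature matrix, labels and linear-SVM classifier are placeholders, not the authors' setup.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per subject (9 controls + 30 patients), columns are acoustic and
# lexical features; y: diagnostic label (0 = control, 1..3 = FTLD subtype).
# Both are random placeholders here.
rng = np.random.default_rng(0)
X = rng.normal(size=(39, 20))
y = rng.integers(0, 4, size=39)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# Leave-one-subject-out cross-validation, reasonable for so few subjects.
acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f}")  # to be compared against chance
```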


International Conference on Acoustics, Speech, and Signal Processing | 2011

Automatic identification of speaker role and agreement/disagreement in broadcast conversation

Wen Wang; Sibel Yaman; Kristin Precoda; Colleen Richey

We present supervised approaches for detecting speaker roles and agreement/disagreement between speakers in broadcast conversation shows in three languages: English, Arabic, and Mandarin. We develop annotation approaches for a variety of linguistic phenomena. Various lexical, structural, and social network analysis based features are explored, and feature importance is analyzed across the three languages. We also compare the performance when using features extracted from automatically generated annotations against that when using human annotations. The algorithms achieve speaker role labeling accuracy of more than 86% for all three languages. For agreement and disagreement detection, the algorithms achieve precision of 63% to 92% and 55% to 85%, respectively, across the three languages.
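
As a toy illustration of the supervised, lexical-feature-based approach described above (not the paper's feature set or classifier), one can train a text classifier on agreement/disagreement cues and report per-class precision. The utterances and the TF-IDF plus logistic-regression setup below are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.pipeline import make_pipeline

# Toy labeled utterances standing in for annotated broadcast-conversation turns.
train_utts = ["yes I agree completely", "exactly right", "no that is wrong",
              "I don't think so", "absolutely", "that is not correct"]
train_labels = ["agree", "agree", "disagree", "disagree", "agree", "disagree"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_utts, train_labels)

test_utts = ["yes exactly", "no that is not right"]
test_labels = ["agree", "disagree"]
pred = clf.predict(test_utts)

# Per-class precision, the metric reported for agreement/disagreement detection.
print(precision_score(test_labels, pred, average=None, labels=["agree", "disagree"]))
```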


Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014

The SRI AVEC-2014 Evaluation System

Vikramjit Mitra; Elizabeth Shriberg; Mitchell McLaren; Andreas Kathol; Colleen Richey; Dimitra Vergyri; Martin Graciarena

Though depression is a common mental health problem with significant impact on human society, it often goes undetected. We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. These features, many of which are novel for this task, include (1) estimated articulatory trajectories during speech production, (2) acoustic characteristics, (3) acoustic-phonetic characteristics and (4) prosodic features. Features are modeled using a variety of approaches, including support vector regression, a Gaussian backend and decision trees. We report results on the AVEC-2014 depression dataset and find that individual systems range from 9.18 to 11.87 in root mean squared error (RMSE), and from 7.68 to 9.99 in mean absolute error (MAE). Initial fusion brings further improvement; fusion and feature selection work is still in progress.
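
The sketch below shows how one of the modeling approaches named above, support vector regression, would be scored with the RMSE and MAE metrics quoted; the feature matrices, target scores and train/test split are random placeholders rather than AVEC-2014 data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(1)
# Placeholder acoustic feature vectors and self-reported depression scores.
X_train, y_train = rng.normal(size=(80, 30)), rng.uniform(0, 45, size=80)
X_test,  y_test  = rng.normal(size=(20, 30)), rng.uniform(0, 45, size=20)

reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, pred))
mae = mean_absolute_error(y_test, pred)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}")
```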


International Conference on Acoustics, Speech, and Signal Processing | 2009

Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system

Murat Akbacak; Horacio Franco; Michael W. Frandsen; Saša Hasan; Huda Jameel; Andreas Kathol; Shahram Khadivi; Xin Lei; Arindam Mandal; Saab Mansour; Kristin Precoda; Colleen Richey; Dimitra Vergyri; Wen Wang; Mei Yang; Jing Zheng

We summarize recent progress on SRI's IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system. In the past year we made substantial developments in our speech recognition and machine translation technology, leading to significant improvements in both accuracy and speed of the IraqComm system. On the 2008 NIST evaluation dataset our two-way speech-to-text (S2T) system achieved 6% to 8% absolute improvement in BLEU in both directions, compared to our previous year's system [1].
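
The BLEU gain quoted above is measured by scoring system output against reference translations; a generic sketch (not the NIST evaluation pipeline) using the sacrebleu package is shown below, with invented example sentences standing in for two system versions.

```python
import sacrebleu

# One reference stream; each entry corresponds to one hypothesis.
refs = [["the checkpoint is two kilometers ahead",
         "please show me your identification"]]
old_sys = ["checkpoint is two kilometer ahead", "show me identification"]
new_sys = ["the checkpoint is two kilometers ahead", "please show your identification"]

old_bleu = sacrebleu.corpus_bleu(old_sys, refs).score
new_bleu = sacrebleu.corpus_bleu(new_sys, refs).score
print(f"old={old_bleu:.1f}  new={new_bleu:.1f}  absolute gain={new_bleu - old_bleu:.1f}")
```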


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Shrikanth Narayanan; Panayiotis G. Georgiou; Abhinav Sethy; Dagen Wang; Murtaza Bulut; Shiva Sundaram; Emil Ettelaie; Sankaranarayanan Ananthakrishnan; Horacio Franco; Kristin Precoda; Dimitra Vergyri; Jing Zheng; Wen Wang; Ramana Rao Gadde; Martin Graciarena; Victor Abrash; Michael W. Frandsen; Colleen Richey

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially for languages and domains without readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry needs of acoustic and language modeling, these designs must accommodate varying requirements imposed by the domain's needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interaction, with or without visual feedback), and the huge spoken language variability arising from socio-linguistic and cultural differences among users. Using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, this paper describes some of the practical issues, and the solutions that were developed, for multilingual ASR development. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, and class-based language models that better exploit resource-poor language data; efficient search strategies, including N-best and confidence generation to aid multiple-hypothesis translation; use of dialog information and careful interface choices to facilitate ASR; and audio interface design that meets both usability and robustness requirements.
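
One of the low-resource strategies mentioned, class-based language modeling, can be sketched as follows: bigram statistics are pooled over word classes, so word pairs never seen in training still receive probability mass. The word classes and the two-sentence corpus below are invented for illustration; this is not the paper's model.

```python
from collections import Counter, defaultdict

# Hand-assigned word classes and a tiny toy corpus.
word2class = {"two": "NUM", "three": "NUM", "kilometers": "UNIT", "miles": "UNIT",
              "ahead": "DIR", "go": "VERB"}
corpus = [["go", "two", "kilometers", "ahead"], ["go", "three", "miles", "ahead"]]

class_bigrams, class_unigrams = Counter(), Counter()
word_in_class = defaultdict(Counter)
for sent in corpus:
    classes = [word2class[w] for w in sent]
    for w, c in zip(sent, classes):
        word_in_class[c][w] += 1
    for c1, c2 in zip(classes, classes[1:]):
        class_bigrams[(c1, c2)] += 1
        class_unigrams[c1] += 1

def prob(prev_word, word):
    # p(w | w_prev) ~= p(class(w) | class(w_prev)) * p(w | class(w))
    c_prev, c = word2class[prev_word], word2class[word]
    p_class = class_bigrams[(c_prev, c)] / max(class_unigrams[c_prev], 1)
    p_word = word_in_class[c][word] / sum(word_in_class[c].values())
    return p_class * p_word

# "two miles" never occurs, but it still gets probability via the NUM -> UNIT bigram.
print(prob("two", "miles"))
```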


North American Chapter of the Association for Computational Linguistics | 2004

Limited-domain speech-to-speech translation between English and Pashto

Kristin Precoda; Horacio Franco; Ascander Dost; Michael W. Frandsen; John Fry; Andreas Kathol; Colleen Richey; Susanne Z. Riehemann; Dimitra Vergyri; Jing Zheng; Christopher Culy

This paper describes a prototype system for near-real-time spontaneous, bidirectional translation between spoken English and Pashto, a language presenting many technological challenges because of its lack of resources, including both data and expert knowledge. Development of the prototype is ongoing, and we propose to demonstrate a fully functional version which shows the basic capabilities, though not yet their final depth and breadth.


Speech Communication | 2015

Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems

Luciana Ferrer; Harry Bratt; Colleen Richey; Horacio Franco; Victor Abrash; Kristin Precoda

Highlights: A system for classification of lexical stress for language learners is proposed. It successfully combines spectral and prosodic characteristics using GMMs. Models are learned on native speech, which does not require manual labeling. A method for controlling the operating point of the system is proposed. We achieve a 20% error rate on Japanese children speaking English.

We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achieving optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and, finally, spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features.
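
A hedged sketch of the classification scheme described above: one Gaussian mixture model per stress class over per-vowel features, with the highest-likelihood class chosen for each syllable. The feature dimensionality, mixture size and data are placeholders, not the EduSpeak configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
classes = ["unstressed", "primary", "secondary"]

# Toy training data: rows are per-vowel feature vectors
# (e.g. normalized energy, pitch, duration, spectral tilt, MFCC log-posteriors).
train = {c: rng.normal(loc=i, size=(200, 6)) for i, c in enumerate(classes)}

# One GMM per stress class, fit on that class's vowels.
gmms = {c: GaussianMixture(n_components=4, random_state=0).fit(X)
        for c, X in train.items()}

def classify(vowel_features):
    # score_samples gives the log-likelihood of the vowel under each class GMM.
    scores = {c: gmm.score_samples(vowel_features[None, :])[0]
              for c, gmm in gmms.items()}
    return max(scores, key=scores.get)

print(classify(rng.normal(loc=1, size=6)))  # likely "primary" for this toy data
```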


Conference of the International Speech Communication Association | 2016

Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration

Nikoletta Bassiou; Andreas Tsiartas; Jennifer Smith; Harry Bratt; Colleen Richey; Elizabeth Shriberg; Cynthia D'Angelo; Nonye Alozie

This work investigates whether nonlexical information from speech can automatically predict the quality of small-group collaborations. Audio was collected from students as they collaborated in groups of three to solve math problems. Experts in education annotated 30-second time windows by hand for collaboration quality. Speech activity features (computed at the group level) and spectral, temporal and prosodic features (extracted at the speaker level) were explored. After the latter were transformed from the speaker level to the group level, features were fused. Results using support vector machines and random forests show that feature fusion yields the best classification performance. The corresponding unweighted average F1 measure on a 4-class prediction task ranges between 40% and 50%, significantly higher than chance (12%). Speech activity features alone are strong predictors of collaboration quality, achieving an F1 measure between 35% and 43%. Speaker-based acoustic features alone achieve lower classification performance, but offer value in fusion. These findings illustrate that the approach under study offers promise for future monitoring of group dynamics, and should be attractive for many collaboration activity settings in which privacy is desired.
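
The fusion-and-classification step can be sketched roughly as follows, under assumptions: speaker-level acoustic features are aggregated to the group level, concatenated with group-level speech-activity features, and a random forest predicts the 4-class collaboration-quality label, scored by unweighted average F1. All data shapes and classifier settings below are illustrative, not those of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_windows = 400
speech_activity = rng.normal(size=(n_windows, 8))        # group-level features
speaker_acoustic = rng.normal(size=(n_windows, 3, 12))   # 3 speakers x 12 features

# Aggregate speaker-level features to the group level (mean and std per feature),
# then fuse by concatenation with the speech-activity features.
group_acoustic = np.concatenate([speaker_acoustic.mean(axis=1),
                                 speaker_acoustic.std(axis=1)], axis=1)
fused = np.concatenate([speech_activity, group_acoustic], axis=1)
labels = rng.integers(0, 4, size=n_windows)              # 4 collaboration-quality classes

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("unweighted average F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```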


International Conference on Acoustics, Speech, and Signal Processing | 2011

Acoustic data sharing for Afghan and Persian languages

Arindam Mandal; Dimitra Vergyri; Murat Akbacak; Colleen Richey; Andreas Kathol

In this work, we compare several known approaches for multilingual acoustic modeling for three languages, Dari, Farsi and Pashto, which are of recent geo-political interest. We demonstrate that we can train a single multilingual acoustic model for these languages and achieve recognition accuracy close to that of monolingual (or language-dependent) models. When only a small amount of training data is available for each of these languages, the multilingual model may even outperform the monolingual ones. We also explore adapting the multilingual model to target-language data, which yields improved automatic speech recognition (ASR) performance compared to the monolingual models, by 3% relative word error rate (WER), for both large and small amounts of training data.
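
The word error rate (WER) metric cited above is the standard word-level edit distance between a reference transcript and the recognizer's hypothesis, normalized by the reference length; a minimal generic implementation (not SRI's scoring tooling) is sketched below.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# A 3% relative WER reduction means, for example, going from 0.300 to 0.291.
print(wer("send the report tomorrow morning", "send report tomorrow morning please"))
```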


Spoken Language Technology Workshop | 2016

Toward human-assisted lexical unit discovery without text resources

Chris Bartels; Wen Wang; Vikramjit Mitra; Colleen Richey; Andreas Kathol; Dimitra Vergyri; Harry Bratt; Chiachi Hung

This work addresses lexical unit discovery for languages without (usable) written resources. Previous work has addressed this problem using entirely unsupervised methodologies. Our approach in contrast investigates the use of linguistic and speaker knowledge which are often available even if text resources are not. We create a framework that benefits from such resources, not assuming orthographic representations and avoiding generation of word-level transcriptions. We adapt a universal phone recognizer to the target language and use it to convert audio into a searchable phone string for lexical unit discovery via fuzzy sub-string matching. Linguistic knowledge is used to constrain phone recognition output and to constrain lexical unit discovery on the phone recognizer output.
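
The fuzzy sub-string matching step can be illustrated with an approximate string-matching dynamic program whose start position in the recognized phone string is free; the phone sequences below are invented examples, and this is a sketch rather than the authors' implementation.

```python
def approx_substring_distance(query, text):
    """Minimum edit distance between `query` and any sub-string of `text`."""
    # Row for the empty query: starting anywhere in `text` costs nothing.
    prev = [0] * (len(text) + 1)
    for i, q in enumerate(query, 1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, 1):
            cur[j] = min(prev[j - 1] + (q != t),   # substitute / match
                         prev[j] + 1,              # skip a query phone
                         cur[j - 1] + 1)           # skip a text phone
        prev = cur
    return min(prev)                               # free end position

# Phone string from the recognizer and a candidate lexical unit (toy examples).
recognized = "sh u k r a n jh a z i l a n m a r h a b a".split()
query = "sh u k r a n".split()
print(approx_substring_distance(query, recognized))  # 0: exact occurrence found
```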
