Joseph Tepperman
University of Southern California
Publications
Featured research published by Joseph Tepperman.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Joseph Tepperman; Shrikanth Narayanan
A robust language learning system, designed to help students practice a foreign language along with a machine tutor, must provide meaningful feedback to users by isolating and localizing their pronunciation errors. This paper presents a new technique for automatic syllable stress detection that is tailored for language-learning purposes. Our method, which uses basic prosodic features along with others related to the fundamental frequency slope and RMS energy range, is at least as accurate as an expert human listener, but requires no human supervision other than a pre-defined dictionary of expected lexical stress patterns for all words in the system's vocabulary. Optimal feature choices exhibited 87-89% accuracy compared with human-tagged stress labels, exceeding the inter-human agreement commonly held to be about 80%.
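The abstract's feature set lends itself to a compact illustration. Below is a minimal Python sketch, assuming frame-level F0 and RMS energy arrays per syllable; the feature weights and the simple argmax decision rule are invented for illustration and are not the paper's trained values.

```python
import numpy as np

def stress_features(f0, energy):
    """Per-syllable cues of the kind named in the abstract: F0 slope
    (linear fit over voiced frames), RMS energy range, and duration."""
    voiced = f0 > 0
    slope = (np.polyfit(np.flatnonzero(voiced), f0[voiced], 1)[0]
             if voiced.sum() > 1 else 0.0)
    return np.array([slope, energy.max() - energy.min(), len(f0)])

def detect_stress(syllables, expected_pattern):
    """Mark the highest-scoring syllable as stressed and compare it against
    the dictionary's expected lexical stress pattern (1 = stressed)."""
    w = np.array([0.5, 1.0, 0.1])  # illustrative weights, not tuned values
    scores = [stress_features(f0, en) @ w for f0, en in syllables]
    detected = int(np.argmax(scores))
    return detected, detected == expected_pattern.index(1)
```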
IEEE Workshop on Multimedia Signal Processing | 2007
Abeer Alwan; Yijian Bai; Matthew P. Black; Larry Casey; Matteo Gerosa; Markus Iseli; Barbara Jones; Abe Kazemzadeh; Sungbok Lee; Shrikanth Narayanan; Patti Price; Joseph Tepperman; Shizhen Wang
This paper describes the design and realization of an automatic system for assessing and evaluating the language and literacy skills of young children. The system was developed in the context of the TBALL (Technology-Based Assessment of Language and Literacy) project and aims at automatically assessing the English literacy skills of both native talkers of American English and Mexican-American children in grades K-2. The automatic assessments were carried out using appropriate speech recognition and understanding techniques. In this paper, we describe the system with a focus on the role of the multiple sources of information at our disposal. We present the content of the assessment system, discuss some issues in creating a child-friendly interface, and describe how to provide suitable feedback to teachers. In addition, we discuss the different assessment modules and the different algorithms used for speech analysis.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Joseph Tepperman; Shrikanth Narayanan
Motivated by potential applications in second-language pedagogy, we present a novel approach to using articulatory information to improve automatic detection of typical phone-level errors made by nonnative speakers of English, a difficult task that involves discrimination between close pronunciations. We describe a reformulation of the hidden-articulator Markov model (HAMM) framework that is appropriate for the pronunciation evaluation domain. Model training requires no direct articulatory measurement, but rather involves a constrained and interpolated mapping from phone-level transcriptions to a set of physically and numerically meaningful articulatory representations. Here, we define two new methods of deriving articulatory features for classification: one by concatenating articulatory recognition results over eight streams representative of the vocal tract's constituents; the other by calculating multidimensional articulatory confidence scores within these representations based on general linguistic knowledge of articulatory variants. After adding these articulatory features to traditional phone-level confidence scores, our results demonstrate absolute reductions in combined error rates for verification of segment-level pronunciations produced by nonnative speakers in the ISLE corpus by as much as 16%-17% for some target segments, and a 3%-4% absolute improvement overall.
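A sketch of how such score-level fusion might look in Python; the eight stream names, the softmax-style per-stream confidence, and the equal-weight combination are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Eight illustrative articulatory streams standing in for the vocal
# tract's constituents modeled in the paper.
STREAMS = ["lips", "tongue_tip", "tongue_body", "velum",
           "glottis", "jaw", "rounding", "tongue_root"]

def articulatory_confidence(log_likes, expected_idx):
    """Softmax over candidate articulatory variants within one stream,
    read off at the variant the target pronunciation predicts."""
    post = np.exp(log_likes - log_likes.max())
    return post[expected_idx] / post.sum()

def verify_segment(phone_conf, stream_log_likes, expected, w=0.5, thresh=0.5):
    """Fuse a phone-level confidence score with the mean articulatory
    confidence across streams; accept the segment if the fused score
    clears a threshold (weight and threshold are assumed, not trained)."""
    artic = np.mean([articulatory_confidence(ll, e)
                     for ll, e in zip(stream_log_likes, expected)])
    return w * phone_conf + (1 - w) * artic >= thresh
```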
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Joseph Tepperman; Sungbok Lee; Shrikanth Narayanan; Abeer Alwan
This paper presents a novel student model intended to automate word-list-based reading assessments in a classroom setting, specifically for a student population that includes both native and nonnative speakers of English. As a Bayesian network, the model is meant to conceive of student reading skills as a conscientious teacher would, incorporating cues based on expert knowledge of pronunciation variants and their cognitive or phonological sources, as well as prior knowledge of the student and the test itself. Alongside a hypothesized structure of conditional dependencies, we also propose an automatic method for refining the Bayes net to eliminate unnecessary arcs. Reading assessment baselines that use strict pronunciation scoring alone (without other prior knowledge) achieve a 0.7 correlation between their automatic scores and human assessments on the TBALL dataset. Our proposed structure significantly outperforms this baseline, and a simpler data-driven structure achieves a 0.87 correlation through the use of novel features, surpassing the lower range of inter-annotator agreement. Scores estimated by this new model are also shown to exhibit the same biases along demographic lines as human listeners. Though used here for reading assessment, this model paradigm could be used in other pedagogical applications like foreign language instruction, or for inferring abstract cognitive states like categorical emotions.
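To make the Bayes-net formulation concrete, here is a minimal sketch with one hidden skill node and independent per-word accept/reject observations; the conditional probability values are invented for illustration, and the real model's structure and parameters are hypothesized and refined from data.

```python
import numpy as np

SKILLS = np.array([0, 1, 2])             # low / medium / high reading skill
PRIOR = np.array([0.2, 0.5, 0.3])        # prior knowledge of student and test
P_CORRECT = np.array([0.3, 0.7, 0.95])   # P(word read acceptably | skill)

def skill_posterior(observations):
    """Exact inference by enumeration: multiply the prior by the likelihood
    of each accept (1) / reject (0) decision, then renormalize."""
    post = PRIOR.copy()
    for obs in observations:
        post *= P_CORRECT if obs == 1 else 1.0 - P_CORRECT
    return post / post.sum()

def reading_score(observations):
    """Expected skill under the posterior, rescaled to [0, 1]."""
    return float(skill_posterior(observations) @ SKILLS) / SKILLS.max()

print(reading_score([1, 1, 0, 1, 1]))  # e.g. four of five words accepted
```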
Speech Communication | 2009
Patti Price; Joseph Tepperman; Markus Iseli; Thao Duong; Matthew P. Black; Shizhen Wang; Christy Boscardin; P. David Pearson; Shrikanth Narayanan; Abeer Alwan
To automate assessments of beginning readers, especially those still learning English, we have investigated the types of knowledge sources that teachers use and have tried to incorporate them into an automated system. We describe a set of speech recognition and verification experiments and compare teacher scores with automatic scores in order to decide when a novel pronunciation is best viewed as a reading error or as dialect variation. Since no one classroom teacher can be expected to be familiar with as many dialect systems as might occur in an urban classroom, progress in automated assessment in this area can improve the consistency and fairness of reading assessment. We found that automatic methods performed best when the acoustic models were trained on both native and non-native speech, and argue that this training condition is necessary for automatic reading assessment, since a child's reading ability is not directly observable in one utterance. We also found assessment of emerging reading skills in young children to be an area ripe for more research.
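The reading-error-versus-dialect decision can be cartooned as a comparison of acoustic likelihoods under competing pronunciations; the three-way max rule and margin below are assumptions for illustration, not the paper's method.

```python
def classify_pronunciation(ll_target, ll_dialect, ll_error, margin=0.0):
    """Compare acoustic log-likelihoods of the canonical target, known
    dialect variants, and known error pronunciations (all decoded with
    models trained on native plus non-native speech, per the paper's
    finding). The labels and margin are illustrative."""
    if ll_target >= max(ll_dialect, ll_error) - margin:
        return "correct"
    return "dialect_variant" if ll_dialect > ll_error else "reading_error"
```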
International Conference on Acoustics, Speech, and Signal Processing | 2009
Matthew P. Black; Joseph Tepperman; Abe Kazemzadeh; Sungbok Lee; Shrikanth Narayanan
Children need to master reading letter-names and letter-sounds before reading phrases and sentences. Pronunciation assessment of letter-names and letter-sounds read aloud is an important component of preliterate children's education, and automating this process can have several advantages. The goal of this work was to automatically verify letter-names spoken by kindergarteners and first graders in realistic classroom noise conditions. We applied the same techniques developed in our previous work on automatic letter-sound verification, comparing and optimizing different acoustic models, dictionaries, and decoding grammars. Our final system was unbiased with respect to the child's grade, age, and native language and achieved 93.1% agreement (0.813 kappa) with human evaluators, who agreed among themselves 95.4% of the time (0.891 kappa).
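Since the reported agreement figures pair raw percent agreement with kappa, a quick reference implementation of Cohen's kappa may be useful; this is the standard formula, not code from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters (or rater vs. system)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c]
                   for c in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# e.g. cohens_kappa(human_decisions, system_decisions) -> 0.813 in the paper
```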
IEEE Automatic Speech Recognition and Understanding Workshop | 2005
Joseph Tepperman; Shrikanth Narayanan
A robust language-learning system, intended to help students practice a foreign language along with a machine tutor, must provide for the localization of common pronunciation errors. This paper presents a new technique for unsupervised detection of phone-level mispronunciations, created with language-learning applications in mind. Our method uses multiple hidden-articulator Markov models to asynchronously classify acoustic events in various articulatory domains. It requires no human input besides a pronunciation dictionary for all words in the end system's vocabulary, and has been shown to perform as well as a human tutor would, given the same task. For the majority of systematic mispronunciations investigated in this study, precision in detecting the presence of an error exceeded the 70% inter-annotator agreement reported for our test corpus.
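One way to picture the asynchronous multi-stream idea: each articulatory domain is decoded independently, and a phone is flagged as mispronounced when too few streams match the dictionary's expected configuration. The agreement threshold is an assumed knob, not the paper's decision rule.

```python
def detect_mispronunciation(decoded, expected, min_agree=6):
    """decoded/expected: per-stream articulatory labels for one phone
    (eight streams assumed here). Flag an error when stream agreement
    falls below min_agree."""
    agree = sum(d == e for d, e in zip(decoded, expected))
    return agree < min_agree

# e.g. a /w/ realized with a labiodental constriction would disagree
# with the expected labial-velar configuration in the lips stream.
```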
ACM Transactions on Speech and Language Processing | 2011
Matthew P. Black; Abe Kazemzadeh; Joseph Tepperman; Shrikanth Narayanan
Automatic literacy assessment is an area of research that has shown significant progress in recent years. Technology can be used to automatically administer reading tasks and analyze and interpret children's reading skills. It has the potential to transform the classroom dynamic by providing useful information to teachers in a repeatable, consistent, and affordable way. While most previous research has focused on automatically assessing children reading words and sentences, assessment of children's earlier foundational skills is also needed. We address this problem in this research by automatically verifying preliterate children's pronunciations of English letter-names and the sounds each letter represents (“letter-sounds”). The children analyzed in this study were from diverse bilingual backgrounds and were recorded in actual kindergarten to second grade classrooms. We first manually verified (accept/reject) the letter-name and letter-sound utterances, which serve as the ground truth in this study. Next, we investigated four automatic verification methods that were based on automatic speech recognition techniques. We attained percent agreement with human evaluations of 90% and 85% for the letter-name and letter-sound tasks, respectively. Humans agreed among themselves an average of 95% of the time for both tasks. We discuss the various confounding factors for this assessment task, such as background noise and the presence of disfluencies, that impact automatic verification performance.
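For orientation, here is a generic ASR-style verification recipe: accept an utterance when the log-likelihood ratio between the target letter model and a filler/background model clears a threshold tuned on held-out data. This is a common pattern, not necessarily any of the four methods the paper compares.

```python
def verify(ll_target, ll_filler, threshold):
    """Accept (True) or reject (False) one letter-name utterance."""
    return (ll_target - ll_filler) >= threshold

def tune_threshold(llr_scores, human_labels):
    """Brute-force the threshold that best agrees with human accept/reject
    labels on a development set (candidates are the observed scores)."""
    return max(llr_scores,
               key=lambda t: sum((s >= t) == y
                                 for s, y in zip(llr_scores, human_labels)))
```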
Journal of the Acoustical Society of America | 2009
Joseph Tepperman; Erik Bresch; Yoon-Chul Kim; Louis Goldstein; Dani Byrd; Krishna S. Nayak; Shrikanth Narayanan
We present the first study of nonnative English speech using real-time MRI analysis. The purpose of this study is to investigate the articulatory nature of “phonological transfer”: a speaker’s systematic use of sounds from their native language (L1) when speaking a foreign language (L2). When a nonnative speaker is prompted to produce a phoneme that does not exist in their L1, we hypothesize that their articulation of that phoneme will be colored by that of the “closest” phoneme in their L1’s set, possibly to the point of substitution. With data from three native German speakers and three reference native English speakers, we compare articulation of read phoneme targets well documented as “difficult” for German speakers of English (/w/ and /dh/) with their most common substitutions (/v/ and /d/, respectively). Tracking of vocal tract organs in the MRI images reveals that the acoustic variability in a foreign accent can indeed be ascribed to the subtle articulatory influence of these close substitutions.
Journal of the Acoustical Society of America | 2008
Shrikanth Narayanan; Abe Kazemzadeh; Matthew P. Black; Joseph Tepperman; Sungbok Lee; Abeer Alwan
Evaluations of letter naming and letter sounding are commonly used to measure a young child’s growing reading ability, since performance in them is well correlated with future reading development. Assessing a child’s oral reading skills requires teachers, as well as technologies that attempt to automate such assessment, to form an item-level accept/reject decision based on speech cues and prior knowledge of the child’s literacy level and linguistic background. With data collected from 171 K-2 children, both learners and native speakers of American English, we designed and evaluated an automated letter-naming assessment method using simple word-loop HMM decoding for the word-level letter names. The automated accept/reject evaluation performance, 81.9%, approached the agreement of human raters, 83.2% (0.62 kappa). However, the task where children must produce the sound that the letter represents was more difficult: English orthography allows one-to-many letter-to-sound mappings, teachers showed less agreement in their assessment (80.9%, 0.55 kappa), and the brief durations of some of the letter sounds made it difficult to distinguish them from each other and from background noises. Phone-level HMM-based evaluation accuracy was 58.2%. Preprocessing the recordings into speech, silence, and noise improved these results, especially for plosive sounds. [Supported by NSF.]
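The speech/silence/noise preprocessing step can be sketched with a simple energy and zero-crossing heuristic; the frame sizes, thresholds, and the ZCR-as-noise proxy are assumptions for illustration, not the paper's classifier.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """25 ms frames with a 10 ms hop, assuming 16 kHz audio."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def label_frames(x, sil_thresh=0.01, zcr_thresh=0.3):
    """Quiet frames -> silence; loud frames with a high zero-crossing
    rate -> noise; everything else -> speech."""
    frames = frame_signal(np.asarray(x, dtype=float))
    energy = np.sqrt((frames ** 2).mean(axis=1))
    signs = np.signbit(frames).astype(np.int8)
    zcr = (np.abs(np.diff(signs, axis=1)) > 0).mean(axis=1)
    return np.where(energy < sil_thresh, "silence",
                    np.where(zcr > zcr_thresh, "noise", "speech"))
```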