
Publication


Featured research published by James B. Polikoff.


Meeting of the Association for Computational Linguistics | 2008

ModelTalker Voice Recorder---An Interface System for Recording a Corpus of Speech for Synthesis

Debra Yarrington; John Gray; Christopher A. Pennington; H. Timothy Bunnell; Allegra Cornaglia; Jason Lilley; Kyoko Nagao; James B. Polikoff

We will demonstrate the ModelTalker Voice Recorder (MT Voice Recorder) -- an interface system that lets individuals record and bank a speech database for the creation of a synthetic voice. The system guides users through an automatic calibration process that sets pitch, amplitude, and silence. The system then prompts users with both visual (text-based) and auditory prompts. Each recording is screened for pitch, amplitude, and pronunciation, and users are given immediate feedback on the acceptability of each recording. Users can then re-record an unacceptable utterance. Recordings are automatically labeled and saved, and a speech database is created from these recordings. The system's intention is to make the process of recording a corpus of utterances relatively easy for those inexperienced in linguistic analysis. Ultimately, the recorded corpus and the resulting speech database are used for concatenative speech synthesis, allowing individuals at home or in clinics to create a synthetic voice in their own voice. The interface may prove useful for other purposes as well. The system facilitates the recording and labeling of large corpora of speech, making it useful for speech and linguistic research, and it provides immediate feedback on pronunciation, thus making it useful as a clinical learning tool.
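The screening step described above, checking each recording's pitch and amplitude against values set during calibration, can be sketched roughly as follows. This is a minimal illustration, not the actual ModelTalker code; the threshold names and the crude autocorrelation pitch estimator are assumptions.

```python
# Minimal sketch of per-recording screening: check amplitude and pitch
# against thresholds set during calibration. All names and thresholds
# are hypothetical, not the actual ModelTalker implementation.
import numpy as np

def estimate_pitch(signal, sr, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate in Hz (0.0 if unvoiced)."""
    signal = signal - signal.mean()
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(corr):
        return 0.0
    lag = lo + int(np.argmax(corr[lo:hi]))
    # Simple voicing check: autocorrelation peak must be strong enough.
    return sr / lag if corr[lag] > 0.3 * corr[0] else 0.0

def screen_recording(signal, sr, calib):
    """Return (ok, problems) for one utterance, given a calibration dict
    with keys rms_min, rms_max, f0_ref, f0_tol. signal is a float array."""
    problems = []
    rms = float(np.sqrt(np.mean(signal ** 2)))
    if not (calib["rms_min"] <= rms <= calib["rms_max"]):
        problems.append("amplitude out of range")
    f0 = estimate_pitch(signal, sr)
    if f0 and abs(f0 - calib["f0_ref"]) > calib["f0_tol"]:
        problems.append("pitch far from calibrated target")
    return (not problems), problems
```

In a recording loop, a failed screen would trigger the immediate feedback and re-record prompt the abstract describes.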


Journal of the Acoustical Society of America | 2000

Using Markov models to assess articulation errors in young children

H. Timothy Bunnell; Debra Yarrington; James B. Polikoff

Digital recordings of children producing the names ‘‘Rhonda’’ and ‘‘Wanda,’’ and/or ‘‘Toto’’ and ‘‘Coco’’ were made using the microphone input to a Toshiba laptop computer (16‐bit samples, 22 050‐Hz sampling rate) with an AKG C410/B head‐mounted condenser microphone. These names were associated with animated characters in a mock video game running on the laptop under the control of a Speech Language Pathologist. The children, ranging in age from four to six years, were undergoing speech therapy at the Alfred I. duPont Hospital for Children for one or both of two common articulation errors: /w/ substituted for /r/; and/or /t/ substituted for /k/. The initial segment in each recorded utterance was classified by laboratory staff as either r/w or t/k, and assigned a goodness rating. Discrete hidden Markov phoneme models (DHMMs) trained using data recorded from normally articulating children were then used to classify the same utterances and results of the automatic classification were compared to the huma...
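The classification scheme described here, one discrete HMM per contrast member with each token assigned to the higher-scoring model, might look like the following sketch using hmmlearn (whose CategoricalHMM, in recent versions, plays the role of a discrete HMM over vector-quantized frames). The data layout and parameters are illustrative assumptions, not the study's implementation.

```python
# Sketch of discrete-HMM phoneme classification: one model per contrast
# member (e.g., /r/ vs. /w/), trained on vector-quantized acoustic frames,
# with tokens assigned to whichever model scores them higher.
import numpy as np
from hmmlearn import hmm  # CategoricalHMM requires a recent hmmlearn

def train_dhmm(token_list, n_states=5):
    """token_list: list of 1-D int arrays of VQ codebook indices."""
    X = np.concatenate(token_list).reshape(-1, 1)
    lengths = [len(t) for t in token_list]
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=25,
                               random_state=0)
    model.fit(X, lengths)
    return model

def classify(token, models):
    """Return the label of the model giving the highest log-likelihood."""
    obs = np.asarray(token).reshape(-1, 1)
    return max(models, key=lambda label: models[label].score(obs))

# e.g.: models = {"r": train_dhmm(r_tokens), "w": train_dhmm(w_tokens)}
#       label = classify(test_token, models)
```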


Journal of the Acoustical Society of America | 2006

Database size and naturalness in concatenative speech synthesis

H. Timothy Bunnell; James T. Mantell; James B. Polikoff

Unit concatenation TTS systems seek to maximize perceived naturalness by minimizing the amount of signal processing applied to the recorded speech on which they are based. To generate distinct suprasegmentals for a given segmental sequence (e.g., to convey variation in focus or emotion), it is necessary to record and store multiple instances of the same segments that vary in fundamental frequency and voice quality. At the expense of naturalness, concatenative systems can store a minimal segmental inventory and synthesize suprasegmental factors by manipulating f0 and voice quality via signal processing. Classic diphone synthesis (where only a single instance of each diphone sequence is stored) represents the limiting case of this strategy. The present study explores aspects of the trade‐off between perceived naturalness and segmental inventory size using the ModelTalker TTS system. Twenty‐five speakers each recorded about 1650 utterances. From these, databases were constructed that limited the maximum numb...
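The inventory-size manipulation described above, building databases that limit the maximum number of stored instances per unit, can be illustrated with a small sketch; the data layout is a hypothetical assumption.

```python
# Illustrative sketch: build reduced unit databases by capping the number
# of stored instances of each diphone. Data layout is hypothetical.
from collections import defaultdict

def cap_inventory(units, max_per_diphone):
    """units: iterable of (diphone_label, unit_record) pairs.
    Returns diphone -> list of at most max_per_diphone unit records."""
    db = defaultdict(list)
    for diphone, unit in units:
        if len(db[diphone]) < max_per_diphone:
            db[diphone].append(unit)
    return dict(db)

# max_per_diphone = 1 approximates classic diphone synthesis (the limiting
# case mentioned above); larger caps trade storage for prosodic variety.
# A real system would pick instances to cover the f0 and voice-quality
# range rather than taking the first ones encountered, as done here.
```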


Journal of the Acoustical Society of America | 2004

Spectral moments versus Bark cepstrum classification of children’s voiceless stops

James B. Polikoff; Jenna Hammond; Jane McNicholas; H. Timothy Bunnell

Spectral moments have been shown to be effective in deriving acoustic features for classifying voiceless stop release bursts [K. Forrest, G. Weismer, P. Milenkovic, and R. N. Dougall, J. Acoust. Soc. Am. 84, 115–123 (1988)]. In this study, we compared the classification of stops /p/, /t/, and /k/ based on spectral moments with classification based on an equal number of Bark cepstrum coefficients. The speech tokens were 446 instances each of utterance‐initial /p/, /t/, and /k/ sampled from utterances produced by 208 children 6 to 8 years old. Linear discriminant analysis (LDA) was used to classify the three stops based on four analysis frames from the initial 40 ms of each token. The best classification based on spectral moments used all four spectral moment features and all four time intervals and yielded 75.6% correct classification. The best classification based on Bark cepstrum yielded 83.4% correct, also using four coefficients and four time frames. Differences between these results and previous classi...
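For concreteness, here is a sketch of the spectral-moment features and the LDA step, assuming numpy and scikit-learn; frame selection and the Bark-cepstrum analysis are omitted, and all names are illustrative.

```python
# Sketch of one of the two feature sets compared in the study: the first
# four spectral moments of a frame's power spectrum, treated as a
# probability distribution over frequency, fed to an LDA classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def spectral_moments(frame, sr):
    """Mean, variance, skewness, and kurtosis of the power spectrum."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spec / spec.sum()
    m1 = np.sum(freqs * p)                           # spectral mean
    m2 = np.sum((freqs - m1) ** 2 * p)               # variance
    m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5   # skewness
    m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2     # kurtosis
    return np.array([m1, m2, m3, m4])

def classify_stops(X_train, y_train, X_test):
    """X: one row per token, concatenating the moments of the four
    analysis frames from the first 40 ms; y: stop labels (p, t, k)."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(X_train, y_train)
    return lda.predict(X_test)
```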


Journal of the Acoustical Society of America | 2004

Acoustic characterization of developmental speech disorders

H. Timothy Bunnell; James B. Polikoff; Jane McNicholas; Rhonda Walter; Matthew Winn

A novel approach to classifying children with developmental speech delays (DSD) involving /r/ was developed. The approach first derives an acoustic classification of /r/ tokens based on their forced Viterbi alignment to a five‐state hidden Markov model (HMM) of normally articulated /r/. Children with DSD are then classified in terms of the proportion of their /r/ productions that fall into each broad acoustic class. This approach was evaluated using 953 examples of /r/ as produced by 18 DSD children and an approximately equal number of /r/ tokens produced by a much larger number of normally articulating children. The acoustic classification identified three broad categories of /r/ that differed substantially in how they aligned to the normal speech /r/ HMM. Additionally, these categories tended to partition tokens uttered by DSD children from those uttered by normally articulating children. Similarities among the DSD children and average normal child measured in terms of the proportion of their /r/ produc...
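The second stage of the approach, describing each child by the proportion of his or her /r/ tokens falling into each broad acoustic class, might be sketched as follows; the token-classification step itself is left abstract, and the distance measure is an assumption.

```python
# Sketch of the per-child profiling step: given each /r/ token's broad
# acoustic class (assigned elsewhere, e.g., from its Viterbi alignment to
# the normal /r/ HMM), compute proportion vectors and compare them to the
# average normally articulating child's vector. Names are illustrative.
import numpy as np

def child_profile(token_classes, n_classes=3):
    """token_classes: list of class indices for one child's /r/ tokens."""
    counts = np.bincount(token_classes, minlength=n_classes)
    return counts / counts.sum()

def distance_to_normal(profile, normal_profile):
    """Euclidean distance from a child's proportion vector to the
    average normal child's vector (one possible similarity measure)."""
    return float(np.linalg.norm(profile - normal_profile))
```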


Journal of the Acoustical Society of America | 1997

Modeling perceptual confusions to dysarthric speech

H. Timothy Bunnell; Xavier Menéndez-Pidal; James B. Polikoff

An HMM labeler has been extended to detect poor correspondence between phonetic labels and underlying acoustic data. This paper will present work extending the labeler to model perceptual confusions of human listeners from a forced‐choice word identification experiment which used dysarthric speech. The speech and perception data are from the Nemours Dysarthric Speech database [Menendez et al., Proceedings of ICSLP 96, SaP2P1.19 (1996)]. The perceptual data comprise distributions of listener identification responses over sets of four to six words (the intended word plus several phonetically similar foils). In all, 37 words were produced twice by each of 10 dysarthric talkers providing a total dataset of 740 items. Each of these items was identified at least 12 times by five naive listeners for a total of at least 60 responses per item. Half of this data set will be used to adapt parameters of the HMM labeler to reproduce the distribution of human responses to the speech. The remaining half of the data...
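One way to compare the labeler's output against the listener data is to treat each item's responses as a distribution over the closed response set and measure divergence between the human and model distributions. The sketch below uses KL divergence with additive smoothing; both choices are illustrative assumptions, not necessarily the study's method.

```python
# Sketch: compare the distribution of listener word choices for one item
# against a model's distribution over the same closed response set.
import numpy as np

def response_distribution(counts, eps=1e-6):
    """counts: responses per alternative (>= 60 responses per item)."""
    p = np.asarray(counts, dtype=float) + eps   # smooth empty cells
    return p / p.sum()

def kl_divergence(p_human, q_model):
    """KL(human || model): how poorly the model predicts listeners."""
    return float(np.sum(p_human * np.log(p_human / q_model)))

# Hypothetical item with four alternatives and 60 listener responses:
p = response_distribution([41, 10, 7, 2])
q = response_distribution([30, 15, 10, 5])   # hypothetical model output
print(kl_divergence(p, q))
```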


Journal of the Acoustical Society of America | 1994

Acoustic and perceptual properties of intervocalic consonants produced by talkers with cerebral palsy

H. Timothy Bunnell; Steven R. Hoskins; James B. Polikoff

Speech produced by ten talkers with cerebral palsy was studied. Talkers recorded semantically anomalous sentences of the form ‘‘The N1 is Ving the N2’’ where N1, N2, and V are single‐syllable nouns and verbs (in the bisyllabic infinitive form) selected from lists of noun and verb target words. Listeners heard sentences over headphones and identified the target words in a closed response set task in which each target word had an associated set of four to six minimally different response alternatives (e.g., {boat, moat, vote, oat}). For the present analyses, only verb targets that differed in the final consonant of the root word (e.g., waiting, wading, waning, waving, weighing) have been examined. Perceptual data were analyzed for percentage of incorrect responses and proportions of various types of errors (e.g., voicing/place/manner) per token. Acoustic analyses included measures of vowel and consonant durations, extent of VC and CV formant transitions, categorical measures of a variety of acoustic feature...
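The error-type analysis (voicing/place/manner per token) can be sketched with a small feature table; the table fragment below is hypothetical and covers only a few consonants.

```python
# Sketch of the error-type tally: given phonological features for each
# consonant, classify a misidentification as a voicing, place, and/or
# manner error. The feature table is a small hypothetical fragment,
# not the study's full inventory.
FEATURES = {
    # consonant: (voicing, place, manner)
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced",    "alveolar", "stop"),
    "n": ("voiced",    "alveolar", "nasal"),
    "v": ("voiced",    "labiodental", "fricative"),
}

def error_types(intended, response):
    """Return which feature dimensions differ between two consonants."""
    dims = ("voicing", "place", "manner")
    return [d for d, a, b in zip(dims, FEATURES[intended],
                                 FEATURES[response]) if a != b]

# e.g., "wading" heard for intended "waiting": /d/ for /t/
print(error_types("t", "d"))   # -> ['voicing']
```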


Journal of the Acoustical Society of America | 1991

Effects of emphatic stress and speaking mode on articulatory organization

H. Timothy Bunnell; James B. Polikoff

An adult male talker recorded multiple repetitions of short nonsense sentences. Each sentence was of the form “A huC1 ate a C2uffle,” where C1 and C2 were from the set {/b/,/d/,/g/}. Sentences containing all nine combinations of C1‐C2 pairings were recorded in several speaking modes: CLEAR, CONVERSATIONAL, STRESS1, and STRESS2. The latter two conditions entailed production of the sentences in conversational mode, but with emphatic stress on the syllable containing either the first (STRESS1) or the second (STRESS2) variable consonant. These materials were recorded at the University of Wisconsin Microbeam facility and include both acoustic data and tracings of the trajectories of flesh points on the tongue, lips, and jaw. Analyses of the acoustic and articulatory data suggest that local effects of stress on consonant articulation are similar to those of clear speech. The largest acoustic and articulatory differences were observed for consonants in syllable final position. The various significant articulatory effects of stress and speaking mode will be compared to perceptual measures of consonant intelligibility.
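One articulatory measure that could support such a cross-condition comparison is the peak displacement of a tracked flesh point per speaking mode; the sketch below assumes a hypothetical layout for the pellet-trajectory data.

```python
# Illustrative sketch of comparing an articulatory measure across speaking
# modes: peak displacement of a tracked flesh point during a gesture,
# averaged per mode. The data layout is hypothetical.
import numpy as np

def peak_displacement(xy):
    """xy: (n_frames, 2) pellet trajectory; max distance from start."""
    return float(np.max(np.linalg.norm(xy - xy[0], axis=1)))

def compare_modes(trajectories_by_mode):
    """trajectories_by_mode: dict mode -> list of (n_frames, 2) arrays."""
    return {mode: np.mean([peak_displacement(t) for t in trajs])
            for mode, trajs in trajectories_by_mode.items()}
```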


Journal of the Acoustical Society of America | 1990

Observations on disordered articulations

H. Timothy Bunnell; James B. Polikoff

An adequate characterization of dysarthric speech requires both a description of the nature of the motor impairment and a description of the articulatory compensation strategies talkers employ in attempting to minimize the consequences of their motor impairment. As a preliminary step in developing such a characterization of dysarthric speech, data from a single dysarthric talker and similar data from two normal talkers have been examined. Acoustic and articulatory recordings were obtained from the University of Wisconsin x‐ray microbeam facility. Pellets for x‐ray tracking were attached to the lips, mandible, tongue tip, tongue blade, and tongue dorsum. Due to technical difficulties, only pellets on the lips and tongue dorsum of the dysarthric talker were consistently tracked. In several speech production tasks, talkers repeated the syllables /ka/ and /pa/ with both constant and alternating stress; sustained the vowel /i/ and abruptly (in response to a click) switched to the vowel /a/; and repeated extend...


Conference of the International Speech Communication Association | 2000

STAR: articulation training for young children.

H. Timothy Bunnell; Debra Yarrington; James B. Polikoff

Collaboration


Top co-authors of James B. Polikoff:

H. Timothy Bunnell (Alfred I. duPont Hospital for Children)
Debra Yarrington (Alfred I. duPont Hospital for Children)
Linda D. Vallino (Alfred I. duPont Hospital for Children)
Allegra Cornaglia (Alfred I. duPont Hospital for Children)
Jennette Driscoll (Alfred I. duPont Hospital for Children)