Publications


Featured research published by George Papcun.


Journal of the Acoustical Society of America | 1992

Inferring articulation and recognizing gestures from acoustics with a neural network trained on x‐ray microbeam data

George Papcun; Judith Hochberg; Timothy R. Thomas; François Laroche; Jeff Zacks; Simon Levy

This paper describes a method for inferring articulatory parameters from acoustics with a neural network trained on paired acoustic and articulatory data. An x-ray microbeam recorded the vertical movements of the lower lip, tongue tip, and tongue dorsum of three speakers saying the English stop consonants in repeated Ce syllables. A neural network was then trained to map from simultaneously recorded acoustic data to the articulatory data. To evaluate learning, acoustics from the training set were passed through the neural network. To evaluate generalization, acoustics from speakers or consonants excluded from the training set were passed through the network. The articulatory trajectories thus inferred were a good fit to the actual movements in both the learning and generalization conditions, as judged by root-mean-square error and correlation. Inferred trajectories were also matched to templates of lower lip, tongue tip, and tongue dorsum release gestures extracted from the original data. This technique correctly recognized from 94.4% to 98.9% of all gestures in the learning and cross-speaker generalization conditions, and 75% of gestures underlying consonants excluded from the training set. In addition, greater regularity was observed for movements of articulators that were critical in the formation of each consonant.
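
As a rough illustration of the training-and-evaluation loop described above, here is a minimal sketch using scikit-learn's MLPRegressor on synthetic stand-in data; the layer sizes, feature dimensions, and train/test split are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_frames, n_acoustic, n_artic = 2000, 18, 3      # assumed dims: acoustic features; lip, tip, dorsum

X = rng.standard_normal((n_frames, n_acoustic))  # stand-in acoustic frames
W = rng.standard_normal((n_acoustic, n_artic))
Y = np.tanh(X @ W)                               # stand-in articulator tracks

net = MLPRegressor(hidden_layer_sizes=(40,), max_iter=500, random_state=0)
net.fit(X[:1500], Y[:1500])                      # "learning" condition: train on paired data

Y_hat = net.predict(X[1500:])                    # pass held-out acoustics through the net
rmse = np.sqrt(np.mean((Y_hat - Y[1500:]) ** 2, axis=0))
corr = [np.corrcoef(Y_hat[:, k], Y[1500:, k])[0, 1] for k in range(n_artic)]
print(rmse, corr)                                # evaluate fit by RMS error and correlation
```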


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1991

A default hierarchy for pronouncing English

Judith Hochberg; Susan M. Mniszewski; Teri Calleja; George Papcun

The authors study the principles governing the power and efficiency of the default hierarchy, a system of knowledge acquisition and representation. The default hierarchy trains automatically, yet yields a set of rules which can be easily assessed and analyzed. Rules are organized in a hierarchical structure containing general (default) and specific rules. In training the hierarchy, general rules are learned before specific rules. In using the hierarchy, specific rules are accessed first, with default rules used when no specific rules apply. The main results concern the properties of the default hierarchy architecture, as revealed by its application to English pronunciation. Evaluating the hierarchy as a pronouncer of English, the authors find that its rules capture several key features of English spelling. The default hierarchy pronounces English better than the neural network NETtalk, and almost as well as expert-devised systems.
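
The lookup order the abstract describes (specific rules first, defaults as a fallback) can be sketched with a toy rule table; the rules and key structure below are illustrative assumptions, not the trained hierarchy from the paper.

```python
# Toy default hierarchy for letter-to-phoneme rules: keys run from most
# specific (letter with both contexts) down to a general default.
RULES = {
    ("c", "", "e"): "s",      # specific rule: "cent"
    ("c", "", "i"): "s",      # specific rule: "city"
    ("c", None, None): "k",   # default rule: "cat"
}

def pronounce_letter(letter, left, right):
    # try the most specific key first, then fall back to the default
    for key in [(letter, left, right), (letter, "", right), (letter, None, None)]:
        if key in RULES:
            return RULES[key]
    return "?"  # no rule at any level of the hierarchy

print(pronounce_letter("c", "", "e"))   # -> "s" via a specific rule
print(pronounce_letter("c", "a", "o"))  # -> "k" via the default rule
```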


Journal of the Acoustical Society of America | 1985

Voice discrimination by two listener populations

Jody Kreiman; George Papcun

The ability of two groups of listeners (over and under 45 years of age) to discriminate among the voices of ten young male Southern Californians was examined with a paired‐comparison task. Listeners heard both orders of all possible pairs of voices, plus 30 pairs where the voices were the same. They indicated for each pair whether the voices were the same or different and how sure they were of their response. Discrimination scores across groups ranged from nearly perfect (Az = 0.97) to near chance (Az = 0.69). Overall, the older group of listeners performed significantly less well than did the younger group: they made fewer correct identifications (24.65 vs 26.58), more false identifications (25.45 vs 17.75), and had a lower group discrimination score (Az = 0.85 vs 0.92). (All comparisons were significant at p < 0.05.) Two explanations are suggested. Aging may affect the ability to discriminate among voices as it does the ability to discriminate among phonemes [P. J. Price and H. J. Simon, J. Acoust. Soc. Am. 76, ...
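
The Az scores reported above are areas under the ROC curve. One common empirical estimate can be computed from rated same/different judgments; a minimal sketch using scikit-learn, with fabricated placeholder ratings:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# 1 = pair truly different, 0 = pair truly same (placeholder trials)
truth = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])
# listener's graded "different"-ness (response combined with confidence)
ratings = np.array([5, 4, 2, 1, 2, 5, 3, 4, 1, 1])

# empirical Az: P(rating on a "different" trial > rating on a "same" trial)
print(roc_auc_score(truth, ratings))
```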


Journal of the Acoustical Society of America | 1988

What do mimics do when they imitate a voice?

George Papcun

Imitations by both professional and amateur mimics were studied to determine what similarities are achieved between the imitated voice and the imitation thereof. A wide variety of characteristics was approximated, including the following: mean F0, F1, and F2; frequency contours of F0, F1, and F2; degree of nasalization (but not frequency of nasal formants); speech rate and dynamics, including timing, attack, and release characteristics. Contours of F0, F1, and F2 were often matched accurately, even when their absolute frequencies differed considerably from those of the original. Specific images of words and phrases were used, as well as general phonetic characteristics. Imitators tended to concentrate on imitating unusual characteristics of a voice, rather than attempting to imitate all characteristics equally. This observation may be formalized as a model according to which the importance of a parameter is nonlinearly related to the extent to which it diverges from its mean population value. Professional...
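
A minimal sketch of the proposed model, under the assumption that a parameter's importance grows with a power of its z-score relative to the population mean; the parameter values and exponent are illustrative, not fitted.

```python
import numpy as np

def importance(value, pop_mean, pop_sd, power=2.0):
    """Nonlinear importance: a powered z-score, so unusual traits dominate."""
    z = (value - pop_mean) / pop_sd
    return abs(z) ** power

# e.g., a target voice with unusually high mean F0 attracts most of the
# mimic's effort, while a near-average speech rate is largely ignored
params = {"mean_F0": (165.0, 120.0, 20.0),      # (value, population mean, sd)
          "speech_rate": (5.1, 5.0, 0.6)}
weights = {k: importance(v, m, s) for k, (v, m, s) in params.items()}
print(weights)  # mean_F0 weight far exceeds speech_rate weight
```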


Proceedings of the Johns Hopkins National Search for Computing Applications to Assist Persons with Disabilities | 1992

An animated display of tongue, lip and jaw movements during speech: A proper basis for speech aids to the handicapped and other speech technologies

Judith Hochberg; F. Laroche; S. Levy; George Papcun; Timothy R. Thomas

The authors have developed a method for inferring articulatory parameters from acoustics. For this method, an X-ray microbeam records the movements of the lower lip, tongue tip and tongue dorsum during normal speech. A neural network is then trained to map from concurrently recorded acoustic data to the articulatory data. The device has applications in speech therapy, as a lip-reading aid, and as a basis for other speech technologies including speech and speaker recognition and low data-rate speech transmission.
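
A hypothetical sketch of such an animated display using matplotlib, with synthetic trajectories standing in for the network's inferred lower-lip, tongue-tip, and tongue-dorsum tracks:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

t = np.linspace(0, 2, 200)
tracks = {"lower lip": np.sin(6 * t),            # placeholder vertical positions
          "tongue tip": np.sin(6 * t + 1),
          "tongue dorsum": np.sin(6 * t + 2)}

fig, ax = plt.subplots()
ax.set_xlim(-0.5, 2.5); ax.set_ylim(-1.5, 1.5)
dots = {name: ax.plot(x, 0, "o", label=name)[0] for x, name in zip((0, 1, 2), tracks)}
ax.legend()

def update(i):
    # move each articulator marker to its inferred height at frame i
    for x, (name, y) in zip((0, 1, 2), tracks.items()):
        dots[name].set_data([x], [y[i]])
    return list(dots.values())

anim = FuncAnimation(fig, update, frames=len(t), interval=20, blit=True)
plt.show()
```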


Journal of the Acoustical Society of America | 1995

Three acoustically predictable factors underlying vowel tongue shapes

David A. Nix; George Papcun

To obtain a low‐dimensional, speaker‐independent parameterization of vowel tongue shapes, the three‐mode factor analysis procedure PARAFAC [Harshman et al., 693–707 (1977)] was applied to x‐ray microbeam tongue measurements of ten English vowels spoken by two male and two female subjects in seven different /CVC/ contexts. PARAFAC reliably extracts three speaker‐independent, nonorthogonal factors. The resulting speaker‐independent factor coefficients cluster by vowel in three‐dimensional articulatory space. In two‐dimensional projections, they qualitatively reflect the traditional vowel quality chart. A multi‐layer perceptron (neural network) independently corroborates this solution: these tongue shape coefficients are significantly more predictable from the corresponding acoustics than coefficients from either the nonorthogonal two‐factor solution or the orthogonally constrained three‐factor solution. These three factors also correspond qualitatively to the three nonorthogonal factors extracted from Icela...
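
A sketch of a three-mode PARAFAC decomposition on a synthetic tensor, using the tensorly library as a modern stand-in for the cited procedure; the mode sizes (speakers x vowel tokens x tongue fleshpoints) are assumptions for illustration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
# assumed layout: 4 speakers, 70 vowel tokens, 13 tongue measurement points
X = tl.tensor(rng.standard_normal((4, 70, 13)))

cp = parafac(X, rank=3)                    # extract three trilinear factors
speaker_f, token_f, shape_f = cp.factors   # one factor matrix per mode
print(token_f.shape)                       # (70, 3): per-token factor coefficients
```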


Journal of the Acoustical Society of America | 1990

Learning the association between speech acoustics and articulatory trajectories with a neural network

Timothy R. Thomas; George Papcun; Kennan Shelton

Because the relationship between the motion of the speech articulators and the acoustic signal produced is nonlinear and of great complexity, it has previously proved impossible to accurately infer the articulatory motions that produced natural continuous speech. However, a carefully tuned back‐propagation neural network has the capability of learning such complex relationships. The present results show that such a network can learn the map between a female speaker's continuous speech and the x‐ray microbeam record of the associated articulator movements. The discovered map can then be used to successfully infer the movements of a male speaker saying the same thing (r=0.82 between inferred and actual tongue tip vertical position). The Bark scaled spectrum of counting at a normal rate from 1 to 10 was used as input, and separate nets learned the position of the tongue tip, body, dorsum, and lower lip. Alternative input normalization schemes, network configurations, and training procedures were evaluated. [...
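
The Bark-scaled spectral input can be sketched with Zwicker's approximation, Bark(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2); the frame length and sampling rate below are assumptions, and pooling power into one band per Bark is a simplification, not necessarily the paper's exact preprocessing.

```python
import numpy as np

def hz_to_bark(f):
    # Zwicker's approximation to the Bark scale
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_spectrum(frame, sr):
    spec = np.abs(np.fft.rfft(frame)) ** 2            # power spectrum of one frame
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    bands = np.floor(hz_to_bark(freqs)).astype(int)   # FFT bin -> Bark band index
    return np.bincount(bands, weights=spec)           # summed power per Bark band

frame = np.random.default_rng(0).standard_normal(512) # placeholder speech frame
print(bark_spectrum(frame, sr=16000).shape)
```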


Journal of the Acoustical Society of America | 1990

Speech recognition based on inferred articulatory movements

George Papcun; Timothy R. Thomas

A neural network was trained to learn the associations between speech acoustics, suitably transformed and normalized, and associated articulatory movements, which were recorded at the University of Wisconsin x‐ray microbeam facility. Training was done on the numbers “one” through “ten” spoken by a female speaker; articulatory templates were selected from her speech. Speech from a male speaker was transformed and normalized and passed through the trained neural network. Articulatory movements inferred from his speech by the neural network were compared, by Euclidean distance, with the templates from the speech of the female speaker. To assign a single point of match for each template, an extremum finding algorithm was passed over the function that relates temporal position to degree of match. A threshold, defined as a distance of more than two standard deviations from the mean of the distance measure taken at each step in the speech sample, was used to successfully recognize the words spoken. Enhanced disc...
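
A minimal sketch of the matching step as described: Euclidean distance between a sliding window and a word template, local minima of the resulting distance function, and a two-standard-deviation acceptance threshold. All signals here are synthetic placeholders.

```python
import numpy as np

def match_template(trajectory, template):
    n, m = len(trajectory), len(template)
    # Euclidean distance at every temporal offset
    d = np.array([np.linalg.norm(trajectory[i:i + m] - template)
                  for i in range(n - m + 1)])
    thresh = d.mean() - 2.0 * d.std()     # "two standard deviations" criterion
    # local minima of the distance function that also clear the threshold
    minima = [i for i in range(1, len(d) - 1) if d[i] < d[i - 1] and d[i] < d[i + 1]]
    return [i for i in minima if d[i] < thresh]

rng = np.random.default_rng(0)
template = np.sin(np.linspace(0, np.pi, 30))   # stand-in articulatory template
speech = rng.standard_normal(300) * 0.1        # stand-in inferred trajectory
speech[100:130] += template                    # embed one occurrence of the word
print(match_template(speech, template))        # offsets near 100
```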


Journal of the Acoustical Society of America | 1988

Voice “features” in long‐term memory

Jody Kreiman; George Papcun

This study examines the perceptual parameters used to recognize an unfamiliar voice (heard only once before) after a one‐week delay, as compared to the parameters used to discriminate among the same voices in a short‐term memory (paired comparison) task. Ten groups of ten listeners each heard a short sample of a single voice; after seven days, all listeners heard the full ten‐voice set and had to determine whether their target voice was present. Confusion data and similarity ratings were gathered and combined to form full matrices that were analyzed using multidimensional scaling. Differences in the perceptual parameters that underlie confusions and perceived similarities in the short‐ versus the long‐term memory tasks will be discussed, as will changes in the relative importance of different parameters with time. The implications of these findings for models of voice perception and speaker recognition will also be addressed.
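
A sketch of the scaling step, assuming a symmetrized confusion matrix converted to dissimilarities and embedded with scikit-learn's metric MDS; the matrix below is a toy placeholder, not the study's data.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
conf = rng.random((10, 10))                  # placeholder confusion proportions
np.fill_diagonal(conf, 1.0)
sim = (conf + conf.T) / 2.0                  # symmetrize across presentation order
dissim = 1.0 - sim                           # similarity -> distance
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)           # one point per voice in perceptual space
print(coords.shape)                          # (10, 2)
```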


Journal of the Acoustical Society of America | 1988

Enhanced speaker identification through the use of cepstral coefficients and redundant time frame elimination

Timothy R. Thomas; George Papcun; Stan Willie; Kalyan Ganesan

A text‐independent speaker identification system was developed on two male and two female speakers with similar accents. Each read a phonetically balanced list of ten sentences. The system was trained repeatedly on a rotated set of nine sentences and tested on the remaining one. Speech during successive 16‐ms windowed time slices was described by 14 cepstral coefficients. Unit direction vectors were used to characterize each sentence from each speaker. A nearest‐centroid, nearest‐neighbor, or improved perceptron neural net training procedure was used to define decision regions. When the data were preprocessed so as to remove time slices that were similar in all speakers, discriminability was enhanced and errorless identification was obtained. The success of this system appears to result primarily from the ability of the cepstral coefficients to capture the speaker‐dependent information in the higher formants and from the accentuation of this information by the preprocessor.
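
A rough sketch of that pipeline: real cepstra computed per frame, a naive filter that drops frames similar across speakers, and nearest-centroid classification on unit-normalized vectors. The helper names, the radius, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cepstrum(frame, n_coeff=14):
    # real cepstrum of a windowed time slice, keeping c1..c14
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    return np.fft.irfft(np.log(spec + 1e-10))[1:n_coeff + 1]

def drop_redundant(frames_by_speaker, radius=0.5):
    # keep a frame only if no other speaker has a frame within `radius` of it
    kept = []
    for s, frames in enumerate(frames_by_speaker):
        others = np.vstack([f for t, f in enumerate(frames_by_speaker) if t != s])
        kept += [(s, v) for v in frames
                 if np.min(np.linalg.norm(others - v, axis=1)) > radius]
    return kept

rng = np.random.default_rng(0)
print(cepstrum(rng.standard_normal(256)).shape)   # 14 coefficients per time slice

# toy cepstral frames: each "speaker" offset along a different axis
frames_by_speaker = [rng.standard_normal((50, 14)) + 3 * np.eye(14)[s] for s in range(4)]
kept = drop_redundant(frames_by_speaker)
unit = lambda v: v / np.linalg.norm(v)            # unit direction vectors
centroids = np.array([np.mean([unit(v) for s, v in kept if s == k], axis=0)
                      for k in range(4)])
test = unit(frames_by_speaker[2][0])
print(np.argmin(np.linalg.norm(centroids - test, axis=1)))   # nearest centroid -> 2
```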

Collaboration


Dive into George Papcun's collaboration.

Top Co-Authors:

Jody Kreiman (University of California)
Judith Hochberg (Los Alamos National Laboratory)
Timothy R. Thomas (Los Alamos National Laboratory)
David L. Clark (Los Alamos National Laboratory)
David Nix (Los Alamos National Laboratory)
F. Laroche (Los Alamos National Laboratory)
Igor Zlokarnik (Los Alamos National Laboratory)
Jeff Zacks (Los Alamos National Laboratory)
John D. Zahrt (Los Alamos National Laboratory)