Publications


Featured research published by Philip E. Rubin.


Speech Communication | 1998

Quantitative association of vocal-tract and facial behavior

Hani Camille Yehia; Philip E. Rubin; Eric Vatikiotis-Bateson

This paper examines the degrees of correlation among vocal-tract and facial movement data and the speech acoustics. Multilinear techniques are applied to support the claims that facial motion during speech is largely a by-product of producing the speech acoustics and further that the spectral envelope of the speech acoustics can be better estimated by the 3D motion of the face than by the midsagittal motion of the anterior vocal-tract (lips, tongue and jaw). Experimental data include measurements of the motion of markers placed on the face and in the vocal-tract, as well as the speech acoustics, for two subjects. The numerical results obtained show that, for both subjects, 91% of the total variance observed in the facial motion data could be determined from vocal-tract motion by means of simple linear estimators. For the inverse path, i.e. recovery of vocal-tract motion from facial motion, the results indicate that about 80% of the variance observed in the vocal-tract can be estimated from the face. Regarding the speech acoustics, it is observed that, in spite of the nonlinear relation between vocal-tract geometry and acoustics, linear estimators are sufficient to determine between 72 and 85% (depending on subject and utterance) of the variance observed in the RMS amplitude and LSP parametric representation of the spectral envelope. A dimensionality analysis is also carried out, and shows that between four and eight components are sufficient to represent the mappings examined. Finally, it is shown that even the tongue, which is an articulator not necessarily coupled with the face, can be recovered reasonably well from facial motion since it frequently displays the same kind of temporal pattern as the jaw during speech.
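
As an illustration of the linear-estimation approach described in this abstract (not the authors' code; array shapes, variable names, and data are invented), the following Python sketch fits a least-squares linear map from vocal-tract marker motion to facial marker motion and reports the proportion of facial-motion variance it explains, analogous to the variance figures reported for the real measurements:

```python
# Minimal sketch of estimating facial motion from vocal-tract motion with a
# simple linear estimator. All dimensions and data below are illustrative
# assumptions, not the paper's measurements.
import numpy as np

rng = np.random.default_rng(0)
n_frames = 500
vocal_tract = rng.standard_normal((n_frames, 9))            # e.g. 3 markers x 3D (assumed)
facial = vocal_tract @ rng.standard_normal((9, 18)) \
         + 0.3 * rng.standard_normal((n_frames, 18))         # synthetic facial marker data

# Least-squares linear estimator (with intercept) from vocal-tract to face.
X = np.hstack([vocal_tract, np.ones((n_frames, 1))])
W, *_ = np.linalg.lstsq(X, facial, rcond=None)
facial_hat = X @ W

# Proportion of total facial-motion variance recovered by the linear estimator.
explained = 1.0 - np.sum((facial - facial_hat) ** 2) / np.sum((facial - facial.mean(0)) ** 2)
print(f"variance determined from vocal-tract motion: {explained:.1%}")
```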


Psychological Review | 1994

On the perceptual organization of speech

Robert E. Remez; Philip E. Rubin; Stefanie M. Berns; Jennifer S. Pardo; Jessica M. Lang

A general account of auditory perceptual organization has developed in the past 2 decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings and invokes secondary schematic processes to confirm or to repair the possible organization. Although this conceptualization is intended to apply universally, the variety and arrangement of acoustic constituents of speech violate Gestalt principles at numerous junctures, cohering perceptually, nonetheless. The authors report 3 experiments on organization in phonetic perception, using sine wave synthesis to evade the Gestalt rules and the schematic processes alike. These findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by specific sensitivity to the acoustic modulations characteristic of speech signals.


Journal of Experimental Psychology: Human Perception and Performance | 1997

Talker identification based on phonetic information

Robert E. Remez; Jennifer M. Fellowes; Philip E. Rubin

Accounts of the identification of words and talkers commonly rely on different acoustic properties. To identify a word, a perceiver discards acoustic aspects of an utterance that are talker specific, forming an abstract representation of the linguistic message with which to probe a mental lexicon. To identify a talker, a perceiver discards acoustic aspects of an utterance specific to particular phonemes, creating a representation of voice quality with which to search for familiar talkers in long-term memory. In 3 experiments, sinewave replicas of natural speech sampled from 10 talkers eliminated natural voice quality while preserving idiosyncratic phonetic variation. Listeners identified the sinewave talkers without recourse to acoustic attributes of natural voice quality. This finding supports a revised description of speech perception in which the phonetic properties of utterances serve to identify both words and talkers.


Journal of the Acoustical Society of America | 1978

An articulatory synthesizer for perceptual research

Philip E. Rubin; Thomas Baer

A software articulatory synthesizer, based upon a model developed by P. Mermelstein [J. Acoust. Soc. Am. 53, 1070-1082 (1973)], has been implemented on a laboratory computer. The synthesizer is designed as a tool for studying the linguistically and perceptually significant aspects of articulatory events. A prominent feature of this system is that it easily permits modification of a limited set of key parameters that control the positions of the major articulators: the lips, jaw, tongue body, tongue tip, velum, and hyoid bone. Time-varying control over vocal-tract shape and nasal coupling is possible by a straightforward procedure that is similar to keyframe animation: critical vocal-tract configurations are specified, along with excitation and timing information. Articulation then proceeds along a directed path between these key frames within the time script specified by the user. Such a procedure permits a sufficiently fine degree of control over articulator positions and movements. The organization of this system and its present and future applications are discussed.
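
The keyframe-style control scheme described in this abstract can be sketched roughly as follows (a hypothetical illustration, not the synthesizer's implementation; the parameter names, key-frame values, and timing are assumptions): articulator parameters are specified at a few critical time points, and intermediate configurations are obtained by interpolating along the time script.

```python
# Rough sketch of key-frame control of articulator parameters.
# Parameter set, values, and times are illustrative assumptions.
import numpy as np

# Key frames: a few critical configurations at specified times (ms).
key_times = np.array([0.0, 120.0, 250.0])
key_params = np.array([
    # jaw   lip_aperture  tongue_body  velum
    [ 0.0,  1.0,          0.2,         0.0],   # neutral posture
    [-0.4,  0.0,          0.2,         0.0],   # bilabial closure
    [ 0.1,  0.8,          0.6,         0.1],   # following vowel
])

def articulator_track(t_ms):
    """Interpolate each articulator parameter between key frames at time t_ms."""
    return np.array([np.interp(t_ms, key_times, key_params[:, j])
                     for j in range(key_params.shape[1])])

# Sample the articulatory trajectory between the key frames.
for t in np.arange(0.0, 255.0, 50.0):
    print(t, np.round(articulator_track(t), 3))
```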


Journal of the Acoustical Society of America | 1996

Accurate recovery of articulator positions from acoustics: New conclusions based on human data

John Hogden; Anders Löfqvist; Vince Gracco; Igor Zlokarnik; Philip E. Rubin; Elliot Saltzman

Vocal tract models are often used to study the problem of mapping from the acoustic transfer function to the vocal tract area function (inverse mapping). Unfortunately, results based on vocal tract models are strongly affected by the assumptions underlying the models. In this study, the mapping from acoustics (digitized speech samples) to articulation (measurements of the positions of receiver coils placed on the tongue, jaw, and lips) is examined using human data from a single speaker: simultaneous acoustic and articulator measurements were made for vowel-to-vowel transitions, /g/ closures, and transitions into and out of /g/ closures. Articulator positions were measured using an EMMA system to track coils placed on the lips, jaw, and tongue. Using these data, look-up tables were created that allow articulator positions to be estimated from acoustic signals. On a data set not used for making look-up tables, correlations between estimated and actual coil positions of around 94% and root-mean-squared errors around 2 mm are common for coils on the tongue. An error source evaluation shows that estimating articulator positions from quantized acoustics gives root-mean-squared errors that are typically less than 1 mm greater than the errors that would be obtained from quantizing the articulator positions themselves. This study agrees with and extends previous studies of human data by showing that for the data studied, speech acoustics can be used to accurately recover articulator positions.
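
A rough sketch of the look-up-table idea (an assumption-laden illustration, not the study's actual procedure or data): acoustic feature vectors are vector-quantized, the mean articulator (coil) position is stored for each acoustic code, and positions for new frames are estimated by nearest-code lookup.

```python
# Sketch: quantize acoustics into a codebook, associate each code with a mean
# coil position, and estimate positions for new frames by nearest-code lookup.
# Feature dimensions and synthetic data are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_codes = 2000, 64
acoustics = rng.standard_normal((n_train, 12))                 # e.g. short-time spectral features (assumed)
coils = acoustics[:, :6] * 2.0 + 0.1 * rng.standard_normal((n_train, 6))  # synthetic coil positions

# Build the acoustic codebook with a few rounds of k-means-style refinement.
codebook = acoustics[rng.choice(n_train, n_codes, replace=False)].copy()
for _ in range(10):
    codes = np.argmin(((acoustics[:, None, :] - codebook) ** 2).sum(-1), axis=1)
    for k in range(n_codes):
        if np.any(codes == k):
            codebook[k] = acoustics[codes == k].mean(axis=0)

# Look-up table: mean coil position associated with each acoustic code.
table = np.array([coils[codes == k].mean(axis=0) if np.any(codes == k) else np.zeros(6)
                  for k in range(n_codes)])

# Estimate articulator positions for held-out acoustic frames.
test = rng.standard_normal((5, 12))
estimated = table[np.argmin(((test[:, None, :] - codebook) ** 2).sum(-1), axis=1)]
print(estimated.shape)  # (5, 6) estimated coil positions
```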


Attention Perception & Psychophysics | 1976

Initial phonemes are detected faster in spoken words than in spoken nonwords

Philip E. Rubin; M. T. Turvey; Peter Van Gelder

In two experiments, subjects monitored sequences of spoken consonant-vowel-consonant words and nonwords for a specified initial phoneme. In Experiment I, the target-carrying monosyllables were embedded in sequences in which the monosyllables were all words or all nonwords. The possible contextual bias of Experiment I was minimized in Experiment II through a random mixing of target-carrying words and nonwords with foil words and nonwords. Target-carrying words were distinguished in both experiments from target-carrying nonwords only in the final consonant, e.g., /bit/ vs. /bip/. In both experiments, subjects detected the specified consonant /b/ significantly faster when it began a word than when it began a nonword. One interpretation of this result is that in speech perception lexical information is accessed before phonological information. This interpretation was questioned and preference was given to the view that the result reflected processes subsequent to perception: words become available to awareness faster than nonwords and therefore provide a basis for differential responding that much sooner.


Psychological Science | 2001

On the Bistability of Sine Wave Analogues of Speech

Robert E. Remez; Jennifer S. Pardo; Rebecca L. Piorkowski; Philip E. Rubin

Our studies revealed two stable modes of perceptual organization, one based on attributes of auditory sensory elements and another based on attributes of patterned sensory variation composed by the aggregation of sensory elements. In a dual-task method, listeners attended concurrently to both aspects, component and pattern, of a sine wave analogue of a word. Organization of elements was indexed by several single-mode tests of auditory form perception to verify the perceptual segregation of either an individual formant of a synthetic word or a tonal component of a sinusoidal word analogue. Organization of patterned variation was indexed by a test of lexical identification. The results show the independence of the perception of auditory and phonetic form, which appear to be differently organized concurrent effects of the same acoustic cause.


Attention Perception & Psychophysics | 1997

Perceiving the sex and identity of a talker without natural vocal timbre

Jennifer M. Fellowes; Robert E. Remez; Philip E. Rubin

The personal attributes of a talker perceived via acoustic properties of speech are commonly considered to be an extralinguistic message of an utterance. Accordingly, accounts of the perception of talker attributes have emphasized a causal role of aspects of the fundamental frequency and coarse-grain acoustic spectra distinct from the detailed acoustic correlates of phonemes. In testing this view, in four experiments, we estimated the ability of listeners to ascertain the sex or the identity of 5 male and 5 female talkers from sinusoidal replicas of natural utterances, which lack fundamental frequency and natural vocal spectra. Given such radically reduced signals, listeners appeared to identify a talker’s sex according to the central spectral tendencies of the sinusoidal constituents. Under acoustic conditions that prevented listeners from determining the sex of a talker, individual identification from sinewave signals was often successful. These results reveal that the perception of a talker’s sex and identity are not contingent and that fine-grain aspects of a talker’s phonetic production can elicit individual identification under conditions that block the perception of voice quality.


Experimental Brain Research | 1998

Dynamics of intergestural timing: a perturbation study of lip-larynx coordination.

Elliot Saltzman; Anders Löfqvist; Bruce A. Kay; Jeff Kinsella‐Shaw; Philip E. Rubin

In this study, downward-directed mechanical perturbations were applied to the lower lip during both repetitive (/…pæpæpæ…/) and discrete (/p ’sæpæpl/) utterances in order to examine the perturbation-induced changes of intergestural timing between syllables (i.e., between the bilabial and laryngeal gestures for successive /p/’s) and within phonemes (i.e., between the bilabial and laryngeal gestures within single /p/’s). Our findings led us to several conclusions. First, steady-state (phase-resetting) analyses of the repetitive utterances indicated both that “permanent” phase shifts existed for both the lips and the larynx after the system returned to its pre-perturbation rhythm and that smaller steady-state shifts occurred in the relative phasing of these gestures. These results support the hypothesis that central intergestural dynamics can be reset by peripheral articulatory events. Such resetting was strongest when the perturbation was delivered within a “sensitive phase” of the cycle, during which the downwardly directed lower-lip perturbation opposed the just-initiated, actively controlled bilabial closing gesture for /p/. Although changes in syllable duration were found for other perturbed phases, these changes were simply transient effects and did not indicate a resetting of the central “clock.” Second, analyses of the transient portions of the perturbed cycles of the repetitive utterances indicated that the perturbation-induced steady-state phase shifts are almost totally attributable to changes occurring during the first two perturbed cycles. Finally, the transient changes in speech timing induced by perturbations in the discrete sequences appeared to share a common dynamical basis with the changes to the repetitive sequences. We conclude by speculating on the type of dynamical system that could generate these temporal patterns.


Journal of Experimental Psychology: Human Perception and Performance | 1987

Perceptual normalization of vowels produced by sinusoidal voices.

Robert E. Remez; Philip E. Rubin; Lynne C. Nygaard; William A. Howell

When listeners hear a sinusoidal replica of a sentence, they perceive linguistic properties despite the absence of short-time acoustic components typical of vocal signals. Is this accomplished by a postperceptual strategy that accommodates the anomalous acoustic pattern ad hoc, or is a sinusoidal sentence understood by the ordinary means of speech perception? If listeners treat sinusoidal signals as speech signals however unlike speech they may be, then perception should exhibit the commonplace sensitivity to the dimensions of the originating vocal tract. The present study, employing sinusoidal signals, raised this issue by testing the identification of target /bVt/, or b-vowel-t, syllables occurring in sentences that differed in the range of frequency variation of their component tones. Vowel quality of target syllables was influenced by this acoustic correlate of vocal-tract scale, implying that the perception of these nonvocal signals includes a process of vocal-tract normalization. Converging evidence suggests that the perception of sinusoidal vowels depends on the relation among component tones and not on the phonetic likeness of each tone in isolation. The findings support the general claim that sinusoidal replicas of natural speech signals are perceptible phonetically because they preserve time-varying information present in natural signals.

Collaboration


Top co-authors of Philip E. Rubin include:

John Hogden
Los Alamos National Laboratory

Louis Goldstein
University of Southern California

Eric Vatikiotis-Bateson
University of British Columbia

David B. Pisoni
Indiana University Bloomington