
Publication


Featured research published by Quentin Summerfield.


Phonetica | 1979

Use of Visual Information for Phonetic Perception

Quentin Summerfield

The accuracy with which naive listeners can report sentences presented 12 dB below a background of continuous prose was compared with accuracy in four audio-visually supplemented conditions. With monochrome displays of the talker showing (i) the face, (ii) the lips and (iii) four points at the centres of the lips and the corners of the mouth, accuracy improved by 43, 31 and 8%, respectively. No improvement was produced by optical information on syllabic timing. The results suggest that optical concomitants of articulation specify linguistic information to normal listeners. This conclusion was reinforced in a second experiment in which identification functions were obtained for continua of synthetic syllables ranging between [aba], [ada] and [aga], presented both in isolation and in combination with video recordings. Audio-visually, [b] was only perceived when lip closure was specified optically and, if lip closure was specified optically, [b] was generally perceived. Perceivers appear to make use of articulatory constraints upon the combined audio-visual specification of phonetic events, suggesting that optical and acoustical displays are co-perceived in a common metric closely related to that of articulatory dynamics.


British Journal of Audiology | 1990

A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: rationale, evaluation, and recommendations for use

Alison Macleod; Quentin Summerfield

The strategy for measuring speech-reception thresholds for sentences in noise advocated by Plomp and Mimpen (Audiology, 18, 43-52, 1979) was modified to create a reliable test for measuring the difficulty which listeners have in speech reception, both auditorily and audio-visually. The test materials consist of 10 lists of 15 short sentences of homogeneous intelligibility when presented acoustically, and of different, but still homogeneous, intelligibility when presented audio-visually, in white noise. Homogeneity was achieved by applying phonetic and linguistic principles at the stage of compilation, followed by pilot testing and balancing of properties. To run the test, lists are presented at signal-to-noise ratios (SNRs) determined by an up-down psychophysical rule so as to estimate auditory and audio-visual speech-reception thresholds, defined as the SNRs at which the three content words in each sentence are identified correctly on 50% of trials. These thresholds provide measures of a subject's speech-reception abilities. The difference between them provides a measure of the benefit received from vision. It is shown that this measure is closely related to the accuracy with which subjects lip-read words in sentences with no acoustical information. In data from normally hearing adults, the standard deviations (s.d.s) of estimates of the auditory speech-reception threshold in noise (SRTN), the audio-visual SRTN, and visual benefit are 1.2, 2.0, and 2.3 dB, respectively. Graphs are provided with which to estimate the trade-off between reliability and the number of lists presented, and to assess the significance of deviant scores from individual subjects.
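The adaptive logic at the core of such a test can be illustrated with a minimal sketch: a 1-up/1-down rule that lowers the SNR after a correct sentence and raises it after an incorrect one, so presentation levels converge near the 50%-correct point. This is not the published Macleod and Summerfield procedure; the starting SNR, step size, trial count, and the simulated listener below are illustrative assumptions.

    import random

    def simulated_listener(snr_db, true_srt_db=-8.0, slope=0.5):
        """Hypothetical stand-in for a real trial: returns True if the listener
        reports all three content words correctly, with probability rising
        around an assumed true SRT (logistic psychometric function)."""
        p_correct = 1.0 / (1.0 + pow(10.0, -slope * (snr_db - true_srt_db)))
        return random.random() < p_correct

    def estimate_srt(n_trials=15, start_snr_db=0.0, step_db=2.0):
        """Simple 1-up/1-down staircase: the SNR drops after a correct sentence
        and rises after an incorrect one, converging near 50% correct."""
        snr = start_snr_db
        track = []
        for _ in range(n_trials):
            correct = simulated_listener(snr)
            track.append(snr)
            snr += -step_db if correct else +step_db
        tail = track[4:]                      # discard the approach to threshold
        return sum(tail) / len(tail)          # crude estimate of the SRTN

    print(round(estimate_srt(), 1))           # roughly near the assumed -8 dB

In practice one list of 15 sentences yields one threshold estimate; averaging over lists trades testing time against the reliability figures quoted above.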


Journal of the Acoustical Society of America | 1995

Perceptual separation of concurrent speech sounds: Absence of across‐frequency grouping by common interaural delay

John Francis Culling; Quentin Summerfield

Three experiments and a computational model explored the role of within-channel and across-channel processes in the perceptual separation of competing, complex, broadband sounds which differed in their interaural phase spectra. In each experiment, two competing vowels, whose first and second formants were represented by two discrete bands of noise, were presented concurrently, for identification. Experiments 1 and 2 showed that listeners were able to identify the vowels accurately when each was presented to a different ear, but were unable to identify the vowels when they were presented with different interaural time delays (ITDs); i.e. listeners could not group the noisebands in different frequency regions with the same ITD and thereby separate them from bands in other frequency regions with a different ITD. Experiment 3 demonstrated that while listeners were unable to exploit a difference in interaural delay between the pairs of noisebands, listeners could identify a vowel defined by interaurally decorrelated noisebands when the other two noisebands were interaurally correlated. A computational model based upon that of Durlach [J. Acoust. Soc. Am. 32, 1075-1076 (1960)] showed that the results of these and other experiments can be interpreted in terms of a within-channel mechanism, which is sensitive to interaural decorrelation. Thus the across-frequency integration which occurs in the lateralization of complex sounds may play little role in segregating concurrent sounds.
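The within-channel quantity that the modelling points to, interaural decorrelation, can be sketched in a few lines. This is not the Culling and Summerfield model itself, only an illustration of the statistic computed within a single frequency channel; a full model would first filter the left- and right-ear signals into channels (e.g. with a gammatone filterbank).

    import numpy as np

    def interaural_correlation(left, right):
        """Normalized correlation at zero lag between the left- and right-ear
        waveforms within one frequency channel."""
        left = left - left.mean()
        right = right - right.mean()
        denom = np.sqrt((left ** 2).sum() * (right ** 2).sum())
        return float((left * right).sum() / denom) if denom > 0 else 0.0

    def interaural_decorrelation(left, right):
        """0 for identical (diotic) inputs, growing as the two ears diverge."""
        return 1.0 - interaural_correlation(left, right)

    rng = np.random.default_rng(1)
    band = rng.standard_normal(4800)                  # one noise band, both ears
    mixed = band + 0.5 * rng.standard_normal(4800)    # partially decorrelated ear
    print(interaural_decorrelation(band, band))       # ~0.0: fully correlated
    print(interaural_decorrelation(band, mixed))      # > 0: the kind of cue a
                                                      # within-channel mechanism
                                                      # could detect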


Journal of Experimental Psychology: Human Perception and Performance | 1981

Articulatory rate and perceptual constancy in phonetic perception.

Quentin Summerfield

The perception of syllable-initial stop consonants as voiced (/b/, /d/, /g/) or voiceless (/p/, /t/, /k/) was shown to depend on the prevailing rate of articulation. Reducing the articulatory rate of a precursor phrase causes a greater proportion of test consonants to be identified as voiced. Subsequent experiments demonstrated that this effect depends almost entirely on variation in the duration of the syllable immediately preceding the test syllable; this, the duration of the intervening silent stop closure, and the duration of the test syllable itself all influenced the identification of the stop as voiced or voiceless. Variation in the tempo of a nonspeech melody produced no effect on the perception of embedded test syllables. Those manipulations which produce the major part of the influence of rate do so not by changing the context in which the stop is perceived, but rather by changing temporal concomitants of the constriction, occlusion, and release phases of the articulation of the stop itself. For this reason, an explanation for such effects based on extrinsic timing in perception is found to be wanting. Timing should, in the main, be regarded as intrinsic to the acoustical specifications of phonetic events, a view that is compatible with recent reformulations of the problem of timing control in speech production.


Journal of the Acoustical Society of America | 1982

Psychoacoustic and phonetic temporal processing in normal and hearing‐impaired listeners

Richard S. Tyler; Quentin Summerfield; Elizabeth J. Wood; Mariano A. Fernandes

Four measures of auditory temporal processing were obtained from 16 normals and 16 individuals with a hearing loss of heterogeneous origin. These measures were: (1) temporal integration—the difference in detection thresholds between signals of 10‐ and 1000‐ms duration (which was determined to provide an estimate of the ability to integrate energy over time), (2) gap detection—the shortest duration of silence between two noise bursts that can be discriminated from an uninterrupted noise, (3) temporal difference limen—the increment in duration necessary to detect a difference in the duration of a noise burst, (4) gap difference limen—the increment in duration necessary to detect a difference in the duration of a silent interval between two noise bursts. Each measure was obtained for stimuli centered both at 500 and at 4000 Hz using a three‐alternative forced‐choice procedure. In addition, measures of identification and discrimination were obtained for two sets of synthetic speech syllables varying chiefly in a temporal parameter, voice‐onset‐time, from /ba/ to /pa/ and from /bi/ to /pi/. Finally, speech identification in noise was measured with the FAAF test. Most of the hearing‐impaired listeners displayed poorer temporal analysis than the normals on all of the psychoacoustical tasks, regardless of whether the two groups were compared at similar sound pressure levels or at similar sensation levels. Although the hearing‐impaired listeners displayed a reduction in the ability to discriminate subphonemic cues for the voiced–voiceless distinction, their identification of that distinction in stop consonants appeared to be normal. The hearing‐impaired group made about twice as many errors as did the normals on each of the consonant features of place, manner, and voicing when identifying speech in noise. Increased temporal difference limen and longer gap‐detection thresholds were found to correlate significantly with reduced speech intelligibility in noise, even when the effects of the pure‐tone threshold loss were partialed out.
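As an illustration of how a gap-detection stimulus is constructed, the sketch below inserts a silent interval at the midpoint of a noise burst; setting gap_ms to zero yields the uninterrupted standard used in the other two intervals of a three-alternative trial. Broadband noise and the chosen durations are simplifying assumptions; the study's stimuli were centred at 500 or 4000 Hz.

    import numpy as np

    def gap_stimulus(total_ms=400.0, gap_ms=5.0, fs=44100, rng=None):
        """Noise burst of fixed overall duration with a silent gap of gap_ms
        inserted at its midpoint; gap_ms=0 gives the uninterrupted standard."""
        rng = rng or np.random.default_rng()
        n_total = int(round(total_ms * fs / 1000.0))
        n_gap = int(round(gap_ms * fs / 1000.0))
        noise = rng.standard_normal(n_total)
        start = (n_total - n_gap) // 2
        noise[start:start + n_gap] = 0.0      # the silent gap
        return noise

    # One three-alternative forced-choice trial: two uninterrupted standards
    # and one target; the listener must pick the interval containing the gap.
    standard_a = gap_stimulus(gap_ms=0.0)
    standard_b = gap_stimulus(gap_ms=0.0)
    target = gap_stimulus(gap_ms=5.0)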


Journal of the Acoustical Society of America | 1985

Intermodal timing relations and audio‐visual speech recognition by normal‐hearing adults

Matthew McGrath; Quentin Summerfield

Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range from 0 to 80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of lip-like Lissajous figures. Group-mean difference limens (70.7% correct DLs) were −79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.
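The practical implication for processing delay can be expressed as a trivial check against the group-mean difference limens reported above; this is a rough illustration only, since individual limens vary and the better lipreaders were more sensitive to delay.

    def av_offset_likely_detectable(audio_delay_ms, lead_dl_ms=-79.0, lag_dl_ms=138.0):
        """True if an audio-video offset lies outside the group-mean difference
        limens from the abstract (negative values mean the sound leads the video)."""
        return audio_delay_ms < lead_dl_ms or audio_delay_ms > lag_dl_ms

    print(av_offset_likely_detectable(40.0))    # False: a 40-ms aid delay sits
                                                # well inside both group-mean limits
    print(av_offset_likely_detectable(160.0))   # True: well beyond the lag limen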


Journal of the Acoustical Society of America | 1977

On the dissociation of spectral and temporal cues to the voicing distinction in initial stop consonants

Quentin Summerfield; Mark Haggard

It has been claimed that a rising first‐formant (F1) transition is an important cue to the voiced–voiceless distinction for syllable‐initial, prestressed stop consonants in English. Lisker [J. Acoust. Soc. Am. 57, 1547–1551 (L) (1975)] has pointed out that the acoustic manipulations suggesting a role for F1 have involved covariation of the onset frequency of F1 with the duration, and hence the frequency extent, of the F1 transition; he has argued that effects hitherto ascribed to the transition are more properly attributed to its onset. Two experiments are reported in which F1 onset frequency and F1 transition duration/extent were manipulated independently. The results confirm Lisker’s suggestion that the major effect of F1 in initial voicing contrasts is determined by its perceived frequency at the onset of voicing and show that a periodically excited F1 transition is not, per se, a positive cue to voicing. In further experiments, the relative levels and the frequencies at the onset of voicing of both F1...


Archive | 2004

The Perception of Speech Under Adverse Conditions

Peter F. Assmann; Quentin Summerfield

Speech is the primary vehicle of human social interaction. In everyday life, speech communication occurs under an enormous range of different environmental conditions. The demands placed on the process of speech communication are great, but nonetheless it is generally successful. Powerful selection pressures have operated to maximize its effectiveness. The adaptability of speech is illustrated most clearly in its resistance to distortion. In transit from speaker to listener, speech signals are often altered by background noise and other interfering signals, such as reverberation, as well as by imperfections of the frequency or temporal response of the communication channel. Adaptations for robust speech transmission include adjustments in articulation to offset the deleterious effects of noise and interference (Lombard 1911; Lane and Tranel 1971); efficient acoustic-phonetic coupling, which allows evidence of linguistic units to be conveyed in parallel (Hockett 1955; Liberman et al. 1967; Greenberg 1996; see Diehl and Lindblom, Chapter 3); and specializations of auditory perception and selective attention (Darwin and Carlyon 1995). Speech is a highly efficient and robust medium for conveying information under adverse conditions because it combines strategic forms of redundancy to minimize the loss of information. Coker and Umeda (1974, p. 349) define redundancy as “any characteristic of the language that forces spoken messages to have, on average, more basic elements per message, or more cues per basic element, than the barest minimum [necessary for conveying the linguistic message].” This definition does not address the function of redundancy in speech communication, however. Coker and Umeda note that “redundancy can be used effectively; or it can be squandered on uneven repetition of certain data, leaving other crucial items very vulnerable to noise. . . . But more likely, if a redundancy is a property of a language and has to be learned, then it has a purpose.” Coker and Umeda conclude that the purpose of redundancy in speech communication is to provide a basis for error correction and resistance to noise.


Quarterly Journal of Experimental Psychology | 1984

Detection and resolution of audio-visual incompatibility in the perception of vowels

Quentin Summerfield; Matthew McGrath

If audio and video recordings of a talker speaking consonant-vowel syllables containing different consonants are approximately synchronised, observers may fail to detect conflict between the modalities and perceive consonants presented in neither individual modality. The present experiments demonstrate an analogous effect in the perception of vowels. Vision can bias the identity of an acoustical vowel to be more like the vowel presented visually, even when observers detect conflict and are instructed to report only what they hear. The size of the effect is positively related to the size of the physical difference between the visible configuration of the lips and the configuration that would naturally accompany the acoustical vowel. In demonstrating these and other phenomena in audio-visual speech perception, observers behave as if they compute a continuous estimate of the filter function of the vocal tract from both visual and acoustical evidence. If the visual evidence is potent, observers may appear to interpret the acoustical evidence in novel ways. However, these compromises can be predicted from known patterns of acoustical similarity and visual distinctiveness and do not require ad hoc explanations involving categorical levels of perceptual process.


Attention, Perception, & Psychophysics | 1981

Audiovisual presentation demonstrates that selective adaptation in speech perception is purely auditory.

Martin Roberts; Quentin Summerfield

Both auditory and phonetic processes have been implicated by previous results from selective adaptation experiments using speech stimuli. It has proved difficult to dissociate their individual contributions because the auditory and phonetic structure of conventional acoustical stimuli are mutually predictive. In the present experiment, the necessary dissociation was achieved by using an audiovisual adaptor consisting of an acoustical [bɛ] synchronized to a video recording of a talker uttering the syllable [gɛ]. This stimulus was generally identified as one of the dentals [dɛ] or [ðɛ]. It produced an adaptation effect, measured with an acoustical [bɛ-dɛ] test continuum, identical in size and direction to that produced by an acoustical [bɛ]—an adaptor sharing its acoustical structure—and opposite in direction to that produced by an acoustical [dɛ]—an adaptor sharing its perceived phonetic identity. Thus, the result strongly suggests that auditory rather than phonetic levels of processing are influenced in selective adaptation.

Collaboration


Dive into Quentin Summerfield's collaboration.

Top Co-Authors

Peter F. Assmann, University of Texas at Dallas
John Foster, University of Nottingham
Alan R. Palmer, University of Nottingham
Alison Macleod, University of Nottingham
Andrew Sidwell, University of Nottingham
Garry Barton, University of East Anglia
Mark Haggard, University of Cambridge