Publications


Featured research published by Thomas H. Crystal.


Journal of the Acoustical Society of America | 1988

Segmental durations in connected‐speech signals: Current results

Thomas H. Crystal; Arthur S. House

Two types of analyses have been performed on the measured durations of recordings produced by six talkers reading two scripts of approximately 300 words each. The texts, the combined visual–auditory marking technique, and preliminary results were reported earlier by Crystal and House [J. Acoust. Soc. Am. 72, 705–716 (1982)]. The average durations and standard deviations of various classes of speech sounds, as well as individual speech sounds, have been determined and segmental measurements are compared to earlier data and to various pertinent published reports. The histograms of the measured durations of various sounds and categories have been fitted with distributions which are, equivalently, the exit‐probability sequence for a Markov chain or the impulse response of an IIR digital‐filter network.
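
The equivalence invoked at the end of this abstract, between the exit-probability sequence of a Markov chain and the impulse response of an IIR digital filter, can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-state chain with invented parameters; it is not the fitted model from the paper.

    # Sketch (not the authors' code): the exit-probability sequence of a simple
    # Markov chain equals the impulse response of an IIR digital filter.
    # The two-state chain and its parameters (a, b) are hypothetical.
    import numpy as np
    from scipy.signal import lfilter

    a, b = 0.6, 0.3          # self-loop probabilities of the two transient states
    N = 20                   # number of time steps to compare

    # Markov-chain view: stay in state 1 w.p. a, move to state 2 w.p. 1-a;
    # stay in state 2 w.p. b, exit w.p. 1-b.  p_exit[n] = alpha @ T^(n-1) @ t_exit.
    T = np.array([[a, 1 - a],
                  [0.0, b]])          # transitions among transient states
    t_exit = np.array([0.0, 1 - b])   # one-step exit probabilities
    alpha = np.array([1.0, 0.0])      # start in state 1
    p_exit = np.array([alpha @ np.linalg.matrix_power(T, n - 1) @ t_exit
                       for n in range(1, N + 1)])

    # IIR-filter view: the same sequence is the impulse response of
    # H(z) = (1-a)(1-b) z^-2 / ((1 - a z^-1)(1 - b z^-1)).
    num = [0.0, 0.0, (1 - a) * (1 - b)]
    den = [1.0, -(a + b), a * b]
    impulse = np.zeros(N + 1)
    impulse[0] = 1.0
    h = lfilter(num, den, impulse)[1:]   # drop n = 0 so indices align with p_exit

    assert np.allclose(p_exit, h)        # the two descriptions coincide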


Journal of the Acoustical Society of America | 1990

Articulation rate and the duration of syllables and stress groups in connected speech

Thomas H. Crystal; Arthur S. House

Further analyses have been made on readings of two scripts by six talkers [T. H. Crystal and A. S. House, J. Acoust. Soc. Am. 72, 705–716 (1982); 84, 1932–1935 (1988); 83, 1553–1573 (1988)]. Durations of syllables and stress groups are compared to earlier data and various pertinent published reports, and are used to evaluate reports of articulation rate variability. The average durations of syllables of different complexity have a quasilinear dependency on the number of phones in the syllable, where the linear factor and the vowel durations are functions of stress. The duration of stress groups has a quasilinear dependency on the number of syllables and the number of phones. It was found that variability of articulation rate, measured as the average syllable duration for interpause intervals (runs), is not random, but is the natural consequence of the content of the run. Durations of comparable runs of different talkers are highly correlated. Major determinants of average syllable duration are the average number of phones per syllable and the proportion of either +stress phones or +stress syllables in the run.
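
The quasilinear dependency described here is essentially a least-squares fit of average syllable duration against the number of phones, with a stress-dependent intercept and slope. The sketch below fits such a model to invented numbers purely for illustration; none of the values are measurements from Crystal and House.

    # Illustrative sketch of the quasilinear model described above: average syllable
    # duration as a linear function of the number of phones, with separate slope and
    # intercept for stressed and unstressed syllables.  All numbers are invented.
    import numpy as np

    # (phones per syllable, stressed?) -> hypothetical mean duration in ms
    phones   = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5], dtype=float)
    stressed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
    dur_ms   = np.array([95, 150, 205, 260, 310, 140, 225, 305, 390, 470], dtype=float)

    # Design matrix: dur ~ b0 + b1*stressed + b2*phones + b3*(stressed*phones)
    X = np.column_stack([np.ones_like(phones), stressed, phones, stressed * phones])
    coef, *_ = np.linalg.lstsq(X, dur_ms, rcond=None)
    b0, b1, b2, b3 = coef

    print(f"unstressed: dur = {b0:.1f} + {b2:.1f} * phones (ms)")
    print(f"stressed:   dur = {b0 + b1:.1f} + {b2 + b3:.1f} * phones (ms)")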


Journal of the Acoustical Society of America | 1988

Segmental durations in connected‐speech signals: Syllabic stress

Thomas H. Crystal; Arthur S. House

Analyses of the effect of syllabic stress on the durations of speech sounds have been performed on the recordings produced by six talkers reading two scripts of approximately 300 words each. The texts, the combined visual–auditory marking technique, and preliminary results were reported earlier in Crystal and House [J. Acoust. Soc. Am. 72, 705–716 (1982)] and further results were reported in Crystal and House [J. Acoust. Soc. Am. XX, XXX–XXX (1988a)]. The average durations and standard deviations of various classes of speech sounds, as well as individual speech sounds, have been determined in syllables where the stress characteristic is known. Measurements are compared to earlier data and to various pertinent published reports.


Digital Signal Processing | 2000

Speaker Verification by Human Listeners

Astrid Schmidt-Nielsen; Thomas H. Crystal

Schmidt-Nielsen, Astrid, and Crystal, Thomas H., Speaker Verification by Human Listeners: Experiments Comparing Human and Machine Performance Using the NIST 1998 Speaker Evaluation Data, Digital Signal Processing 10 (2000), 249–266. The speaker verification performance of human listeners was compared to that of computer algorithms/systems. Listening protocols were developed to emulate as closely as possible the 1998 algorithm evaluation run by the U.S. National Institute of Standards and Technology (NIST), while taking into account human memory limitations. A subset of the target speakers and test samples from the same telephone conversation data was used. Ways of combining listener data to arrive at a group decision were explored, and the group mean worked well. The human results were very competitive with the best computer algorithms in the same-handset condition. For same-number testing, with 3-s samples, listener panels and the best algorithm had the same equal-error rate (EER) of 8%. Listeners were better than typical algorithms. For different-number testing, EERs increased, but humans had a 40% lower equal-error rate. Human performance in general seemed relatively robust to degradation.
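
The equal-error rate (EER) used throughout this comparison is the operating point at which the false-rejection and false-acceptance rates coincide. A minimal sketch of computing an EER from verification scores follows; the scores are synthetic stand-ins, not the NIST 1998 evaluation data.

    # Minimal sketch of computing an equal-error rate (EER) from verification scores,
    # the metric cited in this paper.  The target/impostor scores are random
    # stand-ins, not the NIST 1998 evaluation data.
    import numpy as np

    rng = np.random.default_rng(0)
    target_scores   = rng.normal(loc=1.0, scale=1.0, size=1000)   # genuine-speaker trials
    impostor_scores = rng.normal(loc=-1.0, scale=1.0, size=1000)  # impostor trials

    def eer(target, impostor):
        """Sweep a decision threshold and return the point where the
        false-rejection rate equals the false-acceptance rate."""
        thresholds = np.sort(np.concatenate([target, impostor]))
        frr = np.array([(target < t).mean() for t in thresholds])     # miss rate
        far = np.array([(impostor >= t).mean() for t in thresholds])  # false-alarm rate
        idx = np.argmin(np.abs(frr - far))
        return (frr[idx] + far[idx]) / 2.0

    print(f"EER = {100 * eer(target_scores, impostor_scores):.1f}%")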


Journal of the Acoustical Society of America | 1988

A note on the durations of fricatives in American English

Thomas H. Crystal; Arthur S. House

The recent observations of Baum and Blumstein on the durations of word‐initial fricatives in citation form [J. Acoust. Soc. Am. 82, 1073–1077 (1987)] are augmented with additional data on word‐initial and word‐final fricatives in connected speech. In general, these data support their findings that voiceless fricatives are longer than voiced fricatives and that the distributions of the two classes overlap considerably. The distinction is more pronounced if the fricatives are first separated on the basis of their position in a word, whether word‐final tokens are prepausal, and stress characteristic. The data suggest also that fricatives in connected speech are shorter than those in citation form.


Journal of the Acoustical Society of America | 1997

Conversational speech recognition

Thomas H. Crystal

A hot topic in speech recognition is developing technology for the automatic transcription of telephone conversations. The recognizer must contain robust language, pronunciation, and acoustic models that embody the world and topic knowledge and the understanding of syntax and pronunciation, which the talkers have and use in decoding each other’s acoustic signals. Partly because of the talkers’ shared knowledge and the casual, unprepared nature of the speech, the signals have dysfluencies, incomplete and ungrammatical expressions, and ‘‘lazy,’’ reduced articulation of words. Conversational speech recognition error rates, measured in the NIST Hub‐5 evaluations, are 45% for English and 66% to 75% for Spanish, Mandarin, and Arabic. To improve this performance, the shared knowledge must be represented in a mathematical framework, which facilitates the efficient search of the sentences of a language to decode the speech. Recent work, including workshops at Rutgers CAIP and Johns Hopkins CLSP, has included the i...
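
The error rates quoted here are word error rates: the minimum number of substitutions, insertions, and deletions needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words. A small sketch of that standard dynamic-programming computation follows; it is not tied to the NIST scoring tools, and the example sentences are invented.

    # Sketch of the word error rate (WER) computation behind the percentages quoted
    # above: Levenshtein distance between reference and hypothesis word sequences,
    # divided by the reference length.  The example sentences are invented.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref = reference.split()
        hyp = hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution / match
                dele = dp[i - 1][j] + 1                              # deletion
                ins = dp[i][j - 1] + 1                               # insertion
                dp[i][j] = min(sub, dele, ins)
        return dp[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("so what did you think of the movie",
                          "so what you thing of movie"))   # -> 0.375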


Journal of the Acoustical Society of America | 1974

Effect of Local Signal Level on Differential Performance in Dichotic Listening

Thomas H. Crystal; Arthur S. House

Current studies have used a variety of dichotic listening tasks to demonstrate differential ear advantages in the processing of particular speech elements. The results have been used to draw conclusions about the roles of memory devices, feature processors, signal encodedness, and other factors in speech perception. In the belief that such interpretations are premature, we are conducting a series of experiments to evaluate the influence of the inherent levels of speech sounds on the results obtained from a variety of dichotic listening tasks. In particular, results are reported here for (1) dichotic temporal‐order judgments involving three classes of speech sounds, and (2) dichotic discrimination and identification of sound features. The listening difficulty across sound categories was equated by using envelope masking that allowed the specification of local signal‐to‐noise conditions in the stimuli. In general, the results indicate that knowledge of local signal‐level characteristics may be used to expla...


Journal of the Acoustical Society of America | 2001

Tactical communications bring new challenges to automatic speech recognition

Thomas H. Crystal; Astrid Schmidt-Nielsen; Elaine Marsh

Automatic speaker‐independent recognition of conversational speech has been making significant progress in recent years. Military communications challenge the robustness of current recognition systems. Compared to telephone conversations or news broadcasts, tactical communication has a reduced vocabulary. It consists of short utterances, limited to the task at hand, with occasional chat words. What makes recognition difficult is the high level of background noise in tactical environments (helicopters, tanks) and the degradation of the signal by military microphones, including noise-canceling microphones. Of perhaps more importance, people alter their speech to overcome these degradations and the loss of intelligibility from communicating over vocoders. The DARPA‐sponsored SPINE program is exploring these issues using speech from pairs of participants performing a collaborative task while communicating between separate sound booths. Each person sits in an accurately reproduced military background noise enviro...


Journal of the Acoustical Society of America | 1986

Variability of timing control: Maturational or statistical?

Thomas H. Crystal; Arthur S. House

Segmental durations have been studied extensively in the context of motor control and skilled action. Rather consistently it has been found that the mean durations of sounds, and associated standard deviations, produced by younger children are larger than those produced by older children and adults. This putative lengthening and increased dispersion is of interest since, if reduction of duration with age is a consequence of neuromuscular maturation, such durational measures may serve to characterize developmental progress in motor control. Kent and Forner [J. Phonet. 8, 157–168 (1980)] have pointed out, however, that “heightened variability…may really be evidence…that slow speakers are more variable…than fast speakers” independently of maturational considerations. We have examined measurements made on readings of two scripts by three fast and three slow talkers drawn from larger groups. In this corpus, as well as an earlier one [J. Acoust. Soc. Am. 72, 705–716 (1982)] the distributions of speech sounds ar...
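
The statistical question raised here, whether the larger standard deviations of slower talkers reflect anything beyond their larger mean durations, can be illustrated by contrasting absolute dispersion with the coefficient of variation. The sketch below uses invented durations for a hypothetical fast and slow talker; it is not based on the paper's corpus.

    # Sketch of the statistical distinction at issue above: a slow talker can show a
    # larger standard deviation than a fast talker while both have the same relative
    # variability (coefficient of variation).  Durations are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    fast = rng.normal(loc=70.0, scale=7.0, size=500)    # hypothetical fast talker, ms
    slow = rng.normal(loc=110.0, scale=11.0, size=500)  # hypothetical slow talker, ms

    for name, durations in [("fast", fast), ("slow", slow)]:
        mean, sd = durations.mean(), durations.std(ddof=1)
        print(f"{name}: mean = {mean:5.1f} ms, sd = {sd:4.1f} ms, cv = {sd / mean:.2f}")
    # The slow talker's sd is larger in absolute terms, but the cv is the same,
    # so the extra spread is a scaling effect rather than evidence of poorer
    # timing control.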


Journal of Speech Language and Hearing Research | 1988

A Note on the Variability of Timing Control.

Thomas H. Crystal; Arthur S. House

Collaboration


Dive into Thomas H. Crystal's collaboration.

Top Co-Authors

Astrid Schmidt-Nielsen

United States Naval Research Laboratory

Elaine Marsh

United States Naval Research Laboratory
