
Publication


Featured research published by Laurence S. Gillick.


Journal of the Acoustical Society of America | 1992

Method for representing word models for use in speech recognition

Laurence S. Gillick; Dean Sturtevant; Robert Roth; James K. Baker; Janet M. Baker

A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by a cluster spelling, which represents the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
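The clustering step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes diagonal-Gaussian sub-models, uses symmetric Kullback-Leibler divergence as the similarity metric, and invents the data, threshold, and function names.

```python
import numpy as np

def sym_kl(m1, v1, m2, v2):
    """Symmetric Kullback-Leibler divergence between two diagonal Gaussians."""
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1 + np.log(v1 / v2))
    return kl12 + kl21

def cluster_submodels(models, threshold):
    """Greedy clustering: each sub-model joins the first cluster whose
    centroid is within `threshold`, else starts a new cluster.
    Returns one cluster index ("cluster spelling" element) per sub-model."""
    centroids = []  # (mean, variance) per cluster
    spelling = []
    for m, v in models:
        for i, (cm, cv) in enumerate(centroids):
            if sym_kl(m, v, cm, cv) < threshold:
                spelling.append(i)
                break
        else:
            centroids.append((m, v))
            spelling.append(len(centroids) - 1)
    return spelling

# Two acoustically similar sub-models and one distinct one (invented data)
models = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([0.1, 0.0]), np.array([1.0, 1.0])),
    (np.array([5.0, 5.0]), np.array([1.0, 1.0])),
]
print(cluster_submodels(models, threshold=1.0))  # → [0, 0, 1]
```

The first two sub-models fall into one cluster, so any word containing them would share that element of its cluster spelling.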


Journal of the Acoustical Society of America | 1991

Method for creating and using multiple-word sound models in speech recognition

Laurence S. Gillick; Paul G. Bamberg; James K. Baker; Robert Roth

A first speech recognition method receives an acoustic description of an utterance to be recognized and scores a portion of that description against each of a plurality of cluster models representing similar sounds from different words. The resulting score for each cluster is used to calculate a word score for each word represented by that cluster. Preferably these word scores are used to prefilter vocabulary words, and the description of the utterance includes a succession of acoustic descriptions which are compared by linear time alignment against a succession of acoustic models. A second speech recognition method is also provided which matches an acoustic model with each of a succession of acoustic descriptions of an utterance to be recognized. Each of these models has a probability score for each vocabulary word. The probability scores for each word associated with the matching acoustic models are combined to form a total score for that word. The preferred speech recognition method calculates two separate word scores for each currently active vocabulary word from a common succession of sounds. Preferably the first score is calculated by a time alignment method, while the second score is calculated by a time-independent method. Preferably this calculation of two separate word scores is used in one of multiple word-selecting phases of a recognition process, such as in the prefiltering phase.
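The prefiltering idea of turning cluster scores into word scores can be sketched as below. All cluster scores, cluster sequences, and names are invented for illustration; the patent does not specify these values or this exact combination rule.

```python
# Hypothetical cluster log-likelihoods for a portion of the utterance,
# and a mapping from each vocabulary word to its cluster sequence.
cluster_scores = {"c1": -1.0, "c2": -4.0, "c3": -0.5}
word_clusters = {"cap": ["c1", "c3"], "cat": ["c1", "c2"], "dog": ["c2", "c2"]}

def prefilter(cluster_scores, word_clusters, keep):
    """Score each word as the sum of its clusters' log scores, then keep
    only the `keep` best-scoring words for the expensive recognition pass."""
    word_scores = {w: sum(cluster_scores[c] for c in cs)
                   for w, cs in word_clusters.items()}
    return sorted(word_scores, key=word_scores.get, reverse=True)[:keep]

print(prefilter(cluster_scores, word_clusters, keep=2))  # → ['cap', 'cat']
```

Because one cluster score serves every word that contains that cluster, the whole vocabulary is narrowed at the cost of scoring only a handful of cluster models.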


Journal of the Acoustical Society of America | 1998

Apparatuses and methods for developing and using models for speech recognition

Laurence S. Gillick; Francesco Scattone

A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group. A decision tree classifies speech sounds into such groups, and related speech sound groups descend from common tree nodes. New speech samples time aligned against a given speech sound group's model update models of related speech sound groups, decreasing the training data required to adapt the system. The phonetic context classifications can be based on knowledge of which contextual features are associated with acoustic similarity. The computerized system samples speech sounds using a first, larger, parameter set; automatically selects combinations of phonetic context classifications which divide the speech sounds into groups whose frames are acoustically similar, such as by use of a decision tree; selects a second, smaller, set of parameters based on that set's ability to separate the frames aligned with each speech sound group, such as by use of linear discriminant analysis; and then uses these new parameters to represent frames and speech sound models. Then, using the new parameters, a decision tree classifier can be used to re-classify the speech sounds and to calculate new acoustic models for the resulting groups of speech sounds.
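Selecting a phonetic context classification that best divides speech sounds into acoustically similar groups can be sketched as a single decision-tree split. The contexts, frames, and candidate "questions" below are invented for illustration; a real system would iterate this split to grow a full tree.

```python
import numpy as np

# Hypothetical frames aligned against one phone in different left contexts.
frames_by_context = {
    "a-t": np.array([1.0, 1.1, 0.9]),   # vowel left-contexts
    "o-t": np.array([1.2, 1.0, 1.1]),
    "s-t": np.array([4.0, 4.2, 3.9]),   # fricative left-contexts
    "f-t": np.array([4.1, 3.8, 4.0]),
}
# Candidate phonetic-context classifications ("questions").
questions = {
    "left-is-vowel":  {"a-t", "o-t"},
    "left-is-labial": {"f-t", "o-t"},
}

def split_gain(frames_by_context, members):
    """Variance reduction from splitting contexts into `members` vs. the rest."""
    all_f = np.concatenate(list(frames_by_context.values()))
    yes = np.concatenate([f for c, f in frames_by_context.items() if c in members])
    no = np.concatenate([f for c, f in frames_by_context.items() if c not in members])
    def ssq(x):
        return np.sum((x - x.mean()) ** 2)
    return ssq(all_f) - ssq(yes) - ssq(no)

best = max(questions, key=lambda q: split_gain(frames_by_context, questions[q]))
print(best)  # → left-is-vowel
```

The vowel question wins because it separates the low-energy contexts from the high-energy ones, so the two resulting speech sound groups each get a tighter acoustic model.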


Journal of the Acoustical Society of America | 1997

System for processing a succession of utterances spoken in continuous or discrete form

Laurence S. Gillick; Robert Roth

The system of the invention relates to continuous speech pre-filtering systems for use in discrete and continuous speech recognition computer systems. The speech to be recognized is converted from utterances to frame data sets, which frame data sets are smoothed to generate a smooth frame model over a predetermined number of frames. A resident vocabulary is stored within the computer as clusters of word models which are acoustically similar over a succession of frame periods. A cluster score is generated by the system, which score includes the likelihood of the smooth frames evaluated using a probability model for the cluster against which the smooth frame model is being compared. Cluster sets having cluster scores below a predetermined acoustic threshold are removed from further consideration. The remaining cluster sets are unpacked for determination of a word score for each unpacked word. These word scores are used to identify those words which are above a second predetermined threshold to define a word list which is sent to a recognizer for a more lengthy word match. Control means enable the system to initialize times corresponding to the frame start time for each frame data set, defining a sliding window.
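Two pieces of the pipeline above, the smoothing of frame data sets into a smooth frame model and the acoustic-threshold pruning of clusters, can be sketched as follows. The window width, scores, and threshold are invented for illustration.

```python
import numpy as np

def smooth_frames(frames, width):
    """Average `width` consecutive frame data sets into one smooth frame
    model, sliding the window one frame at a time."""
    frames = np.asarray(frames, dtype=float)
    return np.array([frames[i:i + width].mean(axis=0)
                     for i in range(len(frames) - width + 1)])

def prune_clusters(cluster_scores, threshold):
    """Remove clusters whose score falls below the acoustic threshold."""
    return {c: s for c, s in cluster_scores.items() if s >= threshold}

frames = [[1.0], [3.0], [5.0], [7.0]]
print(smooth_frames(frames, width=2).ravel().tolist())        # → [2.0, 4.0, 6.0]
print(prune_clusters({"c1": -2.0, "c2": -9.0}, threshold=-5.0))  # → {'c1': -2.0}
```

Only the words packed inside the surviving clusters are unpacked and scored individually, which is what keeps the prefilter cheap.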


Journal of the Acoustical Society of America | 1990

Method for speech analysis and speech recognition

Paul G. Bamberg; James K. Baker; Laurence S. Gillick; Robert Roth

A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
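The two kinds of difference parameters can be sketched directly from a matrix of per-band spectral energies. The frame values and lag are invented for illustration; only the subtraction pattern follows the description above.

```python
import numpy as np

def difference_parameters(frames, lag=2):
    """Compute difference parameters for frames of shape (n_frames, n_bands).

    energy_diff: same frequency band, compared against a frame `lag` earlier.
    slope: band b in the current frame vs. the adjacent band b+1 in the
    nearby frame, indicating whether spectral energy is moving in frequency."""
    frames = np.asarray(frames, dtype=float)
    energy_diff = frames[lag:] - frames[:-lag]
    slope = frames[lag:, :-1] - frames[:-lag, 1:]
    return energy_diff, slope

frames = [[1.0, 2.0], [2.0, 3.0], [4.0, 6.0]]
ed, sl = difference_parameters(frames, lag=2)
print(ed.tolist())  # → [[3.0, 4.0]]
print(sl.tolist())  # → [[2.0]]
```

These derived parameters would then be appended to each frame's feature vector before the dynamic-programming comparison against the speech-unit models.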


Journal of the Acoustical Society of America | 1992

Method for deriving acoustic models for use in speech recognition

Laurence S. Gillick

The invention provides a method of deriving a generally improved statistical acoustic model of a first class of speech sounds, given a limited amount of sampling data from that first class. This is done by combining a first statistic calculated from samples of that class of speech sounds with a corresponding second statistic calculated from samples of a second, broader, class of speech sounds. Preferably the second statistic is calculated from many more samples than the first statistic, so it has less sampling error than the first statistic, and preferably the second class is a super-set of the first class, so that the second statistic will provide information about the first class. In one embodiment, the invention combines statistics from the models of a plurality of first classes of speech sounds to reduce the sampling error of such statistics and thus improve the accuracy with which such models can be divided into groups of similar models. The first and second statistics can be measurements of spread, of central tendency, or both. They also can relate to different types of parameters, including spectral parameters and parameters representing the duration of speech sounds.
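The combination of a scarce first-class statistic with a well-sampled broader-class statistic is, in spirit, a shrinkage estimate. A minimal sketch, with an invented count-based weighting (the patent does not specify the combination formula):

```python
def shrink_statistic(first_stat, n_first, second_stat, n_second):
    """Combine a statistic from a small first class with the same statistic
    from a broader second class, weighting by sample counts so that the
    broader, lower-sampling-error estimate dominates when data is scarce."""
    w = n_first / (n_first + n_second)
    return w * first_stat + (1 - w) * second_stat

# Variance of a phone in one context (10 samples) pulled toward the
# variance of that phone across all contexts (990 samples).
print(shrink_statistic(2.5, 10, 1.0, 990))  # → 1.015
```

With only 10 first-class samples the combined estimate sits close to the broad-class value, and it would move toward the first-class value as more samples arrive.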


human language technology | 1990

A rapid match algorithm for continuous speech recognition

Laurence S. Gillick; Robert Roth

This paper describes an algorithm for performing rapid match on continuous speech that makes it possible to recognize sentences from an 842-word vocabulary on a desktop 33 MHz 80486 computer in near real time. This algorithm relies on a combination of smoothing and linear segmentation together with the notion of word start groups. It appears that the total computation required grows more slowly than linearly with the vocabulary size, so that larger vocabularies appear feasible with only moderately enhanced hardware. The rapid match algorithm described here is closely related to the one used in DragonDictate, Dragon's commercial 30,000-word discrete utterance recognizer.
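The word-start-group idea can be sketched as a beam-style pruning step: score each group of words that begin alike, then keep only the words in groups near the best score. The groups, scores, and beam width are invented for illustration.

```python
def rapid_match(group_scores, start_groups, beam):
    """Prune the vocabulary by word start group: keep every word whose
    start group scored within `beam` of the best-scoring group."""
    best = max(group_scores.values())
    return [w for g, words in start_groups.items()
            if group_scores[g] >= best - beam
            for w in words]

group_scores = {"g1": -2.0, "g2": -3.5, "g3": -9.0}
start_groups = {"g1": ["cat", "cap"], "g2": ["dog"], "g3": ["zebra"]}
print(sorted(rapid_match(group_scores, start_groups, beam=4.0)))
# → ['cap', 'cat', 'dog']
```

Since one group score covers every word sharing that start, the work per utterance grows with the number of groups rather than the vocabulary size, consistent with the sub-linear growth reported above.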


Archive | 1995

Systems and methods for word recognition

Janet M. Baker; Laurence S. Gillick; James K. Baker; Jonathan Yamron


Archive | 1990

Large-vocabulary continuous speech prefiltering and processing system

Laurence S. Gillick; Robert Roth


Journal of the Acoustical Society of America | 1997

Speech Recognition Language Models

Laurence S. Gillick; Joel M. Gould; Robert Roth; Paul A. van Mulbregt; Michael D. Dracut Bibeault

Collaboration


Laurence S. Gillick's top co-authors:

James K. Baker (Johns Hopkins University)
Barbara Peskin (University of California)
Steven Wegmann (International Computer Science Institute)