Publication


Featured research published by Jay Gordon Wilpon.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1981

An improved endpoint detector for isolated word recognition

L. Lamel; Lawrence R. Rabiner; Aaron E. Rosenberg; Jay Gordon Wilpon

Accurate location of the endpoints of an isolated word is important for reliable and robust word recognition. The endpoint detection problem is nontrivial for nonstationary backgrounds where artifacts (i.e., nonspeech events) may be introduced by the speaker, the recording environment, and the transmission system. Several techniques for the detection of the endpoints of isolated words recorded over a dialed-up telephone line were studied. The techniques were broadly classified as explicit, implicit, or hybrid in concept. The explicit techniques for endpoint detection locate the endpoints prior to, and independent of, the recognition and decision stages of the system. For the implicit methods, the endpoints are determined solely by the recognition and decision stages of the system, i.e., there is no separate stage for endpoint detection. The hybrid techniques incorporate aspects of both the explicit and implicit methods. Investigations showed that the hybrid techniques consistently provided the best estimates of both word endpoints and, correspondingly, the highest recognition accuracy of the three classes studied. A hybrid endpoint detector is proposed which gives a rejection rate of less than 0.5 percent while providing recognition accuracy close to that obtained with hand-edited endpoints.
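The explicit class of detectors described above can be illustrated with a minimal short-time-energy thresholding sketch. The frame length, hop, and the -30 dB threshold are illustrative choices, not parameters from the paper, whose proposed detector is a hybrid scheme:

```python
import numpy as np

def detect_endpoints(signal, frame_len=240, hop=80, threshold_db=-30.0):
    """Minimal *explicit* endpoint detector: short-time energy
    thresholding relative to the loudest frame. Illustrative sketch
    only; all parameters are assumptions, not the paper's values."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energy = np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    ref = energy.max()
    if ref <= 0:
        return None
    db = 10.0 * np.log10(np.maximum(energy / ref, 1e-12))
    active = np.where(db > threshold_db)[0]
    if active.size == 0:
        return None
    # Convert the first/last active frame back to sample indices.
    return active[0] * hop, active[-1] * hop + frame_len

# Toy example: silence, a constant-amplitude "word", silence.
sig = np.concatenate([np.zeros(800), np.ones(1600), np.zeros(800)])
start, end = detect_endpoints(sig)
```

In a hybrid scheme, candidate endpoints like these would be refined by the recognition and decision stages rather than taken as final.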


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1979

Speaker-independent recognition of isolated words using clustering techniques

Lawrence R. Rabiner; Stephen E. Levinson; Aaron E. Rosenberg; Jay Gordon Wilpon

A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
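The recognition pipeline summarized above (template matching by dynamic time warping followed by a K-nearest-neighbor rule) can be sketched roughly as follows. Euclidean local distance stands in for the Itakura LPC distance, and all names are illustrative:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (arrays of shape [frames, dims]) with Euclidean local distance.
    Illustrative only: the paper uses LPC frames with the Itakura
    log-likelihood distance as the local measure."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_classify(unknown, templates, k=2):
    """templates: list of (label, sequence) pairs, several per word.
    KNN rule: average the k smallest distances within each label and
    return the label whose average is lowest."""
    by_label = {}
    for label, seq in templates:
        by_label.setdefault(label, []).append(dtw_distance(unknown, seq))
    scores = {lab: np.mean(sorted(d)[:k]) for lab, d in by_label.items()}
    return min(scores, key=scores.get)
```

Multiple templates per word are what make the system speaker independent: the KNN rule lets the unknown token match whichever cluster of talkers it resembles.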


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1979

Interactive clustering techniques for selecting speaker-independent reference templates for isolated word recognition

Stephen E. Levinson; Lawrence R. Rabiner; Aaron E. Rosenberg; Jay Gordon Wilpon

It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition. We describe a set of clustering techniques specifically designed for this purpose. These interactive procedures identify coarse structure, fine structure, overlap of, and outliers from clusters. The techniques have been applied to a large speech data base consisting of four repetitions of a 39 word vocabulary (the letters of the alphabet, the digits, and three auxiliary commands) spoken by 50 male and 50 female speakers. The results of the cluster analysis show that the data are highly structured containing large prominent clusters. Some statistics of the analysis and their significance are presented.


Journal of the Acoustical Society of America | 1979

Considerations in applying clustering techniques to speaker-independent word recognition.

Lawrence R. Rabiner; Jay Gordon Wilpon

Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker‐independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐27 (2), 134–141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process.(in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates. Two such techniques are described in this paper. The first method uses distance ...


International Conference on Acoustics, Speech, and Signal Processing | 1986

On the use of bandpass liftering in speech recognition

B.-H. Juang; Lawrence R. Rabiner; Jay Gordon Wilpon

In this paper, we extend the interpretation of distortion measures, based upon the observation that measurements of speech spectral envelopes (as normally obtained from analysis procedures) are prone to statistical variations due to window position fluctuations, excitation interference, measurement noise, etc. and may possess spurious characteristics because of analysis model constraints. We have found that these undesirable spectral measurement variations can be controlled (i.e. reduced in the level of variation) through proper cepstral processing and that a statistical model can be established to predict the variances of the cepstral coefficient measurements. The findings lead to the use of a bandpass liftering process aimed at reducing the variability of the statistical components of spectral measurements. We have applied this liftering process to various speech recognition problems; in particular, vowel recognition and isolated word recognition. With the liftering process, we have been able to achieve an average digit error rate of 1%, which is about half of the previously reported best results, with dynamic time warping in a speaker-independent isolated digit test.
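A commonly cited form of the bandpass lifter from this line of work is the raised-sine window w(n) = 1 + (L/2) sin(pi n / L), n = 1..L. A small sketch, with function names and usage chosen for illustration:

```python
import numpy as np

def bandpass_lifter(L=12):
    """Raised-sine lifter w(n) = 1 + (L/2) sin(pi * n / L), n = 1..L.
    It de-emphasizes low-order cepstral coefficients (sensitive to
    spectral tilt and transmission effects) and high-order ones
    (dominated by measurement variance), keeping the mid band."""
    n = np.arange(1, L + 1)
    return 1.0 + (L / 2.0) * np.sin(np.pi * n / L)

def lifter_cepstra(cepstra):
    """Apply the lifter to a [frames, L] array of cepstral vectors
    (c1..cL, excluding c0) before computing frame distances."""
    return cepstra * bandpass_lifter(cepstra.shape[1])
```

The window peaks in the middle of the cepstral range (w(L/2) = 1 + L/2) and falls to roughly 1 at both ends, which is what reduces the variance of the resulting spectral distance measurements.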


Journal of the Acoustical Society of America | 2000

Adaptive decision directed speech recognition bias equalization method and apparatus

Biing-Hwang Juang; David Mansour; Jay Gordon Wilpon

The present invention provides a speech recognizer that creates and updates the equalization vector as input speech is provided to the recognizer. The present invention includes a speech analyzer which transforms an input speech signal into a series of feature vectors or observation sequence. Each feature vector is then provided to a speech recognizer which modifies the feature vector by subtracting a previously determined equalization vector therefrom. The recognizer then performs segmentation and matches the modified feature vector to a stored model vector which is defined as the segmentation vector. The recognizer then, from time to time, determines a new equalization vector, the new equalization vector being defined based on the difference between one or more input feature vectors and their respective segmentation vectors. The new equalization vector may then be used either for performing another segmentation iteration on the same observation sequence or for performing segmentation on subsequent feature vectors.
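A minimal sketch of the decision-directed update described above, assuming simple vector arithmetic on cepstral-like features. The names and the smoothing factor `alpha` are illustrative, not taken from the patent:

```python
import numpy as np

def update_equalization(features, matched_models, old_bias, alpha=0.9):
    """One decision-directed update of the equalization vector: the new
    estimate is the average difference between the input feature
    vectors and the model (segmentation) vectors they were matched to,
    smoothed with the previous estimate. During recognition, each
    feature vector would first be equalized as `feature - old_bias`
    before segmentation; the smoothing factor is an assumption."""
    residual = np.mean(np.asarray(features) - np.asarray(matched_models), axis=0)
    return alpha * np.asarray(old_bias) + (1.0 - alpha) * residual
```

If the input features really are model vectors shifted by a fixed channel bias, repeated updates converge toward that bias, which is what lets the equalization vector adapt as speech arrives.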


Proceedings of the IEEE | 1980

Automated directory listing retrieval system based on isolated word recognition

B. Aldefeld; Lawrence R. Rabiner; Aaron E. Rosenberg; Jay Gordon Wilpon

Automated directory listing retrieval has been a goal of the Bell System and others for a long time. Recent attempts at implementing such a system relied on button pushing on the part of the user. Since the Touch-Tone® keyboard does not contain a unique key corresponding to each letter of the alphabet, the button-pushing system had some drawbacks. In an attempt to alleviate these problems and to provide a more natural form of communication for the user, the use of spoken spelled names was proposed in place of pushing buttons. An early form of this directory listing retrieval system was a speaker-trained system (i.e., it had to be trained to each user individually) and it used a simplified directory search algorithm. Subsequent improvements and modifications to both the recognition algorithm and the directory search procedure have led to the current implementation, in which the overall system is speaker independent and can automatically find the name (or names) in the directory which provides the best acoustic match to the spoken name. The new system can automatically detect and correct simple (i.e., single-letter) anomalies in the spelling of the name, including letter substitutions, inversions, deletions, and insertions. If a conflict in the detected name occurs (e.g., two or more names with the same or close acoustic distance scores), the system automatically requests additional information to help resolve the ambiguity. In evaluation tests on an 18,000-name Bell Laboratories directory, the directory listing retrieval system found the unique correct name in 98.3 percent of the trials, on average, even though the acoustic recognizer provided the correct letters only about 70 percent of the time.
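The single-letter anomaly classes the system corrects (substitution, inversion, deletion, insertion) can be sketched as candidate generation over letter strings. The helper names and exact-string matching are illustrative; the real system scores candidates by acoustic distance rather than exact spelling:

```python
import string

def single_edit_variants(name):
    """All spellings within one 'simple anomaly' of `name`:
    single-letter substitutions, adjacent-letter inversions,
    deletions, and insertions. Sketch only."""
    letters = string.ascii_uppercase
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])                          # deletion
        for c in letters:
            variants.add(name[:i] + c + name[i + 1:])                  # substitution
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # inversion
    for i in range(len(name) + 1):
        for c in letters:
            variants.add(name[:i] + c + name[i:])                      # insertion
    variants.discard(name)
    return variants

def lookup(name, directory):
    """Hypothetical helper: return directory entries matching `name`
    exactly, else those within one edit of it."""
    if name in directory:
        return [name]
    return sorted(v for v in single_edit_variants(name) if v in directory)
```

When `lookup` returns more than one candidate, that corresponds to the conflict case in which the system asks the user for additional information.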


International Conference on Acoustics, Speech, and Signal Processing | 1987

A linear predictive front-end processor for speech recognition in noisy environments

Yariv Ephraim; Jay Gordon Wilpon; Lawrence R. Rabiner

We investigate the performance of a recent algorithm for linear predictive (LP) modeling of speech signals, which have been degraded by uncorrelated additive noise, as a front-end processor in a speech recognition system. The system is speaker dependent, and recognizes isolated words, based on dynamic time warping principles. The LP model for the clean speech is estimated through appropriate composite modeling of the noisy speech. This is done by minimizing the Itakura-Saito distortion measure between the sample spectrum of the noisy speech and the power spectral density of the composite model. This approach results in a filtering-modeling scheme in which the filter for the noisy speech, and the LP model for the clean speech, are alternatively optimized. The proposed system was tested using the 26 word English alphabet, the ten English digits, and the three command words, stop, error, and repeat, which were contaminated by additive white noise at 5-20 dB signal to noise ratios (SNRs). By replacing the standard LP analysis with the proposed algorithm, during training on the clean speech and testing on the noisy speech, we achieve an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 10 dB.
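The Itakura-Saito distortion minimized in the filtering-modeling scheme above has a simple discrete form over sampled spectra; a sketch:

```python
import numpy as np

def itakura_saito(P, P_hat):
    """Discrete approximation of the Itakura-Saito distortion
    d(P, P_hat) = mean( P/P_hat - log(P/P_hat) - 1 ) between a sample
    power spectrum P and a model power spectral density P_hat. It is
    non-negative and zero only when the two spectra coincide, and it
    is asymmetric: underestimating the spectrum costs more than
    overestimating it by the same factor."""
    r = np.asarray(P, dtype=float) / np.asarray(P_hat, dtype=float)
    return float(np.mean(r - np.log(r) - 1.0))
```

In the alternating scheme the paper describes, this distortion is evaluated between the sample spectrum of the noisy speech and the power spectral density of the composite (speech-plus-noise) model, with the noise filter and the clean-speech LP model optimized in turn.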


Journal of the Acoustical Society of America | 1984

A modified K‐means clustering algorithm for use in speaker‐independent isolated word recognition

Jay Gordon Wilpon; Lawrence R. Rabiner

Recent studies of isolated word recognition systems have shown that a set of carefully chosen templates can be used to bring the performance of speaker‐independent systems up to that of systems trained to the individual speaker. The earliest work in this area used a sophisticated set of pattern recognition algorithms in a human‐interactive mode to create the set of templates (multiple patterns) for each word in the vocabulary. Not only was this procedure time consuming but it was impossible to reproduce exactly, because it was highly dependent on decisions made by the experimenter. Subsequent work led to an automatic clustering procedure which, given only a set of clustering parameters, clustered tokens with the same performance as the previously developed supervised algorithms. The one drawback of the automatic procedure was that the specification of the input parameter set was found to be somewhat dependent on the vocabulary type and size of population to be clustered. Since the user of such a statistic...
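A rough sketch of template selection by K-means-style clustering. It simplifies the paper's setting in two labeled ways: tokens are fixed-length feature vectors under Euclidean distance (the actual algorithm clusters variable-length word patterns under a DTW-based distance), and the first k tokens seed the centers (a real implementation would randomize or use a smarter start):

```python
import numpy as np

def kmeans_templates(tokens, k, n_iter=10):
    """Illustrative template selection: standard K-means iterations,
    then each cluster is represented by its medoid -- the member with
    the smallest total distance to the rest -- which becomes one
    reference template for the word."""
    tokens = np.asarray(tokens, dtype=float)
    centers = tokens[:k].copy()  # assumed seeding, see lead-in
    for _ in range(n_iter):
        dist = np.linalg.norm(tokens[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    templates = []
    for j in range(k):
        members = tokens[assign == j]
        if len(members) == 0:
            continue  # empty cluster: no template
        pair = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
        templates.append(members[pair.sum(axis=1).argmin()])
    return np.array(templates)
```

Returning a medoid rather than a mean matters in the speech setting: the template must be an actual utterance-like pattern that can be time-warped against unknown words.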


IEEE Signal Processing Magazine | 2005

Intelligent virtual agents for contact center automation

Mazin Gilbert; Jay Gordon Wilpon; Benjamin J. Stern; G. Di Fabbrizio

The explosion of multimedia data, the continuous growth in computing power, and advances in machine learning and speech and natural language processing are making it possible to create a new breed of virtual intelligent agents capable of performing sophisticated and complex tasks that are radically transforming contact centers. These virtual agents are enabling ubiquitous and personalized access to communication services from anywhere. They ultimately provide a vehicle to fully automate eContact services without agent personnel. They not only offer multimodal, multimedia, and multilingual capabilities, but also possess learning and data-mining capabilities that enable them to scale and self-maintain as well as extract and report on business intelligence. AT&T VoiceTone is a subset of this eContact revolution focused on creating this new wave of intelligent communication services.
