Publication


Featured research published by H. Silverman.


international conference on acoustics, speech, and signal processing | 1989

How limited training data can allow a neural network to outperform an 'optimal' statistical classifier

Les T. Niles; H. Silverman; G.N. Tajchman; Marcia A. Bush

Experiments comparing artificial neural network (ANN), k-nearest-neighbor (KNN), and Bayes rule with Gaussian distributions and maximum-likelihood estimation (BGM) classifiers were performed. Classifier error rate as a function of training set size was tested for synthetic data drawn from several different probability distributions. In cases where the true distributions were poorly modeled, ANN was significantly better than BGM. In some cases, ANN was also better than KNN. Similar experiments were performed on a voiced/unvoiced speech classification task. ANN had a lower error rate than KNN or BGM for all training set sizes, although BGM approached the ANN error rate as the training set became larger. It is concluded that there are pattern classification tasks in which an ANN is able to make better use of training data to achieve a lower error rate with a particular size training set.
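
The comparison below is a minimal, illustrative sketch (not the paper's experiment): it measures test error versus training-set size for a k-nearest-neighbor classifier, a Bayes classifier with maximum-likelihood Gaussian fits (the paper's "BGM"), and a single-layer logistic model trained with an error-correcting rule as a crude stand-in for the ANN. The synthetic mixture data and all parameters are assumptions.

```python
# Minimal sketch (not the paper's code): error rate vs. training-set size for
# k-nearest-neighbor, a Bayes classifier with per-class Gaussians fit by
# maximum likelihood ("BGM"), and a one-layer logistic model trained with an
# error-correcting rule as a stand-in for the paper's ANN. All data,
# parameters, and model choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, label):
    """Synthetic 2-D data: each class is a 2-component Gaussian mixture,
    so a single Gaussian per class is a deliberately imperfect model."""
    centers = [(-2.0, 0.0), (-2.0, 4.0)] if label == 0 else [(2.0, 0.0), (2.0, 4.0)]
    comp = rng.integers(0, 2, size=n)
    return np.array(centers)[comp] + rng.normal(scale=1.0, size=(n, 2))

def knn_predict(Xtr, ytr, Xte, k=3):
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)   # squared distances
    idx = np.argsort(d, axis=1)[:, :k]                       # k nearest neighbors
    return (ytr[idx].mean(axis=1) > 0.5).astype(int)         # majority vote

def gaussian_bayes_predict(Xtr, ytr, Xte):
    scores = []
    for c in (0, 1):
        Xc = Xtr[ytr == c]
        mu, cov = Xc.mean(0), np.cov(Xc.T) + 1e-6 * np.eye(2)  # MLE fit per class
        diff = Xte - mu
        logp = -0.5 * np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        logp += -0.5 * np.log(np.linalg.det(cov)) + np.log(len(Xc) / len(Xtr))
        scores.append(logp)
    return (scores[1] > scores[0]).astype(int)

def logistic_net_predict(Xtr, ytr, Xte, epochs=500, lr=0.1):
    """One-layer 'network' trained with an error-correcting (cross-entropy) rule."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
        grad = p - ytr                                       # error signal
        w -= lr * Xtr.T @ grad / len(ytr)
        b -= lr * grad.mean()
    return ((Xte @ w + b) > 0).astype(int)

Xte = np.vstack([sample(1000, 0), sample(1000, 1)])
yte = np.repeat([0, 1], 1000)
for n in (10, 50, 250, 1000):                                # per-class training sizes
    Xtr = np.vstack([sample(n, 0), sample(n, 1)])
    ytr = np.repeat([0, 1], n)
    for name, f in [('knn', knn_predict), ('bgm', gaussian_bayes_predict),
                    ('net', logistic_net_predict)]:
        err = (f(Xtr, ytr, Xte) != yte).mean()
        print(f"n={2*n:5d}  {name}: error={err:.3f}")
```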


international conference on acoustics, speech, and signal processing | 1980

State constrained dynamic programming (SCDP) for discrete utterance recognition

H. Silverman; N. Dixon

A new dynamic programming (DP) formulation, using the state characteristics of the data to constrain registration, is introduced. It is evaluated in comparison to a standard DP formulation using ten untrained male and female talkers with two separate ten-utterance vocabularies. One of these vocabularies was specifically designed to assess the strengths and weaknesses of the analysis system and DP formulations. Results show that the new method is clearly superior to the standard formulation with regard to memory requirements and computational load. While recognition accuracy is only slightly better for one version of the new method, the character of the error is more sensible and explainable. This factor is thought to be important for reject control in a discrete utterance recognition system. The parameterization of the new system also provides a better way to marry system specification with system performance.
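
A minimal sketch of the underlying DP time alignment may help; the state-constrained idea is shown only schematically through a hypothetical `compatible` predicate that rules out cells whose frame state labels disagree. None of the constants or the state labeling come from the paper.

```python
# Minimal sketch of dynamic-programming time alignment between a test and a
# reference template of feature vectors. The state-constrained variant (SCDP)
# is illustrated only schematically: an optional `compatible` predicate (a
# hypothetical stand-in, not the paper's formulation) forbids cells whose
# frame state labels disagree, which shrinks memory use and computation.
import numpy as np

def dp_align(test, ref, compatible=None):
    """Return the length-normalized minimum cumulative frame-distance cost."""
    T, R = len(test), len(ref)
    D = np.full((T, R), np.inf)
    local = lambda i, j: np.linalg.norm(test[i] - ref[j])    # local frame distance
    for i in range(T):
        for j in range(R):
            if compatible is not None and not compatible(i, j):
                continue                                     # cell excluded by state constraint
            d = local(i, j)
            if i == 0 and j == 0:
                D[i, j] = d
            else:
                prev = [D[i-1, j] if i else np.inf,          # vertical step
                        D[i, j-1] if j else np.inf,          # horizontal step
                        D[i-1, j-1] if i and j else np.inf]  # diagonal step
                D[i, j] = d + min(prev)
    return D[-1, -1] / (T + R)

# Illustrative use with made-up per-frame state labels (three contiguous segments).
rng = np.random.default_rng(1)
test, ref = rng.normal(size=(30, 6)), rng.normal(size=(34, 6))
test_states = np.repeat([0, 1, 2], 10)
ref_states = np.repeat([0, 1, 2], [12, 11, 11])
print("unconstrained:", dp_align(test, ref))
print("state-constrained:",
      dp_align(test, ref, compatible=lambda i, j: test_states[i] == ref_states[j]))
```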


international conference on acoustics, speech, and signal processing | 1977

A method for programming the complex general-N Winograd Fourier transform algorithm

H. Silverman

The Winograd Fourier Transform Algorithm (WFTA) requires about 20% of the multiplications used in an optimized FFT, while the number of additions remains unchanged. This paper describes one general-N (i.e., many allowable DFT sizes N, but certainly not arbitrary vector sizes) complex WFTA programming technique.
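
The paper's general-N code-generation technique is not reproduced here, but a small example may clarify the kind of small-N Winograd module such programs are built from: the well-known 3-point DFT module, which needs only two multiplications by real constants and can be checked against a direct DFT.

```python
# Minimal sketch: the 3-point Winograd DFT module, a standard small-N building
# block of WFTA programs (this is not the paper's general-N code generator).
# Only two multiplications by real constants are needed.
import numpy as np

def winograd_dft3(x):
    """3-point DFT of a complex vector using the Winograd small-N module."""
    x0, x1, x2 = x
    u = x1 + x2
    v = x1 - x2
    X0 = x0 + u                       # DC term, additions only
    m1 = -1.5 * u                     # multiplication 1: constant (cos(2*pi/3) - 1)
    m2 = -1j * np.sqrt(3) / 2 * v     # multiplication 2: constant sin(2*pi/3)
    return np.array([X0, X0 + m1 + m2, X0 + m1 - m2])

x = np.array([1 + 2j, -0.5 + 1j, 3 - 1j])
print(np.allclose(winograd_dft3(x), np.fft.fft(x)))   # True: matches the direct DFT
```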


international conference on acoustics, speech, and signal processing | 1990

Neural networks, maximum mutual information training, and maximum likelihood training (speech recognition)

Les T. Niles; H. Silverman; Marcia A. Bush

A Gaussian-model classifier trained by maximum mutual information estimation (MMIE) is compared to one trained by maximum-likelihood estimation (MLE) and to an artificial neural network (ANN) on several classification tasks. The similarity of MMIE and ANN results for uniformly distributed data confirms that the ANN is better than the MLE in some cases because of the ANN's use of an error-correcting training algorithm. When the probability model fits the data well, MLE is better than MMIE if the training data are limited, but they are equal if there are enough data. When the model is a poor fit, MMIE is better than MLE. Training dynamics of MMIE and ANN are shown to be similar under certain assumptions. MMIE seems more susceptible to overtraining and computational difficulties than the ANN. Overall, ANN is the most robust of the classifiers.
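
As a rough illustration of the MLE/MMIE distinction (not the paper's setup), the sketch below fits the same one-dimensional, two-class Gaussian model generatively from sample statistics and discriminatively by gradient ascent on the average conditional log-likelihood, using finite-difference gradients for brevity; the data and constants are invented.

```python
# Minimal 1-D sketch (not the paper's experiments): the same two-class Gaussian
# model fit generatively by maximum likelihood (MLE) and discriminatively by
# gradient ascent on the average of log P(y_i | x_i), an MMIE-style objective.
# Finite-difference gradients and all constants are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(0)

def posterior(params, x):
    """P(class = 1 | x) under per-class Gaussians with equal priors."""
    mu0, s0, mu1, s1 = params
    logp0 = -0.5 * ((x - mu0) / s0) ** 2 - np.log(s0)
    logp1 = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1)
    return 1.0 / (1.0 + np.exp(np.clip(logp0 - logp1, -50, 50)))

def avg_conditional_loglik(params, x, y):
    """The MMIE-style objective: average log P(y_i | x_i)."""
    p1 = posterior(params, x)
    return np.mean(y * np.log(p1 + 1e-12) + (1 - y) * np.log(1 - p1 + 1e-12))

# Class-conditional data that are *not* Gaussian (uniform), i.e. the
# poor-model-fit regime in which the abstract reports MMIE helping.
x = np.concatenate([rng.uniform(-3, 1, 200), rng.uniform(-1, 3, 200)])
y = np.repeat([0, 1], 200)

# Generative MLE fit: per-class sample mean and standard deviation.
mle = np.array([x[y == 0].mean(), x[y == 0].std(), x[y == 1].mean(), x[y == 1].std()])

# Discriminative (MMIE-style) fit: start at the MLE solution and climb the
# conditional log-likelihood with finite-difference gradients.
mmie, lr, eps = mle.copy(), 0.1, 1e-5
for _ in range(2000):
    grad = np.zeros(4)
    for k in range(4):
        step = np.zeros(4); step[k] = eps
        grad[k] = (avg_conditional_loglik(mmie + step, x, y)
                   - avg_conditional_loglik(mmie - step, x, y)) / (2 * eps)
    mmie += lr * grad
    mmie[[1, 3]] = np.maximum(mmie[[1, 3]], 0.1)   # keep standard deviations positive

for name, p in [("MLE ", mle), ("MMIE", mmie)]:
    err = np.mean((posterior(p, x) > 0.5) != y)
    print(f"{name} params={np.round(p, 2)}  training error={err:.3f}")
```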


international conference on acoustics, speech, and signal processing | 1983

A comparison of three feature vector clustering procedures in a speech recognition paradigm

L. Niles; H. Silverman; N. Dixon

One possible approach to achieving talker independence in discrete utterance recognition (DUR) is to classify speech feature vectors by using a talker-independent clustering procedure. There are many possible choices of clustering algorithms. This work studied the characteristics of three clustering procedures, Agglomerative, Basic Isodata, and a Biased Mean modification of Basic Isodata, as applied to speech feature vectors. The feature extractor consisted of a six-channel filterbank similar to those used in DUR systems. The speech data was derived from 19 (total) repetitions of a ten-word vocabulary, spoken by 16 different talkers. Various distance functions and feature vector representations were employed. Agglomerative clustering did not produce clusters which corresponded to any apparent classification of speech events. The Biased Mean Isodata procedure did not converge, and therefore was not useful. The Basic Isodata algorithm produced clusters which were to varying degrees identifiable with classes of speech sounds. Simple classifiers for three such classes, based on these clusters, would classify feature vectors with 5-10% error rates. Best results were obtained by using feature vectors which consisted of the log filter channel energies. These test results are good enough to encourage further development of cluster-based feature vector classifiers.
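
For concreteness, here is a minimal Basic-Isodata-style (essentially k-means) clustering of synthetic six-channel log filterbank-energy vectors; the fixed cluster count stands in for Isodata's split/merge heuristics, and the data are invented.

```python
# Minimal sketch of a Basic-Isodata-style (k-means-like) clustering of
# six-channel log filterbank-energy vectors. The data are synthetic and the
# fixed cluster count stands in for Isodata's split/merge heuristics.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "speech" feature vectors: three broad sound classes, each a cloud
# of 6-dimensional log channel energies (purely illustrative numbers).
class_means = rng.normal(scale=3.0, size=(3, 6))
X = np.vstack([m + rng.normal(scale=1.0, size=(300, 6)) for m in class_means])
true_class = np.repeat([0, 1, 2], 300)

def isodata_basic(X, k, iters=50):
    centers = X[rng.choice(len(X), k, replace=False)]         # initial cluster means
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                             # nearest-mean assignment
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)      # recompute cluster means
    return labels, centers

labels, centers = isodata_basic(X, k=3)

# Crude check of how well clusters line up with the underlying sound classes.
for c in range(3):
    counts = np.bincount(true_class[labels == c], minlength=3)
    print(f"cluster {c}: size={counts.sum():4d}  purity={counts.max() / counts.sum():.2f}")
```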


international conference on acoustics, speech, and signal processing | 1982

Study of human and machine discrete utterance recognition (DUR)

C. Vickroy; H. Silverman; N. Dixon

Performance evaluation of DUR systems has typically consisted of percentage-correct recognition (PCR) for specific vocabularies. This numerical measure is misleading because it presumes that 100% recognition is equally achievable for all vocabularies. This, in fact, is not the case. In this paper, results from an experiment which compared human-listener performance to that of a particular recognition machine will be presented. Three different vocabularies were studied. Preliminary results for normalizing machine performance with respect to the difficulty of a test vocabulary are given. Relevant data from the experiment are included to demonstrate the problem and its potential solution.


international conference on acoustics, speech, and signal processing | 1981

What are the significant variables in dynamic programming for discrete utterance recognition

N. Dixon; H. Silverman

In the recent literature, several papers have presented experimental data dealing with performance differences in discrete utterance recognition (DUR), as a function of dynamic programming variables. The theme of the present paper is that, for the most part, these differences are insignificant when viewed in the context of real-time, full-system performance. This theme is supported by the results of several large-scale experiments in which 22 untrained talkers and 2 trained talkers produced some 2676 utterances from two different vocabularies. Results show that the significant variables are the talkers and the vocabulary items themselves; other variables demonstrate strong talker and vocabulary dependencies. As independent variables, their effects are unpredictable.


international conference on acoustics, speech, and signal processing | 1987

A real-time evaluation system for a real-time connected-speech recognizer

S.M. Miller; David P. Morgan; H. Silverman; M. Karam; N. Dixon

A facility for evaluating a talker-dependent, connected-speech recognition system is described. It is implemented as an independent system and interacts in parallel with a recognizer in real time. The evaluator includes software for speech acquisition and storage, connected-speech training, data transfer to a recognizer, database queries, and statistical analysis. Important considerations in the design were the human factors of recording, talker and recording-condition variability, and the embedded training paradigm. Automatic statistical analysis is derived via a simple string-alignment algorithm using just the orthography. In order to demonstrate the use of this system, two experiments are described for connected-digit recognition. These results are presented as automatically generated confusion matrices for insertion, substitution, and deletion errors and individual string alignments.
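
The orthography-only string alignment can be sketched as a Levenshtein-style dynamic program that tallies substitution, insertion, and deletion errors; the unit costs and backtrace conventions below are assumptions, not the system's exact scoring.

```python
# Minimal sketch of the kind of orthography-only string alignment described:
# a Levenshtein-style dynamic program over word strings that tallies
# substitution, insertion, and deletion errors. The unit costs and backtrace
# conventions are assumptions, not the paper's exact algorithm.
def align_errors(reference, hypothesis):
    R, H = len(reference), len(hypothesis)
    # D[i][j] = minimum edit cost aligning reference[:i] with hypothesis[:j]
    D = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        D[i][0] = i                                   # all deletions
    for j in range(1, H + 1):
        D[0][j] = j                                   # all insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = D[i-1][j-1] + (reference[i-1] != hypothesis[j-1])
            D[i][j] = min(sub, D[i-1][j] + 1, D[i][j-1] + 1)
    # Backtrace to count each error type.
    i, j, counts = R, H, {"sub": 0, "ins": 0, "del": 0, "cor": 0}
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (reference[i-1] != hypothesis[j-1]):
            counts["sub" if reference[i-1] != hypothesis[j-1] else "cor"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i-1][j] + 1:
            counts["del"] += 1
            i -= 1
        else:
            counts["ins"] += 1
            j -= 1
    return counts

# Example: a connected-digit string with one substitution and one insertion.
ref = "three five zero nine".split()
hyp = "three nine zero oh nine".split()
print(align_errors(ref, hyp))
```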


international conference on acoustics, speech, and signal processing | 1989

Optimized implementation of the 2-D DFT on loosely-coupled parallel systems

S.M. Miller; H. Silverman

The problem of optimal implementation of the 2-D DFT (discrete Fourier transform) on a very large number of loosely coupled processors is addressed by means of a very accurate, high-level simulation model based on an existing hardware system, called Armstrong. Simulations were run for 2-D DFT sizes of 2×2 to 2048×2048. A comparison of simulation results to timings on the real Armstrong hardware, for up to 32 processors, showed differences of less than 3%. The simulation results were further validated through a simple analytic model for 2-D DFT performance. It is shown that there exists an optimum number of processors for each size DFT and that increasing the number of processors beyond this number actually decreases system performance.
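
In the spirit of the paper's simple analytic model (though with invented machine constants), a toy row-column 2-D DFT timing model shows why a finite optimum processor count appears: per-processor compute and data volume shrink with P while per-message overhead grows with it.

```python
# Toy analytic model, in the spirit of the paper's "simple analytic model",
# for a row-column 2-D DFT of size N x N on P loosely coupled processors:
# compute time scales as N^2 log2(N) / P, while exchanging the intermediate
# rows between the two passes adds a per-message latency term that grows with P.
# The machine constants below are invented for illustration, not Armstrong's.
import math

T_FLOP = 5e-8     # seconds per butterfly-equivalent operation (assumed)
T_WORD = 2e-7     # seconds to transfer one complex word (assumed)
T_MSG  = 5e-4     # per-message startup latency (assumed)

def predicted_time(N, P):
    compute = 2 * N * N * math.log2(N) * T_FLOP / P           # row + column FFT passes
    exchange = (N * N * T_WORD) / P + (P - 1) * T_MSG         # data exchange between passes
    return compute + exchange

for N in (256, 1024, 2048):
    best_P = min(range(1, 4097), key=lambda P: predicted_time(N, P))
    print(f"N={N:5d}: predicted optimum at P={best_P} "
          f"({predicted_time(N, best_P):.3f} s vs {predicted_time(N, 4096):.3f} s at P=4096)")
```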


international conference on acoustics, speech, and signal processing | 1988

An event-synchronous signal processing system for connected-speech recognition

D. Morgan; H. Silverman

Speech-source models indicate that event-synchronous feature analysis for a standard real-time speech recognizer (speaker trained using dynamic programming) should provide a strong basis for improved performance. Among the problems which need to be addressed in implementing such a system are those of developing robust event (pitch, burst, etc.) detectors, synchronous-analysis methodologies, more meaningful feature sets, and algorithms for recognition, e.g. dynamic programming. Descriptions of current approaches to these problems for a left-to-right, connected-word recognizer are given. The event-synchronous system's performance, relative to standard asynchronous recognition methodologies, is presented for an easy and a difficult vocabulary.
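
A minimal sketch of the fixed-rate versus event-synchronous framing contrast follows; the event (pitch-epoch) detector itself is not implemented, and the epoch positions, signal, and window sizes are illustrative assumptions.

```python
# Minimal sketch of the difference between fixed-rate ("asynchronous") framing
# and event-synchronous framing, where analysis windows are anchored at
# detected events such as pitch epochs. The epoch detector itself is not
# implemented here; epoch positions are taken as given (an assumption).
import numpy as np

def fixed_rate_frames(signal, frame_len=256, hop=128):
    """Conventional analysis: a window every `hop` samples regardless of events."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([signal[s:s + frame_len] for s in starts])

def event_synchronous_frames(signal, epochs, frame_len=256):
    """Event-synchronous analysis: one window centered on each detected epoch."""
    half = frame_len // 2
    frames = [signal[e - half:e + half] for e in epochs
              if e - half >= 0 and e + half <= len(signal)]
    return np.array(frames)

# Synthetic voiced-like signal with a known pitch period (illustrative only).
fs, period = 10000, 80                       # 10 kHz sampling, 125 Hz "pitch"
t = np.arange(fs)                            # one second of samples
signal = np.sin(2 * np.pi * t / period) + 0.1 * np.random.default_rng(0).normal(size=fs)
epochs = np.arange(period, fs, period)       # stand-in for detected pitch epochs

async_frames = fixed_rate_frames(signal)
sync_frames = event_synchronous_frames(signal, epochs)
print(async_frames.shape, sync_frames.shape)
# Feature extraction (e.g. log channel energies) would then run per frame in either case.
```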
