D. Raj Reddy
Carnegie Mellon University
Publications
Featured research published by D. Raj Reddy.
ACM Computing Surveys | 1980
Lee D. Erman; Frederick Hayes-Roth; Victor R. Lesser; D. Raj Reddy
The Hearsay-II system, developed during the DARPA-sponsored five-year speech-understanding research program, represents both a specific solution to the speech-understanding problem and a general framework for coordinating independent processes to achieve cooperative problem-solving behavior. As a computational problem, speech understanding reflects a large number of intrinsically interesting issues. Spoken sounds are achieved by a long chain of successive transformations, from intentions, through semantic and syntactic structuring, to the eventually resulting audible acoustic waves. As a consequence, interpreting speech means effectively inverting these transformations to recover the speaker's intention from the sound. At each step in the interpretive process, ambiguity and uncertainty arise. The Hearsay-II problem-solving framework reconstructs an intention from hypothetical interpretations formulated at various levels of abstraction. In addition, it allocates limited processing resources first to the most promising incremental actions. The final configuration of the Hearsay-II system comprises problem-solving components to generate and evaluate speech hypotheses, and a focus-of-control mechanism to identify potential actions of greatest value. Many of these specific procedures reveal novel approaches to speech problems. Most important, the system successfully integrates and coordinates all of these independent activities to resolve uncertainty and control combinatorics. Several adaptations of the Hearsay-II framework have already been undertaken in other problem domains, and it is anticipated that this trend will continue; many future systems necessarily will integrate diverse sources of knowledge to solve complex problems cooperatively. Discussed in this paper are the characteristics of the speech problem in particular, the special kinds of problem-solving uncertainty in that domain, the structure of the Hearsay-II system developed to cope with that uncertainty, and the relationship between Hearsay-II's structure and those of other speech-understanding systems. The paper is intended for the general computer science audience and presupposes no speech or artificial intelligence background.
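The blackboard organization the abstract describes can be sketched compactly: knowledge sources post hypotheses at different levels of abstraction, and a focus-of-control mechanism runs the most promising pending action first. The sketch below is a minimal illustration of that organization, not Hearsay-II's actual implementation; the class names and the three levels shown are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Action:
    priority: float                      # negated rating: heapq pops the smallest first
    run: object = field(compare=False)   # the callable to execute

class Blackboard:
    def __init__(self):
        # hypotheses grouped by level of abstraction (levels here are illustrative)
        self.levels = {"segment": [], "word": [], "phrase": []}
        self.agenda = []                 # pending actions, ordered by estimated value

    def post(self, level, hypothesis, rating):
        self.levels[level].append((hypothesis, rating))

    def schedule(self, rating, action):
        # focus of control: the highest-rated pending action runs next
        heapq.heappush(self.agenda, Action(-rating, action))

    def run(self):
        while self.agenda:
            heapq.heappop(self.agenda).run()

bb = Blackboard()
bb.schedule(0.4, lambda: bb.post("segment", "HH", 0.4))
bb.schedule(0.9, lambda: bb.post("word", "HEARSAY", 0.9))
bb.run()   # the 0.9-rated action executes before the 0.4-rated one
```

Ordering the agenda by estimated value is what lets the system spend limited processing resources on the most promising incremental actions, as the abstract emphasizes.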
Computer Graphics and Image Processing | 1978
Ron Ohlander; Keith Price; D. Raj Reddy
This paper presents a description of a general segmentation method which can be applied to many different types of scenes. We describe the segmentation method in detail and discuss the potential performance of other segmentation techniques on general scenes. We also present a subset of the images which have been analyzed using this technique and a summary of the computational effort required. Details of some of the major programs are given in the appendix.
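The method, recursive region splitting driven by feature histograms in the authors' work of this period, can be caricatured in a few lines. The sketch below is a simplified, hypothetical rendering: it splits on a single scalar feature at the deepest histogram valley and omits the multi-feature selection and connectivity analysis a real implementation needs.

```python
import numpy as np

def split_region(image, mask, labels, next_label, min_pixels=64):
    """Recursively split the masked region at the deepest histogram valley.
    next_label is a one-element list so the counter survives recursion."""
    values = image[mask]
    if values.size < min_pixels or np.ptp(values) == 0:
        labels[mask] = next_label[0]; next_label[0] += 1
        return
    hist, edges = np.histogram(values, bins=32)
    v = 1 + int(np.argmin(hist[1:-1]))        # deepest interior valley
    if hist[v] > 0.5 * hist.max():            # essentially unimodal: stop splitting
        labels[mask] = next_label[0]; next_label[0] += 1
        return
    for side in (image < edges[v + 1], image >= edges[v + 1]):
        if (mask & side).any():
            split_region(image, mask & side, labels, next_label, min_pixels)
```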
International Journal of Human-computer Studies / International Journal of Man-machine Studies | 1983
Philip J. Hayes; D. Raj Reddy
Natural language processing is often seen as a way to provide easy-to-use and flexible interfaces to interactive computer systems. While natural language interfaces typically perform well in response to straightforward requests and questions within their domain of discourse, they often fail to interact gracefully with their users in less predictable circumstances. Most current systems cannot, for instance: respond reasonably to input not conforming to a rigid grammar; ask for and understand clarification if their users' input is unclear; offer clarification of their own output if the user asks for it; or interact to resolve any ambiguities that may arise when the user attempts to describe things to the system. We believe that graceful interaction in these and the many other contingencies that can arise in human conversation is essential if interfaces are ever to appear co-operative and helpful, and hence be suitable for the casual or naive user, and more habitable for the experienced user. In this paper, we attempt to outline key components of graceful interaction, to identify major problems involved in realizing them, and in some cases to suggest the shape of solutions. To this end we propose a decomposition of graceful interaction into a number of relatively independent skills: skills involved in parsing elliptical, fragmented, and otherwise ungrammatical input; in ensuring robust communication; in explaining abilities and limitations, actions and the motives behind them; in keeping track of the focus of attention of a dialogue; in identifying things from descriptions, even if ambiguous or unsatisfiable; and in describing things in terms appropriate for the context. We suggest these skills are necessary for graceful interaction in general and form a good working basis for graceful interaction in a certain large class of application domains, which we define. None of these components appear individually much beyond the current state of the art, at least for suitably restricted domains of discourse. Thus, we advocate research into the construction of gracefully interacting systems as an activity likely to pay major dividends in improved man-machine communication in a relatively short time.
Journal of the Acoustical Society of America | 1974
D. Raj Reddy; Lee D. Erman; Richard D. Fennell; Bruce T. Lowerre; Richard B. Neely
This talk describes the present state of performance of the HEARSAY system. [For more complete descriptions of the system see D. R. Reddy, L. D. Erman, and R. B. Neely, “A Model and a System for Machine Recognition of Speech,” IEEE Trans. Audio Electroacoust. AU‐21, 229–238 (1973) and D. R. Reddy, L. D. Erman, R. D. Fennell, and R. B. Neely, “The HEARSAY Speech Understanding System: An Example of the Recognition Process,” Proc. 3rd Int. Joint Conf. on Artificial Intelligence (Aug. 1973)]. The system uses task and context‐dependent information to help in the recognition of the utterance; this system consists of a set of cooperating parallel processes, each representing a different source of knowledge (e.g., acoustic‐phonetic, syntactic, semantic). The knowledge is used either to predict what may appear in a given context or to verify an hypothesis resulting from a previous prediction. Performance data of the system on several tasks (e.g., medical diagnosis, news retrieval, chess, and programming) will be ...
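The predict-and-verify use of knowledge described in this abstract can be sketched as a small interface; the names below are illustrative (a minimal sketch, not the HEARSAY source code).

```python
class KnowledgeSource:
    """One source of knowledge, e.g. acoustic-phonetic, syntactic, or semantic."""
    def predict(self, context):
        """Hypotheses that may appear in the given context."""
        return []

    def verify(self, hypothesis):
        """Confidence in a hypothesis produced by another source."""
        return 1.0

def recognize_step(sources, context, accept=0.5):
    # every source's predictions are cross-checked by the other sources,
    # mirroring the cooperating-processes organization described above
    accepted = []
    for ks in sources:
        for hyp in ks.predict(context):
            score = min((other.verify(hyp) for other in sources if other is not ks),
                        default=1.0)
            if score >= accept:
                accepted.append((hyp, score))
    return accepted
```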
Journal of the Acoustical Society of America | 1978
B. Yegnanarayana; D. Raj Reddy
Spectral distortions in speech affecting the performance of speech processing systems can be classified into multiplicative and additive types. We report in this paper the results of our study of the performance of the Harpy continuous speech recognition system under the multiplicative type of distortion, with special reference to telephone-quality speech. It has been observed that the recognition performance of the Harpy system on the digit task dropped from 99% to 93% for telephone speech [G. Goodman, B. Lowerre, D. R. Reddy, and D. Scelza, J. Acoust. Soc. Am. Vol. 60, S11(A)]. The effect of distortion on segmentation and labeling will be discussed for simulated and actual telephone data. Performance evaluation is obtained for the AI information retrieval task (1011 words) in terms of word and sentence accuracies for different types of distortion. The results will be discussed with reference to signal processing functions in the system, and some normalization techniques are suggested to account for the distortion. Comparison ...
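Useful background for this abstract: a multiplicative (channel) distortion such as a telephone band-pass multiplies the speech spectrum, Y(f) = H(f)S(f), so in the log-spectral domain it becomes an additive offset, and a stationary channel can be cancelled by subtracting the long-term log-spectral mean. The sketch below illustrates that standard normalization; the abstract does not specify which techniques the authors actually suggest.

```python
import numpy as np

def log_mean_normalize(frame_spectra):
    """frame_spectra: (n_frames, n_bins) per-frame power spectra."""
    log_spec = np.log(frame_spectra + 1e-12)
    return log_spec - log_spec.mean(axis=0)   # cancels a fixed log|H(f)|^2 offset

# toy check: a fixed telephone-like channel shape disappears after normalization
rng = np.random.default_rng(0)
speech = rng.uniform(0.5, 2.0, size=(100, 64))
channel = np.linspace(0.2, 1.0, 64) ** 2
assert np.allclose(log_mean_normalize(speech * channel),
                   log_mean_normalize(speech))
```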
Journal of the Acoustical Society of America | 1978
B. Yegnanarayana; D. Raj Reddy
Parameters representing smoothed spectral characteristics of short segments of speech are often used as features in speech processing systems. The main pattern recognition problem in speech is matching the test spectrum with the reference spectrum. In this paper we show that the matching methods usually adopted do not yield a true measure of the actual differences in the envelopes of spectra. This is particularly true for the additive type of noise degradation in speech. Two types of such degradation, namely the quantization noise of waveform encoding and additive bandlimited white noise, are considered for illustration. We show that the parameters, linear prediction coefficients or cepstral coefficients, do not represent the true spectral envelope information of the distorted signal, which explains the discrepancy among various distance measures based on these parameters. We propose a more practical approach which involves transforming one spectrum relative to the other to bring both of them to the same level of dynamic range before any comparison is made between them. The main result of this study is that the quantization distortion of ADPCM speech is not very significant even at low bit rates, whereas additive white noise is deleterious even at high signal-to-noise ratios. This result explains to some extent the good recognition capability of the Harpy speech recognition system for ADPCM speech even at the lowest bit rate [B. Yegnanarayana and D. Raj Reddy, J. Acoust. Soc. Am. 62, S27 (A)].
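The proposed transformation is described here only as bringing both spectra "to the same level of dynamic range". One simple reading is an affine rescaling of the log spectrum before differencing, sketched below; the exact transformation in the paper may differ.

```python
import numpy as np

def match_dynamic_range(log_ref, log_test):
    """Affinely rescale log_test so its min/max span matches log_ref's.
    Assumes log_test is not flat."""
    scale = (log_ref.max() - log_ref.min()) / (log_test.max() - log_test.min())
    return (log_test - log_test.min()) * scale + log_ref.min()

def envelope_distance(log_ref, log_test):
    # RMS log-spectral difference, computed only after the ranges are matched,
    # so that noise-filled spectral valleys do not dominate the comparison
    diff = log_ref - match_dynamic_range(log_ref, log_test)
    return float(np.sqrt(np.mean(diff ** 2)))
```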
Journal of the Acoustical Society of America | 1977
B. Yegnanarayana; D. Raj Reddy
One of the major problems of a speech processing system is the degradation it suffers due to distortions in the speech input. One such distortion is caused by the quantization noise of waveform encoding schemes, which have several attractive features for speech transmission. The objective of this study is to evaluate the performance of the Harpy continuous speech recognition system when the speech input to the system is corrupted by the quantization noise of an ADPCM system. The Harpy system uses segmentation of continuous speech based on the Itakura metric and LPC‐based parameters for template generation. The effect of quantization noise on the segmentation and the estimation of LPC‐based parameters is studied for different bit rates in the range 16–48 kbit/s of the ADPCM system, and the overall word and sentence recognition accuracies are evaluated.
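For concreteness, a standard formulation of the Itakura log-likelihood ratio used for this kind of LPC matching is sketched below; the LPC solve shown is the basic autocorrelation method and may differ in detail from Harpy's implementation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def lpc(frame, order=12):
    """Autocorrelation-method LPC: prediction polynomial plus autocorrelation."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # Levinson-type normal equations
    return np.concatenate(([1.0], -a)), r

def itakura_distance(test_frame, ref_poly, order=12):
    a_test, r = lpc(test_frame, order)
    R = toeplitz(r)                    # autocorrelation matrix of the test frame
    return float(np.log((ref_poly @ R @ ref_poly) / (a_test @ R @ a_test)))
```

Because the test frame's own predictor minimizes the quadratic form in the denominator, the distance is nonnegative and reaches zero when the reference polynomial matches the frame's own predictor; boundaries and template scores come from how this distance behaves across frames.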
Journal of the Acoustical Society of America | 1977
D. Raj Reddy; Bruce T. Lowerre
It is easy enough to digitize a large amount of speech data. But before it can be effectively used in speech research, it must be cataloged to indicate the positions and descriptions of the words and phones present in the data. In the absence of adequate tools, experts must do these tasks manually. Given interactive, automatic, and semiautomatic speech analysis programs, one can significantly improve the quality of the data base and the productivity of the experts. This paper describes the structure and characteristics of programs developed at Carnegie‐Mellon University which interactively bootstrap themselves to generate symbolic descriptions of a given set of data. The present system contains programs for (1) generating environment- and speaker-adapted phone templates, (2) interactive generation of a phone lexicon containing alternate pronunciations of words generated from data, and (3) machine-aided labeling of a given phrase or sentence, giving the beginning and ending of each phone and ea...
Journal of the Acoustical Society of America | 1976
Henry C. Goldberg; D. Raj Reddy
Most present programs for feature extraction, segmentation, and phonetic labeling of speech are highly dependent upon the specific input‐parametric representation used. We have been developing approaches which are relatively independent of the choice of parameters. Vectors of parameter values are dealt with uniformly by statistical pattern‐recognition methods. Such an approach permits systematic evaluations of different representations (be they formants, spectra, LPCs, or analog filter measurements). Cost can be balanced against performance. Given the large number of design choices in this area, careful attention must be paid to methods which are invariant under change of parametric representation. For segmentation, regions of acoustic change are detected by functions of parameter vector similarity and by amplitude cues. The basic model of signal detection theory can be applied to quantify the missed/extra‐segment error trade‐off. Error rates of 3.7% missed for 19% extra have been achieved. To account fo...
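The segmentation scheme described here reduces to thresholding an acoustic-change score computed from successive parameter vectors, with the threshold acting as the signal-detection operating point. A schematic version, with illustrative details, is sketched below; sweeping the threshold traces out the missed/extra-segment trade-off the abstract quantifies (3.7% missed against 19% extra at one operating point).

```python
import numpy as np

def boundary_candidates(frames, threshold):
    """frames: (n_frames, dim) parameter vectors (formants, spectra, LPCs, ...)."""
    # frame-to-frame change score; any vector representation can be dropped in,
    # which is the representation independence the abstract argues for
    change = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    return np.flatnonzero(change > threshold) + 1   # indices of candidate boundaries
```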
Journal of the Acoustical Society of America | 1976
Gary Goodman; D. Raj Reddy
Comparing the relative performances of speech‐understanding systems has always been difficult and subject to speculation. Different tasks naturally require different vocabularies with varying acoustical similarities. Moreover, constraints imposed by the syntax may make recognition easier, even for vocabularies with high ambiguity. We define “inherent size” as a measure of vocabulary complexity and investigate its relation to recognition rates and the (apparent) vocabulary size. Word recognition is modeled as a probabilistic function of a Markov process using phoneme confusion probabilities derived from an articulatory position model. Multiple pronunciations and allophonic variations are allowed. Analysis of maximal word confusions leads to lower bounds for expected word recognition rates. Inherent vocabulary size is derived from these bounds. To evaluate syntactic constraints for finite state languages, an inherent size is computed for each state based on the subset of words which lead from that state. Then, average path complexity is computed for the language. This language complexity, when compared with the total vocabulary complexity and average path length, yields the degree of syntactic constraint. Analysis of several tasks will be reported.
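The construction can be made concrete with a toy model: word confusion probabilities as products of phoneme confusion probabilities, a per-word recognition rate from those confusions, and an "inherent size" read off as the equivalent number of equally confusable words. The definitions below are one interpretation, not the paper's exact formulation (which also handles unequal-length words and allophonic variation, omitted in this sketch).

```python
import numpy as np

def word_confusion(w, v, conf):
    """P(hear v | said w), phonemes independent; equal-length words assumed."""
    return float(np.prod([conf[a][b] for a, b in zip(w, v)]))

def inherent_size(vocab, conf):
    rates = []
    for w in vocab:
        p_correct = word_confusion(w, w, conf)
        p_total = sum(word_confusion(w, v, conf) for v in vocab)
        rates.append(p_correct / p_total)   # chance the spoken word out-scores the rest
    r = float(np.mean(rates))
    # k indistinguishable words give rate 1/k, so 1/r acts as an equivalent size
    return 1.0 / r

conf = {"b": {"b": 0.8, "p": 0.2}, "p": {"p": 0.8, "b": 0.2},
        "a": {"a": 1.0}, "t": {"t": 1.0}}
print(inherent_size([("b", "a", "t"), ("p", "a", "t")], conf))
# 1.25: between 1 (fully distinct pair) and 2 (indistinguishable pair)
```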