Yeshwant K. Muthusamy
Texas Instruments
Publications
Featured research published by Yeshwant K. Muthusamy.
IEEE Signal Processing Magazine | 1994
Yeshwant K. Muthusamy; E. Barnard; R.A. Cole
The Oregon Graduate Institute Multi-language Telephone Speech Corpus (OGI-TS) was designed specifically for language identification research. It currently consists of spontaneous and fixed-vocabulary utterances in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. These utterances were produced by 90 native speakers in each language over real telephone lines. Language identification is related to speaker-independent speech recognition and speaker identification in several interesting ways. It is therefore not surprising that many of the recent developments in language identification can be related to developments in those two fields. We review some of the more important recent approaches to language identification against the background of successes in speaker and speech recognition. In particular, we demonstrate how approaches to language identification based on acoustic modeling and language modeling, respectively, are similar to algorithms used in speaker-independent continuous speech recognition. Thereafter, prosodic and duration-based information sources are studied. We then review an approach to language identification that draws heavily on speaker identification. Finally, the performance of some representative algorithms is reported.
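To make the language-modeling approach the survey reviews concrete, here is a minimal sketch of phonotactic scoring: per-language bigram models over decoded phone labels, with the language chosen by highest log-likelihood. The phone labels, the add-alpha smoothing, and the toy training data are illustrative assumptions, not details from the paper or the OGI-TS corpus.

```python
# Phonotactic (language-modeling) language ID sketch: score a decoded phone
# sequence against per-language bigram models. All data below is made up.
from collections import defaultdict
import math

def train_bigram_model(phone_sequences, alpha=1.0):
    """Build an add-alpha smoothed bigram model over phone labels."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in phone_sequences:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    V = len(vocab)

    def log_prob(prev, cur):
        return math.log((counts[prev][cur] + alpha) /
                        (sum(counts[prev].values()) + alpha * V))
    return log_prob

def identify_language(phone_seq, models):
    """Return the language whose bigram model scores the sequence highest."""
    def score(log_prob):
        padded = ["<s>"] + phone_seq + ["</s>"]
        return sum(log_prob(p, c) for p, c in zip(padded, padded[1:]))
    return max(models, key=lambda lang: score(models[lang]))

# Toy usage with invented phone strings:
models = {
    "English": train_bigram_model([["dh", "ax", "k", "ae", "t"]]),
    "Japanese": train_bigram_model([["k", "o", "n", "n", "i", "ch", "i"]]),
}
print(identify_language(["k", "o", "n"], models))
```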
international conference on acoustics, speech, and signal processing | 1994
Yeshwant K. Muthusamy; Neena Jain; Ronald A. Cole
There has been renewed interest in the field of automatic language identification over the past two years. The advent of a public-domain ten-language corpus of telephone speech has made the evaluation of different approaches to automatic language identification feasible. In an effort to provide benchmarks for evaluating machine performance, we conducted perceptual experiments on 1-, 2-, 4- and 6-second excerpts of telephone speech excised from spontaneous speech utterances in this corpus. The subject population consisted of 10 native speakers of English and 2 speakers from each of the remaining 9 languages. Statistical analyses of our results indicate that the duration of the excerpt, familiarity with the language, and the number of languages known are important factors affecting a subject's performance on the identification task.
international conference on acoustics, speech, and signal processing | 1994
Barbara Wheatley; Kazuhiro Kondo; Wallace Anderson; Yeshwant K. Muthusamy
The feasibility of cross-language transfer of speech technology is of increasing concern as the demand for recognition systems in multiple languages grows. The paper presents a systematic study of the relative effectiveness of different methods for seeding and training HMMs in a new language, using transfer from English to Japanese for small-vocabulary, speaker-independent continuous speech recognition as a test case. Effects of limited training data are also explored. The study found that cross-language adaptation produced better models than alternative approaches with relatively little effort, and that the number of speakers is more critical than the number of utterances for small training data sets.
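As a hedged illustration of the seeding idea the paper studies: initialize each target-language (Japanese) phone model from a phonetically similar source-language (English) model, then retrain on target-language data. The phone mapping and the dict-of-parameters "model" below are illustrative assumptions, not TI's actual HMM system.

```python
# Cross-language seeding sketch: copy source-phone parameters as initial
# estimates for each target phone, before re-estimation on target data.
import copy

# Hypothetical mapping from Japanese phones to their closest English phones:
JA_TO_EN = {"a": "aa", "i": "iy", "u": "uw", "e": "eh", "o": "ao", "k": "k"}

def seed_target_models(english_models, mapping):
    """Copy source-phone parameters as initial estimates for target phones."""
    return {ja: copy.deepcopy(english_models[en]) for ja, en in mapping.items()}

# Toy usage: each "model" is just a parameter dict standing in for an HMM.
english_models = {p: {"means": [0.0], "variances": [1.0]}
                  for p in ["aa", "iy", "uw", "eh", "ao", "k"]}
japanese_seeds = seed_target_models(english_models, JA_TO_EN)
# ...Baum-Welch re-estimation on Japanese training data would follow here.
```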
international conference on acoustics, speech, and signal processing | 1999
Yeshwant K. Muthusamy; Rajeev Agarwal; Yifan Gong; Vishu R. Viswanathan
With the advances in speech recognition and wireless communications, the possibilities for information access in the automobile have expanded significantly. We describe four system prototypes for (i) voice dialing, (ii) Internet information retrieval (called InfoPhone), (iii) voice e-mail, and (iv) car navigation. These systems are designed primarily for hands-busy, eyes-busy conditions, use speaker-independent speech recognizers, and can be used with a restricted display or no display at all. The voice-dialing prototype incorporates our hands-free speech recognition engine, which is very robust in noisy car environments (1% WER and 3% string error rate on the continuous digit recognition task at 0 dB SNR). The InfoPhone, voice e-mail, and car navigation prototypes use a client-server architecture, with the client designed to be resident on a phone or other hand-held device.
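A minimal sketch of the client-server split the abstract describes: a thin client on the device sends the recognizer's text result to a server that performs the heavy retrieval. The host name, port, and JSON message format are assumptions for illustration, not the paper's actual protocol.

```python
# Thin-client sketch: ship a recognized query to a remote server and read
# back its reply. Assumes a server speaking this simple JSON-over-TCP format.
import json
import socket

def thin_client_query(query_text, host="infophone.example.com", port=9000):
    """Send a recognized utterance to the server; return its decoded reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(json.dumps({"query": query_text}).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        reply = b"".join(iter(lambda: sock.recv(4096), b""))
    return json.loads(reply.decode("utf-8"))

# Example (requires a running server): thin_client_query("weather in Dallas")
```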
international conference on acoustics, speech, and signal processing | 1991
Yeshwant K. Muthusamy; Ronald A. Cole; M. Gopalakrishnan
A segment-based approach to automatic language identification is discussed which is based on the idea that the acoustic structure of languages can be estimated by segmenting speech into broad phonetic categories. Automatic language identification can then be achieved by computing features that describe the phonetic and prosodic characteristics of the language, and using these feature measurements to train a classifier to distinguish between languages. As a first step in this approach, a multilanguage neural-network-based segmentation and broad classification algorithm using seven broad phonetic categories has been built. The algorithm was trained and tested on separate sets of speakers of American English, Japanese, Mandarin Chinese, and Tamil. It currently performs with an accuracy of 82.3% on the utterances of the test set.
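To illustrate the feature-extraction step in a segment-based pipeline like this one: collapse frame-level broad phonetic labels into segments, then derive per-category frequency and duration features for a downstream classifier. The seven category names and the exact features are illustrative assumptions, not the paper's specification.

```python
# Segment-based feature sketch: frame labels -> segments -> feature vector.
from itertools import groupby
from collections import Counter

CATEGORIES = ["vowel", "fricative", "stop", "nasal",
              "glide", "closure", "silence"]

def segments(frame_labels):
    """Collapse consecutive identical frame labels into (label, duration)."""
    return [(label, len(list(run))) for label, run in groupby(frame_labels)]

def feature_vector(frame_labels):
    """Relative frequency and mean duration of each broad category."""
    segs = segments(frame_labels)
    freq = Counter(label for label, _ in segs)
    n = len(segs) or 1
    feats = []
    for cat in CATEGORIES:
        durs = [d for label, d in segs if label == cat]
        feats.append(freq[cat] / n)                           # frequency
        feats.append(sum(durs) / len(durs) if durs else 0.0)  # mean duration
    return feats

# Toy usage on a short frame-label sequence:
print(feature_vector(["silence", "stop", "vowel", "vowel", "nasal", "vowel"]))
```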
international conference on acoustics, speech, and signal processing | 1990
Yeshwant K. Muthusamy; Ronald A. Cole; Malcolm Slaney
The ability of multilayer perceptrons (MLPs) trained with backpropagation to classify vowels excised from natural continuous speech is examined. Two spectral representations are compared: spectrograms and cochleagrams. The features used to train the MLPs include discrete Fourier transform (DFT) or cochleagram coefficients from a single frame in the middle of the vowel, or coefficients from each third of the vowel. The effects of estimates of pitch, duration, and the relative amplitude of the vowel were investigated. The experiments show that with coefficients alone, the cochleagram is superior to the spectrogram in classification performance for all experimental conditions. With the three additional features, however, the results are comparable. Perceptual experiments with trained human listeners on the same data revealed that MLPs perform much better than humans on vowels excised from context.
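A minimal sketch of this experimental setup: an MLP trained on per-vowel feature vectors (spectral coefficients from three thirds of the vowel plus pitch, duration, and relative amplitude). Random data stands in for real speech here, and the layer sizes, coefficient count, and number of vowel classes are assumptions, not the paper's configuration.

```python
# MLP vowel-classification sketch on synthetic stand-in data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_coeffs = 64                                   # spectral bins per frame
X = rng.normal(size=(500, 3 * n_coeffs + 3))    # 3 thirds + 3 extra features
y = rng.integers(0, 12, size=500)               # 12 vowel classes (assumed)

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```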
international conference on acoustics, speech, and signal processing | 1995
Yeshwant K. Muthusamy; Edward Holliman; Barbara Wheatley; Joseph Picone; John J. Godfrey
As part of the Polyphone project, Texas Instruments is in the process of collecting and developing a corpus of telephone speech in American Spanish. The corpus, called Voice Across Hispanic America (VAHA), will attempt to provide balanced phonetic coverage of the language, in addition to containing widely used vocabulary items such as digits, letter strings, yes/no responses, proper names, and selected command words and phrases used in automated telephone service applications. The speakers are native speakers of Spanish living in the United States. The collection and development of the corpus is expected to be completed by June 1995. So far, the authors have collected speech from about 500 speakers in various parts of the U.S. They describe the design issues in various aspects of the project, such as subject recruitment, corpus and prompt sheet design, the data acquisition system, and validation and transcription. They conclude with a brief statistical profile of the data collected.
systems man and cybernetics | 1989
Les E. Atlas; Jerome T. Connor; Dong Chul Park; Mohamed A. El-Sharkawi; Robert J. Marks; Alan Lippman; Ronald A. Cole; Yeshwant K. Muthusamy
Multilayer perceptrons and trained classification trees are two very different techniques that have recently become popular. Given enough data and time, both methods are capable of performing arbitrary nonlinear classification. The two techniques have not previously been compared on real-world problems. The authors first consider the important differences between multilayer perceptrons and classification trees and conclude that there is not enough theoretical basis for the clear-cut superiority of one technique over the other. They then present results of a number of empirical tests on quite different problems in power system load forecasting and speaker-independent vowel identification. They compare the performance for classification and prediction in terms of accuracy outside the training set. In all cases, even with various sizes of training sets, the multilayer perceptron performed as well as or better than the trained classification trees. The authors are confident that the univariate version of the trained classification trees does not perform as well as the multilayer perceptron. More studies are needed, however, on the comparative performance of the linear combination version of the classification trees.
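A hedged sketch of the comparison methodology: fit an MLP and a univariate classification tree on the same data and compare accuracy outside the training set. Synthetic data stands in for the load-forecasting and vowel-identification tasks, and the model hyperparameters are assumptions for illustration.

```python
# MLP vs. classification tree, scored on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("MLP", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)),
    ("tree", DecisionTreeClassifier()),
]:
    model.fit(X_tr, y_tr)
    print(name, "test accuracy:", round(model.score(X_te, y_te), 3))
```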
Archive | 1994
Yeshwant K. Muthusamy; Ronald A. Cole
Automatic language identification is the problem of identifying the language being spoken from a sample of speech by an unknown speaker. Within seconds of hearing speech, people are able to determine whether it is a language they know. If it is a language with which they are not familiar, they often can make subjective judgments as to its similarity to a language they know, e.g., “sounds like German”.
conference of the international speech communication association | 1992
Yeshwant K. Muthusamy; Ronald A. Cole; Beatrice T. Oshika