Shubha Kadambe
Bell Labs
Publication
Featured research published by Shubha Kadambe.
international conference on acoustics, speech, and signal processing | 1997
James L. Hieronymus; Shubha Kadambe
A robust, task independent spoken language identification (LID) system which uses a large vocabulary continuous speech recognition (LVCSR) module for each language to choose the most likely language spoken is described. The acoustic analysis uses mean cepstral removal on mel scale cepstral coefficients to compensate for different input channels. The system has been trained on 5 languages: English, German, Japanese, Mandarin Chinese and Spanish using a subset of the Oregon Graduate Institute 11 language data base. The five language results show 88% correct recognition for 50 second utterances without using confidence measures and 98% correct with confidence measures without the robust front end. The recognition rate is 81% correct for 10 second utterances without confidence measures and 93% correct with confidence measures without the robust front end. Adding the robust front end improves the recognition rate approximately 3% on the short utterances and 1% for the long utterances. The best performance has been obtained for systems trained on phonetically hand labeled speech.
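The mean cepstral removal step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the features are already available as a frames-by-coefficients array and that the channel is fixed over the utterance, so that it appears as a constant offset in the cepstral domain. The function name, array shapes, and random data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def remove_cepstral_mean(cepstra: np.ndarray) -> np.ndarray:
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: array of shape (num_frames, num_coefficients).
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Made-up data: 200 frames of 13 coefficients (hypothetical sizes).
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))

# Assumption: a fixed input channel adds a constant vector to every frame.
channel_offset = rng.normal(size=(1, 13))
observed = clean + channel_offset

# After mean removal the constant channel offset no longer affects the features.
normalized = remove_cepstral_mean(observed)
```

In this idealized setting the normalized features are identical whether or not the channel offset is present, which is the sense in which the step compensates for different input channels.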
Optical Engineering | 1994
Shubha Kadambe; Pramila Srinivasan
Our objective is to demonstrate the applicability of adaptive wavelets for speech applications. In particular, we discuss two applications, namely, classification of unvoiced sounds and speaker identification. First, a method to classify unvoiced sounds using adaptive wavelets, which would help in developing a unified algorithm to classify phonemes (speech sounds), is described. Next, the applicability of adaptive wavelets to identify speakers using very short speech data (one pitch period) is exhibited. The described text-independent phoneme based speaker identification algorithm identifies a speaker by first modeling phonemes and then by clustering all the phonemes belonging to the same speaker into one class. For both applications, we use a feed-forward neural network architecture. We demonstrate the performance of both the unvoiced sound classifier and the speaker identification algorithm by using representative real speech examples.
international conference on acoustics, speech, and signal processing | 1995
Shubha Kadambe; James L. Hieronymus
A task independent spoken language identification (LID) system which uses phonological and lexical models to distinguish languages is described. We demonstrate that the performance of an LID system which is based only on acoustic models can be improved by incorporating higher level linguistic knowledge in the form of trigram phonemotactics and lexical matching. We also present the performance of our LID system for four languages (English, German, Mandarin and Spanish).
asilomar conference on signals, systems and computers | 1992
Shubha Kadambe; R. Murray; G.F. Boudreaux-Bartels
A QRS complex detector for electrocardiogram (ECG) analysis based on the dyadic wavelet transform (DyWT) is described. It overcomes the problems associated with several QRS detectors; namely, they are insensitive to non-stationarities in the QRS complex and are not robust to noise. The performance of the detector is illustrated with representative examples, and its performance is compared to that of some of the other QRS detectors developed thus far.
international conference on spoken language processing | 1996
James L. Hieronymus; Shubha Kadambe
A task independent spoken language identification (LID) system which uses a large vocabulary automatic speech recognition (LVASR) module for each language to choose the most likely language spoken is described in detail. The system has been trained on 5 languages: English, German, Japanese, Mandarin Chinese and Spanish. It is demonstrated that an LID system based on LVASR performs very well when trained and tested on a 5 language subset (English, German, Spanish, Japanese, and Mandarin Chinese) of the Oregon Graduate Institute language data base. The performance advantage is shown for both long (50 second) and short (10 second) test utterances. The five language results show 88% correct recognition for 50 second utterances without confidence measures and 98% correct with confidence measures. The recognition rate is 81% correct for 10 second utterances without confidence measures and 93% correct with confidence measures. The best performance has been obtained for systems trained on phonetically hand labeled speech.
international conference on acoustics, speech, and signal processing | 1996
Shubha Kadambe; Richard S. Orr; Michael J. Lyall
A decision statistic in the case of cross term deleted Wigner representation (CDWR) is derived in this paper. This statistic is then used to detect/classify signals present in underwater acoustic data. Using the output signal-to-noise ratio as a performance measure, we show that the performance of the cross CDWR based detection method is better than the performance of the auto CDWR. In addition, we show that the performance in the case of cross CDWR is similar to the performance of the classical cross-correlator when the biorthogonal analysis window satisfies the minimum energy condition. Experimental results are provided to illustrate our theoretical deductions.
SPIE's International Symposium on Optical Engineering and Photonics in Aerospace Sensing | 1994
Shubha Kadambe; Pramila Srinivasan
In this paper, we describe a text-independent phoneme-based speaker identification system that uses adaptive wavelets to model the phonemes. This system identifies a speaker by modeling a very short segment of phonemes and then by clustering all the phonemes belonging to the same speaker into one class. The classification is achieved by using a two layer feed forward neural network classifier. The performance of this speaker identification system is demonstrated by considering the phonemes that were extracted from various sentences spoken by three speakers in the TIMIT acoustic-phonetic speech corpus.
international conference on acoustics, speech, and signal processing | 1997
Shubha Kadambe; Richard S. Orr
A novel cross term deleted Wigner representation can be obtained by expanding the Wigner distribution (WD) in terms of two complementary Gabor coefficients of the signal and a translated set of Wigner basis functions. Two such complementary Gabor coefficients of a signal can be obtained by reversing the role of the Gabor synthesis window h(t) and its biorthogonal function b(t). Such a representation is defined here as the cross-biorthogonal representation (XBIO). Details of the derivation of this new representation are provided in this paper. The choice of the synthesis functions and their corresponding biorthogonal functions with respect to (i) concentration/resolution capabilities, (ii) redundancy vs. minimum-dimension tradeoffs, (iii) noise reduction and (iv) basis set properties of the XBIO representation are also discussed. Simulation results are provided to substantiate the theoretical findings.
Journal of the Acoustical Society of America | 1994
Shubha Kadambe; James L. Hieronymus
A language identification (LID) system that uses phonemotactic models in addition to phoneme models to identify languages is described. The proposed LID system is trained and tested using the OGI multilanguage telephone speech database. The continuous density second‐order ergodic variable duration hidden Markov phonemic models are trained for each language using a high accuracy phoneme recognition system developed at Bell Laboratories. The phonemotactic models for each language are trained using a text corpus of about ten million words and grapheme to phoneme converters. The language L_i of an incoming speech signal x is hypothesized as the one that produced the highest likelihood f(x|λ_i) f(λ_i|L_i) for all the phonemic models λ_i of a given set with the phonemotactic constraint. Initially, this LID system was trained and evaluated for English/Spanish language identification and the language identification was 83% correct (79% on English and 88% on Spanish). Results for four languages will be presented. The d...
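The decision rule in the abstract above, which picks the language with the highest product of an acoustic likelihood and a phonemotactic likelihood, can be sketched as follows. Working with log-likelihoods (so that the product becomes a sum) is an assumption made here for numerical convenience, and the function name and all score values are made up for illustration; none of them come from the paper.

```python
def identify_language(acoustic_loglik, phonotactic_loglik):
    """Return the language with the highest combined log score.

    acoustic_loglik[lang]    stands in for log f(x | lambda_i).
    phonotactic_loglik[lang] stands in for log f(lambda_i | L_i).
    Both dictionaries are hypothetical inputs keyed by language name.
    """
    scores = {
        lang: acoustic_loglik[lang] + phonotactic_loglik[lang]
        for lang in acoustic_loglik
    }
    best = max(scores, key=scores.get)
    return best, scores

# Made-up log-likelihoods for a single utterance.
acoustic = {"English": -1200.0, "Spanish": -1185.0}
phonotactic = {"English": -310.0, "Spanish": -340.0}

best, scores = identify_language(acoustic, phonotactic)
print(best, scores)
```

With these invented numbers the acoustic score alone favors Spanish, while the combined score favors English, which shows how a phonemotactic term can change the hypothesis.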
asilomar conference on signals, systems and computers | 1997
Shubha Kadambe; Richard S. Orr
Two time-frequency representations that are devoid of cross-terms can be derived by expanding the Wigner distribution (WD) in terms of the Gabor coefficients of a given signal. These two representations are referred to as the cross term deleted Wigner representation (CDWR) and cross biorthogonal representation (XBIO). In this paper, we comparatively study these two representations with respect to (i) concentration/resolution, (ii) frequency and time resolvability, and (iii) noise reduction capabilities. Such a study would give insight into the applicability of these representations for a given signal. Simulation results of the comparative study are also provided.