K. Samudravijaya
Tata Institute of Fundamental Research
Publications
Featured research published by K. Samudravijaya.
Sadhana-academy Proceedings in Engineering Sciences | 2002
Aniruddha Sen; K. Samudravijaya
Incorporation of speech and Indian scripts can greatly enhance the accessibility of web information among common people. This paper describes a ‘web reader’ which ‘reads out’ the textual contents of a selected web page in Hindi or in English with Indian accent. The content of the page is downloaded and parsed into suitable textual form. It is then passed on to an indigenously developed text-to-speech system for Hindi/Indian English, to generate spoken output. The text-to-speech conversion is performed in three stages: text analysis, to establish pronunciation, phoneme to acoustic-phonetic parameter conversion and, lastly, parameter-to-speech conversion through a production model. Different types of voices are used to read special messages. The web reader detects the hypertext links in the web pages and gives the user the option to follow the link or continue perusing the current web page. The user can exercise the option either through a keyboard or via spoken commands. Future plans include refining the web parser, improvement of naturalness of synthetic speech and improving the robustness of the speech recognition system.
Pattern Recognition Letters | 1996
S. Krishnan; K. Samudravijaya; P. V. S. Rao
The selection of a feature set is an important aspect of the pattern classification process. The Fisher ratio is commonly used to rank features with respect to their effectiveness for a given classification task. The procedure used implicitly assumes a symmetric and unimodal probability density for each class. In this paper, we propose a generalized definition of the Fisher ratio as applicable to Gaussian mixture densities, which can represent multi-modal or skewed distributions. The validity and usefulness of the proposed definition is tested by a Monte Carlo simulation experiment. The correlation between the classification results and the proposed objective criterion is found to be better than that attained with the conventional uni-modal measure.
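For orientation, the conventional uni-modal Fisher ratio that the abstract takes as its baseline can be sketched as below. This is only the textbook two-class, single-feature form (squared difference of class means over the sum of class variances); the generalized mixture-density definition proposed in the paper is not reproduced here, and the function names `fisher_ratio` and `rank_features` are illustrative.

```python
from statistics import fmean, pvariance

def fisher_ratio(class_a, class_b):
    """Conventional two-class Fisher ratio for a single feature.

    Assumes each class is summarized well by one mean and one variance,
    i.e. the symmetric, unimodal case the abstract refers to.
    """
    mean_a, mean_b = fmean(class_a), fmean(class_b)
    between = (mean_a - mean_b) ** 2
    within = pvariance(class_a, mean_a) + pvariance(class_b, mean_b)
    return between / within

def rank_features(samples_a, samples_b):
    """Rank feature indices from most to least discriminative.

    samples_a and samples_b are lists of feature vectors, one list per class.
    """
    n_features = len(samples_a[0])
    scores = [
        fisher_ratio([v[j] for v in samples_a], [v[j] for v in samples_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A feature whose class distributions are bimodal or skewed can receive a misleading score from this measure, which is the limitation the paper sets out to address.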
Speech Communication | 1998
K. Samudravijaya; Sanjeev K. Singh; P. V. S. Rao
The accuracy of speech recognition systems is known to be affected by fast speech. If fast speech can be detected by means of a measure of speaking rate, the acoustic as well as language models of a speech recognition system can be adapted to compensate for fast speech effects. We have studied several measures of speaking rate which have the advantage that they can be computed prior to speech recognition. The proposed measures have been compared with conventional measures, viz., word and phone rate on the TIMIT database. Some of the proposed measures have significant correlations with phone rate and vowel duration. We have shown that the mismatch between actual and expected durations of test vowels reduces if the vowel duration models are adapted to speaking rate, as estimated by the proposed measures. These measures can be computed from features commonly employed in speech recognition, do not entail significant additional computational load and do not need labeling or segmentation of unknown utterance in terms of linguistic units.
Sadhana-academy Proceedings in Engineering Sciences | 1998
K. Samudravijaya; R Ahuja; Nandini Bondale; T Jose; S. Krishnan; Pinaki Poddar; P. V. S. Rao; R Raveendran
This paper presents a description of a speech recognition system for Hindi. The system follows a hierarchic approach to speech recognition and integrates multiple knowledge sources within statistical pattern recognition paradigms at various stages of signal decoding. Rather than make hard decisions at the level of each processing unit, relative confidence scores of individual units are propagated to higher levels. Phoneme recognition is achieved in two stages: broad acoustic classification of a frame is followed by fine acoustic classification. A semi-Markov model processes the frame level outputs of a broad acoustic maximum likelihood classifier to yield a sequence of segments with broad acoustic labels. The phonemic identities of selected classes of segments are decoded by class-dependent neural nets which are trained with class-specific feature vectors as input. Lexical access is achieved by string matching using a dynamic programming technique. A novel language processor disambiguates between multiple choices given by the acoustic recognizer to recognize the spoken sentence.
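The lexical access step mentioned in the abstract (string matching by dynamic programming) can be illustrated with a plain edit-distance match between a decoded phoneme string and the entries of a pronunciation lexicon. This is a minimal sketch under stated assumptions: unit costs for insertion, deletion and substitution, and a toy lexicon with hypothetical entries. The system described in the paper propagates confidence scores between stages, so its actual matching costs are likely different.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(ref) + 1))          # distance from empty hyp prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i]                            # distance to empty ref prefix
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,                  # delete h from the hypothesis
                curr[j - 1] + 1,              # insert r into the hypothesis
                prev[j - 1] + (h != r),       # match or substitute
            ))
        prev = curr
    return prev[-1]

def lexical_access(phonemes, lexicon):
    """Return the lexicon word whose pronunciation is closest to phonemes."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))

# Hypothetical two-word lexicon; phoneme symbols are illustrative only.
LEXICON = {
    "pani": ["p", "aa", "n", "i"],
    "rani": ["r", "aa", "n", "i"],
}
```

With this toy lexicon, `lexical_access(["p", "a", "n", "i"], LEXICON)` picks "pani", since it needs one substitution against two for "rani".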
National Conference on Communications | 2013
Jigar Gada; Preeti Rao; K. Samudravijaya
Errors of speech recognition systems occur due to a variety of reasons. It is desirable to have a confidence measure that gives an idea of the accuracy of the decoder output, so that appropriate remedial measures can be taken. In this paper, we compare two approaches to detect incorrect output of a speech recognition system. The first approach employs multiple decoders, and uses a voting method to surmise confidence in the accuracy of the speech recognition system. The second approach uses a single decoder, but judiciously combines information at the segmental as well as supra segmental level to derive a measure of confidence in the output of the decoder. A neural network is trained with three features based on phone duration and one feature based on acoustic score. The output of the neural network is used to estimate the confidence in the output of the decoder. The two approaches are compared for their efficacy in detecting utterances that do not contain a valid input according to the task grammar as well as wrongly recognized valid inputs. It was observed that the second method achieves much better rejection of invalid input utterances as compared to the multi-decoder method, despite decoding a test utterance just once.
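The first approach in the abstract, voting across multiple decoders, can be sketched as follows. The abstract does not specify the voting rule, so this assumes the simplest one: confidence is the fraction of decoders that agree with the most common hypothesis, and an utterance is rejected when that fraction falls below a threshold. The function names and the threshold value are hypothetical.

```python
from collections import Counter

def vote_confidence(hypotheses):
    """Return (majority hypothesis, fraction of decoders agreeing with it)."""
    counts = Counter(hypotheses)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(hypotheses)

def accept(hypotheses, threshold=0.6):
    """Accept the majority output only if enough decoders agree (assumed rule)."""
    best, confidence = vote_confidence(hypotheses)
    return best if confidence >= threshold else None
```

For example, three decoders returning "yes", "yes" and "no" give a confidence of 2/3 for "yes", which passes the assumed threshold of 0.6.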
International Symposium on Chinese Spoken Language Processing | 2006
K. Samudravijaya
This paper describes a recently initiated effort for collection and transcription of read as well as spontaneous speech data in four Indian languages. The completed preparatory work includes the design of phonetically rich sentences, a data acquisition setup for recording speech data over a telephone channel, and a Wizard of Oz setup for acquiring speech data from a spoken dialogue between a caller and the machine in the context of a remote information retrieval task. An account is given of the care taken to collect speech data that is as close to real-world conditions as possible. The current status of the programme and the set of actions planned to achieve the goal are also described.
Journal of Applied Physics | 1984
B. K. Basu; K. Samudravijaya; A. K. Nigam
We have measured the room temperature elastic constants of tetragonal PdPb2 and derived from it the Debye temperature of the material. The Debye temperature evaluated from elastic constants is compared with that obtained from low‐temperature resistivity measurements.
Journal of Low Temperature Physics | 1983
S. Sathish; K. Samudravijaya; B. K. Basu
We have measured longitudinal ultrasonic attenuation along the [110] direction in normal and superconducting states in two single crystals of lead, one made from high-purity lead and the other made with high-purity lead doped with 0.1 at % gold. In both specimens an amplitude-dependent effect in the superconducting state has been observed. The data have been taken in the frequency range from 12 to 108 MHz. In high-purity lead the amplitude-independent ratio αs/αn shows the frequency dependence observed by Randorff and Marshall, whereas in the doped specimen this ratio shows a very small spread with frequency. In both specimens deformation does not change the αs/αn ratio appreciably.
International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013
Tejas Godambe; Nandini Bondale; K. Samudravijaya; Preeti Rao
We describe the development of a continuous speech database in the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, with a variety of characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system with personal mobile handsets, and read specially designed sentence sets. The data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data captured a large amount of nonspeech sounds as well as incompletely spoken words. The speech data was therefore transcribed with additional labels to denote frequently occurring nonspeech sounds, different kinds of incomplete words and invalid words. We characterize the database in terms of the statistics of features such as gender distribution of speakers, phonemic richness, amount of nonspeech sounds, and average sentence and word lengths for both reference and actual sentences.
Archive | 2011
Ganesh Sivaraman; Swapnil Mehta; Neeraj Nabar; K. Samudravijaya
Speaker adaptation is a technique used to improve the recognition accuracy of Automatic Speech Recognition (ASR) systems. Here, we report a study of the impact of online speaker adaptation on the performance of a speaker independent, continuous speech recognition system for the Hindi language. The speaker adaptation is performed using the Maximum Likelihood Linear Regression (MLLR) transformation approach. The ASR system was trained using narrowband speech. The efficacy of the speaker adaptation is studied by using an unrelated speech database. The MLLR transform based speaker adaptation technique is found to significantly improve the accuracy of the Hindi ASR system by 3%.
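In its standard form, MLLR adapts the Gaussian means of the acoustic model with a shared affine transform, replacing each mean mu by A mu + b. The sketch below shows only how such a transform is applied once A and b are known. Estimating them from adaptation data by maximum likelihood is the substantive part of the method and is not shown, and the function name `apply_mllr` is hypothetical, not taken from the paper or any toolkit.

```python
def apply_mllr(means, A, b):
    """Apply one shared affine transform to every Gaussian mean.

    means: list of mean vectors (lists of floats)
    A:     square matrix as a list of rows
    b:     bias vector
    Returns the adapted means A @ mu + b, leaving the inputs unchanged.
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted
```

Because a single transform is shared by many Gaussians, a small amount of speech from a new speaker can shift the whole model, which is what makes the approach suitable for online adaptation.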