Vishu R. Viswanathan
Texas Instruments
Publications
Featured research published by Vishu R. Viswanathan.
international conference on acoustics, speech, and signal processing | 1995
Levent M. Arslan; Alan V. McCree; Vishu R. Viswanathan
We propose three new adaptive noise suppression algorithms for enhancing noise-corrupted speech: smoothed spectral subtraction (SSS), vector quantization of line spectral frequencies (VQ-LSF), and modified Wiener filtering (MWF). SSS is an improved version of the well-known spectral subtraction algorithm, while the other two methods are based on generalized Wiener filtering. We have compared these three algorithms with each other and with spectral subtraction on both simulated noise and actual car noise. All three proposed methods perform substantially better than spectral subtraction, primarily because of the absence of any musical noise artifacts in the processed speech. Listening tests showed a preference for MWF and SSS over VQ-LSF. Also, MWF provides a much higher mean opinion score (MOS) than does spectral subtraction. Finally, VQ-LSF provides a relatively good spectral match to the clean speech and may therefore be better suited for speech recognition.
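As a rough illustration of this family of techniques, here is a minimal Python sketch of magnitude spectral subtraction with temporal gain smoothing. It is not the paper's SSS algorithm; the frame size, smoothing factor, and spectral floor below are illustrative assumptions.

```python
# Minimal sketch of spectral subtraction with a spectral floor and
# frame-to-frame gain smoothing; parameter values are assumptions.
import numpy as np

def smoothed_spectral_subtraction(noisy, noise_mag, frame=256, hop=128,
                                  alpha=0.9, floor=0.05):
    """noisy: 1-D noisy signal; noise_mag: noise magnitude spectrum
    estimate (length frame//2 + 1), e.g. averaged over non-speech frames."""
    window = np.hanning(frame)  # Hann at 50% overlap sums to a constant
    out = np.zeros(len(noisy))
    prev_gain = np.ones(frame // 2 + 1)
    for start in range(0, len(noisy) - frame, hop):
        spec = np.fft.rfft(noisy[start:start + frame] * window)
        mag = np.abs(spec)
        # Subtract the noise estimate; the floor keeps the gain from
        # collapsing to isolated spikes that sound like musical noise.
        gain = np.maximum(mag - noise_mag, floor * mag) / np.maximum(mag, 1e-12)
        # Smooth the gain across frames to further suppress artifacts.
        gain = alpha * prev_gain + (1 - alpha) * gain
        prev_gain = gain
        out[start:start + frame] += np.fft.irfft(spec * gain, frame)
    return out
```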
international conference on acoustics, speech, and signal processing | 1997
Erdal Paksoy; Alan V. McCree; Vishu R. Viswanathan
In general, a variable-rate coder can achieve the same speech quality as a fixed-rate coder while reducing the average bit rate. We have developed a variable-rate multimodal speech coder with an average bit rate of 3 kb/s for a speech activity factor of 80% and quality comparable to the GSM full rate coder. The coder has four coding modes and uses a robust classification method involving the pitch gain, zero crossings, and a peakiness measure. The coder also employs a novel gain-matched analysis-by-synthesis technique for very low rate coding of unvoiced frames and an improved noise-level-dependent postfilter. This paper describes the details of our algorithm and presents the results of subjective listening tests.
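A hedged sketch of the classification idea: the paper names pitch gain, zero crossings, and peakiness as the features, but the thresholds and decision logic below are invented for illustration.

```python
# Hypothetical four-way frame classifier using the three features the
# paper names; all thresholds are illustrative, not the paper's.
import numpy as np

def peakiness(frame):
    # Ratio of RMS to mean absolute value; high for impulsive frames.
    rms = np.sqrt(np.mean(frame ** 2))
    return rms / (np.mean(np.abs(frame)) + 1e-12)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs with a sign change.
    return np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0

def classify_frame(frame, pitch_gain):
    """pitch_gain: normalized autocorrelation at the pitch lag,
    assumed computed elsewhere. Returns one of four coding modes."""
    if np.mean(frame ** 2) < 1e-6:          # assumed silence threshold
        return "silence"
    if pitch_gain > 0.7:                    # strongly periodic frame
        return "voiced"
    if zero_crossing_rate(frame) > 0.25 and pitch_gain < 0.3:
        return "unvoiced"
    return "transition"                     # onsets, mixed frames
```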
international conference on acoustics, speech, and signal processing | 1992
Stephen S. Oh; Vishu R. Viswanathan; Panos E. Papamichalis
The authors present the results of their research on developing a hands-free voice communication system with a microphone array for use in an automobile environment. The goal of this research is to develop a speech acquisition and enhancement system so that a speech recognizer can be used reliably inside a noisy automobile, for digital cellular phone applications. Speech data were collected using a microphone array and a digital audio tape (DAT) recorder inside a real car under several idling and driving conditions, and processed using delay-and-sum and adaptive beamforming algorithms. Performance criteria including signal-to-noise ratio and speech recognition error rate were evaluated for the processed data. Detailed performance results presented show that the microphone array is superior to a single microphone.
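For reference, a minimal delay-and-sum beamformer can be sketched as follows; the linear-array geometry and frequency-domain fractional delay are generic illustrations, not the paper's processing chain.

```python
# Minimal delay-and-sum beamformer sketch for a linear microphone array.
import numpy as np

def delay_and_sum(mics, fs, mic_positions, angle_deg, c=343.0):
    """mics: (n_mics, n_samples) array; mic_positions: positions in
    meters along the array axis; angle_deg: steering angle from broadside."""
    mic_positions = np.asarray(mic_positions, dtype=float)
    theta = np.deg2rad(angle_deg)
    delays = mic_positions * np.sin(theta) / c     # seconds, per mic
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mics, delays):
        # Apply a fractional-sample delay in the frequency domain so
        # signals from the steered direction add coherently.
        spec = np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(spec, n)
    return out / len(mics)
```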
international conference on acoustics, speech, and signal processing | 1999
Yeshwant K. Muthusamy; Rajeev Agarwal; Yifan Gong; Vishu R. Viswanathan
With the advances in speech recognition and wireless communications, the possibilities for information access in the automobile have expanded significantly. We describe four system prototypes: (i) voice dialing, (ii) Internet information retrieval (called InfoPhone), (iii) voice e-mail, and (iv) car navigation. These systems are designed primarily for hands-busy, eyes-busy conditions, use speaker-independent speech recognizers, and can be used with a restricted display or no display at all. The voice-dialing prototype incorporates our hands-free speech recognition engine, which is very robust in noisy car environments (1% WER and 3% string error rate on the continuous digit recognition task at 0 dB SNR). The InfoPhone, voice e-mail, and car navigation prototypes use a client-server architecture with the client designed to be resident on a phone or other hand-held device.
international conference on acoustics, speech, and signal processing | 2000
J.C. de Martin; Takahiro Unno; Vishu R. Viswanathan
This paper describes new techniques for concealing frame erasures for CELP-based speech coders. Two main approaches were followed: interpolative, where both past and future information are used to reconstruct the missing data, and repetition-based, where no future information is required. Key features of the repetition-based approach include improved muting, pitch delay jittering, and LPC bandwidth expansion. The interpolative approach can be employed in voice over IP scenarios at no extra cost in terms of delay. Applied to the ITU-T G.729 ACELP 8 kb/s speech coding standard, both interpolation- and repetition-based techniques outperform standard concealment in informal listening tests.
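Two elements of the repetition-based approach, gradual muting and LPC bandwidth expansion, can be sketched roughly as below; the attenuation and expansion factors are assumptions, not the values used in the paper.

```python
# Sketch of repetition-based concealment building blocks; scale
# factors are illustrative assumptions.
import numpy as np

def expand_lpc_bandwidth(lpc, gamma=0.97):
    """Scale LPC coefficients a_k by gamma**k, which moves the poles
    toward the origin so a repeated filter does not ring unnaturally."""
    return lpc * gamma ** np.arange(len(lpc))

def conceal_erasure(last_good_excitation, n_consecutive_losses,
                    mute_per_frame=0.8):
    # Repeat the last good excitation, attenuated more for each
    # consecutive lost frame so long erasures fade toward silence.
    gain = mute_per_frame ** n_consecutive_losses
    return last_good_excitation * gain
```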
international conference on acoustics, speech, and signal processing | 1999
Erdal Paksoy; J. Carlos de Martin; Alan V. McCree; C.G. Gerlach; Anand K. Anandakumar; Wai-Ming Lai; Vishu R. Viswanathan
We have developed an adaptive multi-rate (AMR) speech coder designed to operate under the GSM digital cellular full rate (22.8 kb/s) and half rate (11.4 kb/s) channels and to maintain high quality in the presence of highly varying background noise and channel conditions. Within each total rate, several codec modes with different source/channel bit rate allocations are used. The speech coders in each codec mode are based on the CELP algorithm operating at rates ranging from 11.85 kb/s down to 5.15 kb/s, where the lowest rate coder is a source controlled multi-modal speech coder. The decoders monitor the channel quality at both ends of the wireless link using the soft values for the received bits and assist the base station in selecting the codec mode that is appropriate for a given channel condition. The coder was submitted to the GSM AMR standardization competition and met the qualification requirements in an independent formal MOS test.
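The mode-selection idea can be caricatured as follows; the quality metric, thresholds, and four-level mode ladder are hypothetical, standing in for the soft-value channel monitoring described above.

```python
# Hypothetical codec-mode selection from a channel-quality estimate
# derived from soft bit values; thresholds are invented. Mode 0 uses
# the lowest source rate (most channel protection), mode 3 the highest.
def select_mode(soft_bits, thresholds=(0.5, 1.0, 1.5)):
    """soft_bits: sequence of soft-decision magnitudes from the
    channel decoder; a larger mean magnitude means a cleaner channel."""
    quality = sum(abs(b) for b in soft_bits) / len(soft_bits)
    return sum(quality >= t for t in thresholds)   # mode index 0..3
```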
international conference on acoustics, speech, and signal processing | 2000
Anand K. Anandakumar; Alan V. McCree; Vishu R. Viswanathan
Diversity schemes include information about packet n in future packets or send it via separate paths; if packet n is lost, it is reconstructed from the redundant information included in future packets or received via separate paths. This paper presents CELP-based diversity schemes for voice-over-packet applications. The diversity schemes reduce the impact of packet losses while being efficient in terms of both bandwidth requirement and computational complexity. With our diversity schemes, transmission schemes that allocate bandwidth resources among diversity stages during congestion give significantly better performance, for the same bandwidth usage, than schemes that use no diversity during congestion.
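A minimal sketch of the first kind of diversity, where packet n+1 piggybacks a coarse redescription of frame n, might look like this; the packet layout and decoder logic are assumptions for illustration.

```python
# Sketch of loss recovery via redundancy carried in the next packet;
# field layout and decoding policy are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    seq: int
    frame: bytes                 # primary encoding of frame n
    redundant: Optional[bytes]   # coarse encoding of frame n-1

def decode_stream(received):
    """received: dict mapping seq -> Packet for packets that arrived."""
    out = {seq: ("primary", pkt.frame) for seq, pkt in received.items()}
    # Patch isolated losses from the redundancy in the following packet.
    for seq, pkt in received.items():
        if seq - 1 not in out and pkt.redundant is not None:
            out[seq - 1] = ("redundant", pkt.redundant)
    return out
```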
international conference on acoustics, speech, and signal processing | 1999
Jacek Stachurski; Alan V. McCree; Vishu R. Viswanathan
A number of coding techniques have been reported to achieve near-toll-quality synthesized speech at bit rates around 4 kb/s. These include variants of code excited linear prediction (CELP), sinusoidal transform coding (STC), and multi-band excitation (MBE). While CELP has been an effective technique for bit rates above 6 kb/s, the STC, MBE, waveform interpolation (WI), and mixed excitation linear prediction (MELP) models seem to be attractive at bit rates below 3 kb/s. We present a system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit rates of 1.6-2.4 kb/s. We have enhanced the MELP model, producing significantly higher speech quality at bit rates above 2.4 kb/s. We describe the development and testing of a high-quality 4 kb/s MELP coder.
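As background, the "mixed excitation" at the heart of MELP combines a periodic pulse train and noise with per-band voicing strengths. The toy sketch below illustrates only that core idea; the band edges and mixing are illustrative, and the actual coder is far more elaborate.

```python
# Toy illustration of MELP-style mixed excitation: pulse train and
# noise mixed per frequency band; band edges and strengths are assumed.
import numpy as np

def mixed_excitation(n, pitch_lag, band_voicing, fs=8000,
                     band_edges=(0, 500, 1000, 2000, 3000, 4000)):
    """pitch_lag: pitch period in samples; band_voicing: one voicing
    strength in [0, 1] per band (here five bands)."""
    pulses = np.zeros(n)
    pulses[::pitch_lag] = 1.0            # impulse train at the pitch rate
    noise = np.random.randn(n)
    P, N = np.fft.rfft(pulses), np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    mix = np.zeros_like(P)
    for (lo, hi), v in zip(zip(band_edges[:-1], band_edges[1:]), band_voicing):
        band = (freqs >= lo) & (freqs < hi)
        # Strongly voiced bands take the pulse spectrum, unvoiced the noise.
        mix[band] = v * P[band] + (1 - v) * N[band]
    return np.fft.irfft(mix, n)
```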
international conference on acoustics, speech, and signal processing | 1998
S. Yeldener; J.C. de Martin; Vishu R. Viswanathan
There is currently a great deal of interest in the development of speech coding algorithms capable of delivering toll quality at 4 kb/s and below. For synthesizing high-quality speech, accurate representation of the voiced portions of speech is essential. At bit rates of 4 kb/s and below, conventional code excited linear prediction (CELP) may not provide the appropriate degree of periodicity. It has been shown that good-quality low-bit-rate speech coding can be obtained by frequency domain techniques such as sinusoidal transform coding (STC), multi-band excitation (MBE), mixed excitation linear prediction (MELP), and multi-band LPC (MB-LPC) vocoders. In this paper, a speech coding algorithm based on an improved version of MB-LPC is presented. Main features of this algorithm include multi-stage time/frequency pitch estimation and an improved mixed voicing representation. An efficient quantization scheme for the spectral amplitudes of the excitation, called formant-weighted vector quantization, is also used. This improved coder, called mixed sinusoidally excited linear prediction (MSELP), yields an unquantized model whose speech quality is better than that of 32 kb/s ADPCM. Initial efforts toward a fully quantized 4 kb/s coder, although not yet successful in achieving the toll quality goal, have produced good output speech quality.
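A coarse-to-fine pitch search, loosely in the spirit of the multi-stage time/frequency estimation mentioned above, might be sketched as follows; a real implementation would refine against a harmonic spectral fit rather than re-searching the autocorrelation.

```python
# Two-stage pitch estimate: coarse autocorrelation search, then a
# local refinement. Search ranges are assumptions; the frame must be
# longer than the maximum lag (about fs/fmin samples).
import numpy as np

def coarse_pitch(frame, fs=8000, fmin=60.0, fmax=400.0):
    lo, hi = int(fs / fmax), int(fs / fmin)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return lo + int(np.argmax(ac[lo:hi]))

def refine_pitch(frame, lag, radius=2):
    # Re-search a small neighborhood of the coarse lag.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = max(1, lag - radius), lag + radius + 1
    return lo + int(np.argmax(ac[lo:hi]))
```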
international conference on acoustics, speech, and signal processing | 1998
Michael W. Macon; Alan V. McCree; Wai-Ming Lai; Vishu R. Viswanathan
It is well-known that an impulse-excited, all-pole filter is capable of representing many physical phenomena, including the oscillatory modes of percussion musical instruments like woodblocks, xylophones, or chimes. In contrast to the more common application of all-pole models to speech, however, practical problems arise in music synthesis due to the location of poles very close to the unit circle. The objective of this work was to develop algorithms to find excitation and filter parameters for synthesis of percussion instrument sounds using only an inexpensive all-pole filter chip (TI TSP50C1x). The paper describes analysis methods for dealing with pole locations near the unit circle, as well as a general method for modeling the transient attack characteristics of a particular sound while independently controlling the amplitudes of each oscillatory mode.
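A single oscillatory mode of such an instrument corresponds to a conjugate pole pair just inside the unit circle. The sketch below synthesizes a sound from a few such two-pole resonators; the frequencies, pole radii, and amplitudes are made-up values, and the chip-specific analysis issues the paper addresses are not modeled.

```python
# One decaying mode as an impulse-excited two-pole resonator:
# y[i] = x[i] - a1*y[i-1] - a2*y[i-2], with x an impulse of height amp.
import numpy as np

def resonator_mode(freq, decay_r, amp, n, fs=8000):
    """Conjugate pole pair at radius decay_r (close to, but inside,
    the unit circle) and angle 2*pi*freq/fs."""
    theta = 2 * np.pi * freq / fs
    a1, a2 = -2 * decay_r * np.cos(theta), decay_r ** 2
    y = np.zeros(n)
    y[0] = amp                    # impulse excitation at time zero
    y[1] = -a1 * y[0]
    for i in range(2, n):
        y[i] = -a1 * y[i - 1] - a2 * y[i - 2]
    return y

# A woodblock-like sound as a sum of a few independently decaying modes:
sound = sum(resonator_mode(f, 0.999, a, 8000)
            for f, a in [(800, 1.0), (1700, 0.5), (2500, 0.3)])
```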