
Publication


Featured research published by P. Vijayalakshmi.


IEEE Region 10 Conference | 2009

Selective pole modification-based technique for the analysis and detection of hypernasality

P. Vijayalakshmi; T. Nagarajan; Ra. V. Jayanthan

Inadequate velopharyngeal closure, due to structural or neurological problems, allows air to pass through the nasal cavity, introducing inappropriate nasal resonances during speech production and resulting in hypernasal speech. Our previous work on the acoustic analysis of hypernasal speech, using the group delay function for the detection of hypernasality, showed stable effects of vowel nasalization in the low-frequency region. In the current work, a linear prediction (LP)-based pole modification technique is used to detect hypernasality. A lower-order LP analysis frequently fails to resolve two closely spaced formants; on the other hand, increasing the LP order may introduce spurious peaks, due to pitch harmonics, in the low-frequency region even for normal speech. To disambiguate these cases, in the present study a higher-order LP spectrum is used, the pole corresponding to the strongest peak in the low-frequency region is defocused, and a new signal is resynthesized. The maximum of the cross-correlation between the input and the resynthesized speech signal is taken as the measure for the detection of hypernasality. Speech data from 25 hypernasal speakers with unrepaired cleft lip and palate and 25 normal speakers are considered for the analysis. The performance on the hypernasal/normal decision task is 100%.
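
To make the detection pipeline concrete, the sketch below performs a high-order LP analysis on a single voiced frame, shrinks the radius of the strongest low-frequency pole (the defocusing step), resynthesizes the frame from the LP residual, and returns the peak normalized cross-correlation with the input. This is a minimal illustration of the idea, not the authors' implementation; the LP order of 30, the 1 kHz low-frequency cutoff, and the shrink factor of 0.6 are assumed values.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lp_coefficients(frame, order):
        # Autocorrelation-method LP; returns A(z) = [1, -a1, ..., -a_order].
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return np.concatenate(([1.0], -a))

    def defocus_strongest_low_pole(a, fs, f_cut=1000.0, shrink=0.6):
        # Shrink the radius of the pole pair closest to the unit circle
        # among poles whose frequency lies below f_cut.
        poles = np.roots(a)
        low = [p for p in poles if 0.0 < np.angle(p) < 2.0 * np.pi * f_cut / fs]
        if not low:
            return a
        target = max(low, key=np.abs)
        mask = np.isclose(poles, target) | np.isclose(poles, np.conj(target))
        return np.real(np.poly(np.where(mask, shrink * poles, poles)))

    def hypernasality_score(frame, fs, order=30):
        # Inverse-filter to the LP residual, resynthesize through the
        # defocused filter, and return peak normalized cross-correlation.
        a = lp_coefficients(frame, order)
        residual = lfilter(a, [1.0], frame)
        resynth = lfilter([1.0], defocus_strongest_low_pole(a, fs), residual)
        xc = np.correlate(frame, resynth, mode="full")
        return np.max(np.abs(xc)) / np.sqrt((frame @ frame) * (resynth @ resynth))

A score near 1 means the defocused pole carried little energy; a lower score suggests the removed low-frequency resonance was prominent, as expected for nasalized speech.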


National Conference on Communications | 2013

Development and evaluation of unit selection and HMM-based speech synthesis systems for Tamil

Ramani Boothalingam; V. Sherlin Solomi; Anushiya Rachel Gladston; S. Lilly Christina; P. Vijayalakshmi; Nagarajan Thangavelu; Hema A. Murthy

An unrestricted text-to-speech system is expected to produce a speech signal, corresponding to the given text in a language, that is highly intelligible to a human listener. Presently, unit selection-based synthesis (USS) and statistical parametric synthesis are the state-of-the-art techniques for this task. Earlier, in [3], a concatenative synthesizer was developed for Tamil using 12 hrs of speech data, and it was shown that the syllable is the better subword unit. The current work focuses on building FestVox voices using the phoneme/CV unit as the subword unit, for a reduced amount of speech data (5 hrs), and on comparing their performance in terms of quality. Further, the focus is to compare the performance of these synthesizers with that of the well-known HMM-based speech synthesizer. Between the phoneme- and CV-based systems built, although there are bound to be more concatenation points in a phoneme-based system, it is observed that it outperforms the CV-based system, with an MOS of 2.96, primarily because more examples are available for each phoneme in the given amount of speech data. Further, an HMM-based speech synthesis system is developed using 5 hrs of data. Although the speaker identity is not completely preserved in the synthesized speech, there are no audible glitches, and the quality obtained is much better than that of the phoneme/CV-based systems, with an MOS of 3.86. Further, the footprint of the system is drastically reduced, from 1 GB for the USS system to 6 MB for the HMM-based speech synthesis system.
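
For readers unfamiliar with the USS side of this comparison, the heart of a unit-selection engine is a Viterbi search that balances a target cost (how well a stored unit fits the desired context) against a concatenation cost (how smoothly adjacent units join). The sketch below shows that search in generic form; the cost functions and unit representation are placeholders, not those of the FestVox voices discussed above.

    from typing import Callable, List, Sequence

    def unit_selection(targets: Sequence,
                       candidates: List[list],
                       target_cost: Callable,
                       concat_cost: Callable) -> list:
        # Viterbi search: choose one candidate unit per target position,
        # minimizing the sum of target costs plus pairwise join costs.
        n = len(targets)
        best = [[target_cost(targets[0], c) for c in candidates[0]]]
        back = [[-1] * len(candidates[0])]
        for i in range(1, n):
            row, ptr = [], []
            for cand in candidates[i]:
                joins = [best[i - 1][k] + concat_cost(prev, cand)
                         for k, prev in enumerate(candidates[i - 1])]
                k = min(range(len(joins)), key=joins.__getitem__)
                row.append(joins[k] + target_cost(targets[i], cand))
                ptr.append(k)
            best.append(row)
            back.append(ptr)
        # Trace back the cheapest path through the candidate lattice.
        j = min(range(len(best[-1])), key=best[-1].__getitem__)
        path = [j]
        for i in range(n - 1, 0, -1):
            j = back[i][j]
            path.append(j)
        path.reverse()
        return [candidates[i][j] for i, j in enumerate(path)]

The trade-off the abstract describes follows directly: phoneme units create more joins (more concatenation-cost terms) but each phoneme has many stored examples, so individual target costs are lower.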


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

Analysis on acoustic similarities between Tamil and English phonemes using product of likelihood-Gaussians for an HMM-based mixed-language synthesizer

V. Sherlin Solomi; S. Lilly Christina; G. Anushiya Rachel; B. Ramani; P. Vijayalakshmi; T. Nagarajan

A mixed-language (polyglot) synthesizer is one that synthesizes intelligible multilingual speech in a single speaker's voice with appropriate pronunciations. Two main requirements of a mixed-language synthesizer are that (i) the transition from one language to another (language switching) and (ii) the influence of one language on another should not be perceivable. In this regard, in [1], while developing a bilingual text-to-speech (TTS) system for Mandarin and English, the minimum Kullback-Leibler divergence (KLD) criterion, applied state-wise to context-independent hidden Markov models (HMMs), was used to cluster the states of acoustically similar phonemes across the two languages. In the current work, using context-independent HMMs trained separately for two languages, namely Tamil and English, an attempt has been made to find the acoustically similar phonemes using the product of Gaussians (PoG) in the log-likelihood space. A speech corpus with Tamil and English data, uttered by the same speaker, is used for this task. The quality of the speech synthesized by the mixed-language synthesizer is assessed subjectively, and a mean opinion score of 3.49 is obtained when acoustically similar phonemes alone are merged. In addition, analyses are carried out to quantify the amount of language switching and the influence of one language on the other.
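
One standard way to realize a product-of-Gaussians similarity is to integrate the product of two Gaussian densities, which has the closed form N(mu1; mu2, var1 + var2); the higher the value, the more the two distributions overlap. The sketch below applies this to diagonal-covariance state Gaussians. Whether this matches the paper's exact PoG statistic is an assumption, and the toy phoneme parameters are invented.

    import numpy as np

    def log_pog_similarity(mu1, var1, mu2, var2):
        # Log of the integral of the product of two diagonal-covariance
        # Gaussian densities: log N(mu1; mu2, var1 + var2). Higher => closer.
        mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
        mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
        v = var1 + var2
        return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (mu1 - mu2) ** 2 / v)

    # Toy usage: find the closest English phoneme model for a Tamil one.
    tamil_a = (np.zeros(3), np.ones(3))               # (mean, variance), invented
    english = {"aa": (0.1 * np.ones(3), np.ones(3)),
               "s":  (3.0 * np.ones(3), np.ones(3))}
    scores = {ph: log_pog_similarity(*tamil_a, m, v)
              for ph, (m, v) in english.items()}
    print(max(scores, key=scores.get))                # -> 'aa'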


IEEE Region 10 Conference | 2013

Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages

B. Ramani; M. P. Actlin Jeeva; P. Vijayalakshmi; T. Nagarajan

A multilingual text-to-speech (TTS) system synthesizes, for a given text, a speech signal in multiple languages that is intelligible to a human listener. However, given a mixed-language text, the synthesized output is observed to have speaker switching at the language-switching points, which is annoying to listeners. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates synthesized speech in multiple languages with a single voice identity. This can be achieved by inherent voice conversion during synthesis, or by using voice conversion to convert the multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using a hidden Markov model (HMM)-based synthesis technique. Here, speech data collected from native speakers of the Indian languages Telugu, Malayalam, and Hindi are converted to the voice identity of a native Tamil speaker. Building an HMM-based synthesizer on the resulting polyglot corpus enables the system to synthesize speech for any given text in any of these languages or in mixed-language text. The performance of the polyglot speech synthesizer is evaluated for the similarity of the synthesized speech to the source or target speaker by an ABX listening test. The scores obtained show that the percentage of similarity to the target Tamil speaker varies from 73% to 86%. Further, the performance of the system is analyzed for speaker switching.
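
The GMM-based conversion mentioned here is conventionally done by fitting a GMM to time-aligned joint source-target feature vectors and applying a minimum-mean-square-error mapping per frame. The sketch below follows that textbook formulation and assumes the parallel features have already been aligned (e.g., by DTW); it is not necessarily the exact variant used in the paper.

    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def train_joint_gmm(src, tgt, n_mix=8, seed=0):
        # Fit a GMM on time-aligned joint [source; target] feature vectors.
        return GaussianMixture(n_components=n_mix, covariance_type="full",
                               random_state=seed).fit(np.hstack([src, tgt]))

    def convert_frame(gmm, x):
        # MMSE mapping of one source frame into the target speaker's space.
        d = len(x)
        mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]
        s_xx = gmm.covariances_[:, :d, :d]
        s_yx = gmm.covariances_[:, d:, :d]
        # Mixture responsibilities from the marginal source-side GMM.
        logp = np.array([np.log(w) + multivariate_normal.logpdf(x, m, s)
                         for w, m, s in zip(gmm.weights_, mu_x, s_xx)])
        p = np.exp(logp - logp.max())
        p /= p.sum()
        y = np.zeros(d)
        for i, pi in enumerate(p):
            y += pi * (mu_y[i] + s_yx[i] @ np.linalg.solve(s_xx[i], x - mu_x[i]))
        return y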


International Conference on Recent Trends in Information Technology | 2012

Improving speech intelligibility in cochlear implants using vocoder-centric acoustic models

Anushiya Rachel Gladston; P. Vijayalakshmi; Nagarajan Thangavelu

The cochlear implant is a prosthetic device used to replace a damaged inner ear. It consists of an externally worn speech processor and an internal receiver-stimulator. Since the cochlear implant is patient-specific and system-specific, in the current work a lab model of the speech processor, based on various vocoder models, is designed to analyse the effect of system-specific parameters, such as filter bandwidth, number of channels, and vocal excitation, on speech intelligibility. Initially, a formant vocoder is designed and used in the analysis and synthesis of English vowels. A channel vocoder is then developed for the same task and extended to the analysis and synthesis of words from the Lexical Neighbourhood Test and sentences from the TIMIT database. The effect of the number of channels on synthetic speech quality is analysed, and a 21-channel vocoder is found to yield the best response, with a mean opinion score (MOS) of 4 out of 5 for vowels and 3.4 for sentences. Formant trajectories and the CosH distance are also used to validate the speech intelligibility. The influence of the glottal pulse on speech intelligibility is analysed, and the synthetic speech is found to sound more natural with a glottal pulse train than with an impulse train, with an MOS of 4.2 for vowels and 4 for sentences.
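
A channel vocoder of the kind described can be sketched as a bank of band-pass filters, per-channel envelope extraction, and re-modulation of an excitation signal. The toy version below uses 21 log-spaced Butterworth channels and a simple pulse-train excitation; the band edges, filter orders, and envelope-smoothing cutoff are assumptions, and a sampling rate of at least 16 kHz is assumed so the 7 kHz upper edge is valid.

    import numpy as np
    from scipy.signal import butter, sosfilt, sosfiltfilt

    def channel_vocoder(x, fs, n_ch=21, f_lo=100.0, f_hi=7000.0, f0=120.0):
        # Analysis: band-pass filter bank; per-channel envelope extraction.
        # Synthesis: envelopes re-modulate a pulse-train excitation.
        edges = np.geomspace(f_lo, f_hi, n_ch + 1)    # log-spaced band edges
        pulses = np.zeros(len(x))
        pulses[::int(fs / f0)] = 1.0                  # crude pulse-train excitation
        sos_env = butter(2, 50.0, btype="low", fs=fs, output="sos")
        out = np.zeros(len(x))
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            env = sosfiltfilt(sos_env, np.abs(sosfilt(sos, x)))  # smoothed envelope
            out += sosfilt(sos, env * pulses)         # excite, then band-limit
        return out

Replacing the pulse train with a shaped glottal-pulse waveform is the natural hook for the glottal-excitation experiment the abstract reports.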


IEEE Students' Technology Symposium | 2011

Speaker identification using utterances corresponding to speaker-specific text

B. Bharathi; P. Vijayalakshmi; T. Nagarajan

In speaker recognition tasks, the main reason for reduced accuracy is closely resembling speakers in the acoustic space. The conventional GMM-based modelling technique captures unique features along with features common to various classes. Further, it ignores knowledge of the phonetic content of the speech. To increase the discriminative power of the classifier, the system must be able to use only those features of a given speaker that are unique with respect to his/her acoustically closely resembling speakers. This paper proposes a technique to reduce these confusion errors by finding speaker-specific phonemes and formulating a text from the subset of phonemes that are unique, for the speaker identification task. Experiments have been conducted on a speaker identification task using speech data of 192 female speakers from the TIMIT corpus. The performance of the proposed system is compared with that of a conventional GMM-based technique, and a significant improvement is noted.
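
The phoneme-selection step can be pictured as a margin computation: a phoneme is speaker-specific if the speaker's own model scores it far above the best competing speaker's model. The sketch below assumes the per-phoneme, per-speaker average log-likelihoods have already been computed; the ll table is a made-up stand-in for that step.

    def speaker_specific_phonemes(ll, speaker, top_n=5):
        # ll[phoneme][speaker_id] = average log-likelihood of the target
        # speaker's frames of that phoneme under each speaker's model.
        # A phoneme is speaker-specific when the speaker's own model
        # beats the best competing model by a wide margin.
        margins = {}
        for ph, scores in ll.items():
            rival = max(v for s, v in scores.items() if s != speaker)
            margins[ph] = scores[speaker] - rival
        return sorted(margins, key=margins.get, reverse=True)[:top_n]

    # Invented likelihood table for three speakers and two phonemes.
    ll = {"aa": {"s1": -40.0, "s2": -41.0, "s3": -42.0},
          "sh": {"s1": -35.0, "s2": -48.0, "s3": -50.0}}
    print(speaker_specific_phonemes(ll, "s1", top_n=1))   # -> ['sh']

A test text would then be composed from the top-ranked phonemes, so that the utterances a speaker provides emphasize exactly the sounds that discriminate them from their closest competitors.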


IEEE Students' Technology Symposium | 2010

Pole-focused linear prediction-based spectrogram for coarticulation analysis

Anu Abraham; P. Vijayalakshmi; T. Nagarajan

Coarticulation refers to the influence of the articulation of one sound on the articulation of another sound in the same utterance [1]. The effects of coarticulation in an utterance have to be analyzed when developing a triphone-based speech recognition system, a text-to-speech (TTS) system, etc. The conventional Fourier transform-based spectrogram fails to capture the formant transitions between two adjacent phonemes. Further, depending on the size of the analysis window, either horizontal or vertical striations are observed, which affect the clarity of the spectrogram. In this work, to overcome these problems, a linear prediction-based (model-based) spectrogram is suggested and implemented. To further improve the clarity and to make the formant trajectories more prominent, the poles of the LP analysis are focused, and a modified LP-based spectrogram is derived. The resulting pole-focused LP-based spectrogram is found to be a better candidate for analyzing coarticulation effects. Using this technique, a MATLAB-based, user-friendly tool is developed for coarticulation analysis.
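
A minimal version of the pole-focused LP spectrogram can be built by computing frame-wise LP spectra and pushing every pole outward to some minimum radius before evaluating the spectrum, which sharpens the formant ridges. In the sketch below the frame length, hop, LP order, and focus radius of 0.95 are assumed values, not those of the paper.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import freqz, get_window

    def pole_focused_lp_spectrogram(x, fs, order=14, frame=400, hop=160,
                                    focus=0.95, nfft=512):
        # Frame-wise LP spectra with every pole pushed outward to at
        # least radius `focus`, sharpening the formant ridges.
        win = get_window("hamming", frame)
        columns = []
        for start in range(0, len(x) - frame, hop):
            seg = x[start:start + frame] * win
            r = np.correlate(seg, seg, mode="full")[frame - 1:frame + order]
            a = np.concatenate(([1.0], -solve_toeplitz((r[:-1], r[:-1]), r[1:])))
            poles = np.roots(a)
            radii = np.abs(poles)
            poles *= np.maximum(radii, focus) / radii    # focus the poles
            _, h = freqz([1.0], np.real(np.poly(poles)), worN=nfft, fs=fs)
            columns.append(20.0 * np.log10(np.abs(h) + 1e-12))
        return np.array(columns).T   # (frequency bins x frames); plot with imshow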


IEEE Region 10 Conference | 2009

Cochlear implant models based on critical band filters

P. Vijayalakshmi; P. Mukesh Kumar; Ra. V. Jayanthan; T. Nagarajan

Cochlear implants (CIs) are electrical prostheses that partially replace the function of the human ear. As cochlear implants are system-specific, there is a need to simulate the system parameters prior to implantation. In the current work, we consider system-specific parameters, such as the number of channels and the type of filters, by developing a uniform-bandwidth-filter-based acoustic CI model and an auditory-model-based CI system with frequency-band spacing similar to the critical bands of the auditory system. Acoustic CI simulations are generated for all the vowels and for words from List 1 of the Lexical Neighborhood Test using waveform and feature-extraction strategies. The first four formant frequencies of each vowel are used as features. A closed-set listening test is conducted and a comparative study is made among the systems developed. The perceptual quality of the speech is rated on a 5-point scale. The acoustic CI simulations of all the words generated by the critical-band-filter-based CI system showed a rating of 4–5, as opposed to 2–4 for the uniform-filter-based CI system.
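
The contrast between uniform and critical-band channel spacing is easy to see numerically: spacing channel edges uniformly on a Bark-like auditory scale packs narrow channels at low frequencies and wide ones at high frequencies. The sketch below uses the Traunmüller approximation to the Bark scale; the paper's exact critical-band mapping and frequency range are not specified here, so both are assumptions.

    import numpy as np

    def hz_to_bark(f):
        # Traunmüller's approximation to the Bark scale.
        return 26.81 * f / (1960.0 + f) - 0.53

    def bark_to_hz(z):
        return 1960.0 * (z + 0.53) / (26.28 - z)

    def critical_band_edges(n_ch, f_lo=100.0, f_hi=7000.0):
        # Channel edges equally spaced in Bark, mimicking critical bands.
        z = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_ch + 1)
        return bark_to_hz(z)

    print(np.round(critical_band_edges(8)))          # dense at low frequencies
    print(np.round(np.linspace(100.0, 7000.0, 9)))   # uniform edges, for contrast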


IEEE Region 10 Conference | 2014

Performance comparison of KLD and PoG metrics for finding the acoustic similarity between phonemes for the development of a polyglot synthesizer

V. Sherlin Solomi; M. Saranya; G. Anushiya Rachel; P. Vijayalakshmi; T. Nagarajan

A polyglot synthesizer is a text-to-speech synthesis system that converts a mixed-language text into speech in a single speaker's voice. The straightforward way to develop such a system is to build multiple language-specific synthesizers, or to build a single synthesizer after merging common phonemes. In these cases, either language switching between languages or the influence of one language's phonemes on the other in the synthetic speech is unavoidable. In this paper, to strike a compromise between the amount of language switching and language influence, Kullback-Leibler divergence (KLD)-based and product of likelihood-Gaussians (PoG)-based metrics are used to identify acoustically similar phonemes for the languages Tamil and English. Separate HMM-based polyglot synthesizers are built by merging the acoustically similar phones identified by these two metrics. The performance of these synthesizers is evaluated by means of the degradation mean opinion score (DMOS). Analysis shows that the synthesizer using the PoG-based metric performs better, with a DMOS of 3.54, 8% language switching, and 16.6% language influence.
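
For comparison with the PoG overlap score sketched earlier, the KLD between two Gaussians also has a closed form; for diagonal covariances it reduces to a per-dimension sum, and a symmetrized version is commonly used since KLD itself is asymmetric. A minimal sketch follows; how the paper aggregates state-wise divergences into a phoneme-level metric is not reproduced here.

    import numpy as np

    def kl_gaussian_diag(mu1, var1, mu2, var2):
        # KL( N(mu1, var1) || N(mu2, var2) ) for diagonal covariances.
        mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
        mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
        return 0.5 * np.sum(np.log(var2 / var1)
                            + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

    def symmetric_kl(mu1, var1, mu2, var2):
        # Symmetrized divergence; smaller values mean more similar states.
        return (kl_gaussian_diag(mu1, var1, mu2, var2)
                + kl_gaussian_diag(mu2, var2, mu1, var1))

Note the direction of the two metrics: candidate pairs are merged when the symmetric KLD is small, or when the PoG overlap score is large.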


International Conference on Recent Trends in Information Technology | 2016

Sign language to speech conversion

P. Vijayalakshmi; M Aarthi

Human beings interact with each other to convey their ideas, thoughts, and experiences to the people around them. But this is not the case for deaf-mute people. Sign language paves the way for deaf-mute people to communicate: through sign language, communication is possible for a deaf-mute person without the means of acoustic sounds. The aim of this work is to develop a system for recognizing sign language, which provides communication between people with speech impairment and normal people, thereby reducing the communication gap between them. Compared to other gestures (arm, face, head, and body), the hand gesture plays an important role, as it expresses the user's views in less time. In the current work, a flex-sensor-based gesture recognition module is developed to recognize English alphabets and a few words, and an HMM-based text-to-speech synthesizer is built to convert the recognized text to speech.
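
A flex-sensor recognizer of this kind can be as simple as nearest-template matching over a vector of per-finger bend readings. The sketch below is entirely hypothetical: the five-element readings, the calibration templates, and the distance threshold are invented for illustration, and the recognized letter would then be passed to the HMM-based synthesizer as text.

    import numpy as np

    # Hypothetical calibration: mean flex-sensor readings (one per finger)
    # recorded while signing each letter. All values are invented.
    TEMPLATES = {
        "A": np.array([820.0, 310.0, 300.0, 305.0, 315.0]),
        "B": np.array([300.0, 820.0, 830.0, 825.0, 810.0]),
        "L": np.array([830.0, 840.0, 300.0, 305.0, 310.0]),
    }

    def classify(reading, max_dist=200.0):
        # Nearest-template match; return None when nothing is close enough.
        best, dist = None, np.inf
        for letter, template in TEMPLATES.items():
            d = np.linalg.norm(reading - template)
            if d < dist:
                best, dist = letter, d
        return best if dist <= max_dist else None

    print(classify(np.array([815.0, 305.0, 298.0, 300.0, 320.0])))   # -> 'A'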

Collaboration


Dive into P. Vijayalakshmi's collaborations.

Top Co-Authors

T. Nagarajan | Sri Sivasubramaniya Nadar College of Engineering
G. Anushiya Rachel | Sri Sivasubramaniya Nadar College of Engineering
B. Ramani | Sri Sivasubramaniya Nadar College of Engineering
M. P. Actlin Jeeva | Sri Sivasubramaniya Nadar College of Engineering
S. Lilly Christina | Sri Sivasubramaniya Nadar College of Engineering
V. Sherlin Solomi | Sri Sivasubramaniya Nadar College of Engineering
Mrinalini K | Sri Sivasubramaniya Nadar College of Engineering
Nagarajan Thangavelu | Sri Sivasubramaniya Nadar College of Engineering
Anu Abraham | Sri Sivasubramaniya Nadar College of Engineering
Anushiya Rachel Gladston | Sri Sivasubramaniya Nadar College of Engineering