Publication


Featured research published by Nagaraj Adiga.


International Conference Oriental COCOSDA held jointly with Conference on Asian Spoken Language Research and Evaluation | 2013

A syllable-based framework for unit selection synthesis in 13 Indian languages

Hemant A. Patil; Tanvina B. Patel; Nirmesh J. Shah; Hardik B. Sailor; Raghava Krishnan; G. R. Kasthuri; T. Nagarajan; Lilly Christina; Naresh Kumar; Veera Raghavendra; S P Kishore; S. R. M. Prasanna; Nagaraj Adiga; Sanasam Ranbir Singh; Konjengbam Anand; Pranaw Kumar; Bira Chandra Singh; S L Binil Kumar; T G Bhadran; T. Sajini; Arup Saha; Tulika Basu; K. Sreenivasa Rao; N P Narendra; Anil Kumar Sao; Rakesh Kumar; Pranhari Talukdar; Purnendu Acharyaa; Somnath Chandra; Swaran Lata

In this paper, we discuss a consortium effort on building text-to-speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore required for building TTSes for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As the quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected and annotated, as the database has to be built from scratch. Various criteria have to be addressed while building the database, namely, speaker selection, pronunciation variation, optimal text selection, handling of out-of-vocabulary words, and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next, the design of the corpus of each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semi-automatic labeling tool. Text-to-speech synthesizers are built for all 13 languages, namely, Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia, and Bodo, using the same common framework. The TTS systems are evaluated using the degradation Mean Opinion Score (DMOS) and Word Error Rate (WER). An average DMOS of ≈3.0 and an average WER of about 20% are observed across all the languages.
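The WER figure quoted above follows the standard definition: the word-level edit distance between a reference transcription and the listener's transcription, divided by the reference length. A minimal sketch of that computation (the standard Levenshtein dynamic program, not the consortium's evaluation script):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution plus one deletion against a four-word reference gives a WER of 0.5.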


IEEE Signal Processing Letters | 2015

Detection of Glottal Activity Using Different Attributes of Source Information

Nagaraj Adiga; S. R. M. Prasanna

The major activity during speech production is glottal activity, which has earlier been detected using the strength of excitation (SoE). This work uses the normalized autocorrelation peak strength (NAPS) and higher-order statistics (HOS) as additional features for detecting glottal activity. The three features, namely, SoE, NAPS, and HOS, are, respectively, indicators of different attributes of glottal activity, namely, energy, periodicity, and the asymmetrical nature of the resulting source signal. The effectiveness of these features is analyzed using the differential electroglottograph signal, the zero-frequency filtered signal, and the integrated linear prediction residual as representatives of the source signal. The combination of glottal activity information from the three features outperforms any single one of them, demonstrating the different information represented by each of these features.
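NAPS is described above as the periodicity indicator. A minimal sketch of one plausible formulation (the paper's exact implementation may differ): the peak of the energy-normalized autocorrelation of a frame within the plausible pitch-lag range, which is close to 1 for periodic frames and small for noise-like ones:

```python
import numpy as np

def naps(frame, fs, f0_min=60.0, f0_max=400.0):
    """Normalized autocorrelation peak strength: peak of the
    energy-normalized autocorrelation within the plausible pitch-lag
    range. Assumed formulation -- the paper may differ in detail."""
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0:
        return 0.0
    # Autocorrelation for non-negative lags, normalized by frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    lo, hi = int(fs / f0_max), min(int(fs / f0_min), len(ac) - 1)
    return float(np.max(ac[lo:hi + 1]))

fs = 8000
t = np.arange(int(0.04 * fs)) / fs             # 40 ms frame
voiced = np.sin(2 * np.pi * 120 * t)           # periodic, 120 Hz
rng = np.random.default_rng(0)
unvoiced = rng.standard_normal(t.size)         # aperiodic noise
```

On the toy frames above, the periodic frame yields a much larger NAPS value than the noise frame, which is the behaviour the feature is used for.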


International Conference on Signal Processing | 2014

Significance of epoch identification accuracy for prosody modification

Nagaraj Adiga; D. Govind; S. R. Mahadeva Prasanna

Epoch refers to the instant of significant excitation in speech [1]. Prosody modification is the process of manipulating the pitch and duration of speech by fixed or dynamic modification factors. In epoch-based prosody modification, the prosodic features of the speech signal are modified by anchoring around the epoch locations in speech. The objective of the present work is to demonstrate the significance of epoch identification accuracy for prosody modification. Epoch identification accuracy is defined as the standard deviation of the identification timing error between the estimated epochs and the reference epochs. Initially, the epoch locations of the original speech are randomly perturbed by arbitrary time factors, and the corresponding prosody-modified speech is generated. The perceptual quality of the prosody-modified speech is evaluated using mean opinion scores (MOS) and an objective measure. The issues in the prosody modification of telephonic speech signals are also presented.
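The accuracy definition above can be sketched directly: match each reference epoch to its nearest estimate and take the standard deviation of the timing errors. This is a simple nearest-neighbour matching sketch; the paper's evaluation protocol may differ in detail:

```python
import numpy as np

def epoch_identification_accuracy(ref_epochs, est_epochs):
    """Identification accuracy: standard deviation (in seconds) of the
    timing error between each reference epoch and its nearest estimated
    epoch. Nearest-neighbour matching is an illustrative simplification."""
    ref = np.asarray(ref_epochs, dtype=float)
    est = np.asarray(est_epochs, dtype=float)
    errors = [est[np.argmin(np.abs(est - r))] - r for r in ref]
    return float(np.std(errors))

# Reference epochs every 8 ms (125 Hz pitch); estimates jittered slightly.
ref = np.arange(0.0, 0.2, 0.008)
rng = np.random.default_rng(1)
est = ref + rng.normal(0.0, 0.0005, ref.size)   # 0.5 ms timing jitter
```

A perfect estimator gives an accuracy of exactly zero, and the metric grows with the jitter of the epoch estimates.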


IEEE Region 10 Conference | 2015

Speech and EGG polarity detection using Hilbert Envelope

K. T. Deepak; K. Ramesh; Nagaraj Adiga; S. R. M. Prasanna

This work proposes two different methods for polarity detection in speech and electroglottograph (EGG) signals using the Hilbert envelope (HE). The HE is defined as the magnitude of a complex analytic signal and is hence a unipolar signal. The zero-frequency filtered (ZFF) signal obtained from the HE of the LP residual has the same phase for both polarities. Alternatively, the ZFF signals of speech and EGG, the integrated linear prediction residual (ILPR) of speech, and the difference EGG (DEGG) are out of phase for opposite polarities. These observations are exploited to develop two methods for polarity detection. The methods are evaluated using the CMU-Arctic and PTDB-TUG databases and compared with other state-of-the-art methods under clean and noisy conditions. It is found that the performance of the proposed methods is comparable to that of existing methods in clean conditions and more robust under noisy conditions.
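The unipolarity of the HE, which the two methods rest on, follows directly from its definition as a magnitude. A minimal sketch of the standard FFT construction of the analytic signal (not the paper's full polarity-detection pipeline):

```python
import numpy as np

def hilbert_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal, built by
    zeroing the negative frequencies of the DFT (standard construction).
    Being a magnitude, it is non-negative, i.e. unipolar, regardless of
    the polarity of the input signal."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)      # bipolar test signal
env = hilbert_envelope(x)            # flat envelope of a pure sine
```

For a pure sine the envelope is flat at the sine's amplitude, and flipping the signal's polarity leaves the envelope non-negative, which is exactly why polarity information must be recovered indirectly, as in the abstract.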


IEEE Region 10 Conference | 2015

Development of Assamese Text-to-speech synthesis system

Bidisha Sharma; Nagaraj Adiga; S. R. Mahadeva Prasanna

This paper presents the design and development of an Assamese text-to-speech (TTS) synthesis system. In particular, the work focuses on designing language-specific rules, developing a quality database, data segmentation, and handling bilingual sound units. For the Assamese language, no prior study has constructed grapheme-to-phoneme conversion rules. In this work, grapheme-to-phoneme conversion rules are proposed for the Assamese language. The database is recorded while checking the speaking rate, variation in amplitude level, dc wandering, and clipping during data collection. A significant improvement in the synthesized voice is observed by ensuring a uniform speaking rate, controlling variation in the signal amplitude level, and avoiding dc wandering and clipping during data collection. A semi-automatic segmentation approach is developed for data segmentation: segmentation is first done by an automatic process, and the segmentation boundaries are then corrected manually to improve quality and intelligibility. This also reduces the time required for the segmentation process. The developed TTS can work in bilingual mode: it can switch between Assamese and English smoothly and maintains sentence-level intonation even for mixed texts.


IEEE Region 10 Conference | 2015

Significance of glottal activity detection for speaker verification in degraded and limited data condition

Ashutosh Pandey; Rohan Kumar Das; Nagaraj Adiga; Naresh Gupta; S. R. Mahadeva Prasanna

The objective of this work is to establish the importance of the speaker information present in the glottal regions of the speech signal. In addition, its robustness for degraded data and its significance for limited data are sought for the task of speaker verification. An adaptive threshold method is proposed for the zero-frequency filtered signal to obtain the glottal activity regions. Feature vectors are extracted from regions having significant glottal activity. An i-vector based speaker verification system is developed using the NIST SRE 2003 database, and the performance of the proposed method is evaluated in degraded and limited data conditions. The robustness of the proposed method is tested for white and babble noise. Further, short utterances of test data are considered to evaluate the performance in the limited data condition. The proposed method, based on the selection of glottal regions, is found to perform better than the baseline energy-based voice activity detection method in degraded and limited data conditions.
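The frame selection idea can be sketched with a toy adaptive threshold: keep only the frames whose glottal-activity evidence exceeds a data-dependent threshold. This is purely illustrative; the paper's actual adaptive threshold on the zero-frequency filtered signal is not specified here, so a fraction of the utterance-level mean is used as a stand-in:

```python
import numpy as np

def glottal_activity_regions(evidence, alpha=0.5):
    """Select frames whose evidence (e.g. frame-wise energy of the
    zero-frequency filtered signal) exceeds an adaptive threshold --
    here a fraction `alpha` of the mean evidence over the utterance.
    Illustrative sketch, not the paper's exact method."""
    evidence = np.asarray(evidence, dtype=float)
    threshold = alpha * np.mean(evidence)
    return evidence > threshold

# Toy evidence: low values for silence frames, high for voiced frames.
evidence = np.array([0.01, 0.02, 0.9, 1.1, 1.0, 0.015, 0.8])
mask = glottal_activity_regions(evidence)
```

Feature vectors would then be extracted only from the frames where the mask is true, mirroring the selection step described in the abstract.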


International Conference on Acoustics, Speech, and Signal Processing | 2016

Source modeling for HMM based speech synthesis using integrated LP residual

Nagaraj Adiga; S. R. Mahadeva Prasanna

In this work, a new method of source modeling for HMM-based speech synthesis is proposed using the integrated LP residual (ILPR). The ILPR waveform resembles the glottal flow derivative signal and may preserve speaker characteristics better. The ILPR signal is modeled in the frequency domain by dividing the spectrum into two bands to characterize the harmonic and noise components of the voiced speech segment. The harmonic component of the ILPR signal below the maximum voiced frequency (fm) is modeled using mel-cepstral coefficients, called RMCEPs, whereas the noise component above fm is modeled by a pitch-adaptive triangular noise envelope weighted by the strength of excitation (SoE). The RMCEPs and SoE are modeled in the HMM framework along with MCEPs and F0, representing the vocal tract information and fundamental frequency, respectively. The speech synthesized with the proposed source modeling reduces buzziness and improves speaker similarity compared to the conventional impulse/noise and mixed-excitation source modeling, and is comparable with STRAIGHT-based excitation. This is further reflected in both objective and subjective evaluations.
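The two-band decomposition at fm can be illustrated with a minimal spectral split: everything below fm goes to the band modeled by the harmonic (RMCEP) part, everything above to the band modeled as noise. This is only the band split itself, not the paper's full parameterization:

```python
import numpy as np

def split_at_fm(frame, fs, fm=4000.0):
    """Split a frame's spectrum at the maximum voiced frequency fm into
    a low (harmonic) band and a high (noise) band. Minimal illustration
    of the two-band idea; the paper's parameterization is more involved."""
    X = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low, high = X.copy(), X.copy()
    low[freqs > fm] = 0.0
    high[freqs <= fm] = 0.0
    n = len(frame)
    return np.fft.irfft(low, n=n), np.fft.irfft(high, n=n)

fs = 16000
t = np.arange(512) / fs
# A 200 Hz "harmonic" component plus a 6 kHz component above fm.
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
low, high = split_at_fm(frame, fs, fm=4000.0)
```

The two bands sum back to the original frame, and for this toy frame the low band carries most of the energy, as expected when the strong harmonic component lies below fm.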


IEEE India Conference | 2014

A hybrid Text-to-Speech synthesis using vowel and non vowel like regions

Nagaraj Adiga; S. R. Mahadeva Prasanna

This paper presents a hybrid text-to-speech synthesis (TTS) approach combining the advantages of both hidden Markov model speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for selecting the units. The VLRs here refer to vowel, diphthong, semivowel, and nasal sound units [1], which can be modeled better in the HMM framework, and hence their waveform units are chosen from HTS. The remaining sound units, such as stop consonants, fricatives, and affricates, which are not modeled properly using HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. The automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero-frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than the HTS voice, whereas with automatic segmentation it is slightly inferior.
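The routing rule in the hybrid scheme reduces to a phone-class lookup: VLR phones are synthesized from the HMM (HTS) models, NVLR phones are picked from the unit-selection (USS) inventory. The phone classes below follow the abstract; the concrete phone labels are illustrative ARPAbet-style examples, not the system's actual phone set:

```python
# VLR classes named in the abstract; all other classes are NVLR.
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}

# Illustrative phone-to-class map (ARPAbet-style labels, hypothetical).
PHONE_CLASS = {
    "aa": "vowel", "iy": "vowel", "ay": "diphthong",
    "w": "semivowel", "y": "semivowel",
    "m": "nasal", "n": "nasal",
    "k": "stop", "t": "stop",
    "s": "fricative", "ch": "affricate",
}

def synthesis_backend(phone):
    """Return 'HTS' for vowel-like regions, 'USS' for non-vowel-like."""
    return "HTS" if PHONE_CLASS[phone] in VLR_CLASSES else "USS"
```

So a vowel or nasal unit is rendered by HTS, while a stop, fricative, or affricate unit is copied from the natural USS inventory.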


IETE Technical Review | 2018

Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review

Nagaraj Adiga; S. R. M. Prasanna

The objective of this paper is to present a detailed review of the modelling of various acoustic features employed in statistical parametric speech synthesis (SPSS). As reported in the literature, many acoustic features have been modelled in SPSS to enhance the synthesis quality. This work studies those approaches that add to the quality of SPSS by including such acoustic features. In particular, several categories of acoustic features that improve the perceptual quality of synthetic speech are discussed. The acoustic feature modelling reported in the literature can be broadly classified into F0, vocal-tract, and source features, which primarily represent the prosody, intelligibility, and naturalness of speech, respectively. Besides, SPSS techniques to synthesize speech from these acoustic features and recent advances in synthesis based on direct waveform generation are also studied. Finally, the paper concludes with a brief discussion and remarks on the future scope of SPSS.


Digital Signal Processing | 2017

Improved voicing decision using glottal activity features for statistical parametric speech synthesis

Nagaraj Adiga; Banriskhem K. Khonglah; S. R. Mahadeva Prasanna

A method to improve the voicing decision using glottal activity features is proposed for statistical parametric speech synthesis. In existing methods, the voicing decision relies mostly on the fundamental frequency F0, which may result in errors when the prediction is inaccurate. Even though F0 is a glottal activity feature, other features that characterize this activity may help in improving the voicing decision. The glottal activity features used in this work are the strength of excitation (SoE), normalized autocorrelation peak strength (NAPS), and higher-order statistics (HOS). These features are obtained from approximated source signals such as the zero-frequency filtered signal and the integrated linear prediction residual. To improve the voicing decision and to avoid a heuristic threshold for classification, the glottal activity features are trained using different statistical learning methods, such as the k-nearest neighbor, support vector machine (SVM), and deep belief network. The voicing decision works best with the SVM classifier, and its effectiveness is tested using statistical parametric speech synthesis. The glottal activity features SoE, NAPS, and HOS are modeled along with F0 and mel-cepstral coefficients in a hidden Markov model and a deep neural network to obtain the voicing decision. The objective and subjective evaluations demonstrate that the proposed method improves the naturalness of synthetic speech.
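Of the classifiers compared above, the k-nearest-neighbor one is the simplest to sketch: each frame's feature vector (e.g. [SoE, NAPS, HOS]) is labeled voiced or unvoiced by a majority vote over its nearest training frames. The training data below is toy data for illustration only, and this generic k-NN stands in for whichever configuration the paper actually used:

```python
import numpy as np

def knn_voicing_decision(train_X, train_y, x, k=3):
    """k-nearest-neighbour voicing decision over glottal activity
    feature vectors (e.g. [SoE, NAPS, HOS]). Labels: 1 = voiced,
    0 = unvoiced; the decision is a majority vote over the k nearest
    training frames (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    return int(np.round(np.mean(nearest)))

# Toy training frames: voiced frames (label 1) have high SoE/NAPS/HOS
# values, unvoiced frames (label 0) have low values. Illustrative only.
train_X = np.array([[0.90, 0.80, 0.70], [0.80, 0.90, 0.60],
                    [0.85, 0.70, 0.80], [0.10, 0.20, 0.10],
                    [0.20, 0.10, 0.20], [0.15, 0.25, 0.10]])
train_y = np.array([1, 1, 1, 0, 0, 0])
```

A new frame with high feature values is then classified voiced, and one with low values unvoiced, which is the decision the synthesizer consumes in place of a heuristic threshold.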

Collaboration


Dive into Nagaraj Adiga's collaborations.

Top Co-Authors


S. R. Mahadeva Prasanna

Indian Institute of Technology Guwahati


S. R. M. Prasanna

Indian Institute of Technology Guwahati


Bidisha Sharma

Indian Institute of Technology Guwahati


Sanasam Ranbir Singh

Indian Institute of Technology Guwahati


Anil Kumar Sao

Indian Institute of Technology Mandi


Arup Saha

Centre for Development of Advanced Computing


Ashutosh Pandey

Indian Institute of Technology Guwahati


Banriskhem K. Khonglah

Indian Institute of Technology Guwahati


Bira Chandra Singh

Centre for Development of Advanced Computing


Biswajit Dev Sarma

Indian Institute of Technology Guwahati
