Ashwin Bellur
Indian Institute of Technology Madras
Publications
Featured research published by Ashwin Bellur.
Journal of New Music Research | 2014
Preeti Rao; Joe Cheri Ross; Kaustuv Kanti Ganguli; Vedhas Pandit; Vignesh Ishwar; Ashwin Bellur; Hema A. Murthy
Ragas are characterized by their melodic motifs or catch phrases, which constitute strong cues to the raga identity for both the performer and the listener and are therefore of great interest in music retrieval and automatic transcription. While the characteristic phrases, or pakads, appear in written notation as a sequence of notes, the musicological rules for interpreting a phrase in performance, in a manner that allows considerable creative expression without transgressing raga grammar, are not explicitly defined. In this work, machine learning methods are used on labelled databases of Hindustani and Carnatic vocal audio concerts to obtain phrase classification on manually segmented audio. Dynamic time warping and HMM-based classification are applied to time series of detected pitch values used for the melodic representation of a phrase. Retrieval experiments on raga-characteristic phrases show promising results while providing interesting insights into the nature of variation in the surface realization of raga-characteristic motifs within and across concerts.
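To make the dynamic time warping step concrete, here is a minimal sketch of nearest-template phrase classification over pitch contours. It is an illustration only, not the authors' implementation: the function names, the absolute-difference local cost, and the toy contours (pitch in cents) are all assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two pitch sequences (absolute-difference local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical pitch contours in cents relative to the tonic.
templates = {
    "phrase_a": [0, 100, 200, 100],
    "phrase_b": [0, -100, -200, -100],
}
query = [0, 100, 200, 200, 100]  # phrase_a with one note held longer
label = classify(query, templates)
```

Because DTW lets one sequence hold while the other advances, the lengthened note in `query` adds no cost against `phrase_a`. That tolerance to timing variation is what makes DTW a natural fit for comparing renditions of the same phrase.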
Journal of New Music Research | 2014
Sankalp Gulati; Ashwin Bellur; Justin Salamon; Hg Ranjani; Vignesh Ishwar; Hema A. Murthy; Xavier Serra
The tonic is a fundamental concept in Indian art music. It is the base pitch, which an artist chooses in order to construct the melodies during a rāg(a) rendition, and all accompanying instruments are tuned using the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rāg recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts such as the presence/absence of additional metadata, the quality of audio data, the duration of audio data, music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average), and are robust across the aforementioned contexts compared to the approaches based on expert knowledge. In addition, we show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.
National Conference on Communications | 2011
Ashwin Bellur; K. Badri Narayan; K Raghava Krishnan; Hema A. Murthy
This paper describes ways to improve prosody modeling in syllable-based concatenative speech synthesis systems for two Indian languages, namely Hindi and Tamil, within the unit selection paradigm. The syllable is a larger unit than the diphone and contains most of the coarticulation information. Although syllable-based synthesis is quite intelligible compared to diphone-based systems, naturalness, especially in terms of prosody, requires additional processing. Since the synthesizer is built using a cluster unit framework, a hybrid approach that combines rule-based and statistical models is proposed to better model the prosody of syllable-like units. It is further observed that prediction of phrase boundaries is crucial, particularly because Indian languages are replete with polysyllabic words. CART-based phrase modeling for Hindi and Tamil is discussed. Perceptual experiments show a significant improvement in the MOS for both Hindi and Tamil synthesizers.
Index Terms: speech synthesis, unit selection, cluster unit synthesis, phrase boundaries
National Conference on Communications | 2010
M. V. Vinodh; Ashwin Bellur; K. Badri Narayan; Deepali M. Thakare; Anila Susan; N. M. Suthakar; Hema A. Murthy
This paper describes the design and development of Indian language Text-To-Speech (TTS) synthesis systems using polysyllabic units. First, a phone-based TTS is built. Next, a monosyllable cluster unit TTS is built. It is observed that the quality of the synthesized sentences can improve if polysyllable units are used (when the appropriate units are available), since the effects of co-articulation will be preserved in such a case. Hence, we built Hindi and Tamil TTS systems with polysyllabic units that contain cluster units of more than one type (monosyllable, bisyllable and trisyllable). The system selects the best set of units during the unit selection process, so as to minimize the join and concatenation costs. Preliminary listening tests indicated that the polysyllable TTS has better quality.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Akshay Anantapadmanabhan; Ashwin Bellur; Hema A. Murthy
In this paper we use a Non-negative Matrix Factorization (NMF) based approach to analyze the strokes of the mridangam, a South Indian hand drum, in terms of the normal modes of the instrument. Using NMF, a dictionary of spectral basis vectors is first created for each of the modes of the mridangam. The composition of the strokes is then studied by projecting them along the direction of the modes using NMF. We then extend this knowledge of each stroke in terms of its basic modes to transcribe audio recordings. Hidden Markov Models are adopted to learn the modal activations for each of the strokes of the mridangam, yielding up to 88.40% accuracy during transcription.
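The projection step can be illustrated with a small sketch: given a fixed dictionary of basis spectra, estimate non-negative activations for one spectrum using the standard multiplicative update for the Euclidean NMF objective. This is a sketch under stated assumptions, not the paper's code: the function name, the toy dictionary, and the choice of the Euclidean cost are all hypothetical.

```python
def activations(dictionary, spectrum, iterations=200, eps=1e-12):
    """Estimate h >= 0 such that dictionary @ h approximates spectrum.

    dictionary: list of rows (frequency bins), each a list over modes.
    Uses the multiplicative update h <- h * (W^T v) / (W^T W h).
    """
    n_bins = len(dictionary)
    n_modes = len(dictionary[0])
    h = [1.0] * n_modes  # positive start keeps every update non-negative

    # Precompute W^T v and W^T W, which do not change while W is fixed.
    wt_v = [sum(dictionary[f][k] * spectrum[f] for f in range(n_bins))
            for k in range(n_modes)]
    wt_w = [[sum(dictionary[f][i] * dictionary[f][j] for f in range(n_bins))
             for j in range(n_modes)]
            for i in range(n_modes)]

    for _ in range(iterations):
        wt_w_h = [sum(wt_w[k][j] * h[j] for j in range(n_modes))
                  for k in range(n_modes)]
        h = [h[k] * wt_v[k] / (wt_w_h[k] + eps) for k in range(n_modes)]
    return h


# Toy example: 3 frequency bins, 2 hypothetical modes with overlapping spectra.
modes = [[1.0, 0.0],
         [1.0, 1.0],
         [0.0, 1.0]]
stroke = [2.0, 5.0, 3.0]  # equals 2 * mode 0 + 3 * mode 1
h = activations(modes, stroke)
```

Keeping the dictionary fixed and updating only the activations mirrors the idea of describing each stroke by how strongly it excites each mode. A sequence model could then operate on the per-frame activation vectors.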
National Conference on Communications | 2013
Ashwin Bellur; Hema A. Murthy
This work addresses the task of tonic pitch identification in Indian classical music. The drone or the tambura establishes the tonic in Indian classical music. A cepstrum-based pitch extraction technique is proposed to identify the tuning of the tambura. We show that by identifying the musical note Sadja in the lower octave of a performance, the pitch of the tonic can be identified accurately. We also show that by estimating the pitch of low-energy frames, the tonic can be identified with greater speed and higher accuracy. In order to further enhance the speed and also illustrate the ubiquitous nature of the tonic, a method based on Non-negative Matrix Factorization (NMF) is developed to identify the tonic. The proposed methods are validated by testing on a large, varied dataset, and accuracies close to 100% are reported.
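For readers unfamiliar with cepstral pitch extraction, the following is a generic textbook-style sketch, not the method proposed in the paper. It computes the real cepstrum of one frame and picks the strongest quefrency peak within an assumed pitch range. The function names, the search range, and the synthetic test tone are assumptions; a practical system would also window the frame and use an FFT.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, eps=1e-9):
    """Inverse DFT of the log magnitude spectrum; eps avoids log(0)."""
    n = len(frame)
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    # log_mag is symmetric for real input, so the inverse DFT reduces to a cosine sum.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n)]


def estimate_pitch(frame, sample_rate, f_min=120.0, f_max=400.0):
    """Return the pitch (Hz) at the largest cepstral peak between f_min and f_max."""
    cep = real_cepstrum(frame)
    q_lo = int(sample_rate / f_max)  # shortest period considered, in samples
    q_hi = int(sample_rate / f_min)  # longest period considered, in samples
    best_q = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return sample_rate / best_q


# Synthetic harmonic tone: 10 harmonics of 200 Hz, 400 samples at 8 kHz.
sample_rate = 8000
f0 = 200.0
frame = [sum(math.cos(2 * math.pi * h * f0 * n / sample_rate) for h in range(1, 11))
         for n in range(400)]
pitch = estimate_pitch(frame, sample_rate)
```

The harmonics of a pitched sound appear as a periodic ripple in the log spectrum. The cepstrum turns that ripple into a single peak at the pitch period, which is why the search is carried out over quefrency.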
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Ashwin Bellur; Mounya Elhilali
Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between train and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability to segment acoustic scenes with relative ease. When tackling challenging listening conditions that are often faced in everyday life, the biological system relies on a number of principles that allow it to effortlessly parse its rich soundscape. In the current study, we leverage a key principle employed by the auditory system: its ability to adapt the neural representation of its sensory input in a high-dimensional space. We propose a framework that mimics this process in a computational model for robust speech activity detection. The system employs a 2-D Gabor filter bank whose parameters are retuned offline to improve the separability between the feature representation of speech and nonspeech sounds. This retuning process, driven by feedback from statistical models of speech and nonspeech classes, attempts to minimize the misclassification risk of mismatched data, with respect to the original statistical models. We hypothesize that this risk minimization procedure results in an emphasis of unique speech and nonspeech modulations in the high-dimensional space. We show that such an adapted system is indeed robust to other novel conditions, with a marked reduction in equal error rates for a variety of databases with additive and convolutive noise distortions. We discuss the lessons learned from biology with regard to adapting to an ever-changing acoustic environment and the impact on building truly intelligent audio processing systems.
Conference on Information Sciences and Systems | 2015
Ashwin Bellur; Mounya Elhilali
Neurophysiological studies of sound encoding at the level of auditory cortex paint a picture of an intricate filterbank that encodes detailed spectral and temporal modulations in the sensory input. Furthermore, these filters exhibit adaptive qualities called neural plasticity that shape their tuning parameters in line with behavioral goals of interest. In this work, we explore qualitative principles about how this neuronal reshaping can aid in an enhanced representation of target sounds. Here, we employ a set of parameterized two-dimensional Gabor filters as basis functions that tile the space of neurophysiological spectrotemporal modulations. We examine mechanisms for judiciously retuning parameters of the Gabor filterbank in order to enhance the representation of target sounds of interest. We test the efficacy of this scheme in enhancing representation of sound tokens in adverse noisy backgrounds.
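A parameterized two-dimensional Gabor filter of the kind described above can be sketched as a Gaussian envelope multiplied by a cosine carrier over the time and frequency axes of a spectrogram. The abstract does not give the filter form, so the parameterization below is an assumption: `rate` (cycles per frame), `scale` (cycles per channel), the bandwidths, and the function names are all hypothetical.

```python
import math


def gabor_2d(rate, scale, sigma_t=4.0, sigma_f=4.0, half_width=8):
    """Real 2-D Gabor kernel indexed as kernel[frequency][time].

    rate:  temporal modulation in cycles per frame (assumed unit)
    scale: spectral modulation in cycles per channel (assumed unit)
    """
    offsets = range(-half_width, half_width + 1)
    kernel = []
    for f in offsets:
        row = []
        for t in offsets:
            envelope = math.exp(-(t * t / (2 * sigma_t ** 2) + f * f / (2 * sigma_f ** 2)))
            carrier = math.cos(2 * math.pi * (rate * t + scale * f))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(rates, scales, **kwargs):
    """Tile the modulation space with one kernel per (rate, scale) pair."""
    return {(r, s): gabor_2d(r, s, **kwargs) for r in rates for s in scales}


bank = gabor_bank(rates=[0.05, 0.1], scales=[0.0, 0.1])
kernel = bank[(0.1, 0.1)]
```

In this picture, retuning amounts to adjusting `rate`, `scale`, and the two bandwidths of each filter. The sketch only constructs the filters and does not attempt the adaptation procedure itself.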
International Symposium/Conference on Music Information Retrieval | 2013
Vignesh Ishwar; Shrey Dutta; Ashwin Bellur; Hema A. Murthy
2nd CompMusic Workshop | 2012
Ashwin Bellur; Vignesh Ishwar; Xavier Serra; Hema A. Murthy