Vinayak Abrol
Indian Institute of Technology Mandi
Publications
Featured research published by Vinayak Abrol.
Pattern Recognition Letters | 2016
Vinayak Abrol; Pulkit Sharma; Anil Kumar Sao
Highlights:
- Proposes a novel kernel dictionary learning algorithm.
- The dictionary is updated in the coefficient domain instead of the signal domain.
- Proposes a hierarchical learning framework for efficient sparse representation.
- The proposed algorithm has much lower computational complexity.
- The proposed approach performs well on various pattern classification tasks.

We present a novel dictionary learning (DL) approach for sparse representation based classification in kernel feature space. These sparse representations are obtained using dictionaries learned from training exemplars that are mapped into a high-dimensional feature space via the kernel trick. However, the complexity of such kernel-based approaches grows with the number of training exemplars, and hence with dataset size, since most pattern classification tasks require more training exemplars for good performance. To address this, we propose a hierarchical DL approach that requires the kernel matrix only once to update the dictionary atoms. Further, in contrast to existing methods, the dictionary is learned in a linearly transformed/coefficient space involving sparse matrices, rather than in the kernel space. Compared to existing state-of-the-art methods, the proposed method has much lower computational complexity while performing similarly on various pattern classification tasks.
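The sparse codes above are computed with a pursuit algorithm. As a rough illustration of the general idea (a minimal signal-domain sketch in numpy, not the authors' kernel-space algorithm), orthogonal matching pursuit greedily selects dictionary atoms and refits their coefficients by least squares:

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily pick k atoms of D for x,
    refitting the coefficients by least squares after each pick."""
    residual = x.copy()
    support = []
    coef = np.empty(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 17]       # a 2-sparse test signal
a = omp(D, x, k=2)
print(np.linalg.norm(x - D @ a))         # near zero when both atoms are found
```

With low-coherence atoms, a pursuit like this typically recovers the generating atoms; the kernelized variant in the paper replaces the inner products `D.T @ residual` with kernel evaluations.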
Speech Communication | 2015
Vinayak Abrol; Pulkit Sharma; Anil Kumar Sao
We leverage recent algorithmic advances in compressive sensing (CS) and propose a novel unsupervised voiced/nonvoiced (V/NV) detection method for compressively sensed speech signals. It exploits the fact that there is significant glottal activity during the production of voiced speech, while the same is not true for nonvoiced speech. This characteristic of the speech production mechanism is captured in a sparse feature vector derived using the CS framework. Further, we propose an information-theoretic metric for V/NV classification that exploits the sparsity of the extracted feature, using a signal-adaptive dictionary motivated by the speech production mechanism. The final classification is done using an adaptive threshold selection scheme, which uses the temporal information of speech signals. While existing methods of feature extraction use speech samples directly, the proposed method performs V/NV detection on compressively sensed speech signals (requiring far less memory), where existing time- or frequency-domain detection methods are not directly applicable. Hence, this method can be effective for various speech applications. Performance of the proposed method is studied on the CMU-ARCTIC database for eight types of additive noise, taken from the NOISEX database, at different signal-to-noise ratios (SNRs). The proposed method performs similarly to or better than existing methods, especially at lower SNRs, which provides compelling evidence of the effectiveness of the sparse feature vector for V/NV detection.
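The abstract does not spell out the information-theoretic metric; one plausible instance of the idea (an assumption, not the paper's exact measure) is the Shannon entropy of the normalised coefficient magnitudes, which is low for the peaky codes expected from voiced frames and high for flat, noise-like codes:

```python
import numpy as np

def coeff_entropy(a, eps=1e-12):
    """Shannon entropy of normalised coefficient magnitudes: low when the
    sparse code is peaky (few active atoms), high when it is flat."""
    p = np.abs(a) / (np.abs(a).sum() + eps)
    p = p[p > eps]
    return float(-(p * np.log2(p)).sum())

voiced_like = np.zeros(100)
voiced_like[[3, 40]] = [1.0, 0.8]        # peaky code: strong glottal structure
nonvoiced_like = np.ones(100)            # flat code: noise-like frame
h_v = coeff_entropy(voiced_like)
h_nv = coeff_entropy(nonvoiced_like)
print(h_v, h_nv)                          # the voiced-like code has lower entropy
# a frame would then be labelled voiced when its entropy falls below an
# adaptive threshold derived from neighbouring frames
```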
European Signal Processing Conference (EUSIPCO) | 2015
Pulkit Sharma; Vinayak Abrol; Anil Kumar Sao
This paper proposes an approach based on compressed sensing to reduce the footprint of the speech corpus in unit selection based speech synthesis (USS) systems. It exploits the observation that a speech signal can have a sparse representation (for a suitable choice of basis functions) which can be estimated effectively using the sparse coding framework. Thus, only a few significant coefficients of the sparse vector need to be stored instead of the entire speech signal. During synthesis, the speech signal can be reconstructed (with little error) using these significant coefficients only. Furthermore, the number of significant coefficients can be chosen adaptively based on the type of segment, such as voiced or unvoiced. Simulation results suggest that the proposed compression method effectively preserves most of the spectral information and can be used as an alternative to existing compression methods used in USS systems.
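The keep-only-significant-coefficients scheme can be sketched as follows, using an orthonormal DCT as a stand-in basis and a toy frame that is exactly 2-sparse in it (the basis choice and frame are illustrative assumptions):

```python
import numpy as np

def dct_matrix(n):
    """Rows form an orthonormal DCT-II basis of R^n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

n = 64
C = dct_matrix(n)
frame = C[8] + 0.5 * C[16]               # toy frame, 2-sparse in the DCT basis
coeffs = C @ frame                       # analysis
k = 2                                    # in practice k adapts to voiced/unvoiced
idx = np.argsort(np.abs(coeffs))[-k:]    # indices of the k largest coefficients
sparse = np.zeros(n)
sparse[idx] = coeffs[idx]                # only these k values (+ indices) are stored
recon = C.T @ sparse                     # synthesis from the stored coefficients
err = np.linalg.norm(frame - recon) / np.linalg.norm(frame)
print(err)
```

Real speech frames are only approximately sparse, so the reconstruction error is nonzero and shrinks as k grows; choosing k per segment type trades footprint against quality.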
National Conference on Communications (NCC) | 2015
Pulkit Sharma; Vinayak Abrol; Anil Kumar Sao
Supervised approaches for speech enhancement require models to be learned for different noisy environments, which is difficult to meet in practical scenarios. In this paper, a compressed sensing (CS) based supervised speech enhancement approach is proposed, where the model (dictionary) for noise is derived from the noisy speech signal itself. It exploits the observation that unvoiced/silence regions of a noisy speech signal are predominantly noise, and a method is proposed to measure this, thus eliminating the pre-training of a noise model. The proposed method is particularly effective in scenarios where the noise type is not known a priori. Experimental results validate that the proposed approach can be an alternative to existing approaches for speech enhancement.
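One simple way to realise the "noise exemplars from low-activity regions" idea (a toy energy-threshold sketch; the paper's actual measure is not given in the abstract) is to collect the lowest-energy frames of the noisy signal as columns of the noise dictionary:

```python
import numpy as np

rng = np.random.default_rng(4)
n_frames, flen = 100, 160                 # 100 frames of 10 ms at 16 kHz
frames = 0.05 * rng.standard_normal((n_frames, flen))        # background noise
t = np.arange(flen)
frames[30:60] += np.sin(2 * np.pi * 10 * t / flen)           # "speech" region
energy = (frames ** 2).sum(axis=1)
thr = np.median(energy)                   # stand-in for an adaptive threshold
# low-energy frames are treated as noise-only and become the columns of a
# noise dictionary, with no pre-trained noise model required
noise_dict = frames[energy <= thr].T
print(noise_dict.shape)
```

The speech-bearing frames sit far above the threshold here, so none of them leak into the noise dictionary.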
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017
Vinayak Abrol; Pulkit Sharma; Anil Kumar Sao
Extracting inherent patterns from large data using decompositions of the data matrix by a sampled subset of exemplars has found many applications in machine learning. We propose a computationally efficient algorithm for adaptive exemplar sampling, called fast exemplar selection (FES). The proposed algorithm can be seen as an efficient variant of the oASIS algorithm [1]. FES iteratively selects incoherent exemplars based on the exemplars that are already sampled. This is done by ensuring that the selected exemplars form a positive definite Gram matrix, which is checked by exploiting its Cholesky factorization in an incremental manner. FES is a deterministic rank-revealing algorithm delivering a tighter matrix approximation bound. Further, FES can also be used to exactly represent low-rank matrices and signals sampled from unions of independent subspaces. Experimental results show that FES performs comparably to existing methods for tasks such as matrix approximation, feature selection, outlier detection, and clustering.
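The incremental Cholesky test at the heart of this selection rule can be sketched as follows: a candidate exemplar is accepted only if the Schur complement of its Gram entry stays positive, which is exactly the condition for the enlarged Gram matrix to remain positive definite (a minimal sketch of the idea, not the FES implementation):

```python
import numpy as np

def try_add_exemplar(L, cross, diag, tol=1e-8):
    """Incremental Cholesky test: G = L @ L.T is the Gram matrix of the
    selected exemplars. Accept a candidate (cross-similarities `cross`,
    self-similarity `diag`) only if the enlarged Gram matrix stays
    positive definite; return the updated factor, or None to reject."""
    if L is None:                                  # first exemplar
        return np.array([[np.sqrt(diag)]]) if diag > tol else None
    w = np.linalg.solve(L, cross)                  # forward substitution
    s = diag - w @ w                               # Schur complement
    if s <= tol:                                   # (near-)dependent: reject
        return None
    n = L.shape[0]
    L2 = np.zeros((n + 1, n + 1))
    L2[:n, :n] = L
    L2[n, :n] = w
    L2[n, n] = np.sqrt(s)
    return L2

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))                   # 20 candidates of rank <= 5
L, chosen = None, []
for j in range(X.shape[1]):
    cross = X[:, chosen].T @ X[:, j]
    L2 = try_add_exemplar(L, cross, X[:, j] @ X[:, j])
    if L2 is not None:
        L, chosen = L2, chosen + [j]
print(chosen)                                      # stops growing at the rank of X
```

Because each test is O(n^2) against the current factor rather than a fresh factorization, selection stays cheap; on this rank-5 toy data the selection stops after exactly five exemplars, illustrating the rank-revealing behaviour.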
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Pulkit Sharma; Vinayak Abrol; Anil Kumar Sao
Features derived using sparse representation (SR) based approaches have been shown to yield promising results for speech recognition tasks. In most approaches, the SR corresponding to the speech signal is estimated using a dictionary, which can be either exemplar-based or learned. However, a single-level decomposition may not be suitable for the speech signal, as it contains complex hierarchical information about various hidden attributes. In this paper, we propose to use a multilevel decomposition (having multiple layers), also known as a deep sparse representation (DSR), to derive a feature representation for speech recognition. Instead of having a series of sparse layers, the proposed framework employs a dense layer between two sparse layers, which helps in efficient implementation. Our studies reveal that the representations obtained at different sparse layers of the proposed DSR model carry complementary information. Thus, the final feature representation is derived by concatenating the representations obtained at the sparse layers. This results in a more discriminative representation and improves speech recognition performance. Since the concatenation results in a high-dimensional feature, principal component analysis is used to reduce the dimension of the obtained feature. Experimental studies demonstrate that the proposed feature outperforms existing features for various speech recognition tasks.
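The concatenate-then-reduce step can be sketched directly (random matrices stand in for the sparse-layer codes; the 39-dimensional target is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((500, 80))      # codes from the first sparse layer
A2 = rng.standard_normal((500, 40))      # codes from the second sparse layer
F = np.hstack([A1, A2])                  # concatenated DSR feature, 120-dim
Fc = F - F.mean(axis=0)                  # centre before PCA
_, _, Vt = np.linalg.svd(Fc, full_matrices=False)
d = 39                                   # reduced dimension (MFCC-sized, an assumption)
F_red = Fc @ Vt[:d].T                    # project onto the top-d principal directions
print(F_red.shape)
```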
Speech Communication | 2016
Vinayak Abrol; Pulkit Sharma; Anil Kumar Sao
This paper proposes a greedy double sparse (DS) dictionary learning algorithm for speech signals, where the dictionary is the product of a predefined base dictionary and a sparse matrix. Exploiting the DS structure, we show that the dictionary can be learned efficiently in the coefficient domain rather than the signal domain. This is achieved by modifying the objective function such that all the matrices involved in the coefficient domain are either sparse or near-sparse, thus making the dictionary update stage fast. The dictionary is learned on frames extracted from a speech signal using a hierarchical subset selection approach. Here, each dictionary atom is a training speech frame, chosen according to its energy contribution in representing all other training speech frames. In other words, dictionary atoms are encouraged to be close to the training signals that use them in their decomposition. After each atom update, the modified residual serves as the new training data, so the information learned by the previous atoms guides the update of subsequent dictionary atoms. In addition, we show that for a suitable choice of the base dictionary, the storage efficiency of the DS dictionary can be further improved. Finally, the efficiency of the proposed method is demonstrated on the problems of speech representation and speech denoising.
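The storage argument behind the DS structure is easy to make concrete: if each atom of the sparse matrix A has only p nonzeros, storing A (values plus row indices) is far cheaper than storing the effective dictionary densely (a toy sketch with an identity base dictionary standing in for a structured one):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 64, 128, 5                  # signal dim, number of atoms, nonzeros/atom
Phi = np.eye(n)                       # base dictionary (identity here; a DCT or
                                      # wavelet base would be used in practice)
A = np.zeros((n, m))
for j in range(m):                    # each atom mixes only p base atoms
    rows = rng.choice(n, size=p, replace=False)
    A[rows, j] = rng.standard_normal(p)
D = Phi @ A                           # effective dictionary, never stored densely
dense_cost = n * m                    # floats needed for an explicit D
ds_cost = 2 * p * m                   # value + row index per nonzero of A
print(dense_cost, ds_cost)
```

A structured base (DCT, wavelets) costs nothing extra to store and admits fast transforms, which is where the further storage and speed gains mentioned above come from.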
National Conference on Communications (NCC) | 2016
Pulkit Sharma; Vinayak Abrol; Anil Kumar Sao
In this paper, we employ learned dictionaries to compute sparse representations of speech utterances, which are used to reduce the footprint of unit selection based speech synthesis (USS) systems. A speech database labeled at the phoneme level is used to obtain multiple examples of the same phoneme, and all the examples of each phoneme are then used to learn a single overcomplete dictionary for that phoneme. Two dictionary learning algorithms, namely KSVD (K-singular value decomposition) and GAD (greedy adaptive dictionary), are employed to obtain the respective sparse representations. The learned dictionaries are then used to compute the sparse vectors for all the speech units corresponding to a speech utterance. The significant coefficients of the sparse vector (along with their index locations) and the learned dictionaries are stored instead of the entire speech utterance. During synthesis, the speech waveform is synthesized using the significant coefficients of the sparse vector and the corresponding dictionary. Experimental results demonstrate that the quality of the synthesized speech is better with the proposed approach, while it achieves compression comparable to the existing compression methods employed in USS systems.
Computer Speech & Language | 2018
Pulkit Sharma; Vinayak Abrol; Aroor Dinesh Dileep; Anil Kumar Sao
In this work, we propose sparse representation based features for speech unit classification tasks. In order to effectively capture the variations in a speech unit, the proposed method employs multiple class-specific dictionaries. Here, the training data belonging to each class is clustered into multiple clusters, and a principal component analysis (PCA) based dictionary is learnt for each cluster. It has been observed that coefficients corresponding to the middle principal components can effectively discriminate among different speech units. Exploiting this observation, we propose to use a transformation function known as weighted decomposition (WD) of principal components, which is used to emphasize the discriminative information present in the PCA-based dictionary. In this paper, both raw speech samples and mel frequency cepstral coefficients (MFCC) are used as initial representations for feature extraction. For comparison, various popular dictionary learning techniques such as K-singular value decomposition (KSVD), simultaneous codeword optimization (SimCO) and greedy adaptive dictionary (GAD) are also employed in the proposed framework. The effectiveness of the proposed features is demonstrated using continuous density hidden Markov model (CDHMM) based classifiers for (i) classification of isolated utterances of the E-set of the English alphabet, (ii) classification of consonant-vowel (CV) segments in the Hindi language and (iii) classification of phonemes from the TIMIT phonetic corpus.
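The WD step can be sketched as reweighting PCA coefficients so that the middle components dominate; the exact weighting function is not given in the abstract, so the triangular weight below is purely a placeholder assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 40))            # one cluster of 40-dim frames
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: PCA dictionary
coeffs = Xc @ Vt.T                            # projection on each component
d = Vt.shape[0]
# placeholder triangular weight: de-emphasise the first components (gross
# spectral shape) and the last ones (noise), emphasise the middle band
w = 1.0 - np.abs(np.linspace(-1.0, 1.0, d))
features = coeffs * w
print(features.shape)
```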
European Signal Processing Conference (EUSIPCO) | 2016
Pulkit Sharma; Vinayak Abrol; Abhijeet Sachdev; Aroor Dinesh Dileep
In this paper, we propose to use a kernel sparse representation based classifier (KSRC) for the task of speech emotion recognition. Further, the recognition performance of the KSRC is improved by imposing a group sparsity constraint. Speech utterances with the same emotion may have different durations, but the frame sequence information does not play a crucial role in this task. Hence, in this work, we propose to use dynamic kernels, which explicitly model the variability in the duration of speech signals. Experimental results demonstrate that, given a suitable kernel, KSRC with a group sparsity constraint performs better than state-of-the-art support vector machine (SVM) based classifiers.
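A group sparsity constraint zeroes out whole blocks of coefficients at once, so an utterance is represented by a few classes rather than a few scattered atoms. A minimal sketch of the mechanism (the proximal operator of the group-lasso penalty, not the paper's KSRC solver):

```python
import numpy as np

def group_soft_threshold(a, groups, lam):
    """Proximal operator of the group-lasso penalty: shrink each group's
    l2 norm by lam, zeroing entire groups whose norm falls below lam."""
    out = np.zeros_like(a)
    for g in groups:
        norm = np.linalg.norm(a[g])
        if norm > lam:
            out[g] = a[g] * (1.0 - lam / norm)
    return out

a = np.array([3.0, 4.0, 0.1, 0.1])                 # coefficients for two classes
groups = [np.array([0, 1]), np.array([2, 3])]      # one group per class
b = group_soft_threshold(a, groups, lam=1.0)
print(b)                                           # weak group zeroed entirely
```

Here the first group (norm 5) survives with its coefficients shrunk, while the second group (norm about 0.14) is eliminated as a whole, which is the class-selection effect the constraint provides.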