Abhinav Misra
University of Texas at Dallas
Publications
Featured research published by Abhinav Misra.
international conference on acoustics, speech, and signal processing | 2016
Chunlei Zhang; Shivesh Ranjan; Mahesh Kumar Nandwana; Qian Zhang; Abhinav Misra; Gang Liu; Finnian Kelly; John H. L. Hansen
Protecting automatic speaker verification (ASV) systems against spoofing attacks remains an essential challenge, even though significant progress in ASV has been achieved in recent years. In this study, an automatic spoofing detection approach using an i-vector framework is proposed. Two approaches are used for frame-level feature extraction: cepstral-based Perceptual Minimum Variance Distortionless Response (PMVDR), and non-linear speech-production-motivated Teager Energy Operator (TEO) Critical Band (CB) Autocorrelation Envelope (Auto-Env). An utterance-level i-vector for each recording is formed by concatenating the PMVDR and TEO-CB-Auto-Env i-vectors, followed by linear discriminant analysis (LDA) to maximize the ratio of between-class to within-class scatter. A Gaussian classifier and a deep neural network (DNN) are also investigated for back-end scoring. Experiments using the ASVspoof 2015 corpus show that the proposed method successfully detects spoofing attacks. By combining the TEO-CB-Auto-Env and PMVDR features, a relative 76.7% improvement in terms of EER is obtained compared with the best single-feature system.
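The abstract above relies on LDA to maximize the ratio of between-class to within-class scatter over the concatenated i-vectors. A minimal numpy sketch of that projection step (function name, dimensions, and the pseudo-inverse eigendecomposition route are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def lda_directions(X, y, n_dims):
    """LDA projection maximizing between-class / within-class scatter.
    X: (N, D) utterance-level vectors (e.g. concatenated PMVDR and
    TEO-CB-Auto-Env i-vectors); y: class labels (genuine vs. spoofed)."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter
    # Solve the generalized eigenproblem Sb v = lambda Sw v
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_dims]]
```

With two classes, at most one discriminative direction carries a non-zero eigenvalue, which is why a single projected dimension suffices for a genuine/spoofed decision.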
spoken language technology workshop | 2014
Abhinav Misra; John H. L. Hansen
Compensation for mismatch between acoustic conditions in automatic speaker recognition has been widely addressed in recent years. However, performance degradation due to language mismatch has yet to be thoroughly addressed. In this study, we address language mismatch for speaker verification. We select bilingual speaker data from the NIST SRE 04-08 corpora and develop train/test trials for language matched and mismatched conditions. We first show that language variability significantly degrades speaker recognition performance even with a state-of-the-art i-vector system. Next, we consider two ideas to improve performance: i) we introduce small amounts of multi-lingual speech data to the Probabilistic Linear Discriminant Analysis (PLDA) development set, and ii) explore phoneme level analysis to investigate the effect of language mismatch. It is shown that introducing small amounts of multi-lingual seed data within PLDA training yields a significant improvement in speaker verification performance. Also, using data from the CRSS Bi-Ling corpus, we show how various phoneme classes affect speaker verification in language mismatch. This speech corpus consists of bilingual speakers who speak either Hindi or Mandarin, in addition to English. Using this corpus, we propose a novel phoneme histogram normalization technique to match the phonetic spaces of two different languages and show a +16.6% relative improvement in speaker verification performance in the presence of language mismatch.
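The phoneme histogram normalization idea above matches the phonetic spaces of two languages. One plausible way to read that is as histogram matching: reweight each phoneme class so the source language's phoneme distribution matches the target's. The sketch below is a hypothetical illustration of that reading, not the paper's exact formulation:

```python
import numpy as np

def phoneme_weights(source_counts, target_counts):
    """Per-phoneme-class weights that reshape the source language's
    phoneme histogram toward the target language's."""
    src = np.asarray(source_counts, float)
    tgt = np.asarray(target_counts, float)
    src_p = src / src.sum()
    tgt_p = tgt / tgt.sum()
    # Phoneme classes unseen in the source get weight zero
    # (no source frames exist to reweight).
    return np.where(src_p > 0, tgt_p / np.maximum(src_p, 1e-12), 0.0)
```

Applying these weights to the source counts and renormalizing reproduces the target proportions exactly, so statistics accumulated over the weighted frames reflect the target language's phonetic balance.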
international conference on acoustics, speech, and signal processing | 2016
Chengzhu Yu; Chunlei Zhang; Shivesh Ranjan; Qian Zhang; Abhinav Misra; Finnian Kelly; John H. L. Hansen
In this paper, we present the system developed by the Center for Robust Speech Systems (CRSS), University of Texas at Dallas, for the NIST 2015 language recognition i-vector machine learning challenge. Our system includes several subsystems, based on Linear Discriminant Analysis - Support Vector Machine (LDA-SVM) and deep neural network (DNN) approaches. An important feature of this challenge is the emphasis on out-of-set language detection. As a result, our system development focuses mainly on the evaluation and comparison of two different out-of-set language detection strategies: direct out-of-set detection and indirect out-of-set detection. These strategies differ mainly in whether the unlabeled development data are used. The experimental results indicate that the indirect out-of-set detection strategies used in our system efficiently exploit the unlabeled development data, and therefore consistently outperform the direct out-of-set detection approach. Finally, by fusing four variants of indirect out-of-set detection based subsystems, our system achieves a relative performance gain of up to 45%, compared to the baseline cosine distance scoring (CDS) system provided by the organizers.
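The abstract contrasts direct and indirect out-of-set detection, where the indirect route exploits unlabeled development data. A minimal sketch of one indirect scheme under stated assumptions (cluster means estimated from unlabeled data stand in for out-of-set languages; the cosine comparison and the decision rule are illustrative, not the system's actual scoring):

```python
import numpy as np

def detect_out_of_set(x, in_set_means, oos_means):
    """Return the best in-set language index, or 'out_of_set' if a
    cluster mean from unlabeled development data scores higher."""
    def cos(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    in_scores = [cos(x, m) for m in in_set_means]
    oos_scores = [cos(x, m) for m in oos_means]
    if max(oos_scores) > max(in_scores):
        return "out_of_set"
    return int(np.argmax(in_scores))
```

A direct strategy would instead threshold the best in-set score alone; the indirect version wins when the unlabeled clusters actually cover the out-of-set languages.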
spoken language technology workshop | 2014
Gang Liu; Chengzhu Yu; Navid Shokouhi; Abhinav Misra; Hua Xing; John H. L. Hansen
State-of-the-art speaker verification systems model speaker identity by mapping i-vectors onto a probabilistic linear discriminant analysis (PLDA) space. Compared to other modeling approaches (such as cosine distance scoring), PLDA provides a more efficient mechanism to separate speaker information from other sources of undesired variability and offers superior speaker verification performance. Unfortunately, this efficiency comes at the cost of requiring a large corpus of labeled development data, which is too expensive or unrealistic to collect in many cases. This study investigates a potential solution to this challenge by effectively utilizing unlabeled development data with universal imposter clustering. The proposed method offers +21.9% and +34.6% relative gains over the baseline system on two publicly available corpora, respectively. This significant improvement demonstrates the effectiveness of the proposed method.
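The key step above is turning unlabeled development data into pseudo-classes via clustering so PLDA-style training has labels to work with. A minimal k-means sketch of that labeling step (the clustering recipe, k, and initialization here are assumptions for illustration, not the paper's method):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Assign pseudo-speaker labels to unlabeled development i-vectors
    by clustering them into k 'universal imposter' groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance of every point to every center.
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers
```

Each resulting cluster is then treated as one imposter "speaker" when estimating the PLDA between/within variability, sidestepping the need for manually labeled speakers.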
Odyssey 2016 | 2016
Abhinav Misra; Qian Zhang; Finnian Kelly; John H. L. Hansen
Linear Discriminant Analysis (LDA) is one of the most widely-used channel compensation techniques in current speaker and language recognition systems. In this study, we propose a technique of Between-Class Covariance Correction (BCC) to improve language recognition performance. This approach builds on the idea of Within-Class Covariance Correction (WCC), which was introduced as a means to compensate for mismatch between different development data-sets in speaker recognition. In BCC, we compute eigendirections representing the multimodal distributions of language i-vectors, and show that incorporating these directions in LDA leads to an improvement in recognition performance. Considering each cluster in the multimodal i-vector distribution as a separate class, the between- and within-cluster covariance matrices are used to update the global between-language covariance. This is in contrast to WCC, for which the within-class covariance is updated. Using the proposed method, a relative overall improvement of +8.4% in Equal Error Rate (EER) is obtained on the 2015 NIST Language Recognition Evaluation (LRE) data. Our approach offers insights toward addressing the challenging problem of mismatch compensation, which has much wider applications in both speaker and language recognition.
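The BCC idea above updates the global between-language covariance using statistics of clusters in the multimodal i-vector distribution. A minimal sketch of one such update, using only the scatter of cluster means and a hypothetical scaling weight `alpha` (the exact combination rule in the paper may differ):

```python
import numpy as np

def bcc_between_covariance(Sb_global, cluster_ivecs, alpha=1.0):
    """Treat each cluster of the multimodal i-vector distribution as a
    class and add the scatter of cluster means (scaled by alpha) to the
    global between-language covariance used in LDA."""
    means = np.stack([c.mean(0) for c in cluster_ivecs])
    mu = means.mean(0)
    # Scatter of cluster means around their grand mean.
    Sc = (means - mu).T @ (means - mu) / len(means)
    return Sb_global + alpha * Sc
```

Feeding the corrected matrix into the usual LDA eigenproblem then lets the multimodal eigendirections influence the projection, which is the contrast with WCC (where the within-class covariance is the one corrected).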
Speech Communication | 2018
Abhinav Misra; John H. L. Hansen
Language mismatch represents one of the more difficult challenges in achieving effective speaker verification in naturalistic audio streams. The proportion of bilingual speakers worldwide continues to grow, making speaker verification for speech technology more difficult. In this study, three specific methods are proposed to address this issue. Experiments are conducted on the PRISM (Promoting Robustness in Speaker Modeling) evaluation set. We first show that adding small amounts of multi-lingual seed data to the Probabilistic Linear Discriminant Analysis (PLDA) development set leads to a significant relative improvement of +17.96% in system Equal Error Rate (EER). Second, we compute the eigendirections that represent the distribution of multi-lingual data added to PLDA. We show that by adding these new eigendirections as part of the Linear Discriminant Analysis (LDA), and then minimizing them to directly compensate for language mismatch, further performance gains for speaker verification are achieved. By combining both multi-lingual PLDA and this minimization step with the new set of eigendirections, we obtain a +26.03% relative improvement in EER. In practical scenarios, it is highly unlikely that multi-lingual seed data representing the languages present in the test-set would be available. Hence, in the third phase, we address such scenarios by proposing a method for Locally Weighted Linear Discriminant Analysis (LWLDA). In this third method, we reformulate the LDA equations to incorporate a local affine transform that weights same-speaker samples. This method effectively preserves the local intrinsic information represented by the multimodal structure of the within-speaker scatter matrix, thereby helping to improve the class discriminating ability of LDA. It also extends the ability of LDA to transform the speaker i-vectors to dimensions greater than the total number of speaker classes.
Using LWLDA, a relative improvement of +8.54% is obtained in system EER. LWLDA provides even more gain when multi-lingual seed data is available, improving system performance by a relative +26.03% in terms of EER. We also compare LWLDA to the recently proposed Nearest Neighbor Non-Parametric Discriminant Analysis (NDA). We show that not only is LWLDA better than NDA in terms of system performance but it is also computationally less expensive. Comparative studies on the DARPA Robust Automatic Transcription of Speech (RATS) corpus also show that LWLDA consistently outperforms NDA and LDA on different evaluation conditions. Our solutions offer new directions for addressing a challenging problem which has received limited attention in the speaker recognition community.
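The LWLDA description above centers on a within-speaker scatter in which same-speaker sample pairs are weighted locally, so nearby samples (the multimodal structure) dominate the estimate. The sketch below is a hypothetical illustration of that idea with a Gaussian affinity weight, not the paper's exact formulation:

```python
import numpy as np

def lwlda_within_scatter(X, y, sigma=1.0):
    """Locally weighted within-class scatter: each same-speaker pair
    contributes its outer product scaled by a Gaussian affinity, so
    close-by samples carry more weight than distant ones."""
    D = X.shape[1]
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        for i in range(len(Xc)):
            for j in range(len(Xc)):
                d = Xc[i] - Xc[j]
                w = np.exp(-(d @ d) / (2 * sigma ** 2))  # local affinity
                Sw += w * np.outer(d, d)
    return Sw
```

Because the scatter is built from pairwise differences rather than class means, its rank is not capped by the number of speaker classes, which is consistent with the abstract's point that LWLDA can project to more dimensions than classes.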
Archive | 2014
Gang Liu; Chengzhu Yu; Abhinav Misra; Navid Shokouhi; John H. L. Hansen
conference of the international speech communication association | 2017
Shivesh Ranjan; Abhinav Misra; John H. L. Hansen
conference of the international speech communication association | 2015
Abhinav Misra; Shivesh Ranjan; Chunlei Zhang; John H. L. Hansen
IEEE Transactions on Audio, Speech, and Language Processing | 2018
Abhinav Misra; John H. L. Hansen