Ahilan Kanagasundaram
Queensland University of Technology
Publication
Featured research published by Ahilan Kanagasundaram.
Speech Communication | 2014
Ahilan Kanagasundaram; David Dean; Sridha Sridharan; Javier Gonzalez-Dominguez; Joaquin Gonzalez-Rodriguez; Daniel Ramos
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-length utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalization (WCCN) exist for compensating for session variation, but we have identified the variability introduced by phonetic content due to utterance variation as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short i-vector speaker verification systems using cosine similarity scoring (CSS), we have introduced a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate for the session and utterance variations and is shown to provide improvement in performance over the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach that uses probabilistic linear discriminant analysis (PLDA) to model the SUV directly is also introduced. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.
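As a rough illustration of the cosine similarity scoring (CSS) step mentioned in this abstract, the sketch below scores two i-vectors that are assumed to have already been projected by whatever compensation is in use (LDA, SN-LDA, WCCN or SUVN; none of those projections is shown). The function names `cosine_score` and `verify` and the decision threshold are hypothetical and are not taken from the paper.

```python
import math


def cosine_score(target, test):
    """Cosine similarity between two i-vectors given as equal-length sequences.

    Both vectors are assumed to be already session-compensated; this
    function only performs the scoring step.
    """
    if len(target) != len(test):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(target, test))
    norm_target = math.sqrt(sum(a * a for a in target))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("cannot score a zero-norm i-vector")
    return dot / (norm_target * norm_test)


def verify(target, test, threshold):
    """Accept the claimed identity when the score reaches the threshold.

    The threshold is a tuning parameter chosen on development data
    (an assumption of this sketch, not a value from the paper).
    """
    return cosine_score(target, test) >= threshold
```

Because the score depends only on the angle between the two vectors, any scaling of either i-vector leaves the decision unchanged.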
international conference on acoustics, speech, and signal processing | 2012
Ahilan Kanagasundaram; David Dean; Robbie Vogt; Mitchell McLaren; Sridha Sridharan; Michael Mason
This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purposes of improving i-vector speaker verification in the presence of high inter-session variability. By taking advantage of the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to provide an improvement in speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA) which, similarly, shows an improvement in speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable as replacement techniques to improve the performance of LDA and SNLDA-based i-vector speaker verification.
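One common way to write a weighted pairwise between-class scatter matrix is sketched below; it is given as an illustration of the general idea and may differ in detail from the formulation used in the paper. The symbols are assumptions of this sketch: $S$ is the number of development speakers, $n_i$ the number of i-vectors for speaker $i$, $N = \sum_i n_i$, $\boldsymbol{\mu}_i$ the mean i-vector of speaker $i$, $d_{ij}$ a distance between the means of speakers $i$ and $j$, and $\omega(\cdot)$ a weighting function, typically chosen to decrease with distance so that closely spaced speaker pairs are emphasised.

```latex
\mathbf{S}_b^{\omega}
  = \frac{1}{N} \sum_{i=1}^{S-1} \sum_{j=i+1}^{S}
    \omega(d_{ij})\, n_i\, n_j\,
    (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)
    (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^{\mathsf{T}}
```

With $\omega(d_{ij}) = 1$ for every pair, this expression reduces to the standard LDA between-class scatter $\sum_{i=1}^{S} n_i (\boldsymbol{\mu}_i - \boldsymbol{\mu})(\boldsymbol{\mu}_i - \boldsymbol{\mu})^{\mathsf{T}}$, where $\boldsymbol{\mu}$ is the global mean, so the weighting is the only change relative to traditional LDA.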
Computer Speech & Language | 2018
Hafizur Rahman; Ahilan Kanagasundaram; Ivan Himawan; David Dean; Sridha Sridharan
Domain mismatch significantly affects the speaker verification performance. Domain invariant linear discriminant analysis (DI-LDA) for compensating domain mismatch in the LDA subspace. Domain invariant probabilistic linear discriminant analysis (DI-PLDA) for domain mismatch modelling in the PLDA subspace. DI-LDA approach followed by the DI-PLDA (DI-PLDA[DI-LDA]) to compensate domain mismatch from both LDA and PLDA subspaces. Limited target domain data requirement using domain mismatch compensation techniques.
The performance of state-of-the-art i-vector speaker verification systems relies on a large amount of training data for probabilistic linear discriminant analysis (PLDA) modeling. During the evaluation, it is also crucial that the target condition data is matched well with the development data used for PLDA training. However, in many practical scenarios, these systems have to be developed, and trained, using data that is often outside the domain of the intended application, since the collection of a significant amount of in-domain data is often difficult. Experimental studies have found that PLDA speaker verification performance degrades significantly due to this development/evaluation mismatch. This paper introduces a domain-invariant linear discriminant analysis (DI-LDA) technique for out-domain PLDA speaker verification that compensates domain mismatch in the LDA subspace. We also propose a domain-invariant probabilistic linear discriminant analysis (DI-PLDA) technique for domain mismatch modeling in the PLDA subspace, using only a small amount of in-domain data. In addition, we propose the sequential and score-level combination of DI-LDA and DI-PLDA to further improve out-domain speaker verification performance. Experimental results show the proposed domain mismatch compensation techniques yield at least 27% and 14.5% improvement in equal error rate (EER) over a pooled PLDA system for telephone-telephone and interview-interview conditions, respectively. Finally, we show that the improvement over the baseline pooled system can be attained even when significantly reducing the number of in-domain speakers, down to 30 in most of the evaluation conditions.
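The results in this abstract are reported as relative improvements in equal error rate (EER). The sketch below shows one simple way such figures can be computed from lists of target and non-target trial scores. It is a minimal approximation that sweeps thresholds over the observed scores; the function names are hypothetical and evaluation toolkits typically use interpolated variants.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER as a fraction in [0, 1].

    Sweeps every observed score as a threshold and returns the average of
    the false-rejection and false-acceptance rates at the threshold where
    the two rates are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # A target trial scoring below the threshold is falsely rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # A non-target trial scoring at or above the threshold is falsely accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (frr + far) / 2.0
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction over a baseline, in percent."""
    if baseline_eer <= 0.0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

For example, under this definition a drop from a baseline EER of 10% to 7.3% corresponds to a relative improvement of about 27%; these two EER values are made up for illustration and are not results from the paper.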
conference of the international speech communication association | 2016
Houman Ghaemmaghami; Md. Hafizur Rahman; Ivan Himawan; David Dean; Ahilan Kanagasundaram; Sridha Sridharan; Clinton Fookes
This paper presents the QUT speaker recognition system, as a competing system in the Speakers In The Wild (SITW) speaker recognition challenge. Our proposed system achieved an overall ranking of second place in the main core-core condition evaluations of the SITW challenge. This system uses an i-vector/PLDA approach, with domain adaptation and a deep neural network (DNN) trained to provide feature statistics. The statistics are accumulated by using class posteriors from the DNN, in place of GMM component posteriors in a typical GMM-UBM i-vector/PLDA system. Once the statistics have been collected, the i-vector computation is carried out as in a GMM-UBM based system. We apply domain adaptation to the extracted i-vectors to ensure robustness against dataset variability. PLDA modelling is used to capture speaker and session variability in the i-vector space, and the processed i-vectors are compared using the batch likelihood ratio. The final scores are calibrated to obtain the calibrated likelihood scores, which are then used to carry out speaker recognition and evaluate the performance of the system. Finally, we explore the practical application of our system to the core-multi condition recordings of the SITW data and propose a technique for speaker recognition in recordings with multiple speakers.
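The statistics accumulation described in this abstract can be sketched as follows. The code computes the standard zeroth- and first-order statistics from per-frame posteriors and is agnostic about where those posteriors come from, so the same routine applies whether they are DNN class posteriors or GMM component posteriors. The function name `accumulate_stats` and the list-of-lists data layout are assumptions of this sketch, not details of the QUT system.

```python
def accumulate_stats(frames, posteriors):
    """Accumulate zeroth- and first-order statistics for one recording.

    frames:     list of T feature vectors, each of length D
    posteriors: list of T posterior vectors, each of length C
                (from a DNN or a GMM; this function does not care which)

    Returns (zeroth, first) where zeroth[c] is the sum over frames of the
    posterior of class c, and first[c] is the posterior-weighted sum of
    the feature vectors for class c.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    if len(frames) != len(posteriors):
        raise ValueError("need exactly one posterior vector per frame")
    num_classes = len(posteriors[0])
    dim = len(frames[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]
    for x, gamma in zip(frames, posteriors):
        for c, g in enumerate(gamma):
            zeroth[c] += g
            for d, value in enumerate(x):
                first[c][d] += g * value
    return zeroth, first
```

Keeping the accumulation independent of the posterior source is what allows the downstream i-vector computation to stay the same as in a GMM-UBM based system.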
conference of the international speech communication association | 2016
Ahilan Kanagasundaram; David Dean; Sridha Sridharan; Clinton Fookes; Ivan Himawan
This paper analyses short utterance probabilistic linear discriminant analysis (PLDA) speaker verification with utterance partitioning and short utterance variance (SUV) modelling approaches. Experimental studies have found that Gaussian PLDA (GPLDA) speaker verification improves when, instead of using a single long utterance as enrolment data, the long enrolment utterance is partitioned into multiple short utterances and the average of the short utterance i-vectors is used as the enrolment data. This is because short utterance i-vectors contain speaker, session and utterance variations, and the utterance partitioning approach compensates for the utterance variation. SUV-PLDA is then studied with the utterance partitioning approach: the utterance partitioning-based SUV-GPLDA system shows relative improvements of 9% and 16% in EER for the NIST 2008 and NIST 2010 truncated 10sec-10sec evaluation conditions, because the utterance partitioning approach compensates for the utterance variation and the SUV modelling approach compensates for the mismatch between full-length development data and short-length evaluation data.
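The utterance partitioning idea in this abstract can be sketched as below: a long enrolment utterance is split into short segments, an i-vector is extracted from each segment, and the segment i-vectors are averaged. The i-vector extractor is passed in as a callable because it is outside the scope of this sketch. The names `partition` and `enrol_by_partitioning`, the use of non-overlapping segments, and the choice to drop a short trailing remainder are all assumptions, not details confirmed by the paper.

```python
def partition(frames, segment_len):
    """Split a frame sequence into consecutive non-overlapping segments.

    A trailing remainder shorter than segment_len is dropped
    (an assumption of this sketch).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    last_start = len(frames) - segment_len
    return [frames[i:i + segment_len]
            for i in range(0, last_start + 1, segment_len)]


def enrol_by_partitioning(frames, segment_len, extract_ivector):
    """Average the i-vectors of the short segments of one long utterance.

    extract_ivector is a hypothetical callable that maps a list of frames
    to an i-vector (a list of floats).
    """
    segments = partition(frames, segment_len)
    if not segments:
        raise ValueError("utterance is shorter than one segment")
    ivectors = [extract_ivector(segment) for segment in segments]
    dim = len(ivectors[0])
    return [sum(v[d] for v in ivectors) / len(ivectors) for d in range(dim)]
```

The averaged vector then plays the role of the enrolment i-vector in scoring, in place of a single i-vector extracted from the full-length utterance.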
Faculty of Built Environment and Engineering; Information Security Institute | 2011
Ahilan Kanagasundaram; Robbie Vogt; David Dean; Sridha Sridharan; Michael Mason
arXiv: Sound | 2012
Ahilan Kanagasundaram; Robert J. Vogt; David Dean; Sridha Sridharan
conference of the international speech communication association | 2013
Ahilan Kanagasundaram; David Dean; Javier Gonzalez-Dominguez; Sridha Sridharan; Daniel Ramos; Joaquin Gonzalez-Rodriguez
Odyssey | 2012
Ahilan Kanagasundaram; Robbie Vogt; David Dean; Sridha Sridharan
Information Security Institute; Science & Engineering Faculty | 2013
Ahilan Kanagasundaram; David Dean; Javier Gonzalez-Dominguez; Sridha Sridharan; Daniel Ramos; Joaquin Gonzalez-Rodriguez