Lakshmi Saheer
Idiap Research Institute
Publication
Featured research published by Lakshmi Saheer.
international conference on acoustics, speech, and signal processing | 2010
Hui Liang; John Dines; Lakshmi Saheer
The EMIME project aims to build a personalized speech-to-speech translator, such that spoken input from a user in one language is used to produce spoken output that still sounds like the user's voice, but in another language. This distinctiveness makes unsupervised cross-lingual speaker adaptation one key to the project's success. So far, research has been conducted into the unsupervised and cross-lingual cases separately, by means of decision tree marginalization and HMM state mapping respectively. In this paper we combine the two techniques to perform unsupervised cross-lingual speaker adaptation. The performance of eight speaker adaptation systems (supervised vs. unsupervised, intra-lingual vs. cross-lingual) is compared using objective and subjective evaluations. Experimental results show that the performance of unsupervised cross-lingual speaker adaptation is comparable to that of the supervised case in terms of spectrum adaptation in the EMIME scenario, even though automatically obtained transcriptions have a very high phoneme error rate.
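The HMM state mapping mentioned above is often built by pairing each state in one language's model with the closest state in the other language's model. As a rough illustration (not the EMIME implementation; the distance criterion and state representation here are assumptions), a minimum-KL-divergence mapping between diagonal-Gaussian states can be sketched as:

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    # KL divergence KL(N(m1, v1) || N(m2, v2)) for diagonal Gaussians.
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def state_mapping(src_states, tgt_states):
    # Map each source-language HMM state (mean, variance) to the
    # target-language state with minimum KL divergence.
    mapping = {}
    for i, (m, v) in enumerate(src_states):
        mapping[i] = min(range(len(tgt_states)),
                         key=lambda j: kl_diag_gauss(m, v, *tgt_states[j]))
    return mapping
```

Adaptation transforms estimated on states of one language can then be applied to the mapped states of the other.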
international conference on acoustics, speech, and signal processing | 2010
Lakshmi Saheer; Philip N. Garner; John Dines; Hui Liang
The advent of statistical speech synthesis has enabled the unification of the basic techniques used in speech synthesis and recognition. Adaptation techniques that have been successfully used in recognition systems can now be applied to synthesis systems to improve the quality of the synthesized speech. The application of vocal tract length normalization (VTLN) to synthesis is explored in this paper. VTLN-based adaptation requires estimation of a single warping factor, which can be accurately estimated from very little adaptation data and gives additive improvements over CMLLR adaptation. The challenge of estimating accurate warping factors using higher-order features is solved by initializing warping factor estimation with the values calculated from lower-order features.
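The single-parameter estimation described above is commonly done by grid search: warp the speaker's features with each candidate factor and keep the one that maximizes the likelihood under the average-voice model. A minimal sketch under simplifying assumptions (a plain linear frequency warp on log-spectra and a single diagonal Gaussian in place of the full HMM; real systems use piecewise-linear or bilinear warps):

```python
import numpy as np

def warp_log_spectrum(spec, alpha, f_max=8000.0):
    # Illustrative linear warp: the warped spectrum at frequency f takes
    # the value of the original at alpha * f (clipped at the band edge).
    f = np.linspace(0.0, f_max, len(spec))
    return np.interp(np.clip(alpha * f, 0.0, f_max), f, spec)

def estimate_warp_factor(frames, mean, var,
                         alphas=np.arange(0.80, 1.21, 0.02)):
    # Grid search for the single VTLN warping factor: choose the alpha
    # maximizing the diagonal-Gaussian log-likelihood of the warped
    # frames under the average-voice model (Jacobian term omitted here).
    def loglik(a):
        warped = np.array([warp_log_spectrum(x, a) for x in frames])
        return -0.5 * np.sum((warped - mean) ** 2 / var
                             + np.log(2.0 * np.pi * var))
    return max(alphas, key=loglik)
```

Because only one scalar is searched, a few seconds of adaptation data already constrain it well, which is why VTLN is attractive for rapid adaptation.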
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Lakshmi Saheer; John Dines; Philip N. Garner
Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensional features and truncation of the transformation matrix are among the challenges presented, together with appropriate solutions. Detailed evaluations are performed to determine the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is also not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the differences between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
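The Jacobian normalization challenge mentioned above arises because warping is a feature-space transform: comparing likelihoods across warping factors is only fair if the log-determinant of the transform is included. A hedged illustration (a generic linear feature transform standing in for the cepstral-domain VTLN matrix) of why dropping the term biases the search:

```python
import numpy as np

def warp_loglik(x, A, mean, var, jacobian=True):
    # Log-likelihood of a feature vector under a diagonal Gaussian after
    # a linear feature transform A.  Jacobian normalization adds
    # log|det A|; without it, transforms that shrink the feature space
    # spuriously inflate the likelihood.
    y = A @ x
    ll = -0.5 * np.sum((y - mean) ** 2 / var + np.log(2.0 * np.pi * var))
    if jacobian:
        ll += np.linalg.slogdet(A)[1]
    return ll
```

Without the `log|det A|` term, scaling the features toward the model mean always looks better, so the estimated warping factor is biased; with it, the comparison across candidate transforms is properly normalized.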
Computer Speech & Language | 2013
John Dines; Hui Liang; Lakshmi Saheer; Matthew Gibson; William Byrne; Keiichiro Oura; Keiichi Tokuda; Junichi Yamagishi; Simon King; Mirjam Wester; Teemu Hirsimäki; Reima Karhila; Mikko Kurimo
In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM-based statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism for adapting synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches, as well as an end-to-end speaker adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios, and our proposed algorithms seem to generalise well across several language pairs. We also discuss important future directions, including the need for better evaluation metrics.
international conference on acoustics, speech, and signal processing | 2012
Lakshmi Saheer; Junichi Yamagishi; Philip N. Garner; John Dines
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. However, with only a single parameter, VTLN captures very few speaker-specific characteristics when compared to linear-transform-based adaptation techniques. This paper proposes that the merits of VTLN can be combined with those of linear-transform-based adaptation in a hierarchical Bayesian framework, where VTLN is used as the prior information. A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation yields improved naturalness, intelligibility and speaker similarity.
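The intuition behind using VTLN as the prior can be caricatured as MAP smoothing: with little adaptation data the transform stays close to the VTLN-derived prior, and with more data it moves toward the data-driven (ML) linear transform. This is a deliberately simplified stand-in for CSMAPLR (the interpolation rule and `tau` parameter here are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

def map_transform(W_prior, W_ml, n_frames, tau=1000.0):
    # MAP-style smoothing of an adaptation transform toward a prior
    # transform (imagined here as derived from the VTLN warping factor).
    # tau acts as the prior's weight, expressed in frames of data.
    w = n_frames / (n_frames + tau)
    return (1.0 - w) * W_prior + w * W_ml
```

With zero frames the result is exactly the prior transform, so even a brand-new speaker gets a plausible (e.g. gender-appropriate) voice; as data accumulates, the richer linear transform takes over.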
Proceedings of the 7th ISCA Speech Synthesis Workshop | 2010
Mirjam Wester; John Dines; Matthew Gibson; Hui Liang; Yi-Jian Wu; Lakshmi Saheer; Simon King; Keiichiro Oura; Philip N. Garner; William Byrne; Yong Guan; Teemu Hirsimäki; Reima Karhila; Mikko Kurimo; Matt Shannon; Sayaka Shiota; Jilei Tian; Keiichi Tokuda; Junichi Yamagishi
meeting of the association for computational linguistics | 2010
Mikko Kurimo; William Byrne; John Dines; Philip N. Garner; Matthew Gibson; Yong Guan; Teemu Hirsimäki; Reima Karhila; Simon King; Hui Liang; Keiichiro Oura; Lakshmi Saheer; Matt Shannon; Sayaka Shiota; Jilei Tian
Proceedings of ISCA Speech Synthesis Workshop | 2010
Lakshmi Saheer; John Dines; Philip N. Garner; Hui Liang
conference of the international speech communication association | 2009
John Dines; Lakshmi Saheer; Hui Liang
Archive | 2012
Lakshmi Saheer; Hui Liang; John Dines; Philip N. Garner