Federico Flego
University of Cambridge
Publication
Featured research published by Federico Flego.
Computer Speech & Language | 2010
Mark J. F. Gales; Federico Flego
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as support vector machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS compensated models.
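As a rough illustration of the VTS model-based compensation mentioned in the abstract above, the following sketch applies a first-order Taylor expansion of the standard additive-noise mismatch function to a single diagonal Gaussian. It is not the paper's implementation: it assumes log-spectral features (so the Jacobian is diagonal), static parameters only, no channel term, and a known noise model. The function name `vts_compensate` and all argument names are hypothetical.

```python
import numpy as np


def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order VTS compensation of one diagonal Gaussian.

    Assumptions (not taken from the paper): log-spectral features,
    additive noise only, static parameters, known noise mean/variance.
    """
    mu_x = np.asarray(mu_x, dtype=float)
    var_x = np.asarray(var_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    var_n = np.asarray(var_n, dtype=float)

    # Mismatch function y = log(exp(x) + exp(n)), evaluated at the means.
    # np.logaddexp computes this without overflow.
    mu_y = np.logaddexp(mu_x, mu_n)

    # Jacobian dy/dx = exp(x) / (exp(x) + exp(n)) at the expansion point;
    # the noise Jacobian dy/dn is its complement.
    jac_x = np.exp(mu_x - mu_y)
    jac_n = 1.0 - jac_x

    # Linearised variance: each source is scaled by its squared Jacobian.
    var_y = jac_x ** 2 * var_x + jac_n ** 2 * var_n
    return mu_y, var_y
```

When the noise mean is far below the clean speech mean, the Jacobian approaches one and the compensated Gaussian stays close to the clean one; when the two means are equal, each source contributes a quarter of its variance.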
IEEE Automatic Speech Recognition and Understanding Workshop | 2009
Federico Flego; Mark J. F. Gales
Adaptive training is a powerful approach for building speech recognition systems on non-homogeneous training data. Recently, approaches based on predictive model-based compensation schemes, such as Joint Uncertainty Decoding (JUD) and Vector Taylor Series (VTS), have been proposed. This paper reviews these model-based compensation schemes and relates them to factor-analysis style systems. Forms of Maximum Likelihood (ML) adaptive training with these approaches are described, based on both second-order optimisation schemes and Expectation Maximisation (EM). However, discriminative training is used in many state-of-the-art speech recognition systems. Hence, this paper proposes discriminative adaptive training with predictive model-compensation approaches for noise robust speech recognition. This training approach is applied to both JUD and VTS compensation with minimum phone error training. A large scale multi-environment training configuration is used and the systems are evaluated on a range of in-car collected data tasks.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Federico Flego; Mark J. F. Gales
Model compensation schemes are a powerful approach to handling mismatches between training and testing conditions. Normally these schemes are run in a batch adaptation mode, re-recognising the utterance used to estimate the noise model parameters. For many applications this introduces unacceptable latency. This paper examines three forms of incremental mode model-based compensation: vector Taylor series; joint uncertainty decoding; and predictive CMLLR. These predictive schemes can also be combined with adaptive schemes such as CMLLR. By combining the approaches, weaknesses of each can be addressed. The performance is evaluated on in-car recorded data, where the combined incremental scheme shows gains over either individually.
International Conference on Acoustics, Speech, and Signal Processing | 2011
Federico Flego; Mark J. F. Gales
Model-based compensation schemes are a powerful approach for noise robust speech recognition. Recently, there have been a number of investigations into adaptive training and into estimating the noise models used for model adaptation. This paper examines the use of EM-based schemes for both canonical models and noise estimation, including discriminative adaptive training. One issue that arises when estimating the noise model is a mismatch between the noise estimation approximation and the final model compensation scheme. This paper proposes FA-style compensation, where this mismatch is eliminated, though at the expense of a sensitivity to the initial noise estimates. EM-based discriminative adaptive training is evaluated on in-car and Aurora4 tasks. FA-style compensation is then evaluated in an incremental mode on the in-car task.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Federico Flego; Mark J. F. Gales
Vector Taylor Series (VTS) model-based compensation is a powerful approach for noise robust speech recognition. An important extension to this approach is VTS adaptive training (VAT), which allows canonical models to be estimated on diverse noise-degraded training data. These canonical models can be estimated using EM-based approaches, allowing simple extensions to discriminative VAT (DVAT). However, to ensure a diagonal corrupted speech covariance matrix, the Jacobian (loading matrix) relating the noise and clean speech is diagonalised. In this work, an approach for yielding optimal diagonal loading matrices based on minimising the expected KL-divergence between the diagonal loading matrix and “correct” distributions is proposed. The performance of DVAT using the standard and optimal diagonalisation was evaluated on both in-car collected data and the Aurora4 task.
International Conference on Acoustics, Speech, and Signal Processing | 2009
Mark J. F. Gales; Federico Flego
It is difficult to adapt discriminative classifiers, particularly kernel-based ones such as support vector machines (SVMs), to handle mismatches between the training and test data. In previous work, adaptation was performed by modifying the kernel used with the SVM, rather than changing the SVM parameters themselves. However, an idealised form of compensation, single pass retraining, was used to alter the generative models associated with the generative kernel. In this paper, vector Taylor series model compensation is used. This scheme is more efficient and allows a noise model to be estimated. The performance of the new scheme is evaluated on two continuous digit tasks. On both tasks, SVM rescoring outperformed the baseline VTS compensated models.
Conference of the International Speech Communication Association | 2009
Rogier C. van Dalen; Federico Flego; Mark J. F. Gales
Workshop on Statistical Machine Translation | 2013
Juan Pino; Aurelien Waite; Tong Xiao; Adrià de Gispert; Federico Flego; William Byrne
Conference of the International Speech Communication Association | 2015
Xunying Liu; Federico Flego; Linlin Wang; Chao Zhang; Mark J. F. Gales; Philip C. Woodland
Conference of the International Speech Communication Association | 2012
Mark J. F. Gales; Federico Flego