Giulia Garau
University of Edinburgh
Publications
Featured research published by Giulia Garau.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007
Thomas Hain; Vincent Wan; Lukas Burget; Martin Karafiát; John Dines; Jithendra Vepa; Giulia Garau; Mike Lincoln
In this paper we describe the 2005 AMI system for the transcription of speech in meetings used in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference-room and lecture-style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression, and minimum word error rate decoding. We report system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve competitive performance.
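As a worked illustration of one of the adaptation steps named in this abstract (not taken from the paper itself): maximum likelihood linear regression (MLLR) adapts the Gaussian means of the acoustic model to a particular speaker with a shared affine transform estimated on that speaker's data.

```latex
% MLLR mean adaptation (standard formulation, shown here for illustration):
% each Gaussian mean \mu_m is mapped by a shared affine transform (A, b),
% estimated by maximum likelihood on the speaker's adaptation data O.
\hat{\mu}_m = A \mu_m + b,
\qquad
(\hat{A}, \hat{b}) = \arg\max_{A,\, b} \; p\bigl(\mathbf{O} \mid \lambda_{A,b}\bigr)
```

Here \(\lambda_{A,b}\) denotes the acoustic model with all means transformed by \((A, b)\); in practice several transforms may be tied to regression classes of Gaussians.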
Multimodal Technologies for Perception of Humans | 2008
Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; David A. van Leeuwen; Mike Lincoln; Vincent Wan
Meeting transcription is one of the main tasks for large-vocabulary automatic speech recognition (ASR) and is supported by several large international projects in the area. The conversational nature, the difficult acoustics, and the necessity of high-quality speech transcripts for higher-level processing make ASR of meeting recordings an interesting challenge. This paper describes the development and system architecture of the 2007 AMIDA meeting transcription system, the third such system developed in a collaboration of six research sites. Different variants of the system participated in all speech-to-text transcription tasks of the 2007 NIST RT evaluations and showed very competitive performance. The best result was obtained on close-talking microphone data, where a final word error rate of 24.9% was obtained.
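For reference, the word error rate quoted above is the standard metric used in the NIST RT evaluations: the hypothesis transcript is aligned to the reference and the substitution, deletion, and insertion errors are counted against the number of reference words.

```latex
% Word error rate after aligning hypothesis and reference:
% S = substitutions, D = deletions, I = insertions, N = reference word count.
\mathrm{WER} = \frac{S + D + I}{N}
```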
International Conference on Machine Learning | 2005
Thomas Hain; Lukas Burget; John Dines; Iain A. McCowan; Giulia Garau; Martin Karafiát; Mike Lincoln; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals
This paper describes the AMI transcription system for speech in meetings, developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT06 evaluations, where it was found to yield competitive performance.
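The beam-forming step mentioned above combines the multiple distant microphone channels into a single enhanced signal before recognition. The snippet below is a minimal delay-and-sum sketch, not the system's actual front-end: the per-channel delays are assumed to be known (in practice they would be estimated, for example with cross-correlation methods such as GCC-PHAT), and the function name and arguments are illustrative only.

```python
import numpy as np

def delay_and_sum(channels, delays_samples):
    """Minimal delay-and-sum beamformer (illustrative sketch).

    channels: list of 1-D numpy arrays, one per distant microphone.
    delays_samples: integer delay (in samples) removed from each channel
                    so the target speaker is time-aligned across channels.
    """
    length = min(len(c) - d for c, d in zip(channels, delays_samples))
    # Shift each channel by its delay, truncate to a common length, average.
    aligned = [c[d:d + length] for c, d in zip(channels, delays_samples)]
    return np.mean(aligned, axis=0)

# Toy usage: the same signal arriving at three microphones with different delays.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
mics = [np.concatenate([np.zeros(d), clean]) for d in (0, 3, 7)]
enhanced = delay_and_sum(mics, [0, 3, 7])
```

Averaging the time-aligned channels reinforces the coherent speech signal while partially cancelling uncorrelated noise and reverberation across microphones.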
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Giulia Garau; Steve Renals
In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as mel-frequency cepstral coefficients (MFCCs). Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalization (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA results in a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.
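A rough sketch of the feature-level combination idea described above (not the authors' code): conventional MFCC frames and pitch-synchronous frames are concatenated per frame and then projected to a lower-dimensional space with a single linear transform. HLDA estimates that transform allowing full, class-dependent covariances; the sketch below uses ordinary LDA from scikit-learn purely as an illustrative stand-in, and all dimensions, names, and the random data are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumed inputs: per-frame MFCC features, pitch-synchronous features,
# and a phone/state label per frame used to estimate the projection.
n_frames = 5000
mfcc = np.random.randn(n_frames, 39)         # e.g. MFCCs plus deltas
pitch_sync = np.random.randn(n_frames, 39)   # e.g. STRAIGHT-derived features
labels = np.random.randint(0, 40, size=n_frames)

# Feature-level combination: concatenate the two streams per frame, then
# learn one linear projection back to a compact combined representation.
combined = np.hstack([mfcc, pitch_sync])                    # (n_frames, 78)
projector = LinearDiscriminantAnalysis(n_components=39)     # stand-in for HLDA
combined_proj = projector.fit_transform(combined, labels)   # (n_frames, 39)
```

The system-level ROVER combination reported in the paper instead aligns the word hypotheses of separately decoded systems and votes on each aligned slot, so the two combination strategies operate at opposite ends of the recognition pipeline.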
Lecture Notes in Computer Science | 2006
Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Iain McCowan; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals
Lecture Notes in Computer Science | 2006
Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Jithendra Vepa; Vincent Wan
Conference of the International Speech Communication Association (Interspeech) | 2005
Thomas Hain; John Dines; Giulia Garau; Martin Karafiát; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals
International Conference on Machine Learning | 2006
Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Jithendra Vepa; Vincent Wan
Conference of the International Speech Communication Association (Interspeech) | 2010
Alfred Dielmann; Giulia Garau