
Publication


Featured research published by Giulia Garau.


International Conference on Acoustics, Speech, and Signal Processing | 2007

The AMI System for the Transcription of Speech in Meetings

Thomas Hain; Vincent Wan; Lukas Burget; Martin Karafiát; John Dines; Jithendra Vepa; Giulia Garau; Mike Lincoln

In this paper we describe the 2005 AMI system for the transcription of speech in meetings used in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression, and minimum word error rate decoding. In this paper we describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve competitive performance.
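Vocal tract length normalisation, one of the techniques the abstract lists, warps the frequency axis per speaker to compensate for vocal tract differences. A minimal sketch of a piecewise-linear warp of the kind commonly used in such systems; the cut-off ratio and function name are illustrative choices, not details of the AMI system:

```python
def vtln_warp(f, alpha, f_max, f_cut_ratio=0.8):
    """Piecewise-linear VTLN frequency warp (sketch).

    Frequencies below a cut-off are scaled by the per-speaker warp
    factor `alpha`; a second linear segment then maps the remainder so
    that f_max warps to itself, keeping the warped axis in-band.
    """
    f_cut = f_cut_ratio * f_max
    if f <= f_cut:
        return alpha * f
    # Second segment: the line from (f_cut, alpha * f_cut) to (f_max, f_max).
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (f - f_cut)
```

In practice the warp factor is chosen per speaker, typically by a grid search over a small range (around 0.88 to 1.12) maximising the acoustic likelihood.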


Multimodal Technologies for Perception of Humans | 2008

The 2007 AMI(DA) System for Meeting Transcription

Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; David A. van Leeuwen; Mike Lincoln; Vincent Wan

Meeting transcription is one of the main tasks for large-vocabulary automatic speech recognition (ASR) and is supported by several large international projects in the area. The conversational nature, the difficult acoustics, and the necessity of high-quality speech transcripts for higher-level processing make ASR of meeting recordings an interesting challenge. This paper describes the development and system architecture of the 2007 AMIDA meeting transcription system, the third such system developed in a collaboration of six research sites. Different variants of the system participated in all speech-to-text transcription tasks of the 2007 NIST RT evaluations and showed very competitive performance. The best result was obtained on close-talking microphone data, where a final word error rate of 24.9% was obtained.


International Conference on Machine Learning | 2005

The development of the AMI system for the transcription of speech in meetings

Thomas Hain; Lukas Burget; John Dines; Iain A. McCowan; Giulia Garau; Martin Karafiát; Mike Lincoln; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals

This paper describes the AMI transcription system for speech in meetings, developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker-adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone-posterior-based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel-adaptive training. The system improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations, where it was found to yield competitive performance.
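Beam-forming for the multiple-distant-microphone condition can be illustrated by its simplest form, delay-and-sum: align each microphone channel by its estimated steering delay, then average. This sketch with integer-sample delays is illustrative only, not the AMI front-end (which estimates delays from the audio and uses more refined filter-and-sum processing):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming over distant-microphone channels (sketch).

    Each channel is advanced by its estimated integer-sample steering
    delay and the aligned channels are averaged, reinforcing the target
    speaker relative to uncorrelated noise and reverberation.
    """
    n = min(len(ch) for ch in channels)
    out = np.zeros(n)
    for ch, d in zip(channels, delays):
        # np.roll wraps around at the edges; acceptable for a sketch,
        # a real implementation would zero-pad instead.
        out += np.roll(np.asarray(ch[:n], dtype=float), -d)
    return out / len(channels)
```

Real systems estimate the delays per segment, e.g. by cross-correlating channel pairs, and handle fractional delays in the frequency domain.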


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

Giulia Garau; Steve Renals

In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as Mel-frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalization (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA results in a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER resulted in a further significant decrease in word error rate.
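The system-level combination mentioned above, ROVER, aligns competing recogniser outputs into a word transition network and votes per slot. A toy sketch of the voting step, assuming the hypotheses are already aligned (real ROVER performs the alignment itself by iterative dynamic programming and can weight each vote by word confidence):

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Word-level majority voting over aligned hypotheses (ROVER-style sketch).

    Each hypothesis is a list of words aligned slot-by-slot across
    systems, with '' marking a null (deletion) arc. The most frequent
    word in each slot is emitted; a winning null arc emits nothing.
    """
    combined = []
    for slot in zip(*aligned_hyps):
        word, _ = Counter(slot).most_common(1)[0]
        if word:
            combined.append(word)
    return combined
```

With three or more systems making partly independent errors, per-slot voting can correct mistakes that no single system avoids, which is the intuition behind the further error-rate reduction reported above.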


Lecture Notes in Computer Science | 2006

The 2005 AMI system for the transcription of speech in meetings

Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Iain McCowan; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals


Lecture Notes in Computer Science | 2006

The AMI meeting transcription system: Progress and performance

Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Jithendra Vepa; Vincent Wan


Conference of the International Speech Communication Association | 2005

Transcription of conference room meetings: an investigation

Thomas Hain; John Dines; Giulia Garau; Martin Karafiát; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals


International Conference on Machine Learning | 2006

The AMI meeting transcription system: progress and performance

Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Jithendra Vepa; Vincent Wan


Conference of the International Speech Communication Association | 2010

Floor Holder Detection and End of Speaker Turn Prediction in Meetings

Alfred Dielmann; Giulia Garau


Conference of the International Speech Communication Association | 2005

Proceedings of the 9th European Conference on Speech Communication and Technology

Thomas Hain; John Dines; Giulia Garau; Martin Karafiát; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals

Collaboration


Dive into Giulia Garau's collaborations.

Top Co-Authors

Thomas Hain, University of Sheffield
John Dines, Idiap Research Institute
Martin Karafiát, Brno University of Technology
Steve Renals, University of Edinburgh
Lukas Burget, Brno University of Technology
Mike Lincoln, University of Edinburgh
Darren Moore, Idiap Research Institute
Jithendra Vepa, Idiap Research Institute
Iain A. McCowan, Queensland University of Technology