Mike Lincoln
University of Edinburgh
Publications
Featured research published by Mike Lincoln.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Thomas Hain; Vincent Wan; Lukas Burget; Martin Karafiát; John Dines; Jithendra Vepa; Giulia Garau; Mike Lincoln
In this paper we describe the 2005 AMI system for the transcription of speech in meetings used in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture-style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. We report the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve competitive performance.
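Of the techniques listed above, vocal tract length normalisation (VTLN) is perhaps the simplest to illustrate: the frequency axis is warped by a speaker-specific factor before filterbank analysis. The Python sketch below uses a common piecewise-linear warp; the warp factor and cut-off fraction are illustrative assumptions, not details of the AMI system.

```python
# Illustrative piecewise-linear VTLN warp (assumed parameters, not the AMI recipe).
import numpy as np


def vtln_warp(freq, alpha, f_nyquist, cutoff_fraction=0.8):
    """Warp a frequency (Hz) by speaker-specific factor alpha.

    Below the cut-off the warp is purely linear; above it, frequencies are
    interpolated so that the Nyquist frequency maps to itself.
    """
    f0 = cutoff_fraction * f_nyquist
    if freq <= f0:
        return alpha * freq
    slope = (f_nyquist - alpha * f0) / (f_nyquist - f0)
    return alpha * f0 + slope * (freq - f0)


# Example: warp mel filterbank centre frequencies for a speaker with alpha = 0.92.
centres = np.linspace(100, 7900, 24)
warped = [vtln_warp(f, 0.92, 8000.0) for f in centres]
```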
Conference on Computers and Accessibility | 2002
Stephen J. Cox; Mike Lincoln; Judy Tryggvason; Melanie Nakisa; Mark Wells; Marcus Tutt; Sanja Abbott
TESSA is an experimental system that aims to aid transactions between a deaf person and a clerk in a Post Office by translating the clerk's speech to sign language. A speech recogniser recognises speech from the clerk and the system then synthesises the appropriate sequence of signs in British Sign Language (BSL) using a specially-developed avatar. By using a phrase-lookup approach to language translation, which is appropriate for the highly constrained discourse in a Post Office, we were able to build a working system that we could evaluate. We summarise the results of this evaluation (undertaken by deaf users and Post Office clerks), and discuss how the findings from the evaluation are being used in the development of an improved system.
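A phrase-lookup translation of this kind can be pictured as a direct table from recognised clerk phrases to stored sign sequences. The toy Python sketch below uses invented phrases and sign glosses, not entries from TESSA.

```python
# Toy phrase-lookup translation table: recognised phrase -> BSL sign glosses.
# Entries are made-up examples for illustration only.
PHRASE_TO_SIGNS = {
    "how can i help you": ["HOW", "HELP", "YOU"],
    "that costs two pounds": ["COST", "TWO", "POUND"],
}


def translate(recognised_phrase):
    """Return the stored sign sequence for a recognised phrase, or None if
    the phrase falls outside the constrained domain."""
    return PHRASE_TO_SIGNS.get(recognised_phrase.lower().strip())
```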
IEEE Automatic Speech Recognition and Understanding Workshop | 2005
Mike Lincoln; Iain A. McCowan; Jithendra Vepa; Hari Krishna Maganti
The recognition of speech in meetings poses a number of challenges to current automatic speech recognition (ASR) techniques. Meetings typically take place in rooms with non-ideal acoustic conditions and significant background noise, and may contain large sections of overlapping speech. In such circumstances, headset microphones have to date provided the best recognition performance, however participants are often reluctant to wear them. Microphone arrays offer an alternative to close-talking microphones by providing speech enhancement through directional discrimination. However, development of array front-end systems for state-of-the-art large vocabulary continuous speech recognition suffers from a lack of necessary resources, as most available speech corpora consist only of single-channel recordings. This paper describes the collection of an audio-visual corpus of read speech from a number of instrumented meeting rooms. The corpus, based on the WSJCAM0 database, is suitable for use in continuous speech recognition experiments and is captured using a variety of microphones, including arrays, as well as close-up and wider angle cameras. The paper also describes some initial ASR experiments on the corpus comparing the use of close-talking microphones with both a fixed and a blind array beamforming technique.
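The fixed beamforming referred to here is typically a delay-and-sum beamformer steered at the talker. A minimal Python sketch of that idea follows; the geometry, source position and frequency-domain alignment are illustrative assumptions rather than the corpus's actual front-end.

```python
# Minimal delay-and-sum beamformer sketch (assumed geometry and steering point).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Steer a microphone array towards a known source position.

    signals:         (n_mics, n_samples) time-aligned recordings
    mic_positions:   (n_mics, 3) microphone coordinates in metres
    source_position: (3,) assumed talker location in metres
    fs:              sampling rate in Hz
    """
    n_mics, n_samples = signals.shape
    # Propagation distance from the source to each microphone.
    dists = np.linalg.norm(mic_positions - source_position, axis=1)
    # Delays relative to the closest microphone, in (fractional) samples.
    delays = (dists - dists.min()) / SPEED_OF_SOUND * fs

    output = np.zeros(n_samples)
    freqs = np.fft.rfftfreq(n_samples, d=1.0)  # cycles per sample
    for m in range(n_mics):
        # Advance each lagging channel by its delay via a linear phase shift.
        spectrum = np.fft.rfft(signals[m])
        spectrum *= np.exp(2j * np.pi * freqs * delays[m])
        output += np.fft.irfft(spectrum, n=n_samples)
    return output / n_mics
```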
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Thomas Hain; Lukas Burget; John Dines; Philip N. Garner; Frantisek Grezl; Asmaa El Hannani; Marijn Huijbregts; Martin Karafiát; Mike Lincoln; Vincent Wan
In this paper, we give an overview of the AMIDA systems for transcription of conference and lecture room meetings. The systems were developed for participation in the Rich Transcription evaluations conducted by the National Institute of Standards and Technology in the years 2007 and 2009 and can process close-talking and far-field microphone recordings. The paper first discusses fundamental properties of meeting data with special focus on the AMI/AMIDA corpora. This is followed by a description and analysis of improved processing and modeling, with focus on techniques specifically addressing meeting transcription issues such as multi-room recordings or domain variability. In 2007 and 2009, two different strategies of system building were followed. While in 2007 we used our traditional style system design based on cross adaptation, the 2009 systems were constructed semi-automatically, supported by improved decoders and a new method for system representation. Overall, these changes gave a 6%-13% relative reduction in word error rate compared to our 2007 results while at the same time requiring less training material and reducing the real-time factor by a factor of five. The meeting transcription systems are available at www.webasr.org.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Iain A. McCowan; Mike Lincoln; Ivan Himawan
This correspondence presents a microphone array shape calibration procedure for diffuse noise environments. The procedure estimates intermicrophone distances by fitting the measured noise coherence with its theoretical model and then estimates the array geometry using classical multidimensional scaling. The technique is validated on noise recordings from two office environments.
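A minimal Python sketch of the two-stage procedure described in this abstract is given below: each inter-microphone distance is found by fitting the measured noise coherence to a diffuse-noise model (here via a simple grid search), and classical multidimensional scaling then recovers the relative geometry. The FFT length and candidate distance grid are assumptions for illustration.

```python
# Sketch of array shape calibration from diffuse noise: coherence fitting + classical MDS.
import numpy as np
from scipy.signal import csd, welch


def diffuse_coherence(freqs, d, c=343.0):
    """Theoretical (real) coherence of diffuse noise between mics separated by d metres."""
    x = 2.0 * np.pi * freqs * d / c
    return np.sinc(x / np.pi)  # np.sinc(y) = sin(pi*y)/(pi*y), so this is sin(x)/x


def estimate_distance(sig_i, sig_j, fs, candidates=np.linspace(0.01, 1.0, 200)):
    """Grid-search the inter-microphone distance whose theoretical coherence
    best matches the measured one in the least-squares sense."""
    f, Pij = csd(sig_i, sig_j, fs=fs, nperseg=1024)
    _, Pii = welch(sig_i, fs=fs, nperseg=1024)
    _, Pjj = welch(sig_j, fs=fs, nperseg=1024)
    measured = np.real(Pij / np.sqrt(Pii * Pjj))
    errors = [np.sum((measured - diffuse_coherence(f, d)) ** 2) for d in candidates]
    return candidates[int(np.argmin(errors))]


def classical_mds(distance_matrix, dims=2):
    """Recover microphone coordinates (up to rotation/reflection) from a full
    symmetric matrix of estimated pairwise distances."""
    n = distance_matrix.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    B = -0.5 * J @ (distance_matrix ** 2) @ J      # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dims]       # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```

In use, `estimate_distance` would be applied to every microphone pair to build the distance matrix passed to `classical_mds`.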
International Conference on Machine Learning | 2005
Thomas Hain; Lukas Burget; John Dines; Iain A. McCowan; Giulia Garau; Martin Karafiát; Mike Lincoln; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals
This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel-adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations where it was found to yield competitive performance.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Erich Zwyssig; Mike Lincoln; Steve Renals
In this paper, the design, implementation and testing of a digital microphone array are presented. The array uses digital MEMS microphones which integrate the microphone, amplifier and analogue-to-digital converter on a single chip in place of the analogue microphones and external audio interfaces currently used. The device has the potential to be smaller, cheaper and more flexible than typical analogue arrays; however, the effect of using digital microphones on speech recognition performance is as yet unknown. In order to evaluate the effect, an analogue array and the new digital array are used to simultaneously record test data for a speech recognition experiment. Initial results employing no adaptation show that performance using the digital array is significantly worse (14% absolute WER) than that of the analogue device. Subsequent experiments using MLLR and CMLLR channel adaptation reduce this gap, and employing MLLR for both channel and speaker adaptation reduces the difference between the arrays to 4.5% absolute WER.
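Constrained MLLR (CMLLR) channel adaptation of the kind mentioned here amounts to a single affine transform applied to every feature vector from a given channel. The Python sketch below shows only the application of such a transform; the matrices are placeholders, since in practice they are estimated on adaptation data to maximise acoustic-model likelihood.

```python
# Sketch of applying a CMLLR feature-space transform (placeholder transform values).
import numpy as np


def apply_cmllr(features, A, b):
    """Apply the CMLLR transform x' = A x + b to each feature vector.

    features: (n_frames, dim) acoustic features (e.g. MFCCs) from one channel
    A:        (dim, dim) transform matrix estimated on adaptation data
    b:        (dim,) bias vector
    """
    return features @ A.T + b


# Hypothetical usage: adapt digital-array features towards the training conditions.
rng = np.random.default_rng(0)
feats = rng.standard_normal((300, 13))   # 300 frames of 13-dim features
A = np.eye(13) * 0.9                     # placeholder transform
b = np.full(13, 0.1)                     # placeholder bias
adapted = apply_cmllr(feats, A, b)       # shape (300, 13)
```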
International Journal of Human-computer Interaction | 2003
Stephen J. Cox; Mike Lincoln; Judy Tryggvason; Melanie Nakisa; Mark Wells; Marcus Tutt; Sanja Abbott
The design, development, and evaluation of an experimental translation system that aims to aid transactions between a deaf person and a clerk in a post office (PO) is described. The system uses a speech recognizer to recognize speech from a PO clerk and then synthesizes recognized phrases in British Sign Language (BSL) using a specially developed avatar. The main objective in developing this prototype system was to determine how useful it would be to a customer whose first language was BSL, and to discover what areas of the system required more research and development to make it more effective. The system was evaluated by 6 prelingually profoundly deaf people and 3 PO clerks. Deaf users and PO clerks were supportive of the system, but the former group required a higher quality of signing from the avatar and the latter a system that was less constrained in the phrases it could recognize; both these areas are being addressed in the next phase of development.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Erich Zwyssig; Friedrich Faubel; Steve Renals; Mike Lincoln
This paper presents a new corpus comprising single and overlapping speech recorded using digital MEMS and analogue microphone arrays. In addition to this, the paper presents results from speech separation and recognition experiments on this data. The corpus is a reproduction of the multi-channel Wall Street Journal audio-visual corpus (MC-WSJ-AV), containing recorded speech in both a meeting room and an anechoic chamber using two different microphone types as well as two different array geometries. The speech separation and speech recognition experiments were performed using SRP-PHAT-based speaker localisation, superdirective beamforming and multiple post-processing schemes, such as residual echo suppression and binary masking. Our simple, cMLLR-based recognition system matches the performance of state-of-the-art ASR systems on the single speaker task and outperforms them on overlapping speech. The corpus will be made publicly available via the LDC in spring 2013.
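SRP-PHAT speaker localisation, named in the abstract, scores candidate source positions by summing PHAT-weighted cross-correlations at the time delays each position implies and picking the position with the highest steered response power. The rough Python sketch below shows the idea; the search grid, FFT length and array geometry are invented for illustration.

```python
# Rough SRP-PHAT localisation sketch (assumed grid, FFT length and geometry).
import numpy as np

C = 343.0  # speed of sound, m/s


def gcc_phat(sig_a, sig_b, n_fft):
    """Generalised cross-correlation with phase transform (PHAT) weighting."""
    A = np.fft.rfft(sig_a, n=n_fft)
    B = np.fft.rfft(sig_b, n=n_fft)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12                                # keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))    # centre zero lag


def srp_phat(signals, mic_positions, candidate_points, fs):
    """Return the candidate point with the highest steered response power."""
    n_mics, n_samples = signals.shape
    n_fft = 2 * n_samples
    half = n_fft // 2
    pairs = [(i, j) for i in range(n_mics) for j in range(i + 1, n_mics)]
    ccs = {p: gcc_phat(signals[p[0]], signals[p[1]], n_fft) for p in pairs}

    scores = []
    for point in candidate_points:
        dists = np.linalg.norm(mic_positions - point, axis=1)
        score = 0.0
        for (i, j) in pairs:
            tdoa = (dists[i] - dists[j]) / C        # expected delay, seconds
            lag = int(round(tdoa * fs)) + half      # index into centred correlation
            if 0 <= lag < n_fft:
                score += ccs[(i, j)][lag]
        scores.append(score)
    return candidate_points[int(np.argmax(scores))]
```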
Conference of the International Speech Communication Association | 2016
Joachim Fainberg; Peter Bell; Mike Lincoln; Steve Renals
Children’s speech poses challenges to speech recognition due to strong age-dependent anatomical variations and a lack of large, publicly-available corpora. In this paper we explore data augmentation for children’s speech recognition using stochastic feature mapping (SFM) to transform out-of-domain adult data for both GMM-based and DNN-based acoustic models. We performed experiments on the English PF-STAR corpus, augmenting with WSJCAM0 and ABI. Our experimental results indicate that a DNN acoustic model for children’s speech can make use of adult data, and that out-of-domain SFM is more accurate than in-domain SFM.
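At a high level, feature-mapping augmentation of this kind passes out-of-domain (adult) features through speaker-dependent transforms so that they better resemble the target (child) domain, and adds the mapped copies to the training pool. The Python sketch below shows that flow with placeholder affine transforms; the actual estimation of SFM transforms is not shown and the transform shapes are assumptions.

```python
# Sketch of augmenting adult data with feature mapping (placeholder transforms).
import numpy as np


def map_features(features, A, b):
    """Affine feature mapping x' = A x + b towards a target speaker's space."""
    return features @ A.T + b


def augment(adult_utterances, target_transforms, rng):
    """Return the original adult utterances plus one mapped copy of each,
    using a randomly chosen target-speaker transform per utterance."""
    augmented = list(adult_utterances)
    for feats in adult_utterances:
        A, b = target_transforms[rng.integers(len(target_transforms))]
        augmented.append(map_features(feats, A, b))
    return augmented


# Hypothetical usage with random stand-ins for features and transforms.
rng = np.random.default_rng(1)
utts = [rng.standard_normal((200, 13)) for _ in range(4)]        # adult utterances
transforms = [(np.eye(13) * 0.95, np.zeros(13)) for _ in range(3)]  # placeholder SFM transforms
training_pool = augment(utts, transforms, rng)                   # originals + mapped copies
```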