Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vincent Wan is active.

Publication


Featured research published by Vincent Wan.


Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501) | 2000

Support vector machines for speaker verification and identification

Vincent Wan; William M. Campbell

The performance of the support vector machine (SVM) on a speaker verification task is assessed. Since speaker verification requires binary decisions, support vector machines seem to be promising candidates for the task. A new technique for normalising the polynomial kernel is developed and used to achieve performance comparable to other classifiers on the YOHO database. We also present results on a speaker identification task.
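The specific normalisation proposed in the paper is not reproduced here; as a hedged illustration, the standard way to normalise a polynomial kernel to unit self-similarity is

    K_{\mathrm{poly}}(x, y) = (x \cdot y + 1)^p, \qquad
    \hat{K}(x, y) = \frac{K_{\mathrm{poly}}(x, y)}{\sqrt{K_{\mathrm{poly}}(x, x)\, K_{\mathrm{poly}}(y, y)}},

which bounds the kernel values and tends to make SVM training on variable-magnitude acoustic features better conditioned.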


IEEE Transactions on Speech and Audio Processing | 2005

Speaker verification using sequence discriminant support vector machines

Vincent Wan; Steve Renals

This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resultant SVMs operate in a space of very high dimensionality, since that dimensionality is determined by the number of parameters in the underlying generative model. To address problems that arise in the resultant optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here gives a 34% relative reduction in error rate compared to a GMM likelihood ratio system.
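For context, the Fisher kernel maps an entire utterance X to the gradient of the log-likelihood of a generative model with respect to its parameters, and score-space kernels generalise this mapping; a minimal statement is

    \phi(X) = \nabla_\theta \log p(X \mid \theta), \qquad
    K(X, X') = \phi(X)^\top F^{-1} \phi(X'),

where \theta are the GMM parameters and F is the Fisher information matrix (or an approximation to it). The dimensionality of \phi(X) equals the number of model parameters, which is why the resulting optimisation is poorly conditioned without a preconditioning step such as the spherical normalisation described above.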


International Conference on Acoustics, Speech, and Signal Processing | 2007

The AMI System for the Transcription of Speech in Meetings

Thomas Hain; Vincent Wan; Lukas Burget; Martin Karafiát; John Dines; Jithendra Vepa; Giulia Garau; Mike Lincoln

In this paper we describe the 2005 AMI system for the transcription of speech in meetings used in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture-style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression and minimum word error rate decoding. We describe the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve competitive performance.
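Of the adaptation techniques listed, maximum likelihood linear regression is the simplest to state compactly: the Gaussian means of the speaker-independent model are passed through an affine transform estimated to maximise the likelihood of the adaptation data,

    \hat{\mu}_m = A \mu_m + b,

with a single (A, b) typically shared by all Gaussians in a regression class and estimated with the EM algorithm.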


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Transcribing Meetings With the AMIDA Systems

Thomas Hain; Lukas Burget; John Dines; Philip N. Garner; Frantisek Grezl; Asmaa El Hannani; Marijn Huijbregts; Martin Karafiát; Mike Lincoln; Vincent Wan

In this paper, we give an overview of the AMIDA systems for transcription of conference and lecture room meetings. The systems were developed for participation in the Rich Transcription evaluations conducted by the National Institute of Standards and Technology in the years 2007 and 2009 and can process close-talking and far-field microphone recordings. The paper first discusses fundamental properties of meeting data with special focus on the AMI/AMIDA corpora. This is followed by a description and analysis of improved processing and modeling, with focus on techniques specifically addressing meeting transcription issues such as multi-room recordings or domain variability. In 2007 and 2009, two different system-building strategies were followed. While in 2007 we used our traditional system design based on cross-adaptation, the 2009 systems were constructed semi-automatically, supported by improved decoders and a new method for system representation. Overall these changes gave a 6%-13% relative reduction in word error rate compared to our 2007 results, while at the same time requiring less training material and reducing the real-time factor by a factor of five. The meeting transcription systems are available at www.webasr.org.
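The quoted gains are relative reductions in word error rate,

    \Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{2007} - \mathrm{WER}_{2009}}{\mathrm{WER}_{2007}},

so, on a purely illustrative baseline of 30% WER, a 6%-13% relative reduction corresponds to roughly 1.8-3.9 percentage points absolute.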


International Conference on Acoustics, Speech, and Signal Processing | 2003

SVMSVM: support vector machine speaker verification methodology

Vincent Wan; Steve Renals

Support vector machines with the Fisher and score-space kernels are used for text-independent speaker verification to provide direct discrimination between complete utterances. This is unlike approaches such as discriminatively trained Gaussian mixture models or other discriminative classifiers that discriminate at the frame level only. Using the sequence-level discrimination approach we are able to achieve error rates that are significantly better than the current state of the art on the PolyVar database.
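A minimal sketch of the sequence-level mapping (not the authors' code; it assumes a diagonal-covariance background GMM and takes gradients with respect to the component means only):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_score(gmm: GaussianMixture, X: np.ndarray) -> np.ndarray:
        """Map a whole utterance X (frames x dims) to one fixed-length score vector.
        Assumes gmm was trained with covariance_type="diag"."""
        gamma = gmm.predict_proba(X)                   # (T, K) frame posteriors
        grads = []
        for k in range(gmm.n_components):
            # d/d(mu_k) of log p(X | theta) for a diagonal-covariance component
            diff = (X - gmm.means_[k]) / gmm.covariances_[k]
            grads.append((gamma[:, [k]] * diff).sum(axis=0))
        phi = np.concatenate(grads)
        return phi / (np.linalg.norm(phi) + 1e-10)     # crude length normalisation

Each utterance then becomes a single fixed-length vector, and a linear SVM trained on these vectors discriminates between whole sequences rather than individual frames.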


International Conference on Acoustics, Speech, and Signal Processing | 2002

Evaluation of kernel methods for speaker verification and identification

Vincent Wan; Steve Renals

Support vector machines are evaluated on speaker verification and speaker identification tasks. We compare the polynomial kernel, the Fisher kernel, a likelihood ratio kernel and the pair hidden Markov model kernel with baseline systems based on a discriminative polynomial classifier and generative Gaussian mixture model classifiers. Simulations were carried out on the YOHO database and some promising results were obtained.
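As a hedged sketch of how such a comparison can be organised (illustrative, not the paper's setup), each candidate kernel can be supplied to the SVM as a precomputed Gram matrix, so the same training and scoring code serves every kernel:

    import numpy as np
    from sklearn.svm import SVC

    def poly_kernel(A, B, degree=3):
        # example candidate kernel; the Fisher, likelihood ratio and pair-HMM
        # kernels would each provide their own Gram-matrix function
        return (A @ B.T + 1.0) ** degree

    def svm_scores(kernel_fn, X_train, y_train, X_test):
        clf = SVC(kernel="precomputed")
        clf.fit(kernel_fn(X_train, X_train), y_train)
        # verification: threshold the scores; identification: pick the top-scoring speaker
        return clf.decision_function(kernel_fn(X_test, X_train))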


International Conference on Acoustics, Speech, and Signal Processing | 2007

Finding Maximum Margin Segments in Speech

Yago Pereiro Estevan; Vincent Wan; Odette Scharenborg

Maximum margin clustering (MMC) is a relatively new and promising kernel method. In this paper, we apply MMC to the task of unsupervised speech segmentation. We present three automatic speech segmentation methods based on MMC, which are tested on TIMIT and evaluated at the level of phoneme boundary detection. The results show that MMC is highly competitive with existing unsupervised methods for the automatic detection of phoneme boundaries. Furthermore, initial analyses show that MMC is a promising method for the automatic detection of sub-phonetic information in the speech signal.
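For reference, maximum margin clustering optimises the familiar soft-margin SVM objective jointly over the unknown labels,

    \min_{y \in \{-1,+1\}^n} \; \min_{w, b, \xi \ge 0} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
    \quad \text{s.t.} \quad y_i\,(w^\top \phi(x_i) + b) \ge 1 - \xi_i,

usually with a class-balance constraint such as -\ell \le \sum_i y_i \le \ell to rule out the trivial single-cluster solution. In segmentation, a change in the inferred cluster label over time can be read as a candidate boundary (this reading is an illustration, not necessarily the exact formulation used in the paper).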


Multimodal Technologies for Perception of Humans | 2008

The 2007 AMI(DA) System for Meeting Transcription

Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; David A. van Leeuwen; Mike Lincoln; Vincent Wan

Meeting transcription is one of the main tasks for large vocabulary automatic speech recognition (ASR) and is supported by several large international projects in the area. The conversational nature, the difficult acoustics, and the necessity of high quality speech transcripts for higher level processing make ASR of meeting recordings an interesting challenge. This paper describes the development and system architecture of the 2007 AMIDA meeting transcription system, the third such system developed in a collaboration of six research sites. Different variants of the system participated in all speech-to-text transcription tasks of the 2007 NIST RT evaluations and showed very competitive performance. The best result was obtained on close-talking microphone data, where a final word error rate of 24.9% was obtained.


International Conference on Machine Learning | 2005

The development of the AMI system for the transcription of speech in meetings

Thomas Hain; Lukas Burget; John Dines; Iain A. McCowan; Giulia Garau; Martin Karafiát; Mike Lincoln; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals

This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations, where it was found to yield competitive performance.
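Among the listed techniques, vocal tract length normalisation admits a compact statement: for each speaker s, a frequency-warping factor is chosen to maximise the likelihood of that speaker's warped features under the current model,

    \hat{\alpha}_s = \arg\max_{\alpha} \; p\big(X_s^{(\alpha)} \mid \lambda\big),

where X_s^{(\alpha)} denotes features recomputed with a (typically piecewise-linear) warping of the frequency axis, and \alpha is searched over a small grid (for example 0.80 to 1.20 in steps of 0.02; the grid values are illustrative, not taken from the paper).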


International Conference on Acoustics, Speech, and Signal Processing | 2006

Strategies for Language Model Web-Data Collection

Vincent Wan; Thomas Hain

This paper presents an analysis of the use of textual information collected from the Internet via a search engine for the purpose of building domain-specific language models. A framework to analyse the effect of search query formulation on the resulting Web-data language model performance in an evaluation is developed. The framework gives rise to improved methods of selecting n-gram search engine queries, which return documents that yield better domain-specific language models.
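The paper's own query-selection criteria are not reproduced here; as a hedged illustration of the general idea, one simple strategy is to rank in-domain n-grams by how much more frequent they are in the in-domain text than in a background corpus and to issue the top-ranked n-grams as search queries (the function names and the log-ratio criterion below are assumptions, not the paper's method):

    import math
    from collections import Counter

    def ngrams(tokens, n=3):
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def select_queries(in_domain_tokens, background_tokens, n=3, top_k=50):
        in_counts = Counter(ngrams(in_domain_tokens, n))
        bg_counts = Counter(ngrams(background_tokens, n))
        in_total = sum(in_counts.values()) or 1
        bg_total = sum(bg_counts.values()) or 1

        def log_ratio(gram):
            p_in = in_counts[gram] / in_total
            p_bg = (bg_counts[gram] + 1) / (bg_total + len(bg_counts) + 1)  # add-one smoothing
            return math.log(p_in / p_bg)

        return sorted(in_counts, key=log_ratio, reverse=True)[:top_k]

The returned n-grams would then be submitted to a search engine, and the retrieved pages filtered and added to the language-model training text.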

Collaboration


Dive into Vincent Wan's collaborations.

Top Co-Authors

Thomas Hain, University of Sheffield
John Dines, Idiap Research Institute
Martin Karafiát, Brno University of Technology
Steve Renals, University of Edinburgh
Giulia Garau, University of Edinburgh
Mike Lincoln, University of Edinburgh
Lukas Burget, Brno University of Technology