Darren Moore
Idiap Research Institute
Publications
Featured research published by Darren Moore.
international conference on acoustics, speech, and signal processing | 2003
Iain A. McCowan; Samy Bengio; Daniel Gatica-Perez; Guillaume Lathoud; Florent Monay; Darren Moore; Pierre Wellner
The paper investigates the recognition of group actions in meetings by modeling the joint behaviour of participants. Many meeting actions, such as presentations, discussions and consensus, are characterised by similar or complementary behaviour across participants. Recognising these meaningful actions is an important step towards the goal of providing effective browsing and summarisation of processed meetings. A corpus of meetings was collected in a room equipped with a number of microphones and cameras. The corpus was labeled in terms of a predefined set of meeting actions characterised by global behaviour. In experiments, audio and visual features for each participant are extracted from the raw data and the interaction of participants is modeled using HMM-based approaches. Initial results on the corpus demonstrate the ability of the system to recognise the set of meeting actions.
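The HMM-based recognition of group actions described above can be sketched with a small Viterbi decoder. The states, observation symbols, and all probabilities below are invented for illustration; the paper's models are trained on real audio-visual features.

```python
import numpy as np

# Hypothetical HMM: states are meeting actions, observations are
# discretised group activity levels (0 = one dominant speaker,
# 1 = several overlapping speakers, 2 = low activity).
states = ["presentation", "discussion", "consensus"]
start = np.array([0.5, 0.3, 0.2])
trans = np.array([[0.80, 0.15, 0.05],
                  [0.10, 0.80, 0.10],
                  [0.05, 0.25, 0.70]])
emit = np.array([[0.7, 0.2, 0.1],    # presentation: mostly one speaker
                 [0.2, 0.7, 0.1],    # discussion: overlapping speakers
                 [0.3, 0.3, 0.4]])   # consensus: mixed, quieter

def viterbi(obs):
    """Most likely action sequence for a sequence of observation symbols."""
    T, N = len(obs), len(states)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = logp[:, None] + np.log(trans)   # score of each transition
        back[t] = cand.argmax(axis=0)
        logp = cand.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 0, 1, 1, 1, 2]))
```

The paper models the joint behaviour of several participants, so the real observation space is a vector of per-participant audio and visual features rather than a single symbol.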
international conference on acoustics, speech, and signal processing | 2003
Darren Moore; Iain A. McCowan
This paper investigates the use of microphone arrays to acquire and recognise speech in meetings. Meetings pose several interesting problems for speech processing, as they consist of multiple competing speakers within a small space, typically around a table. Due to their ability to provide hands-free acquisition and directional discrimination, microphone arrays present a potential alternative to close-talking microphones in such an application. We first propose an appropriate microphone array geometry and improved processing technique for this scenario, paying particular attention to speaker separation during possible overlap segments. Data collection of a small vocabulary speech recognition corpus (Numbers) was performed in a real meeting room for a single speaker, and several overlapping speech scenarios. In speech recognition experiments on the acquired database, the performance of the microphone array system is compared to that of a close-talking lapel microphone, and a single table-top microphone.
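The baseline array processing against which such improved techniques are compared can be sketched as a delay-and-sum beamformer. The array geometry, sample rate, and integer-sample delays below are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

fs = 16000.0           # sample rate, Hz (assumed)
c = 343.0              # speed of sound, m/s
# Assumed three-element linear table-top array, positions in metres
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])

def delay_and_sum(signals, source_xy):
    """Time-align each channel towards the source position and average.

    signals: (n_mics, n_samples) array of synchronised recordings.
    source_xy: 2-D source position in the same coordinates as `mics`.
    Integer-sample delays are used for simplicity.
    """
    dists = np.linalg.norm(mics - np.asarray(source_xy), axis=1)
    delays = np.round((dists - dists.min()) / c * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        out[:n - d] += sig[d:]       # advance channel by its extra delay
    return out / len(mics)
```

Steering the delays at one talker attenuates competing talkers off-axis, which is the basic mechanism behind speaker separation during overlap.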
international conference on machine learning | 2005
Thomas Hain; Lukas Burget; John Dines; Iain A. McCowan; Giulia Garau; Martin Karafiát; Mike Lincoln; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals
This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations where it was found to yield competitive performance.
international conference on image processing | 2003
Daniel Gatica-Perez; Guillaume Lathoud; Iain A. McCowan; Jean-Marc Odobez; Darren Moore
We present a probabilistic method for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and audio information via importance particle filters (I-PFs), allowing for the asymmetrical integration of AV information in a way that efficiently exploits the complementary features of each modality. Audio localization information is used to generate an importance sampling (IS) function, which guides the random search process of a particle filter towards regions of the configuration space likely to contain the true configuration (a speaker). The measurement process integrates contour-based and audio observations, which results in reliable head tracking in realistic scenarios. We show that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.
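The importance particle filter idea can be illustrated with a one-dimensional toy: a fraction of the particles is sampled from an audio-derived importance function, the rest from the dynamics, and weights use the visual likelihood corrected by the proposal density. All densities and parameters below are invented; the paper tracks head contours in image space.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 500, 0.3                     # particle count; audio-IS share
sig_dyn, sig_audio, sig_vis = 0.05, 0.2, 0.1   # all assumed

def gauss(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def step(particles, weights, audio_loc, visual_obs):
    n_is = int(alpha * N)
    # Particles proposed by the audio importance function
    prop_audio = rng.normal(audio_loc, sig_audio, n_is)
    # Remaining particles resampled and propagated through the dynamics
    idx = rng.choice(N, N - n_is, p=weights)
    prop_dyn = particles[idx] + rng.normal(0.0, sig_dyn, N - n_is)
    new = np.concatenate([prop_audio, prop_dyn])
    # Visual (contour-based) likelihood, with a crude correction for
    # the mixed proposal density
    lik = gauss(new, visual_obs, sig_vis)
    q = alpha * gauss(new, audio_loc, sig_audio) + (1 - alpha)
    w = lik / q
    return new, w / w.sum()

particles = rng.uniform(-1.0, 1.0, N)
weights = np.full(N, 1.0 / N)
for _ in range(10):
    particles, weights = step(particles, weights, 0.5, 0.5)
estimate = float(np.sum(weights * particles))
```

Because the audio IS function concentrates particles where a speaker is likely to be, the filter can re-initialise on a new speaker even when the visual tracker has lost lock.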
ambient intelligence | 2003
Iain A. McCowan; Daniel Gatica-Perez; Samy Bengio; Darren Moore
People meet in order to interact – disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view: extracting the information they contain, and understanding human interaction processes. Based on this view, this article presents an approach in which relevant information content of a meeting is identified from a variety of audio and visual sensor inputs and statistical models of interacting people. We present a framework for computer observation and understanding of interacting people, and discuss particular tasks within this framework, issues in the meeting context, and particular algorithms that we have adopted. We also comment on current developments and the future challenges in automatic meeting analysis.
Digital Signal Processing | 2002
Iain A. McCowan; Darren Moore; Sridha Sridharan
Abstract McCowan, I. A., Moore, D. C., and Sridharan, S., Near-field Adaptive Beamformer for Robust Speech Recognition, Digital Signal Processing 12 (2002) 87–106. This paper investigates a new microphone array processing technique specifically for the purpose of speech enhancement and recognition. The main objective of the proposed technique is to improve the low frequency directivity of a conventional adaptive beamformer, as low frequency performance is critical in speech processing applications. The proposed technique, termed near-field adaptive beamforming (NFAB), is implemented using the standard generalized sidelobe canceler (GSC) system structure, where a near-field superdirective (NFSD) beamformer is used as the fixed upper-path beamformer to improve the low frequency performance. In addition, to minimize signal leakage into the adaptive noise canceling path for near-field sources, a compensation unit is introduced prior to the blocking matrix. The advantage of the technique is verified by comparing the directivity patterns with those of conventional filter-sum, NFSD, and GSC systems. In speech enhancement and recognition experiments, the proposed technique outperforms the standard techniques for a near-field source in adverse noise conditions.
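The GSC structure that NFAB builds on can be sketched for two microphones: a fixed upper-path beamformer, a blocking matrix that removes the target, and an adaptive (here NLMS) noise-cancelling path. For brevity the fixed path is a plain sum; the paper replaces it with a near-field superdirective beamformer and adds a compensation unit before the blocking matrix.

```python
import numpy as np

def gsc(x1, x2, taps=16, mu=0.05, eps=1e-8):
    """Two-microphone generalised sidelobe canceller sketch (NLMS)."""
    fixed = 0.5 * (x1 + x2)          # fixed upper-path beamformer
    blocked = x1 - x2                # blocking matrix output (noise ref)
    w = np.zeros(taps)               # adaptive filter coefficients
    buf = np.zeros(taps)             # delay line over the noise reference
    out = np.zeros_like(fixed)
    for n in range(len(fixed)):
        buf = np.roll(buf, 1)
        buf[0] = blocked[n]
        y = w @ buf                  # adaptive noise estimate
        e = fixed[n] - y             # enhanced output sample
        w += mu * e * buf / (buf @ buf + eps)   # NLMS update
        out[n] = e
    return out
```

A target that arrives identically at both microphones is cancelled by the blocking matrix, so the adaptive path learns to subtract only the noise; leakage of a near-field target into this path is exactly the problem the paper's compensation unit addresses.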
international conference on multimedia and expo | 2005
Iain A. McCowan; Maganti Hari Krishna; Daniel Gatica-Perez; Darren Moore; Silèye O. Ba
Close-talk headset microphones have been traditionally used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio, needed for recognition tasks, than single distant microphones. However, in multi-party conversational settings like meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
international conference of the ieee engineering in medicine and biology society | 2006
Iain A. McCowan; Darren Moore; Mary-Jane Fry
This article investigates the classification of a patient's lung cancer stage based on analysis of their free-text medical reports. The system uses natural language processing to transform the report text, including identification of UMLS terms and detection of negated findings. The transformed report is then classified using statistical machine learning techniques. A support vector machine is trained for each stage category based on word occurrences in a corpus of histology reports for pathologically staged patients. New reports can be classified according to the most likely stage, allowing the collection of population stage data for analysis of outcomes. While the system could in principle be applied to stage different cancer types, the current work focuses on lung cancer due to data availability. The article presents initial experiments quantifying system performance for T and N staging on a corpus of histology reports from more than 700 lung cancer patients.
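The word-occurrence SVM stage classifier can be sketched as follows. The toy reports, labels, and the Pegasos-style training loop are invented for illustration; the real system first maps text to UMLS terms and handles negated findings before classification.

```python
import numpy as np

# Invented toy corpus: +1 = early T stage, -1 = advanced (hypothetical)
reports = [
    "tumour 2 cm confined to lobe no nodal involvement",
    "tumour invades chest wall multiple hilar nodes positive",
    "small peripheral tumour nodes negative",
    "large tumour mediastinal nodes involved",
]
labels = np.array([+1, -1, +1, -1])

# Bag-of-words features over the corpus vocabulary
vocab = sorted({w for r in reports for w in r.split()})
index = {w: i for i, w in enumerate(vocab)}

def featurise(text):
    x = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            x[index[w]] += 1.0
    return x

X = np.array([featurise(r) for r in reports])

def train_svm(X, y, lam=0.01, epochs=200):
    """Primal linear SVM trained by Pegasos-style subgradient descent."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            if yi * (w @ xi) < 1.0:          # hinge-loss margin violated
                w = (1 - eta * lam) * w + eta * yi * xi
            else:
                w = (1 - eta * lam) * w
    return w

w = train_svm(X, labels)
pred = np.sign(X @ w)
```

One such binary classifier per stage category, as the abstract describes, lets a new report be assigned the most likely stage by comparing the per-category decision scores.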
international conference on machine learning | 2004
Iain A. McCowan; Daniel Gatica-Perez; Samy Bengio; Darren Moore
People meet in order to interact – disseminating information, making decisions, and creating new ideas. Automatic analysis of meetings is therefore important from two points of view: extracting the information they contain, and understanding human interaction processes. Based on this view, this article presents an approach in which relevant information content of a meeting is identified from a variety of audio and visual sensor inputs and statistical models of interacting people. We present a framework for computer observation and understanding of interacting people, and discuss particular tasks within this framework, issues in the meeting context, and particular algorithms that we have adopted. We also comment on current developments and the future challenges in automatic meeting analysis.
Lecture Notes in Computer Science | 2006
Thomas Hain; Lukas Burget; John Dines; Giulia Garau; Martin Karafiát; Mike Lincoln; Iain A. McCowan; Darren Moore; Vincent Wan; Roeland Ordelman; Steve Renals