Kofi Boakye
University of California, Berkeley
Publications
Featured research published by Kofi Boakye.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008
Kofi Boakye; Beatriz Trueba-Hornero; Oriol Vinyals; Gerald Friedland
State-of-the-art speaker diarization systems for meetings are now at a point where overlapped speech contributes significantly to the errors made by the system. However, little if any work has yet been done on detecting overlapped speech. We present our initial work toward developing an overlap detection system for improved meeting diarization. We investigate various features for use in the detector, with a focus on high-precision performance, and examine results on a subset of the AMI Meeting Corpus. For the high-quality case of a single mixed-headset channel signal, we demonstrate a relative improvement in DER of about 7.4% over the baseline diarization system, while for the more challenging case of a single far-field channel signal the relative improvement is 3.6%. We also outline steps toward improving on and moving beyond this initial phase.
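The DER figures above combine three error components: missed speech, false-alarm speech, and speaker confusion, normalized by total scored speech time. A minimal frame-level sketch of that computation follows (the official NIST scoring tool operates on time segments, applies a forgiveness collar, and finds an optimal speaker mapping; none of that is reproduced here, and the label convention is an assumption):

```python
import numpy as np

def frame_level_der(ref, hyp):
    """Approximate diarization error rate from per-frame speaker labels.

    ref, hyp: integer arrays of frame labels, with 0 assumed to mean silence.
    This is only an illustration of the three DER error components; the
    NIST tool additionally searches for the best reference-to-hypothesis
    speaker mapping and scores on time segments with a collar.
    """
    ref = np.asarray(ref)
    hyp = np.asarray(hyp)
    speech = ref != 0
    miss = np.sum(speech & (hyp == 0))           # speech scored as silence
    false_alarm = np.sum(~speech & (hyp != 0))   # silence scored as speech
    confusion = np.sum(speech & (hyp != 0) & (ref != hyp))
    return (miss + false_alarm + confusion) / max(np.sum(speech), 1)
```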
International Conference on Machine Learning | 2005
Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Frantisek Grezl; Adam Janin; Arindam Mandal; Barbara Peskin; Chuck Wooters; Jing Zheng
We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features improved delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree-based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons (MLPs). In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall relative improvement of 17% in both MDM and IHM conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.
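The delay-sum processing mentioned above time-aligns each distant-microphone channel to a reference channel and averages them, reinforcing the target speaker relative to noise and reverberation. A rough sketch under simplifying assumptions (one global delay per channel estimated from a full-signal cross-correlation; production systems such as BeamformIt estimate delays per segment, typically with GCC-PHAT):

```python
import numpy as np

def delay_and_sum(channels, ref=0):
    """Naive delay-and-sum beamforming over distant-microphone channels.

    channels: 2-D array of shape (num_mics, num_samples). Each channel is
    aligned to the reference mic at the peak of the cross-correlation and
    the aligned signals are averaged. This whole-file, single-delay version
    is only a sketch of the idea.
    """
    channels = np.asarray(channels, dtype=float)
    ref_sig = channels[ref]
    out = np.zeros_like(ref_sig)
    for sig in channels:
        corr = np.correlate(sig, ref_sig, mode="full")
        delay = np.argmax(corr) - (len(ref_sig) - 1)
        # np.roll wraps samples around the ends; tolerable for a sketch.
        out += np.roll(sig, -delay)
    return out / len(channels)
```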
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2009
Kofi Boakye; Benoit Favre; Dilek Hakkani-Tür
In this paper, we describe our efforts toward the automatic detection of English questions in meetings. We analyze the utility of various features for this task, drawn from three distinct classes: lexico-syntactic, turn-related, and pitch-related. Of particular interest is the use of parse tree information in classification, an approach as yet unexplored for this task. Results from experiments on the ICSI MRDA corpus demonstrate that lexico-syntactic features are most useful for this task, with turn- and pitch-related features providing complementary information in combination. In addition, experiments using reference parse trees on the broadcast conversation portion of the OntoNotes release 2.9 data set illustrate the potential of parse trees to outperform lexical word features.
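As a toy illustration of the lexical side of such a classifier, word n-gram counts feeding a logistic regression can stand in for the paper's lexico-syntactic features; parse-tree production rules and the turn- and pitch-related features would be appended as additional feature columns. The utterances and labels below are invented for illustration and are not from the MRDA corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: meeting utterances labeled question (1) or not (0).
utterances = ["do you have the slides", "let's move on",
              "what time is it", "i agree with that"]
labels = [1, 0, 1, 0]

# Unigram and bigram counts as a crude proxy for lexico-syntactic features.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(utterances, labels)
print(clf.predict(["could we start now"]))
```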
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2005
Nikki Mirghafori; Andrew O. Hatch; Steven Stafford; Kofi Boakye; Daniel Gillick; Barbara Peskin
This paper describes ICSI's 2005 speaker recognition system, which was one of the top-performing systems in the NIST 2005 speaker recognition evaluation. The system is a combination of four sub-systems: 1) a keyword conditional HMM system, 2) an SVM-based lattice phone n-gram system, 3) a sequential nonparametric system, and 4) a traditional cepstral GMM system, developed by SRI. The first three systems are designed to take advantage of higher-level and long-term information. We observe that their performance improves significantly when more training data is available. In this paper, we describe these sub-systems and present results for each system alone and in combination on the speaker recognition evaluation (SRE) 2005 development and evaluation data sets.
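The "traditional cepstral GMM system" refers to the classic GMM-UBM approach: score test frames against a target-speaker Gaussian mixture model and a universal background model, and use the average per-frame log-likelihood ratio as the verification score. A hedged sketch with synthetic features (a real system would MAP-adapt the target model from the UBM rather than train it independently, and would use MFCCs extracted from audio):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for cepstral features: rows = frames, cols = dims.
rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(2000, 13))       # background population
target_feats = ubm_feats[:200] + 0.5          # crude proxy for one speaker
test_feats = target_feats[:50] + rng.normal(scale=0.1, size=(50, 13))

ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(ubm_feats)
target = GaussianMixture(n_components=8, covariance_type="diag").fit(target_feats)

# Verification score: average per-frame log-likelihood ratio.
llr = target.score(test_feats) - ubm.score(test_feats)
print(f"log-likelihood ratio: {llr:.2f}")     # higher = more target-like
```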
Multimodal Technologies for Perception of Humans | 2008
Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Adam Janin; Mathew Magimai-Doss; Chuck Wooters; Jing Zheng
Odyssey: The Speaker and Language Recognition Workshop | 2004
Kofi Boakye; Barbara Peskin
Conference of the International Speech Communication Association (Interspeech) | 2011
Kofi Boakye; Oriol Vinyals; Gerald Friedland
Conference of the International Speech Communication Association (Interspeech) | 2008
Kofi Boakye; Oriol Vinyals; Gerald Friedland
Conference of the International Speech Communication Association (Interspeech) | 2006
Kofi Boakye; Andreas Stolcke
Archive | 2008
Nelson Morgan; Kofi Boakye