Kofi Boakye
University of California, Berkeley
Publications
Featured research published by Kofi Boakye.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008
Kofi Boakye; Beatriz Trueba-Hornero; Oriol Vinyals; Gerald Friedland
State-of-the-art speaker diarization systems for meetings are now at a point where overlapped speech contributes significantly to the errors made by the system. However, little if any work has yet been done on detecting overlapped speech. We present our initial work toward developing an overlap detection system for improved meeting diarization. We investigate various features for use in the detector, with a focus on high-precision performance, and examine results on a subset of the AMI Meeting Corpus. For the high-quality case of a single mixed-headset channel signal, we demonstrate a relative improvement in DER of about 7.4% over the baseline diarization system, while for the more challenging case of a single far-field channel signal the relative improvement is 3.6%. We also outline steps toward improving on and moving beyond this initial phase.
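The DER figures above combine three error components: missed speech, false-alarm speech, and speaker confusion, normalized by total scored speech time. A minimal frame-level sketch of that computation follows (the official NIST scoring tool operates on time segments, applies a forgiveness collar, and finds an optimal speaker mapping; none of that is reproduced here, and the label convention is an assumption):

```python
import numpy as np

def frame_level_der(ref, hyp):
    """Approximate diarization error rate from per-frame speaker labels.

    ref, hyp: integer arrays of frame labels, with 0 assumed to mean silence.
    This is only an illustration of the three DER error components; the
    NIST tool additionally searches for the best reference-to-hypothesis
    speaker mapping and scores on time segments with a collar.
    """
    ref = np.asarray(ref)
    hyp = np.asarray(hyp)
    speech = ref != 0
    miss = np.sum(speech & (hyp == 0))           # speech scored as silence
    false_alarm = np.sum(~speech & (hyp != 0))   # silence scored as speech
    confusion = np.sum(speech & (hyp != 0) & (ref != hyp))
    return (miss + false_alarm + confusion) / max(np.sum(speech), 1)
```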
International Conference on Machine Learning | 2005
Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Frantisek Grezl; Adam Janin; Arindam Mandal; Barbara Peskin; Chuck Wooters; Jing Zheng
We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features improved delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree-based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons (MLPs). In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall relative improvement of 17% in both MDM and IHM conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.
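The delay-sum processing mentioned above time-aligns each distant-microphone channel to a reference channel and averages them, reinforcing the target speaker relative to noise and reverberation. A rough sketch under simplifying assumptions (one global delay per channel estimated from a full-signal cross-correlation; production systems such as BeamformIt estimate delays per segment, typically with GCC-PHAT):

```python
import numpy as np

def delay_and_sum(channels, ref=0):
    """Naive delay-and-sum beamforming over distant-microphone channels.

    channels: 2-D array of shape (num_mics, num_samples). Each channel is
    aligned to the reference mic at the peak of the cross-correlation and
    the aligned signals are averaged. This whole-file, single-delay version
    is only a sketch of the idea.
    """
    channels = np.asarray(channels, dtype=float)
    ref_sig = channels[ref]
    out = np.zeros_like(ref_sig)
    for sig in channels:
        corr = np.correlate(sig, ref_sig, mode="full")
        delay = np.argmax(corr) - (len(ref_sig) - 1)
        # np.roll wraps samples around the ends; tolerable for a sketch.
        out += np.roll(sig, -delay)
    return out / len(channels)
```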
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2009
Kofi Boakye; Benoit Favre; Dilek Hakkani-Tür
In this paper, we describe our efforts toward the automatic detection of English questions in meetings. We analyze the utility of various features for this task, drawn from three distinct classes: lexico-syntactic, turn-related, and pitch-related. Of particular interest is the use of parse tree information in classification, an approach as yet unexplored for this task. Results from experiments on the ICSI MRDA corpus demonstrate that lexico-syntactic features are most useful for this task, with turn- and pitch-related features providing complementary information in combination. In addition, experiments using reference parse trees on the broadcast conversation portion of the OntoNotes release 2.9 data set illustrate the potential of parse trees to outperform lexical word features.
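As a toy illustration of the lexical side of such a classifier, word n-gram counts feeding a logistic regression can stand in for the paper's lexico-syntactic features; parse-tree production rules and the turn- and pitch-related features would be appended as additional feature columns. The utterances and labels below are invented for illustration and are not from the MRDA corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: meeting utterances labeled question (1) or not (0).
utterances = ["do you have the slides", "let's move on",
              "what time is it", "i agree with that"]
labels = [1, 0, 1, 0]

# Unigram and bigram counts as a crude proxy for lexico-syntactic features.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(utterances, labels)
print(clf.predict(["could we start now"]))
```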
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2005
Nikki Mirghafori; Andrew O. Hatch; Steven Stafford; Kofi Boakye; Daniel Gillick; Barbara Peskin
This paper describes ICSI's 2005 speaker recognition system, which was one of the top-performing systems in the NIST 2005 speaker recognition evaluation. The system is a combination of four sub-systems: 1) a keyword conditional HMM system, 2) an SVM-based lattice phone n-gram system, 3) a sequential nonparametric system, and 4) a traditional cepstral GMM system, developed by SRI. The first three systems are designed to take advantage of higher-level and long-term information. We observe that their performance improves significantly when more training data is available. In this paper, we describe these sub-systems and present results for each system alone and in combination on the speaker recognition evaluation (SRE) 2005 development and evaluation data sets.
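The "traditional cepstral GMM system" refers to the classic GMM-UBM approach: score test frames against a target-speaker Gaussian mixture model and a universal background model, and use the average per-frame log-likelihood ratio as the verification score. A hedged sketch with synthetic features (a real system would MAP-adapt the target model from the UBM rather than train it independently, and would use MFCCs extracted from audio):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for cepstral features: rows = frames, cols = dims.
rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(2000, 13))       # background population
target_feats = ubm_feats[:200] + 0.5          # crude proxy for one speaker
test_feats = target_feats[:50] + rng.normal(scale=0.1, size=(50, 13))

ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(ubm_feats)
target = GaussianMixture(n_components=8, covariance_type="diag").fit(target_feats)

# Verification score: average per-frame log-likelihood ratio.
llr = target.score(test_feats) - ubm.score(test_feats)
print(f"log-likelihood ratio: {llr:.2f}")     # higher = more target-like
```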
Multimodal Technologies for Perception of Humans | 2008
Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Adam Janin; Mathew Magimai-Doss; Chuck Wooters; Jing Zheng
Odyssey: The Speaker and Language Recognition Workshop | 2004
Kofi Boakye; Barbara Peskin
Conference of the International Speech Communication Association (Interspeech) | 2011
Kofi Boakye; Oriol Vinyals; Gerald Friedland
Conference of the International Speech Communication Association (Interspeech) | 2008
Kofi Boakye; Oriol Vinyals; Gerald Friedland
Conference of the International Speech Communication Association (Interspeech) | 2006
Kofi Boakye; Andreas Stolcke
Archive | 2008
Nelson Morgan; Kofi Boakye