Publication


Featured research published by Damianos Karakos.


ieee automatic speech recognition and understanding workshop | 2013

Score normalization and system combination for improved keyword spotting

Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Le Zhang; Shivesh Ranjan; Tim Ng; Roger Hsiao; Guruprasad Saikumar; Ivan Bulyko; Long Nguyen; John Makhoul; Frantisek Grezl; Mirko Hannemann; Martin Karafiát; Igor Szöke; Karel Vesely; Lori Lamel; Viet-Bac Le

We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and they more closely correspond to the probability of being correct than raw posteriors; and (ii) system combination, where the detections of multiple systems are merged together, and their scores are interpolated with weights which are optimized using MTWV as the maximization criterion. Both score normalization and system combination approaches show that significant gains in ATWV/MTWV can be obtained, sometimes on the order of 8-10 points (absolute), in five different languages. A variant of these methods resulted in the highest performance for the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
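The two ideas in this abstract can be illustrated with a minimal sketch. The abstract does not say which normalization or merging rule the authors used, so the code below swaps in a commonly used sum-to-one normalization (each keyword's scores are divided by their total) and a plain weighted sum over detections that share an identifier. The function names, the tuple layout, and the `detection_id` key are all hypothetical, and the weights are given by hand rather than optimized for MTWV.

```python
from collections import defaultdict


def sum_to_one(detections):
    """Rescale scores so that each keyword's detections sum to 1.

    `detections` is a list of (keyword, detection_id, score) tuples.
    This is a generic sum-to-one scheme, assumed here for illustration.
    """
    totals = defaultdict(float)
    for keyword, _, score in detections:
        totals[keyword] += score
    return [
        (keyword, det_id, score / totals[keyword] if totals[keyword] > 0 else 0.0)
        for keyword, det_id, score in detections
    ]


def combine(systems, weights):
    """Interpolate the scores of detections across several systems.

    `systems` is a list of dicts mapping (keyword, detection_id) -> score.
    A detection that a system did not produce contributes 0 for that system.
    """
    combined = defaultdict(float)
    for system, weight in zip(systems, weights):
        for key, score in system.items():
            combined[key] += weight * score
    return dict(combined)
```

After normalization, a keyword with many low-scoring hits and a keyword with a single hit end up on the same scale, which is what allows one global decision threshold to be applied to all keywords.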


ieee automatic speech recognition and understanding workshop | 2013

Discriminative semi-supervised training for keyword search in low resource languages

Roger Hsiao; Tim Ng; Frantisek Grezl; Damianos Karakos; Stavros Tsakalidis; Long Nguyen; Richard M. Schwartz

In this paper, we investigate semi-supervised training for low resource languages where the initial systems may have a high error rate (≥ 70.0% word error rate). To handle the lack of data, we study semi-supervised techniques including data selection, data weighting, discriminative training and multilayer perceptron learning to improve system performance. The entire suite of semi-supervised methods presented in this paper was evaluated under the IARPA Babel program for the keyword spotting tasks. Our semi-supervised system had the best performance in the OpenKWS13 surprise language evaluation for the limited condition. In this paper, we describe our work on the Turkish and Vietnamese systems.


international conference on acoustics, speech, and signal processing | 2014

The 2013 BBN Vietnamese telephone speech keyword spotting system

Stavros Tsakalidis; Roger Hsiao; Damianos Karakos; Tim Ng; Shivesh Ranjan; Guruprasad Saikumar; Le Zhang; Long Nguyen; Richard M. Schwartz; John Makhoul

In this paper we describe the Vietnamese conversational telephone speech keyword spotting system under the IARPA Babel program for the 2013 evaluation conducted by NIST. The system contains several, recently developed, novel methods that significantly improve speech-to-text and keyword spotting performance such as stacked bottleneck neural network features, white listing, score normalization, and improvements on semi-supervised training methods. These methods resulted in the highest performance for the official IARPA Babel surprise language evaluation of 2013.


international conference on acoustics, speech, and signal processing | 2014

Normalization of phonetic keyword search scores

Damianos Karakos; Ivan Bulyko; Richard M. Schwartz; Stavros Tsakalidis; Long Nguyen; John Makhoul

As shown in [1, 2], score normalization is of crucial importance for improving the Actual Term-Weighted Value (ATWV) measure that is commonly used for evaluating keyword spotting systems. In this paper, we compare three different methods for score normalization within a keyword spotting system that employs phonetic search. We show that a new unsupervised linear fit method results in better-estimated posterior scores that, when fed into the keyword-specific normalization of [1], result in ATWV gains of 3% on average. Furthermore, when these scores are used as features within a supervised machine learning framework, they result in additional gains of 3.8% on average over the five languages used in the first year of the IARPA-funded project Babel.


empirical methods in natural language processing | 2014

Morphological Segmentation for Keyword Spotting

Karthik Narasimhan; Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Regina Barzilay

We explore the impact of morphological segmentation on keyword spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic and syllabic segmentations, we demonstrate substantial performance gains.


international conference on acoustics, speech, and signal processing | 2015

Combination of search techniques for improved spotting of OOV keywords

Damianos Karakos; Richard M. Schwartz

The most common pipelines in keyword spotting involve some kind of speech recognition, which leads to the generation of sets of plausible hypotheses (e.g., word lattices), which are subsequently explored. The case of out-of-vocabulary (OOV) keywords is of special interest, because it requires representing keywords and/or lattices in an alternative format, so that the two can match. A number of techniques for dealing with OOV keywords have appeared in the literature; here, we focus on (i) fuzzy-phonetic search using phonetic confusion networks [1], and (ii) proxy-keyword search [2]. As we demonstrate in this paper, the combination of these two diverse techniques improves the ATWV of OOV keywords by at least 3% on average over the five development languages used in the second year of the IARPA Babel program.


international conference on acoustics, speech, and signal processing | 2017

The 2016 BBN Georgian telephone speech keyword spotting system

Tanel Alumäe; Damianos Karakos; William Hartmann; Roger Hsiao; Le Zhang; Long Nguyen; Stavros Tsakalidis; Richard M. Schwartz

In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting processing systems for new languages, in the face of limited transcribed speech, noisy conditions, and limited system build time of one week.


international conference on acoustics, speech, and signal processing | 2017

Analysis of keyword spotting performance across IARPA Babel languages

William Hartmann; Damianos Karakos; Roger Hsiao; Le Zhang; Tanel Alumäe; Stavros Tsakalidis; Richard M. Schwartz

With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword dependent, and that this must be accounted for in any cross-language analysis. Further, we show that while performance across languages does not track with any particular feature of the language, it is correlated with inter-annotator agreement.
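To make the keyword dependence of the metric concrete, here is a minimal sketch of term-weighted value as it is commonly defined for the NIST keyword search evaluations: one minus the keyword-averaged sum of the miss probability and a weighted false-alarm probability, with the weight usually set to 999.9. The definition is not taken from this abstract, and the function name and the layout of `counts` are hypothetical.

```python
def term_weighted_value(counts, speech_seconds, beta=999.9):
    """Average term-weighted value over keywords at a fixed threshold.

    `counts` maps keyword -> (n_true, n_correct, n_false_alarm).
    `speech_seconds` is the duration of the searched audio in seconds,
    used as the number of trials when estimating false-alarm probability.
    """
    values = []
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            continue  # keywords with no reference occurrences are skipped
        p_miss = 1.0 - n_correct / n_true
        p_false_alarm = n_false_alarm / (speech_seconds - n_true)
        values.append(1.0 - (p_miss + beta * p_false_alarm))
    if not values:
        raise ValueError("no keyword has a reference occurrence")
    return sum(values) / len(values)
```

Because every keyword contributes equally to the average, missing the only occurrence of a rare keyword costs as much as missing every occurrence of a frequent one, so the score depends heavily on which keywords are in the list.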


international conference on acoustics, speech, and signal processing | 2017

Constructing sub-word units for spoken term detection

Charl Johannes van Heerden; Damianos Karakos; Karthik Narasimhan; Marelie H. Davel; Richard M. Schwartz

Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL evaluation.


spoken language technology workshop | 2016

BBN technologies' OpenSAD system

Scott Novotney; Damianos Karakos; Jan Silovsky; Rich Schwartz

We describe our submission to the NIST OpenSAD evaluation of speech activity detection of noisy audio generated by the DARPA RATS program. With frequent transmission degradation, channel interference and other noises added, simple energy thresholds do a poor job at SAD for this audio. The evaluation measured performance on both in-training and novel channels. Our approach used a system combination of feed-forward neural networks and bidirectional LSTM recurrent neural networks. System combination and unsupervised adaptation provided further gains on novel channels that lack training data. These techniques led to a 26% relative improvement for novel channels over simple decoding. Our system achieved the lowest error rate on the in-training channels and the second lowest on the out-of-training channels.
