Publication


Featured research published by Stavros Tsakalidis.


Computer Vision and Pattern Recognition | 2012

Multimodal feature fusion for robust event detection in web videos

Pradeep Natarajan; Shuang Wu; Shiv Naga Prasad Vitaladevuni; Xiaodan Zhuang; Stavros Tsakalidis; Unsang Park; Rohit Prasad; Premkumar Natarajan

Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large-scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance, color, motion, audio, and audio-visual co-occurrence patterns in videos. We also evaluate the utility of high-level (i.e., semantic) visual information obtained from detecting scene, object, and action concepts. Further, we exploit multimodal information by analyzing available spoken and videotext content using state-of-the-art automatic speech recognition (ASR) and videotext recognition systems. We combine these diverse features using a two-step strategy employing multiple kernel learning (MKL) and late score-level fusion methods. Based on the TRECVID MED 2011 evaluations for detecting 10 events in a large benchmark set of ~45,000 videos, our system showed the best performance among the 19 international teams.
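
As a concrete illustration of the second fusion step, here is a minimal sketch of late score-level fusion, assuming each modality (the MKL-fused visual kernels, ASR-based matching, videotext) has already produced a per-video detection score; the modality names, weights, and scores below are hypothetical, not values from the paper.

```python
import numpy as np

def late_fusion(scores, weights):
    """Weighted late (score-level) fusion of per-modality event detectors.

    scores:  dict mapping modality name -> per-video detection scores in [0, 1]
    weights: dict mapping modality name -> non-negative fusion weight
    Returns one fused score per video.
    """
    total_w = sum(weights.values())
    fused = np.zeros(len(next(iter(scores.values()))))
    for modality, s in scores.items():
        fused += (weights[modality] / total_w) * np.asarray(s)
    return fused

# Hypothetical per-modality scores for three videos
scores = {
    "visual_mkl": [0.9, 0.2, 0.6],   # output of the MKL stage over visual kernels
    "asr":        [0.7, 0.1, 0.4],   # score from spoken-content matching
    "videotext":  [0.5, 0.0, 0.8],   # score from videotext recognition
}
weights = {"visual_mkl": 0.6, "asr": 0.25, "videotext": 0.15}
print(late_fusion(scores, weights))
```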


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Score normalization and system combination for improved keyword spotting

Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Le Zhang; Shivesh Ranjan; Tim Ng; Roger Hsiao; Guruprasad Saikumar; Ivan Bulyko; Long Nguyen; John Makhoul; Frantisek Grezl; Mirko Hannemann; Martin Karafiát; Igor Szöke; Karel Vesely; Lori Lamel; Viet-Bac Le

We present two techniques that yield improved Keyword Spotting (KWS) performance under the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and correspond more closely to the probability of being correct than raw posteriors do; and (ii) system combination, where the detections of multiple systems are merged and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both approaches yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), across five different languages. A variant of these methods achieved the highest performance in the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
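
A minimal sketch of both ideas, assuming detections arrive as (keyword, time, score) tuples; the sum-to-one normalization, the sharpening exponent, and the exact matching of detections across systems are simplifications of this sketch, not the paper's precise recipe.

```python
from collections import defaultdict

def normalize_scores(detections, gamma=1.0):
    """Keyword-specific normalization: rescale raw posteriors so that, for
    each keyword, its detection scores sum to 1, making scores of different
    keywords commensurate (gamma is a tunable sharpening exponent)."""
    totals = defaultdict(float)
    for kw, _, score in detections:
        totals[kw] += score ** gamma
    return [(kw, t, (score ** gamma) / totals[kw])
            for kw, t, score in detections]

def combine_systems(detection_lists, weights):
    """System combination: merge detections from multiple systems and
    interpolate the scores of co-occurring hits with system-level weights
    (simplified here to exact (keyword, time) matching)."""
    merged = defaultdict(float)
    for detections, w in zip(detection_lists, weights):
        for kw, t, score in detections:
            merged[(kw, t)] += w * score
    return [(kw, t, s) for (kw, t), s in merged.items()]

# Toy usage: normalize two systems' detections, then combine them
sys_a = normalize_scores([("cat", 1.0, 0.9), ("cat", 5.2, 0.3)])
sys_b = normalize_scores([("cat", 1.0, 0.8)])
print(combine_systems([sys_a, sys_b], weights=[0.6, 0.4]))
```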


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Discriminative semi-supervised training for keyword search in low resource languages

Roger Hsiao; Tim Ng; Frantisek Grezl; Damianos Karakos; Stavros Tsakalidis; Long Nguyen; Richard M. Schwartz

In this paper, we investigate semi-supervised training for low resource languages, where the initial systems may have a high error rate (≥ 70.0% word error rate). To handle the lack of data, we study semi-supervised techniques including data selection, data weighting, discriminative training, and multilayer perceptron learning to improve system performance. The entire suite of semi-supervised methods presented in this paper was evaluated under the IARPA Babel program for the keyword spotting tasks. Our semi-supervised system had the best performance in the OpenKWS13 surprise language evaluation for the limited condition. We describe our work on the Turkish and Vietnamese systems.
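
The data selection and weighting steps can be pictured with a minimal sketch like the following, where automatic transcripts from a seed decoder are filtered and weighted by confidence; the threshold value and the tuple format are illustrative assumptions, not the paper's actual criteria.

```python
def select_and_weight(utterances, floor=0.7):
    """Confidence-based selection and weighting of automatically transcribed
    utterances for semi-supervised training (a minimal sketch; the paper's
    selection criteria are more elaborate).

    utterances: list of (audio_id, hypothesis, confidence) from a seed decoder
    Returns (audio_id, hypothesis, weight) tuples kept for training.
    """
    selected = []
    for audio_id, hyp, conf in utterances:
        if conf >= floor:                           # data selection: drop low-confidence decodes
            selected.append((audio_id, hyp, conf))  # data weighting: weight by confidence
    return selected

# Toy usage: only the confident decode survives, carrying its weight
print(select_and_weight([("utt1", "merhaba", 0.92), ("utt2", "…", 0.31)]))
```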


International Conference on Acoustics, Speech, and Signal Processing | 2014

The 2013 BBN Vietnamese telephone speech keyword spotting system

Stavros Tsakalidis; Roger Hsiao; Damianos Karakos; Tim Ng; Shivesh Ranjan; Guruprasad Saikumar; Le Zhang; Long Nguyen; Richard M. Schwartz; John Makhoul

In this paper we describe the Vietnamese conversational telephone speech keyword spotting system developed under the IARPA Babel program for the 2013 evaluation conducted by NIST. The system contains several recently developed novel methods that significantly improve speech-to-text and keyword spotting performance, such as stacked bottleneck neural network features, white listing, score normalization, and improvements to semi-supervised training methods. These methods resulted in the highest performance in the official IARPA Babel surprise language evaluation of 2013.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Normalization of phonetic keyword search scores

Damianos Karakos; Ivan Bulyko; Richard M. Schwartz; Stavros Tsakalidis; Long Nguyen; John Makhoul

As shown in [1, 2], score normalization is of crucial importance for improving the Average Term-Weighted Value (ATWV) measure that is commonly used for evaluating keyword spotting systems. In this paper, we compare three different methods for score normalization within a keyword spotting system that employs phonetic search. We show that a new unsupervised linear fit method results in better-estimated posterior scores that, when fed into the keyword-specific normalization of [1], result in ATWV gains of 3% on average. Furthermore, when these scores are used as features within a supervised machine learning framework, they result in additional gains of 3.8% on average over the five languages used in the first year of the IARPA-funded Babel project.
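
For reference, the ATWV measure that these normalization methods target averages, over keywords, one minus the sum of the miss probability and a heavily weighted false-alarm probability. A minimal sketch follows; the input format is hypothetical, while beta = 999.9 follows from the standard NIST cost/value settings used in these evaluations.

```python
def atwv(hits, speech_dur, beta=999.9):
    """Average Term-Weighted Value, as defined for the NIST/Babel evaluations.

    hits: dict keyword -> (n_correct, n_false_alarms, n_true_occurrences)
    speech_dur: total speech duration in seconds (one trial per second).
    """
    twv = 0.0
    for kw, (n_corr, n_fa, n_true) in hits.items():
        p_miss = 1.0 - n_corr / n_true
        p_fa = n_fa / (speech_dur - n_true)  # non-target trials ~ seconds of speech
        twv += 1.0 - (p_miss + beta * p_fa)
    return twv / len(hits)

# Toy example: two keywords scored over 10 hours (36000 s) of speech
print(atwv({"hello": (8, 2, 10), "world": (3, 0, 5)}, speech_dur=36000.0))
```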


Empirical Methods in Natural Language Processing | 2014

Morphological Segmentation for Keyword Spotting

Karthik Narasimhan; Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Regina Barzilay

We explore the impact of morphological segmentation on keyword spotting (KWS). Despite its potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare them with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve the overall performance of KWS systems, although syllabic units rival the performance of morphological units when used in KWS. By combining morphological, phonetic, and syllabic segmentations, we demonstrate substantial performance gains.
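
A minimal sketch of how a word-level keyword is expanded into sub-word units for morpheme-level search; the segmentation lexicon here is a hypothetical stand-in for the output of a supervised or unsupervised (Morfessor-style) segmenter.

```python
def segment(word, lexicon):
    """Look up a word's morphological segmentation; fall back to the whole
    word when the segmenter produced nothing for it."""
    return lexicon.get(word, [word])

def keyword_to_subwords(keyword, lexicon):
    """Turn a (possibly multi-word) keyword into the sub-word unit sequence
    searched against a morpheme-level index instead of a word-level one."""
    units = []
    for word in keyword.split():
        units.extend(segment(word, lexicon))
    return units

# Hypothetical Turkish-style segmentation: "evlerimizde" = "in our houses"
lexicon = {"evlerimizde": ["ev", "ler", "imiz", "de"]}
print(keyword_to_subwords("evlerimizde", lexicon))  # ['ev', 'ler', 'imiz', 'de']
```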


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Robust speech recognition in unknown reverberant and noisy conditions

Roger Hsiao; Jeff Z. Ma; William Hartmann; Martin Karafiát; Frantisek Grezl; Lukas Burget; Igor Szöke; Jan Cernocky; Shinji Watanabe; Zhuo Chen; Sri Harish Reddy Mallidi; Hynek Hermansky; Stavros Tsakalidis; Richard M. Schwartz

In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data: while the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are close-talking telephone speech. Our approach to this challenge includes speech enhancement, neural network methods, and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.


International Conference on Acoustics, Speech, and Signal Processing | 2017

The 2016 BBN Georgian telephone speech keyword spotting system

Tanel Alumäe; Damianos Karakos; William Hartmann; Roger Hsiao; Le Zhang; Long Nguyen; Stavros Tsakalidis; Richard M. Schwartz

In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting systems for new languages in the face of limited transcribed speech, noisy conditions, and a limited system build time of one week.


Computer Speech & Language | 2013

BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms

Rohit Prasad; Prem Natarajan; David Stallard; Shirin Saleem; Shankar Ananthakrishnan; Stavros Tsakalidis; Chia-Lin Kao; Fred Choi; Ralf Meermeier; Mark Rawls; Jacob Devlin; Kriste Krstovski; Aaron Challenner

In this paper we present a speech-to-speech (S2S) translation system called BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. BBN TransTalk has been configured for several languages, including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), the dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with the lack of pronunciation and linguistic resources and for effectively modeling ambiguity in the pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.
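
A minimal sketch of the overall turn-taking loop, including the confidence-driven user confirmation step described above; the class and component interfaces are illustrative assumptions, not the actual BBN TransTalk API.

```python
class TwoWayS2SPipeline:
    """Sketch of a two-way speech-to-speech translation turn: ASR, then MT,
    then TTS, with the dialog manager confirming uncertain recognitions.
    The asr/mt/tts objects are hypothetical components exposing the
    recognize/translate/synthesize methods used below."""

    def __init__(self, asr, mt, tts, confirm_threshold=0.5):
        self.asr, self.mt, self.tts = asr, mt, tts
        self.confirm_threshold = confirm_threshold

    def translate_turn(self, audio, src_lang, tgt_lang):
        hyp, confidence = self.asr.recognize(audio, lang=src_lang)
        # User confirmation: echo low-confidence recognitions back to the
        # speaker before translating, to keep the dialog from drifting.
        if confidence < self.confirm_threshold:
            return ("confirm", hyp)
        translation = self.mt.translate(hyp, src=src_lang, tgt=tgt_lang)
        return ("speak", self.tts.synthesize(translation, lang=tgt_lang))
```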


International Conference on Acoustics, Speech, and Signal Processing | 2010

Pashto speech recognition with limited pronunciation lexicon

Rohit Prasad; Stavros Tsakalidis; Ivan Bulyko; Chia-Lin Kao; Prem Natarajan

Automatic speech recognition (ASR) for low resource languages continues to be a difficult problem. In particular, colloquial dialects of Arabic, Farsi, and Pashto pose significant challenges in pronunciation dictionary creation. Therefore, most state-of-the-art ASR engines rely on the grapheme-as-phoneme approach for creating pronunciation dictionaries in these languages. While the grapheme approach simplifies ASR training, it performs significantly worse than a system trained with a high-quality phonetic dictionary. In this paper, we explore two techniques for bridging the performance gap between the grapheme and the phonetic approaches, without requiring manual pronunciations for all the words in the training data. The first approach is based on learning letter-to-sound rules from a small set of manual pronunciations in Pashto, and the second approach uses a hybrid phoneme/grapheme representation for recognition. Through experimental results on colloquial Pashto, we demonstrate that both techniques perform as well as a full phonetic system while requiring manual pronunciations for only a small fraction of the words in the acoustic training data.
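
A minimal sketch of the first idea: greedy longest-match letter-to-sound conversion with a grapheme fallback, which also illustrates the hybrid phoneme/grapheme representation; the rules and the romanized example are hypothetical, and real L2S rules learned from a seed lexicon are context-dependent.

```python
def apply_l2s_rules(word, rules):
    """Greedy longest-match letter-to-sound conversion.

    rules: dict mapping a grapheme string to a list of phonemes.
    When no rule matches, the grapheme itself is emitted (grapheme-as-phoneme),
    yielding a hybrid phoneme/grapheme pronunciation.
    """
    phones, i = [], 0
    max_len = max(len(g) for g in rules)
    while i < len(word):
        for l in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + l]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += l
                break
        else:
            phones.append(word[i])  # hybrid fallback: keep the raw grapheme
            i += 1
    return phones

# Hypothetical rules applied to a romanized example
rules = {"sh": ["SH"], "a": ["AA"], "t": ["T"]}
print(apply_l2s_rules("shat", rules))  # ['SH', 'AA', 'T']
```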

Collaboration


Dive into Stavros Tsakalidis's collaboration.

Top Co-Authors

Prem Natarajan

University of Southern California