Spyros Matsoukas | Researchain

Publication

Featured research published by Spyros Matsoukas.

empirical methods in natural language processing | 2009

Discriminative Corpus Weight Estimation for Machine Translation

Spyros Matsoukas; Antti-Veikko I. Rosti; Bing Zhang

Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters are estimated from the word alignments, and the quality of the translations on a given test set depends on the parameter estimates. There are at least two factors affecting the parameter estimation: domain match and training data quality. This paper describes a novel approach for automatically detecting and down-weighting certain parts of the training corpus by assigning a weight to each sentence in the training bitext so as to optimize a discriminative objective function on a designated tuning set. This way, the proposed method can limit the negative effects of low-quality training data, and can adapt the translation model to the domain of interest. It is shown that such discriminative corpus weights can provide significant improvements in Arabic-English translation on various conditions, using a state-of-the-art SMT system.
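To illustrate how a per-sentence weight can change translation model estimates, here is a minimal sketch that assumes a weighted relative-frequency estimator. The abstract does not give the paper's parameterization or its discriminative training of the weights, so the function name `weighted_translation_probs`, the input layout, and the toy phrase labels are all hypothetical.

```python
from collections import defaultdict

def weighted_translation_probs(sentence_pairs, weights):
    """Estimate p(target | source) from weighted phrase-pair counts.

    sentence_pairs: one list of (source_phrase, target_phrase) tuples per
                    training sentence (hypothetical input layout).
    weights:        one non-negative weight per training sentence.
    """
    pair_mass = defaultdict(float)    # weighted count of each (source, target) pair
    source_mass = defaultdict(float)  # weighted count of each source phrase
    for pairs, weight in zip(sentence_pairs, weights):
        for source, target in pairs:
            pair_mass[(source, target)] += weight
            source_mass[source] += weight
    return {
        (source, target): mass / source_mass[source]
        for (source, target), mass in pair_mass.items()
        if source_mass[source] > 0
    }

# Toy corpus: two sentences that translate the same source phrase differently.
corpus = [[("f1", "e1")], [("f1", "e2")]]
uniform = weighted_translation_probs(corpus, [1.0, 1.0])
reweighted = weighted_translation_probs(corpus, [3.0, 1.0])
```

With uniform weights both translations of `f1` share the probability mass equally; raising the first sentence's weight shifts mass toward `e1`. In the approach the abstract describes, such weights would be tuned against an objective on a tuning set rather than set by hand.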

workshop on statistical machine translation | 2008

Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination

Antti-Veikko I. Rosti; Bing Zhang; Spyros Matsoukas; Richard M. Schwartz

Confusion network decoding has been the most successful approach in combining outputs from multiple machine translation (MT) systems in the recent DARPA GALE and NIST Open MT evaluations. Due to the varying word order between outputs from different MT systems, the hypothesis alignment presents the biggest challenge in confusion network decoding. This paper describes an incremental alignment method to build confusion networks based on the translation edit rate (TER) algorithm. This new algorithm yields significant BLEU score improvements over other recent alignment methods on the GALE test sets and was used in BBN's submission to the WMT08 shared translation task.

empirical methods in natural language processing | 2009

Effective Use of Linguistic and Contextual Information for Statistical Machine Translation

Libin Shen; Jinxi Xu; Bing Zhang; Spyros Matsoukas; Ralph M. Weischedel

Current methods of using lexical features in machine translation have difficulty in scaling up to realistic MT tasks due to a prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU are 2.0 on NIST MT06 and 1.7 on MT08 newswire data on decoding output. On Chinese-to-English translation, the improvements are 1.0 on MT06 and 0.8 on MT08 newswire data.

international conference on acoustics, speech, and signal processing | 2006

Discriminatively Trained Region Dependent Feature Transforms for Speech Recognition

Bing Zhang; Spyros Matsoukas; Richard M. Schwartz

Discriminatively trained feature transforms such as MPE-HLDA, fMPE and MMI-SPLICE have been shown to be effective in reducing recognition errors in today's state-of-the-art speech recognition systems. This paper introduces the concept of region dependent linear transform (RDLT), which unifies the above three types of feature transforms and provides a framework for the estimation of piece-wise linear feature projections, based on the minimum phoneme error (MPE) criterion. Recognition results on English conversational telephone speech data show that RDLT offers consistent gains over the baseline systems, which are trained using the LDA+MLLT projection.

international conference on acoustics, speech, and signal processing | 2004

Speech recognition in multiple languages and domains: the 2003 BBN/LIMSI EARS system

Richard M. Schwartz; Thomas Colthurst; Nicolae Duta; Herbert Gish; Rukmini Iyer; Chia-Lin Kao; Daben Liu; Owen Kimball; Jeff Z. Ma; John Makhoul; Spyros Matsoukas; Long Nguyen; Mohammed Noamany; Rohit Prasad; Bing Xiang; Dongxin Xu; Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Langzhou Chen

We report on the results of the first evaluations for the BBN/LIMSI system under the new DARPA EARS program. The evaluations were carried out for conversational telephone speech (CTS) and broadcast news (BN) for three languages: English, Mandarin, and Arabic. In addition to providing system descriptions and evaluation results, the paper highlights methods that worked well across the two domains and those few that worked well on one domain but not the other. For the BN evaluations, which had to be run under 10 times real-time, we demonstrated that a joint BBN/LIMSI system with a time constraint achieved better results than either system alone.

ieee automatic speech recognition and understanding workshop | 2003

Improved speaker adaptation using speaker dependent feature projections

Spyros Matsoukas; Richard M. Schwartz

We extend the formulation of constrained maximum likelihood linear regression (CMLLR) adaptation to take into account full covariance matrices in the adapted model, and we use it in conjunction with heteroscedastic linear discriminant analysis (HLDA) in order to estimate speaker dependent feature projections on both training and test data. Results on the broadcast news corpus show that the proposed HLDA adaptation technique is very effective, even when combined with traditional CMLLR and MLLR adaptation, providing up to 8% relative improvement in recognition accuracy.

international conference on acoustics, speech, and signal processing | 2006

Unsupervised Training on Large Amounts of Broadcast News Data

Jeff Z. Ma; Spyros Matsoukas; Owen Kimball; Richard M. Schwartz

This paper presents our recent effort to improve our Arabic broadcast news (BN) recognition system by using thousands of hours of untranscribed Arabic audio through unsupervised training. Unsupervised training is first carried out on the 1,900-hour English topic detection and tracking (TDT) data and is compared with the lightly-supervised training method that we have used for the DARPA EARS evaluations. The comparison shows that unsupervised training produces a 21.7% relative reduction in word error rate (WER), which is comparable to the gain obtained with light supervision methods. The same unsupervised training strategy carried out on a similar amount of Arabic BN data produces an 11.6% relative gain. The gain, though considerable, is substantially smaller than what is observed on the English data. Our initial work towards understanding the reasons for this difference is also described.
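For readers unfamiliar with the "relative reduction" figures quoted in these abstracts, the following sketch shows the usual arithmetic: the absolute drop divided by the baseline value. The function name `relative_reduction` and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
def relative_reduction(baseline, improved):
    """Return the relative reduction of an error metric such as WER.

    Both arguments are error rates on the same scale (e.g. percent).
    A result of 0.25 means the error dropped by 25% of its baseline value.
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline

# Illustrative numbers only: a WER falling from 20.0% to 15.0%
# is a 5-point absolute drop and a 25% relative reduction.
example = relative_reduction(20.0, 15.0)
```

The same calculation applies to the other relative figures on this page, such as relative improvements in recognition accuracy or EER, provided the metric is expressed as an error rate.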

international conference on acoustics, speech, and signal processing | 2007

Language Model Adaptation in Machine Translation from Speech

Ivan Bulyko; Spyros Matsoukas; Richard M. Schwartz; Long Nguyen; John Makhoul

This paper investigates the use of several language model adaptation techniques applied to the task of machine translation from Arabic broadcast speech. Unsupervised and discriminative approaches slightly outperform the traditional perplexity-based optimization technique. Language model adaptation, when used for n-best rescoring, improves machine translation performance by 0.3-0.4 BLEU and reduces translation edit rate (TER) by 0.2-0.5% compared to an unadapted LM.
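As a rough illustration of n-best rescoring with a language model, the sketch below adds a weighted LM log-probability to each hypothesis's existing score and keeps the best one. The function `rescore_nbest`, the score layout, and the toy LM are assumptions made for this example; the abstract does not specify the system's actual features or weights.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=1.0):
    """Pick the best hypothesis after adding a weighted LM score.

    nbest:      list of (hypothesis_text, base_score) tuples, where base_score
                is the log-domain score from all non-LM models (assumed layout).
    lm_logprob: callable mapping hypothesis_text to a log-probability,
                e.g. an adapted language model.
    lm_weight:  scaling factor applied to the LM log-probability.
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    return max(nbest, key=lambda hyp: hyp[1] + lm_weight * lm_logprob(hyp[0]))

# Toy example: a hypothetical adapted LM that prefers the second hypothesis.
toy_nbest = [("a b", -1.0), ("a c", -1.2)]
toy_lm = {"a b": -3.0, "a c": -2.0}
best_text, _ = rescore_nbest(toy_nbest, toy_lm.__getitem__, lm_weight=1.0)
```

Here the hypothesis ranked second by its base score wins once the LM score is added, which is the effect rescoring with an adapted LM is meant to produce when the adapted model matches the target domain better.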

international conference on acoustics, speech, and signal processing | 2013

Developing a speaker identification system for the DARPA RATS project

Oldrich Plchot; Spyros Matsoukas; Pavel Matejka; Najim Dehak; Jeff Z. Ma; Sandro Cumani; Ondrej Glembek; Hynek Hermansky; Sri Harish Reddy Mallidi; Nima Mesgarani; Richard M. Schwartz; Mehdi Soufifar; Zheng-Hua Tan; Samuel Thomas; Bing Zhang; Xinhui Zhou

This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs outperform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel id.
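The equal error rate (EER) mentioned above is the operating point where the false-acceptance and false-rejection rates coincide. The sketch below approximates it by sweeping thresholds over the observed scores; the function name `equal_error_rate` and the toy scores are assumptions for illustration, and evaluation toolkits typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from detection scores (higher means 'same speaker').

    Sweeps each observed score as a threshold, accepting trials whose score
    is >= threshold, and returns the average of the false-acceptance and
    false-rejection rates at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: non-target trials scoring at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores: perfectly separated trials versus overlapping trials.
separated = equal_error_rate([0.9, 0.8], [0.1, 0.2])
overlapping = equal_error_rate([0.9, 0.4], [0.6, 0.1])
```

Perfectly separated scores give an EER of zero, while the overlapping toy set, where one target trial scores below one non-target trial, gives an EER of one half.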

international conference on acoustics, speech, and signal processing | 2005

Minimum phoneme error based heteroscedastic linear discriminant analysis for speech recognition

Bing Zhang; Spyros Matsoukas

We introduce a discriminative feature analysis method that seeks to minimize phoneme errors in lattice-based training frameworks. This technique, referred to as minimum phoneme error heteroscedastic linear discriminant analysis (MPE-HLDA), is shown to be more robust than traditional LDA methods in high dimensional spaces, and easy to incorporate with existing training procedures, such as HLDA-SAT and discriminative training of hidden Markov models (HMMs). Results on conversational telephone speech and broadcast news corpora also show that the recognition accuracy is improved using features selected by MPE-HLDA.
