
Publications

Featured research published by Vassilios Diakoloukas.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Development of dialect-specific speech recognizers using adaptation methods

Vassilios Diakoloukas; Vassilios Digalakis; Leonardo Neumeyer; Jaan Kaja

Several adaptation approaches have been proposed to improve speech recognition performance under mismatched conditions. However, the application of these approaches has mostly been constrained to speaker or channel adaptation tasks. We first investigate the effect of a dialect mismatch between training and testing speakers in an automatic speech recognition (ASR) system and find that it significantly degrades recognition accuracy. Consequently, we apply several adaptation approaches to develop a dialect-specific recognition system, starting from a dialect-dependent system trained on a different dialect and a small number of training sentences from the target dialect. We show that adaptation improves recognition performance dramatically even with small amounts of adaptation data. We further show that, although the performance of traditionally trained systems degrades severely as the number of training speakers decreases, the performance of adapted systems is much less affected.
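The adaptation approaches referred to here shift the parameters of a model trained on one dialect toward the target dialect. As a minimal illustrative sketch (not the paper's exact method), the following applies a single shared linear transform to the Gaussian means of one regression class, in the style of transformation-based adaptation; all matrices and values are hypothetical.

```python
import numpy as np

def adapt_means(means, A, b):
    """Apply a shared linear transform (A, b) to every Gaussian mean
    in a regression class: mu' = A @ mu + b."""
    return means @ A.T + b

# Hypothetical dialect-dependent means (4 Gaussians, 3-dim features)
means = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.5, 0.0],
                  [2.0, 2.0, 1.0],
                  [0.5, 1.5, 2.5]])
A = 0.9 * np.eye(3)               # toy scaling toward the target dialect
b = np.array([0.1, -0.2, 0.05])   # toy bias
adapted = adapt_means(means, A, b)
```

In practice, `A` and `b` would be estimated by maximum likelihood from the small set of target-dialect adaptation sentences, not chosen by hand.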


Computer Speech & Language | 2001

Maximum likelihood stochastic transformation adaptation for medium and small data sets

Constantinos Boulis; Vassilios Diakoloukas; Vassilios Digalakis

Speaker adaptation is recognized as an essential part of today's large-vocabulary automatic speech recognition systems. A family of techniques that has been applied extensively when adaptation data are limited is transformation-based adaptation, in which we partition the parameter space into a set of classes, estimate a (usually linear) transform for each class, and apply the same transform to all components of the class. It is known, however, that additional gains can be made if the components of a class are not constrained to share a single transform. In this paper, two speaker adaptation algorithms are described. First, instead of estimating one linear transform per class (as maximum likelihood linear regression (MLLR) does, for example), we estimate multiple linear transforms per class of models together with a transform-weights vector that is specific to each component (Gaussians, in our case). In effect, each component receives its own transform without each transform having to be estimated independently. This scheme, termed maximum likelihood stochastic transformation (MLST), achieves a good trade-off between robustness and acoustic resolution. MLST is evaluated on the Wall Street Journal (WSJ) corpus for non-native speakers, and it is shown that with 40 adaptation sentences the algorithm outperforms MLLR by more than 13%.

In the second half of the paper, we introduce a variant of MLST designed to operate under data sparsity. Since the majority of the adaptation parameters are the transformations, we estimate them on the training speakers and adapt to a new speaker by estimating the transform weights only. We first cluster the speakers into a number of sets and estimate the transformations on each cluster; the new speaker then uses transformations from all clusters to perform adaptation. This method, termed basis transformation, can be seen as a speaker-similarity scheme. Experimental results on the WSJ corpus show that when basis transformation is cascaded with MLLR, marginal gains are obtained over MLLR alone for the adaptation of native speakers.
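The core MLST idea, several class-level transforms blended by component-specific weights, can be sketched as follows. The transforms and weights below are hypothetical toy values, not estimates from data; in MLST they would be obtained by maximum likelihood.

```python
import numpy as np

def mlst_adapt_mean(mu, transforms, weights):
    """MLST-style adaptation: a Gaussian's adapted mean is a weighted
    combination of several class-level linear transforms.
    transforms: list of (A, b) pairs shared by the whole class;
    weights:    component-specific weights (one vector per Gaussian)."""
    adapted = np.zeros_like(mu)
    for w, (A, b) in zip(weights, transforms):
        adapted += w * (A @ mu + b)
    return adapted

mu = np.array([1.0, 2.0])                         # one Gaussian mean
transforms = [(np.eye(2), np.array([0.5, 0.0])),  # class transform 1
              (2.0 * np.eye(2), np.array([0.0, -1.0]))]  # class transform 2
weights = np.array([0.7, 0.3])                    # specific to this Gaussian
new_mu = mlst_adapt_mean(mu, transforms, weights)
# 0.7 * [1.5, 2.0] + 0.3 * [2.0, 3.0] = [1.65, 2.3]
```

Because only the small weight vector is component-specific while the transforms are shared, each Gaussian effectively gets its own transform at a fraction of the parameter cost.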


International Conference on Acoustics, Speech, and Signal Processing | 2007

Estimation of General Identifiable Linear Dynamic Models with an Application in Speech Recognition

Georgios Tsontzos; Vassilios Diakoloukas; Christos Koniaris; Vassilios Digalakis

Although hidden Markov models (HMMs) provide a relatively efficient modeling framework for speech recognition, they suffer from several shortcomings that place an upper bound on the achievable performance. Alternatively, linear dynamic models (LDMs) can be used to model speech segments. Several implementations of LDMs have been proposed in the literature; however, all of them restricted the model structure to satisfy identifiability constraints. In this paper, we relax these constraints and use a general, canonical form for a linear state-space system that guarantees identifiability for arbitrary state and observation vector dimensions. For this system, we present a novel, element-wise maximum likelihood (ML) estimation method. Classification experiments on the AURORA2 speech database show performance gains compared to HMMs, particularly under highly noisy conditions.
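An LDM is a linear state-space system with a hidden continuous state driving the observations. The paper's canonical identifiable parameterization is not reproduced here; the sketch below just simulates a generic LDM of the kind being estimated, with all matrices chosen as hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic LDM:  x_t = F x_{t-1} + w_t,   y_t = H x_t + v_t
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])     # state-transition matrix (toy values)
H = np.array([[1.0, 0.0]])     # observation matrix
Q = 0.01 * np.eye(2)           # state-noise covariance
R = np.array([[0.1]])          # observation-noise covariance

def simulate(T, x0):
    """Draw a state and observation sequence of length T from the LDM."""
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = F @ x + rng.multivariate_normal(np.zeros(2), Q)
        y = H @ x + rng.multivariate_normal(np.zeros(1), R)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

states, obs = simulate(50, np.zeros(2))
```

ML estimation in the paper recovers F, H, Q, and R (in canonical form) from observation sequences like `obs`, with the hidden `states` unobserved.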


International Conference on Acoustics, Speech, and Signal Processing | 2014

Linear dynamical models in speech synthesis

Vassilios Tsiaras; Ranniery Maia; Vassilios Diakoloukas; Yannis Stylianou; Vassilios Digalakis

Hidden Markov models (HMMs) are becoming the dominant approach to text-to-speech synthesis (TTS). HMMs provide an attractive acoustic modeling scheme that has been investigated and developed extensively for many years, and modern HMM-based speech synthesizers approach the quality of the best state-of-the-art unit-selection systems. However, we believe that statistical parametric speech synthesis has not reached its potential, since HMMs are limited by several assumptions that do not match the properties of speech. We therefore propose in this paper to use linear dynamical models (LDMs) instead of HMMs. LDMs can better model the dynamics of speech and produce naturally smoother trajectories for the synthesized speech. We perform a series of experiments with different system configurations to assess the performance of LDMs for speech synthesis. We show that LDM-based synthesizers can outperform HMM-based ones in terms of cepstral distance and are a very promising acoustic modeling alternative for statistical parametric TTS.
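To see why an LDM yields smoother parameter trajectories than the piecewise-constant state means of an HMM, here is a minimal noise-free generation sketch: each output frame is a linear function of the evolving previous state rather than a constant per state. All parameters are hypothetical; a real synthesizer would use trained, segment-specific models.

```python
import numpy as np

F = np.array([[0.95, 0.05],
              [0.00, 0.90]])    # toy state-transition matrix
H = np.array([[1.0, 1.0]])      # toy observation matrix
x = np.array([1.0, -1.0])       # initial state for a hypothetical segment

# Noise-free mean trajectory: x_t = F x_{t-1},  y_t = H x_t
trajectory = []
for _ in range(20):
    x = F @ x
    trajectory.append(float(H @ x))
```

The resulting `trajectory` changes gradually from frame to frame, whereas an HMM's mean trajectory would jump at each state boundary.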


IEEE Signal Processing Letters | 2016

Global Variance in Speech Synthesis With Linear Dynamical Models

Vassilis Tsiaras; Ranniery Maia; Vassilios Diakoloukas; Yannis Stylianou; Vassilios Digalakis

Linear dynamical models (LDMs) have recently been used in speech synthesis as an alternative to hidden Markov models (HMMs). Among the advantages of LDMs are their ability to capture the dynamics of speech and to match the quality of HMM-based systems with a smaller footprint. However, as in the HMM case, LDMs produce over-smoothed trajectories of speech parameters, resulting in a muffled quality of the synthetic speech. Inspired by a similar problem in HMM-based speech synthesis, where the naturalness of the synthesized speech improves greatly when the global variance (GV) is compensated, this paper proposes a novel speech parameter generation algorithm that considers GV in LDM-based speech synthesis. Experimental results show that applying GV during parameter generation significantly improves speech quality.
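One simple way to see GV compensation is as restoring the variance of an over-smoothed trajectory. The sketch below does a direct linear rescaling around the trajectory mean with hypothetical values; the paper's generation algorithm instead incorporates GV into the parameter-generation criterion itself.

```python
import numpy as np

def match_global_variance(traj, target_gv):
    """Rescale a parameter trajectory (frames x dims) so that its
    per-dimension variance matches a target global variance, while
    preserving the trajectory mean."""
    mean = traj.mean(axis=0)
    gv = traj.var(axis=0)
    scale = np.sqrt(target_gv / gv)
    return mean + (traj - mean) * scale

# Hypothetical over-smoothed 1-dim trajectory and target GV
traj = np.array([[0.0], [0.5], [1.0], [0.5], [0.0]])
target_gv = np.array([0.5])
widened = match_global_variance(traj, target_gv)
```

The widened trajectory has exactly the target variance but the same mean, which is the intuition behind why GV compensation reduces the muffled character of over-smoothed synthesis.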


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Gaussian Mixture Clustering and Language Adaptation for the Development of a New Language Speech Recognition System

Nikos Chatzichrisafis; Vassilios Diakoloukas; Vassilios Digalakis; Costas Harizakis

Porting a speech recognition system to a new language is usually a time-consuming and expensive process, since it requires collecting, transcribing, and processing a large amount of language-specific training sentences. This work presents techniques for improved cross-language transfer of speech recognition systems to new target languages. Such techniques are particularly useful for target languages where only minimal amounts of training data are available. We describe a novel method to produce a language-independent system by combining acoustic models from a number of source languages. This intermediate language-independent acoustic model is used to bootstrap a target-language system by applying language adaptation. In our experiments, we use acoustic models of seven source languages to develop a target Greek acoustic model. We show that our technique significantly outperforms a system trained from scratch when less than 8 h of read speech is available.
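Combining acoustic models from several source languages involves clustering, i.e. merging, Gaussian components across models. The elementary step of such clustering, a moment-matched merge of two diagonal-covariance Gaussians, can be sketched as follows; the weights, means, and variances are hypothetical, and the full method in the paper decides which pairs to merge and then applies language adaptation.

```python
import numpy as np

def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two weighted diagonal Gaussians into one:
    the merged Gaussian preserves the total weight, mean, and variance
    of the pair."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2)
           + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var

# Components from two source-language models for the same acoustic class
w, mu, var = merge_gaussians(0.6, np.array([0.0, 1.0]), np.array([1.0, 1.0]),
                             0.4, np.array([1.0, 0.0]), np.array([0.5, 2.0]))
```

Repeating this merge over the closest component pairs shrinks the pooled multi-language mixture down to a compact language-independent model.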


Conference of the International Speech Communication Association | 2003

Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system.

Vassilios Digalakis; Dimitris Oikonomidis; Dimitris Pratsolis; Nikos Tsourakis; Christos Vosnidis; Nikos Chatzichrisafis; Vassilios Diakoloukas


Conference of the International Speech Communication Association | 1997

Adaptation of hidden Markov models using multiple stochastic transformations.

Vassilios Diakoloukas; Vassilios Digalakis


Conference of the International Speech Communication Association | 2015

Towards a linear dynamical model based speech synthesizer.

Vassilios Tsiaras; Ranniery Maia; Vassilios Diakoloukas; Yannis Stylianou; Vassilios Digalakis


Conference of the International Speech Communication Association | 2004

Rapid acoustic model development using Gaussian mixture clustering and language adaptation.

Nikos Chatzichrisafis; Vassilios Digalakis; Vassilios Diakoloukas; Costas Harizakis

Collaboration


Top co-authors of Vassilios Diakoloukas:

Vassilios Digalakis (Technical University of Crete)
Costas Harizakis (Technical University of Crete)
Dimitris Oikonomidis (Technical University of Crete)
Nikos Chatzichrisafis (Technical University of Crete)
Christos Vosnidis (Technical University of Crete)
Constantinos Boulis (Technical University of Crete)
Dimitris Pratsolis (Technical University of Crete)