Christos Koniaris
Royal Institute of Technology
Publications
Featured research published by Christos Koniaris.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Christos Koniaris; Saikat Chatterjee; W. Bastiaan Kleijn
We describe a method to select features for speech recognition that is based on a quantitative model of the human auditory periphery. The method maximizes the similarity of the geometry of the space spanned by the subset of features and the geometry of the space spanned by the auditory model output. The selection method uses a spectro-temporal auditory model that captures both frequency- and time-domain masking. The selection method is blind to the meaning of speech and does not require annotated speech data. We apply the method to the selection of a subset of features from a conventional set consisting of mel cepstra and their first-order and second-order time derivatives. Although our method uses only knowledge of the human auditory periphery, the experimental results show that it performs significantly better than feature-reduction algorithms based on linear and heteroscedastic discriminant analysis that require training with annotated speech data.
Journal of the Acoustical Society of America | 2010
Christos Koniaris; Marcin Kuropatwinski; W. Bastiaan Kleijn
It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data, the method outperforms commonly used discriminant-analysis-based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order yields subsets with good performance.
International Conference on Acoustics, Speech, and Signal Processing | 2011
Christos Koniaris; Olov Engwall
One of the difficulties in second language (L2) learning is a weakness in discriminating between acoustic diversity within an L2 phoneme category and between different categories. In this paper, we describe a general method to quantitatively measure the perceptual difference between a group of native speakers and individual non-native speakers. Normally, this task includes subjective listening tests and/or a thorough linguistic study. We instead use a fully automated method based on a psycho-acoustic auditory model. For a certain phoneme class, we measure the similarity of the Euclidean space spanned by the power spectrum of a native speech signal and the Euclidean space spanned by the auditory model output. We do the same for a non-native speech signal. Comparing the two similarity measurements, we find problematic phonemes for a given speaker. To validate our method, we apply it to different groups of non-native speakers of various first language (L1) backgrounds. Our results are verified by the theoretical findings in the literature obtained from linguistic studies.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Georgios Tsontzos; Vassilios Diakoloukas; Christos Koniaris; Vassilios Digalakis
Although hidden Markov models (HMMs) provide a relatively efficient modeling framework for speech recognition, they suffer from several shortcomings that set upper bounds on the performance that can be achieved. Alternatively, linear dynamic models (LDMs) can be used to model speech segments. Several implementations of LDMs have been proposed in the literature. However, all had a restricted structure to satisfy identifiability constraints. In this paper, we relax all these constraints and use a general, canonical form for a linear state-space system that guarantees identifiability for arbitrary state and observation vector dimensions. For this system, we present a novel, element-wise maximum likelihood (ML) estimation method. Classification experiments on the AURORA2 speech database show performance gains compared to HMMs, particularly in highly noisy conditions.
Conference of the International Speech Communication Association | 2011
Christos Koniaris; Olov Engwall
Conference of the International Speech Communication Association | 2009
Saikat Chatterjee; Christos Koniaris; W. Bastiaan Kleijn
Speech Communication | 2013
Christos Koniaris; Giampiero Salvi; Olov Engwall
International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 2012 | 2012
Christos Koniaris; Olov Engwall; Giampiero Salvi
Conference of the International Speech Communication Association | 2012
Christos Koniaris; Olov Engwall; Giampiero Salvi
Archive | 2012
Christos Koniaris