Yannis Agiomyrgiannakis
University of Crete
Publications
Featured research published by Yannis Agiomyrgiannakis.
IEEE Transactions on Audio, Speech, and Language Processing | 2009
Yannis Agiomyrgiannakis; Yannis Stylianou
The harmonic representation of speech signals has found many applications in speech processing. This paper presents a novel statistical approach to model the behavior of harmonic phases. Phase information is decomposed into three parts: a minimum phase part, a translation term, and a residual term referred to as dispersion phase. Dispersion phases are modeled by wrapped Gaussian mixture models (WGMMs) using an expectation-maximization algorithm suitable for circular vector data. A multivariate WGMM-based phase quantizer is then proposed and constructed using novel scalar quantizers for circular random variables. The proposed phase modeling and quantization scheme is evaluated in the context of a narrowband harmonic representation of speech. Results indicate that it is possible to construct a variable-rate harmonic codec that is equivalent to iLBC at approximately 13 kbps.
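The building block of the WGMM described above is the wrapped Gaussian density: an ordinary Gaussian summed over 2π-translates so that it becomes a proper density on the circle. A minimal sketch (the function name and the truncation depth `n_wraps` are illustrative, not from the paper):

```python
import math

def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=3):
    """Density of a wrapped Gaussian on [-pi, pi): the linear Gaussian
    density summed over a few 2*pi translates of its argument."""
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        x = theta + 2.0 * math.pi * k
        total += math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return total
```

A WGMM is then a convex combination of such components, and the EM updates must account for the hidden wrapping index k as well as the component label.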
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Yannis Agiomyrgiannakis; Yannis Stylianou
In many speech-coding-related problems, there is available information and lost information that must be recovered. When there is significant correlation between the available and the lost information source, coding with side information (CSI) can be used to benefit from the mutual information between the two sources. In this paper, we consider CSI as a special VQ problem which will be referred to as conditional vector quantization (CVQ). A fast two-step divide-and-conquer solution is proposed. CVQ is then used in two applications: the recovery of highband (4-8 kHz) spectral envelopes for speech spectrum expansion and the recovery of lost narrowband spectral envelopes for voice over IP. Comparisons with alternative approaches, such as estimation and simple VQ-based schemes, show that CVQ provides significant distortion reductions at very low bit rates. Subjective evaluations indicate that CVQ provides noticeable perceptual improvements over the alternative approaches.
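The two-step divide-and-conquer idea can be sketched as follows: first classify the available vector to a cluster, then quantize the lost vector with a codebook conditioned on that cluster. Since the decoder also holds the available vector (the side information), it can re-derive the cluster index, so only the within-cluster index is transmitted. This is a toy illustration with hypothetical names and codebooks, not the paper's implementation:

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def cvq_encode(x, y, x_centroids, y_codebooks):
    # Step 1 (divide): classify the available vector x to its nearest centroid.
    i = min(range(len(x_centroids)), key=lambda k: _sqdist(x, x_centroids[k]))
    # Step 2 (conquer): quantize the lost vector y with the codebook
    # conditioned on cluster i; only index j needs to be transmitted.
    j = min(range(len(y_codebooks[i])), key=lambda k: _sqdist(y, y_codebooks[i][k]))
    return j

def cvq_decode(j, x, x_centroids, y_codebooks):
    # The decoder re-derives i from its own copy of x (the side information),
    # so the cluster index costs zero bits on the channel.
    i = min(range(len(x_centroids)), key=lambda k: _sqdist(x, x_centroids[k]))
    return y_codebooks[i][j]
```

The bit-rate saving comes precisely from never transmitting the cluster index i.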
international conference on acoustics, speech, and signal processing | 2004
Yannis Agiomyrgiannakis; Yannis Stylianou
The paper addresses the problem of expanding the bandwidth of narrowband speech signals, focusing on the estimation of highband spectral envelopes. It is well known that there is not enough mutual information between the two bands. We show that this happens because narrowband spectral envelopes have a one-to-many relationship with highband spectral envelopes. A combined estimation/coding scheme for the missing spectral envelope is proposed, which exploits this relationship to produce a high-quality highband reconstruction, provided that an appropriate excitation is available. Subjective tests using the TIMIT database indicate that 134 bits/sec for the highband spectral envelope are adequate for a degradation category rating (DCR) score of 4.41, an improvement of 22.8% in DCR score over a typical estimation of highband envelopes using the usual mapping functions.
international conference on acoustics, speech, and signal processing | 2009
Yannis Agiomyrgiannakis; Olivier Rosec
Two ARX-LF-based source/filter models for speech signals are presented. A robust glottal inversion technique is used to deconvolve the signal into an excitation component and a filter component. The excitation component is further decomposed into an LF part and a residual part. The first model, referred to as the LF-vocoder, is a high-quality vocoder that replaces the residual part with modulated noise. The second model uses a sinusoidal harmonic representation of the residual signal. The latter does not degrade the signal during analysis/synthesis and provides higher quality for small modification factors, while the former has the advantage of being a compact, fully parametric representation that is suitable for low-bit-rate speech coding as well as parametric speech synthesis applications.
conference of the international speech communication association | 2016
Heiga Zen; Yannis Agiomyrgiannakis; Niels Egberts; Fergus Henderson; Przemysław Szczepaniak
Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization, multi-frame inference, and robust inference using an ε-contaminated Gaussian loss function. Experimental results from subjective listening tests show that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based SPSS in runtime speed while maintaining naturalness. Evaluations between LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also presented.
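The abstract does not spell out the quantization scheme, but the weight-quantization step can be illustrated with a generic uniform symmetric quantizer of the kind commonly used to shrink neural-network weights for mobile deployment (all names and the bit width are illustrative assumptions):

```python
def quantize_weights(w, bits=8):
    """Uniform symmetric quantization of a weight list to signed integers.
    Each weight is mapped to round(w / scale), with the scale chosen so the
    largest-magnitude weight lands on the extreme level."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit signed
    peak = max(abs(v) for v in w)
    scale = peak / levels if peak > 0 else 1.0
    q = [round(v / scale) for v in w]       # integers in [-levels, levels]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]
```

The appeal for on-device inference is that the matrix multiplies can then run in low-precision integer arithmetic, with at most scale/2 absolute error per weight.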
international conference on acoustics, speech, and signal processing | 2007
Athanasios Mouchtaris; Yannis Agiomyrgiannakis; Yannis Stylianou
Voice conversion methods have the objective of transforming speech spoken by a particular source speaker, so that it sounds as if spoken by a different target speaker. The majority of voice conversion methods are based on transforming the short-time spectral envelope of the source speaker, using correspondences between the source and target vectors derived from training speech data from both speakers. These correspondences are usually obtained by segmenting the spectral vectors of one or both speakers into clusters, using soft (GMM-based) or hard (VQ-based) clustering. Here, we propose that voice conversion performance can be improved by taking advantage of the fact that the relationship between the source and target vectors is often one-to-many. To illustrate this, we propose that a VQ approach, namely constrained vector quantization (CVQ), can be used for voice conversion. Results indicate that such a relationship between the source and target data indeed exists and can be exploited by a CVQ-based mapping for voice conversion.
international conference on acoustics, speech, and signal processing | 2007
Yannis Agiomyrgiannakis; Yannis Stylianou
Harmonic sinusoidal representations of speech have proven to be useful in many speech processing tasks. This work focuses on the phase spectra of the harmonics and provides a methodology to analyze and subsequently model the statistics of the harmonic phases. To do so, we propose the use of a wrapped Gaussian mixture model (WGMM), a model suitable for random variables that belong to circular spaces, and provide an expectation-maximization algorithm for training. The WGMM is then used to construct a phase quantizer. The quantizer is employed in a prototype variable-rate narrowband VoIP sinusoidal codec that is equivalent to iLBC in terms of PESQ-MOS, at approximately 13 kbps.
arXiv: Sound | 2016
Hideki Kawahara; Yannis Agiomyrgiannakis; Heiga Zen
This paper introduces a general and flexible framework for F0 and aperiodicity (additive non-periodic component) analysis, specifically intended for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: an instantaneous frequency estimator and initial aperiodicity detector, an F0 trajectory tracker, and an F0 refinement and aperiodicity extractor. A preliminary implementation of the proposed framework substantially outperformed existing F0 extractors (by a factor of 10 in terms of RMS F0 estimation error) in its ability to track temporally varying F0 trajectories. The front-end aperiodicity detector consists of a complex-valued wavelet analysis filter with a highly selective temporal and spectral envelope, and uses a new measure that quantifies the deviation from periodicity. The measure is less sensitive to slow FM and AM and correlates closely with the signal-to-noise ratio.
international conference on acoustics, speech, and signal processing | 2011
Yannis Agiomyrgiannakis; Yannis Stylianou
Spectral envelopes of speech signals are typically obtained by making stationarity assumptions about the signal which are not always valid. The Adaptive Quasi-Harmonic Model (AQHM), a non-stationary signal model, is capable of capturing the time-varying quasi-harmonics in voiced speech. This paper suggests the use of AQHM in a multi-layer scheme which results in a high-resolution time-frequency representation of speech. This representation is then used for the recovery of the evolving spectral envelope, and thus a time-frequency spectral envelope estimation algorithm, related to the Papoulis-Gerchberg algorithm for data extrapolation, is introduced. Results on voiced speech sounds show that the estimated spectral envelopes are smoother than those estimated by state-of-the-art spectral envelope estimators, while maintaining the important spectral details of the speech spectrum.
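The Papoulis-Gerchberg algorithm referenced above recovers missing data by alternating two projections: band-limit the signal in the frequency domain, then re-impose the known samples in the time domain. A minimal sketch for the classical discrete case (a band-limited signal with a gap of missing samples; the envelope-recovery variant in the paper operates on a different representation):

```python
import numpy as np

def papoulis_gerchberg(x_obs, known, band, n_iter=500):
    """Fill missing samples of a band-limited signal by alternating
    projections (Papoulis-Gerchberg extrapolation).

    x_obs : observed samples (values at ~known positions are ignored)
    known : boolean mask of trusted samples
    band  : boolean mask of FFT bins the signal is allowed to occupy
    """
    x = np.where(known, x_obs, 0.0)
    for _ in range(n_iter):
        X = np.fft.fft(x)
        X[~band] = 0.0              # project onto the band-limited subspace
        x = np.fft.ifft(X).real
        x[known] = x_obs[known]     # project onto signals matching the data
    return x
```

Convergence is geometric as long as no band-limited signal can concentrate all its energy on the missing positions, which is the same intuition the paper exploits for extrapolating the evolving envelope.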
international conference on acoustics, speech, and signal processing | 2006
Miltiadis Vasilakis; Yannis Agiomyrgiannakis; Yannis Stylianou
Harmonic models are commonly used in signal processing. The analysis of harmonic signals requires the solution of a symmetric Toeplitz system of equations. Levinson-based Toeplitz solvers have O(n²) complexity. This paper proposes an O(n) algorithm that encodes the inverse matrices required for the solution of the linear system into a few parameters in order to obtain an approximate solution for the harmonic model. For speech-related applications, the proposed algorithm is 2-30 times faster than the Levinson algorithm, while degradation is minimal and memory requirements are very low.
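For reference, the O(n²) Levinson recursion that the paper's O(n) method approximates can be sketched as follows. This is the standard textbook recursion for a symmetric positive-definite Toeplitz system (names illustrative; the paper's contribution is avoiding most of this work):

```python
def levinson_solve(t, b):
    """Solve T x = b where T is the symmetric positive-definite Toeplitz
    matrix with first column t, via the Levinson recursion.
    O(n^2) time and O(n) memory, versus O(n^3) for generic elimination."""
    n = len(t)
    r = [ti / t[0] for ti in t[1:]]      # normalized off-diagonal entries
    bb = [bi / t[0] for bi in b]         # normalized right-hand side
    y = [bb[0]]                          # solution of the leading 1x1 system
    if n == 1:
        return y
    x = [-r[0]]                          # Durbin (Yule-Walker) solution
    beta, alpha = 1.0, -r[0]
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha      # prediction-error energy update
        mu = (bb[k] - sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
        y = [y[i] + mu * x[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:                    # grow the Durbin solution too
            alpha = -(r[k] + sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
            x = [x[i] + alpha * x[k - 1 - i] for i in range(k)] + [alpha]
    return y
```

Because each step reuses the order-(k-1) solution, only two length-n vectors ever need to be stored, which is the structure the paper's parametric encoding of the inverse compresses further.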