Theodoros Petsatodis
Aalborg University
Publication
Featured research published by Theodoros Petsatodis.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Theodoros Petsatodis; Christos Boukis; Fotios Talantzis; Zheng-Hua Tan; Ramjee Prasad
This paper proposes a robust voice activity detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.
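The convex combination described above can be sketched as a weighted sum of three zero-mean densities sharing one standard deviation. This is only an illustration under stated assumptions: the function names, the shared `sigma`, and the clamping near zero are hypothetical, and the weights are passed in by hand because the abstract does not say how the paper adapts them.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density with variance sigma**2 (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)


def two_sided_gamma_pdf(x, sigma, eps=1e-12):
    """Zero-mean two-sided Gamma density (shape 1/2) with variance sigma**2.

    The density is singular at x = 0, so |x| is clamped to eps (an assumption
    made here only to keep the sketch numerically safe).
    """
    ax = max(abs(x), eps)
    norm = (3.0 ** 0.25) / (2.0 * math.sqrt(2.0 * math.pi * sigma))
    return norm * (ax ** -0.5) * math.exp(-math.sqrt(3.0) * ax / (2.0 * sigma))


def convex_combination_pdf(x, sigma, weights):
    """Weighted sum of the three densities; weights must be non-negative and sum to 1."""
    if len(weights) != 3:
        raise ValueError("expected three weights")
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    w_gauss, w_laplace, w_gamma = weights
    return (w_gauss * gaussian_pdf(x, sigma)
            + w_laplace * laplacian_pdf(x, sigma)
            + w_gamma * two_sided_gamma_pdf(x, sigma))
```

Setting the weights to (1, 0, 0) recovers the plain Gaussian model, so a single-distribution detector is a special case of the combination.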
International Conference on Digital Signal Processing | 2009
Theodoros Petsatodis; Aristodemos Pnevmatikakis; Christos Boukis
An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers under various levels of additive white Gaussian noise, and it achieves performance superior to that of either unimodal component alone.
Biomedical Signal Processing and Control | 2012
Charalampos Doukas; Theodoros Petsatodis; Christos Boukis; Ilias Maglogiannis
Results of clinical studies suggest that there is a relationship between breathing-related sleep disorders and behavioral disorders and health effects. Apnea is considered one of the major sleep disorders, with growing prevalence in the population and significant impact on patients' health. Symptoms include disruption of oxygenation, snoring, choking sensations, apneic episodes, poor concentration, memory loss, and daytime somnolence. Diagnosis of apnea and breath disorders involves monitoring patients' biosignals and breath during sleep in specialized clinics requiring expensive equipment and technical personnel. This paper discusses the design and technical details of an integrated low-cost system capable of preliminary detection of sleep breath disorders at the patient's home using patient sound signals. The paper describes the proposed architecture and the corresponding hardware and software modules, along with a preliminary evaluation.
International Conference on Digital Signal Processing | 2009
Theodoros Petsatodis; Aristodemos Pnevmatikakis; Fotios Talantzis; U. Diaz
Cognitive care for the elderly can be provided by training their cognitive skills in order to reduce age-related decline of cognitive capabilities. This is typically achieved with cognitive games that can benefit from computer interfaces. In this paper we present the design and use of a multi-touch surface that serves as a front-end for games designed specifically to address the decline in declarative and prospective memory. The selected games are a result of both user studies that identified a set of requirements for the design of the surface and previous research on this issue. The system can potentially be embedded in the surface of a table. It comprises a modified Thin Film Transistor panel along with an acrylic surface. A video camera captures the Frustrated Total Internal Reflection produced by fingers when they are lit by infrared illuminators. Multi-touch functionality is achieved by a Kalman-based multiple target tracking algorithm that runs on the camera feed and detects any number of fingers and their movement. Results indicate that users perceive the experience as a positive and functional process.
International Conference on Digital Signal Processing | 2009
Theodoros Petsatodis; Christos Boukis
An algorithm suitable for voice activity detection under reverberant conditions is proposed in this paper. Because far-field microphones are used, the proposed solution processes speech signals of highly varying intensity and signal-to-noise ratio that are contaminated with several echoes. The core of the system is a pair of Hidden Markov Models that model the speech-presence and speech-absence situations. To minimise misdetections an adaptive threshold is used, while a hang-over scheme caters for the intra-frame correlation of speech signals. Experiments conducted in a typical office room using a single far-field microphone support the analysis.
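A hang-over scheme of the kind mentioned above is commonly implemented by holding the speech decision for a few frames after the raw detector drops to non-speech. The following is a minimal sketch of that generic idea, not the paper's implementation; the function name `apply_hangover` and the parameter `hang_frames` are hypothetical.

```python
def apply_hangover(raw_decisions, hang_frames):
    """Smooth per-frame VAD decisions (1 = speech, 0 = non-speech).

    After any frame flagged as speech, the next `hang_frames` frames are
    also reported as speech, which bridges short pauses and weak word endings.
    """
    if hang_frames < 0:
        raise ValueError("hang_frames must be non-negative")
    smoothed = []
    remaining = 0  # frames of hang-over still available
    for decision in raw_decisions:
        if decision:
            remaining = hang_frames  # re-arm the hang-over on every speech frame
            smoothed.append(1)
        elif remaining > 0:
            remaining -= 1
            smoothed.append(1)
        else:
            smoothed.append(0)
    return smoothed
```

A longer hang-over reduces clipped word endings at the cost of labelling more trailing noise as speech, so `hang_frames` would normally be tuned to the frame length.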
Journal of the Acoustical Society of America | 2013
Theodoros Petsatodis; Fotios Talantzis; Christos Boukis; Zheng-Hua Tan; Ramjee Prasad
Time delay estimation (TDE) is a fundamental component of speaker localization and tracking algorithms. Most of the existing systems are based on the generalized cross-correlation method, assuming Gaussianity of the source. It has been shown that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. Thus the performance of TDE is expected to fluctuate depending on the underlying assumption for the speech distribution, being also subject to multi-path reflections and competitive background noise. This paper investigates the effect upon TDE when modeling the source signal with different speech-based distributions. An information-theoretic TDE method indirectly encapsulating higher order statistics (HOS) formed the basis of this work. The underlying assumption of a Gaussian-distributed source has been replaced by that of a generalized Gaussian distribution, which allows evaluating the problem under a larger set of speech-shaped distributions, ranging from Gaussian to Laplacian and Gamma. Closed forms of the univariate and multivariate entropy expressions of the generalized Gaussian distribution are derived to evaluate the TDE. The results indicate that TDE based on the specific criterion is independent of the underlying assumption for the distribution of the source, for the same covariance matrix.
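For orientation, the conventional cross-correlation baseline mentioned above can be sketched as picking the lag that maximizes the correlation between two microphone signals. This is plain, unweighted cross-correlation in the time domain, not the information-theoretic estimator studied in the paper; the function name `estimate_delay` and the `max_lag` parameter are hypothetical.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum_i x[i] * y[i + lag].

    A positive result means y is a delayed copy of x; a negative result
    means y leads x. Only lags in [-max_lag, max_lag] are searched.
    """
    best_lag = 0
    best_value = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, sample in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):  # skip samples shifted outside y
                acc += sample * y[j]
        if acc > best_value:
            best_value = acc
            best_lag = lag
    return best_lag
```

With two microphones a known distance apart, the estimated lag divided by the sampling rate gives the time difference of arrival used for localization.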
International Conference on Digital Signal Processing | 2013
Theodoros Petsatodis; Christos Boukis; Fotios Talantzis; Lazaros Polymenakos
Speech processing systems often operate in noisy and reverberant environments. Their operation is subject to the accuracy of the underlying noise reduction algorithm, which aims to reduce noise present in the signals captured by the employed microphones. Under adverse conditions, a noise reduction scheme that fails to perform adequately will produce results characterised by speech distortions (metallic or clipping voice) and/or fluctuating residual background noise, known as musical noise, that results from inaccuracy in estimating the noise spectrum. In this paper, the performance enhancement obtained when employing a statistical model-based Voice Activity Detector (VAD) in combination with noise reduction is presented in terms of residual noise suppression within silence intervals and short pauses during speech production. To this end, an efficient noise reduction architecture has been developed that relies on cascading a previously presented denoising scheme. This way, we initially subtract primary noise and then subsequently use the same technique, with specific parameter adjustments, to remove spectral subtraction artefacts such as musical noise or noise leftovers that were generated by the first stage. Simulations under various noise types and intensities indicate significant performance enhancement when employing the proposed system.
International Conference on Digital Signal Processing | 2013
Aristodemos Pnevmatikakis; Andreas Stergiou; Theodoros Petsatodis; Nikolaos Katsarakis
Particle filters allow for visual trackers with nonlinear measurements. In this paper we consider three different non-linear visual measurement cues, based on object detection, foreground segmentation and colour matching. Novel ways to obtain robust measurement likelihoods under a unified representation scheme are discussed, followed by a likelihood combination scheme for fusion. The resulting single and multi-cue particle filter trackers are compared in the scope of face tracking.
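One common way to fuse several measurement cues in a particle filter is to multiply the per-cue likelihoods of each particle and normalize the result into weights. The abstract does not state which combination rule the paper uses, so the product rule below (which assumes the cues are conditionally independent) is only an illustrative stand-in; the function name `fuse_likelihoods` is hypothetical.

```python
def fuse_likelihoods(per_cue_likelihoods):
    """Combine per-cue particle likelihoods into normalized particle weights.

    per_cue_likelihoods: list with one entry per cue (e.g. detection,
    foreground, colour), each entry a list of non-negative likelihoods,
    one per particle. Assumes conditionally independent cues (product rule).
    """
    if not per_cue_likelihoods:
        raise ValueError("at least one cue is required")
    n_particles = len(per_cue_likelihoods[0])
    fused = [1.0] * n_particles
    for cue in per_cue_likelihoods:
        if len(cue) != n_particles:
            raise ValueError("every cue must score every particle")
        for i, likelihood in enumerate(cue):
            fused[i] *= likelihood
    total = sum(fused)
    if total <= 0.0:
        # All particles were ruled out; fall back to uniform weights.
        return [1.0 / n_particles] * n_particles
    return [value / total for value in fused]
```

Passing a single cue reduces this to ordinary weight normalization, which corresponds to a single-cue tracker.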
International Federation for Medical and Biological Engineering Proceedings | 2010
Charalampos Doukas; Theodoros Petsatodis; Ilias Maglogiannis
Apnea is considered one of the major sleep disorders, with growing prevalence in the population and significant impact on patients' health. Symptoms include disruption of oxygenation, snoring, choking sensations, apneic episodes, poor concentration, memory loss, and daytime somnolence. Diagnosis of apnea involves monitoring the patient's biosignals and breath during sleep in specialized clinics requiring expensive equipment and technical personnel. This paper discusses the design and technical details of a platform capable of preliminary detection of sleep apnea at the patient's home using snore analysis.
2013 Constantinides International Workshop on Signal Processing (CIWSP 2013) | 2013
Theodoros Petsatodis; Fotios Talantzis; Christos Boukis