Publication


Featured research published by Vassilis Pitsikalis.


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

George Papandreou; Athanassios Katsamanis; Vassilis Pitsikalis; Petros Maragos

While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models.
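As a rough illustration of the uncertainty-compensation idea described in this abstract, the sketch below scores each class by summing per-stream Gaussian log-likelihoods whose variances are inflated by the estimated measurement-noise variance. It assumes diagonal-covariance Gaussian class models and known per-stream noise variances; the function and dictionary names are hypothetical, not the authors' implementation.

```python
import numpy as np

def uc_log_likelihood(x, mu, var, noise_var):
    """Diagonal-Gaussian log-likelihood with uncertainty compensation:
    the measurement-noise variance is added to the model variance."""
    v = var + noise_var                       # variance inflation
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - mu) ** 2 / v)

def fuse_streams(obs, models, noise_vars):
    """Sum compensated log-likelihoods over feature streams (e.g. audio
    and visual) for each class and return the best-scoring class."""
    def score(stream_models):
        return sum(uc_log_likelihood(obs[s], m["mu"], m["var"], noise_vars[s])
                   for s, m in stream_models.items())
    return max(models, key=lambda cls: score(models[cls]))
```

Because the noise variance enters each stream's likelihood directly, a noisier stream is automatically down-weighted at that frame, which is the adaptivity the paper connects to stream-weight fusion.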


Computer Vision and Pattern Recognition | 2011

Advances in phonetics-based sub-unit modeling for transcription alignment and sign language recognition

Vassilis Pitsikalis; Stavros Theodorakis; Christian Vogler; Petros Maragos

We explore novel directions for incorporating phonetic transcriptions into sub-unit based statistical models for sign language recognition. First, we employ a new symbolic processing approach for converting sign language annotations, based on HamNoSys symbols, into structured sequences of labels according to the Posture-Detention-Transition-Steady Shift phonetic model. Next, we exploit these labels, and their correspondence with visual features to construct phonetics-based statistical sub-unit models. We also align these sequences, via the statistical sub-unit construction and decoding, to the visual data to extract time boundary information that they would lack otherwise. The resulting phonetic sub-units offer new perspectives for sign language analysis, phonetic modeling, and automatic recognition. We evaluate this approach via sign language recognition experiments on an extended Lemmas Corpus of Greek Sign Language, which results not only in improved performance compared to pure data-driven approaches, but also in meaningful phonetic sub-unit models that can be further exploited in interdisciplinary sign language analysis.
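A toy sketch of the symbolic conversion step, under heavily simplified assumptions: real HamNoSys annotations are strings of specialized glyphs with far richer structure, and the token classes below are invented for illustration only.

```python
# Hypothetical annotation tokens standing in for HamNoSys symbols.
MOVEMENTS = {"move_up", "move_down", "circle"}
HOLDS = {"hold"}

def to_pdts(tokens):
    """Map an annotation token sequence to Posture (P), Detention (D),
    and Transition (T) labels: static configurations become postures,
    holds become detentions, movements become transitions. The Steady
    Shift case, where the posture changes during a movement, is left
    out of this toy version."""
    labels = []
    for tok in tokens:
        if tok in HOLDS:
            labels.append(("D", tok))
        elif tok in MOVEMENTS:
            labels.append(("T", tok))
        else:
            labels.append(("P", tok))
    return labels

print(to_pdts(["flat_hand", "move_up", "fist", "hold"]))
```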


International Conference on Acoustics, Speech, and Signal Processing | 2002

Speech analysis and feature extraction using chaotic models

Vassilis Pitsikalis; Petros Maragos

Nonlinear systems based on chaos theory can model various aspects of the nonlinear dynamic phenomena occurring during speech production. In this paper, we explore modern methods and algorithms from chaotic systems theory for modeling speech signals in a multidimensional phase space and for extracting nonlinear acoustic features. Further, we integrate these chaotic-type features with the standard linear ones (based on the cepstrum) to develop a generalized hybrid set of short-time acoustic features for speech signals, and demonstrate its efficacy by showing significant improvements in HMM-based word recognition.
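A minimal sketch of the phase-space reconstruction step this line of work starts from: a Takens-style delay embedding of a scalar signal. The embedding dimension and delay below are arbitrary illustrative values, not the paper's settings.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Embed a 1-D signal in a dim-dimensional phase space with delay tau:
    row t is [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)

# toy usage on a noisy oscillation standing in for a speech frame
x = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.01 * np.random.randn(400)
trajectory = delay_embed(x, dim=3, tau=5)      # shape (390, 3)
```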


IEEE Signal Processing Letters | 2006

Filtered Dynamics and Fractal Dimensions for Noisy Speech Recognition

Vassilis Pitsikalis; Petros Maragos

We explore methods from fractals and dynamical systems theory for robust processing and recognition of noisy speech. A speech signal is embedded in a multidimensional phase space and is subsequently filtered by exploiting aspects of its unfolded dynamics. Invariant measures (fractal dimensions) of the filtered signal are used as features in automatic speech recognition (ASR). We evaluate the newly proposed features, as well as the previously proposed multiscale fractal dimension, via ASR experiments on the Aurora 2 database. The experiments demonstrate relative improvements in word accuracy for the fractal features, especially at lower signal-to-noise ratios, when they are combined with the mel-frequency cepstral coefficients.
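To make the fractal-dimension feature concrete, here is a crude box-counting estimator over an embedded trajectory. The paper uses a multiscale (Minkowski-Bouligand style) estimator, so this is only a stand-in illustrating the general log-log slope idea.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the fractal dimension of a point set: count occupied
    grid cells N(s) at each cell size s and fit the slope of
    log N(s) versus log(1/s)."""
    counts = [len(np.unique(np.floor(points / s), axis=0)) for s in scales]
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope

# e.g. on the delay-embedded trajectory from the previous sketch:
# D = box_counting_dimension(trajectory, scales=[0.05, 0.1, 0.2, 0.4])
```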


International Conference on Acoustics, Speech, and Signal Processing | 2010

Model-level data-driven sub-units for signs in videos of continuous Sign Language

Stavros Theodorakis; Vassilis Pitsikalis; Petros Maragos

We investigate automatic phonetic sub-unit modeling for sign language that is completely data-driven and uses no prior phonetic information. A first step of visual processing leads to simple and effective region-based visual features. Prior to the sub-unit modeling, we propose to employ a pronunciation clustering step with respect to each sign. Afterwards, for each sign and pronunciation group, we find the time segmentation at the hidden Markov model (HMM) level. The models employed represent movements as sequences of dominant hand positions. The constructed segments are exploited explicitly at the model level via hierarchical clustering of HMMs, leading to data-driven movement sub-unit construction. The constructed movement sub-units are evaluated in qualitative analysis experiments on data from the Boston University (BU)-400 American Sign Language corpus, showing promising results.
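The model-level clustering step might be sketched as follows, with a deliberately crude model distance: each segment's HMM is summarized by its concatenated state means, compared with Euclidean distance, then clustered agglomeratively. The real system's HMM distance and cluster criterion are not specified here, so treat every detail as an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Stand-in segment models: 3 HMM states, each a 2-D dominant-hand position.
rng = np.random.default_rng(0)
segment_models = [rng.normal(size=(3, 2)) for _ in range(20)]

flat = np.stack([m.ravel() for m in segment_models])   # one row per segment HMM
Z = linkage(pdist(flat), method="average")             # agglomerative clustering
subunit_ids = fcluster(Z, t=5, criterion="maxclust")   # 5 movement sub-units
```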


Multimedia Signal Processing | 2007

Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition

George Papandreou; Athanassios Katsamanis; Vassilis Pitsikalis; Petros Maragos

We study the effect of uncertain feature measurements and show how classification and learning rules should be adjusted to compensate for it. Our approach is particularly fruitful in multimodal fusion scenarios, such as audio-visual speech recognition, where multiple streams of complementary features whose reliability is time-varying are integrated. For such applications, by taking the measurement noise uncertainty of each feature stream into account, the proposed framework leads to highly adaptive multimodal fusion rules for classification and learning which are widely applicable and easy to implement. We further show that previous multimodal fusion methods relying on stream weights fall under our scheme under certain assumptions; this provides novel insights into their applicability for various tasks and suggests new practical ways for estimating the stream weights adaptively. The potential of our approach is demonstrated in audio-visual speech recognition experiments.
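One way the stream-weight connection can be pictured: a per-stream reliability weight that shrinks as measurement noise dominates, plugged into the familiar exponent-weighted score combination. The weight formula below is an illustrative form, not the paper's exact derivation.

```python
import numpy as np

def reliability_weight(model_var, noise_var):
    """Illustrative stream weight in [0, 1]: close to 1 for clean
    features, close to 0 when the noise variance dominates."""
    return float(np.mean(model_var / (model_var + noise_var)))

def weighted_fusion(loglik_audio, loglik_video, w_audio, w_video):
    """Classic exponent-weighted (log-linear) stream combination."""
    return w_audio * loglik_audio + w_video * loglik_video
```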


European Conference on Computer Vision | 2010

Hand tracking and affine shape-appearance handshape sub-units in continuous sign language recognition

Anastasios Roussos; Stavros Theodorakis; Vassilis Pitsikalis; Petros Maragos

We propose and investigate a framework that utilizes novel aspects of probabilistic and morphological visual processing for the segmentation, tracking, and handshape modeling of the hands, used as a front-end for sign language video analysis. Our ultimate goal is to explore automatic Handshape Sub-Unit (HSU) construction and, moreover, the exploitation of the overall system in automatic sign language recognition (ASLR). We employ probabilistic skin color detection followed by the proposed morphological algorithms and related shape filtering for fast and reliable segmentation of the hands and head. This is then fed to our hand tracking system, which emphasizes robust handling of occlusions based on forward-backward prediction and the incorporation of probabilistic constraints. The tracking is exploited by an affine-invariant modeling of hand shape-appearance images, offering a compact and descriptive representation of the hand configurations. We further propose that the handshape features extracted via the fitting of this model be utilized to construct basic HSUs in an unsupervised way. We first provide intuitive results on the HSU-to-sign mapping and further quantitatively evaluate the integrated system and the constructed HSUs in ASLR experiments at the sub-unit and sign level. These are conducted on continuous SL data from the BU400 corpus and investigate the effect of the involved parameters. The experiments indicate the effectiveness of the overall approach, especially for the modeling of handshapes when incorporated in the HSU-based framework, showing promising results.
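The segmentation front-end could look roughly like this OpenCV sketch: a crude fixed-threshold skin test in YCrCb followed by morphological opening and closing. The actual system uses a learned probabilistic skin model and more elaborate shape filtering, so the thresholds and kernel size here are placeholders.

```python
import cv2
import numpy as np

def skin_mask(frame_bgr):
    """Rough skin segmentation in YCrCb, then morphological opening
    and closing to remove speckle and fill small holes."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # generic illustrative skin bounds, not the paper's learned model
    lo = np.array((0, 133, 77), np.uint8)
    hi = np.array((255, 173, 127), np.uint8)
    mask = cv2.inRange(ycrcb, lo, hi)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill holes
    return mask
```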


International Conference on Acoustics, Speech, and Signal Processing | 2016

Multimodal human action recognition in assistive human-robot interaction

Isidoros Rodomagoulakis; Nikolaos Kardaris; Vassilis Pitsikalis; E. Mavroudi; Athanasios Katsamanis; Antigoni Tsiami; Petros Maragos

Within the context of assistive robotics, we develop an intelligent interface that provides multimodal sensory processing capabilities for human action recognition. Human action is considered in multimodal terms, with inputs such as audio from microphone arrays and visual inputs from high-definition and depth cameras. Building on state-of-the-art approaches from automatic speech recognition and visual action recognition, we multimodally recognize actions and commands. By fusing the unimodal information streams, we obtain the optimal multimodal hypothesis, which is further exploited by the active mobility assistance robot in the framework of the MOBOT EU research project. Evidence from recognition experiments shows that by integrating multiple sensors and modalities we increase multimodal recognition performance on a newly acquired, challenging dataset of elderly people interacting with the assistive robot.
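A minimal picture of the late-fusion step: each modality produces scored hypotheses, and the fused decision maximizes their weighted sum. The score dictionaries and weights below are hypothetical toy values.

```python
def fuse_hypotheses(stream_scores, weights):
    """Late fusion: choose the action hypothesis that maximizes the
    weighted sum of unimodal log-scores across streams."""
    classes = set().union(*(s.keys() for s in stream_scores))
    def fused(c):
        return sum(w * s.get(c, float("-inf"))
                   for s, w in zip(stream_scores, weights))
    return max(classes, key=fused)

audio = {"come_here": -3.1, "stop": -3.4}    # toy unimodal log-scores
video = {"come_here": -1.2, "stop": -2.8}
print(fuse_hypotheses([audio, video], weights=[0.4, 0.6]))  # "come_here"
```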


Journal of Machine Learning Research | 2013

Dynamic affine-invariant shape-appearance handshape features and classification in sign language videos

Anastasios Roussos; Stavros Theodorakis; Vassilis Pitsikalis; Petros Maragos

We propose the novel approach of a dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model fitting, assisting hand tracking and the extraction of handshape features. We construct SA images representing the hands' shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model's compactness. We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. The approach includes an affine signer-adaptation component at the visual level, without requiring a new signer-specific model to be trained from scratch; we rather employ a short development data set to adapt the models to a new signer. Experiments on the Boston University-400 continuous SL corpus demonstrate improvements in handshape classification when compared to other feature extraction approaches. Supplementary sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues, as well as signer adaptation of Aff-SAM to multiple signers, providing promising results.
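The eigenimage backbone of such a model can be sketched in a few lines: learn a mean image and principal components from training shape-appearance images, then use least-squares projection coefficients of a new image as features. This omits the affine fitting, priors, and signer adaptation; the data and dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((200, 64 * 64))          # stand-in flattened SA images
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:20]                             # top-20 eigenimages (orthonormal rows)

def handshape_features(image_flat):
    """Least-squares projection onto the eigenimage basis; the
    coefficients serve as the handshape feature vector."""
    return basis @ (image_flat - mean)
```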


International Conference on Image Processing | 2010

Affine-invariant modeling of shape-appearance images applied on sign language handshape classification

Anastasios Roussos; Stavros Theodorakis; Vassilis Pitsikalis; Petros Maragos

We propose a novel affine-invariant modeling of hand shape-appearance images, which offers a compact and descriptive representation of hand configurations. Our approach combines: 1) a hybrid representation of both the shape and appearance of the hand that models handshapes without any landmark points; 2) modeling of the shape-appearance images by a linear combination of variation images followed by an affine transformation, which accounts for modest pose variation; and 3) an optimization-based fitting process that results in the estimated variation-image coefficients, which are further employed as features. The proposed modeling is applied to handshapes from sign language video data after segmentation and tracking. It is evaluated in extensive handshape classification experiments, which investigate the effect of the involved parameters and, moreover, provide a variety of comparisons to baseline approaches found in the literature. The results, showing at least 10.5% absolute improvement, indicate the effectiveness of our approach for the handshape classification problem.
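The affine component of the model fitting can be illustrated with scipy: warp an image by p -> Ap + t before comparing it to the linear combination of variation images. The matrix, translation, and toy mask below are arbitrary; a real fitter would optimize them jointly with the coefficients.

```python
import numpy as np
from scipy.ndimage import affine_transform

def affine_warp(image, A, t):
    """Warp a 2-D image by the affine map p -> A p + t. scipy's
    affine_transform maps output coordinates to input coordinates,
    so we pass the inverse transform."""
    A_inv = np.linalg.inv(A)
    return affine_transform(image, A_inv, offset=-A_inv @ t, order=1)

img = np.zeros((64, 64)); img[20:44, 20:44] = 1.0   # toy handshape mask
A = np.array([[1.1, 0.1], [0.0, 0.9]])              # mild shear/scale
warped = affine_warp(img, A, t=np.array([2.0, -3.0]))
```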

Collaboration


Dive into Vassilis Pitsikalis's collaborations.

Top Co-Authors

Petros Maragos, National Technical University of Athens
Stavros Theodorakis, National and Kapodistrian University of Athens
Isidoros Rodomagoulakis, National Technical University of Athens
Costas S. Tzafestas, National Technical University of Athens
Athanassios Katsamanis, National Technical University of Athens
Nikolaos Kardaris, National Technical University of Athens
Anastasios Roussos, National Technical University of Athens
Stavroula-Evita Fotinea, National Technical University of Athens