Petros Koutras
National Technical University of Athens
Publications
Featured research published by Petros Koutras.
international conference on image processing | 2014
Kevis Maninis; Petros Koutras; Petros Maragos
This paper proposes a new visual framework for action recognition in videos, consisting of an energy detector coupled with a carefully designed multiband energy-based filterbank. Video energy is tracked using perceptually inspired 3D Gabor filters combined with ideas from Dominant Energy Analysis. Within this framework we explore alternatives such as nonlinear energy operators, where actions are implicitly treated as manifestations of spatio-temporal oscillations in the dynamic visual stream. Texture and motion decomposition of actions through multiband filtering forms the basis of our approach. This new energy-based saliency measure of action videos leads to the extraction of local spatio-temporal interest points that give promising results for action recognition. These interest points are further processed to form a robust representation of an action in a video. The theoretical formulation is supported by evaluation on two popular action databases, on which our method outperforms the state of the art.
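The energy tracking described above can be illustrated with a minimal 1D sketch (the paper works with 3D spatio-temporal filters; this reduces the idea to one dimension, with made-up frequency and bandwidth parameters): a quadrature pair of Gabor filters whose squared responses sum to a phase-invariant energy measure.

```python
import numpy as np

def gabor_energy_1d(signal, freq, sigma, length=21):
    """Quadrature-pair Gabor filtering: energy = even^2 + odd^2 responses."""
    t = np.arange(length) - length // 2
    envelope = np.exp(-t**2 / (2 * sigma**2))
    even = envelope * np.cos(2 * np.pi * freq * t)   # cosine-phase filter
    odd = envelope * np.sin(2 * np.pi * freq * t)    # sine-phase filter
    r_even = np.convolve(signal, even, mode="same")
    r_odd = np.convolve(signal, odd, mode="same")
    return r_even**2 + r_odd**2                      # phase-invariant energy

# A sinusoid at the filter's tuned frequency produces high energy;
# a mistuned filter responds weakly.
x = np.sin(2 * np.pi * 0.1 * np.arange(200))
energy = gabor_energy_1d(x, freq=0.1, sigma=5.0)
```

In the multiband setting of the paper, a bank of such filters at different frequencies and orientations is applied and the dominant energy across bands is kept.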
international conference on image processing | 2015
Petros Koutras; Athanasia Zlatintsi; Elias Iosif; Athanasios Katsamanis; Petros Maragos; Alexandros Potamianos
In this paper, we present a new and improved synergistic approach to audio-visual salient event detection and movie summarization based on visual, audio, and text modalities. Spatio-temporal visual saliency is estimated through a perceptually inspired frontend based on 3D (space, time) Gabor filters, and frame-wise features are extracted from the saliency volumes. For auditory salient event detection we extract features based on the Teager-Kaiser Energy Operator, while text analysis incorporates part-of-speech tagging and affective modeling of single words in the movie subtitles. For the evaluation of the proposed system we employ an elementary, non-parametric classification technique, k-nearest neighbors (KNN). Detection results are reported on the MovSum database, using objective evaluations against ground truth denoting the perceptually salient events, as well as human evaluations of the movie summaries. Our evaluation verifies the appropriateness of the proposed methods compared to our baseline system. Finally, our newly proposed summarization algorithm produces summaries that consist of salient and meaningful events, also improving comprehension of the semantics.
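The Teager-Kaiser Energy Operator mentioned for the audio stream has a simple discrete form, Psi[x](n) = x(n)^2 - x(n-1) x(n+1), which for a sinusoid A cos(Omega n) yields the constant A^2 sin^2(Omega), tracking amplitude and frequency jointly. A minimal sketch (the signal here is synthetic, not from the paper's audio pipeline):

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser Energy Operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For A*cos(Omega*n) the operator is exactly A^2 * sin(Omega)^2 at every sample.
n = np.arange(1000)
x = 2.0 * np.cos(0.2 * n)
psi = teager_kaiser(x)
```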
international conference on image processing | 2015
Petros Koutras; Petros Maragos
In this paper we demonstrate efficient methods for continuous estimation of eye gaze angles, with application to sign language videos. The difficulty of the task lies in the fact that these videos contain images with low face resolution, since they are recorded from a distance. First, we model the face and eye region by training and fitting Global and Local Active Appearance Models (LAAM). Next, we propose a system for eye gaze estimation based on a machine learning approach. In the first stage of our method, we classify gaze into discrete classes using GMMs based either on the parameters of the LAAM or on HOG descriptors for the eye region. We also propose a method for computing gaze direction angles from GMM log-likelihoods. We qualitatively and quantitatively evaluate our methods on two sign language databases and compare with a state-of-the-art geometric model of the eye based on LAAM landmarks, which provides an estimate in direction angles. Finally, we further evaluate our framework using ground truth data from an eye tracking system. Our proposed methods, and especially the GMMs using LAAM parameters, demonstrate high accuracy and robustness even in challenging tasks.
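The first-stage classification can be sketched in miniature: fit one Gaussian model per gaze class and assign a query to the class with the highest log-likelihood. This is a deliberately simplified single-component, diagonal-covariance stand-in for the paper's GMMs, with made-up 2D features and class labels:

```python
import numpy as np

def fit_gaussian(X):
    """ML estimate of mean and diagonal covariance for one gaze class."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6          # regularize to avoid zero variance
    return mu, var

def log_likelihood(X, mu, var):
    """Per-sample diagonal-Gaussian log-likelihood."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)

rng = np.random.default_rng(0)
left = rng.normal([-1.0, 0.0], 0.1, size=(200, 2))   # features for a "left" gaze class
right = rng.normal([1.0, 0.0], 0.1, size=(200, 2))   # features for a "right" gaze class
models = [fit_gaussian(left), fit_gaussian(right)]

query = np.array([[0.9, 0.05]])
scores = np.array([log_likelihood(query, mu, var)[0] for mu, var in models])
pred = int(np.argmax(scores))   # index of the most likely gaze class
```

The paper goes further and maps the per-class log-likelihoods to continuous gaze direction angles rather than stopping at a discrete decision.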
Signal Processing-image Communication | 2015
Petros Koutras; Petros Maragos
The purpose of this paper is to demonstrate a perceptually based spatio-temporal computational framework for visual saliency estimation. We have developed a new spatio-temporal visual frontend based on biologically inspired 3D Gabor filters, which is applied on both the luminance and the color streams and produces spatio-temporal energy maps. These volumes are fused into a single saliency map and can detect spatio-temporal phenomena that static saliency models cannot find. We also provide a new movie database with eye-tracking annotation. We have evaluated our spatio-temporal saliency model on the widely used CRCNS-ORIG database as well as on our new database, using different fusion schemes and feature sets. The proposed spatio-temporal computational framework incorporates many ideas based on psychological evidence and yields significant improvements in spatio-temporal saliency estimation.
Highlights:
- Spatio-temporal computational framework based on psychological evidence.
- Spatio-temporal and static energies computed with the same multiscale 3D Gabor filterbank.
- Motion information at different scales for both luminance and color stream modalities.
- A new movie database with eye-tracking annotation.
- Significant improvements in spatio-temporal saliency estimation.
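The fusion step can be sketched as follows; this is a generic illustration of map fusion (normalize each energy map, then combine by mean or max), with the specific normalization and fusion schemes the paper actually compares left as assumptions, and random arrays standing in for real energy maps:

```python
import numpy as np

def normalize(m):
    """Min-max normalize an energy map to [0, 1]."""
    m = m - m.min()
    return m / (m.max() + 1e-12)

def fuse_saliency(luminance_energy, color_energy, scheme="mean"):
    """Fuse normalized spatio-temporal energy maps into one saliency map."""
    maps = [normalize(luminance_energy), normalize(color_energy)]
    if scheme == "max":
        return np.maximum.reduce(maps)
    return np.mean(maps, axis=0)

rng = np.random.default_rng(1)
lum = rng.random((48, 64))   # stand-in luminance-stream energy map
col = rng.random((48, 64))   # stand-in color-stream energy map
sal = fuse_saliency(lum, col, scheme="max")
```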
international conference on acoustics, speech, and signal processing | 2015
Petros Maragos; Petros Koutras
This paper introduces a theory for max-product systems by analyzing them as discrete-time nonlinear dynamical systems that obey a superposition of weighted-maximum type and evolve on nonlinear spaces which we call complete weighted lattices. Special cases of such systems have found applications in speech recognition as weighted finite-state transducers and in belief propagation on graphical models. Our theoretical approach establishes their representation in state and input-output spaces using monotone lattice operators, derives their state and output responses analytically using nonlinear convolutions, studies their stability, and provides optimal solutions for max-product matrix equations. Further, we apply these systems to extend the Viterbi algorithm in HMMs by adding control inputs and to model cognitive processes such as detecting audio and visual salient events in multimodal video streams, which shows good performance compared to human attention.
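The connection between max-product systems and the Viterbi algorithm can be made concrete with a small sketch: over the max-product semiring, the matrix "product" replaces sum with max, and one Viterbi step is exactly a max-product vector-matrix multiply followed by an elementwise emission weighting. The transition and emission numbers below are illustrative, not from the paper:

```python
import numpy as np

def max_product(A, B):
    """Matrix 'product' over the max-product semiring:
    (A (x) B)[i, j] = max_k A[i, k] * B[k, j]."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

# One Viterbi recursion step:
# delta_{t+1}(j) = max_i delta_t(i) * a_{ij} * b_j(o_{t+1})
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])           # HMM transition probabilities a_{ij}
delta = np.array([[0.5, 0.1]])       # current Viterbi scores, as a row vector
b = np.array([0.9, 0.2])             # emission likelihoods for the next observation
delta_next = max_product(delta, A) * b
```

Writing the recursion this way is what lets the paper treat Viterbi as a special case of a max-product state-space system and then add control inputs to it.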
quality of multimedia experience | 2015
Athanasia Zlatintsi; Petros Koutras; Niki Efthymiou; Petros Maragos; Alexandros Potamianos; Katerina Pastra
In this paper we present a movie summarization system and investigate what makes a high-quality movie summary in terms of user experience. We propose state-of-the-art audio, visual, and text techniques for the detection of perceptually salient events in movies. The evaluation of such computational models is usually based on the similarity between the system-detected events and ground-truth data. For this reason, we have developed the MovSum movie database, which includes sensory and semantic saliency annotation as well as cross-media relations, for objective evaluations. The automatically produced movie summaries were qualitatively assessed in an extensive human evaluation in terms of informativeness and enjoyability, achieving high ratings of up to 80% and 90%, respectively, which verifies the appropriateness of the proposed methods.
human robot interaction | 2017
Athanasia Zlatintsi; Isidoros Rodomagoulakis; Vassilis Pitsikalis; Petros Koutras; Nikolaos Kardaris; Xanthi S. Papageorgiou; Costas S. Tzafestas; Petros Maragos
We explore new aspects of assistive living via smart social human-robot interaction (HRI), involving automatic recognition of multimodal gestures and speech in a natural interface that provides social features in HRI. We discuss a whole framework of resources, including datasets and tools, briefly demonstrated in two real-life use cases for elderly subjects: a multimodal interface for an assistive robotic rollator and an assistive bathing robot. We discuss these domain-specific tasks and the open-source tools that can be used to build such HRI systems, as well as indicative results. Sharing such resources can open new perspectives in assistive HRI.
international conference on image processing | 2016
Georgia Panagiotaropoulou; Petros Koutras; Athanasios Katsamanis; Petros Maragos; Athanasia Zlatintsi; Athanassios Protopapas; Efstratios Karavasilis; Nikolaos Smyrnis
In this study, we use brain activation data to investigate the perceptual plausibility of computational models of visual and auditory saliency in video processing. These models have already been successfully employed in a number of applications. In addition, we experiment with parameters, modifications, and suitable fusion schemes. As part of this work, fMRI data from complex video stimuli were collected, on which we base our analysis and results. The core part of the analysis involves well-established methods for the manipulation of fMRI data and the examination of variability across brain responses of different individuals. Our results confirm the perceptual plausibility of these saliency models.
Eurasip Journal on Image and Video Processing | 2017
Athanasia Zlatintsi; Petros Koutras; Georgios Evangelopoulos; Nikolaos Malandrakis; Niki Efthymiou; Katerina Pastra; Alexandros Potamianos; Petros Maragos
multidimensional signal processing workshop | 2018
Petros Koutras; Athanasia Zlatintsi; Petros Maragos
Collaboration
Panagiotis Paraskevas Filntisis
National Technical University of Athens