
Publications


Featured research published by Stefanos D. Kollias.


IEEE Signal Processing Magazine | 2001

Emotion recognition in human-computer interaction

Roderick Cowie; Ellen Douglas-Cowie; Nicolas Tsapatsoulis; George N. Votsis; Stefanos D. Kollias; Winfried A. Fellenz; John Taylor

Two channels have been distinguished in human interaction: one transmits explicit messages, which may be about anything or nothing; the other transmits implicit messages about the speakers themselves. Both linguistics and technology have invested enormous efforts in understanding the first, explicit channel, but the second is not as well understood. Understanding the other party's emotions is one of the key tasks associated with the second, implicit channel. To tackle that task, signal processing and analysis techniques have to be developed, while, at the same time, consolidating psychological and linguistic analyses of emotion. This article examines basic issues in those areas. It is motivated by the PHYSTA project, in which we aim to develop a hybrid system capable of using information from faces and voices to recognize people's emotions.


Computer Vision and Pattern Recognition | 2009

Dense saliency-based spatiotemporal feature points for action recognition

Konstantinos Rapantzikos; Yannis S. Avrithis; Stefanos D. Kollias

Several spatiotemporal feature point detectors have been used in video analysis for action recognition. Feature points are detected using a number of measures, namely saliency, cornerness, periodicity, motion activity, etc. Each of these measures is usually intensity-based and provides a different trade-off between density and informativeness. In this paper, we use saliency for feature point detection in videos and incorporate color and motion apart from intensity. Our method uses a multi-scale volumetric representation of the video and involves spatiotemporal operations at the voxel level. Saliency is computed by a global minimization process constrained by pure volumetric constraints, each of them being related to an informative visual aspect, namely spatial proximity, scale, and feature similarity (intensity, color, motion). Points are selected as the extrema of the saliency response and prove to balance well between density and informativeness. We provide an intuitive view of the detected points and visual comparisons against state-of-the-art space-time detectors. Our detector outperforms them on the KTH dataset using nearest-neighbor classifiers and ranks among the top using different classification frameworks. Statistics and comparisons are also reported on the more difficult Hollywood human actions (HOHA) dataset, improving on currently published results.
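The point-selection step described above (extrema of the saliency response) can be sketched as a strict local-maximum search over a spatiotemporal saliency volume. The function below is an illustrative stand-in, not the authors' detector: it assumes a precomputed, single-scale saliency volume and ignores the multi-scale global minimization; the threshold value is an arbitrary choice for the example.

```python
import numpy as np

def saliency_extrema(volume, threshold=0.5):
    """Voxels that are strict local maxima of a spatiotemporal saliency
    volume and exceed `threshold`; each interior voxel is compared with
    its 26 spatiotemporal neighbours."""
    points = []
    T, H, W = volume.shape
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = volume[t, y, x]
                nbhd = volume[t - 1:t + 2, y - 1:y + 2, x - 1:x + 2]
                # strict maximum: v tops the neighbourhood, uniquely
                if v > threshold and v >= nbhd.max() and (nbhd == v).sum() == 1:
                    points.append((t, y, x))
    return points

# Toy saliency volume with one salient blob
vol = np.zeros((5, 5, 5))
vol[2, 2, 2] = 1.0
vol[2, 2, 3] = 0.6
print(saliency_extrema(vol))  # [(2, 2, 2)]
```

The near-salient voxel at (2, 2, 3) is suppressed because a stronger response sits in its neighbourhood, which is what gives such detectors their sparsity.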


EURASIP Journal on Advances in Signal Processing | 2002

Parameterized facial expression synthesis based on MPEG-4

Amaryllis Raouzaiou; Nicolas Tsapatsoulis; Kostas Karpouzis; Stefanos D. Kollias

In the framework of MPEG-4, one can include applications where virtual agents, utilizing both textual and multisensory data, including facial expressions and nonverbal speech, help systems become accustomed to the actual feelings of the user. Applications of this technology are expected in educational environments, virtual collaborative workplaces, communities, and interactive entertainment. Facial animation has gained much interest within the MPEG-4 framework, with implementation details being an open research area (Tekalp, 1999). In this paper, we describe a method for enriching human-computer interaction, focusing on analysis and synthesis of primary and intermediate facial expressions (Ekman and Friesen, 1978). To achieve this goal, we utilize facial animation parameters (FAPs) to model primary expressions and describe a rule-based technique for handling intermediate ones. A relation between FAPs and the activation parameter proposed in classical psychological studies is established, leading to parameterized facial expression analysis and synthesis notions compatible with the MPEG-4 standard.
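As a toy illustration of the parameterization idea, the sketch below scales a hypothetical archetypal-expression FAP profile by an activation level in [0, 1] to produce an intermediate expression. The FAP names and displacement values here are invented for the example; the paper's actual rule-based technique and the MPEG-4 FAP tables are considerably richer.

```python
# Hypothetical FAP profile for "joy": FAP name -> displacement at full activation
JOY_PROFILE = {"raise_l_cornerlip": 100, "raise_r_cornerlip": 100, "open_jaw": 40}

def intermediate_expression(profile, activation):
    """Scale an archetypal expression's FAP displacements by an
    activation level clamped to [0, 1], yielding an intermediate
    expression of the same family but lower intensity."""
    a = min(max(activation, 0.0), 1.0)
    return {fap: round(value * a) for fap, value in profile.items()}

print(intermediate_expression(JOY_PROFILE, 0.5))
# {'raise_l_cornerlip': 50, 'raise_r_cornerlip': 50, 'open_jaw': 20}
```

Linear scaling is the simplest possible mapping from activation to FAP values; the point is only that a single psychological parameter can drive a whole vector of animation parameters.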


IEEE Transactions on Circuits and Systems for Video Technology | 1998

Low bit-rate coding of image sequences using adaptive regions of interest

Nikolaos D. Doulamis; Anastasios D. Doulamis; Dimitrios Kalogeras; Stefanos D. Kollias

An adaptive algorithm for extracting foreground objects from background in videophone or videoconference applications is presented. The algorithm uses a neural network architecture that classifies the video frames in regions of interest (ROI) and non-ROI areas, also being able to automatically adapt its performance to scene changes. The algorithm is incorporated in motion-compensated discrete cosine transform (MC-DCT)-based coding schemes, allocating more bits to ROI than to non-ROI areas. Simulation results are presented, using the Claire and Trevor sequences, which show reconstructed images of better quality, as well as signal-to-noise ratio improvements of about 1.4 dB, compared to those achieved by standard MC-DCT encoders.
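The bit-allocation idea above (more bits to ROI than to non-ROI areas) can be illustrated by assigning a finer quantization parameter to macroblocks inside the ROI mask; smaller QP means more bits. The QP values below are arbitrary placeholders, not the paper's settings, and the actual neural ROI classifier is replaced here by a given binary mask.

```python
def assign_quantizers(roi_mask, qp_roi=8, qp_background=24):
    """Per-macroblock quantization parameters: finer quantization
    (smaller QP, hence more bits) inside the region of interest,
    coarser quantization elsewhere."""
    return [[qp_roi if mb else qp_background for mb in row] for row in roi_mask]

# 2x4 grid of macroblocks; 1 marks the foreground (e.g. the speaker's face)
mask = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
print(assign_quantizers(mask))
# [[24, 8, 8, 24], [24, 8, 8, 24]]
```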


Multimedia Tools and Applications | 2009

Estimation of behavioral user state based on eye gaze and head pose: application in an e-learning environment

Stylianos Asteriadis; Paraskevi K. Tzouveli; Kostas Karpouzis; Stefanos D. Kollias

Most e-learning environments that utilize user feedback or profiles collect such information through questionnaires, which often results in incomplete answers and sometimes in deliberately misleading input. In this work, we present a mechanism which compiles feedback related to the behavioral state of the user (e.g., level of interest) in the context of reading an electronic document. This is achieved with a non-intrusive scheme that uses a simple web camera to detect and track head, eye, and hand movements, and estimates the level of interest and engagement using a neuro-fuzzy network initialized with evidence from Theory of Mind and trained on expert-annotated data. The user does not need to interact with the proposed system and can act as if she were not monitored at all. The proposed scheme is tested in an e-learning environment, in order to adapt the presentation of the content to the user profile and current behavioral state. Experiments show that the proposed system detects reading- and attention-related user states very effectively, in a testbed where children's reading performance is tracked.


Computer Vision and Image Understanding | 1999

A Stochastic Framework for Optimal Key Frame Extraction from MPEG Video Databases

Yannis S. Avrithis; Anastasios D. Doulamis; Nikolaos D. Doulamis; Stefanos D. Kollias

A video content representation framework is proposed in this paper for extracting limited, but meaningful, information of video data, directly from the MPEG compressed domain. A hierarchical color and motion segmentation scheme is applied to each video shot, transforming the frame-based representation to a feature-based one. The scheme is based on a multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm. Then, all segment features are gathered together using a fuzzy multidimensional histogram to reduce the possibility of classifying similar segments to different classes. Extraction of several key frames is performed for each shot in a content-based rate-sampling framework. Two approaches are examined for key frame extraction. The first is based on examination of the temporal variation of the feature vector trajectory; the second is based on minimization of a cross-correlation criterion of the video frames. For efficient implementation of the latter approach, a logarithmic search (along with a stochastic version) and a genetic algorithm are proposed. Experimental results are presented which illustrate the performance of the proposed techniques, using synthetic and real life MPEG video sequences.
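The second key-frame criterion above (frames whose feature vectors are minimally correlated) can be approximated greedily, as sketched below. The paper itself searches with logarithmic/stochastic methods and a genetic algorithm; this greedy stand-in only illustrates the objective.

```python
import numpy as np

def extract_key_frames(features, k):
    """Greedy sketch of minimally-correlated key-frame selection:
    start with the two least-correlated frames, then repeatedly add the
    frame whose largest |correlation| with the chosen set is smallest."""
    X = np.asarray(features, dtype=float)
    C = np.abs(np.corrcoef(X))                       # |corr| between frame feature vectors
    n = len(X)
    i, j = divmod(int(np.argmin(C + 2 * np.eye(n))), n)  # least-correlated pair
    chosen = [i, j]
    while len(chosen) < k:
        rest = [f for f in range(n) if f not in chosen]
        chosen.append(min(rest, key=lambda f: C[f, chosen].max()))
    return sorted(chosen)

feats = [[1, 0, 0, 1],
         [1, 0, 0, 0.9],   # near-duplicate of frame 0
         [0, 1, 0, 0],
         [0, 0, 1, 0.2]]
print(extract_key_frames(feats, 3))  # [1, 2, 3]
```

Only one of the two near-duplicate frames survives, which is exactly the redundancy-removal behavior the correlation criterion is after.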


IEEE Transactions on Circuits and Systems for Video Technology | 2007

Semantic Image Segmentation and Object Labeling

Thanos Athanasiadis; Phivos Mylonas; Yannis S. Avrithis; Stefanos D. Kollias

In this paper, we present a framework for simultaneous image segmentation and object labeling leading to automatic image annotation. Focusing on semantic analysis of images, it contributes to knowledge-assisted multimedia analysis and bridging the gap between semantics and low-level visual features. The proposed framework operates at semantic level using possible semantic labels, formally represented as fuzzy sets, to make decisions on handling image regions instead of visual features used traditionally. In order to stress its independence of a specific image segmentation approach, we have modified two well-known region growing algorithms, i.e., watershed and recursive shortest spanning tree, and compared them to their traditional counterparts. Additionally, a visual context representation and analysis approach is presented, blending global knowledge in interpreting each object locally. Contextual information is based on a novel semantic processing methodology, employing fuzzy algebra and ontological taxonomic knowledge representation. In this process, utilization of contextual knowledge re-adjusts labeling results of semantic region growing, by means of fine-tuning membership degrees of detected concepts. The performance of the overall methodology is evaluated on a real-life still image dataset from two popular domains.
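The semantic merging decision can be caricatured as follows: two adjacent regions merge when they share a dominant concept whose fuzzy membership (intersection taken as min, in fuzzy-algebra fashion) exceeds a threshold in both. This is a deliberately simplified sketch of the idea, not the paper's algorithm; region labels, the threshold, and the union-find bookkeeping are all of this example's making.

```python
def semantic_merge(regions, adjacency, threshold=0.5):
    """Merge adjacent regions that agree on a concept: regions are
    dicts mapping concept -> fuzzy membership degree; merging is
    tracked with a union-find structure."""
    parent = list(range(len(regions)))

    def find(r):                      # union-find with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in adjacency:
        shared = set(regions[a]) & set(regions[b])
        best = max(shared, key=lambda c: min(regions[a][c], regions[b][c]),
                   default=None)
        if best is not None and min(regions[a][best], regions[b][best]) > threshold:
            parent[find(a)] = find(b)
    return [find(r) for r in range(len(regions))]

regions = [{"sky": 0.9, "sea": 0.2}, {"sky": 0.8}, {"sand": 0.7}]
print(semantic_merge(regions, [(0, 1), (1, 2)]))  # [1, 1, 2]
```

Regions 0 and 1 merge on the shared "sky" concept; region 2 has no concept in common with its neighbor and stays separate.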


IEEE Transactions on Circuits and Systems for Video Technology | 2000

Efficient summarization of stereoscopic video sequences

Nikolaos D. Doulamis; Anastasios D. Doulamis; Yannis S. Avrithis; Klimis S. Ntalianis; Stefanos D. Kollias

An efficient technique for summarization of stereoscopic video sequences is presented, which extracts a small but meaningful set of video frames using a content-based sampling algorithm. The proposed video-content representation provides the capability of browsing digital stereoscopic video sequences and performing more efficient content-based queries and indexing. Each stereoscopic video sequence is first partitioned into shots by applying a shot-cut detection algorithm so that frames (or stereo pairs) of similar visual characteristics are gathered together. Each shot is then analyzed using stereo-imaging techniques, and the disparity field, occluded areas, and depth map are estimated. A multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm is applied for color and depth segmentation, while fusion of color and depth segments is employed for reliable video object extraction. In particular, color segments are projected onto depth segments so that video objects on the same depth plane are retained, while at the same time accurate object boundaries are extracted. Feature vectors are then constructed using multidimensional fuzzy classification of segment features including size, location, color, and depth. Shot selection is accomplished by clustering similar shots based on the generalized Lloyd-Max algorithm, while for a given shot, key frames are extracted using an optimization method for locating frames of minimally correlated feature vectors. For efficient implementation of the latter method, a genetic algorithm is used. Experimental results are presented, which indicate the reliable performance of the proposed scheme on real-life stereoscopic video sequences.
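Shot selection by clustering similar shots with the generalized Lloyd-Max algorithm is, over shot feature vectors, essentially k-means; a minimal sketch under that reading (toy data, random initialization) follows.

```python
import numpy as np

def cluster_shots(features, k, iters=20, seed=0):
    """Generalised Lloyd (k-means) sketch: alternate assigning each
    shot feature vector to its nearest center and re-estimating the
    centers as cluster means."""
    X = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# Four shots, two visually distinct groups
labels, centers = cluster_shots([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], 2)
print(labels)
```

One representative shot per cluster (e.g., the one nearest its center) can then seed the summary.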


IEEE Transactions on Neural Networks | 2000

On-line retrainable neural networks: improving the performance of neural networks in image analysis problems

Anastasios D. Doulamis; Nikolaos D. Doulamis; Stefanos D. Kollias

A novel approach is presented in this paper for improving the performance of neural-network classifiers in image recognition, segmentation, or coding applications, based on a retraining procedure at the user level. The procedure includes: 1) a training algorithm for adapting the network weights to the current condition; 2) a maximum a posteriori (MAP) estimation procedure for optimally selecting the most representative data of the current environment as retraining data; and 3) a decision mechanism for determining when network retraining should be activated. The training algorithm takes into consideration both the former and the current network knowledge in order to achieve good generalization. The MAP estimation procedure models the network output as a Markov random field (MRF) and optimally selects the set of training inputs and corresponding desired outputs. Results are presented which illustrate the theoretical developments as well as the performance of the proposed approach in real-life experiments.
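The balance between former and current network knowledge can be illustrated on a linear stand-in: fit the new data while penalizing deviation from the old weights. This ridge-style formulation, with its closed-form solution, is an illustration of the principle only, not the paper's training algorithm.

```python
import numpy as np

def retrain(w_old, X_new, y_new, lam=1.0):
    """Minimise ||X w - y||^2 + lam * ||w - w_old||^2: the first term
    fits the current environment, the penalty term stands in for
    preserving previously acquired knowledge.  Larger lam keeps the
    network closer to its former weights."""
    d = len(w_old)
    A = X_new.T @ X_new + lam * np.eye(d)
    b = X_new.T @ y_new + lam * w_old
    return np.linalg.solve(A, b)

X = np.eye(2)                                  # new data says w should be [0, 1]
w = retrain(np.array([1.0, 0.0]), X, np.array([0.0, 1.0]), lam=1.0)
print(w)  # [0.5 0.5]: halfway between old knowledge and new evidence
```

With lam → 0 the solution follows the new data entirely; with lam → ∞ it refuses to move, which is the generalization trade-off the training algorithm navigates.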


IEEE Transactions on Neural Networks | 2003

An adaptable neural-network model for recursive nonlinear traffic prediction and modeling of MPEG video sources

Anastasios D. Doulamis; Nikolaos D. Doulamis; Stefanos D. Kollias

Multimedia services, and especially digital video, are expected to be the major traffic component transmitted over communication networks [such as internet protocol (IP)-based networks]. For this reason, traffic characterization and modeling of such services are required for an efficient network operation. The generated models can be used as traffic rate predictors, during the network operation phase (online traffic modeling), or as video generators for estimating the network resources, during the network design phase (offline traffic modeling). In this paper, an adaptable neural-network architecture is proposed covering both cases. The scheme is based on an efficient recursive weight estimation algorithm, which adapts the network response to current conditions. In particular, the algorithm updates the network weights so that 1) the network output, after the adaptation, is approximately equal to current bit rates (current traffic statistics) and 2) a minimal degradation over the obtained network knowledge is provided. It can be shown that the proposed adaptable neural-network architecture simulates a recursive nonlinear autoregressive model (RNAR), similar to the notation used in the linear case. The algorithm presents low computational complexity and high efficiency in tracking traffic rates in contrast to conventional retraining schemes. Furthermore, for the problem of offline traffic modeling, a novel correlation mechanism is proposed for capturing the burstiness of the actual MPEG video traffic. The performance of the model is evaluated using several real-life MPEG coded video sources of long duration and compared with other linear/nonlinear techniques used for both cases. The results indicate that the proposed adaptable neural-network architecture presents better performance than other examined techniques.
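The online-adaptation idea (update the weights so the output tracks current bit rates while degrading prior knowledge minimally) can be illustrated with a normalized-LMS autoregressive predictor. This linear sketch stands in for the paper's recursive nonlinear model; the order and step size are arbitrary example values.

```python
import numpy as np

def online_ar_predict(rates, order=3, mu=0.5):
    """Online AR traffic predictor: predict each rate from the previous
    `order` rates, then nudge the weights with a normalised-LMS step so
    the model keeps tracking current traffic statistics."""
    w = np.zeros(order)
    preds = []
    for t in range(order, len(rates)):
        x = np.asarray(rates[t - order:t], dtype=float)
        y_hat = w @ x                       # prediction from recent history
        preds.append(y_hat)
        e = rates[t] - y_hat                # prediction error on the new sample
        w += mu * e * x / (x @ x + 1e-9)    # normalised-LMS weight update
    return preds, w

# On a steady bitrate the predictor locks on within a few samples
preds, w = online_ar_predict([1.0] * 20)
print(round(preds[-1], 4))  # approx 1.0
```

The same loop keeps adapting when the statistics drift, which is what distinguishes such recursive schemes from batch retraining.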

Collaboration


Dive into Stefanos D. Kollias's collaborations.

Top Co-Authors

Kostas Karpouzis | National Technical University of Athens
Anastasios D. Doulamis | National Technical University of Athens
Nikolaos D. Doulamis | National Technical University of Athens
Yannis S. Avrithis | National Technical University of Athens
Nicolas Tsapatsoulis | Cyprus University of Technology
Klimis S. Ntalianis | National Technical University of Athens
Amaryllis Raouzaiou | National Technical University of Athens
Manolis Wallace | University of Peloponnese
Giorgos B. Stamou | National Technical University of Athens
Paraskevi K. Tzouveli | National Technical University of Athens