Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yannis S. Avrithis is active.

Publication


Featured research published by Yannis S. Avrithis.


European Semantic Web Conference | 2005

Semantic annotation of images and videos for multimedia analysis

Stephan Bloehdorn; Kosmas Petridis; Carsten Saathoff; Nikos Simou; Vassilis Tzouvaras; Yannis S. Avrithis; Siegfried Handschuh; Yiannis Kompatsiaris; Steffen Staab; Michael G. Strintzis

Annotation of multimedia documents has typically been pursued in two different directions: previous approaches have focused either on low-level descriptors, such as dominant color, or on the content dimension and corresponding annotations, such as person or vehicle. In this paper, we present a software environment that bridges the two directions. M-OntoMat-Annotizer allows low-level MPEG-7 visual descriptions to be linked to conventional Semantic Web ontologies and annotations. We use M-OntoMat-Annotizer to construct ontologies that include prototypical instances of high-level domain concepts together with a formal specification of the corresponding visual descriptors. Thus, we formalize the interrelationship of high- and low-level multimedia concept descriptions, allowing for new kinds of multimedia content analysis and reasoning.
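
As a rough illustration of the idea, the sketch below (not the actual M-OntoMat-Annotizer code; all class and field names are assumptions) shows how a prototypical instance of a domain concept could carry a formal link to a low-level, MPEG-7-style visual descriptor.

# Minimal sketch (not the M-OntoMat-Annotizer implementation): linking a
# prototypical instance of a domain concept to an MPEG-7-style visual
# descriptor. All class and field names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class VisualDescriptor:
    """A low-level MPEG-7-style descriptor, e.g. dominant color."""
    name: str                      # e.g. "DominantColor"
    values: list                   # raw descriptor values

@dataclass
class PrototypeInstance:
    """A prototypical instance of a high-level domain concept."""
    concept: str                   # ontology concept URI or label
    descriptors: list = field(default_factory=list)

    def attach(self, descriptor: VisualDescriptor) -> None:
        """Formally associate a low-level descriptor with the concept."""
        self.descriptors.append(descriptor)

# Example: a prototype of the (hypothetical) concept "Vehicle" described by its dominant color.
vehicle = PrototypeInstance(concept="http://example.org/ontology#Vehicle")
vehicle.attach(VisualDescriptor(name="DominantColor", values=[128, 64, 32]))
print(vehicle)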


Computer Vision and Pattern Recognition | 2009

Dense saliency-based spatiotemporal feature points for action recognition

Konstantinos Rapantzikos; Yannis S. Avrithis; Stefanos D. Kollias

Several spatiotemporal feature point detectors have been used in video analysis for action recognition. Feature points are detected using a number of measures, namely saliency, cornerness, periodicity, motion activity, etc. Each of these measures is usually intensity-based and provides a different trade-off between density and informativeness. In this paper, we use saliency for feature point detection in videos and incorporate color and motion in addition to intensity. Our method uses a multi-scale volumetric representation of the video and involves spatiotemporal operations at the voxel level. Saliency is computed by a global minimization process constrained by purely volumetric constraints, each related to an informative visual aspect, namely spatial proximity, scale, and feature similarity (intensity, color, motion). Points are selected as the extrema of the saliency response and prove to balance well between density and informativeness. We provide an intuitive view of the detected points and visual comparisons against state-of-the-art space-time detectors. Our detector outperforms them on the KTH dataset using nearest-neighbor classifiers and ranks among the top using different classification frameworks. Statistics and comparisons are also reported on the more difficult Hollywood human actions (HOHA) dataset, improving performance compared to previously published results.
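
To make the point-selection step concrete, here is a minimal Python sketch that, assuming a spatiotemporal saliency volume has already been computed, reads feature points off as local maxima of that volume; it is not the authors' constrained minimization, and the neighborhood size and threshold are assumptions.

# Minimal sketch, assuming a precomputed spatiotemporal saliency volume
# (frames x height x width). It only shows how feature points could be
# selected as local maxima of such a volume.
import numpy as np
from scipy.ndimage import maximum_filter

def salient_points(saliency: np.ndarray, size: int = 5, thresh: float = 0.5):
    """Return (t, y, x) coordinates of local saliency maxima above a threshold."""
    local_max = maximum_filter(saliency, size=size) == saliency
    strong = saliency > thresh * saliency.max()
    return np.argwhere(local_max & strong)

# Example on a random volume standing in for a real video's saliency response.
volume = np.random.rand(20, 64, 64)
points = salient_points(volume)
print(points.shape)  # (num_points, 3) -> (frame, row, column)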


International Conference on Computer Vision | 2013

To Aggregate or Not to Aggregate: Selective Match Kernels for Image Search

Giorgos Tolias; Yannis S. Avrithis; Hervé Jégou

This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Bridging these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing large-scale image search that is both precise and scalable, as shown by our experiments on several benchmarks.
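
A minimal sketch of an aggregated, selective match kernel of this kind is given below; the selectivity threshold and power-law exponent are illustrative assumptions rather than the paper's exact parameters.

# Minimal sketch of an aggregated, selective match kernel; parameter names
# (tau, alpha) follow common usage and are assumptions, not the paper's
# exact formulation.
import numpy as np

def aggregate(residuals: np.ndarray) -> np.ndarray:
    """Sum residuals assigned to one visual word and L2-normalize the result."""
    v = residuals.sum(axis=0)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def selective_similarity(u: float, tau: float = 0.0, alpha: float = 3.0) -> float:
    """Thresholded, power-law weighted similarity between aggregated vectors."""
    return np.sign(u) * abs(u) ** alpha if u > tau else 0.0

def match_kernel(desc_a: dict, desc_b: dict) -> float:
    """Sum selective similarities over visual words shared by the two images.
    desc_a, desc_b map visual-word id -> array of residual descriptors."""
    score = 0.0
    for word in desc_a.keys() & desc_b.keys():
        u = float(aggregate(desc_a[word]) @ aggregate(desc_b[word]))
        score += selective_similarity(u)
    return score

# Toy example with 2-D residuals for two visual words.
a = {1: np.array([[1.0, 0.0]]), 2: np.array([[0.5, 0.5], [0.4, 0.6]])}
b = {1: np.array([[0.9, 0.1]]), 3: np.array([[0.0, 1.0]])}
print(match_kernel(a, b))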


IEEE Transactions on Circuits and Systems for Video Technology | 2007

Personalized Content Retrieval in Context Using Ontological Knowledge

David Vallet; Pablo Castells; Miriam Fernández; Phivos Mylonas; Yannis S. Avrithis

Personalized content retrieval aims at improving the retrieval process by taking into account the particular interests of individual users. However, not all user preferences are relevant in all situations. It is well known that human preferences are complex, multiple, heterogeneous, changing, even contradictory, and should be understood in context with the user goals and tasks at hand. In this paper, we propose a method to build a dynamic representation of the semantic context of ongoing retrieval tasks, which is used to activate different subsets of user interests at runtime, so that out-of-context preferences are discarded. Our approach is based on an ontology-driven representation of the domain of discourse, providing enriched descriptions of the semantics involved in retrieval actions and preferences, and enabling the definition of effective means to relate preferences and context.
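
The following toy sketch illustrates the general idea of contextual preference activation, assuming a small hand-made concept graph and made-up concept names; it is a simplification, not the paper's ontology-driven model.

# Minimal sketch: user preferences are weighted concepts, the runtime context
# is a set of concepts, and only preferences connected to the context in a
# small concept graph stay active. The graph and names are assumptions.
concept_graph = {
    "jaguar_car": {"car", "vehicle"},
    "jaguar_animal": {"feline", "wildlife"},
    "car": {"vehicle"},
}

user_preferences = {"jaguar_car": 0.9, "jaguar_animal": 0.7, "football": 0.4}

def related(concept: str, context: set) -> bool:
    """A preference is in context if it is a context concept or links to one."""
    return concept in context or bool(concept_graph.get(concept, set()) & context)

def contextual_preferences(preferences: dict, context: set) -> dict:
    """Keep only the preferences that are related to the current context."""
    return {c: w for c, w in preferences.items() if related(c, context)}

# While the user searches about vehicles, the wildlife preference is discarded.
print(contextual_preferences(user_preferences, context={"vehicle"}))
# -> {'jaguar_car': 0.9}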


Computer Vision and Pattern Recognition | 2014

Locally Optimized Product Quantization for Approximate Nearest Neighbor Search

Yannis Kalantidis; Yannis S. Avrithis

We present a simple vector quantizer that combines low distortion with fast search and apply it to approximate nearest neighbor (ANN) search in high dimensional spaces. Leveraging the very same data structure that is used to provide non-exhaustive search, i.e., inverted lists or a multi-index, the idea is to locally optimize an individual product quantizer (PQ) per cell and use it to encode residuals. Local optimization is over rotation and space decomposition; interestingly, we apply a parametric solution that assumes a normal distribution and is extremely fast to train. With a reasonable space and time overhead that is constant in the data size, we set a new state of the art on several public datasets, including a billion-scale one.
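
The sketch below conveys the core idea under simplifying assumptions: for one coarse cell, the residuals are rotated (here with a plain PCA-style rotation standing in for the paper's parametric solution) and an independent product quantizer is trained per subspace; codebook sizes and dimensions are illustrative.

# Minimal sketch of the idea behind locally optimized product quantization:
# per coarse cell, rotate the residuals and train an independent product
# quantizer on the rotated residuals.
import numpy as np
from sklearn.cluster import KMeans

def train_local_pq(residuals, num_subspaces=2, codebook_size=16, seed=0):
    """Return (rotation, list of per-subspace KMeans codebooks) for one cell."""
    # Rotation from the eigenvectors of the residual covariance (PCA-style).
    cov = np.cov(residuals, rowvar=False)
    _, rotation = np.linalg.eigh(cov)
    rotated = residuals @ rotation
    dim = rotated.shape[1] // num_subspaces
    codebooks = [
        KMeans(n_clusters=codebook_size, n_init=4, random_state=seed).fit(
            rotated[:, i * dim:(i + 1) * dim])
        for i in range(num_subspaces)
    ]
    return rotation, codebooks

def encode(residual, rotation, codebooks):
    """Encode one residual vector as a tuple of per-subspace codeword indices."""
    rotated = residual @ rotation
    dim = rotated.shape[0] // len(codebooks)
    return tuple(int(cb.predict(rotated[i * dim:(i + 1) * dim][None])[0])
                 for i, cb in enumerate(codebooks))

# Toy example: residuals of points falling in one coarse cell.
rng = np.random.default_rng(0)
cell_residuals = rng.normal(size=(500, 8))
rot, cbs = train_local_pq(cell_residuals)
print(encode(cell_residuals[0], rot, cbs))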


IEEE Transactions on Multimedia | 2013

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Georgios Evangelopoulos; Athanasia Zlatintsi; Alexandros Potamianos; Petros Maragos; Konstantinos Rapantzikos; Georgios Skoumas; Yannis S. Avrithis

Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations, with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
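
As an illustration of one possible fusion scheme, the sketch below combines per-frame saliency curves by weighted linear fusion and keeps the most salient frames as a summary; the paper evaluates several such schemes, and the weights and normalization here are assumptions.

# Minimal sketch of one fusion scheme (weighted linear fusion); the weights
# and min-max normalization below are assumptions for illustration.
import numpy as np

def fuse(aural, visual, textual, weights=(1/3, 1/3, 1/3)):
    """Combine per-frame saliency curves into one multimodal saliency curve."""
    streams = [np.asarray(s, dtype=float) for s in (aural, visual, textual)]
    # Normalize each modality to [0, 1] so no single stream dominates.
    streams = [(s - s.min()) / (s.max() - s.min() + 1e-9) for s in streams]
    return sum(w * s for w, s in zip(weights, streams))

def summary_frames(saliency, ratio=0.2):
    """Indices of the most salient frames, covering `ratio` of the video."""
    k = max(1, int(ratio * len(saliency)))
    return np.sort(np.argsort(saliency)[-k:])

# Toy curves standing in for real aural / visual / textual saliency streams.
t = np.linspace(0, 10, 200)
curve = fuse(np.abs(np.sin(t)), np.abs(np.cos(t)), np.random.rand(200))
print(summary_frames(curve)[:10])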


ACM Multimedia | 2010

Retrieving landmark and non-landmark images from community photo collections

Yannis S. Avrithis; Yannis Kalantidis; Giorgos Tolias; Evaggelos Spyrou

State-of-the-art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g., images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of all thousands of images depicting the Parthenon in just a few dozen scene maps and still be able to retrieve any single, isolated, non-landmark image like a house or graffiti on a wall. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. Our indexing, retrieval and spatial matching scheme then operates directly on scene maps. We evaluate the precision of the proposed method on a challenging one-million urban image dataset.
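
The vector-quantization view can be illustrated with a toy greedy covering, sketched below: every image is assigned to a reference image within a fixed distortion bound, otherwise it starts a new cluster. The plain Euclidean distance over global descriptors and the bound value are stand-in assumptions for the paper's visual matching.

# Minimal sketch of clustering with a guaranteed distortion bound: every image
# is assigned to a reference image at distance at most `bound`, otherwise it
# seeds a new cluster.
import numpy as np

def cluster_with_bound(features: np.ndarray, bound: float):
    """Greedy covering: returns, for each image, the index of its reference."""
    references = []          # indices of reference images (one per cluster)
    assignment = np.empty(len(features), dtype=int)
    for i, f in enumerate(features):
        dists = [np.linalg.norm(f - features[r]) for r in references]
        if dists and min(dists) <= bound:
            assignment[i] = references[int(np.argmin(dists))]
        else:
            references.append(i)
            assignment[i] = i
    return assignment, references

# Toy example: 100 random global descriptors.
feats = np.random.default_rng(1).normal(size=(100, 32))
labels, refs = cluster_with_bound(feats, bound=7.5)
print(len(refs), "clusters for", len(feats), "images")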


Computer Vision and Image Understanding | 1999

A Stochastic Framework for Optimal Key Frame Extraction from MPEG Video Databases

Yannis S. Avrithis; Anastasios D. Doulamis; Nikolaos D. Doulamis; Stefanos D. Kollias

A video content representation framework is proposed in this paper for extracting limited but meaningful information from video data, directly in the MPEG compressed domain. A hierarchical color and motion segmentation scheme is applied to each video shot, transforming the frame-based representation to a feature-based one. The scheme is based on a multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm. All segment features are then gathered together using a fuzzy multidimensional histogram to reduce the possibility of classifying similar segments into different classes. Several key frames are extracted for each shot in a content-based rate-sampling framework. Two approaches are examined for key frame extraction: the first is based on examination of the temporal variation of the feature vector trajectory; the second is based on minimization of a cross-correlation criterion of the video frames. For efficient implementation of the latter approach, a logarithmic search (along with a stochastic version) and a genetic algorithm are proposed. Experimental results are presented that illustrate the performance of the proposed techniques, using synthetic and real-life MPEG video sequences.
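
A minimal sketch of the second approach follows, using a simple greedy search in place of the paper's logarithmic and genetic search strategies; the feature vectors are assumed to be given by the earlier segmentation and fuzzy-histogram stages.

# Minimal sketch of key-frame selection by minimizing feature correlation.
# A simple greedy search stands in for the paper's logarithmic and genetic
# search strategies.
import numpy as np

def avg_abs_correlation(features: np.ndarray, selected: list) -> float:
    """Mean absolute pairwise correlation among the selected frames."""
    corr = np.corrcoef(features[selected])
    upper = corr[np.triu_indices(len(selected), k=1)]
    return float(np.abs(upper).mean())

def greedy_key_frames(features: np.ndarray, k: int) -> list:
    """Pick k frames one at a time, each time adding the frame that keeps
    the selected set's average correlation lowest."""
    selected = [0]
    while len(selected) < k:
        candidates = [i for i in range(len(features)) if i not in selected]
        best = min(candidates,
                   key=lambda i: avg_abs_correlation(features, selected + [i]))
        selected.append(best)
    return sorted(selected)

# Toy shot: 50 frames, each described by an 8-bin fuzzy histogram.
feats = np.random.default_rng(2).random((50, 8))
print(greedy_key_frames(feats, k=4))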


IEEE Transactions on Circuits and Systems for Video Technology | 2007

Semantic Image Segmentation and Object Labeling

Thanos Athanasiadis; Phivos Mylonas; Yannis S. Avrithis; Stefanos D. Kollias

In this paper, we present a framework for simultaneous image segmentation and object labeling leading to automatic image annotation. Focusing on semantic analysis of images, it contributes to knowledge-assisted multimedia analysis and bridging the gap between semantics and low-level visual features. The proposed framework operates at the semantic level, using possible semantic labels, formally represented as fuzzy sets, to make decisions on handling image regions instead of the visual features used traditionally. In order to stress its independence of a specific image segmentation approach, we have modified two well-known region-growing algorithms, i.e., watershed and recursive shortest spanning tree, and compared them to their traditional counterparts. Additionally, a visual context representation and analysis approach is presented, blending global knowledge in interpreting each object locally. Contextual information is based on a novel semantic processing methodology, employing fuzzy algebra and ontological taxonomic knowledge representation. In this process, utilization of contextual knowledge re-adjusts the labeling results of semantic region growing by fine-tuning the membership degrees of detected concepts. The performance of the overall methodology is evaluated on a real-life still image dataset from two popular domains.
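
The merging criterion of semantic region growing can be sketched roughly as follows, with illustrative labels and an assumed threshold rather than the paper's exact fuzzy-algebraic rule.

# Minimal sketch of a semantic merging criterion: two adjacent regions are
# merged when they agree on some concept with high fuzzy membership (the
# min over the two regions, max over concepts). Threshold and labels are
# illustrative assumptions.
def semantic_similarity(labels_a: dict, labels_b: dict) -> float:
    """Fuzzy-set style agreement between two regions' candidate labels."""
    common = labels_a.keys() & labels_b.keys()
    return max((min(labels_a[c], labels_b[c]) for c in common), default=0.0)

def merge_labels(labels_a: dict, labels_b: dict) -> dict:
    """Label set of a merged region: per-concept max membership (fuzzy union)."""
    return {c: max(labels_a.get(c, 0.0), labels_b.get(c, 0.0))
            for c in labels_a.keys() | labels_b.keys()}

region_1 = {"sky": 0.8, "sea": 0.3}
region_2 = {"sky": 0.7, "building": 0.2}
if semantic_similarity(region_1, region_2) >= 0.5:   # assumed merge threshold
    print(merge_labels(region_1, region_2))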


IEEE Transactions on Circuits and Systems for Video Technology | 2000

Efficient summarization of stereoscopic video sequences

Nikolaos D. Doulamis; Anastasios D. Doulamis; Yannis S. Avrithis; Klimis S. Ntalianis; Stefanos D. Kollias

An efficient technique for summarization of stereoscopic video sequences is presented, which extracts a small but meaningful set of video frames using a content-based sampling algorithm. The proposed video-content representation provides the capability of browsing digital stereoscopic video sequences and performing more efficient content-based queries and indexing. Each stereoscopic video sequence is first partitioned into shots by applying a shot-cut detection algorithm so that frames (or stereo pairs) of similar visual characteristics are gathered together. Each shot is then analyzed using stereo-imaging techniques, and the disparity field, occluded areas, and depth map are estimated. A multiresolution implementation of the recursive shortest spanning tree (RSST) algorithm is applied for color and depth segmentation, while fusion of color and depth segments is employed for reliable video object extraction. In particular, color segments are projected onto depth segments so that video objects on the same depth plane are retained, while at the same time accurate object boundaries are extracted. Feature vectors are then constructed using multidimensional fuzzy classification of segment features including size, location, color, and depth. Shot selection is accomplished by clustering similar shots based on the generalized Lloyd-Max algorithm, while for a given shot, key frames are extracted using an optimization method for locating frames of minimally correlated feature vectors. For efficient implementation of the latter method, a genetic algorithm is used. Experimental results are presented, which indicate the reliable performance of the proposed scheme on real-life stereoscopic video sequences.
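
A rough sketch of the color/depth fusion step is given below: each color segment is relabeled with the depth segment it overlaps most, so that objects on one depth plane stay together while boundaries follow the color segmentation. The label maps are assumed to be given by the earlier segmentation stages, and the majority-overlap rule is an illustrative simplification.

# Minimal sketch of color/depth segment fusion: each color segment is assigned
# to the depth segment it overlaps most, so boundaries follow the (more
# accurate) color segmentation while depth planes are preserved.
import numpy as np

def fuse_color_depth(color_labels: np.ndarray, depth_labels: np.ndarray):
    """Relabel each color segment with the depth segment it overlaps most."""
    fused = np.zeros_like(depth_labels)
    for c in np.unique(color_labels):
        mask = color_labels == c
        depths, counts = np.unique(depth_labels[mask], return_counts=True)
        fused[mask] = depths[np.argmax(counts)]
    return fused

# Toy 4x6 example: two color segments, two depth planes with slightly
# different boundaries; the fused boundary follows the color segmentation.
color = np.array([[0, 0, 1, 1, 1, 1]] * 4)
depth = np.array([[0, 0, 0, 1, 1, 1]] * 4)
print(fuse_color_depth(color, depth))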

Collaboration


Dive into Yannis S. Avrithis's collaborations.

Top Co-Authors

Stefanos D. Kollias (National Technical University of Athens)
Evaggelos Spyrou (National Technical University of Athens)
Konstantinos Rapantzikos (National Technical University of Athens)
Yiannis Kompatsiaris (Information Technology Institute)
Steffen Staab (University of Koblenz and Landau)
Thanos Athanasiadis (National Technical University of Athens)
Manolis Wallace (University of Peloponnese)
Yannis Kalantidis (National Technical University of Athens)
Ioannis Kompatsiaris (Information Technology Institute)