Antonios Oikonomopoulos
Imperial College London
Publications
Featured research published by Antonios Oikonomopoulos.
Systems, Man and Cybernetics | 2005
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic
This paper addresses the problem of human-action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. An appropriate distance metric between two collections of spatiotemporal salient points is introduced, which is based on the chamfer distance and an iterative linear time-warping technique that deals with time-expansion or time-compression issues. A classification scheme based on relevance vector machines and on the proposed distance measure is then introduced. Results on real image sequences from a small database depicting people performing 19 aerobic exercises are presented.
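The saliency measure above can be sketched as follows, using the Shannon entropy of intensity histograms over small space-time neighborhoods as the information-content measure. The neighborhood radius, histogram binning, threshold, and the synthetic clip are all illustrative assumptions, not values from the paper:

```python
import numpy as np

def neighborhood_entropy(volume, t, y, x, r=2, bins=8):
    """Shannon entropy of intensities in a (2r+1)^3 space-time neighborhood."""
    patch = volume[max(t - r, 0):t + r + 1,
                   max(y - r, 0):y + r + 1,
                   max(x - r, 0):x + r + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def detect_salient_points(volume, r=2, thresh=1.0):
    """Keep points whose space-time neighborhood entropy exceeds a threshold."""
    T, H, W = volume.shape
    points = []
    for t in range(r, T - r):
        for y in range(r, H - r):
            for x in range(r, W - r):
                e = neighborhood_entropy(volume, t, y, x, r)
                if e > thresh:
                    points.append((t, y, x, e))
    return points

# A tiny synthetic clip: uniform background plus a moving bright blob.
rng = np.random.default_rng(0)
clip = np.zeros((8, 16, 16))
for t in range(8):
    clip[t, 4 + t % 4: 7 + t % 4, 5:8] = rng.uniform(0.5, 1.0, (3, 3))
pts = detect_salient_points(clip)
```

Static regions have constant neighborhoods (zero entropy), so only the moving, textured blob fires the detector.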
IEEE Transactions on Image Processing | 2011
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic
In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic ensembles of feature descriptors. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of the proposed voting framework allows us to deal with multiple activities taking place in the same scene, as well as with activities in the presence of clutter and occlusion. We use boosting in order to select characteristic ensembles per class. This leads to a set of class-specific codebooks where each codeword is an ensemble of features. During training, we store the spatial positions of the codeword ensembles with respect to a set of reference points, as well as their temporal positions with respect to the start and end of the action instance. During testing, each activated codeword ensemble casts votes concerning the spatiotemporal position and extent of the action, using the information that was stored during training. Mean Shift mode estimation in the voting space provides the most probable hypotheses concerning the localization of the subjects at each frame, as well as the extent of the activities depicted in the image sequences. We present classification and localization results for a number of publicly available datasets, and for a number of sequences where there is a significant amount of clutter and occlusion.
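Mean Shift mode estimation over a voting space can be sketched as below. The two-dimensional (frame, x-position) vote layout, the bandwidth, and the synthetic votes are illustrative assumptions; the paper's voting space is higher-dimensional:

```python
import numpy as np

def mean_shift_mode(votes, bandwidth=1.0, iters=50):
    """Find a mode of a set of votes via mean shift with a Gaussian kernel."""
    x = votes.mean(axis=0)  # start from the centroid of all votes
    for _ in range(iters):
        # Gaussian weight for every vote, centered on the current estimate
        w = np.exp(-np.sum((votes - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x_new = (w[:, None] * votes).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < 1e-6:
            break
        x = x_new
    return x

# Hypothetical voting space: (frame, x-position) votes cast by codeword ensembles.
rng = np.random.default_rng(1)
true_center = np.array([40.0, 60.0])
votes = np.concatenate([
    true_center + rng.normal(0, 1.0, (80, 2)),  # votes around the true activity
    rng.uniform(0, 100, (20, 2)),               # clutter votes
])
mode = mean_shift_mode(votes, bandwidth=3.0)
```

Because the kernel weights decay with distance, the scattered clutter votes barely influence the estimate, which is why local mode-seeking copes with multiple activities and occlusion better than a global average would.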
International Conference on Multimedia and Expo | 2005
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. We detect the spatiotemporal salient points by measuring changes in the information content of pixel neighborhoods not only in space but also in time. We introduce an appropriate distance metric between two collections of spatiotemporal salient points that is based on the Chamfer distance and an iterative linear time warping technique that deals with time expansion or time compression issues. We propose a classification scheme that is based on relevance vector machines and on the proposed distance measure. We present results on real image sequences from a small database depicting people performing 19 aerobic exercises.
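A minimal sketch of the Chamfer distance between two point collections, plus a simplified stand-in for the linear time-warping step: the time axis of one collection is rescaled by a grid of candidate factors and the factor minimizing the Chamfer distance is kept. Points are assumed to be (t, y, x) triples; the scale grid is an illustrative choice:

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance between two collections of points."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # all pairwise distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def best_time_scale(A, B, scales=np.linspace(0.5, 2.0, 31)):
    """Linearly rescale the time axis of B and keep the scale that minimizes
    the Chamfer distance to A (a simplified, non-iterative stand-in for the
    paper's linear time-warping step)."""
    return min(scales,
               key=lambda s: chamfer_distance(A, B * np.array([s, 1.0, 1.0])))

# Illustrative point sets: B is A played back 1.5x slower in time.
A = np.array([[float(t), 0.5 * t, 5.0] for t in range(10)])
B = A * np.array([1.5, 1.0, 1.0])
s = best_time_scale(A, B)
```

Recovering a scale near 1/1.5 undoes the time expansion before the two representations are compared.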
Computer Vision and Pattern Recognition | 2006
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic
This paper addresses the problem of human action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. We detect the spatiotemporal salient points by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. We derive a suitable distance measure between the representations, which is based on the Chamfer distance, and we optimize this measure with respect to a number of temporal and scaling parameters. In this way we achieve invariance against scaling, while at the same time, we eliminate the temporal differences between the representations. We use Relevance Vector Machines (RVM) in order to address the classification problem. We propose new kernels for use by the RVM, which are specifically tailored to the proposed spatiotemporal salient point representation. The basis of these kernels is the optimized Chamfer distance of the previous step. We present results on real image sequences from a small database depicting people performing 19 aerobic exercises.
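One common way to turn a pairwise distance such as the optimized Chamfer distance into a kernel is distance substitution, sketched below. The paper's classifier is an RVM; here only the kernel construction is shown, with a simple most-similar-training-item rule as a hypothetical stand-in, and the distance matrix is made up for illustration:

```python
import numpy as np

def distance_kernel(D, gamma=0.1):
    """Distance-substitution kernel K = exp(-gamma * D), where D holds pairwise
    (e.g. optimized Chamfer) distances between sequence representations."""
    return np.exp(-gamma * D)

def classify(K_test_train, train_labels):
    """Assign each test sequence the label of its most similar training sequence
    (a stand-in for the RVM used in the paper)."""
    return train_labels[np.argmax(K_test_train, axis=1)]

# Illustrative distances: test item 0 is close to training item 0, etc.
D = np.array([[0.1, 5.0],
              [4.0, 0.2]])
K = distance_kernel(D)
labels = classify(K, np.array([0, 1]))
```

Small distances map to kernel values near 1 and large distances decay toward 0, so similarity in the kernel matrix mirrors the underlying Chamfer geometry.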
International Joint Conference on Artificial Intelligence | 2007
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic; Nikos Paragios
This work addresses the problem of human action recognition by introducing a representation of a human action as a collection of short trajectories that are extracted in areas of the scene with a significant amount of visual activity. The trajectories are extracted by an auxiliary particle filtering tracking scheme that is initialized at points that are considered salient both in space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods in space and time. We implement an online background estimation algorithm in order to deal with inadequate localization of the salient points on the moving parts in the scene, and to improve the overall performance of the particle filter tracking scheme. We use a variant of the Longest Common Subsequence (LCSS) algorithm in order to compare different sets of trajectories corresponding to different actions. We use Relevance Vector Machines (RVM) in order to address the classification problem. We propose new kernels for use by the RVM, which are specifically tailored to the proposed representation of short trajectories. The basis of these kernels is the modified LCSS distance of the previous step. We present results on real image sequences from a small database depicting people performing 12 aerobic exercises.
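A textbook LCSS between two trajectories can be sketched as follows: two points match when they are within a spatial tolerance eps and a temporal tolerance delta, and the match count is normalized into a distance. The tolerances and trajectories below are illustrative, and this plain variant omits the paper's modifications:

```python
import numpy as np

def lcss(traj_a, traj_b, eps=1.0, delta=3):
    """Longest Common Subsequence length between two 2-D trajectories:
    points match if within eps in space and delta in index (time)."""
    n, m = len(traj_a), len(traj_b)
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = (abs(i - j) <= delta and
                     np.linalg.norm(traj_a[i - 1] - traj_b[j - 1]) <= eps)
            L[i, j] = L[i - 1, j - 1] + 1 if close else max(L[i - 1, j], L[i, j - 1])
    return int(L[n, m])

def lcss_distance(traj_a, traj_b, eps=1.0, delta=3):
    """Normalized LCSS distance in [0, 1]; 0 means a perfect match."""
    return 1.0 - lcss(traj_a, traj_b, eps, delta) / min(len(traj_a), len(traj_b))
```

Because unmatched points are simply skipped rather than penalized, LCSS is robust to the outlier points that particle-filter trajectories inevitably contain.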
Advanced Video and Signal Based Surveillance | 2007
Antonios Oikonomopoulos; Maja Pantic
In this paper we propose a tracking scheme specifically tailored for tracking human body parts in cluttered scenes. We model the background and the human skin using Gaussian mixture models and we combine these estimates to localize the features to be tracked. We further use these estimates to determine the pixels which belong to the background and those which belong to the subject's skin, and we incorporate this information in the observation model of the tracking scheme. For handling self-occlusion (i.e., when one body part occludes another), we incorporate the information about the direction of the observed motion into the propagation model of the tracking scheme. We demonstrate that the proposed method outperforms conventional Condensation and auxiliary particle filtering when the hands and the head are the tracked body features. For the purposes of human body gesture recognition, we use a variant of the longest common subsequence (LCSS) algorithm in order to acquire a distance measure between the acquired trajectories, and we use this measure in order to define new kernels for a relevance vector machine (RVM) classification scheme. We present results on real image sequences from a small database depicting people performing 15 aerobic exercises.
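The skin-versus-background decision can be sketched with a single multivariate Gaussian per class, which is a deliberate one-component simplification of the paper's Gaussian mixture models; the RGB color means, spreads, and training samples below are all made up for illustration:

```python
import numpy as np

class GaussianModel:
    """Single multivariate Gaussian fitted to pixel colors (a one-component
    simplification of the Gaussian mixture models used in the paper)."""
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])  # regularized covariance
        self.inv = np.linalg.inv(self.cov)
        self.logdet = np.linalg.slogdet(self.cov)[1]
        return self

    def log_likelihood(self, X):
        d = X - self.mu
        maha = np.einsum('ij,jk,ik->i', d, self.inv, d)  # squared Mahalanobis distance
        return -0.5 * (maha + self.logdet + X.shape[1] * np.log(2 * np.pi))

def skin_mask(pixels, skin_model, bg_model):
    """Label a pixel as skin when the skin model explains it better."""
    return skin_model.log_likelihood(pixels) > bg_model.log_likelihood(pixels)

# Hypothetical training pixels: light skin tones vs. a darker background, in RGB.
rng = np.random.default_rng(2)
skin = rng.normal([0.8, 0.6, 0.5], 0.05, (200, 3))
background = rng.normal([0.2, 0.3, 0.2], 0.05, (200, 3))
skin_m = GaussianModel().fit(skin)
bg_m = GaussianModel().fit(background)
```

The resulting per-pixel likelihoods are exactly the kind of evidence that can be folded into a particle filter's observation model, as the paper describes.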
International Journal of Synthetic Emotions | 2010
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic
In this paper we address the problem of activity detection in unsegmented image sequences. Our main contribution is the use of an implicit representation of the spatiotemporal shape of the activity which relies on the spatiotemporal localization of characteristic ensembles of feature descriptors. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. We use boosting in order to select characteristic ensembles per class. This leads to a set of class-specific codebooks where each codeword is an ensemble of features. During training, we store the spatial positions of the codeword ensembles with respect to a set of reference points, and their temporal positions with respect to the start and end of the action instance. During testing, each activated codeword casts votes concerning the spatiotemporal position and extent of the action, using the information stored during training. Mean Shift mode estimation in the voting space provides the most probable hypotheses concerning the localization of the subjects at each frame, as well as the extent of the activities depicted in the image sequences. We present experimental results for a number of publicly available datasets that demonstrate the efficiency of the proposed method in localizing and classifying human activities.
International Symposium on Visual Computing | 2013
Antonios Oikonomopoulos; Maja Pantic
In this paper we address the problem of human activity modelling and recognition by means of a hierarchical representation of mined dense spatiotemporal features. At each level of the hierarchy, the proposed method selects feature constellations that are increasingly discriminative and characteristic of a specific action category, by taking into account how frequently they occur in that action category versus the rest of the available action categories in the training dataset. Each feature constellation consists of n-tuples of features selected in the previous level of the hierarchy and lying within a small spatiotemporal neighborhood. We use spatiotemporal Local Steering Kernel (LSK) features as a basis for our representation, due to their ability and efficiency in capturing the local structure and dynamics of the underlying activities. The proposed method is able to detect activities in unconstrained videos, by back-projecting the activated features at the locations at which they were activated. We test the proposed method on two publicly available datasets, namely the KTH and YouTube datasets of human bodily actions. The acquired results demonstrate the effectiveness of the proposed method in recognising a wide variety of activities.
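The frequency-based selection of discriminative constellations can be sketched at its simplest level (pairs of codewords). This toy version ignores the small-spatiotemporal-neighborhood constraint and treats each clip as a bag of codeword ids; the function name, data layout, and ratio threshold are all illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def mine_discriminative_pairs(videos, target_class, min_ratio=2.0):
    """Select codeword pairs (2-tuples co-occurring in a clip) that appear at
    least min_ratio times more often in the target class than in the rest.
    `videos` is a list of (class_label, set_of_codeword_ids) tuples."""
    in_class, out_class = Counter(), Counter()
    for label, codewords in videos:
        for pair in combinations(sorted(codewords), 2):
            (in_class if label == target_class else out_class)[pair] += 1
    # +1 in the denominator avoids division issues for pairs unseen elsewhere
    return {p for p, c in in_class.items()
            if c >= min_ratio * (out_class[p] + 1)}
```

At the next level of the hierarchy, the surviving pairs would themselves be combined into larger n-tuples, making the retained constellations progressively more class-specific.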
Image and Vision Computing | 2009
Antonios Oikonomopoulos; Maja Pantic; Ioannis Patras
Computer Vision and Pattern Recognition | 2009
Antonios Oikonomopoulos; Ioannis Patras; Maja Pantic