Pau Climent-Pérez
Kingston University
Publication
Featured research published by Pau Climent-Pérez.
Expert Systems With Applications | 2012
Alexandros Andre Chaaraoui; Pau Climent-Pérez; Francisco Flórez-Revuelta
Human Behaviour Analysis (HBA) is of increasing interest to computer vision and artificial intelligence researchers. Its main application areas, such as Video Surveillance and Ambient-Assisted Living (AAL), have been in great demand in recent years. This paper provides a review of HBA for AAL and ageing-in-place purposes, focusing especially on vision techniques. First, a clearly defined taxonomy is presented in order to classify the reviewed works, which are then presented in bottom-up order of abstraction and complexity. At the motion level, pose and gaze estimation as well as basic human movement recognition are covered. Next, the most commonly used action and activity recognition approaches are presented with examples of recent research works. As the degree of semantics and the time interval involved in the HBA increase, the behaviour level is finally reached. Furthermore, useful tools and datasets are analysed in order to help researchers initiate projects.
Pattern Recognition Letters | 2013
Alexandros Andre Chaaraoui; Pau Climent-Pérez; Francisco Flórez-Revuelta
In this paper, a human action recognition method is presented in which pose representation is based on the contour points of the human silhouette and actions are learned by making use of sequences of multi-view key poses. Our contribution is twofold. Firstly, our approach achieves state-of-the-art success rates without compromising the speed of the recognition process, which makes it suitable for online recognition and real-time scenarios. Secondly, dissimilarities among different actors performing the same action are handled by taking into account variations in shape (shifting the test data to the known domain of key poses) and speed (considering inconsistent time scales in the classification). Experimental results on the publicly available Weizmann, MuHAVi and IXMAS datasets return high and stable success rates, achieving, to the best of our knowledge, the best rate so far on the MuHAVi Novel Actor test.
Expert Systems With Applications | 2014
Alexandros Andre Chaaraoui; José Ramón Padilla-López; Pau Climent-Pérez; Francisco Flórez-Revuelta
Interest in RGB-D devices is increasing due to their low price and the wide range of possible applications that come along. These devices provide a marker-less body pose estimation by means of skeletal data consisting of 3D positions of body joints. These can be further used for pose, gesture or action recognition. In this work, an evolutionary algorithm is used to determine the optimal subset of skeleton joints, taking into account the topological structure of the skeleton, in order to improve the final success rate. The proposed method has been validated using a state-of-the-art RGB action recognition approach, and applying it to the MSR-Action3D dataset. Results show that the proposed algorithm is able to significantly improve the initial recognition rate and to yield similar or better success rates than the state-of-the-art methods.
Archive | 2013
Myo Thida; Yoke Leng Yong; Pau Climent-Pérez; How-Lung Eng; Paolo Remagnino
This chapter presents a review and systematic comparison of the state of the art on crowd video analysis. The rationale of our review is justified by a recent increase in intelligent video surveillance algorithms capable of analysing automatically visual streams of very crowded and cluttered scenes, such as those of airport concourses, railway stations, shopping malls and the like. Since the safety and security of potentially very crowded public spaces have become a priority, computer vision researchers have focused their research on intelligent solutions. The aim of this chapter is to propose a critical review of existing literature pertaining to the automatic analysis of complex and crowded scenes. The literature is divided into two broad categories: the macroscopic and the microscopic modelling approach. The effort is meant to provide a reference point for all computer vision practitioners currently working on crowd analysis. We discuss the merits and weaknesses of various approaches for each topic and provide a recommendation on how existing methods can be improved.
HBU'12 Proceedings of the Third International Conference on Human Behavior Understanding | 2012
Alexandros Andre Chaaraoui; Pau Climent-Pérez; Francisco Flórez-Revuelta
This paper presents a novel multi-view human action recognition approach based on a bag-of-key-poses. In the case of multi-view scenarios, it is especially difficult to perform accurate action recognition that still runs at an admissible recognition speed. The presented method aims to fill this gap by combining a silhouette-based pose representation with a simple, yet effective multi-view learning approach based on Model Fusion. Action classification is performed through efficient sequence matching and by the comparison of successive key poses which are evaluated on both feature similarity and match relevance. Experimentation on the MuHAVi dataset shows that the method outperforms currently available recognition rates and is exceptionally robust to actor-variance. Temporal evaluation confirms the method's suitability for real-time recognition.
Mexican International Conference on Artificial Intelligence | 2012
Pau Climent-Pérez; Alexandros Andre Chaaraoui; José Ramón Padilla-López; Francisco Flórez-Revuelta
The growth in interest in RGB-D devices (e.g. Microsoft Kinect or ASUS Xtion Pro) is based on their low price, as well as the wide range of possible applications. These devices can provide skeletal data consisting of 3D position, as well as orientation data, which can be further used for pose or action recognition. Data for 15 or 20 joints can be retrieved, depending on the libraries used. Recently, many datasets have been made available which allow the comparison of different action recognition approaches for diverse applications (e.g. gaming, Ambient-Assisted Living, etc.). In this work, a genetic algorithm is used to determine the contribution of each of the skeleton's joints to the accuracy of an action recognition algorithm, thus using or ignoring the data from each joint depending on its relevance. The proposed method has been validated using a k-means-based action recognition approach and using the MSR-Action3D dataset for testing. Results show the presented algorithm is able to improve the recognition rates while reducing the feature size.
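The joint-selection idea in the abstract above can be illustrated with a minimal sketch of a genetic algorithm over a binary mask (one bit per joint). This is not the authors' implementation: the fitness function `toy_fitness`, the operators (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions chosen for illustration. In a real setting the fitness would be the recognition accuracy obtained with the selected joints.

```python
import random

NUM_JOINTS = 20  # the abstract mentions 15 or 20 joints, depending on the library


def toy_fitness(mask):
    """Hypothetical stand-in for recognition accuracy.

    Pretends joints 0-4 are informative and that every other
    selected joint only adds noise.
    """
    useful = sum(mask[:5])
    noisy = sum(mask[5:])
    return useful - 0.2 * noisy


def evolve(fitness, generations=40, pop_size=30, mutation_rate=0.05, seed=0):
    """Evolve a binary joint mask; returns the fittest mask found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(NUM_JOINTS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]   # truncation selection
        children = [population[0][:]]          # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, NUM_JOINTS)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each bit flips with probability mutation_rate
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


best_mask = evolve(toy_fitness)
selected_joints = [i for i, bit in enumerate(best_mask) if bit]
```

Joints whose bit is 0 in `best_mask` would be ignored when building the feature vector, which is how a mask of this kind can reduce the feature size.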
Advanced Video and Signal Based Surveillance | 2015
Giounona Tzanidou; Pau Climent-Pérez; Georg Hummel; Marc Schmitt; Peter Stütz; Dorothy Ndedi Monekosso; Paolo Remagnino
This work presents an approach to detect moving objects from Unmanned Aerial Vehicles (UAV). A common framework for most of the existing techniques is to use image registration to warp consecutive frames as an ego-motion compensation step, and then to apply frame differencing to detect the moving objects. Assuming a planar scene, we propose the exploitation of telemetry information available from Global Positioning and Inertial Navigation Systems (GPS/INS) to estimate a similarity transformation matrix that would map the image points from one frame to another. In this work, we show that the telemetry-based image registration combined with global registration methods produces more accurate results than the traditional image registration techniques in the case of a scene with poor or no texture. To segment the moving objects, we employ the probabilistic background modelling method with a mixture of Gaussian distributions.
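As a rough illustration of the similarity transformation mentioned in the abstract above, the sketch below builds a 3x3 homogeneous matrix from a scale, a rotation angle and a translation, and maps one image point with it. How these four parameters would be derived from GPS/INS telemetry is not described in the abstract, so the numeric values used here are purely hypothetical, as are the function names.

```python
import math


def similarity_matrix(scale, angle_rad, tx, ty):
    """3x3 homogeneous similarity transform: uniform scale, rotation, translation."""
    c = scale * math.cos(angle_rad)
    s = scale * math.sin(angle_rad)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def warp_point(matrix, x, y):
    """Map an image point (x, y) from one frame into the other."""
    x_new = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    y_new = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return x_new, y_new


# Hypothetical inter-frame motion: scale 2, rotation of 90 degrees, shift of (1, 1) pixels.
H = similarity_matrix(scale=2.0, angle_rad=math.pi / 2, tx=1.0, ty=1.0)
x2, y2 = warp_point(H, 1.0, 0.0)  # approximately (1.0, 3.0)
```

Once consecutive frames are aligned with such a matrix, the remaining differences between them can be attributed to moving objects rather than to platform motion.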
International Symposium on Visual Computing | 2014
Pau Climent-Pérez; Georgios Lazaridis; Georg Hummel; Martin Russ; Dorothy Ndedi Monekosso; Paolo Remagnino
Tracking from airborne cameras is very challenging, since most assumptions made for fixed cameras do not hold. Therefore, compensation of platform ego-motion is seen as a necessary pre-processing step. Most existing methods perform image registration or matching, which involves costly image transformations, and have a restricted operational range. In this paper, a novel ego-motion compensation approach is presented that transforms the local search window of the visual tracker. This is much more computationally efficient, and can be applied regardless of the amount of texture in the background. Experiments with ground truth and tracker output data are conducted and show the validity of the approach.
ISAmI | 2012
Pau Climent-Pérez; Alexandros Andre Chaaraoui; Francisco Flórez-Revuelta
When novice researchers in the fields of Computer Vision and Human Behaviour Analysis/Understanding (HBA/HBU) initiate new projects applied to Ambient-Assisted Living (AAL) scenarios, a lack of specific, publicly available frameworks, tools and datasets is perceived. This work is an attempt to fill that particular gap by presenting different field-related datasets (or benchmarks) according to a taxonomy (which is also presented), taking into account their availability as well as their relevance. Furthermore, it reviews and puts together a series of tools (either frameworks or pieces of software) that are at hand, although dispersed, and which can ease the task. Finally, some conclusions are drawn about the reviewed tools, putting special emphasis on their generality and reliability.
International Conference on Pattern Recognition | 2014
Pau Climent-Pérez; Dorothy Ndedi Monekosso; Paolo Remagnino
Tracklet plots (TPs) describe the motion patterns of a small crowd or a large group of people in a given short time span. This feature can be useful in the context of Bag-of-Words modelling for the recognition of events or actions that unfold in the scene. This work describes a method where evidence from multiple viewpoints is combined. By obtaining this feature for each of the views, and synchronising the available video streams, a feature-level fusion method by concatenation can be effortlessly applied. The presented system is able to recognise specific events in large groups of people from multiple cameras, and to perform as well as the best single view available. Furthermore, the dimension of the concatenated feature can be reduced by one order of magnitude without loss of performance.
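Feature-level fusion by concatenation, as named in the abstract above, can be sketched in a few lines. The example assumes each synchronised view has already produced a Bag-of-Words histogram. The histogram values, the function names and the per-view L1 normalisation step are assumptions made for illustration and are not taken from the paper.

```python
def l1_normalise(hist):
    """Scale a histogram so its entries sum to 1 (left unchanged if all zeros)."""
    total = sum(hist)
    return [value / total for value in hist] if total else list(hist)


def fuse_views(per_view_histograms):
    """Concatenate the normalised histograms of all views into one feature vector."""
    fused = []
    for hist in per_view_histograms:
        fused.extend(l1_normalise(hist))
    return fused


# Hypothetical word counts for the same time span seen from two cameras.
view_a = [2, 0, 2]
view_b = [1, 3, 0]
fused = fuse_views([view_a, view_b])  # [0.5, 0.0, 0.5, 0.25, 0.75, 0.0]
```

The fused vector grows linearly with the number of views, which is why a later dimensionality reduction step, as the abstract reports, can be worthwhile.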