Pau Baiget
Autonomous University of Barcelona
Publications
Featured research published by Pau Baiget.
Signal Processing: Image Communication | 2008
Carles Fernández; Pau Baiget; Xavier Roca; Jordi Gonzàlez
The integration of cognitive capabilities into computer vision systems requires both enabling high semantic expressiveness and dealing with the high computational costs that arise when large amounts of data are involved in the analysis. This contribution describes a cognitive vision system conceived to automatically provide high-level interpretations of complex real-time situations in outdoor and indoor scenarios, and eventually to maintain communication with casual end users in multiple languages. The main contributions are: (i) the design of an integrative multilevel architecture for cognitive surveillance purposes; (ii) the proposal of a coherent taxonomy of knowledge to guide the interpretation process, which leads to the conception of a situation-based ontology; (iii) the use of situational analysis for content detection and the progressive interpretation of semantically rich scenes, managing incomplete or uncertain knowledge; and (iv) the use of this ontological background to enable multilingual capabilities and advanced end-user interfaces. Experimental results are provided to show the feasibility of the proposed approach.
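The situation-based ontology and the handling of incomplete knowledge mentioned above can be pictured with a small sketch. The Python fragment below is purely illustrative (the class, situation, and fact names are hypothetical and not taken from the system): concrete situations specialize more abstract ones, and interpretation falls back to an ancestor when the available facts do not support the most specific reading.

```python
# Illustrative sketch (hypothetical names, not the system's actual ontology):
# a minimal situation-based ontology in which concrete situations specialize
# abstract ones, so interpretation can fall back to a more general reading
# when knowledge is incomplete or uncertain.

from dataclasses import dataclass, field

@dataclass
class Situation:
    name: str
    parent: "Situation | None" = None          # more abstract situation
    required_facts: set = field(default_factory=set)

    def matches(self, facts):
        return self.required_facts <= facts

agent_in_scene = Situation("agent_in_scene", required_facts={"detected(agent)"})
crossing_road = Situation("crossing_road", parent=agent_in_scene,
                          required_facts={"detected(agent)", "is_on(agent, road)"})

def interpret(facts, situation):
    """Return the most specific situation supported by the available facts,
    climbing towards more abstract ancestors when evidence is missing."""
    while situation and not situation.matches(facts):
        situation = situation.parent
    return situation.name if situation else "unknown"

print(interpret({"detected(agent)"}, crossing_road))                        # agent_in_scene
print(interpret({"detected(agent)", "is_on(agent, road)"}, crossing_road))  # crossing_road
```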
KI '07: Proceedings of the 30th Annual German Conference on Advances in Artificial Intelligence | 2007
Carles Fernández Tena; Pau Baiget; F. Xavier Roca; Jordi Gonzàlez
This contribution addresses the generation of textual descriptions in several natural languages for the evaluation of human behavior in video sequences. The problem is tackled by converting geometrical information extracted from videos of the scenario into predicates in a fuzzy logic formalism, which facilitates the internal representation of the conceptual data and allows the temporal analysis of situations in a deterministic fashion by means of Situation Graph Trees (SGTs). The results of the analysis are stored in structures proposed by Discourse Representation Theory (DRT), which facilitate the subsequent generation of natural language text. This set of tools has proved to be well suited to the specified purpose.
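As a rough idea of the abstraction step from geometrical data to fuzzy predicates, here is a minimal Python sketch; the trapezoidal membership functions, thresholds, and predicate names are assumptions for illustration and do not come from the paper.

```python
# Illustrative sketch (assumed shapes and thresholds): abstracting a quantitative
# speed measurement into fuzzy-logic predicates such as has_speed(agent, high).

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function rising on [a, b] and falling on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

SPEED_TERMS = {            # linguistic terms for the predicate has_speed(agent, term)
    "zero":   (0.0, 0.0, 0.2, 0.6),
    "small":  (0.2, 0.6, 1.5, 2.5),
    "normal": (1.5, 2.5, 4.0, 6.0),
    "high":   (4.0, 6.0, 20.0, 20.0),
}

def speed_predicates(agent_id, speed_mps):
    """Return fuzzy predicates with their degrees of membership."""
    return {f"has_speed({agent_id}, {term})": trapezoid(speed_mps, *params)
            for term, params in SPEED_TERMS.items()}

print(speed_predicates("agent_1", 3.1))
```

Predicates of this kind, attached to time intervals, are the sort of input a Situation Graph Tree can traverse deterministically.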
Congress of the Italian Association for Artificial Intelligence | 2007
Carles Fernández; Pau Baiget; F. Xavier Roca; Jordi Gonzàlez
A Multimedia Surveillance System (MSS) is considered for automatically retrieving semantic content from complex outdoor scenes, involving both the human behavior and traffic domains. To characterize the dynamic information attached to detected objects, we consider a deterministic modeling of spatio-temporal features based on abstraction processes towards a fuzzy logic formalism. A situational analysis over the conceptualized information allows us not only to describe human actions within a scene, but also to suggest possible interpretations of the perceived behaviors, such as situations involving thefts or the danger of being run over. Towards this end, the different levels of semantic knowledge implied throughout the process are also classified into a proposed taxonomy.
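The situational analysis of combining conceptual predicates from the human and traffic domains into interpretations such as thefts or the danger of being run over could look roughly like the following sketch; the rules and predicate names are invented for illustration and are not the paper's model.

```python
# Illustrative sketch (assumed rules, not from the paper): a situational-analysis
# step that interprets co-occurring conceptual predicates from the human and
# traffic domains as higher-level situations.

def interpret(predicates):
    """Map a set of conceptual predicates to candidate situation labels."""
    situations = []
    if {"is_on(pedestrian, road)", "approaches(vehicle, pedestrian)"} <= predicates:
        situations.append("danger_of_runover(pedestrian, vehicle)")
    if {"picks_up(agent, object)", "belongs_to(object, other_agent)",
        "is_absent(other_agent)"} <= predicates:
        situations.append("possible_theft(agent, object)")
    return situations

frame_predicates = {
    "is_on(pedestrian, road)",
    "approaches(vehicle, pedestrian)",
}
print(interpret(frame_predicates))   # -> ['danger_of_runover(pedestrian, vehicle)']
```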
Articulated Motion and Deformable Objects | 2008
Pau Baiget; F. Xavier Roca; Jordi Gonzàlez
This paper describes a framework which exploits the use of computer animation to evaluate the performance of tracking algorithms. This can be achieved through two different, complementary strategies. On the one hand, augmented reality allows the scene complexity to be gradually increased by adding virtual agents into a real image sequence. On the other hand, the simulation of virtual environments involving autonomous agents provides synthetic image sequences. These are used to evaluate several difficult tracking problems which are currently under research, such as the processing of long-time runs and the evaluation of sequences containing crowds of people and numerous occlusions. Finally, a general event-based evaluation metric is defined that measures whether the agents and actions in the scene given by the ground truth were correctly tracked, by comparing two event lists. This metric is suitable for evaluating different tracking approaches whose underlying algorithms may be completely different.
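A simplified version of an event-based evaluation could compare the ground-truth event list with the tracker's event list as sketched below; the event encoding and the frame tolerance used for matching are assumptions, not the metric defined in the paper.

```python
# Illustrative sketch (assumed matching criterion, not the paper's exact metric):
# compare a ground-truth event list against a tracker's event list.
# An event is (label, agent_id, frame); two events match if label and agent agree
# and their frames differ by at most `tolerance`.

def match_events(ground_truth, detected, tolerance=10):
    unmatched = list(detected)
    matched = 0
    for label, agent, frame in ground_truth:
        for cand in unmatched:
            if cand[0] == label and cand[1] == agent and abs(cand[2] - frame) <= tolerance:
                unmatched.remove(cand)
                matched += 1
                break
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    return precision, recall

gt  = [("enters_scene", "agent_1", 12), ("sits_down", "agent_1", 140)]
det = [("enters_scene", "agent_1", 15), ("sits_down", "agent_2", 141)]
print(match_events(gt, det))   # -> (0.5, 0.5)
```

Because the comparison operates on event lists rather than on pixel- or track-level output, it stays agnostic to the internals of the tracking algorithm under test.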
Detection and Identification of Rare Audiovisual Cues | 2012
Pau Baiget; Carles Fernández; F. Xavier Roca; Jordi Gonzàlez
The recognition of abnormal behaviors in video sequences has emerged as a hot topic in video understanding research. In particular, an important challenge lies in automatically detecting abnormality. However, there is no convention about the types of anomalies that training data should capture. In surveillance, anomalies are typically detected when new observations differ substantially from previously learned behavior models, which represent normality. This paper focuses on properly defining anomalies within trajectory analysis: we propose a hierarchical representation composed of Soft, Intermediate, and Hard Anomalies, which are identified from the extent and nature of the deviation from learned models. Towards this end, a novel Gaussian Mixture Model representation of learned route patterns creates a probabilistic map of the image plane, which is applied to detect and classify anomalies in real time. Our method overcomes limitations of similar existing approaches and performs correctly even when the tracking is affected by different sources of noise. The reliability of our approach is demonstrated experimentally.
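The following sketch illustrates the general idea of grading trajectory points against a Gaussian Mixture Model learned from normal routes; the synthetic data and the log-likelihood thresholds are assumptions, and the grading below uses only the extent of deviation, a simplification of the paper's three-level classification, which also considers its nature.

```python
# Illustrative sketch (assumed data and thresholds, not the paper's parameters):
# a GMM fitted to normal trajectory points (x, y) yields a probabilistic map of
# the image plane; new observations are graded by how strongly they deviate from it.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for points sampled from learned route patterns.
normal_points = rng.normal(loc=[320, 240], scale=[40, 25], size=(500, 2))

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(normal_points)

def classify_anomaly(point, soft_th=-10.0, inter_th=-14.0, hard_th=-20.0):
    """Grade a trajectory point by its log-likelihood under the learned model."""
    log_lik = gmm.score_samples(np.asarray(point).reshape(1, -1))[0]
    if log_lik >= soft_th:
        return "normal"
    if log_lik >= inter_th:
        return "soft_anomaly"
    if log_lik >= hard_th:
        return "intermediate_anomaly"
    return "hard_anomaly"

print(classify_anomaly([330, 250]), classify_anomaly([600, 50]))
```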
Ambient Intelligence | 2010
Carles Fernández; Pau Baiget; Xavier Roca; Jordi Gonzàlez
This chapter addresses the incorporation of a module for natural language generation into an artificial cognitive vision system, to describe the most relevant events and behaviors observed in video sequences of a given domain. It also introduces the stages required to convert conceptual predicates into natural language texts. The introduction of natural language (NL) interfaces into vision systems has become popular for a broad range of applications. In particular, they have become an important part of surveillance systems, in which human behavior is represented by predefined sequences of events in given contexts. The analysis of human behaviors in image sequences is currently restricted to the generation of quantitative parameters describing where and when motion is being observed. However, recent trends in cognitive vision demonstrate that it is becoming necessary to exploit linguistic knowledge to incorporate abstraction and uncertainty into the analysis and thus enhance the semantic richness of the reported descriptions. Towards this end, natural language generation will in the near future constitute a mandatory step towards intelligent user interfacing. Moreover, the characteristics of ambient intelligence demand that this process adapt to users and their context, for example through multilingual capabilities. This chapter illustrates some experimental results to elucidate multilingual generation in various application domains.
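A toy example of the predicate-to-text step, covering more than one language, might look like the following; the templates, predicate names, and language choices are illustrative assumptions rather than the chapter's actual generation module.

```python
# Illustrative sketch (hypothetical templates, not the chapter's NL generation
# module): rendering a conceptual predicate as surface text in several languages.

TEMPLATES = {
    "enters(agent, zone)": {
        "en": "{agent} enters the {zone}.",
        "es": "{agent} entra en {zone}.",
        "ca": "{agent} entra a {zone}.",
    },
    "leaves(agent, zone)": {
        "en": "{agent} leaves the {zone}.",
        "es": "{agent} sale de {zone}.",
        "ca": "{agent} surt de {zone}.",
    },
}

def generate(predicate, args, lang="en"):
    """Render a conceptual predicate as a natural-language sentence."""
    return TEMPLATES[predicate][lang].format(**args)

print(generate("enters(agent, zone)", {"agent": "The pedestrian", "zone": "crosswalk"}, "en"))
print(generate("enters(agent, zone)", {"agent": "El peatón", "zone": "el paso de cebra"}, "es"))
```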
Multimodal Interaction in Image and Video Applications | 2013
Marc Castelló; Jordi Gonzàlez; Ariel Amato; Pau Baiget; Carles Fernández; Josep M. Gonfaus; Ramón Alberto Mollineda; Marco Pedersoli; Nicolás Pérez de la Blanca; F. Xavier Roca
In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviors arising from real-world events. The main procedure of the prototype described in this chapter entails: (i) adaptation, since the system adapts itself to the most common behaviors (qualitative data) inferred from tracking (quantitative data), and is thus able to recognize abnormal behaviors; (ii) feedback, since an advanced interface based on Natural Language understanding allows end users to communicate with the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on the textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes.
Archive | 2008
Pau Baiget; Eric Sommerlade; Ian D. Reid; Jordi Gonzàlez
Iberian Conference on Pattern Recognition and Image Analysis | 2007
Pau Baiget; Carles Fernández; F. Xavier Roca; Jordi Gonzàlez
Archive | 2007
Pau Baiget; Joan Soto; Francesc Xavier Roca; Jordi Gonzàlez