Katerina Pastra
Systems Research Institute
Publications
Featured research published by Katerina Pastra.
Philosophical Transactions of the Royal Society B | 2012
Katerina Pastra; Yiannis Aloimonos
Language and action have been found to share a common neural basis and, in particular, a common ‘syntax’: an analogous hierarchical and compositional organization. While language structure analysis has led to the formulation of different grammatical formalisms and associated discriminative or generative computational models, the structure of action is still elusive, and so are the related computational models. However, structuring action has important implications for action learning and generalization, in both human cognition research and computation. In this study, we present a biologically inspired generative grammar of action, which employs the structure-building operations and principles of Chomsky's Minimalist Programme as a reference model. In this grammar, action terminals combine hierarchically into temporal sequences of actions of increasing complexity; the actions are bound with the involved tools and affected objects and are governed by certain goals. We show how the tool role and the affected-object role of an entity within an action drive the derivation of the action syntax in this grammar and control recursion, merge and move, the latter being mechanisms that manifest themselves not only in human language but in human action too.
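The grammar itself is specified formally in the paper; as a rough, hypothetical illustration of how binary, Merge-style combination over tool- and object-bound action terminals might be encoded, consider the following minimal Python sketch (all class and function names here are ours for illustration, not the authors' formalism):

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class ActionTerminal:
    verb: str                   # elementary motor act, e.g. "grasp"
    tool: Optional[str] = None  # entity filling the tool role
    obj: Optional[str] = None   # entity filling the affected-object role

# A phrase is either a terminal or a pair produced by merge.
Phrase = Union[ActionTerminal, Tuple["Phrase", "Phrase"]]

def merge(head: "Phrase", complement: "Phrase") -> "Phrase":
    """Binary, hierarchical combination of two action phrases,
    loosely mirroring Merge in the Minimalist Programme."""
    return (head, complement)

# "Prepare coffee" as a temporal sequence of increasing complexity:
grind = ActionTerminal("grind", tool="grinder", obj="beans")
brew = ActionTerminal("brew", tool="machine", obj="coffee")
pour = ActionTerminal("pour", tool="pot", obj="cup")
plan = merge(merge(grind, brew), pour)  # ((grind brew) pour)
print(plan)
```

In such a scheme, recursion falls out of merge applying to its own output, and the tool/object slots of each terminal are what would license or block a given combination.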
ieee international conference on automatic face gesture recognition | 2011
Christian Wallraven; Michael Schultze; Betty J. Mohler; Argiro Vatakis; Katerina Pastra
A good data corpus lies at the heart of progress in both perceptual/cognitive science and computer vision. While there are a few datasets that deal with simple actions, creating a realistic corpus for complex, long action sequences that also contains human-human interactions has, to our knowledge, not been attempted so far. Here, we introduce such a corpus for (inter)action understanding that contains six everyday scenarios taking place in a kitchen/living-room setting. Each scenario was acted out several times by different pairs of actors and contains simple object interactions as well as spoken dialogue. In addition, each scenario was recorded both with several HD cameras and with motion capture of the actors and several key objects. Having access to the motion-capture data allows not only for kinematic analyses but also for the production of realistic animations in which all aspects of the scenario can be fully controlled. We also present results from a first series of perceptual experiments that show how humans are able to infer scenario classes, as well as individual actions and objects, from computer animations of everyday situations. These results can serve as a benchmark for future computational approaches that begin to take on complex action understanding.
Scientific Data | 2016
Argiro Vatakis; Katerina Pastra
In the longstanding effort to define object affordances, a number of resources have been developed on objects and associated knowledge. These resources, however, have limited potential for modeling and generalization, mainly due to the restricted, stimulus-bound data-collection methodologies adopted. To date, therefore, there exists no resource that truly captures object affordances in a direct, multimodal, and naturalistic way. Here, we present the first such resource of ‘thinking aloud’, spontaneously generated verbal and motoric data on object affordances. This resource was developed from the reports of 124 participants divided into three behavioural experiments with visuo-tactile stimulation, which were captured audiovisually from two camera views (frontal/profile). This methodology allowed the acquisition of approximately 95 hours of video, audio, and text data covering: object-feature-action data (e.g., perceptual features, namings, functions); Exploratory Acts (haptic manipulation for feature acquisition/verification); gestures and demonstrations for object/feature/action description; and reasoning patterns (e.g., justifications, analogies) for attributing a given characterization. The wealth and content of the data make this corpus a one-of-a-kind resource for the study and modeling of object affordances.
quality of multimedia experience | 2015
Athanasia Zlatintsi; Petros Koutras; Niki Efthymiou; Petros Maragos; Alexandros Potamianos; Katerina Pastra
In this paper, we present a movie summarization system and investigate what makes a high-quality movie summary in terms of user-experience evaluation. We propose state-of-the-art audio, visual, and text techniques for the detection of perceptually salient events in movies. The evaluation of such computational models is usually based on the similarity between the system-detected events and some ground-truth data. For this reason, we have developed the MovSum movie database, which includes sensory and semantic saliency annotation as well as cross-media relations, for objective evaluations. The automatically produced movie summaries were qualitatively assessed in an extensive human evaluation in terms of informativeness and enjoyability, achieving very high ratings of up to 80% and 90%, respectively, which verifies the appropriateness of the proposed methods.
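As a loose illustration of the kind of pipeline such a system implies (not the actual MovSum implementation), here is a minimal sketch of late-fusing per-modality saliency curves and selecting the most salient frames under a time budget; the fusion weights and the greedy selection rule are placeholder assumptions:

```python
import numpy as np

def fuse_saliency(audio, visual, text, weights=(0.4, 0.4, 0.2)):
    """Weighted late fusion of per-frame saliency scores in [0, 1]."""
    wa, wv, wt = weights
    return wa * audio + wv * visual + wt * text

def select_summary(saliency, frame_sec=1.0, budget_sec=120.0):
    """Greedily keep the most salient frames until the time budget is
    filled, then restore chronological order for playback."""
    k = int(budget_sec / frame_sec)
    keep = np.argsort(saliency)[::-1][:k]
    return np.sort(keep)

rng = np.random.default_rng(0)
audio, visual, text = (rng.random(3600) for _ in range(3))  # 1 h movie at 1 Hz
frames = select_summary(fuse_saliency(audio, visual, text))
print(f"summary keeps {frames.size} of 3600 frames")
```

A real system would replace the random curves with per-modality saliency detectors and compare the selected frames against ground-truth annotations such as those in MovSum.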
acm multimedia | 2016
Marie-Francine Moens; Katerina Pastra; Kate Saenko; Tinne Tuytelaars
Multimodal information fusion, both at the signal and the semantics levels, is a core part of most multimedia applications, including multimedia indexing, retrieval, summarization, and others. Early or late fusion of modality-specific processing results has been addressed in multimedia prototypes since their earliest days, through various methodologies including rule-based approaches, information-theoretic models, and machine learning. Vision and language are two of the predominant modalities being fused, and they have attracted special attention in international challenges with a long history of results, such as TRECVid, ImageCLEF, and others. During the last decade, vision-language semantic integration has attracted attention from traditionally non-interdisciplinary research communities, such as Computer Vision and Natural Language Processing. This is due to the fact that one modality can greatly assist the processing of another by providing cues for disambiguation, complementary information, and noise/error filtering. The latest boom in deep learning methods has opened up new directions in the joint modelling of visual and co-occurring verbal information in multimedia discourse. The workshop on Vision and Language Integration Meets Multimedia Fusion was held during the workshop weekend of the ACM Multimedia 2016 Conference and the European Conference on Computer Vision (ECCV 2016), on October 16, 2016, in Amsterdam, the Netherlands. The proceedings contain seven selected long papers, which were orally presented at the workshop, and three abstracts of the invited keynote speeches. The papers and abstracts discuss data collection, representation learning, deep learning approaches, matrix and tensor factorization methods, and graph-based clustering with regard to the fusion of multimedia data. A variety of applications is presented, including image captioning, summarization of news, video hyperlinking, sub-shot segmentation of user-generated video, cross-modal classification, cross-modal question-answering, and the detection of misleading metadata of user-generated video. The workshop was organized and supported by the EU COST Action iV&L Net, the European Network on Integrating Vision and Language: Combining Computer Vision and Language Processing for Advanced Search, Retrieval, Annotation and Description of Visual Data (IC1307, 2014-2018).
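For readers new to the terminology, the early/late fusion contrast mentioned above can be sketched in a few lines of generic Python (a schematic example of the two strategies, not tied to any workshop paper):

```python
import numpy as np

def early_fusion(visual_feats, text_feats):
    """Early fusion: concatenate modality features into a single
    vector before any classifier sees them."""
    return np.concatenate([visual_feats, text_feats])

def late_fusion(p_visual, p_text, w=0.5):
    """Late fusion: each modality is classified on its own and only
    the resulting posterior scores are combined."""
    return w * p_visual + (1.0 - w) * p_text

x = early_fusion(np.ones(512), np.ones(300))  # e.g. CNN dims + word-embedding dims
print(x.shape)                # (812,)
print(late_fusion(0.8, 0.6))  # 0.7
```

Early fusion lets a single model exploit cross-modal correlations in the features; late fusion keeps the modality pipelines independent and combines only their decisions.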
Archive | 2016
Marinos Kavouras; Margarita Kokla; Eleni Tomai; Athanasia Darra; Katerina Pastra
Spatial thinking has lately been acknowledged as an important ability both in science and in everyday life. There is a clear need to enhance spatial thinking in education and to engage both educators and learners in more critical, inquiry-based teaching and learning methods. In this context, the GEOTHNK project is a European effort to propose a scientifically grounded, technologically sustainable, and organizationally disruptive framework for the development of learning pathways that enhance spatial thinking across education sectors and learning environments.
Seeing and Perceiving | 2012
Argiro Vatakis; Panagiotis Dimitrakis; Katerina Pastra
We often use tactile input to recognize familiar objects and to acquire information about unfamiliar ones. We also use our hands to manipulate objects and utilize them as tools. However, research on object affordances has mainly focused on visual input, thus limiting the level of detail one can get about object features and uses. In addition to the limited multisensory input, data on object affordances have also been hindered by limited participant input (e.g., naming tasks). In order to address the above-mentioned limitations, we aimed at identifying a new methodology for obtaining undirected, rich information regarding people’s perception of a given object and the uses it can afford, without necessarily viewing the particular object. Specifically, 40 participants were video-recorded in a three-block experiment. During the experiment, participants were exposed to pictures of objects, pictures of someone holding the objects, and the actual objects, and they were allowed to provide unconstrained verbal responses on the description and possible uses of the stimuli presented. The stimuli were lithic tools, chosen for their novelty, man-made design, design for a specific use/action, and the absence of functional knowledge and movement associations. The experiment resulted in a large linguistic database, which was linguistically analyzed following a response-based specification. Analysis of the data revealed a significant contribution of visual and tactile input to the naming and definition of object attributes (color/condition/shape/size/texture/weight), while no significant tactile information was obtained for the object features of material, visual pattern, and volume. Overall, this new approach highlights the importance of multisensory input in the study of object affordances.
Ai Magazine | 2012
Noa Agmon; Vikas Agrawal; David W. Aha; Yiannis Aloimonos; Donagh Buckley; Prashant Doshi; Christopher W. Geib; Floriana Grasso; Nancy Green; Benjamin Johnston; Burt Kaliski; Christopher Kiekintveld; Edith Law; Henry Lieberman; Ole J. Mengshoel; Ted Metzler; Joseph Modayil; Douglas W. Oard; Nilufer Onder; Barry O'Sullivan; Katerina Pastra; Doina Precup; Chris Reed; Sanem Sariel-Talay; Ted Selker; Lokendra Shastri; Satinder P. Singh; Stephen F. Smith; Siddharth Srivastava; Gita Sukthankar
The AAAI-11 workshop program was held Sunday and Monday, August 7–8, 2011, at the Hyatt Regency San Francisco in San Francisco, California, USA. The AAAI-11 workshop program included 15 workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were Activity Context Representation: Techniques and Languages; Analyzing Microtext; Applied Adversarial Reasoning and Risk Modeling; Artificial Intelligence and Smarter Living: The Conquest of Complexity; AI for Data Center Management and Cloud Computing; Automated Action Planning for Autonomous Mobile Robots; Computational Models of Natural Argument; Generalized Planning; Human Computation; Human-Robot Interaction in Elder Care; Interactive Decision Theory and Game Theory; Language-Action Tools for Cognitive Artificial Agents: Integrating Vision, Action and Language; Lifelong Learning; Plan, Activity, and Intent Recognition; and Scalable Integration of Analytics and Visualization. This article presents short summaries of those events.
language resources and evaluation | 2010
Katerina Pastra; Christian Wallraven; Michael Schultze; Argiro Vatakis; Kathrin Kaulard; Nicoletta Calzolari; Khalid Choukri; Bente Maegaard; Joseph Mariani; Jan Odijk; Stelios Piperidis; Mike Rosner; Daniel Tapias
national conference on artificial intelligence | 2011
Katerina Pastra; Eirini Balta; Panagiotis Dimitrakis; Giorgos Karakatsiotis