
Publications


Featured research published by Georg Layher.


Affective Computing and Intelligent Interaction | 2011

Multiple classifier systems for the classification of audio-visual emotional states

Michael Glodek; Stephan Tschechne; Georg Layher; Martin Schels; Tobias Brosch; Stefan Scherer; Markus Kächele; Miriam Schmidt; Heiko Neumann; Günther Palm; Friedhelm Schwenker

Research activities in the field of human-computer interaction have increasingly addressed the integration of some form of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures; the classification of human emotions should therefore be treated as a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems utilizing audio and visual features to classify human emotional states. To that end, a variety of features have been derived: from the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP features; in addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. Multiple classifier systems are applied as the classifier architecture, since these have proven to be accurate and robust against missing and noisy data.
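
The late-fusion step at the core of such a system can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' implementation; the per-modality posteriors and reliability weights are invented for the example, and the feature extraction (MFCC, LPC, RASTA-PLP, form/motion) is assumed to happen elsewhere.

```python
# Minimal sketch (not the authors' code): late fusion of per-modality
# classifier posteriors for a binary emotion label such as Arousal.
import numpy as np

def fuse_posteriors(posteriors, weights=None):
    """Weighted average of class posteriors from several classifiers.

    posteriors: array of shape (n_classifiers, n_classes)
    weights:    optional reliability weights, one per classifier
    """
    posteriors = np.asarray(posteriors, dtype=float)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = (weights[:, None] * posteriors).sum(axis=0)
    return fused / fused.sum()   # guard renormalization

# Hypothetical outputs of an audio and a video classifier for the
# classes (low arousal, high arousal); the audio channel is assumed
# noisier, so it receives a lower weight.
audio_post = [0.40, 0.60]
video_post = [0.20, 0.80]
fused = fuse_posteriors([audio_post, video_post], weights=[0.4, 0.6])
print("fused posterior:", fused, "-> class", int(np.argmax(fused)))
```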


Perception and Interactive Technologies | 2006

Detection of head pose and gaze direction for human-computer interaction

Ulrich Weidenbacher; Georg Layher; Pierre Bayerl; Heiko Neumann

In this contribution, we extend existing methods for head pose estimation and investigate the use of local image phase for gaze detection. Moreover, we describe how a small database of face images with ground truth for head pose and gaze direction was acquired. With this database, we compare two different computational approaches for extracting the head pose. We demonstrate that a simple implementation of the proposed methods, without extensive training sessions or calibration, is sufficient to accurately detect the head pose for human-computer interaction. Furthermore, we propose how eye gaze can be extracted based on the outcome of local filter responses and the detected head pose. Altogether, we present a framework in which different approaches are combined into a single system for extracting information about the attentional state of a person.
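
As a toy illustration of what "local image phase" means here, one can probe an image patch with a complex Gabor filter and read off phase and energy. This is a sketch with made-up filter parameters, not the paper's implementation:

```python
# Illustrative sketch: local phase and energy from a complex Gabor
# filter response, the kind of cue usable for gaze estimation.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size=21, wavelength=8.0, theta=0.0, sigma=4.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

# Synthetic "eye region" patch: a horizontal intensity edge.
patch = np.zeros((64, 64))
patch[32:, :] = 1.0

response = convolve2d(patch, gabor_kernel(theta=np.pi / 2), mode="same")
local_phase = np.angle(response)   # phase encodes edge polarity/position
local_energy = np.abs(response)    # energy marks where phase is reliable
print(local_phase[32, 32], local_energy[32, 32])
```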


Topics in Cognitive Science | 2014

Learning Representations of Animated Motion Sequences—A Neural Model

Georg Layher; Martin A. Giese; Heiko Neumann

The detection and categorization of animate motions is a crucial task underlying social interaction and perceptual decision making. Neural representations of perceived animate objects are partially located in the primate cortical region STS, which is a region that receives convergent input from intermediate-level form and motion representations. Populations of STS cells exist which are selectively responsive to specific animated motion sequences, such as walkers. It is still unclear how and to what extent form and motion information contribute to the generation of such representations and what kind of mechanisms are involved in the learning processes. The article develops a cortical model architecture for the unsupervised learning of animated motion sequence representations. We demonstrate how the model automatically selects significant motion patterns as well as meaningful static form prototypes characterized by a high degree of articulation. Such key poses are selectively reinforced during learning through a cross-talk between the motion and form processing streams. Furthermore, we show how sequence-selective representations are learned in STS by fusing static form and motion input from the segregated bottom-up driving input streams. Cells in STS, in turn, feed their activities recurrently to their input sites along top-down signal pathways. We show how such learned feedback connections enable predictions about future input in the form of anticipations generated by sequence-selective STS cells. Network simulations demonstrate the computational capacity of the proposed model by reproducing several experimental findings from neuroscience and by accounting for recent behavioral data.
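
A schematic way to picture a sequence-selective STS unit that fuses the two streams is a leaky integrator gated by both a form match and a motion match. This is a deliberately reduced sketch with invented prototypes, not the published architecture:

```python
# Toy "STS" unit: responds strongly only when the right key pose
# appears together with the right motion pattern over time.
import numpy as np

rng = np.random.default_rng(0)
form_proto = rng.normal(size=16)    # stand-in learned key-pose prototype
motion_proto = rng.normal(size=16)  # stand-in learned motion prototype
form_proto /= np.linalg.norm(form_proto)
motion_proto /= np.linalg.norm(motion_proto)

def sts_response(form_seq, motion_seq, tau=0.7):
    """Leaky temporal integration of fused form/motion evidence."""
    act = 0.0
    trace = []
    for f, m in zip(form_seq, motion_seq):
        f = f / np.linalg.norm(f)
        m = m / np.linalg.norm(m)
        drive = max(form_proto @ f, 0.0) * max(motion_proto @ m, 0.0)
        act = tau * act + (1.0 - tau) * drive
        trace.append(act)
    return np.array(trace)

# A matched sequence drives the unit; mismatched motion does not.
T = 20
matched = sts_response([form_proto] * T, [motion_proto] * T)
mismatch = sts_response([form_proto] * T,
                        [rng.normal(size=16) for _ in range(T)])
print(matched[-1], mismatch[-1])
```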


Frontiers in Computational Neuroscience | 2015

Embodied learning of a generative neural model for biological motion perception and inference

Fabian Schrodt; Georg Layher; Heiko Neumann; Martin V. Butz

Although an action observation network and mirror neurons for understanding the actions and intentions of others have been under deep, interdisciplinary consideration in recent years, it remains largely unknown how the brain manages to map the visually perceived biological motion of others onto its own motor system. This paper shows how such a mapping may be established, even if the biological motion is visually perceived from a new vantage point. We introduce a learning artificial neural network model and evaluate it on full-body motion tracking recordings. The model implements an embodied, predictive inference approach. It first learns to correlate and segment multimodal sensory streams of its own bodily motion. In doing so, it becomes able to anticipate motion progression, to complete missing modal information, and to self-generate learned motion sequences. When the biological motion of another person is observed, this self-knowledge is utilized to recognize similar motion patterns and predict their progress. Due to the relative encodings, the model shows strong robustness in recognition despite observing rather large varieties of body morphology and posture dynamics. By additionally equipping the model with the capability to rotate its visual frame of reference, it is able to deduce the visual perspective onto the observed person, establishing full consistency with the embodied self-motion encodings by means of active inference. In further support of its neuro-cognitive plausibility, we also model typical bistable perceptions when crucial depth information is missing. In sum, the introduced neural model proposes a solution to the problem of how the human brain may establish correspondence between observed bodily motion and its own motor system, thus offering a mechanism that supports the development of mirror neurons.
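
The perspective-deduction component can be caricatured in a few lines: find the frame-of-reference rotation that minimizes the prediction error between an observed posture and the model's own posture encoding. The landmarks are random stand-ins, and the exhaustive angle search below replaces the paper's active-inference mechanism:

```python
# Toy sketch: recover the viewpoint rotation of an observed
# point-light posture by minimizing prediction error.
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(1)
own_posture = rng.normal(size=(12, 3))            # 12 body landmarks
observed = own_posture @ rot_z(np.deg2rad(40)).T  # seen from another view

angles = np.deg2rad(np.arange(0, 360, 1))
errors = [np.linalg.norm(observed @ rot_z(-a).T - own_posture)
          for a in angles]
best = angles[int(np.argmin(errors))]
print("inferred viewpoint rotation:", np.rad2deg(best), "deg")
```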


International Conference on Artificial Neural Networks | 2012

Recognizing human activities using a layered Markov architecture

Michael Glodek; Georg Layher; Friedhelm Schwenker; Günther Palm

In the field of human-computer interaction (HCI), the detection and classification of human activity patterns has become an important challenge. It can be understood as a specific pattern recognition problem that addresses three topics: the fusion of multiple modalities, spatio-temporal structures, and the growing variety of pattern appearances the more abstract a pattern gets. To approach the problem, we propose a layered architecture which decomposes temporal patterns into elementary sub-patterns. Within each layer, the patterns are detected using Markov models. The results of a layer are passed on to the next layer, such that with each layer the temporal granularity and the complexity of the patterns increase. A dataset containing activities in an office scenario was recorded. The activities are decomposed into basic actions, which are detected on the first layer. We evaluated a two-layered architecture on the dataset, showing the feasibility of the approach.
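
The layering idea can be made concrete with a toy two-layer example. Everything below (the sub-action alphabet, the transition matrices, the clip) is invented for illustration, and the simple chain scoring stands in for the paper's Markov models on both layers:

```python
# Layer 1 emits discrete sub-action labels; layer 2 scores the label
# sequence under per-activity Markov transition models.
import numpy as np

SUBACTIONS = ["sit", "type", "stand", "walk"]

# Hypothetical layer-2 transition matrices, one per activity
# (rows: current sub-action, columns: next sub-action).
ACTIVITIES = {
    "office_work": np.array([[0.70, 0.25, 0.05, 0.00],
                             [0.30, 0.65, 0.05, 0.00],
                             [0.20, 0.10, 0.40, 0.30],
                             [0.10, 0.00, 0.30, 0.60]]),
    "leaving":     np.array([[0.40, 0.10, 0.50, 0.00],
                             [0.20, 0.30, 0.50, 0.00],
                             [0.00, 0.00, 0.30, 0.70],
                             [0.00, 0.00, 0.10, 0.90]]),
}

def log_likelihood(labels, trans):
    idx = [SUBACTIONS.index(l) for l in labels]
    return sum(np.log(trans[i, j]) for i, j in zip(idx, idx[1:]))

# Layer-1 output for a clip (in reality produced by per-frame models).
seq = ["sit", "type", "type", "stand", "walk", "walk"]
scores = {a: log_likelihood(seq, T) for a, T in ACTIVITIES.items()}
print(max(scores, key=scores.get), scores)
```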


Computer Vision and Pattern Recognition | 2017

Fully Convolutional Region Proposal Networks for Multispectral Person Detection

Daniel König; Michael Adam; Christian Jarvers; Georg Layher; Heiko Neumann; Michael Teutsch

Multispectral images that combine visual-optical (VIS) and infrared (IR) image information are a promising source of data for automatic person detection. Especially in automotive or surveillance applications, challenging conditions such as insufficient illumination or large distances between camera and object occur regularly and can affect image quality. This leads to weak image contrast or low object resolution. In order to detect persons under such conditions, we apply deep learning to effectively fuse the VIS and IR information in multispectral images. We present a novel multispectral Region Proposal Network (RPN) that is built upon the pre-trained very deep convolutional network VGG-16. The proposals of this network are further evaluated using a Boosted Decision Trees classifier in order to reduce potential false positive detections. With a log-average miss rate of 29.83% on the reasonable test set of the KAIST Multispectral Pedestrian Detection Benchmark, we improve the current state of the art by about 18%.
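
The fusion pattern (two convolutional streams concatenated channel-wise before a shared proposal head) can be sketched with a deliberately tiny network. The paper builds on VGG-16 with a Boosted Decision Trees re-scoring stage; the layer sizes and anchor count below are assumptions for illustration only:

```python
# Toy two-stream fusion RPN: VIS and IR streams are fused by channel
# concatenation, then a shared head predicts objectness and boxes.
import torch
import torch.nn as nn

class TinyFusionRPN(nn.Module):
    def __init__(self, n_anchors=9):
        super().__init__()
        stem = lambda: nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.vis_stream = stem()
        self.ir_stream = stem()
        self.fuse = nn.Conv2d(64, 64, 1)               # mix fused channels
        self.objectness = nn.Conv2d(64, n_anchors, 1)  # score per anchor
        self.bbox = nn.Conv2d(64, 4 * n_anchors, 1)    # box deltas per anchor

    def forward(self, vis, ir):
        f = torch.cat([self.vis_stream(vis), self.ir_stream(ir)], dim=1)
        f = torch.relu(self.fuse(f))
        return self.objectness(f), self.bbox(f)

vis = torch.randn(1, 3, 128, 160)  # visual-optical image
ir = torch.randn(1, 3, 128, 160)   # IR image replicated to 3 channels
scores, boxes = TinyFusionRPN()(vis, ir)
print(scores.shape, boxes.shape)
```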


International Conference on Artificial Neural Networks | 2012

Learning representations for animated motion sequence and implied motion recognition

Georg Layher; Martin A. Giese; Heiko Neumann

The detection and categorization of animate motions is a crucial task underlying social interaction and decision-making. Neural representations of perceived animate objects are built up in cortical area STS, which is a region of convergent input from intermediate-level form and motion representations. Populations of STS cells exist which are selectively responsive to specific action sequences, such as walkers. It is still unclear how and to what extent form and motion information contribute to the generation of such representations and what kind of mechanisms are utilized for the learning processes. The paper develops a cortical model architecture for the unsupervised learning of animated motion sequence representations. We demonstrate how the model automatically selects significant motion patterns as well as meaningful static snapshot categories from continuous video input. Such key poses correspond to articulated postures and are used to probe the trained network in order to evoke implied motion perception from static views. We also show how sequence-selective representations are learned in STS by fusing snapshot and motion input, and how learned feedback connections enable predictions about future input. Network simulations demonstrate the computational capacity of the proposed model.
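
The unsupervised selection of static snapshot categories can be sketched with plain competitive learning. This is a stand-in for the model's learning rule, with synthetic "pose" vectors replacing real video input:

```python
# Online competitive learning of snapshot prototypes: the winning
# prototype moves toward each input, so prototypes drift to cluster
# centers that can serve as key-pose categories.
import numpy as np

rng = np.random.default_rng(2)

def learn_prototypes(frames, n_protos=4, lr=0.1, epochs=5):
    protos = frames[rng.choice(len(frames), n_protos, replace=False)].copy()
    for _ in range(epochs):
        for x in frames:
            winner = np.argmin(np.linalg.norm(protos - x, axis=1))
            protos[winner] += lr * (x - protos[winner])  # move winner toward input
    return protos

# Synthetic pose vectors clustered around four articulated postures.
centers = rng.normal(size=(4, 10))
frames = np.concatenate([c + 0.05 * rng.normal(size=(50, 10))
                         for c in centers])
rng.shuffle(frames)
protos = learn_prototypes(frames)
print(protos.shape)
```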


International Conference on Image Analysis and Processing | 2011

Robust stereoscopic head pose estimation in human-computer interaction and a unified evaluation framework

Georg Layher; Hendrik Liebau; Robert Niese; Ayoub Al-Hamadi; Bernd Michaelis; Heiko Neumann

The automatic processing and estimation of view direction and head pose in interactive scenarios is an actively investigated research topic in the development of advanced human-computer and human-robot interfaces. Still, current state-of-the-art approaches often make rigid assumptions concerning scene illumination and viewing distance in order to achieve stable results. In addition, there is a lack of rigorous evaluation criteria for comparing different computational vision approaches and judging their flexibility. In this work, we take a step towards employing robust computational vision mechanisms to estimate the actor's head pose, and thus the direction of their focus of attention. We propose a domain-specific mechanism based on learning to estimate stereo correspondences of image pairs. Furthermore, in order to facilitate the evaluation of computational vision results, we present a data generation framework capable of image synthesis under controlled pose conditions using an arbitrary camera setup with any number of cameras. We show computational results of the proposed mechanism as well as an evaluation based on the available reference data.
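
Although the paper learns stereo correspondences, the underlying quantity, disparity along the epipolar line, can be sketched with classical block matching. The image pair and patch parameters below are synthetic, and the brute-force SSD search is a simplification of the learned mechanism:

```python
# Brute-force stereo correspondence: for a patch in the left image,
# find the horizontal shift in the right image with minimal
# sum-of-squared-differences cost.
import numpy as np

def disparity_ssd(left, right, row, col, patch=5, max_disp=20):
    h = patch // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1]
    best, best_d = np.inf, 0
    for d in range(max_disp):
        c = col - d
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1]
        cost = np.sum((ref - cand) ** 2)
        if cost < best:
            best, best_d = cost, d
    return best_d

rng = np.random.default_rng(3)
left = rng.random((60, 80))
right = np.roll(left, -7, axis=1)   # synthetic pair with disparity 7
print(disparity_ssd(left, right, row=30, col=40))
```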


Frontiers in Neurorobotics | 2017

Real-Time Biologically Inspired Action Recognition from Key Poses Using a Neuromorphic Architecture

Georg Layher; Tobias Brosch; Heiko Neumann

Intelligent agents, such as robots, have to serve a multitude of autonomous functions. Examples include collision avoidance, navigation and route planning, active sensing of the environment, and interaction and non-verbal communication with people in the extended reach space. Here, we focus on recognizing the actions of a human agent based on a biologically inspired visual architecture for analyzing articulated movements. The proposed processing architecture builds upon coarsely segregated streams of sensory processing along different pathways which separately process form and motion information (Layher et al., 2014). Action recognition is performed in an event-based scheme by identifying representations of characteristic pose configurations (key poses) in an image sequence. In line with perceptual studies, key poses are selected in an unsupervised manner using a feature-driven criterion which combines extrema in the motion energy with the horizontal and vertical extendedness of a body shape. Per-class representations of key pose frames are learned using a deep convolutional neural network consisting of 15 convolutional layers. The network is trained using the energy-efficient deep neuromorphic networks (Eedn) framework (Esser et al., 2016), which realizes the mapping of the trained synaptic weights onto the IBM Neurosynaptic System platform (Merolla et al., 2014). After the mapping, the trained network achieves real-time capabilities, processing input streams and classifying input images at about 1,000 frames per second while the computational stages consume only about 70 mW of energy (without spike transduction). Particularly for mobile robotic systems, such a low energy profile might be crucial in a variety of application scenarios. Cross-validation results are reported for two different datasets and compared to state-of-the-art action recognition approaches. The results demonstrate that (I) the presented approach is on par with other key pose based methods described in the literature, which select key pose frames by optimizing classification accuracy, (II) compared to training on the full set of frames, representations trained on key pose frames result in a higher confidence in class assignments, and (III) key pose representations show promising generalization capabilities in a cross-dataset evaluation.
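
The feature-driven key-pose criterion (motion-energy extrema combined with silhouette extendedness) can be sketched as a per-frame score. This is my reading of the abstract, simplified to a top-k selection over synthetic silhouettes; the score formula and the toy walker are assumptions:

```python
# Score frames by "extended posture with low motion energy" and
# keep the top-k candidates as key poses.
import numpy as np

def silhouette_extent(mask):
    """Horizontal plus vertical extent of a binary body silhouette."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0
    return float((xs.max() - xs.min()) + (ys.max() - ys.min()))

def key_pose_indices(frames, k=4):
    """frames: (T, H, W) binary masks; returns k candidate frame ids."""
    motion = np.abs(np.diff(frames.astype(float), axis=0)).sum(axis=(1, 2))
    extent = np.array([silhouette_extent(f) for f in frames[1:]])
    score = extent / (motion + 1.0)   # extended posture, low motion energy
    return np.sort(np.argsort(score)[-k:] + 1)

# Synthetic "walker": a torso whose stride width oscillates over time,
# so the most articulated postures coincide with motion-energy minima.
frames = np.zeros((40, 32, 32), dtype=bool)
for t in range(40):
    w = 6 + 5 * np.abs(np.sin(t / 5.0))
    frames[t, 8:24, int(16 - w):int(16 + w)] = True
print(key_pose_indices(frames))
```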


International Conference on Development and Learning | 2014

Modeling perspective-taking upon observation of 3D biological motion

Fabian Schrodt; Georg Layher; Heiko Neumann; Martin V. Butz

It appears that the mirror neuron system plays a crucial role in learning by imitation. However, it remains unclear how mirror neuron properties develop in the first place. A likely prerequisite for developing mirror neurons may be the capability to transform observed motion into a sufficiently self-centered frame of reference. We propose an artificial neural network (NN) model that implements such a transformation capability by a highly embodied approach: the model first learns to correlate and predict self-induced motion patterns by associating egocentric visual and proprioceptive perceptions. Once these predictions are sufficiently accurate, a robust and invariant recognition of observed biological motion becomes possible by allowing a self-supervised, error-driven adaptation of the visual frame of reference. The NN is a modified, dynamic, adaptive resonance model, which features self-supervised learning and adjustment, neural field normalization, and information-driven neural noise adaptation. The developed architecture is evaluated with a simulated 3D humanoid walker with 12 body landmarks and 10 angular DOF. The model essentially shows how an internal frame-of-reference adaptation for deriving the perspective of another person can be acquired by first learning about one's own bodily motion dynamics and then exploiting this self-knowledge upon observing the relative biological motion patterns of others. The insights gained by the model may have significant implications for the development of social capabilities and respective impairments.
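
Complementing the exhaustive search shown for the 2015 paper above, the error-driven flavor of the adaptation can be caricatured as gradient descent on a single rotation parameter. The landmarks, learning rate, and finite-difference gradient are all simplifications of the paper's self-supervised mechanism:

```python
# Adapt a view-rotation parameter by descending the prediction error
# between observed and self-generated landmark positions.
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(4)
own = rng.normal(size=(12, 3))               # 12 landmarks of own body model
obs = own @ rot_z(np.deg2rad(55)).T          # other person, rotated view

angle, lr, eps = 0.0, 1e-3, 1e-4
for _ in range(2000):
    def err(a):
        return np.sum((obs @ rot_z(-a).T - own) ** 2)
    grad = (err(angle + eps) - err(angle - eps)) / (2 * eps)
    angle -= lr * grad                        # error-driven adaptation step
print("adapted rotation:", np.rad2deg(angle), "deg")
```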

Collaboration


Dive into Georg Layher's collaborations.

Top Co-Authors

Ayoub Al-Hamadi

Otto-von-Guericke University Magdeburg
