Publications


Featured research published by Christoph Feichtenhofer.


Computer Vision and Pattern Recognition | 2016

Convolutional Two-Stream Network Fusion for Video Action Recognition

Christoph Feichtenhofer; Axel Pinz; Andrew Zisserman

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. We make the following findings: (i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters, (ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class prediction layer can boost accuracy, finally (iii) that pooling of abstract convolutional features over spatiotemporal neighbourhoods further boosts performance. Based on these studies we propose a new ConvNet architecture for spatiotemporal fusion of video snippets, and evaluate its performance on standard benchmarks where this architecture achieves state-of-the-art results.
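
As a rough illustration of the conv-layer fusion idea described above, the following PyTorch sketch concatenates the feature maps of a small appearance tower and a small motion tower and fuses them with a 1x1 convolution before spatial pooling and classification. The tiny towers, channel counts, and input sizes are placeholders for illustration, not the VGG-based networks evaluated in the paper.

```python
# Minimal sketch of fusing a spatial and a temporal stream at a convolution
# layer (concatenate feature maps, then a 1x1 "fusion" convolution), rather
# than at the softmax layer. Layer sizes here are illustrative placeholders.
import torch
import torch.nn as nn

def small_tower(in_channels):
    # Stand-in for one ConvNet stream (appearance or motion).
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
    )

class ConvFusionTwoStream(nn.Module):
    def __init__(self, num_classes=101):
        super().__init__()
        self.spatial = small_tower(3)        # RGB frame
        self.temporal = small_tower(10)      # stacked optical-flow channels
        self.fuse = nn.Conv2d(64 + 64, 64, kernel_size=1)   # conv fusion
        self.pool = nn.AdaptiveAvgPool2d(1)                  # pool over space
        self.fc = nn.Linear(64, num_classes)

    def forward(self, rgb, flow):
        a = self.spatial(rgb)
        m = self.temporal(flow)
        fused = self.fuse(torch.cat([a, m], dim=1))  # fuse feature maps
        x = self.pool(fused).flatten(1)
        return self.fc(x)

model = ConvFusionTwoStream()
logits = model(torch.randn(2, 3, 112, 112), torch.randn(2, 10, 112, 112))
print(logits.shape)  # torch.Size([2, 101])
```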


Computer Vision and Pattern Recognition | 2017

Spatiotemporal Multiplier Networks for Video Action Recognition

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

This paper presents a general ConvNet architecture for video action recognition based on multiplicative interactions of spacetime features. Our model combines the appearance and motion pathways of a two-stream architecture by motion gating and is trained end-to-end. We theoretically motivate multiplicative gating functions for residual networks and empirically study their effect on classification accuracy. To capture long-term dependencies we inject identity mapping kernels for learning temporal relationships. Our architecture is fully convolutional in spacetime and able to evaluate a video in a single forward pass. Empirical investigation reveals that our model produces state-of-the-art results on two standard action recognition datasets.
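
The sketch below illustrates the multiplicative gating idea in a single residual unit: motion features multiplicatively gate the appearance features entering the residual branch, while the identity shortcut is kept. The channel counts and layer choices are assumptions for illustration, not the exact architecture from the paper.

```python
# Hedged sketch of multiplicative motion gating inside a residual unit.
import torch
import torch.nn as nn

class GatedResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, appearance, motion):
        # Multiplicative interaction: motion features gate the appearance
        # features entering the residual branch.
        gated = appearance * motion
        residual = self.conv2(self.relu(self.conv1(gated)))
        return appearance + residual  # identity shortcut as in ResNets

unit = GatedResidualUnit(64)
out = unit(torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28))
print(out.shape)  # torch.Size([1, 64, 28, 28])
```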


IEEE Signal Processing Letters | 2013

A Perceptual Image Sharpness Metric Based on Local Edge Gradient Analysis

Christoph Feichtenhofer; Hannes Fassold; Peter Schallauer

In this letter, a no-reference perceptual sharpness metric based on a statistical analysis of local edge gradients is presented. The method takes properties of the human visual system into account. Based on these perceptual properties, a relationship between the extracted statistical features and the metric score is established to form a Perceptual Sharpness Index (PSI). A comparison with state-of-the-art metrics shows that the proposed method correlates highly with human perception and exhibits low computational complexity. In contrast to existing metrics, the PSI performs well over a wide range of blurriness and shows a high degree of invariance to different image contents.
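
To make the general idea concrete, the toy sketch below computes a no-reference sharpness score from local edge-gradient statistics: it finds strong edge locations and summarizes the gradient magnitudes there. This is not the published PSI formula; it only illustrates the kind of statistics such a metric relies on.

```python
# Toy sketch of a no-reference sharpness score from local edge gradients.
# NOT the published PSI; the percentile threshold and the mean statistic
# are illustrative assumptions.
import numpy as np
from scipy import ndimage

def simple_sharpness_score(gray, edge_percentile=90):
    gx = ndimage.sobel(gray.astype(np.float64), axis=1)
    gy = ndimage.sobel(gray.astype(np.float64), axis=0)
    grad_mag = np.hypot(gx, gy)
    # Keep only strong edge locations and summarize their gradient strength.
    threshold = np.percentile(grad_mag, edge_percentile)
    edge_grads = grad_mag[grad_mag >= threshold]
    return float(edge_grads.mean())  # higher -> sharper edges on average

img = np.random.rand(128, 128)
print(simple_sharpness_score(img))
```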


Computer Vision and Pattern Recognition | 2014

Bags of Spacetime Energies for Dynamic Scene Recognition

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

This paper presents a unified bag of visual words (BoW) framework for dynamic scene recognition. The approach builds on primitive features that uniformly capture the spatial and temporal orientation structure of the imagery (e.g., video), as extracted via application of a bank of spatiotemporally oriented filters. Various feature encoding techniques are investigated to abstract the primitives to an intermediate representation that is best suited to dynamic scene representation. Further, a novel approach to adaptive pooling of the encoded features is presented that captures the spatial layout of the scene while remaining robust to situations where camera motion and scene dynamics are confounded. The resulting overall approach has been evaluated on two standard, publicly available dynamic scene datasets. The results show that, in comparison to a representative set of alternatives, the proposed approach outperforms the previous state-of-the-art in classification accuracy by 10%.
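
For orientation, the sketch below walks through a generic bag-of-visual-words pipeline of the kind the paper builds on: local descriptors, a k-means codebook, histogram encoding, and a linear classifier. The random descriptors stand in for the paper's spatiotemporally oriented filter (spacetime energy) features, and the encoding is plain hard assignment rather than the encodings compared in the paper.

```python
# Generic BoW pipeline sketch; random features stand in for spacetime energies.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def video_descriptors(n_local=200, dim=32):
    # Placeholder for spacetime-energy descriptors extracted from one video.
    return rng.normal(size=(n_local, dim))

train_videos = [video_descriptors() for _ in range(20)]
labels = np.arange(20) % 3  # dummy scene labels

# 1) Learn a visual codebook from the pooled local descriptors.
codebook = KMeans(n_clusters=16, n_init=5, random_state=0)
codebook.fit(np.vstack(train_videos))

# 2) Encode each video as a normalized histogram of codeword assignments.
def encode(descriptors):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=16).astype(float)
    return hist / hist.sum()

X = np.array([encode(v) for v in train_videos])

# 3) Train a linear classifier on the BoW histograms.
clf = LinearSVC().fit(X, labels)
print(clf.score(X, labels))
```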


British Machine Vision Conference | 2013

Spacetime Forests with Complementary Features for Dynamic Scene Recognition

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

This paper presents spacetime forests defined over complementary spatial and temporal features for recognition of naturally occurring dynamic scenes. The approach improves on the previous state-of-the-art in both classification and execution rates. A particular improvement is increased robustness to camera motion, where previous approaches have experienced difficulty. There are three key novelties in the approach. First, a novel spacetime descriptor is employed that exploits the complementary nature of spatial and temporal information, as inspired by previous research on the role of orientation features in scene classification. Second, a forest-based classifier is used to learn a multi-class representation of the feature distributions. Third, the video is processed in temporal slices with scale matched preferentially to scene dynamics over camera motion. Slicing allows temporal alignment to be handled as latent information in the classifier and enables efficient, incremental processing. The integrated approach is evaluated empirically on two publicly available datasets to document its outstanding performance.
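
A minimal sketch of the classification side of this pipeline is shown below: per-slice complementary descriptors are classified by a random forest and the per-slice class probabilities are averaged into a video-level prediction. The mocked features, slice count, and voting scheme are illustrative assumptions, not the paper's spacetime descriptor or its latent-alignment handling.

```python
# Sketch: random-forest scene classification over per-slice descriptors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def slice_features(n_slices=4, dim=64):
    # One complementary (spatial + temporal) descriptor per temporal slice.
    return rng.normal(size=(n_slices, dim))

# Training: each slice inherits its video's scene label.
train = [slice_features() for _ in range(30)]
labels = np.arange(30) % 5
X = np.vstack(train)
y = np.repeat(labels, 4)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Prediction: average class probabilities over a video's temporal slices.
test_video = slice_features()
probs = forest.predict_proba(test_video).mean(axis=0)
print("predicted scene class:", probs.argmax())
```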


International Conference on RFID | 2014

Fusing RFID and computer vision for probabilistic tag localization

Michael Goller; Christoph Feichtenhofer; Axel Pinz

The combination of RFID and computer vision systems is an effective approach to mitigate the limited tag localization capabilities of current RFID deployments. In this paper, we present a hybrid RFID and computer vision system for localization and tracking of RFID tags. The proposed system combines the information from the two complementary sensor modalities in a probabilistic manner and provides a high degree of flexibility. In addition, we introduce a robust data association method which is crucial for the application in practical scenarios. To demonstrate the performance of the proposed system, we conduct a series of experiments in an article surveillance setup. This is a frequent application for RFID systems in retail where previous approaches solely based on RFID localization have difficulties due to false alarms triggered by stationary tags. Our evaluation shows that the fusion of RFID and computer vision provides robustness to false positive observations and allows for a reliable system operation.
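
The toy sketch below illustrates the probabilistic fusion idea on a one-dimensional position grid: an RFID-derived likelihood and a vision-derived likelihood are multiplied (assuming conditional independence) and normalized into a posterior over tag positions. The Gaussian measurement models are assumptions for illustration, not the models used in the paper.

```python
# Hedged sketch of probabilistic RFID + vision fusion on a 1-D position grid.
import numpy as np

grid = np.linspace(0.0, 10.0, 101)           # candidate tag positions (m)

def gaussian_likelihood(measured, sigma):
    return np.exp(-0.5 * ((grid - measured) / sigma) ** 2)

rfid_lik = gaussian_likelihood(measured=4.2, sigma=1.5)    # coarse RFID cue
vision_lik = gaussian_likelihood(measured=3.8, sigma=0.3)  # precise visual cue

posterior = rfid_lik * vision_lik             # fuse the two modalities
posterior /= posterior.sum()                  # normalize to a distribution

print("fused position estimate: %.2f m" % grid[posterior.argmax()])
```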


Computer Vision and Pattern Recognition | 2017

Temporal Residual Networks for Dynamic Scene Recognition

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

This paper combines three contributions to establish a new state-of-the-art in dynamic scene recognition. First, we present a novel ConvNet architecture based on temporal residual units that is fully convolutional in spacetime. Our model augments spatial ResNets with convolutions across time to hierarchically add temporal residuals as the depth of the network increases. Second, existing approaches to video-based recognition are categorized and a baseline of seven previously top performing algorithms is selected for comparative evaluation on dynamic scenes. Third, we introduce a new and challenging video database of dynamic scenes that more than doubles the size of those previously available. This dataset is explicitly split into two subsets of equal size that contain videos with and without camera motion to allow for systematic study of how this variable interacts with the defining dynamics of the scene per se. Our evaluations verify the particular strengths and weaknesses of the baseline algorithms with respect to various scene classes and camera motion parameters. Finally, our temporal ResNet boosts recognition performance and establishes a new state-of-the-art on dynamic scene recognition, as well as on the complementary task of action recognition.
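
The following sketch shows one way a spatial residual block can be augmented with a convolution across time, adding a temporal residual on top of the spatial one so the unit stays fully convolutional in spacetime. The kernel shapes and channel counts are placeholders, not the exact temporal residual units of the paper.

```python
# Hedged sketch of a "temporal residual" unit: a spatial residual plus a
# residual from a convolution across the time axis (kernel 3x1x1).
import torch
import torch.nn as nn

class TemporalResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU()

    def forward(self, x):                          # x: (batch, C, T, H, W)
        out = x + self.relu(self.spatial(x))       # spatial residual
        out = out + self.relu(self.temporal(out))  # temporal residual across frames
        return out

x = torch.randn(2, 32, 8, 28, 28)
print(TemporalResidualUnit(32)(x).shape)  # torch.Size([2, 32, 8, 28, 28])
```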


Computer Vision and Pattern Recognition | 2015

Dynamically encoded actions based on spacetime saliency

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

Human actions typically occur over a well localized extent in both space and time. Similarly, as typically captured in video, human actions have small spatiotemporal support in image space. This paper capitalizes on these observations by weighting feature pooling for action recognition over those areas within a video where actions are most likely to occur. To enable this operation, we define a novel measure of spacetime saliency. The measure relies on two observations regarding foreground motion of human actors: They typically exhibit motion that contrasts with that of their surrounding region and they are spatially compact. By using the resulting definition of saliency during feature pooling we show that action recognition performance achieves state-of-the-art levels on three widely considered action recognition datasets. Our saliency weighted pooling can be applied to essentially any locally defined features and encodings thereof. Additionally, we demonstrate that inclusion of locally aggregated spatiotemporal energy features, which efficiently result as a by-product of the saliency computation, further boosts performance over reliance on standard action recognition features alone.
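
The core pooling step is simple to state in code: local features are averaged with weights proportional to a saliency value at each spacetime location, as in the sketch below. Both the features and the saliency values are random placeholders here; the paper's saliency measure, built on motion contrast and spatial compactness of foreground actors, is not reproduced.

```python
# Minimal sketch of saliency-weighted feature pooling.
import numpy as np

rng = np.random.default_rng(2)

local_features = rng.normal(size=(500, 64))  # one descriptor per location
saliency = rng.random(500)                   # saliency per spacetime location

weights = saliency / saliency.sum()
pooled = (weights[:, None] * local_features).sum(axis=0)  # weighted average

print(pooled.shape)  # (64,) video-level representation for a classifier
```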


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Dynamic Scene Recognition with Complementary Spatiotemporal Features

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

This paper presents Dynamically Pooled Complementary Features (DPCF), a unified approach to dynamic scene recognition that analyzes a short video clip in terms of its spatial, temporal and color properties. The complementarity of these properties is preserved through all main steps of processing, including primitive feature extraction, coding and pooling. In the feature extraction step, spatial orientations capture static appearance, spatiotemporal oriented energies capture image dynamics and color statistics capture chromatic information. Subsequently, primitive features are encoded into a mid-level representation that has been learned for the task of dynamic scene recognition. Finally, a novel dynamic spacetime pyramid is introduced. This dynamic pooling approach can handle both global as well as local motion by adapting to the temporal structure, as guided by pooling energies. The resulting system provides online recognition of dynamic scenes that is thoroughly evaluated on the two current benchmark datasets and yields best results to date on both datasets. In-depth analysis reveals the benefits of explicitly modeling feature complementarity in combination with the dynamic spacetime pyramid, indicating that this unified approach should be well-suited to many areas of video analysis.
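
As a simplified illustration of spacetime pyramid pooling, the sketch below splits an encoded feature volume into progressively finer spacetime cells and concatenates per-cell averages. The dynamic, energy-guided adaptation of the pyramid described in the paper is omitted; the cell layout and level choices are assumptions for illustration only.

```python
# Sketch of pooling encoded features over a (static) spacetime pyramid.
import numpy as np

rng = np.random.default_rng(3)
features = rng.normal(size=(8, 16, 16, 32))  # (T, H, W, feature_dim) encoded maps

def spacetime_pyramid_pool(feat, levels=(1, 2)):
    T, H, W, D = feat.shape
    pooled = []
    for n in levels:                          # n x n x n cells at this level
        for t in range(n):
            for y in range(n):
                for x in range(n):
                    cell = feat[t*T//n:(t+1)*T//n,
                                y*H//n:(y+1)*H//n,
                                x*W//n:(x+1)*W//n]
                    pooled.append(cell.mean(axis=(0, 1, 2)))
    return np.concatenate(pooled)

print(spacetime_pyramid_pool(features).shape)  # (288,) = (1 + 8) cells * 32 dims
```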


Neural Information Processing Systems | 2016

Spatiotemporal Residual Networks for Video Action Recognition

Christoph Feichtenhofer; Axel Pinz; Richard P. Wildes

Collaboration


Dive into Christoph Feichtenhofer's collaborations.

Top Co-Authors

Axel Pinz

Graz University of Technology
