Network


Latest external collaborations at the country level.

Hotspot


Research topics where Katerina Fragkiadaki is active.

Publication


Featured research published by Katerina Fragkiadaki.


Computer Vision and Pattern Recognition | 2016

Human Pose Estimation with Iterative Error Feedback

Joao Carreira; Pulkit Agrawal; Katerina Fragkiadaki; Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing. Feedforward architectures can learn rich representations of the input space but do not explicitly model dependencies in the output spaces, which are quite structured for tasks such as articulated human pose estimation or object segmentation. Here we propose a framework that expands the expressive power of hierarchical feature extractors to encompass both input and output spaces, by introducing top-down feedback. Instead of directly predicting the outputs in one go, we use a self-correcting model that progressively changes an initial solution by feeding back error predictions, in a process we call Iterative Error Feedback (IEF). IEF shows excellent performance on the task of articulated pose estimation on the challenging MPII and LSP benchmarks, matching the state of the art without requiring ground-truth scale annotation.
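A minimal Python sketch of the IEF refinement loop described above; the `error_predictor` callable, the bounded-step clipping, and the step count are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def iterative_error_feedback(image_feats, y_init, error_predictor, n_steps=4):
    """Progressively refine a pose estimate by feeding back predicted errors.

    image_feats     : fixed input representation of the image
    y_init          : initial pose estimate, shape (n_joints, 2)
    error_predictor : hypothetical callable(image_feats, y) -> correction, same shape as y
    """
    y = y_init.copy()
    for _ in range(n_steps):
        eps = error_predictor(image_feats, y)  # predicted correction toward the target
        y = y + eps                            # apply the feedback step
    return y

# Toy usage: a dummy predictor that nudges every joint toward the image center.
center = np.array([112.0, 112.0])
dummy_predictor = lambda feats, y: np.clip(center - y, -20, 20)  # bounded correction
y0 = np.random.uniform(0, 224, size=(14, 2))                     # random initial pose
refined = iterative_error_feedback(None, y0, dummy_predictor)
```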


Computer Vision and Pattern Recognition | 2012

Video segmentation by tracing discontinuities in a trajectory embedding

Katerina Fragkiadaki; Geng Zhang; Jianbo Shi

Our goal is to segment a video sequence into moving objects and the world scene. In recent work, spectral embedding of point trajectories, based on 2D motion cues accumulated over their lifespans, has been shown to outperform factorization and per-frame segmentation methods for video segmentation. The scale and kinematic nature of the moving objects and the background scene determine how close or far apart trajectories are placed in the spectral embedding. Such density variations may confuse clustering algorithms, causing over-fragmentation of object interiors. Therefore, instead of clustering in the spectral embedding, we propose detecting discontinuities of embedding density between spatially neighboring trajectories. Detected discontinuities are strong indicators of object boundaries and thus valuable for video segmentation. We propose a novel embedding discretization process that recovers from over-fragmentation by merging clusters according to discontinuity evidence along inter-cluster boundaries. For segmenting articulated objects, we combine motion grouping cues with a center-surround saliency operation, resulting in “context-aware”, spatially coherent saliency maps. Figure-ground segmentation obtained from saliency thresholding provides object connectedness constraints that alter motion-based trajectory affinities, keeping articulated parts together and separating objects that are disconnected in time. Finally, we introduce Gabriel graphs as effective per-frame superpixel maps for converting trajectory clustering to dense image segmentation. Gabriel edges bridge large contour gaps via geometric reasoning without over-segmenting coherent image regions. We present experimental results of our method that outperform the state of the art on challenging motion segmentation datasets.
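As a rough illustration of the idea, the sketch below embeds trajectories spectrally from a precomputed affinity matrix and then flags spatially neighboring trajectory pairs whose embedding distance is large; the affinity matrix, neighbor pairs, and threshold are assumed inputs, not the paper's full pipeline.

```python
import numpy as np
from scipy.linalg import eigh

def embed_trajectories(W, n_dims=8):
    """Spectral embedding from a symmetric trajectory affinity matrix W (N x N)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = eigh(L_sym)           # eigenvectors of the normalized Laplacian
    return vecs[:, 1:n_dims + 1]    # skip the trivial first eigenvector

def embedding_discontinuities(embedding, neighbor_pairs, thresh):
    """Flag spatial neighbors whose embedding distance exceeds a threshold,
    i.e. candidate object boundaries between trajectories i and j."""
    return [(i, j) for i, j in neighbor_pairs
            if np.linalg.norm(embedding[i] - embedding[j]) > thresh]
```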


International Conference on Computer Vision | 2015

Recurrent Network Models for Human Dynamics

Katerina Fragkiadaki; Sergey Levine; Panna Felsen; Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures on the tasks of motion capture (mocap) generation, body pose labeling, and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drift over long periods of time. For human pose labeling, ERD outperforms a per-frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400 ms and outperforms a first-order motion model based on optical flow. ERDs extend previous Long Short-Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show that such representation learning is crucial for both labeling and prediction in space-time. We find this to be a distinguishing feature of the spatio-temporal visual domain in comparison to 1D text, speech, or handwriting, where straightforward hard-coded representations have shown excellent results when combined directly with recurrent units [31].
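A compact PyTorch sketch of the ERD structure (nonlinear encoder and decoder around recurrent layers); layer sizes, depths, and the 54-D pose vector are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ERD(nn.Module):
    """Encoder-Recurrent-Decoder: feedforward encoder/decoder around an LSTM."""

    def __init__(self, in_dim, hidden_dim=256, out_dim=None):
        super().__init__()
        out_dim = out_dim or in_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.recurrent = nn.LSTM(hidden_dim, hidden_dim, num_layers=2, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, out_dim))

    def forward(self, x, state=None):
        # x: (batch, time, in_dim), e.g. a sequence of mocap pose vectors
        h = self.encoder(x)
        h, state = self.recurrent(h, state)
        return self.decoder(h), state

model = ERD(in_dim=54)             # 54-D pose vector is an assumed dimensionality
seq = torch.randn(1, 100, 54)
pred, state = model(seq)           # condition on observed frames
next_frame = pred[:, -1:]          # for forecasting, feed predictions back autoregressively
```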


Computer Vision and Pattern Recognition | 2015

Learning to segment moving objects in videos

Katerina Fragkiadaki; Pablo Andrés Arbeláez; Panna Felsen; Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to “moving objectness”: how likely they are to contain a moving object. In each video frame, we compute segment proposals using multiple figure-ground segmentations on per-frame motion boundaries. We rank them with a Moving Objectness Detector trained on image and motion fields to detect moving objects and discard over- or under-segmentations and background parts of the scene. We extend the top-ranked segments into spatio-temporal tubes using random walkers on motion affinities of dense point trajectories. Our final tube ranking consistently outperforms previous segmentation methods on the two largest video segmentation benchmarks currently available, for any number of proposals. Further, our per-frame moving object proposals increase the detection rate by up to 7% over previous state-of-the-art static proposal methods.
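The ranking step reduces to scoring and sorting proposals, as in this sketch; `moving_objectness_scorer` stands in for the trained Moving Objectness Detector and is a hypothetical callable.

```python
import numpy as np

def rank_proposals(proposals, moving_objectness_scorer, top_k=10):
    """Rank per-frame segment proposals by 'moving objectness', keep the best.

    proposals                : list of binary masks, each (H, W)
    moving_objectness_scorer : hypothetical callable(mask) -> float score
    """
    scores = np.array([moving_objectness_scorer(m) for m in proposals])
    order = np.argsort(-scores)                      # descending score
    keep = order[:top_k]
    return [proposals[i] for i in keep], scores[keep]
```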


Computer Vision and Pattern Recognition | 2011

Detection free tracking: Exploiting motion and topology for segmenting and tracking under entanglement

Katerina Fragkiadaki; Jianbo Shi

We propose a detection-free system for segmenting multiple interacting and deforming people in a video. People detectors often fail under close agent interaction, limiting the performance of detection-based tracking methods. Motion information alone often fails to separate similarly moving agents or to group distinctly moving articulated body parts. We formulate video segmentation as graph partitioning in the trajectory domain. We classify trajectories as foreground or background based on trajectory saliencies, and use foreground trajectories as graph nodes. We incorporate object connectedness constraints into our trajectory weight matrix based on the topology of the foreground: we set repulsive weights between trajectories that belong to different connected components in any frame of their temporal intersection, and attractive weights between similarly moving trajectories. Information from foreground topology complements motion information, and our spatio-temporal segments can be interpreted as connected moving entities rather than merely trajectory groups of similar motion. All our cues are computed on trajectories and naturally encode large temporal context, which is crucial for resolving ambiguities that are local in time. We present results of our approach on challenging datasets, outperforming the state of the art by a large margin.
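A sketch of how the attractive and repulsive trajectory weights could be assembled, under assumed input encodings (pairwise motion distances and per-frame connected-component labels); partitioning a matrix with negative entries requires a solver that handles repulsion, which is beyond this sketch.

```python
import numpy as np

def trajectory_weights(motion_dist, component_labels, sigma=1.0, repulsion=-1.0):
    """Trajectory weight matrix with attraction and topology-based repulsion.

    motion_dist      : (N, N) pairwise motion dissimilarities between trajectories
    component_labels : (T, N) connected-component id per frame, -1 where absent
    """
    W = np.exp(-motion_dist**2 / (2 * sigma**2))  # attractive: similar motion
    T, N = component_labels.shape
    for i in range(N):
        for j in range(i + 1, N):
            both = (component_labels[:, i] >= 0) & (component_labels[:, j] >= 0)
            # repulsive: different connected components in any co-visible frame
            if np.any(component_labels[both, i] != component_labels[both, j]):
                W[i, j] = W[j, i] = repulsion
    return W
```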


European Conference on Computer Vision | 2012

Two-granularity tracking: mediating trajectory and detection graphs for tracking under occlusions

Katerina Fragkiadaki; Weiyu Zhang; Geng Zhang; Jianbo Shi

We propose a tracking framework that mediates grouping cues from two levels of tracking granularity, detection tracklets and point trajectories, for segmenting objects in crowded scenes. Detection tracklets capture objects when they are mostly visible. They may be sparse in time, may miss partially occluded or deformed objects, or may contain false positives. Point trajectories are dense in space and time. Their affinities integrate long-range motion and 3D disparity information, useful for segmentation, but may leak across similarly moving objects, since they lack model knowledge. We establish one trajectory graph and one detection tracklet graph, encoding grouping affinities within each space and associations across them. Two-granularity tracking is cast as simultaneous detection tracklet classification and clustering (cl2) in the joint space of tracklets and trajectories. We solve cl2 by explicitly mediating contradictory affinities in the two graphs: detection tracklet classification modifies trajectory affinities to reflect object-specific dis-associations, while non-accidental grouping alignment between detection tracklets and trajectory clusters boosts or rejects the corresponding detection tracklets, changing their classification accordingly. We show our model can track objects through sparse, inaccurate detections and persistent partial occlusions. It adapts to the changing visibility masks of the targets, in contrast to detection-based bounding box trackers, by effectively switching between the two granularities according to object occlusions, deformations, and background clutter.
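The alternation at the heart of cl2 can be summarized by the skeleton below; every callable is a hypothetical stand-in for a component of the paper, so this shows only the control flow, not the actual optimization.

```python
def cl2(tracklets, traj_affinity, classify_tracklets, modulate_affinity,
        cluster_trajectories, alignment_score, n_iters=5):
    """Skeleton of simultaneous tracklet classification and trajectory clustering.

    All callables are hypothetical stand-ins for the paper's components.
    """
    labels = classify_tracklets(tracklets)          # initial true/false-positive labels
    clusters = None
    for _ in range(n_iters):
        # classification modifies trajectory affinities (object-specific dis-associations)
        A = modulate_affinity(traj_affinity, tracklets, labels)
        clusters = cluster_trajectories(A)          # group trajectories under the cuts
        # alignment between clusters and tracklets boosts or rejects tracklets
        labels = [alignment_score(t, clusters) > 0.5 for t in tracklets]
    return clusters, labels
```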


Computer Vision and Pattern Recognition | 2013

Pose from Flow and Flow from Pose

Katerina Fragkiadaki; Han Hu; Jianbo Shi

Human pose detectors, although successful in localising faces and torsos, often fail on lower arms, and with fast movements body motion estimation is often inaccurate. We build a segmentation-detection algorithm that mediates information between body part recognition and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucial for extracting hard-to-detect body parts out of the cluttered body interior. By matching these segments to exemplars we obtain pose-labeled body segments. The pose-labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people in rare poses frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation, and motion in videos.
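The kinematically constrained affine displacements can be illustrated by a least-squares affine fit to the flow inside a part segment, as in the sketch below; the encoding of points and flow vectors is an assumption.

```python
import numpy as np

def fit_affine_to_flow(points, flow_vectors):
    """Least-squares affine displacement model for one body-part segment.

    points       : (N, 2) pixel coordinates inside the part segment
    flow_vectors : (N, 2) observed optical-flow displacements at those pixels
    Returns A (2x2) and t (2,) with flow ~= points @ A.T + t.
    """
    X = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, flow_vectors, rcond=None)
    A, t = params[:2].T, params[2]
    return A, t

def affine_flow(points, A, t):
    """Smooth flow proposal for the part from the fitted affine model."""
    return points @ A.T + t
```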


Pattern Recognition Letters | 2016

The three R's of computer vision

Jitendra Malik; Pablo Andrés Arbeláez; Joao Carreira; Katerina Fragkiadaki; Ross B. Girshick; Georgia Gkioxari; Saurabh Gupta; Bharath Hariharan; Abhishek Kar; Shubham Tulsiani

We argue for the importance of bidirectional interactions between recognition, reconstruction, and re-organization, and propose that as a unifying framework for computer vision. In this view, recognition of objects is reciprocally linked to re-organization, with bottom-up grouping processes generating candidates, which can be classified using top-down knowledge, following which the segmentations can be refined again. Recognition of 3D objects could benefit from a reconstruction of 3D structure, and 3D reconstruction can benefit from object category-specific priors. We also show that reconstruction of 3D structure from video data goes hand in hand with the re-organization of the scene. We demonstrate pipelined versions of two systems, one for RGB-D images and another for RGB images, which produce rich 3D scene interpretations in this framework.


European Conference on Computer Vision | 2010

Figure-ground image segmentation helps weakly-supervised learning of objects

Katerina Fragkiadaki; Jianbo Shi

Given a collection of images containing a common object, we seek to learn a model for the object without the use of bounding boxes or segmentation masks. In linguistics, a single document provides no information about the locations of the topics it contains. On the contrary, an image has a lot to tell us about where foreground and background topics lie. Extensive literature on modelling bottom-up saliency and pop-out aims at predicting eye fixations and the allocation of visual attention in a single image, prior to any recognition of content. The most salient image parts are likely to capture the image foreground. We propose a novel probabilistic model, the shape and figure-ground aware model (sFG model), that exploits bottom-up image saliency to compute an informative prior on segment topic assignments. Our model exploits both figure-ground organization in each image separately and feature re-occurrence across the image collection. Since we use an image-dependent topic prior, during model learning we optimize a conditional likelihood of the image collection given the bottom-up saliency information. Our discriminative framework can tolerate larger intra-class variability of objects with less training data. We iterate between bottom-up figure-ground image organization and model parameter learning by accumulating image statistics from the entire image collection; the learned model in turn influences later figure-ground labelling. We present results of our approach on diverse datasets, showing great improvement over generative probabilistic models that do not exploit image saliency and indicating the suitability of our model for weakly-supervised visual organization.
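One way the saliency-driven topic prior could look, as a sketch: average bottom-up saliency inside each segment gives a per-segment prior over {foreground, background}. The two-topic simplification and the averaging rule are assumptions for illustration.

```python
import numpy as np

def segment_topic_prior(saliency_map, segments, eps=1e-6):
    """Image-dependent prior over {foreground, background} per segment.

    saliency_map : (H, W) bottom-up saliency values in [0, 1]
    segments     : list of boolean masks (H, W), one per segment
    Returns an (n_segments, 2) array of [p_fg, p_bg] priors.
    """
    prior = []
    for mask in segments:
        s = float(saliency_map[mask].mean()) if mask.any() else 0.5
        prior.append([s + eps, 1.0 - s + eps])
    prior = np.asarray(prior)
    return prior / prior.sum(axis=1, keepdims=True)  # normalize per segment
```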


Medical Image Computing and Computer Assisted Intervention | 2012

Structural-Flow Trajectories for Unravelling 3D Tubular Bundles

Katerina Fragkiadaki; Weiyu Zhang; Jianbo Shi; Elena Bernardis

We cast segmentation of 3D tubular structures in a bundle as partitioning of structural-flow trajectories. Traditional 3D segmentation algorithms aggregate local pixel correlations incrementally along a 3D stack. In contrast, structural-flow trajectories establish long-range pixel correspondences, and their affinities propagate grouping cues across the entire volume simultaneously, from informative to non-informative places. Segmentation by trajectory clustering recovers from persistent ambiguities caused by faint boundaries or low contrast, common in medical images. Trajectories are computed by linking successive registration fields, each one registering a pair of consecutive slices of the 3D stack. We show our method effectively unravels densely packed tubular structures, without any supervision or 3D shape priors, outperforming previous 2D and 3D segmentation algorithms.
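Linking successive registration fields into trajectories amounts to advecting seed points through a chain of displacement fields, roughly as sketched below; the field layout and bilinear sampling are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def link_registration_fields(seed_points, displacement_fields):
    """Chain slice-to-slice registration fields into structural-flow trajectories.

    seed_points         : (N, 2) (row, col) positions in the first slice
    displacement_fields : list of (2, H, W) fields; fields[k] maps slice k to k+1
    Returns trajectories of shape (n_slices, N, 2).
    """
    pts = seed_points.astype(float)
    traj = [pts.copy()]
    for field in displacement_fields:
        # bilinearly sample the field at the current (sub-pixel) positions
        dr = map_coordinates(field[0], pts.T, order=1, mode="nearest")
        dc = map_coordinates(field[1], pts.T, order=1, mode="nearest")
        pts = pts + np.stack([dr, dc], axis=1)
        traj.append(pts.copy())
    return np.stack(traj)
```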

Collaboration


Katerina Fragkiadaki's top co-authors:

Jitendra Malik (University of California)
Jianbo Shi (University of Pennsylvania)
Panna Felsen (University of California)
Sergey Levine (University of California)
Joao Carreira (University of California)
Pulkit Agrawal (University of California)