Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where George Papandreou is active.

Publication


Featured research published by George Papandreou.


International Conference on Computer Vision | 2015

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

George Papandreou; Liang-Chieh Chen; Kevin P. Murphy; Alan L. Yuille

Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently significantly pushed the state of the art in semantic image segmentation. We study the more challenging problem of learning DCNNs for semantic image segmentation from either (1) weakly annotated training data such as bounding boxes or image-level labels or (2) a combination of few strongly labeled and many weakly labeled images, sourced from one or multiple datasets. We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark, while requiring significantly less annotation effort. We share source code implementing the proposed system at https://bitbucket.org/deeplab/deeplab-public.
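
The EM alternation described above can be sketched in a few lines. This is a minimal illustration under image-level weak supervision, not the paper's implementation; `net.predict_probs` and `net.train_step` are hypothetical stand-ins for the DCNN's forward pass and gradient update:

```python
import numpy as np

def em_train(net, images, image_tags, rounds=5):
    """EM sketch for image-level weak supervision: alternate between
    estimating latent pixel labels (E-step) and refitting the net (M-step).
    image_tags[i] is a boolean (C,) vector of classes present in image i."""
    for _ in range(rounds):
        latent_labels = []
        for img, tags in zip(images, image_tags):
            probs = net.predict_probs(img)       # (H, W, C) class posteriors
            probs[:, :, ~tags] = 0.0             # E-step: suppress classes
            latent_labels.append(probs.argmax(axis=-1))  # absent from the tags
        for img, labels in zip(images, latent_labels):
            net.train_step(img, labels)          # M-step: supervised update
    return net
```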


IEEE Transactions on Image Processing | 2007

Multigrid Geometric Active Contour Models

George Papandreou; Petros Maragos

Geometric active contour models are very popular partial differential equation-based tools in image analysis and computer vision. We present a new multigrid algorithm for the fast evolution of level-set-based geometric active contours and compare it with other established numerical schemes. We overcome the main bottleneck associated with most numerical implementations of geometric active contours, namely the need for very small time steps to avoid instability, by employing a very stable fully 2-D implicit-explicit time integration numerical scheme. The proposed scheme is more accurate and has improved rotational invariance properties compared with alternative split schemes, particularly when large time steps are used. We then apply properly designed multigrid methods to efficiently solve the resulting sparse linear system. The combined algorithm allows for the rapid evolution of the contour and convergence to its final configuration after very few iterations. Image segmentation experiments demonstrate the efficiency and accuracy of the method.
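
To convey why the implicit treatment matters: folding the stiff diffusion-like term into a linear system keeps the step stable even for large dt. The sketch below is a generic implicit-explicit diffusion step, not the paper's scheme, and uses a direct sparse solve (`scipy.sparse.linalg.spsolve`) as a stand-in for its multigrid solver; `nu` and the square grid are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def lap1d(n):
    """1-D second-difference Laplacian with reflecting boundaries."""
    main = -2.0 * np.ones(n)
    main[0] = main[-1] = -1.0
    off = np.ones(n - 1)
    return sp.diags([off, main, off], [-1, 0, 1], format="csr")

def imex_step(phi, dt, nu=1.0):
    """One implicit-explicit update of phi_t = nu * laplace(phi) + F:
    the diffusion term is handled implicitly, so large dt stays stable.
    An explicit speed term F(phi) would simply be added to `rhs`."""
    n = phi.shape[0]                                  # square n x n grid
    L = sp.kron(sp.identity(n), lap1d(n)) + sp.kron(lap1d(n), sp.identity(n))
    A = sp.identity(n * n) - dt * nu * L
    rhs = phi.ravel()
    return spsolve(A.tocsr(), rhs).reshape(n, n)
```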


International Conference on Computer Vision | 2015

Im2Calories: Towards an Automated Mobile Vision Food Diary

Austin Myers; Nick Johnston; Vivek Rathod; Anoop Korattikara; Alexander N. Gorban; Nathan Silberman; Sergio Guadarrama; George Papandreou; Jonathan Huang; Kevin P. Murphy

We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories. The simplest version assumes that the user is eating at a restaurant for which we know the menu. In this case, we can collect images offline to train a multi-label classifier. At run time, we apply the classifier (running on your phone) to predict which foods are present in your meal, and we lookup the corresponding nutritional facts. We apply this method to a new dataset of images from 23 different restaurants, using a CNN-based classifier, significantly outperforming previous work. The more challenging setting works outside of restaurants. In this case, we need to estimate the size of the foods, as well as their labels. This requires solving segmentation and depth / volume estimation from a single image. We present CNN-based approaches to these problems, with promising preliminary results.
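
In the restaurant setting, the pipeline reduces to multi-label classification followed by a table lookup. A toy sketch, with made-up classifier scores and menu calorie values:

```python
# Hypothetical per-restaurant menu with nutritional facts (kcal per item).
MENU_CALORIES = {"burger": 550, "fries": 380, "side salad": 120}

def estimate_calories(food_probs, threshold=0.5):
    """Threshold the multi-label classifier scores, then look up and sum
    the calories of every food predicted to be present in the meal."""
    present = [food for food, p in food_probs.items() if p >= threshold]
    return present, sum(MENU_CALORIES[f] for f in present)

foods, kcal = estimate_calories({"burger": 0.93, "fries": 0.81, "side salad": 0.22})
print(foods, kcal)  # ['burger', 'fries'] 930
```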


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

George Papandreou; Athanassios Katsamanis; Vassilis Pitsikalis; Petros Maragos

While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models.
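
For Gaussian models the core compensation rule is compact: a feature observed with measurement-noise covariance is scored against a class Gaussian whose covariance is inflated by that noise covariance, which automatically down-weights unreliable streams. A minimal sketch (the numbers are purely illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def compensated_loglik(x, mu, class_cov, noise_cov):
    """Uncertainty-compensated Gaussian log-likelihood: a noisy feature
    x ~ N(clean, noise_cov) scored against class N(mu, class_cov) is
    equivalent to scoring x under N(mu, class_cov + noise_cov)."""
    return multivariate_normal.logpdf(x, mean=mu, cov=class_cov + noise_cov)

x = np.array([1.0, 2.0])
mu = np.zeros(2)
cov = np.eye(2)
print(compensated_loglik(x, mu, cov, 0.01 * np.eye(2)))  # low-noise stream
print(compensated_loglik(x, mu, cov, 10.0 * np.eye(2)))  # high-noise stream
```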


Computer Vision and Pattern Recognition | 2016

Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform

Liang-Chieh Chen; Jonathan T. Barron; George Papandreou; Kevin P. Murphy; Alan L. Yuille

Deep convolutional neural networks (CNNs) are the backbone of state-of-the-art semantic image segmentation systems. Recent work has shown that complementing CNNs with fully-connected conditional random fields (CRFs) can significantly enhance their object localization accuracy, yet dense CRF inference is computationally expensive. We propose replacing the fully-connected CRF with domain transform (DT), a modern edge-preserving filtering method in which the amount of smoothing is controlled by a reference edge map. Domain transform filtering is several times faster than dense CRF inference and we show that it yields comparable semantic segmentation results, accurately capturing object boundaries. Importantly, our formulation allows learning the reference edge map from intermediate CNN features instead of using the image gradient magnitude as in standard DT filtering. This produces task-specific edges in an end-to-end trainable system optimizing the target semantic segmentation quality.
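
A one-dimensional pass of the recursive domain transform filter conveys the mechanism: the feedback weight collapses wherever the reference edge map fires, so smoothing never crosses strong edges. In the paper the edge map is predicted from intermediate CNN features; in this sketch it is just an input array, and the sigma values are illustrative defaults:

```python
import numpy as np

def domain_transform_1d(x, edge, sigma_s=40.0, sigma_r=0.2):
    """One left-to-right pass of the recursive domain transform filter.
    edge[i] is the reference edge strength between samples i-1 and i;
    a 2-D filter alternates horizontal and vertical passes like this one."""
    a = np.exp(-np.sqrt(2.0) / sigma_s)
    out = x.astype(float).copy()
    for i in range(1, len(x)):
        d = 1.0 + (sigma_s / sigma_r) * edge[i]   # transformed-domain distance
        w = a ** d                                # per-sample feedback weight
        out[i] = (1.0 - w) * x[i] + w * out[i - 1]
    return out
```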


International Conference on Computer Vision | 2011

Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models

George Papandreou; Alan L. Yuille

We propose a novel way to induce a random field from an energy function on discrete labels. It amounts to locally injecting noise to the energy potentials, followed by finding the global minimum of the perturbed energy function. The resulting Perturb-and-MAP random fields harness the power of modern discrete energy minimization algorithms, effectively transforming them into efficient random sampling algorithms, thus extending their scope beyond the usual deterministic setting. In this fashion we can enjoy the benefits of a sound probabilistic framework, such as the ability to represent the solution uncertainty or learn model parameters from training data, while completely bypassing the costly Markov chain Monte Carlo procedures typically associated with discrete label Gibbs Markov random fields (MRFs). We study some interesting theoretical properties of the proposed model in juxtaposition to those of Gibbs MRFs and address the issue of principled design of the perturbation process. We present experimental results in image segmentation and scene labeling that illustrate the new qualitative aspects and the potential of the proposed model for practical computer vision applications.
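
Operationally, drawing one Perturb-and-MAP sample is two steps: perturb the potentials with i.i.d. Gumbel noise, then hand the perturbed energy to any MAP solver. In this sketch `map_solver` is a placeholder for a discrete energy minimizer such as graph cuts, and the perturbation is applied only to the unary potentials:

```python
import numpy as np

def perturb_and_map_sample(unary, map_solver, rng=None):
    """One Perturb-and-MAP sample: locally inject Gumbel noise into the
    unary potentials, then return the global minimizer of the perturbed
    energy (computed by the supplied discrete optimizer)."""
    rng = rng or np.random.default_rng()
    noise = rng.gumbel(loc=0.0, scale=1.0, size=unary.shape)
    return map_solver(unary - noise)   # minimize the perturbed energy
```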


Computer Vision and Pattern Recognition | 2008

Adaptive and constrained algorithms for inverse compositional Active Appearance Model fitting

George Papandreou; Petros Maragos

Parametric models of shape and texture such as active appearance models (AAMs) are diverse tools for deformable object appearance modeling and have found important applications in both image synthesis and analysis problems. Among the numerous algorithms that have been proposed for AAM fitting, those based on the inverse-compositional image alignment technique have recently received considerable attention due to their potential for high efficiency. However, existing fitting algorithms perform poorly when used in conjunction with models exhibiting significant appearance variation, such as AAMs trained on multiple-subject human face images. We introduce two enhancements to inverse-compositional AAM matching algorithms in order to overcome this limitation. First, we propose fitting algorithm adaptation, by means of (a) fitting matrix adjustment and (b) AAM mean template update. Second, we show how prior information can be incorporated to constrain the AAM fitting process. The inverse-compositional nature of the algorithm allows efficient implementation of these enhancements. Both techniques substantially improve AAM fitting performance, as demonstrated with experiments on publicly available multi-face datasets.
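
The efficiency of the inverse-compositional scheme comes from linearizing at the fixed template, so the steepest-descent Jacobian and Hessian can be precomputed and each iteration costs essentially one linear solve. A bare-bones sketch of one Gauss-Newton step, with `J` assumed precomputed and the warp composition approximated to first order:

```python
import numpy as np

def ic_update(warped_image, template, J, p):
    """One inverse-compositional Gauss-Newton step. J is the precomputed
    (n_pixels x n_params) steepest-descent Jacobian at the template, so
    H = J^T J could also be factored once in a full implementation."""
    e = (warped_image - template).ravel()     # error image I(W(x;p)) - T(x)
    dp = np.linalg.solve(J.T @ J, J.T @ e)    # incremental warp parameters
    return p - dp  # first-order stand-in for composing with the inverse warp
```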


IEEE Transactions on Image Processing | 2009

Bayesian Inference on Multiscale Models for Poisson Intensity Estimation: Applications to Photon-Limited Image Denoising

Stamatios Lefkimmiatis; Petros Maragos; George Papandreou

We present an improved statistical model for analyzing Poisson processes, with applications to photon-limited imaging. We build on previous work, adopting a multiscale representation of the Poisson process in which the ratios of the underlying Poisson intensities (rates) in adjacent scales are modeled as mixtures of conjugate parametric distributions. Our main contributions include: 1) a rigorous and robust regularized expectation-maximization (EM) algorithm for maximum-likelihood estimation of the rate-ratio density parameters directly from the noisy observed Poisson data (counts); 2) extension of the method to work under a multiscale hidden Markov tree model (HMT) which couples the mixture label assignments in consecutive scales, thus modeling interscale coefficient dependencies in the vicinity of image edges; 3) exploration of a 2-D recursive quad-tree image representation, involving Dirichlet-mixture rate-ratio densities, instead of the conventional separable binary-tree image representation involving beta-mixture rate-ratio densities; and 4) a novel multiscale image representation, which we term Poisson-Haar decomposition, that better models the image edge structure, thus yielding improved performance. Experimental results on standard images with artificially simulated Poisson noise and on real photon-limited images demonstrate the effectiveness of the proposed techniques.
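
The binary-tree construction is easy to sketch: each parent count is split between its children, the empirical split ratio is shrunk toward a Beta prior, and the smoothed leaves are reconstructed top-down. The single Beta component below is a simplified stand-in for the paper's mixture priors and HMT coupling:

```python
import numpy as np

def haar_poisson_denoise_1d(counts, alpha=1.0, beta=1.0):
    """Multiscale shrinkage sketch: recursively split the total count,
    replacing each empirical left/parent rate ratio with its
    Beta(alpha, beta) posterior mean, then rescale the children."""
    n = len(counts)
    if n == 1:
        return counts.astype(float)
    left, right = counts[: n // 2], counts[n // 2 :]
    c_left, c = left.sum(), counts.sum()
    rho = (c_left + alpha) / (c + alpha + beta)    # shrunk rate ratio
    out_l = haar_poisson_denoise_1d(left, alpha, beta)
    out_r = haar_poisson_denoise_1d(right, alpha, beta)
    # Rescale children so their totals match the smoothed split of c.
    out_l *= (rho * c) / max(out_l.sum(), 1e-12)
    out_r *= ((1 - rho) * c) / max(out_r.sum(), 1e-12)
    return np.concatenate([out_l, out_r])
```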


Computer Vision and Pattern Recognition | 2017

Towards Accurate Multi-person Pose Estimation in the Wild

George Papandreou; Tyler Zhu; Nori Kanazawa; Alexander Toshev; Jonathan Tompson; Chris Bregler; Kevin P. Murphy

We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task. It is a simple, yet powerful, top-down approach consisting of two stages. In the first stage, we predict the location and scale of boxes which are likely to contain people; for this we use the Faster R-CNN detector. In the second stage, we estimate the keypoints of the person potentially contained in each proposed bounding box. For each keypoint type we predict dense heatmaps and offsets using a fully convolutional ResNet. To combine these outputs we introduce a novel aggregation procedure to obtain highly localized keypoint predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and a novel form of keypoint-based confidence score estimation, instead of box-level scoring. Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods. Further, by using additional in-house labeled data we obtain an even higher average precision of 0.685 on the test-dev set and 0.673 on the test-standard set, more than 5% absolute improvement over the previous best performing method on the same dataset.
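
The heatmap-plus-offset aggregation amounts to Hough-style voting: every pixel casts a vote at the location its offset vector points to, weighted by its heatmap probability. A small numpy sketch for a single keypoint type (a simplified version of the paper's aggregation, with nearest-pixel rounding instead of bilinear interpolation):

```python
import numpy as np

def aggregate_keypoint_votes(heatmap, offsets):
    """Accumulate offset-directed votes weighted by heatmap probability,
    then localize the keypoint at the argmax of the vote map.
    heatmap: (H, W); offsets: (H, W, 2) as (dy, dx)."""
    H, W = heatmap.shape
    votes = np.zeros_like(heatmap)
    ys, xs = np.mgrid[0:H, 0:W]
    ty = np.clip(np.round(ys + offsets[..., 0]).astype(int), 0, H - 1)
    tx = np.clip(np.round(xs + offsets[..., 1]).astype(int), 0, W - 1)
    np.add.at(votes, (ty, tx), heatmap)   # accumulate weighted votes
    y, x = np.unravel_index(votes.argmax(), votes.shape)
    return y, x
```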


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Face Active Appearance Modeling and Speech Acoustic Information to Recover Articulation

Athanassios Katsamanis; George Papandreou; Petros Maragos

We are interested in recovering aspects of the vocal tract's geometry and dynamics from speech, a problem referred to as speech inversion. Traditional audio-only speech inversion techniques are inherently ill-posed, since the same speech acoustics can be produced by multiple articulatory configurations. To alleviate the ill-posedness of the audio-only inversion process, we propose an inversion scheme which also exploits visual information from the speaker's face. The complex audiovisual-to-articulatory mapping is approximated by an adaptive piecewise linear model. Model switching is governed by a Markovian discrete process which captures articulatory dynamic information. Each constituent linear mapping is effectively estimated via canonical correlation analysis. In the described multimodal context, we investigate alternative fusion schemes which allow interaction between the audio and visual modalities at various synchronization levels. For facial analysis, we employ active appearance models (AAMs) and demonstrate fully automatic face tracking and visual feature extraction. Using the AAM features in conjunction with audio features such as Mel-frequency cepstral coefficients (MFCCs) or line spectral frequencies (LSFs) leads to effective estimation of the trajectories followed by certain points of interest in the speech production system. We report experiments on the QSMT and MOCHA databases, which contain audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream hidden Markov model based scheme clearly improves performance relative to either audio-only or visual-only estimation.
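
Each constituent linear mapping of the switching model is estimated via canonical correlation analysis; scikit-learn's `CCA` gives a compact stand-in for one such mapping. The feature dimensions and random data below are purely illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def fit_av_to_articulatory(av_features, articulatory, n_components=4):
    """Fit one linear audiovisual-to-articulatory mapping via CCA:
    project both views into a shared correlated subspace, then regress."""
    cca = CCA(n_components=n_components)
    cca.fit(av_features, articulatory)
    return cca

# Usage sketch with random stand-in data (frames x feature dims).
rng = np.random.default_rng(0)
av = rng.normal(size=(200, 30))    # e.g. MFCC + AAM features per frame
art = rng.normal(size=(200, 6))    # e.g. articulator point trajectories
model = fit_av_to_articulatory(av, art)
pred = model.predict(av)           # estimated articulatory trajectories
```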

Collaboration


Dive into George Papandreou's collaborations.

Top Co-Authors

Petros Maragos

National Technical University of Athens

Alan L. Yuille

Johns Hopkins University

Athanassios Katsamanis

National Technical University of Athens

Vassilis Pitsikalis

National Technical University of Athens

Stavros Tsogkas

École des ponts ParisTech
