Publication


Featured research published by Dimitris N. Metaxas.


IEEE Transactions on Image Processing | 2011

A Level Set Method for Image Segmentation in the Presence of Intensity Inhomogeneities With Application to MRI

Chunming Li; Rui Huang; Zhaohua Ding; J.C. Gatenby; Dimitris N. Metaxas; John C. Gore

Intensity inhomogeneity often occurs in real-world images and presents a considerable challenge in image segmentation. The most widely used image segmentation algorithms are region-based and typically rely on the homogeneity of the image intensities in the regions of interest; they therefore often fail to provide accurate segmentation results in the presence of intensity inhomogeneity. This paper proposes a novel region-based method for image segmentation, which is able to deal with intensity inhomogeneities in the segmentation. First, based on the model of images with intensity inhomogeneities, we derive a local intensity clustering property of the image intensities, and define a local clustering criterion function for the image intensities in a neighborhood of each point. This local clustering criterion function is then integrated with respect to the neighborhood center to give a global criterion of image segmentation. In a level set formulation, this criterion defines an energy in terms of the level set functions that represent a partition of the image domain and a bias field that accounts for the intensity inhomogeneity of the image. Therefore, by minimizing this energy, our method is able to simultaneously segment the image and estimate the bias field, and the estimated bias field can be used for intensity inhomogeneity correction (or bias correction). Our method has been validated on synthetic images and real images of various modalities, with desirable performance in the presence of intensity inhomogeneities. Experiments show that our method is more robust to initialization, faster, and more accurate than the well-known piecewise smooth model. As an application, our method has been used for segmentation and bias correction of magnetic resonance (MR) images with promising results.
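
For concreteness, here is a minimal two-phase sketch (not the authors' code) of the alternating minimization the abstract describes: region constants and a smooth bias field are re-estimated in closed form, while a hard pixel membership stands in for the full level set evolution. scipy's Gaussian filter plays the role of the kernel K, and all parameter values are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def segment_with_bias(I, sigma=4.0, iters=50):
    """Jointly estimate a two-region segmentation M and a bias field b."""
    I = I.astype(float)
    b = np.ones_like(I)                       # bias field estimate
    M = (I > I.mean()).astype(float)          # membership of region 1
    for _ in range(iters):
        u = [M, 1.0 - M]
        Kb, Kb2 = gaussian_filter(b, sigma), gaussian_filter(b * b, sigma)
        # region constants: weighted least squares under the kernel
        c = [(Kb * I * ui).sum() / ((Kb2 * ui).sum() + 1e-9) for ui in u]
        # bias field: closed-form update from the same quadratic energy
        J1 = c[0] * u[0] + c[1] * u[1]
        J2 = c[0] ** 2 * u[0] + c[1] ** 2 * u[1]
        b = gaussian_filter(I * J1, sigma) / (gaussian_filter(J2, sigma) + 1e-9)
        # local clustering energies; reassign each pixel to the cheaper region
        Kb, Kb2 = gaussian_filter(b, sigma), gaussian_filter(b * b, sigma)
        e = [I * I - 2 * ci * I * Kb + ci ** 2 * Kb2 for ci in c]
        M = (e[0] < e[1]).astype(float)
    return M, b                               # mask and estimated bias field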


Computer Vision and Pattern Recognition | 2005

Discriminative density propagation for 3D human motion estimation

Cristian Sminchisescu; Atul Kanaujia; Zhiguo Li; Dimitris N. Metaxas

We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences, based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative; therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking, thus avoiding the use of the generative 3D model for inference (the learned discriminative predictors can also be used, complementarily, as importance samplers to improve mixing or to initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent the complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion capture-based test sequences and compare against nearest-neighbor and regression methods.
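
The multimodal conditional at the heart of this approach can be sketched as a Bayesian mixture of experts: softmax gates select among linear experts, and the resulting modes are reweighted over time. The sketch below is a schematic stand-in, assuming given gate and expert parameters (G, W, sigmas are hypothetical) and a crude Gaussian random-walk dynamics prior, not the paper's learned models.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_predict(r, G, W, sigmas):
    """Mixture weights, means, and spreads of p(x | r) for descriptor r."""
    weights = softmax(G @ r)                  # gate probabilities per expert
    means = np.stack([Wk @ r for Wk in W])    # one linear expert per mode
    return weights, means, sigmas

def propagate(prev_w, prev_means, weights, means, dyn_sigma=0.1):
    """One propagation step: reweight the new conditional modes by their
    agreement with the previous posterior under a random-walk prior."""
    new_w = weights.copy()
    for k, mk in enumerate(means):
        lik = sum(pw * np.exp(-np.sum((mk - pm) ** 2) / (2 * dyn_sigma ** 2))
                  for pw, pm in zip(prev_w, prev_means))
        new_w[k] *= lik
    return new_w / new_w.sum(), means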


Medical Image Analysis | 2012

Towards robust and effective shape prior modeling: sparse shape composition

Dimitris N. Metaxas; Shaoting Zhang

Organ shape plays an important role in various clinical practices, e.g., diagnosis, surgical planning and treatment evaluation. It is usually derived from low-level appearance cues in medical images. However, due to diseases and imaging artifacts, low-level appearance cues might be weak or misleading. In this situation, shape priors become critical to infer and refine the shape derived from image appearances. Effective modeling of shape priors is challenging because: (1) shape variation is complex and cannot always be modeled by a parametric probability distribution; (2) a shape instance derived from image appearance cues (input shape) may have gross errors; and (3) local details of the input shape are difficult to preserve if they are not statistically significant in the training data. In this paper we propose a novel Sparse Shape Composition model (SSC) to deal with these three challenges in a unified framework. In our method, a sparse set of shapes in the shape repository is selected and composed together to infer/refine an input shape. The a priori information is thus implicitly incorporated on-the-fly. Our model leverages two sparsity observations of the input shape instance: (1) the input shape can be approximately represented by a sparse linear combination of shapes in the shape repository; (2) parts of the input shape may contain gross errors, but such errors are sparse. Our model is formulated as a sparse learning problem. Using an L1-norm relaxation, it can be solved with an efficient expectation-maximization (EM)-type framework. Our method is extensively validated on two medical applications, 2D lung localization in X-ray images and 3D liver segmentation in low-dose CT scans. Compared to state-of-the-art methods, our model exhibits better performance in both studies.
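
The core sparse optimization can be sketched as follows, assuming shapes have already been aligned and stacked as columns of a repository matrix D; the paper solves the problem with an EM-type framework, whereas this illustration uses a plain ISTA loop with soft thresholding.

import numpy as np

def soft(v, t):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_shape_composition(D, y, lam_a=0.1, lam_e=0.05, iters=500):
    """Approximate input shape y by D @ a + e with sparse a (repository
    coefficients) and sparse e (gross errors)."""
    n, m = D.shape
    a, e = np.zeros(m), np.zeros(n)
    L = np.linalg.norm(D, 2) ** 2 + 1.0       # Lipschitz bound for [D, I]
    for _ in range(iters):
        r = D @ a + e - y                     # current residual
        a = soft(a - (D.T @ r) / L, lam_a / L)
        e = soft(e - r / L, lam_e / L)
    return D @ a + e, a, e                    # refined shape and its parts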


Computer Vision and Image Understanding | 2006

Conditional models for contextual human motion recognition

Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

We present algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative (joint) structures like the hidden Markov model (HMM). They therefore have to make simplifying, often unrealistic assumptions on the conditional independence of observations given the motion class labels, and cannot accommodate overlapping features or long-term contextual dependencies in the observation sequence. In contrast, conditional models like CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and their parameters can be trained using convex optimization. We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments showing that these typically outperform HMMs, not only in classifying diverse human activities like walking, jumping, running, picking, or dancing, but also in discriminating among subtle motion styles like normal walk and wander walk.
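
As a concrete illustration of the exact dynamic-programming inference that such chain models admit, here is a minimal Viterbi decoder over motion-class labels; the unary and transition scores below are random placeholders, not learned CRF potentials.

import numpy as np

def viterbi(unary, trans):
    """unary: (T, K) per-frame label scores; trans: (K, K) transition scores.
    Returns the highest-scoring label sequence."""
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + unary[t][None, :]
        back[t] = cand.argmax(axis=0)         # best previous label per label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):             # backtrace
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Example: 5 frames, 3 activity labels, random scores.
rng = np.random.default_rng(0)
labels = viterbi(rng.normal(size=(5, 3)), rng.normal(size=(3, 3)))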


Eurographics | 2004

High Resolution Acquisition, Learning and Transfer of Dynamic 3‐D Facial Expressions

Yang Wang; Xiaolei Huang; Chan-Su Lee; Song Zhang; Zhiguo Li; Dimitris Samaras; Dimitris N. Metaxas; Ahmed M. Elgammal; Peisen Huang

Synthesis and re‐targeting of facial expressions is central to facial animation and often involves significant manual work in order to achieve realistic expressions, due to the difficulty of capturing high quality dynamic expression data. In this paper we address fundamental issues regarding the use of high quality dense 3‐D data samples undergoing motions at video speeds, e.g. human facial expressions. In order to utilize such data for motion analysis and re‐targeting, correspondences must be established between data in different frames of the same faces as well as between different faces. We present a data-driven approach that consists of four parts: 1) high speed, high accuracy capture of moving faces without the use of markers, 2) very precise tracking of facial motion using a multi‐resolution deformable mesh, 3) a unified low dimensional mapping of dynamic facial motion that can separate expression style, and 4) synthesis of novel expressions as a combination of expression styles. The accuracy and resolution of our method allow us to capture and track subtle expression details. The low dimensional representation of motion data in a unified embedding for all the subjects in the database allows for learning the most discriminating characteristics of each individual's expressions as that person's “expression style”. Thus new expressions can be synthesized, either as dynamic morphing between individuals, or as expression transfer from a source face to a target face, as demonstrated in a series of experiments.
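
As a loose, purely illustrative analogue of the style/content separation (the paper learns a unified nonlinear embedding, not the linear one used here), one can embed all subjects' mesh frames in a shared PCA space and re-target motion by combining one subject's expression deltas with another's identity offset. All data shapes below are hypothetical.

import numpy as np

def fit_embedding(frames, dim=10):
    """frames: (N, 3V) stacked mesh vertex coordinates across all subjects.
    Returns the shared mean and a PCA basis for the embedding space."""
    mu = frames.mean(axis=0)
    U, S, Vt = np.linalg.svd(frames - mu, full_matrices=False)
    return mu, Vt[:dim]

def transfer(src_seq, src_neutral, tgt_neutral, mu, B):
    """Re-target: apply the source's expression deltas to the target's
    identity offset, all expressed in the shared embedding space."""
    delta = (src_seq - src_neutral) @ B.T     # source expression motion
    return (tgt_neutral - mu) @ B.T + delta   # target style + source content

# Decoding embedding coordinates back to meshes: coords @ B + mu.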


Computer Vision and Pattern Recognition | 2003

Using multiple cues for hand tracking and model refinement

Shan Lu; Dimitris N. Metaxas; Dimitris Samaras; John Oliensis

We present a model-based approach to the integration of multiple cues for tracking high-degree-of-freedom articulated motions and for model refinement, and apply it to the problem of hand tracking from a single camera sequence. Hand tracking is particularly challenging because of occlusions, shading variations, and the high dimensionality of the motion. The novelty of our approach is in the combination of multiple sources of information, which come from edges, optical flow, and shading, in order to refine the model during tracking. We first use a previously formulated generalized version of the gradient-based optical flow constraint that includes shading flow, i.e., the variation of the shading of the object as it rotates with respect to the light source. Using this model, we track the hand's complex articulated motion in the presence of shading changes. We use a forward recursive dynamic model to track the motion in response to data-derived 3D forces applied to the model. However, due to inaccurate initial shape, the generalized optical flow constraint is violated. We use the error in the generalized optical flow equation to compute generalized forces that correct the model shape at each step. The effectiveness of our approach is demonstrated with experiments on a number of different hand motions with shading changes, rotations, and occlusions of significant parts of the hand.
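
The generalized constraint can be illustrated per image window as a small least-squares problem: standard brightness constancy augmented with an unknown shading-variation term. This is a schematic sketch under simple finite-difference gradients, not the authors' force-based formulation; real use would need pyramids and robust weighting.

import numpy as np

def generalized_flow(win0, win1):
    """Solve Ix*u + Iy*v + It = s for motion (u, v) and a scalar shading
    change s over a window, instead of assuming pure brightness constancy."""
    win0, win1 = win0.astype(float), win1.astype(float)
    Iy, Ix = np.gradient(win0)                # spatial gradients (rows, cols)
    It = win1 - win0                          # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel(), -np.ones(win0.size)], axis=1)
    sol, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    u, v, s = sol
    return u, v, s                            # flow plus shading variation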


International Conference on Computer Vision | 2013

Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model

Xiang Yu; Junzhou Huang; Shaoting Zhang; Wang Yan; Dimitris N. Metaxas

This paper addresses the problem of facial landmark localization and tracking from a single camera. We present a two-stage cascaded deformable shape model to effectively and efficiently localize facial landmarks under large head pose variations. For face detection, we propose a group sparse learning method to automatically select the most salient facial landmarks. By introducing a 3D face shape model, we use Procrustes analysis to achieve pose-free facial landmark initialization. For deformation, the first step uses mean-shift local search with a constrained local model to rapidly approach the global optimum. The second step uses component-wise active contours to discriminatively refine the subtle shape variation. Our framework can simultaneously handle face detection, pose-free landmark localization, and tracking in real time. Extensive experiments are conducted on both face databases collected in laboratory environments and face-in-the-wild databases. All results demonstrate that our approach has certain advantages over state-of-the-art methods in handling pose variations.
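
The pose-free initialization step rests on Procrustes analysis, which admits a compact closed-form solution. A minimal sketch (with the 3D-model projection step omitted):

import numpy as np

def procrustes_align(mean_shape, landmarks):
    """Find the similarity transform (scale, rotation, translation) that
    best maps a mean shape onto detected landmarks; both are (N, 2)."""
    mu_s, mu_l = mean_shape.mean(0), landmarks.mean(0)
    S, L = mean_shape - mu_s, landmarks - mu_l
    U, sig, Vt = np.linalg.svd(S.T @ L)
    R = (U @ Vt).T                            # optimal rotation
    if np.linalg.det(R) < 0:                  # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
        sig = sig.copy()
        sig[-1] *= -1
    scale = sig.sum() / (S ** 2).sum()
    return scale * S @ R.T + mu_l             # aligned mean shape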


Computer Vision and Pattern Recognition | 2009

Video object segmentation by hypergraph cut

Yuchi Huang; Qingshan Liu; Dimitris N. Metaxas

In this paper, we present a new framework for video object segmentation, in which we formulate the task of extracting prominent objects from a scene as a hypergraph cut problem. We initially over-segment each frame in the sequence and take the over-segmented image patches as the vertices in the graph. Departing from the traditional pairwise graph structure, we build a hypergraph to represent the complex spatio-temporal neighborhood relationships among the patches. We assign each patch several attributes computed from the optical flow and the appearance-based motion profile, and vertices with the same attribute value are connected by a hyperedge. Through the hyperedges, the complex non-pairwise relationships between the patches are described, and the merits of the different attributes are integrated organically. The task of video object segmentation is then equivalent to hypergraph partitioning, which can be solved by the hypergraph cut algorithm. The effectiveness of the proposed method is demonstrated by extensive experiments on natural scenes.
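
A common way to realize such a hypergraph cut is spectral partitioning with the normalized hypergraph Laplacian of Zhou et al.; the sketch below assumes a dense 0/1 vertex-hyperedge incidence matrix and positive hyperedge weights, and is an illustration rather than the paper's exact pipeline.

import numpy as np

def hypergraph_cut(H, w, k=2):
    """H: (V, E) 0/1 incidence matrix over patches and hyperedges;
    w: (E,) hyperedge weights. Assumes every vertex/hyperedge is non-empty."""
    dv = H @ w                                # vertex degrees
    de = H.sum(axis=0)                        # hyperedge degrees
    Dv = np.diag(1.0 / np.sqrt(dv))
    theta = Dv @ H @ np.diag(w / de) @ H.T @ Dv
    L = np.eye(H.shape[0]) - theta            # normalized hypergraph Laplacian
    vals, vecs = np.linalg.eigh(L)
    emb = vecs[:, 1:k]                        # skip the trivial eigenvector
    # 2-way split by sign; use k-means on emb for a general k-way cut
    return (emb[:, 0] > 0).astype(int) if k == 2 else emb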


Computer Vision and Pattern Recognition | 2007

Boosting Coded Dynamic Features for Facial Action Units and Facial Expression Recognition

Peng Yang; Qingshan Liu; Dimitris N. Metaxas

How to extract dynamic features is a well-known key issue for video-based face analysis. In this paper, we present a novel approach to facial action unit (AU) and expression recognition based on coded dynamic features. In order to capture the dynamic characteristics of facial events, we design dynamic Haar-like features to represent the temporal variations of facial events. Inspired by binary pattern coding, we further encode the dynamic Haar-like features into binary pattern features, which are useful for constructing weak classifiers for boosting. Finally, AdaBoost is applied to learn a set of discriminating coded dynamic features for facial action unit and expression recognition. Experiments on the CMU expression database and our own facial AU database show encouraging performance.
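
The boosting stage can be sketched with discrete AdaBoost over one-feature decision stumps, a natural weak learner for binary pattern codes; the feature extraction itself (dynamic Haar-like responses and their binary coding) is omitted, and X is assumed to already hold 0/1 codes.

import numpy as np

def adaboost_stumps(X, y, rounds=50):
    """X: (N, F) binary features; y: (N,) labels in {-1, +1}."""
    n, f = X.shape
    w = np.full(n, 1.0 / n)                   # example weights
    ensemble = []
    for _ in range(rounds):
        # pick the feature/polarity whose stump has lowest weighted error
        errs = np.array([[w[(s * (2 * X[:, j] - 1)) != y].sum()
                          for s in (-1, 1)] for j in range(f)])
        j, si = np.unravel_index(errs.argmin(), errs.shape)
        s, err = (-1, 1)[si], errs[j, si]
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-9))
        pred = s * (2 * X[:, j] - 1)
        w *= np.exp(-alpha * y * pred)        # upweight mistakes
        w /= w.sum()
        ensemble.append((j, s, alpha))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * (2 * X[:, j] - 1) for j, s, a in ensemble)
    return np.sign(score)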


Computer Vision and Pattern Recognition | 2006

Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference

Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

We present an algorithm for jointly learning a consistent bidirectional generative-recognition model that combines top-down and bottom-up processing for monocular 3D human motion reconstruction. Learning progresses in alternating stages of self-training that optimize the probability of the image evidence: the recognition model is tuned using samples from the generative model, and the generative model is optimized to produce inferences close to the ones predicted by the current recognition model. At equilibrium, the two models are consistent. During online inference, we scan the image at multiple locations and predict 3D human poses using the recognition model; this prediction implicitly includes one-shot generative consistency feedback. The framework provides a uniform treatment of human detection, 3D initialization, and 3D recovery from transient failure. Our experimental results show that this procedure is promising for the automatic reconstruction of human motion in more natural scene settings with background clutter and occlusion.
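
The alternating self-training scheme can be caricatured with linear models standing in for the paper's far richer pose and image models: fit the recognition mapping on samples from the generative model, then refit the generative parameters so that renderings of the recognition model's predictions match the observations. Everything below (dimensions, noise levels) is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
A_true = rng.normal(size=(6, 3))              # unknown true rendering
X_true = rng.normal(size=(200, 3))
Y = X_true @ A_true.T + 0.01 * rng.normal(size=(200, 6))   # observations

A = rng.normal(size=(6, 3))                   # generative params: y = x @ A.T
for _ in range(10):
    # recognition step: learn y -> x from samples of the generative model
    xs = rng.normal(size=(500, 3))
    ys = xs @ A.T + 0.01 * rng.normal(size=(500, 6))
    R, *_ = np.linalg.lstsq(ys, xs, rcond=None)
    # generative step: refit A so renderings of predicted states fit Y
    Xhat = Y @ R
    M, *_ = np.linalg.lstsq(Xhat, Y, rcond=None)
    A = M.T                                   # the two models co-adapt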

Collaboration


Dive into Dimitris N. Metaxas's collaborations.

Top Co-Authors

Shaoting Zhang (University of North Carolina at Charlotte)
Leon Axel (University of Pennsylvania)
Qingshan Liu (Nanjing University of Information Science and Technology)
Junzhou Huang (University of Texas at Arlington)