Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michalis Raptis is active.

Publication


Featured research published by Michalis Raptis.


Symposium on Computer Animation | 2011

Real-time classification of dance gestures from skeleton animation

Michalis Raptis; Darko Kirovski; Hugues Hoppe

We present a real-time gesture classification system for skeletal wireframe motion. Its key components include an angular representation of the skeleton designed for recognition robustness under noisy input, a cascaded correlation-based classifier for multivariate time-series data, and a distance metric based on dynamic time-warping to evaluate the difference in motion between an acquired gesture and an oracle for the matching gesture. While the first and last tools are generic in nature and could be applied to any gesture-matching scenario, the classifier is conceived based on the assumption that the input motion adheres to a known, canonical time-base: a musical beat. On a benchmark comprising 28 gesture classes, hundreds of gesture instances recorded using the Xbox Kinect platform and performed by dozens of subjects for each gesture class, our classifier has an average accuracy of 96.9%, for approximately 4-second skeletal motion recordings. This accuracy is remarkable given the input noise from the real-time depth sensor.
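The paper's classifier and oracle gestures are not reproduced here, but the dynamic-time-warping distance the abstract mentions is a standard construction. A minimal sketch, assuming a Euclidean per-frame cost (the function name and cost choice are ours, not the authors'):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two multivariate time
    series a (n x d) and b (m x d), with Euclidean frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In a gesture-matching setting, `a` would hold the angular skeleton features of an acquired gesture and `b` those of the oracle; the warping absorbs tempo differences between performances.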


Computer Vision and Pattern Recognition | 2012

Discovering discriminative action parts from mid-level video representations

Michalis Raptis; Iasonas Kokkinos; Stefano Soatto

We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability on standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.


Computer Vision and Pattern Recognition | 2013

Poselet Key-Framing: A Model for Human Activity Recognition

Michalis Raptis; Leonid Sigal

In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative key frames - collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of key frames in a max-margin discriminative framework, where we treat key frames as latent variables. This allows us to (jointly) learn a set of most discriminative key frames while also learning the local temporal context between them. Key frames are encoded using a spatially localizable poselet-like representation with HoG and BoW components learned from weak annotations; we rely on a structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.


International Journal of Computer Vision | 2012

Sparse Occlusion Detection with Optical Flow

Alper Ayvaci; Michalis Raptis; Stefano Soatto

We tackle the problem of detecting occluded regions in a video stream. Under assumptions of Lambertian reflection and static illumination, the task can be posed as a variational optimization problem, and its solution approximated using convex minimization. We describe efficient numerical schemes that reach the global optimum of the relaxed cost functional, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance, in addition to optical flow.
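The full variational formulation is beyond a short sketch, but the sparsity idea at its core can be illustrated: if the brightness-constancy residual r at a pixel is modeled as a sparse occlusion term e via min_e 0.5(r - e)^2 + lam*|e|, the per-pixel solution is soft thresholding, so large residuals (candidate occlusions) survive and small ones are zeroed. A toy sketch, with names and the threshold value our own rather than the paper's:

```python
import numpy as np

def soft_threshold(r, lam):
    """Closed-form minimizer of 0.5*(r - e)**2 + lam*|e| per entry:
    residuals larger than lam are kept (shrunk by lam), the rest
    are set to zero, yielding a sparse occlusion indicator."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
```

In the convex schemes the abstract refers to, a step of this form alternates with flow updates; here it only shows why the occlusion estimate comes out sparse.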


European Conference on Computer Vision | 2012

No bias left behind: covariate shift adaptation for discriminative 3d pose estimation

Makoto Yamada; Leonid Sigal; Michalis Raptis

Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, have relied on the assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard datasets, this assumption is flawed. In the presence of training set bias, the learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift, we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training instance re-weighting, where the weights are assigned based on the ratio of training and test marginals evaluated at the samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of Kernel Regression (KR) and Twin Gaussian Processes (TGP). We show that our weighted variants outperform their un-weighted counterparts and improve on the state-of-the-art performance in the public (HumanEva) dataset.
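The authors' weight estimator is not reproduced here; as a toy illustration of training-instance re-weighting under covariate shift, the sketch below estimates importance weights as a ratio of 1-D kernel density estimates and plugs them into a weighted ridge regression. All names, the bandwidth, and the KDE-ratio estimator are our own simplifications, not the paper's method:

```python
import numpy as np

def gaussian_kde(x, data, bw):
    """Evaluate a Gaussian kernel density estimate of `data` at points `x`."""
    z = (x[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z**2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))

def importance_weights(x_train, x_test, bw=0.5):
    """Weights = ratio of test to training marginals at the training samples."""
    p_test = gaussian_kde(x_train, x_test, bw)
    p_train = gaussian_kde(x_train, x_train, bw)
    return p_test / np.maximum(p_train, 1e-12)

def weighted_ridge(X, y, w, lam=1e-3):
    """Solve min_b sum_i w_i * (y_i - x_i @ b)**2 + lam * ||b||**2."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X + lam * np.eye(X.shape[1]), X.T @ W @ y)
```

With uniform weights this reduces to ordinary ridge regression; under covariate shift the weights up-weight training samples that fall in regions the test distribution cares about.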


Computer Vision and Pattern Recognition | 2010

Spike train driven dynamical models for human actions

Michalis Raptis; Kamil Wnuk; Stefano Soatto

We investigate dynamical models of human motion that can support both synthesis and analysis tasks. Unlike coarser discriminative models that work well when action classes are nicely separated, we seek models that have fine-scale representational power and can therefore model subtle differences in the way an action is performed. To this end, we model an observed action as an (unknown) linear time-invariant dynamical model of relatively small order, driven by a sparse bounded input signal. Our motivating intuition is that the time-invariant dynamics will capture the unchanging physical characteristics of an actor, while the inputs used to excite the system will correspond to a causal signature of the action being performed. We show that our model has sufficient representational power to closely approximate large classes of non-stationary actions with significantly reduced complexity. We also show that temporal statistics of the inferred input sequences can be compared in order to recognize actions and detect transitions between them.
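The identification and input-inference steps are involved, but the generative model itself, a small linear time-invariant system driven by a sparse input, is easy to state. A minimal simulation sketch, with illustrative matrices and names of our own choosing:

```python
import numpy as np

def simulate_lti(A, B, u, x0):
    """Roll out x_{t+1} = A @ x_t + B @ u_t. The time-invariant pair
    (A, B) models the actor's unchanging dynamics; the sparse input
    sequence u plays the role of the action's causal signature."""
    xs = [np.asarray(x0, dtype=float)]
    for u_t in u:
        xs.append(A @ xs[-1] + B @ u_t)
    return np.array(xs)
```

For a stable A, a single input spike produces a transient that decays back toward rest, which is the intuition behind reading the inferred inputs as sparse "excitations" of the motion.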


International Conference on Computer Vision | 2012

SuperFloxels: a mid-level representation for video sequences

Avinash Ravichandran; Chaohui Wang; Michalis Raptis; Stefano Soatto

We describe an approach for grouping trajectories extracted from a video that preserves motion discontinuities due, for instance, to occlusions, but not color or intensity boundaries. Our method takes as input trajectories with variable length and onset time, and outputs a membership function as well as an indicator function denoting the exemplar trajectory of each group. This can be used for several applications such as compression, segmentation, and background removal.
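The paper's grouping objective is specific to trajectories and motion discontinuities; as a generic stand-in for producing both a membership function and an exemplar per group, here is a toy k-medoids over a precomputed pairwise trajectory-distance matrix. The algorithm choice and all names are ours, not the authors':

```python
import numpy as np

def kmedoids(D, k, iters=20, seed=0):
    """Toy k-medoids on a pairwise-distance matrix D.
    Returns a membership label per item and, for each group,
    the index of its exemplar (medoid)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        # assign every item to its nearest medoid
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                # exemplar = member minimizing total distance to its group
                sub = D[np.ix_(members, members)]
                new_medoids[c] = members[np.argmin(sub.sum(axis=0))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids
```

With a trajectory distance that respects motion discontinuities but not appearance boundaries, such a grouping yields exactly the two outputs the abstract names: a membership function and an exemplar indicator per group.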


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015

Cross-Domain Matching with Squared-Loss Mutual Information

Makoto Yamada; Leonid Sigal; Michalis Raptis; Machiko Toyoda; Yi Chang; Masashi Sugiyama

The goal of cross-domain matching (CDM) is to find correspondences between two sets of objects in different domains in an unsupervised way. CDM has various interesting applications, including photo album summarization, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system, and temporal alignment, which aligns sequences such as videos that are potentially expressed using different features. In this paper, we propose an information-theoretic CDM framework based on squared-loss mutual information (SMI). The proposed approach can directly handle non-linearly related objects/sequences with different dimensions, with hyper-parameters that can be objectively optimized by cross-validation. We apply the proposed method to several real-world problems including image matching, unpaired voice conversion, photo album summarization, cross-feature video and cross-domain video-to-mocap alignment, and Kinect-based action recognition, and experimentally demonstrate that the proposed method is a promising alternative to state-of-the-art CDM methods.
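As an illustration of the quantity at the core of the framework, here is squared-loss mutual information in its discrete form, computed directly from a joint probability table. The paper estimates SMI from samples without density estimation; this direct formula and the names are our simplification for intuition only:

```python
import numpy as np

def smi(P):
    """Squared-loss mutual information of a discrete joint
    distribution P (rows index x, columns index y):
    SMI = 0.5 * sum_xy p(x)p(y) * (p(x,y)/(p(x)p(y)) - 1)**2."""
    px = P.sum(axis=1, keepdims=True)      # marginal p(x), shape (n, 1)
    py = P.sum(axis=0, keepdims=True)      # marginal p(y), shape (1, m)
    q = px @ py                            # product of marginals p(x)p(y)
    r = P / q                              # density ratio
    return 0.5 * np.sum(q * (r - 1.0) ** 2)
```

SMI is zero exactly when the two variables are independent and grows with their dependence, which is why maximizing it over candidate correspondences yields a matching criterion.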


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Covariate Shift Adaptation for Discriminative 3D Pose Estimation

Makoto Yamada; Leonid Sigal; Michalis Raptis

Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, have relied on the assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard data sets, this assumption is flawed. In the presence of training set bias, the learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift, we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training instance reweighting, where the weights are assigned based on the ratio of training and test marginals evaluated at the samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of kernel regression (KR) and twin Gaussian processes (TGP). We show that our weighted variants outperform their unweighted counterparts and improve on the state-of-the-art performance in the public (HumanEva) data set.


European Conference on Computer Vision | 2010

Tracklet descriptors for action modeling and video analysis

Michalis Raptis; Stefano Soatto

Collaboration


Dive into Michalis Raptis's collaborations.

Top Co-Authors

Stefano Soatto
University of California

Alper Ayvaci
University of California

Kamil Wnuk
University of California

Greg Mori
Simon Fraser University