Konstantinos G. Derpanis
Ryerson University
Publications
Featured research published by Konstantinos G. Derpanis.
Computer Vision and Pattern Recognition | 2010
Konstantinos G. Derpanis; Mikhail Sizintsev; Kevin J. Cannons; Richard P. Wildes
This paper addresses action spotting, the spatiotemporal detection and localization of human actions in video. A novel compact local descriptor of video dynamics in the context of action spotting is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. An important aspect of the descriptor is that it allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template across candidate video sequences. Empirical evaluation of the approach on a set of challenging natural videos suggests its efficacy.
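The descriptor is specified precisely in the paper; purely as an illustration of the idea of appearance-normalized spacetime oriented energies, here is a minimal Python sketch. It substitutes axis-aligned Gaussian derivatives (via scipy) for the paper's oriented filter bank, so the function name and filter choices are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_energies(clip, sigma=2.0):
    """Crude spacetime oriented energy sketch: second-derivative
    Gaussian responses along the t, y, and x axes, rectified
    (squared) and smoothed. `clip` is a (T, H, W) float array."""
    orders = [(2, 0, 0), (0, 2, 0), (0, 0, 2)]  # derivative axis: (t, y, x)
    energies = []
    for order in orders:
        resp = gaussian_filter(clip, sigma=sigma, order=order)
        energies.append(gaussian_filter(resp ** 2, sigma=sigma))
    e = np.stack(energies, axis=-1)  # (T, H, W, K) energy channels
    # L1-normalize across channels: discounts overall contrast, keeping
    # the relative dynamic structure (cf. appearance invariance above).
    return e / (e.sum(axis=-1, keepdims=True) + 1e-8)
```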
Computer Vision and Pattern Recognition | 2016
Xiaowei Zhou; Menglong Zhu; Spyridon Leonardos; Konstantinos G. Derpanis; Kostas Daniilidis
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Here, two cases are considered: (i) the image locations of the human joints are provided and (ii) the image locations of joints are unknown. In the former case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior and temporal smoothness. In the latter case, this approach is extended by treating the image locations of the joints as latent variables to take into account considerable uncertainties in 2D joint locations. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are realized via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve greater 3D pose estimation accuracy than state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
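As a rough, hypothetical illustration of fitting a sparsity-driven 3D prior to 2D joint observations, the toy sketch below recovers sparse coefficients over a 3D pose dictionary with ISTA under an orthographic camera. The rotation, temporal smoothness, and EM machinery of the paper are omitted, and all names and parameters are assumptions.

```python
import numpy as np

def sparse_3d_pose(W2d, B, lam=0.1, lr=1e-3, iters=500):
    """Fit 2D joints W2d (2, J) as an orthographic projection of a
    sparse combination of 3D basis poses B (K, 3, J), via ISTA:
    a gradient step on the reprojection error, then soft-thresholding."""
    P = np.array([[1., 0., 0.], [0., 1., 0.]])  # orthographic projection
    c = np.zeros(B.shape[0])                    # sparse coefficients
    for _ in range(iters):
        X3d = np.tensordot(c, B, axes=1)        # current 3D pose (3, J)
        resid = P @ X3d - W2d                   # 2D reprojection error
        grad = np.array([np.sum((P @ Bk) * resid) for Bk in B])
        c -= lr * grad
        c = np.sign(c) * np.maximum(np.abs(c) - lr * lam, 0.0)
    return np.tensordot(c, B, axes=1)           # estimated 3D pose
```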
International Conference on Image Processing | 2005
Konstantinos G. Derpanis; Jacob M. Gryn
This paper details the construction of three-dimensional separable steerable filters. The approach presented is an extension of the construction of two-dimensional separable steerable filters outlined in W.T. Freeman and E.H. Adelson (1991). Additionally, three-dimensional separable steerable filters, both continuous and discrete versions, for the second derivative of the Gaussian and its Hilbert transform are reported. Experimental evaluation demonstrates that the errors in the constructed separable filters are negligible.
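The discrete kernels and Hilbert-transform components reported in the paper are not reproduced here; the sketch below only demonstrates the underlying steering identity for the second derivative of a 3D Gaussian, d²G/du² = Σᵢⱼ uᵢuⱼ ∂²G/∂xᵢ∂xⱼ, with each basis response computed by separable 1D Gaussian-derivative passes in scipy.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_g2(volume, u, sigma=2.0):
    """Steer the second derivative of a 3D Gaussian to direction u
    (3-vector over the (t, y, x) axes) from separable basis responses.
    The symmetric pairs G_ij = G_ji are filtered twice here for
    clarity; a practical version would reuse them."""
    u = np.asarray(u, float)
    u /= np.linalg.norm(u)
    out = np.zeros_like(volume, dtype=float)
    for i in range(3):
        for j in range(3):
            order = [0, 0, 0]
            order[i] += 1
            order[j] += 1                      # mixed partial d2/dxi dxj
            basis = gaussian_filter(volume, sigma=sigma, order=tuple(order))
            out += u[i] * u[j] * basis         # steering coefficients
    return out
```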
Computer Vision and Pattern Recognition | 2012
Konstantinos G. Derpanis; Matthieu Lecce; Kostas Daniilidis; Richard P. Wildes
Natural scene classification is a fundamental challenge in computer vision. By far, the majority of studies have limited their scope to scenes from single image stills and thereby ignore potentially informative temporal cues. The current paper is concerned with determining the degree of performance gain in considering short videos for recognizing natural scenes. Towards this end, the impact of multiscale orientation measurements on scene classification is systematically investigated, as related to: (i) spatial appearance, (ii) temporal dynamics and (iii) joint spatial appearance and dynamics. These measurements in visual space, x-y, and spacetime, x-y-t, are recovered by a bank of spatiotemporal oriented energy filters. In addition, a new data set is introduced that contains 420 image sequences spanning fourteen scene categories, with temporal scene information due to objects and surfaces decoupled from camera-induced motion. This data set is used to evaluate classification performance of the various orientation-related representations, as well as state-of-the-art alternatives. It is shown that a notable performance increase is realized by spatiotemporal approaches in comparison to purely spatial or purely temporal methods.
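As a hypothetical stand-in for such a multiscale representation (not the paper's filter bank), one could pool rectified spacetime Gaussian-derivative energies per orientation and scale into a single clip-level vector and classify with nearest neighbor; all names here are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# First and second derivatives along each of the (t, y, x) axes.
ORIENTATIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
                (2, 0, 0), (0, 2, 0), (0, 0, 2)]

def multiscale_energy_descriptor(clip, sigmas=(1.0, 2.0, 4.0)):
    """One compact vector per (T, H, W) clip: mean squared derivative
    response per orientation, L1-normalized within each scale."""
    feats = []
    for sigma in sigmas:
        e = np.array([(gaussian_filter(clip, sigma, order=o) ** 2).mean()
                      for o in ORIENTATIONS])
        feats.append(e / (e.sum() + 1e-8))
    return np.concatenate(feats)

# Toy usage: 1-nearest-neighbor scene classification over such vectors.
# label = train_labels[np.argmin([np.abs(q - d).sum() for d in train_descs])]
```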
Computer Vision and Pattern Recognition | 2010
Konstantinos G. Derpanis; Richard P. Wildes
This paper addresses the challenge of recognizing dynamic textures based on their observed visual dynamics. Typically, the term dynamic texture is used with reference to image sequences of various natural processes that exhibit stochastic dynamics (e.g., smoke, water and windblown vegetation), although it applies equally well to images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in traffic video). In this paper, a novel approach to dynamic texture representation and an associated recognition method are proposed. The approach pursued here recognizes dynamic textures based on matching distributions (histograms) of spacetime orientation structure. Empirical evaluation on a standard database with controls to remove the effects of identical viewpoint demonstrates that the proposed approach achieves superior performance over alternative state-of-the-art methods.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Konstantinos G. Derpanis; Richard P. Wildes
This paper is concerned with the representation and recognition of the observed dynamics (i.e., excluding purely spatial appearance cues) of spacetime texture based on a spatiotemporal orientation analysis. The term “spacetime texture” is taken to refer to patterns in visual spacetime, (x,y,t), that primarily are characterized by the aggregate dynamic properties of elements or local measurements accumulated over a region of spatiotemporal support, rather than in terms of the dynamics of individual constituents. Examples include image sequences of natural processes that exhibit stochastic dynamics (e.g., fire, water, and windblown vegetation) as well as images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in imagery, such as pedestrians and vehicular traffic). Spacetime texture representation and recognition is important as it provides an early means of capturing the structure of an ensuing image stream in a meaningful fashion. Toward such ends, a novel approach to spacetime texture representation and an associated recognition method are described based on distributions (histograms) of spacetime orientation structure. Empirical evaluation on both standard and original image data sets shows the promise of the approach, including significant improvement over alternative state-of-the-art approaches in recognizing the same pattern from different viewpoints.
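Once each sequence is summarized as a normalized histogram of orientation structure, recognition reduces to histogram matching. A minimal sketch, assuming nearest-neighbor matching with the Bhattacharyya coefficient (one common choice for comparing distributions, not necessarily the paper's exact measure):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two L1-normalized histograms;
    1.0 for identical distributions, 0.0 for disjoint support."""
    return np.sum(np.sqrt(p * q))

def nearest_texture(query_hist, db_hists, db_labels):
    """Nearest-neighbor recognition: return the label of the database
    histogram most similar to the query."""
    scores = [bhattacharyya(query_hist, h) for h in db_hists]
    return db_labels[int(np.argmax(scores))]
```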
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Konstantinos G. Derpanis; Mikhail Sizintsev; Kevin J. Cannons; Richard P. Wildes
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. Importantly, the descriptor allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template, derived from a single exemplar video, across candidate video sequences. The general approach presented for action spotting and recognition is amenable to efficient implementation, which is deemed critical for many important applications. For action spotting, details of a real-time GPU-based instantiation of the proposed approach are provided. Empirical evaluation of both action spotting and action recognition on challenging datasets suggests the efficacy of the proposed approach, with state-of-the-art performance documented on standard datasets.
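The paper's similarity measure and its efficient GPU search are more sophisticated; purely to make the exhaustive-search idea concrete, here is a toy sliding-volume search that scores every placement of a template descriptor volume with global cosine similarity (all names are assumptions).

```python
import numpy as np

def spot_action(template, video_desc, stride=1):
    """Slide a (t, h, w, k) template descriptor volume over a longer
    (T, H, W, k) video descriptor volume; return the best-scoring
    spacetime position and its cosine-similarity score."""
    t, h, w, k = template.shape
    T, H, W, _ = video_desc.shape
    tv = template.ravel()
    tv = tv / (np.linalg.norm(tv) + 1e-8)
    best, best_pos = -np.inf, None
    for i in range(0, T - t + 1, stride):
        for j in range(0, H - h + 1, stride):
            for l in range(0, W - w + 1, stride):
                win = video_desc[i:i+t, j:j+h, l:l+w].ravel()
                score = tv @ win / (np.linalg.norm(win) + 1e-8)
                if score > best:
                    best, best_pos = score, (i, j, l)
    return best_pos, best
```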
International Conference on Computer Vision | 2013
Weiyu Zhang; Menglong Zhu; Konstantinos G. Derpanis
This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output, which sheds further light on detailed action understanding.
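One plausible way to obtain sparse, non-overlapping detections from dense sliding-volume scores is greedy spatiotemporal non-maximum suppression over x-y-t boxes; the sketch below is such a generic routine, not the paper's exact procedure.

```python
import numpy as np

def nms_3d(boxes, scores, iou_thresh=0.0):
    """Greedy NMS over spatiotemporal boxes. `boxes` is an (N, 6)
    array of (t0, y0, x0, t1, y1, x1); with iou_thresh=0.0 the kept
    detections are strictly non-overlapping. Returns kept indices."""
    order = np.argsort(scores)[::-1]           # highest score first
    keep = []
    for idx in order:
        b = boxes[idx]
        overlaps = False
        for kidx in keep:
            kb = boxes[kidx]
            inter = np.prod([max(0.0, min(b[d + 3], kb[d + 3]) -
                                  max(b[d], kb[d])) for d in range(3)])
            union = (np.prod(b[3:] - b[:3]) +
                     np.prod(kb[3:] - kb[:3]) - inter)
            if inter / (union + 1e-8) > iou_thresh:
                overlaps = True
                break
        if not overlaps:
            keep.append(idx)
    return keep
```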
Computer Vision and Pattern Recognition | 2008
Mikhail Sizintsev; Konstantinos G. Derpanis
Histograms represent a popular means for feature representation. This paper is concerned with the problem of exhaustive histogram-based image search. Several standard histogram construction methods are explored, including the conventional approach, Huang's method, and the state-of-the-art integral histogram. In addition, we present a novel multiscale histogram-based search algorithm, termed the distributive histogram, that can be evaluated exhaustively in a fast and memory efficient manner. An extensive systematic empirical evaluation is presented that explores the computational and storage consequences of altering the search image and histogram bin sizes. Experiments reveal up to an eight-fold decrease in computation time and hundreds- to thousands-fold decrease in memory use of the proposed distributive histogram in comparison to the integral histogram. Finally, we conclude with a discussion on the relative merits between the various approaches considered in the paper.
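The distributive histogram itself is the paper's contribution; for context, the standard integral-histogram baseline it is compared against can be sketched in a few lines: per-bin cumulative sums make any rectangle's histogram four lookups per bin via inclusion-exclusion.

```python
import numpy as np

def integral_histogram(image, n_bins):
    """Build an integral histogram over an (H, W) array of integer
    bin indices in [0, n_bins). Returns an (H+1, W+1, n_bins) table."""
    H, W = image.shape
    ih = np.zeros((H + 1, W + 1, n_bins))
    onehot = np.eye(n_bins)[image]             # (H, W, n_bins) indicators
    ih[1:, 1:] = onehot.cumsum(axis=0).cumsum(axis=1)
    return ih

def window_histogram(ih, top, left, bottom, right):
    """Histogram of image[top:bottom, left:right] in O(n_bins) time."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])
```

Note the memory cost motivating the paper: the table stores n_bins values per pixel, which is exactly the overhead the distributive histogram is reported to reduce.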
European Conference on Computer Vision | 2004
Konstantinos G. Derpanis; Richard P. Wildes; John K. Tsotsos
An approach to recognizing hand gestures from a monocular temporal sequence of images is presented. Of particular concern is the representation and recognition of hand movements that are used in single-handed American Sign Language (ASL). The approach exploits previous linguistic analysis of manual languages that decomposes dynamic gestures into their static and dynamic components. The first level of decomposition is in terms of three sets of primitives: hand shape, location and movement. Further levels of decomposition involve the lexical and sentence levels and are part of our plan for future work. We propose and demonstrate that given a monocular gesture sequence, kinematic features can be recovered from the apparent motion that provide distinctive signatures for 14 primitive movements of ASL. The approach has been implemented in software and evaluated on a database of 592 gesture sequences with an overall recognition rate of 86.00% for fully automated processing and 97.13% for manually initialized processing.
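The paper derives its movement signatures from apparent motion rather than from a precomputed track; as a loose illustration of the kind of kinematic features involved, the toy sketch below computes speed, direction, and direction change from a 2D hand trajectory (all names are assumptions).

```python
import numpy as np

def kinematic_features(track, dt=1.0):
    """Simple kinematic signature from a 2D hand trajectory of shape
    (N, 2): per-frame speed, heading, and heading change (a crude
    curvature cue that distinguishes, e.g., straight from circular
    movements)."""
    v = np.diff(track, axis=0) / dt            # per-frame velocity
    speed = np.linalg.norm(v, axis=1)
    heading = np.unwrap(np.arctan2(v[:, 1], v[:, 0]))
    dtheta = np.diff(heading)                  # direction change
    return speed, heading, dtheta
```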