Mikhail Sizintsev
York University
Publication
Featured research published by Mikhail Sizintsev.
Computer Vision and Pattern Recognition | 2010
Konstantinos G. Derpanis; Mikhail Sizintsev; Kevin J. Cannons; Richard P. Wildes
This paper addresses action spotting, the spatiotemporal detection and localization of human actions in video. A novel compact local descriptor of video dynamics in the context of action spotting is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. An important aspect of the descriptor is that it allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template across candidate video sequences. Empirical evaluation of the approach on a set of challenging natural videos suggests its efficacy.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Konstantinos G. Derpanis; Mikhail Sizintsev; Kevin J. Cannons; Richard P. Wildes
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. Importantly, the descriptor allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template, derived from a single exemplar video, across candidate video sequences. The general approach presented for action spotting and recognition is amenable to efficient implementation, which is deemed critical for many important applications. For action spotting, details of a real-time GPU-based instantiation of the proposed approach are provided. Empirical evaluation of both action spotting and action recognition on challenging datasets suggests the efficacy of the proposed approach, with state-of-the-art performance documented on standard datasets.
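As a schematic illustration of the appearance-invariance idea above, per-point oriented-energy descriptors are typically L1-normalized so that comparisons depend on relative dynamics rather than overall contrast, and then compared with a distribution similarity. The sketch below uses the Bhattacharyya coefficient as an illustrative similarity measure; it is not necessarily the exact measure used in these papers.

```python
import math

def normalize_l1(energies, eps=1e-9):
    """L1-normalize a vector of oriented-energy responses so that
    comparisons depend on relative dynamics, not overall contrast."""
    total = sum(energies) + eps
    return [e / total for e in energies]

def bhattacharyya(p, q):
    """Similarity in [0, 1] between two L1-normalized distributions;
    1.0 for identical distributions, 0.0 for disjoint support."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

Two video segments with the same dynamics but different contrast (e.g. different clothing brightness) produce proportional energy vectors, which normalize to the same distribution and score near 1.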
Computer Vision and Pattern Recognition | 2008
Mikhail Sizintsev; Konstantinos G. Derpanis
Histograms represent a popular means for feature representation. This paper is concerned with the problem of exhaustive histogram-based image search. Several standard histogram construction methods are explored, including the conventional approach, Huang's method, and the state-of-the-art integral histogram. In addition, we present a novel multiscale histogram-based search algorithm, termed the distributive histogram, that can be evaluated exhaustively in a fast and memory-efficient manner. An extensive systematic empirical evaluation is presented that explores the computational and storage consequences of altering the search image and histogram bin sizes. Experiments reveal up to an eight-fold decrease in computation time and a hundreds- to thousands-fold decrease in memory use for the proposed distributive histogram in comparison to the integral histogram. Finally, we conclude with a discussion on the relative merits between the various approaches considered in the paper.
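For context, here is a minimal pure-Python sketch of the integral-histogram baseline that the paper compares against (the proposed distributive histogram itself is not reproduced): each table cell stores the cumulative histogram of the rectangle above and to its left, so any window histogram falls out of four lookups per bin.

```python
def build_integral_histogram(bins_img, n_bins):
    """bins_img: 2D list of per-pixel bin indices.
    Returns an (H+1) x (W+1) table; cell [y][x] is the histogram
    of the rectangle [0, y) x [0, x)."""
    h, w = len(bins_img), len(bins_img[0])
    ih = [[[0] * n_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            for b in range(n_bins):
                # inclusion-exclusion over the three neighbouring cells
                ih[y + 1][x + 1][b] = (ih[y][x + 1][b] + ih[y + 1][x][b]
                                       - ih[y][x][b])
            ih[y + 1][x + 1][bins_img[y][x]] += 1
    return ih

def window_histogram(ih, y0, x0, y1, x1):
    """Histogram of the half-open window [y0, y1) x [x0, x1),
    in O(bins) time regardless of window size."""
    return [ih[y1][x1][b] - ih[y0][x1][b] - ih[y1][x0][b] + ih[y0][x0][b]
            for b in range(len(ih[0][0]))]
```

Note the O(width x height x bins) table storage: exactly the memory cost that motivates the paper's more compact multiscale alternative.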
International Journal of Computer Vision | 2007
James H. Elder; Simon J. D. Prince; Yuqian Hou; Mikhail Sizintsev; E. Olevskiy
We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis. The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., Adaboost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
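As a simplified illustration of Bayesian cue integration (not the paper's layered probabilistic model or flexible-norm spatial pooling), independent cues can be fused in the log-odds domain; the function below is a hypothetical minimal sketch under a naive conditional-independence assumption.

```python
import math

def fuse_cues(likelihood_ratios, prior=0.5):
    """Combine independent cues, each given as a likelihood ratio
    P(cue | person) / P(cue | background), into a posterior probability.
    Assumes conditional independence of cues (naive Bayes), which is
    weaker than the paper's spatially integrated model."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A single weakly informative cue may leave the posterior near the prior, while several complementary cues pushing in the same direction compound multiplicatively in the odds, which is the basic advantage of cue integration the paper quantifies.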
Image and Vision Computing | 2010
Mikhail Sizintsev; Richard P. Wildes
This paper presents methods for efficient recovery of accurate binocular disparity estimates in the vicinity of 3D surface discontinuities. Of particular concern are methods that impact coarse-to-fine (CTF), local block-based matching, as it forms the basis of the fastest and most resource-efficient stereo computation procedures. A novel coarse-to-fine refinement procedure that adapts match window support across scale to ameliorate corruption of disparity estimates near boundaries is presented. Extensions are included to account for half-occlusions and colour uniformity. Empirical results show that incorporation of these advances in the standard coarse-to-fine, block matching framework reduces disparity errors by more than a factor of two, while performing little extra computation, preserving low complexity and the parallel/pipeline nature of the framework. Moreover, the proposed advances prove to be beneficial for CTF global matchers as well.
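For reference, the plain local block matcher that such refinements build on can be sketched in one dimension: for each pixel, slide a fixed window over candidate disparities and keep the one with lowest sum-of-absolute-differences cost. The 1D simplification and fixed window are illustrative; the paper's contribution is precisely to adapt window support across scale near boundaries.

```python
def block_match_row(left, right, half, max_disp):
    """Per-pixel disparity on one scanline via SAD block matching.
    left, right: lists of intensities; half: half-width of the match
    window; max_disp: largest disparity searched."""
    n = len(left)
    disp = [0] * n
    for x in range(n):
        best_cost, best_d = float("inf"), 0
        for d in range(min(max_disp, x) + 1):
            cost = 0
            for k in range(-half, half + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp at borders
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp
```

Coarse-to-fine processing would run this on a low-resolution pair first, then use the (doubled) result to restrict the disparity search at the next finer level.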
Computer Vision and Pattern Recognition | 2009
Mikhail Sizintsev; Richard P. Wildes
Spatiotemporal stereo is concerned with the recovery of the 3D structure of a dynamic scene from a temporal sequence of multiview images. This paper presents a novel method for computing temporally coherent disparity maps from a sequence of binocular images through an integrated consideration of image spacetime structure and without explicit recovery of motion. The approach is based on matching spatiotemporal quadric elements (stequels) between views, as it is shown that this matching primitive provides a natural way to encapsulate both local spatial and temporal structure for disparity estimation. Empirical evaluation with laboratory based imagery with ground truth and more typical natural imagery shows that the approach provides considerable benefit in comparison to alternative methods for enforcing temporal coherence in disparity estimation.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Mikhail Sizintsev; Richard P. Wildes
This paper is concerned with the recovery of temporally coherent estimates of 3D structure and motion of a dynamic scene from a sequence of binocular stereo images. A novel approach is presented based on matching of spatiotemporal quadric elements (stequels) between views, as this primitive encapsulates both spatial and temporal image structure for 3D estimation. Match constraints are developed for bringing stequels into correspondence across binocular views. With correspondence established, temporally coherent disparity estimates are obtained without explicit motion recovery. Further, the matched stequels also will be shown to support direct recovery of scene flow estimates. Extensive algorithmic evaluation with ground truth data incorporated in both local and global correspondence paradigms shows the considerable benefit of using stequels as a matching primitive and its advantages in comparison to alternative methods of enforcing temporal coherence in disparity estimation. Additional experiments document the usefulness of stequel matching for 3D scene flow estimation.
Canadian Conference on Computer and Robot Vision | 2008
Mikhail Sizintsev
Dense stereo algorithms rely on matching over a range of disparities. To speed up the search and reduce match ambiguity, processing can be embedded in the hierarchical, or coarse-to-fine (CTF), framework using image pyramids. However, this technique is limited when resolving thin structures, as they are poorly represented at coarser scales. In this paper we exploit alternative pyramid and search space techniques. We propose matching with the Magnitude-extended Laplacian Pyramid (MeLP) - a generalization of the Laplacian pyramid that explicitly encodes the energy magnitude component of the band-passed images. In essence, MeLP effectively encodes fine scale details in low resolution images, which allows for accurate recovery of thin structures during CTF processing. Furthermore, transparencies can be resolved for common cases when spatial frequency structure is locally different for each layer. Algorithmic instantiations for local block matching and global Graph Cuts formulations are presented. Extensive experimental evaluation demonstrates the benefits of the proposed techniques.
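For background, the plain 1D Laplacian pyramid that MeLP generalizes can be sketched as follows; the magnitude extension itself is not reproduced, and the smoothing filter ([1, 2, 1]/4) and border clamping are illustrative choices.

```python
def _blur(s):
    """[1, 2, 1]/4 smoothing with clamped borders."""
    n = len(s)
    return [(s[max(i - 1, 0)] + 2 * s[i] + s[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def _down(s):
    """Blur, then keep every other sample."""
    return _blur(s)[::2]

def _up(s, n):
    """Zero-stuff back to length n, smooth, and compensate the gain."""
    out = [0.0] * n
    for i, v in enumerate(s):
        out[2 * i] = v
    return [2.0 * v for v in _blur(out)]

def laplacian_pyramid(signal, levels):
    """Band-pass residuals plus a final low-pass, as in Burt & Adelson."""
    bands, s = [], [float(v) for v in signal]
    for _ in range(levels):
        low = _down(s)
        bands.append([a - b for a, b in zip(s, _up(low, len(s)))])
        s = low
    bands.append(s)
    return bands

def reconstruct(bands):
    """Invert the pyramid by re-adding each residual, coarse to fine."""
    s = bands[-1]
    for band in reversed(bands[:-1]):
        s = [a + b for a, b in zip(band, _up(s, len(band)))]
    return s
```

Because each residual stores exactly what upsampling the coarser level loses, reconstruction is exact; MeLP's point is that the plain residuals represent thin, high-frequency structure poorly at coarse scales, which the added energy-magnitude channel addresses.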
British Machine Vision Conference | 2006
Mikhail Sizintsev; Richard P. Wildes
This paper presents methods for recovering accurate binocular disparity estimates in the vicinity of 3D surface discontinuities. Of particular concern are methods that impact coarse-to-fine, block matching as it forms the basis of the fastest and most resource-efficient disparity estimation procedures. Two advances are put forth. First, a novel approach to coarse-to-fine processing is presented that adapts match window support across scale to ameliorate corruption of disparity estimates near 3D boundaries. Second, a novel formulation of half-occlusion cues within the coarse-to-fine, block matching framework is described to inhibit false matches that can arise in regions near occlusions. Empirical results show that incorporation of these advances in coarse-to-fine, block matching reduces disparity errors by more than a factor of two, while performing little extra computation.
International Conference on Image Processing | 2007
Konstantinos G. Derpanis; Erich T. H. Leung; Mikhail Sizintsev
In this paper, we place the integral image-based approach for multi-scale feature construction, popularized by Viola and Jones, into a common framework of understanding. The integral image within this framework represents space-variant image filtering with the zero-order B-spline. Given this framework, we propose efficiently computable higher-order B-spline image features based on generalized integral images that have the potential to be more accurate while remaining comparably efficient to previous integral image-based efforts.
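The zero-order case described above reduces to the familiar integral image, where any box (zero-order B-spline) filter response costs four lookups; a minimal sketch follows (the higher-order generalization is not shown).

```python
def integral_image(img):
    """(H+1) x (W+1) summed-area table: ii[y][x] holds the sum of
    img over the rectangle [0, y) x [0, x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum over the half-open window [y0, y1) x [x0, x1) in O(1):
    one zero-order B-spline (box) filter response."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Since repeated convolution of box filters yields higher-order B-splines, cascading this construction is the intuition behind the generalized integral images the paper proposes.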