Jonathan Alon
Boston University
Publications
Featured research published by Jonathan Alon.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009
Jonathan Alon; Vassilis Athitsos; Quan Yuan; Stan Sclaroff
Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed framework makes three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other, longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
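The subgesture reasoning step is easy to picture in isolation. Below is a minimal sketch, in Python, of one plausible reading of it: a matched gesture is suppressed when a known "supergesture" also matches over a containing time interval. The detection tuple format, the `subgestures` table, and the digit example are assumptions for illustration, not the paper's actual data structures.

```python
# A minimal sketch of subgesture reasoning. Assumed (hypothetical) inputs:
# each detection is (gesture_label, start_frame, end_frame, matching_cost),
# and `subgestures` maps a label to the set of labels known (e.g. from
# training data) to falsely match parts of it.

def filter_subgesture_matches(detections, subgestures):
    """Suppress a detection if a longer gesture that contains it in time
    also matched, and the shorter gesture is a known subgesture of it."""
    kept = []
    for d in detections:
        label, start, end, cost = d
        suppressed = False
        for other in detections:
            if other is d:
                continue
            o_label, o_start, o_end, o_cost = other
            # `other` spans a containing interval and `label` is one of
            # its known false submatches -> drop the shorter hypothesis.
            if (o_start <= start and end <= o_end
                    and label in subgestures.get(o_label, set())):
                suppressed = True
                break
        if not suppressed:
            kept.append(d)
    return kept

# Toy example: the model for digit "5" is known to match part of "8".
detections = [("5", 10, 30, 4.2), ("8", 10, 55, 6.1)]
subgestures = {"8": {"5"}}
print(filter_subgesture_matches(detections, subgestures))  # keeps only "8"
```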
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008
Vassilis Athitsos; Jonathan Alon; Stan Sclaroff; George Kollios
This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and nonmetric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods such as Lipschitz embeddings, FastMap, and VP-trees.
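The embeddings-as-classifiers view at the heart of BoostMap can be sketched compactly. Below is an illustrative toy version, assuming one-dimensional "reference object" embeddings of the form F(x) = d(x, r); the names, the hand-picked weights, and the toy numeric data are assumptions, and the actual method learns the weighted combination with boosting over training triples.

```python
# A toy sketch of BoostMap's embeddings-as-classifiers view.

def make_embedding(d, r):
    """1D embedding: distance of x to a fixed reference object r."""
    return lambda x: d(x, r)

def triple_classifier(F):
    """Treat embedding F as a classifier on triples (X, A, B):
    +1 if X looks closer to A than to B in the embedded space, else -1."""
    def h(X, A, B):
        return 1.0 if abs(F(X) - F(A)) < abs(F(X) - F(B)) else -1.0
    return h

def combined_classifier(classifiers, alphas):
    """A weighted vote of embedding-based classifiers; by the paper's key
    property, such a combination corresponds to a weighted L1 distance in
    the multidimensional embedding."""
    def H(X, A, B):
        return sum(a * h(X, A, B) for a, h in zip(alphas, classifiers))
    return H

d = lambda x, y: abs(x - y)          # stand-in for an expensive distance
h1 = triple_classifier(make_embedding(d, 0.0))
h2 = triple_classifier(make_embedding(d, 10.0))
H = combined_classifier([h1, h2], alphas=[0.7, 0.3])
print(H(2.0, 1.0, 9.0))              # positive: X judged closer to A
```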
Workshop on Applications of Computer Vision | 2005
Jonathan Alon; Vassilis Athitsos; Quan Yuan; Stan Sclaroff
A method for the simultaneous localization and recognition of dynamic hand gestures is proposed. At the core of this method is a dynamic space-time warping (DSTW) algorithm that aligns a pair of query and model gestures in both space and time. For every frame of the query sequence, feature detectors generate multiple hand region candidates. Dynamic programming is then used to compute both a global matching cost, which is used to recognize the query gesture, and a warping path, which aligns the query and model sequences in time and also finds the best hand candidate region in every query frame. The proposed framework includes translation-invariant recognition of gestures, a desirable property for many HCI systems. The performance of the approach is evaluated on a dataset of hand-signed digits gestured by people wearing short-sleeved shirts, in front of a background containing other non-hand skin-colored objects. The algorithm simultaneously localizes the gesturing hand and recognizes the hand-signed digit. Although DSTW is illustrated in a gesture recognition setting, the proposed algorithm is a general method for matching time series that allows multiple candidate feature vectors to be extracted at each time step.
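For readers unfamiliar with DSTW, the core recurrence is a small extension of ordinary dynamic time warping: the DP table gains a third index over the candidate detections in each query frame, and a transition may come from any candidate of the predecessor cell. A minimal sketch follows, with an illustrative Euclidean local cost and the simplifying assumption of a fixed number of candidates per frame; real feature vectors and costs are application-specific.

```python
import numpy as np

def dstw(model, query_candidates):
    """model: list of M feature vectors (one per model frame).
    query_candidates: list of Q lists, each with K candidate vectors.
    Returns the global matching cost; the warping path (and hence the
    best hand candidate per frame) could be recovered by backtracking."""
    M, Q = len(model), len(query_candidates)
    K = len(query_candidates[0])
    INF = float("inf")
    D = np.full((M, Q, K), INF)
    for i in range(M):
        for j in range(Q):
            for k in range(K):
                local = np.linalg.norm(np.asarray(model[i]) -
                                       np.asarray(query_candidates[j][k]))
                if i == 0 and j == 0:
                    best_prev = 0.0
                else:
                    # transition from ANY candidate of a predecessor cell
                    preds = []
                    if i > 0 and j > 0: preds.append(D[i-1, j-1].min())
                    if i > 0:           preds.append(D[i-1, j].min())
                    if j > 0:           preds.append(D[i, j-1].min())
                    best_prev = min(preds)
                D[i, j, k] = local + best_prev
    return D[M-1, Q-1].min()

model = [[0.0], [1.0], [2.0]]
query = [[[0.1], [5.0]], [[1.2], [4.0]], [[2.1], [9.0]]]
print(round(dstw(model, query), 2))   # 0.4: picks the near-model candidates
```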
Computer Vision and Pattern Recognition | 2005
Vassilis Athitsos; Jonathan Alon; Stan Sclaroff
This paper proposes a method for efficient nearest neighbor classification in non-Euclidean spaces with computationally expensive similarity/distance measures. Efficient approximations of such measures are obtained using the BoostMap algorithm, which produces embeddings into a real vector space. A modification to the BoostMap algorithm is proposed that uses an optimization cost more appropriate when the goal is classification accuracy as opposed to nearest neighbor retrieval accuracy. Using the modified algorithm, multiple approximate nearest neighbor classifiers are obtained that provide a wide range of trade-offs between accuracy and efficiency. The approximations are automatically combined to form a cascade classifier, which applies the slower and more accurate approximations only to the hardest cases. The proposed method is experimentally evaluated in the domain of handwritten digit recognition using shape context matching. Results on the MNIST database indicate that a speed-up of two to three orders of magnitude is gained over brute-force search, with minimal losses in classification accuracy.
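The cascade structure itself is independent of the particular approximations and fits in a few lines. A minimal sketch, assuming each stage returns a label together with a confidence margin; the margin test is an illustrative stand-in for the paper's actual criterion.

```python
# Cascade of classifiers, ordered fast-to-slow: cheap stages answer the
# easy cases, and only the hardest cases reach the expensive stage.

def cascade_classify(x, stages):
    """stages: list of (classify, confidence_threshold) pairs, ordered
    fast-to-slow; classify(x) -> (label, margin). The last stage always
    answers."""
    for classify, threshold in stages[:-1]:
        label, margin = classify(x)
        if margin >= threshold:
            return label            # confident enough: stop early
    final_classify, _ = stages[-1]
    return final_classify(x)[0]     # hard case: use the slow stage

fast = lambda x: ("A", 0.9) if x < 0.5 else ("B", 0.2)   # cheap, coarse
slow = lambda x: ("A", 1.0) if x < 0.7 else ("B", 1.0)   # expensive, accurate
print(cascade_classify(0.6, [(fast, 0.5), (slow, 0.0)]))  # falls through: "A"
```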
International Conference on Computer Vision | 2005
Jonathan Alon; Vassilis Athitsos; Stan Sclaroff
Gesture spotting is the challenging task of locating the start and end frames of the video stream that correspond to a gesture of interest, while at the same time rejecting non-gesture motion patterns. This paper proposes a new gesture spotting and recognition algorithm that is based on the continuous dynamic programming (CDP) algorithm and runs in real time. To make gesture spotting efficient, a pruning method is proposed that allows the system to evaluate a relatively small number of hypotheses compared to CDP. Pruning is implemented by a set of model-dependent classifiers that are learned from training examples. To make gesture spotting more accurate, a subgesture reasoning process is proposed that models the fact that some gesture models can falsely match parts of other, longer gestures. In our experiments, the proposed method with pruning and subgesture modeling is an order of magnitude faster and 18% more accurate compared to the original CDP algorithm.
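One way to picture the pruning step: CDP keeps, for each model state and input frame, the cost of the best partial match ending there, and pruning replaces unpromising cells with infinity so they are never expanded. The sketch below assumes a hypothetical learned predicate `keep(state, cost)` standing in for the paper's model-dependent classifiers, and uses a simplified CDP recurrence.

```python
INF = float("inf")

def cdp_with_pruning(local_cost_rows, keep):
    """local_cost_rows: per input frame, a list of local costs (one per
    model state). keep(state, cost): learned predicate deciding whether
    a partial match at `state` with accumulated cost `cost` is worth
    expanding. Returns the best end-of-model cost per frame; low values
    flag candidate gesture end frames."""
    n_states = len(local_cost_rows[0])
    prev = [INF] * n_states
    end_costs = []
    for row in local_cost_rows:
        cur = [INF] * n_states
        for s in range(n_states):
            if s == 0:
                best_prev = 0.0     # CDP: a match may start at any frame
            else:
                best_prev = min(prev[s], prev[s-1], cur[s-1])
            cost = row[s] + best_prev
            # model-dependent pruning: kill unpromising hypotheses early
            if cost != INF and not keep(s, cost):
                cost = INF
            cur[s] = cost
        end_costs.append(cur[-1])
        prev = cur
    return end_costs

rows = [[0.2, 0.9, 0.8], [0.7, 0.1, 0.9], [0.8, 0.6, 0.2]]
keep = lambda s, c: c < 2.0          # toy stand-in for a learned classifier
print(cdp_with_pruning(rows, keep))  # [1.9, 1.2, 0.5]: last frame likely end
```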
Pervasive Technologies Related to Assistive Environments | 2008
Alexandra Stefan; Vassilis Athitsos; Jonathan Alon; Stan Sclaroff
Gestures are a natural means of communication between humans, and also a natural modality for human-computer interaction. Automatic recognition of gestures using computer vision is an important task in many real-world applications, such as sign language recognition, computer game control, virtual reality, intelligent homes, and assistive environments. In order for a gesture recognition system to be robust and deployable in non-laboratory settings, the system needs to be able to operate in complex scenes, with complicated backgrounds and multiple moving and skin-colored objects. In this paper we propose an approach for improving gesture recognition performance in such complex environments. The key idea is to integrate a face detection module into the gesture recognition system, and use the face location and size to make gesture recognition invariant to scale and translation. Our experiments demonstrate the significant advantages of the proposed method over alternative computer vision methods for gesture recognition.
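The normalization itself is a one-liner once a face box is available. A minimal sketch, assuming an (x, y, w, h) face box from any off-the-shelf face detector; the coordinate convention and names are assumptions.

```python
# Express hand positions relative to the detected face so the same
# gesture looks alike regardless of where the user stands or how far
# they are from the camera (translation and scale invariance).

def normalize_by_face(hand_xy, face_box):
    """Hand position relative to the face center, in units of face size."""
    fx, fy, fw, fh = face_box
    cx, cy = fx + fw / 2.0, fy + fh / 2.0
    return ((hand_xy[0] - cx) / fw, (hand_xy[1] - cy) / fh)

# Same gesture seen closer to the camera and shifted: the normalized
# coordinates coincide.
print(normalize_by_face((260, 140), (200, 100, 40, 40)))   # (1.0, 0.5)
print(normalize_by_face((520, 280), (400, 200, 80, 80)))   # (1.0, 0.5)
```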
Computer Vision and Pattern Recognition | 2000
Jonathan Alon; Stan Sclaroff
A specialized formulation of Azarbayejani and Pentland's (1995) framework for recursive recovery of motion, structure, and focal length from feature correspondences tracked through an image sequence is presented. The specialized formulation addresses the case where all tracked points lie on a plane. This planarity constraint reduces the dimension of the original state vector and, consequently, the number of feature points needed to estimate the state. Experiments with synthetic data and real imagery illustrate the system's performance. The experiments confirm that the specialized formulation provides improved accuracy, stability to observation noise, and rate of convergence in estimation for the case where the tracked points lie on a plane.
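A side illustration of why planarity helps (this is not the paper's recursive estimator): points on a plane seen in two views are related by a 3x3 homography with only 8 degrees of freedom, so 4 correspondences already pin down the inter-frame mapping, versus the larger state needed for general structure. A standard direct linear transform (DLT) estimate:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the homography mapping src -> dst.
    src, dst: (N, 2) arrays of matching points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Solution is the null-space vector: last right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)  # pure scaling
print(np.round(homography_dlt(src, dst), 3))   # ~ diag(2, 2, 1)
```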
International Conference on Document Analysis and Recognition | 2005
Jonathan Alon; Vassilis Athitsos; Stan Sclaroff
Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions and provide state-of-the-art recognition accuracy on OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pair-wise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used and computationally expensive alignment methods: dynamic time warping and the Hungarian method, respectively. In both cases, significant recognition speedup is obtained at the expense of only a minor increase in recognition error.
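The prototype trick can be sketched for 1D time series, with plain DTW standing in for the expensive alignment (the offline-character case in the paper uses the Hungarian method instead). Each character is aligned exactly to a prototype once; two characters are then compared through their composed alignments, skipping the direct pair-wise alignment. All names below are illustrative.

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW; returns the optimal warping path as (i, j) pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i-1] - b[j-1]) + min(D[i-1, j-1],
                                                 D[i-1, j], D[i, j-1])
    path, i, j = [], n, m          # backtrack from the end
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        if step == 0:   i, j = i - 1, j - 1
        elif step == 1: i -= 1
        else:           j -= 1
    return path[::-1]

def approx_distance(q, x, prototype):
    """Compose the q<->prototype and x<->prototype alignments: for each
    prototype index, pair the matched elements of q and x directly,
    avoiding a direct (expensive) q<->x alignment."""
    to_q = {}
    for i, j in dtw_path(q, prototype):
        to_q.setdefault(j, []).append(i)
    cost, count = 0.0, 0
    for k, j in dtw_path(x, prototype):
        for i in to_q.get(j, []):
            cost += abs(q[i] - x[k]); count += 1
    return cost / max(count, 1)

proto = [0.0, 1.0, 2.0, 1.0]
q = [0.1, 1.1, 2.0, 1.9, 1.0]
x = [0.0, 0.9, 2.1, 1.1]
print(round(approx_distance(q, x, proto), 3))  # small: both follow the prototype
```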
International Conference on Document Analysis and Recognition | 2005
Stan Sclaroff; Margrit Betke; George Kollios; Jonathan Alon; Vassilis Athitsos; Rui Li; John J. Magee; Tai-Peng Tian
An overview of research in automated gesture spotting, tracking, and recognition by the Image and Video Computing Group at Boston University is given. Approaches for localizing and tracking human hands in video, estimating hand shape and upper body pose, tracking head and facial motion, and efficiently spotting and recognizing specific gestures in video streams are summarized. Methods for efficient dimensionality reduction of gesture time series, boosting of classifiers for nearest neighbor search in pose space, and model-based pruning of gesture alignment hypotheses are described. Algorithms are demonstrated in three domains: American Sign Language, hand signals like those employed by flight directors on airport runways, and gesture-based interfaces for severely disabled users. The methods described are general and can be applied in other domains that require efficient detection and analysis of patterns in time series, images, or video.
Proceedings of SPIE | 1999
Jonathan Alon; Stan Sclaroff
We present a framework for estimating 3D relative structure (shape) and motion of objects undergoing non-rigid deformation, as observed from a fixed camera under perspective projection. Deforming surfaces are approximated as piecewise planar and piecewise rigid. Robust registration methods allow tracking of corresponding image patches from view to view and recovery of 3D shape despite occlusions, discontinuities, and varying illumination conditions. Many relatively small planar/rigid image patch trackers are scattered throughout the image; the resulting estimates of structure and motion at each patch are combined over local neighborhoods via an oriented particle systems formulation. Preliminary experiments have been conducted on real image sequences of deforming objects and on synthetic sequences where ground truth is known.