Vassilis Athitsos | Researchain

Publication

Featured research published by Vassilis Athitsos.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2000

Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models

M. La Cascia; Stan Sclaroff; Vassilis Athitsos

An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture-mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line tracking is achieved via regularized, weighted least squares minimization of the registration error. The regularization term tends to limit potential ambiguities that arise in the warping and illumination templates. It enables stable tracking over extended sequences. Tracking does not require a precise initial fit of the model; the system is initialized automatically using a simple 2D face detector. The only assumption is that the target is facing the camera in the first frame of the sequence. The formulation is tailored to take advantage of texture mapping hardware available in many workstations, PCs, and game consoles. The non-optimized implementation runs at about 15 frames per second on an SGI O2 graphics workstation. Extensive experiments evaluating the effectiveness of the formulation are reported. The sensitivity of the technique to illumination, regularization parameters, errors in the initial positioning, and internal camera parameters is analyzed. Examples and applications of tracking are reported.
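
As a rough illustration of the "regularized, weighted least squares" step, one generic form is sketched below. The symbols are assumptions introduced here and are not taken from the paper: $r$ is the stacked registration residual, the columns of $B$ are the warping and illumination templates, $q$ is the coefficient vector, $W$ is a diagonal matrix of non-negative pixel weights, and $\Omega$ is a symmetric positive semidefinite regularization matrix.

```latex
\hat{q} \;=\; \arg\min_{q}\; (r - Bq)^{\top} W \,(r - Bq) \;+\; q^{\top} \Omega\, q
\qquad\Longrightarrow\qquad
\hat{q} \;=\; \left(B^{\top} W B + \Omega\right)^{-1} B^{\top} W\, r
```

Setting the gradient to zero gives $(B^{\top} W B + \Omega)\,\hat{q} = B^{\top} W r$, which is solvable whenever $B^{\top} W B + \Omega$ is invertible. Under this reading, $\Omega$ keeps the system well conditioned when templates are nearly collinear, which is one plausible interpretation of the abstract's remark about limiting ambiguities.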

Computer Vision and Pattern Recognition | 2003

Estimating 3D hand pose from a cluttered image

Vassilis Athitsos; Stan Sclaroff

A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
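
For readers unfamiliar with the chamfer distance, the sketch below computes the exact directed version between two small sets of edge pixels. It is only meant to show the quantity that the embedding approximates; it does not implement the embedding itself, and the function name and toy coordinates are assumptions made for illustration.

```python
import math


def directed_chamfer(a, b):
    """Mean distance from each point in `a` to its nearest point in `b`."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


# Hypothetical edge pixels: a clean model and an input image with one clutter point.
model_edges = [(0, 0), (1, 0), (2, 0)]
image_edges = [(0, 1), (1, 1), (2, 1), (10, 10)]

# Model-to-image: every model point finds a nearby image edge, so clutter is ignored.
print(directed_chamfer(model_edges, image_edges))

# Image-to-model: the clutter point has no nearby model edge and inflates the score.
print(directed_chamfer(image_edges, model_edges))
```

The asymmetry between the two directions is why the choice of direction matters when the input contains clutter. The brute-force loop costs time proportional to the product of the two set sizes, which suggests why a cheaper approximation is attractive for large databases.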

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009

A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation

Jonathan Alon; Vassilis Athitsos; Quan Yuan; Stan Sclaroff

Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method consists of three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

Skin color-based video segmentation under time-varying illumination

Leonid Sigal; Stan Sclaroff; Vassilis Athitsos

A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling, and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using maximum likelihood estimation and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences. In all but one case, the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.

International Conference on Computer Vision | 2001

3D hand pose reconstruction using specialized mappings

Rómer Rosales; Vassilis Athitsos; Leonid Sigal; Stan Sclaroff

A system for recovering 3D hand pose from monocular color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA's fundamental components are a set of specialized forward mapping functions and a single feedback matching function. The forward functions are estimated directly from training data, which in our case are examples of hand joint configurations and their corresponding visual features. The joint angle data in the training set is obtained via a CyberGlove, a glove with 22 sensors that monitor the angular motions of the palm and fingers. In training, the visual features are generated using a computer graphics module that renders the hand from arbitrary viewpoints given the 22 joint angles. The viewpoint is encoded by two real values; therefore, 24 real values represent a hand pose. We test our system both on synthetic sequences and on sequences taken with a color camera. The system automatically detects and tracks both hands of the user, calculates the appropriate features, and estimates the 3D hand joint angles and viewpoint from those features. Results are encouraging given the complexity of the task.

IEEE International Conference on Automatic Face and Gesture Recognition | 2002

An appearance-based framework for 3D hand shape classification and camera viewpoint estimation

Vassilis Athitsos; Stan Sclaroff

An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground-truth labels of those matches, containing hand-shape and camera-viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments, respectively.

1997 Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries | 1997

Distinguishing photographs and graphics on the World Wide Web

Vassilis Athitsos; Michael J. Swain; Charles H. Frankel

When we search for images in multimedia documents, we often have in mind specific image types that we are interested in; examples are photographs, graphics, maps, cartoons, portraits of people, and so on. This paper describes an automated system that classifies Web images as photographs or graphics. The design of the system is based on statistical observations about the image content of the two types, as well as learning techniques which make use of the vast amount of training data available on the Web. Text associated with the image can be used to further improve the accuracy of the classification. The system is used as a part of WebSeer, an image search engine for the Web.

Pervasive Technologies Related to Assistive Environments | 2011

Comparing gesture recognition accuracy using color and depth information

Paul Doliotis; Alexandra Stefan; Christopher McMurrough; David Eckhard; Vassilis Athitsos

In human-computer interaction applications, gesture recognition has the potential to provide a natural way of communication between humans and machines. The technology is becoming mature enough to be widely available to the public, and real-world computer vision applications are starting to emerge. A typical example of this trend is the gaming industry and the launch of Microsoft's new camera: the Kinect. Other domains where gesture recognition is needed include, but are not limited to, sign language recognition, virtual reality environments, and smart homes. A key challenge for such real-world applications is that they need to operate in complex scenes with cluttered backgrounds, various moving objects, and possibly challenging illumination conditions. In this paper we propose a method that accommodates such challenging conditions by detecting the hands using scene depth information from the Kinect. On top of our detector we employ a dynamic programming method for recognizing gestures, namely Dynamic Time Warping (DTW). Our method is translation and scale invariant, which is a desirable property for many HCI systems. We have tested the performance of our approach on a digit recognition system. All experimental datasets include hand-signed digit gestures, but our framework can be generalized to recognize a wider range of gestures.
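
To make the DTW step concrete, here is a minimal sketch of the textbook dynamic programming recurrence. It assumes, purely for illustration, that each frame is summarized by a 2D hand position; the abstract does not specify the features or the normalization used, and the function name and toy trajectories are hypothetical.

```python
import math


def dtw_distance(query, model):
    """Classic DTW cost between two sequences of 2D points."""
    n, m = len(query), len(model)
    # cost[i][j] = best cost of aligning the first i query frames
    # with the first j model frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], model[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame repeats a model frame
                cost[i][j - 1],      # model frame repeats a query frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical hand trajectories: the second lingers on its first frame.
fast = [(0, 0), (1, 1), (2, 2)]
slow = [(0, 0), (0, 0), (1, 1), (2, 2)]
print(dtw_distance(fast, slow))  # 0.0: same path, different speed
```

A recognizer built on this would compare the query against one or more model sequences per gesture class and pick the class with the lowest cost. Translation and scale invariance would have to come from normalizing the trajectories beforehand, which this sketch does not attempt.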

International Conference on Data Engineering | 2008

Nearest Neighbor Retrieval Using Distance-Based Hashing

Vassilis Athitsos; Michalis Potamias; Panagiotis Papapetrou; George Kollios

A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as locality sensitive hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multibit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
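
The sketch below illustrates one way binary hash functions can be built from nothing but a distance function: project each object onto the "line" between two pivot objects, then threshold the projection at the sample median. The abstract does not spell out the construction, so the pivot projection, the median threshold, and all names here are assumptions for illustration only. The toy distance is absolute difference on integers, but any callable `dist(a, b)` could be passed in.

```python
import statistics


def line_projection(dist, x, p1, p2):
    """Project x onto the pivot pair (p1, p2) using distances only."""
    d12 = dist(p1, p2)
    if d12 == 0:
        raise ValueError("pivots must be at non-zero distance")
    return (dist(x, p1) ** 2 + d12 ** 2 - dist(x, p2) ** 2) / (2 * d12)


def make_hash_bit(dist, p1, p2, sample):
    """Return a binary function that splits `sample` roughly in half."""
    threshold = statistics.median(
        line_projection(dist, s, p1, p2) for s in sample
    )
    return lambda x: int(line_projection(dist, x, p1, p2) >= threshold)


def hash_key(bits, x):
    """Concatenate several binary functions into one multibit key."""
    return tuple(b(x) for b in bits)


def build_table(bits, database):
    """Group database objects by their multibit key."""
    table = {}
    for obj in database:
        table.setdefault(hash_key(bits, obj), []).append(obj)
    return table


def approx_nearest(dist, bits, table, query):
    """Search only the bucket the query hashes to; may miss the true neighbor."""
    candidates = table.get(hash_key(bits, query), [])
    return min(candidates, key=lambda c: dist(query, c)) if candidates else None


def abs_dist(a, b):
    return abs(a - b)


database = [1, 2, 3, 7, 8, 9]
bits = [
    make_hash_bit(abs_dist, 0, 10, database),  # hypothetical pivot pair
    make_hash_bit(abs_dist, 3, 7, database),   # hypothetical pivot pair
]
table = build_table(bits, database)
print(table)
print(approx_nearest(abs_dist, bits, table, 6))
```

Because a query is compared only against the objects in its own bucket, a single table can miss the true nearest neighbor. Using several tables with different pivot pairs, as the abstract's "multiple multibit hash tables" suggests, trades extra lookups for higher accuracy.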

Computer Vision and Pattern Recognition | 2012

ChaLearn gesture challenge: Design and first results

Isabelle Guyon; Vassilis Athitsos; Pat Jangyodsuk; Ben Hamner; Hugo Jair Escalante

We organized a challenge on gesture recognition: http://gesture.chalearn.org. We made available a large database of 50,000 hand and arm gestures video-recorded with a Kinect™ camera providing both RGB and depth images. We used the Kaggle platform to automate submissions and entry evaluation. The focus of the challenge is on “one-shot-learning”, which means training gesture classifiers from a single video clip example of each gesture. The data are split into subtasks, each using a small vocabulary of 8 to 12 gestures, related to a particular application domain: hand signals used by divers, finger codes to represent numerals, signals used by referees, marshalling signals to guide vehicles or aircraft, etc. We limited the problem to single users for each task and to the recognition of short sequences of gestures punctuated by returning the hands to a resting position. This situation is encountered in computer interface applications, including robotics, education, and gaming. The challenge setting fosters progress in transfer learning by providing for training a large number of sub-tasks related to, but different from, the tasks on which the competitors are tested.
