Publication


Featured research published by Stan Sclaroff.


International Journal of Computer Vision | 1996

Photobook: content-based manipulation of image databases

Alex Pentland; Rosalind W. Picard; Stan Sclaroff

We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These query tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on text annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually-significant coefficients. We discuss three types of Photobook descriptions in detail: one that allows search based on appearance, one that uses 2-D shape, and a third that allows search based on textural properties. These image content descriptions can be combined with each other and with text-based descriptions to provide a sophisticated browsing and search capability. In this paper we demonstrate Photobook on databases containing images of people, video keyframes, hand tools, fish, texture swatches, and 3-D medical data.
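The core idea above, reducing each image to a small set of coefficients and searching in that coefficient space, can be sketched in miniature. The basis and feature values below are hypothetical toy numbers, not Photobook's actual eigen-descriptors:

```python
import math

def project(image_vec, basis):
    """Reduce an image vector to a few coefficients by projecting onto a basis."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, image_vec)) for b in basis]

def search(query_coeffs, database):
    """Rank database entries by Euclidean distance in coefficient space."""
    def dist(coeffs):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(query_coeffs, coeffs)))
    return sorted(database, key=lambda item: dist(item[1]))

# Toy 4-pixel "images" and a 2-vector basis (illustrative values only).
basis = [[1, 0, 0, 0], [0, 0, 0, 1]]
db = [("sunset", project([9, 1, 1, 2], basis)),
      ("forest", project([1, 8, 8, 9], basis))]
query = project([8, 2, 1, 3], basis)
ranked = search(query, db)  # most similar image first
```

Because search happens in the compressed coefficient space rather than over raw pixels or text annotations, the database index stays small while remaining content-based.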


Applied Imagery Pattern Recognition Workshop | 1995

Photobook: tools for content-based manipulation of image databases

Alex Pentland; Rosalind W. Picard; Stan Sclaroff

We describe the Photobook system, which is a set of interactive tools for browsing and searching images and image sequences. These tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on annotations. Direct search on image content is made possible by use of semantics-preserving image compression, which reduces images to a small set of perceptually significant coefficients. We describe three Photobook tools in particular: one that allows search based on gray-level appearance, one that uses 2-D shape, and a third that allows search based on textural properties.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2000

Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models

M. La Cascia; Stan Sclaroff; Vassilis Athitsos

An improved technique for 3D head tracking under varying illumination conditions is proposed. The head is modeled as a texture-mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expression analysis, lip reading, and eye tracking. To solve the registration problem in the presence of lighting variation and head motion, the residual error of registration is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast and stable on-line tracking is achieved via regularized, weighted least-squares minimization of the registration error. The regularization term tends to limit potential ambiguities that arise in the warping and illumination templates, and it enables stable tracking over extended sequences. Tracking does not require a precise initial fit of the model; the system is initialized automatically using a simple 2D face detector. The only assumption is that the target is facing the camera in the first frame of the sequence. The formulation is tailored to take advantage of texture-mapping hardware available in many workstations, PCs, and game consoles. The non-optimized implementation runs at about 15 frames per second on an SGI O2 graphics workstation. Extensive experiments evaluating the effectiveness of the formulation are reported. The sensitivity of the technique to illumination, regularization parameters, errors in initial positioning, and internal camera parameters is analyzed. Examples and applications of tracking are reported.
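The regularized least-squares step can be illustrated on a tiny example. The paper fits the registration residual as a combination of warping and illumination templates; the sketch below does the analogous fit for two made-up templates via the Tikhonov-regularized normal equations (A^T A + lam*I) x = A^T r:

```python
def ridge_fit(templates, residual, lam):
    """Fit residual ~= x1*a + x2*b for two templates a, b by solving the
    regularized 2x2 normal equations (A^T A + lam*I) x = A^T r."""
    a, b = templates
    aa = sum(x * x for x in a) + lam
    ab = sum(x * y for x, y in zip(a, b))
    bb = sum(x * x for x in b) + lam
    ra = sum(x * r for x, r in zip(a, residual))
    rb = sum(x * r for x, r in zip(b, residual))
    det = aa * bb - ab * ab
    return ((ra * bb - rb * ab) / det, (rb * aa - ra * ab) / det)

# Toy templates; the residual is exactly 2*a + 3*b.
a, b = [1, 0, 0, 1], [0, 1, 1, 0]
residual = [2, 3, 3, 2]
coeffs = ridge_fit((a, b), residual, lam=0.0)
```

With lam = 0 the fit recovers the exact coefficients; a positive lam shrinks them toward zero, which is what limits ambiguity between near-parallel warping and illumination templates in the tracker.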


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1995

Modal matching for correspondence and recognition

Stan Sclaroff; Alex Pentland

Modal matching is a new method for establishing correspondences and computing canonical descriptions. The method is based on the idea of describing objects in terms of generalized symmetries, as defined by each object's eigenmodes. The resulting modal description is used for object recognition and categorization, where shape similarities are expressed as the amounts of modal deformation energy needed to align the two objects. In general, modes provide a global-to-local ordering of shape deformation and thus allow for selecting which types of deformations are used in object alignment and comparison. In contrast to previous techniques, which required correspondence to be computed with an initial or prototype shape, modal matching utilizes a new type of finite element formulation that allows for an object's eigenmodes to be computed directly from available image information. This improved formulation provides greater generality and accuracy, and is applicable to data of any dimensionality. Correspondence results with 2D contour and point feature data are shown, and recognition experiments with 2D images of hand tools and airplanes are described.
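The global-to-local ordering of modes and the deformation-energy similarity can be shown on a toy 2x2 "stiffness" matrix (the paper derives much larger FEM stiffness matrices from image data; the numbers here are purely illustrative):

```python
import math

def eigenvalues_2x2(k11, k12, k22):
    """Closed-form eigenvalues of a symmetric 2x2 stiffness matrix, sorted
    ascending: small eigenvalues are global, low-energy deformation modes;
    large eigenvalues are local, high-energy ones."""
    half_tr = (k11 + k22) / 2.0
    disc = math.sqrt(half_tr ** 2 - (k11 * k22 - k12 * k12))
    return [half_tr - disc, half_tr + disc]

def modal_energy(eigvals, amplitudes):
    """Deformation energy of modal displacement amplitudes u:
    E = sum_i lambda_i * u_i^2 / 2. Aligning shapes mostly via the
    low-frequency (global) modes costs less energy."""
    return sum(l * u * u / 2.0 for l, u in zip(eigvals, amplitudes))

lams = eigenvalues_2x2(2.0, 0.0, 5.0)
```

Sorting by eigenvalue is exactly what gives the global-to-local ordering the abstract mentions, and the energy sum is the shape-similarity score: two shapes whose alignment needs only cheap global modes are judged more similar.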


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1991

Closed-form solutions for physically based shape modeling and recognition

Alex Pentland; Stan Sclaroff

The authors present a closed-form, physically based solution for recovering a three-dimensional (3-D) solid model from collections of 3-D surface measurements. Given a sufficient number of independent measurements, the solution is overconstrained and unique except for rotational symmetries. The proposed approach is based on the finite element method (FEM) and parametric solid modeling using implicit functions. This approach provides both the convenience of parametric modeling and the expressiveness of the physically based mesh formulation and, in addition, can provide great accuracy in physical simulation. A physically based object-recognition method that allows simple, closed-form comparisons of recovered 3-D solid models is presented. The performance of these methods is evaluated using both synthetic range data with various signal-to-noise ratios and laser rangefinder data.
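The paper's recovery is FEM-based with implicit-function solid models; the flavor of an overconstrained, closed-form fit from surface measurements can be conveyed by a far simpler stand-in, a least-squares circle fit. The circle equation x^2 + y^2 = 2a*x + 2b*y + d is linear in (a, b, d), so given enough measurements the normal equations yield the model in closed form:

```python
import math

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= f * M[i][c]
    x = [0.0] * 3
    for i in range(2, -1, -1):
        x[i] = (M[i][3] - sum(M[i][c] * x[c] for c in range(i + 1, 3))) / M[i][i]
    return x

def fit_circle(points):
    """Closed-form least-squares circle from surface measurements:
    solve the normal equations for (a, b, d), then r = sqrt(d + a^2 + b^2)."""
    rows = [(2.0 * x, 2.0 * y, 1.0) for x, y in points]
    rhs = [x * x + y * y for x, y in points]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    a, b, d = solve3(AtA, Atb)
    return a, b, math.sqrt(d + a * a + b * b)

# Four measurements on a circle of center (1, 2), radius 3: overconstrained.
cx, cy, r = fit_circle([(4, 2), (1, 5), (-2, 2), (1, -1)])
```

As in the paper, more independent measurements than unknowns makes the solution overconstrained, so noise averages out rather than corrupting the recovered model.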


Computer Vision and Image Understanding | 1999

Unifying textual and visual cues for content-based image retrieval on the World Wide Web

Stan Sclaroff; Marco La Cascia; Saratendu Sethi; Leonid Taycher

A system is proposed that combines textual and visual statistics in a single index vector for content-based search of a WWW image database. Textual statistics are captured in vector form using latent semantic indexing based on text in the containing HTML document. Visual statistics are captured in vector form using color and orientation histograms. By using an integrated approach, it becomes possible to take advantage of possible statistical couplings between the content of the document (latent semantic content) and the contents of images (visual statistics). The combined approach allows improved performance in conducting content-based search. Search performance experiments are reported for a database containing 350,000 images collected from the WWW.
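The single combined index vector can be sketched directly: normalize each modality's vector, weight it, and concatenate, then rank by cosine similarity. The weighting scheme and toy vectors below are illustrative assumptions, not the paper's exact parameters:

```python
import math

def unit(v):
    """Normalize a vector to unit length (zero vectors pass through)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def combined_vector(text_vec, visual_vec, w_text=0.5):
    """Concatenate normalized textual (e.g. LSI) and visual (histogram)
    statistics into one index vector, weighted per modality."""
    return [w_text * x for x in unit(text_vec)] + \
           [(1 - w_text) * x for x in unit(visual_vec)]

def cosine(u, v):
    """Cosine similarity used to rank index vectors."""
    return sum(a * b for a, b in zip(u, v)) / (
        math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

query = combined_vector([1, 2], [3, 1])
same_doc = combined_vector([1, 2], [3, 1])
other_doc = combined_vector([2, -1], [3, 1])  # same visuals, unrelated text
```

Because both modalities live in one vector, a single nearest-neighbor search captures the statistical coupling between a page's latent semantic content and its images' visual statistics.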


Computer Vision and Pattern Recognition | 2003

Estimating 3D hand pose from a cluttered image

Vassilis Athitsos; Stan Sclaroff

A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
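The paper's first contribution approximates the image-to-model chamfer distance via an embedding; for reference, the quantity being approximated is the directed chamfer distance, which is simple enough to state exactly on point sets:

```python
import math

def directed_chamfer(model_pts, image_pts):
    """Directed chamfer distance: for each model edge point, the distance
    to the nearest image edge point, averaged over the model. This exact
    form is what the embedding-based method approximates cheaply."""
    total = 0.0
    for mx, my in model_pts:
        total += min(math.hypot(mx - ix, my - iy) for ix, iy in image_pts)
    return total / len(model_pts)
```

Computing this exactly for every one of a large database of synthetic hand images is expensive, which is why embedding binary edge images into a Euclidean space, where distances are fast to compare, pays off.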


1997 Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries | 1997

ImageRover: a content-based image browser for the World Wide Web

Stan Sclaroff; L. Taycher; M. La Cascia

ImageRover is a search-by-image-content navigation tool for the World Wide Web (WWW). To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics that are appropriate for a particular query.
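A common way to realize relevance feedback of this kind (not necessarily ImageRover's exact rule) is to reweight feature dimensions by how consistent they are across the user's relevant examples:

```python
def feedback_weights(relevant_vectors, eps=1e-6):
    """Inverse-variance relevance feedback: feature dimensions that vary
    little across the user's chosen relevant examples are presumably what
    the user cares about, so they get higher weight in the distance metric."""
    dims = len(relevant_vectors[0])
    weights = []
    for d in range(dims):
        vals = [v[d] for v in relevant_vectors]
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / len(vals)
        weights.append(1.0 / (var + eps))
    return weights

# Two relevant examples agree on dimension 0 but disagree wildly on 1.
weights = feedback_weights([[1.0, 0.0], [1.0, 10.0]])
```

Each feedback round thus reshapes the distance metric before the next approximate k-d tree query, steering the search toward the features the relevant examples share.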


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009

A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation

Jonathan Alon; Vassilis Athitsos; Quan Yuan; Stan Sclaroff

Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method consists of three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
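The first contribution, matching that keeps multiple candidate hand detections per frame, is naturally expressed as dynamic programming. The sketch below uses 1-D candidate positions and absolute-difference costs as stand-ins for the paper's actual features and cost functions:

```python
def spatiotemporal_match(frames, model):
    """DP over frames, keeping every candidate hand detection per frame.
    Each path picks one candidate per frame; its cost combines how far the
    candidate is from the gesture model at that frame with how far the
    hand would have to jump between consecutive frames."""
    prev = [abs(c - model[0]) for c in frames[0]]
    for t in range(1, len(frames)):
        cur = []
        for cand in frames[t]:
            best = min(p_cost + abs(p_cand - cand)
                       for p_cost, p_cand in zip(prev, frames[t - 1]))
            cur.append(best + abs(cand - model[t]))
        prev = cur
    return min(prev)

# Each frame has a true hand candidate and a distractor stuck at position 9.
frames = [[0, 9], [1, 9], [2, 9]]
cost = spatiotemporal_match(frames, model=[0, 1, 2])
```

Because no candidate is discarded up front, an ambiguous detection in one frame can be resolved by the smooth path through its neighbors, which is why the method tolerates cluttered backgrounds.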


International Conference on Computer Vision | 2013

Saliency Detection: A Boolean Map Approach

Jianming Zhang; Stan Sclaroff

A novel Boolean Map based Saliency (BMS) model is proposed. An image is characterized by a set of binary images, which are generated by randomly thresholding the image's color channels. Based on a Gestalt principle of figure-ground segregation, BMS computes saliency maps by analyzing the topological structure of Boolean maps. BMS is simple to implement and efficient to run. Despite its simplicity, BMS consistently achieves state-of-the-art performance compared with ten leading methods on five eye tracking datasets. Furthermore, BMS is also shown to be advantageous in salient object detection.
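The topological cue BMS exploits is "surroundedness": regions of a Boolean map not connected to the image border stand out as figure. A minimal sketch, using fixed thresholds on one channel for determinism (the paper samples thresholds and uses all color channels):

```python
def surrounded(bmap):
    """Attention map of one Boolean map: cells whose same-value connected
    component never touches the border are 'surrounded' and marked 1.
    Implemented as a flood fill seeded from all border cells."""
    h, w = len(bmap), len(bmap[0])
    seen = [[False] * w for _ in range(h)]
    stack = [(r, c) for r in range(h) for c in range(w)
             if r in (0, h - 1) or c in (0, w - 1)]
    for r, c in stack:
        seen[r][c] = True
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] \
                    and bmap[nr][nc] == bmap[r][c]:
                seen[nr][nc] = True
                stack.append((nr, nc))
    return [[0 if seen[r][c] else 1 for c in range(w)] for r in range(h)]

def bms_saliency(channel, thresholds):
    """Mean attention map over Boolean maps from several thresholds."""
    h, w = len(channel), len(channel[0])
    acc = [[0.0] * w for _ in range(h)]
    for t in thresholds:
        att = surrounded([[v > t for v in row] for row in channel])
        for r in range(h):
            for c in range(w):
                acc[r][c] += att[r][c] / len(thresholds)
    return acc

# A single bright blob in the middle of a 5x5 channel is surrounded,
# so it is the only salient region.
channel = [[0] * 5 for _ in range(5)]
channel[2][2] = 10
sal = bms_saliency(channel, [5])
```

Averaging over many randomly chosen thresholds turns this binary surroundedness test into a graded saliency map, which is essentially all the machinery BMS needs.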

Collaboration


An overview of Stan Sclaroff's collaborations and frequent co-authors.

Top Co-Authors


Vassilis Athitsos

University of Texas at Arlington

Alex Pentland

Massachusetts Institute of Technology
