Publications

Featured research published by Jeffrey Mark Siskind.


Cognition | 1996

A computational study of cross-situational techniques for learning word-to-meaning mappings

Jeffrey Mark Siskind

This paper presents a computational study of part of the lexical-acquisition task faced by children, namely the acquisition of word-to-meaning mappings. It first approximates this task as a formal mathematical problem. It then presents an implemented algorithm for solving this problem, illustrating its operation on a small example. This algorithm offers one precise interpretation of the intuitive notions of cross-situational learning and the principle of contrast applied between words in an utterance. It robustly learns a homonymous lexicon despite noisy multi-word input, in the presence of referential uncertainty, with no prior knowledge that is specific to the language being learned. Computational simulations demonstrate the robustness of this algorithm and illustrate how algorithms based on cross-situational learning and the principle of contrast might be able to solve lexical-acquisition problems of the size faced by children, under weak, worst-case assumptions about the type and quantity of data available.
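
The intersection idea at the heart of cross-situational learning is compact enough to sketch. The toy below illustrates that idea only, not the paper's algorithm: it omits the noise handling, homonymy, and principle-of-contrast machinery described above, and all word and meaning symbols are invented for the example.

```python
def cross_situational_learn(corpus):
    """Intersect candidate meanings for each word across situations.

    corpus: iterable of (words, meanings) pairs, where `words` is the set
    of words in an utterance and `meanings` is the set of meanings the
    accompanying situation could plausibly offer (referential uncertainty).
    """
    hypotheses = {}
    for words, meanings in corpus:
        for word in words:
            if word in hypotheses:
                hypotheses[word] &= meanings      # later exposure: intersect
            else:
                hypotheses[word] = set(meanings)  # first exposure: everything
    return hypotheses

corpus = [
    ({"ball", "red"}, {"BALL", "RED", "TABLE"}),
    ({"ball", "big"}, {"BALL", "BIG", "FLOOR"}),
    ({"red", "cup"},  {"RED", "CUP", "TABLE"}),
]
print(cross_situational_learn(corpus))
# 'ball' is pinned down to {'BALL'}; 'red' still carries {'RED', 'TABLE'}
# until further situations rule 'TABLE' out.
```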


Cognition | 2001

The Role of Exposure to Isolated Words in Early Vocabulary Development.

Michael R. Brent; Jeffrey Mark Siskind

Fluent speech contains no known acoustic analog of the blank spaces between printed words. Early research presumed that word learning is driven primarily by exposure to isolated words. In the last decade there has been a shift to the view that exposure to isolated words is unreliable and plays little if any role in early word learning. This study revisits the role of isolated words. The results show (a) that isolated words are a reliable feature of speech to infants, (b) that they include a variety of word types, many of which are repeated in close temporal proximity, (c) that a substantial fraction of the words infants produce are words that mothers speak in isolation, and (d) that the frequency with which a child hears a word in isolation predicts whether that word will be learned better than does the child's total frequency of exposure to that word. Thus, exposure to isolated words may significantly facilitate vocabulary development at its earliest stages.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2003

Image segmentation with ratio cut

Song Wang; Jeffrey Mark Siskind

This paper proposes a new cost function, cut ratio, for segmenting images using graph-based methods. The cut ratio is defined as the ratio of the corresponding sums of two different weights of edges along the cut boundary and models the mean affinity between the segments separated by the boundary per unit boundary length. This new cost function allows the image perimeter to be segmented, guarantees that the segments produced by bipartitioning are connected, and does not introduce a size, shape, smoothness, or boundary-length bias. The latter allows it to produce segmentations where boundaries are aligned with image edges. Furthermore, the cut-ratio cost function allows efficient iterated region-based segmentation as well as pixel-based segmentation. These properties may be useful for some image-segmentation applications. While the problem of finding a minimum ratio cut in an arbitrary graph is NP-hard, one can find a minimum ratio cut in the connected planar graphs that arise during image segmentation in polynomial time. While the cut ratio, alone, is not sufficient as a baseline method for image segmentation, it forms a good basis for an extended method of image segmentation when combined with a small number of standard techniques. We present an implemented algorithm for finding a minimum ratio cut, prove its correctness, discuss its application to image segmentation, and present the results of segmenting a number of medical and natural images using our techniques.
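
To make the cost function concrete, here is a hedged sketch of evaluating the cut ratio for a given bipartition. The two per-edge weights (an affinity weight and a boundary-length weight) and the toy graph are assumptions for illustration; the paper's contribution, the polynomial-time minimization over connected planar graphs, is not reproduced here.

```python
def cut_ratio(edges, segment):
    """Ratio of the two weight sums over edges crossing the cut.

    edges: list of (u, v, w_affinity, w_length) tuples (assumed weights).
    segment: set of vertices on one side of the bipartition.
    """
    affinity = length = 0.0
    for u, v, w_aff, w_len in edges:
        if (u in segment) != (v in segment):  # edge crosses the boundary
            affinity += w_aff
            length += w_len
    return affinity / length if length else float("inf")

# A toy 2x2 pixel grid; cut between the left column {a, c} and the right.
edges = [
    ("a", "b", 0.9, 1.0), ("c", "d", 0.8, 1.0),  # horizontal edges
    ("a", "c", 0.1, 1.0), ("b", "d", 0.2, 1.0),  # vertical edges
]
print(cut_ratio(edges, {"a", "c"}))  # mean affinity per unit boundary length
```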


Journal of Artificial Intelligence Research | 2001

Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic

Jeffrey Mark Siskind

This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.
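
As a loose illustration of inferring a compound event from primitive force-dynamic intervals, consider the sketch below. The primitive relations, the overlap check, and the PICK-UP definition are invented for the example; they stand in for the paper's event-logic operators and its finite representation of infinite interval sets.

```python
# Primitive occurrences as half-open frame intervals (start, end),
# e.g. produced by a force-dynamic relation tracker (hypothetical data).
primitives = {
    "SUPPORTED(obj, table)": [(0, 40)],
    "GRASPED(hand, obj)":    [(30, 90)],
    "MOVING-UP(obj)":        [(45, 80)],
}

def overlaps(i, j):
    return i[0] < j[1] and j[0] < i[1]

def pick_up(prims):
    """PICK-UP, loosely: grasping overlaps the end of support and also
    overlaps subsequent upward motion."""
    for g in prims["GRASPED(hand, obj)"]:
        for s in prims["SUPPORTED(obj, table)"]:
            for m in prims["MOVING-UP(obj)"]:
                if overlaps(g, s) and overlaps(g, m) and s[1] <= m[0]:
                    return (s[1], m[1])  # rough extent of the compound event
    return None

print(pick_up(primitives))  # (40, 80)
```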


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Salient closed boundary extraction with ratio contour

Song Wang; Toshiro Kubota; Jeffrey Mark Siskind; Jun Wang

We present ratio contour, a novel graph-based method for extracting salient closed boundaries from noisy images. This method operates on a set of boundary fragments that are produced by edge detection. Boundary extraction identifies a subset of these fragments and connects them sequentially to form a closed boundary with the largest saliency. We encode the Gestalt laws of proximity and continuity in a novel boundary-saliency measure based on the relative gap length and average curvature when connecting fragments to form a closed boundary. This new measure attempts to remove a possible bias toward short boundaries. We present a polynomial-time algorithm for finding the most-salient closed boundary. We also present supplementary preprocessing steps that facilitate the application of ratio contour to real images. We compare ratio contour to two closely related methods for extracting closed boundaries: Elder and Zucker's method based on the shortest-path algorithm and Williams and Thornber's method based on spectral analysis and a strongly-connected-components algorithm. This comparison involves both theoretical analysis and experimental evaluation on both synthesized data and real images.
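
The saliency measure can be sketched as a ratio of "badness" (gaps and curvature) to total boundary length accumulated over the fragments of a candidate closed boundary. The weighting constant and the fragment representation below are assumptions; the polynomial-time search for the optimal cycle is the part this sketch does not attempt.

```python
def boundary_cost(fragments, gaps, lam=1.0):
    """Lower is more salient: (total gap + lam * curvature) per unit length.

    fragments: list of (length, total_squared_curvature) per fragment.
    gaps: list of gap lengths bridged between consecutive fragments.
    lam: assumed trade-off constant between proximity and continuity.
    """
    total_len = sum(length for length, _ in fragments) + sum(gaps)
    penalty = sum(gaps) + lam * sum(curv for _, curv in fragments)
    return penalty / total_len

# A smooth, well-connected boundary scores better (lower) than a
# fragmented, wiggly one.
print(boundary_cost([(50.0, 0.2), (60.0, 0.3)], gaps=[2.0, 3.0]))
print(boundary_cost([(20.0, 4.0), (15.0, 5.0)], gaps=[12.0, 9.0]))
```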


Artificial Intelligence Review | 1995

Grounding language in perception

Jeffrey Mark Siskind

This paper describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input. The program receives an animated line-drawing as input and produces as output a semantic representation of the events occurring in that animation. This paper suggests that the notions of support, contact, and attachment are crucial to specifying many simple spatial motion event types and presents a logical notation for describing classes of events that incorporates such notions as primitives. It then suggests that the truth values of such primitives can be recovered from perceptual input by a process of counterfactual simulation, predicting the effect of hypothetical changes to the world on the immediate future. Finally, it suggests that such counterfactual simulation is performed using knowledge of naive physical constraints such as substantiality, continuity, gravity, and ground plane. This paper describes the algorithms that incorporate these ideas in the program and illustrates the operation of the program on sample input.
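
The counterfactual-simulation idea can be illustrated with a deliberately tiny physics: to test whether A supports B, remove A and predict whether B falls under gravity. The one-dimensional stacking world below is an invented stand-in for the paper's simulator and its substantiality, continuity, gravity, and ground-plane constraints.

```python
def falls(obj, world, ground=0.0):
    """An object falls if neither the ground nor another object's top
    surface is directly under its bottom surface (exact toy geometry)."""
    bottom = world[obj][0]
    if bottom <= ground:
        return False  # resting on the ground plane
    return not any(top == bottom for other, (_, top) in world.items()
                   if other != obj)

def supports(a, b, world):
    """Counterfactual test: b is stable now, but falls once a is removed."""
    hypothetical = {k: v for k, v in world.items() if k != a}
    return not falls(b, world) and falls(b, hypothetical)

# Heights as (bottom, top) intervals: a block resting on a table.
world = {"table": (0.0, 1.0), "block": (1.0, 1.2)}
print(supports("table", "block", world))  # True
print(supports("block", "table", world))  # False
```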


Computer Vision and Pattern Recognition | 2013

Recognize Human Activities from Partially Observed Videos

Yu Cao; Daniel Paul Barrett; Andrei Barbu; Siddharth Narayanaswamy; Haonan Yu; Aaron Michaux; Yuewei Lin; Sven J. Dickinson; Jeffrey Mark Siskind; Song Wang

Recognizing human activities in partially observed videos is a challenging problem with many practical applications. When the unobserved subsequence is at the end of the video, the problem reduces to activity prediction from an unfinished activity stream, which has been studied by many researchers. In the general case, however, an unobserved subsequence may occur at any time, yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities from partially observed videos in the general case. Specifically, we formulate the problem in a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments, 2) using spatiotemporal features of the training video samples in each segment as bases and applying sparse coding (SC) to derive the activity likelihood of the test video sample at each segment, and 3) finally combining the likelihoods across segments to achieve a global posterior for the activities. We further extend the proposed method to include more bases that correspond to a mixture of segments with different temporal lengths (MSSC), which can better represent activities with large intra-class variations. We evaluate the proposed methods (SC and MSSC) on various real videos. We also evaluate them on two special cases: 1) activity prediction, where the unobserved subsequence is at the end of the video, and 2) human activity recognition on fully observed videos. Experimental results show that the proposed methods outperform existing state-of-the-art methods.
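
A much-simplified sketch of the per-segment scoring step: reconstruct the test segment's feature vector from each class's training bases and turn the residual into a likelihood. Plain least squares stands in here for the paper's sparse coding, and the feature dimensions, class names, and scale parameter are invented.

```python
import numpy as np

def segment_likelihoods(bases_by_class, x, beta=1.0):
    """Score one observed segment's feature vector x against each class.

    bases_by_class: {class: (d, n) array of training feature columns}.
    Least squares replaces the paper's sparse coding; beta is an assumed
    residual-to-likelihood scale.
    """
    scores = {}
    for cls, B in bases_by_class.items():
        coef, *_ = np.linalg.lstsq(B, x, rcond=None)
        residual = np.linalg.norm(B @ coef - x)
        scores[cls] = np.exp(-beta * residual)
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

rng = np.random.default_rng(0)
bases = {"wave": rng.normal(size=(8, 5)), "run": rng.normal(size=(8, 5))}
x = bases["wave"] @ rng.normal(size=5)  # a segment drawn from "wave"
print(segment_likelihoods(bases, x))    # "wave" should dominate
```

Combining the per-segment scores over the observed (non-gap) segments, for example by multiplying them, yields the kind of global posterior the paper uses for recognition.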


Computer Vision and Image Understanding | 1997

The Computational Perception of Scene Dynamics

Richard Mann; Allan D. Jepson; Jeffrey Mark Siskind

Understanding observations of interacting objects requires one to reason about qualitative scene dynamics. For example, on observing a hand lifting a can, we may infer that an “active” hand is applying an upwards force (by grasping) to lift a “passive” can. We present an implemented computational theory that derives such dynamic descriptions directly from camera input. Our approach is based on an analysis of the Newtonian mechanics of a simplified scene model. Interpretations are expressed in terms of assertions about the kinematic and dynamic properties of the scene. The feasibility of interpretations relative to Newtonian mechanics is determined by a reduction to linear programming. Finally, to select plausible interpretations, multiple feasible solutions are compared using a preference hierarchy. We provide computational examples to demonstrate that our model is sufficiently rich to describe a wide variety of image sequences.
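
The feasibility test can be sketched with scipy's linprog: encode Newtonian force balance as linear constraints and ask whether any admissible assignment of forces satisfies them. The one-variable "hand lifts can" instance below is an invented toy, not the paper's full kinematic and dynamic model.

```python
from scipy.optimize import linprog

def interpretation_feasible(mass, accel, g=9.8):
    """Is there a non-negative upward grasp force f with f - m*g = m*a?

    Variables: [f]. The equality constraint encodes Newton's second law
    for the can; the bound encodes that the grasp can only pull upward.
    """
    res = linprog(c=[0.0],  # pure feasibility check: zero objective
                  A_eq=[[1.0]], b_eq=[mass * (accel + g)],
                  bounds=[(0.0, None)])
    return res.success

print(interpretation_feasible(mass=0.4, accel=1.5))    # True: lifting works
print(interpretation_feasible(mass=0.4, accel=-20.0))  # False: would need a
# downward grasp force, so this interpretation is ruled out
```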


European Conference on Computer Vision | 1996

Computational Perception of Scene Dynamics

Richard Mann; Allan D. Jepson; Jeffrey Mark Siskind

Understanding observations of image sequences requires one to reason about qualitative scene dynamics. For example, on observing a hand lifting a cup, we may infer that an active hand is applying an upwards force (by grasping) on a passive cup. In order to perform such reasoning, we require an ontology that describes object properties and the generation and transfer of forces in the scene. Such an ontology should include, for example: the presence of gravity, the presence of a ground plane, whether objects are active or passive, whether objects are contacting and/or attached to other objects, and so on. In this work we make these ideas precise by presenting an implemented computational system that derives symbolic force-dynamic descriptions from video sequences.

Our approach to scene dynamics is based on an analysis of the Newtonian mechanics of a simplified scene model. The critical requirement is that, given image sequences, we can obtain estimates for the shape and motion of the objects in the scene. To do this, we assume that the objects can be approximated by a two-dimensional layered scene model. The input to our system consists of a set of polygonal outlines along with estimates for their velocities and accelerations, obtained from a view-based tracker. Given such input, we present a system that extracts force-dynamic descriptions for the image sequence. We provide computational examples to demonstrate that our ontology is sufficiently rich to describe a wide variety of image sequences.

This work makes three central contributions. First, we provide an ontology suitable for describing object properties and the generation and transfer of forces in the scene. Second, we provide a computational procedure to test the feasibility of such interpretations by reducing the problem to a feasibility test in linear programming. Finally, we provide a theory of preference ordering between multiple interpretations along with an efficient computational procedure to determine maximal elements in such orderings.
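
To illustrate the third contribution, the sketch below orders feasible interpretations by a simple preference hierarchy: prefer fewer active objects, then fewer attachment relations. The attribute names and this particular ordering are assumptions standing in for the paper's preference theory.

```python
def preference_key(interp):
    """Lexicographic preference: fewer active objects first, then fewer
    attachment relations. Both attributes are hypothetical stand-ins."""
    return (len(interp["active"]), len(interp["attachments"]))

feasible = [
    {"name": "hand active, cup passive", "active": {"hand"},
     "attachments": {("hand", "cup")}},
    {"name": "both active", "active": {"hand", "cup"}, "attachments": set()},
]
best = min(feasible, key=preference_key)
print(best["name"])  # the single-active-object reading is preferred
```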


European Conference on Computer Vision | 1996

A Maximum-Likelihood Approach to Visual Event Classification

Jeffrey Mark Siskind; Quaid Morris

This paper presents a novel framework, based on maximum likelihood, for training models to recognise simple spatial-motion events, such as those described by the verbs pick up, put down, push, pull, drop, and throw, and classifying novel observations into previously trained classes. The model that we employ does not presuppose prior recognition or tracking of 3D object pose, shape, or identity. We describe our general framework for using maximum-likelihood techniques for visual event classification, the details of the generative model that we use to characterise observations as instances of event types, and the implemented computational techniques used to support training and classification for this generative model. We conclude by illustrating the operation of our implementation on a small example.
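
The classification rule itself is just an argmax of class-conditional likelihoods. The sketch below uses a diagonal Gaussian per event class as a stand-in generative model; the paper's actual model of observations is richer, and the features and class names here are invented.

```python
import numpy as np

def fit(train):  # train: {event: (n, d) array of feature vectors}
    return {e: (X.mean(axis=0), X.var(axis=0) + 1e-6) for e, X in train.items()}

def log_likelihood(x, mu, var):
    # Diagonal-Gaussian log density, summed over feature dimensions.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def classify(x, models):
    """Maximum-likelihood decision: argmax over events of log p(x | event)."""
    return max(models, key=lambda e: log_likelihood(x, *models[e]))

rng = np.random.default_rng(1)
train = {"pick up": rng.normal(0.0, 1.0, (50, 4)),
         "put down": rng.normal(3.0, 1.0, (50, 4))}
models = fit(train)
print(classify(np.array([2.9, 3.1, 2.8, 3.2]), models))  # "put down"
```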

Collaboration


Dive into Jeffrey Mark Siskind's collaboration.

Top Co-Authors

Song Wang

University of South Carolina
