Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Yaron Caspi is active.

Publications


Featured research published by Yaron Caspi.


Computer Vision and Pattern Recognition | 2008

Summarizing visual data using bidirectional similarity

Denis Simakov; Yaron Caspi; Eli Shechtman; Michal Irani

We propose a principled approach to summarization of visual data (images or video) based on optimization of a well-defined similarity measure. The problem we consider is re-targeting (or summarization) of image/video data into smaller sizes. A good “visual summary” should satisfy two properties: (1) it should contain as much visual information from the input data as possible; (2) it should introduce as few new visual artifacts as possible that were not in the input data (i.e., preserve visual coherence). We propose a bi-directional similarity measure which quantitatively captures these two requirements: two signals S and T are considered visually similar if all patches of S (at multiple scales) are contained in T, and vice versa. The problem of summarization/re-targeting is posed as an optimization problem over this bi-directional similarity measure. We show summarization results for image and video data. We further show that the same approach can be used to address a variety of other problems, including automatic cropping, completion and synthesis of visual data, image collage, object removal, photo reshuffling, and more.
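The bi-directional measure described above can be sketched compactly. The following is a minimal illustration, not the paper's implementation: it assumes grayscale images, a fixed patch size, a single scale, brute-force nearest-patch search, and squared-difference patch distance (the `patches`, `stride`, and function names are choices made here for illustration).

```python
import numpy as np

def patches(img, p=7, stride=4):
    """Extract all p x p patches (flattened) at the given stride."""
    H, W = img.shape[:2]
    out = []
    for y in range(0, H - p + 1, stride):
        for x in range(0, W - p + 1, stride):
            out.append(img[y:y + p, x:x + p].ravel())
    return np.array(out, dtype=np.float64)

def bidirectional_dissimilarity(S, T, p=7, stride=4):
    """d(S, T) = completeness + coherence:
    the mean, over patches of S, of the distance to the nearest patch of T,
    plus the symmetric term with the roles of S and T swapped."""
    PS, PT = patches(S, p, stride), patches(T, p, stride)
    # Pairwise squared distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2
    d2 = np.sum(PS**2, 1)[:, None] - 2 * PS @ PT.T + np.sum(PT**2, 1)[None, :]
    completeness = np.mean(np.min(d2, axis=1))  # every patch of S appears in T
    coherence = np.mean(np.min(d2, axis=0))     # every patch of T comes from S
    return completeness + coherence
```

Identical signals score (near) zero; a summary that drops content raises the completeness term, while one that invents artifacts raises the coherence term. The optimization over candidate summaries (and the multi-scale patch pyramid) is omitted here.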


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002

Spatio-temporal alignment of sequences

Yaron Caspi; Michal Irani

This paper studies the problem of sequence-to-sequence alignment, namely, establishing correspondences in time and in space between two different video sequences of the same dynamic scene. The sequences are recorded by uncalibrated video cameras which are either stationary or jointly moving, with fixed (but unknown) internal parameters and relative intercamera external parameters. Temporal variations between image frames (such as moving objects or changes in scene illumination) are powerful cues for alignment, which cannot be exploited by standard image-to-image alignment techniques. We show that, by folding spatial and temporal cues into a single alignment framework, situations which are inherently ambiguous for traditional image-to-image alignment methods, are often uniquely resolved by sequence-to-sequence alignment. Furthermore, the ability to align and integrate information across multiple video sequences both in time and in space gives rise to new video applications that are not possible when only image-to-image alignment is used.


International Journal of Computer Vision | 2006

Feature-Based Sequence-to-Sequence Matching

Yaron Caspi; Denis Simakov; Michal Irani

This paper studies the problem of matching two unsynchronized video sequences of the same dynamic scene, recorded by different stationary uncalibrated video cameras. The matching is done both in time and in space, where the spatial matching can be modeled by a homography (for 2D scenarios) or by a fundamental matrix (for 3D scenarios). Our approach is based on matching space-time trajectories of moving objects, in contrast to matching interest points (e.g., corners), as done in regular feature-based image-to-image matching techniques. The sequences are matched in space and time by enforcing consistent matching of all points along corresponding space-time trajectories. By exploiting the dynamic properties of these space-time trajectories, we obtain sub-frame temporal correspondence (synchronization) between the two video sequences. Furthermore, using trajectories rather than feature-points significantly reduces the combinatorial complexity of the spatial point-matching problem when the search space is large. This benefit allows for matching information across sensors in situations which are extremely difficult when only image-to-image matching is used, including: (a) matching under large scale (zoom) differences, (b) very wide base-line matching, and (c) matching across different sensing modalities (e.g., IR and visible-light cameras). We show examples of recovering homographies and fundamental matrices under such conditions.


International Journal of Computer Vision | 2002

Aligning Non-Overlapping Sequences

Yaron Caspi; Michal Irani

This paper shows how two image sequences that have no spatial overlap between their fields of view can be aligned both in time and in space. Such alignment is possible when the two cameras are attached closely together and are moved jointly in space. The common motion induces “similar” changes over time within the two sequences. This correlated temporal behavior is used to recover the spatial and temporal transformations between the two sequences. The requirement of “consistent appearance” in standard image alignment techniques is therefore replaced by “consistent temporal behavior”, which is often easier to satisfy. This approach to alignment can be used not only for aligning non-overlapping sequences, but also for handling other cases that are inherently difficult for standard image alignment techniques. We demonstrate applications of this approach to three real-world problems: (i) alignment of non-overlapping sequences for generating wide-screen movies, (ii) alignment of images (sequences) obtained at significantly different zooms, for surveillance applications, and (iii) multi-sensor image alignment for multi-sensor fusion.
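The core idea of "consistent temporal behavior" can be illustrated with a toy version of the temporal part: if each camera yields a 1-D per-frame motion signal (e.g., the magnitude of its estimated global motion), the frame offset between the sequences is where their normalized cross-correlation peaks. This is only a sketch of the temporal-alignment intuition under that assumption; the paper's method also recovers the spatial transformation, which is not shown here.

```python
import numpy as np

def temporal_offset(a, b, max_shift=30):
    """Recover the integer frame offset s such that a[t + s] ~ b[t],
    by maximizing normalized cross-correlation over candidate shifts.
    a, b: 1-D per-frame motion signals from the two cameras."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            x, y = a[s:], b[:len(b) - s]
        else:
            x, y = a[:len(a) + s], b[-s:]
        n = min(len(x), len(y))
        if n < 2:
            continue
        score = float(np.dot(x[:n], y[:n])) / n  # mean correlation at shift s
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift
```

Because the two cameras move jointly, their motion signals are correlated even when their fields of view never overlap, which is exactly what makes this alignment cue usable where appearance-based cues fail.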


The Visual Computer | 2006

Dynamic stills and clip trailers

Yaron Caspi; Anat Axelrod; Yasuyuki Matsushita; Alon Gamliel

We propose a method for generating visual summaries of video. It reduces browsing time and minimizes screen-space utilization while preserving the crux of the video content and the sensation of motion. The outputs are images or short clips, denoted as dynamic stills or clip trailers, respectively. The method selects informative poses out of extracted video objects. Optimal rotations and transparency support visualization of an increased number of poses, leading to concise activity visualization. Our method addresses previously avoided scenarios, e.g., activities occurring in one place, or scenes with a non-static background. We demonstrate and evaluate the method for various types of videos.


Biochemical Society Transactions | 2010

Ancient machinery embedded in the contemporary ribosome

Matthew J. Belousoff; Chen Davidovich; Ella Zimmerman; Yaron Caspi; Itai Wekselman; Lin Rozenszajn; Tal Shapira; Ofir Sade-Falk; Leena Taha; Anat Bashan; Manfred S. Weiss; Ada Yonath

Structural analysis, supported by biochemical, mutagenesis and computational evidence, indicates that the peptidyltransferase centre of the contemporary ribosome is a universal symmetrical pocket composed solely of rRNA. This pocket seems to be a relic of the proto-ribosome, an ancient ribozyme, which was a dimeric RNA assembly formed from self-folded RNA chains of identical, similar or different sequences. This could have occurred spontaneously by gene duplication or gene fusion. This pocket-like entity was capable of autonomously catalysing various reactions, including peptide bond formation and non-coded or semi-coded amino acid polymerization. Efforts toward the structural definition of the early entity capable of genetic decoding involve the crystallization of the small ribosomal subunit of a bacterial organism harbouring a single functional rRNA operon.


Computer Vision and Pattern Recognition | 2004

Capturing image structure with probabilistic index maps

Nebojsa Jojic; Yaron Caspi

One of the major problems in modeling images for vision tasks is that images with very similar structure may locally have completely different appearance, e.g., images taken under different illumination conditions, or the images of pedestrians with different clothing. While there have been many successful attempts to address these problems in application-specific settings, we believe that underlying a large set of problems in vision is a representational deficiency of intensity-derived local measurements that are the basis of most efficient models. We argue that interesting structure in images is better captured when the image is defined as a matrix whose entries are discrete indices to a separate palette of possible intensities, colors or other features, much like the image representation often used to save on storage. In order to model the variability in images, we define an image class not by a single index map, but by a probability distribution over the index maps, which can be automatically estimated from the data, and which we call probabilistic index maps. The existing algorithms can be adapted to work with this representation, as we illustrate in this paper on the example of transformation-invariant clustering and background subtraction. Furthermore, the probabilistic index map representation leads to algorithms with computational costs proportional to either the size of the palette or the log of the size of the palette, making the cost of significantly increased invariance to non-structural changes quite bearable.
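The representational idea above, an image as indices into a per-image palette, plus a per-pixel distribution over indices, can be sketched in a few lines. This is a deliberately simplified illustration under assumptions made here (grayscale images, a per-image palette taken as K intensity quantiles, and a simple frequency estimate of the index distribution), not the paper's estimation algorithm.

```python
import numpy as np

def index_map(img, palette):
    """Map each pixel to the index of its nearest palette entry."""
    # img: (H, W) intensities; palette: (K,) intensity palette
    return np.argmin(np.abs(img[..., None] - palette[None, None, :]), axis=-1)

def probabilistic_index_map(imgs, K=4):
    """Estimate a per-pixel distribution over K palette indices from a
    stack of images that share structure but not appearance.
    Each image gets its own palette (K quantiles of its intensities), so
    images under different illumination can share the same index map."""
    H, W = imgs[0].shape
    counts = np.zeros((H, W, K))
    for img in imgs:
        palette = np.quantile(img, np.linspace(0, 1, K))  # per-image palette
        idx = index_map(img, palette)
        counts[np.arange(H)[:, None], np.arange(W)[None, :], idx] += 1
    return counts / len(imgs)  # per-pixel probability over indices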


Computer Vision and Pattern Recognition | 2006

Vertical Parallax from Moving Shadows

Yaron Caspi; Michael Werman

This paper presents a method for capturing and computing 3D parallax. 3D parallax, as used here, refers to the vertical offset from the ground plane, i.e., height. The method is based on analyzing shadows of vertical poles (e.g., a tall building’s contour) that sweep the object. Unlike existing beam-scanning approaches, such as shadow or structured light, that recover the distance of a point from the camera, our approach measures the height from the ground plane directly. Previous methods compute the distance from the camera using triangulation between rays outgoing from the light source and the camera. Such a triangulation is difficult when the objects are far from the camera, and requires accurate knowledge of the light-source position. In contrast, our approach intersects two (unknown) planes generated separately by two casting objects. This eliminates the need to precompute the location of the light source. Furthermore, it allows a moving light source to be used. The proposed setup is particularly useful when the camera cannot directly face the scene or when the object is far away from the camera. A good example is an urban scene captured by a single webcam.


International Conference on Multimedia and Expo | 2006

Scalability of Multimedia Applications on Next-Generation Processors

Guy Amit; Yaron Caspi; Ran Vitale; Adi T. Pinhas

In the near future, the majority of personal computers are expected to have several processing units. This is referred to as chip multiprocessing (CMP). Furthermore, each of the computation units will be capable of running multiple hardware threads. To benefit from the additional processing power, application developers should multithread their software. This paper studies the scalability (expected speedup factor) of multimedia applications and provides guidelines for proper utilization of these new multi-core platforms. In particular, we discuss the decomposition method, load balancing, synchronization primitives, interaction with the operating system, and hardware issues such as cache hierarchy and memory bandwidth. Our results are based on analysis of several state-of-the-art applications, including H.264 video encoding, panoramic image stitching and dense optical-flow estimation. We demonstrate how to multithread them properly, and report scalability results on several next-generation multi-core platforms.
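The "expected speedup factor" discussed above is classically bounded by Amdahl's law: if a fraction p of the work parallelizes across n threads and the rest stays serial, the speedup cannot exceed 1 / ((1 - p) + p/n). This bound is standard background rather than a result of the paper, and ignores the synchronization and memory-bandwidth effects the paper analyzes.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's-law upper bound on speedup when a fraction p of the
    work runs in parallel on n threads and the rest remains serial."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_threads)

# Even a highly parallel workload saturates quickly: with p = 0.9,
# 4 threads give at most ~3.08x, and infinitely many give only 10x.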


International Conference on Computer Graphics and Interactive Techniques | 2006

Interactive video exploration using pose slices

Anat Axelrod; Yaron Caspi; Alon Gamliel; Yasuyuki Matsushita

The availability of video content is rapidly growing. In contrast, our time for selecting and watching videos remains limited. This work presents an interactive video browser that addresses two inherent bottlenecks in video browsing: user time and screen space. To alleviate both limitations while preserving the crux of the original visual information, we track and extract video objects, and fuse selected instantaneous objects’ appearances, denoted as pose slices, into a single display. We also develop rendering techniques that maximize the visibility of the activity in the clip by interactively controlling the viewing angles, the number of instantaneous objects and transparency values. We discuss the method’s benefits and limitations, and illustrate its usability using a variety of home videos, sports clips and movie DVDs. The new visualization technique is a generalization of an informative timeline and a synopsis mosaic, corresponding to 90◦ and 0◦ rotation in our visualization, respectively. In addition, displaying all poses sequentially, pose-by-pose, results in regular video playback. Thus, it can also be viewed as bridging the gap between still images and video.

Collaboration


Dive into Yaron Caspi's collaborations.

Top Co-Authors

Michal Irani (Weizmann Institute of Science)

Eli Shechtman (Weizmann Institute of Science)

Denis Simakov (Weizmann Institute of Science)

Ada Yonath (Weizmann Institute of Science)