John R. Kender
Columbia University
Publications
Featured research published by John R. Kender.
International Conference on Automatic Face and Gesture Recognition | 1996
Rick Kjeldsen; John R. Kender
This paper describes the techniques used to separate the hand from a cluttered background in a gesture recognition system. Target colors are identified using a histogram-like structure called a Color Predicate, which is trained in real-time using a novel algorithm. Running on standard PC hardware, the segmentation is of sufficient speed and quality to support an interactive user interface. The method has shown its flexibility in a range of different office environments, segmenting users with many different skin-tones. Variations have been applied to other problems including finding face candidates in video sequences.
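The core of the method is a table lookup over quantized color space. A minimal sketch of that idea in Python follows; the class name, bin count, and vote weights are illustrative assumptions rather than the paper's exact training algorithm.

    # Sketch of a Color Predicate-style segmenter: a vote table over quantized color,
    # trained from labeled target/background pixels, then applied by table lookup.
    # Bin count and vote weights here are assumptions, not the paper's exact values.
    import numpy as np

    class ColorPredicate:
        def __init__(self, bins=32):
            self.bins = bins
            self.votes = np.zeros((bins, bins, bins), dtype=np.float32)

        def _index(self, rgb):
            # Quantize 8-bit color values into histogram bins.
            return tuple((np.asarray(rgb, dtype=np.int32) * self.bins) // 256)

        def train(self, pixels, labels, pos_weight=1.0, neg_weight=0.5):
            # Accumulate positive votes for target (hand) pixels, negative for background.
            for rgb, is_target in zip(pixels, labels):
                self.votes[self._index(rgb)] += pos_weight if is_target else -neg_weight

        def segment(self, image):
            # Classify every pixel of an H x W x 3 uint8 image with one table lookup each.
            idx = (image.astype(np.int32) * self.bins) // 256
            return self.votes[idx[..., 0], idx[..., 1], idx[..., 2]] > 0

Because segmentation reduces to one array lookup per pixel, interactive frame rates on commodity hardware are plausible even without further optimization.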
European Conference on Computer Vision | 2002
Aya Aner; John R. Kender
We present an approach for compact video summaries that allows fast and direct access to video data. The video is segmented into shots and, in appropriate video genres, into scenes, using previously proposed methods. A new concept that supports the hierarchical representation of video is presented, based on physical setting and camera locations. We use mosaics to represent and cluster shots, and detect appropriate mosaics to represent scenes. In contrast to approaches to video indexing that are based on key-frames, our efficient mosaic-based scene representation allows fast clustering of scenes into physical settings, as well as further comparison of physical settings across videos. This enables us to detect plots of different episodes in situation comedies and serves as a basis for indexing whole video sequences. In sports videos, where settings are not as well defined, our approach allows classifying shots for characteristic event detection. We use a novel method for mosaic comparison and create a highly compact non-temporal representation of video. This representation allows accurate comparison of scenes across different videos and serves as a basis for indexing video libraries.
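As a rough illustration of how mosaic-represented scenes might be grouped into physical settings, the sketch below uses plain color-histogram intersection as a stand-in similarity; the paper's actual mosaic-comparison method and any threshold values are not reproduced here.

    # Group scene mosaics into physical settings by pairwise similarity.
    # Histogram intersection is a generic stand-in for the paper's mosaic comparison;
    # the similarity threshold is an assumption.
    import numpy as np

    def color_histogram(mosaic, bins=16):
        # mosaic: H x W x 3 array with 8-bit color values.
        h, _ = np.histogramdd(mosaic.reshape(-1, 3), bins=(bins,) * 3, range=[(0, 256)] * 3)
        return h / h.sum()

    def intersection(h1, h2):
        return np.minimum(h1, h2).sum()        # similarity in [0, 1]

    def group_settings(mosaics, threshold=0.6):
        hists, labels = [color_histogram(m) for m in mosaics], []
        for i, h in enumerate(hists):
            for j in range(i):
                if intersection(h, hists[j]) >= threshold:
                    labels.append(labels[j])   # join an existing setting
                    break
            else:
                labels.append(max(labels, default=-1) + 1)   # open a new setting
        return labels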
ACM Multimedia | 2005
Alexander Haubold; John R. Kender
We investigate methods of segmenting, visualizing, and indexing presentation videos by both audio and visual data. The audio track is segmented by speaker and augmented with key phrases extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and changes in speaker gesturing, and augmented with representative key frames. An interactive user interface combines visual representations of audio, video, text, and key frames, and allows the user to navigate presentation videos. User studies with 176 students of varying knowledge were conducted on 7.5 hours of student presentation video (32 presentations). Tasks included searching for various portions of presentations, both known and unknown to students, and summarizing presentations given the annotations. The results favor the video summaries and the interface, suggesting responses roughly 20% faster than with access to the actual video, while accuracy of responses remained the same on average. Follow-up surveys offer a number of suggestions for improving the interface, such as the incorporation of automatic speaker clustering and identification and the display of an abstract topological view of the presentation. Surveys also show alternative contexts in which students would like to use the tool in the classroom environment.
ACM Multimedia | 2011
Lexing Xie; Apostol Natsev; John R. Kender; Matthew L. Hill; John R. Smith
We propose visual memes, or frequently reposted short video segments, for tracking large-scale video remixing in social media. Visual memes are extracted by novel and highly scalable detection algorithms that we develop, with over 96% precision and 80% recall. We monitor real-world events on YouTube and model interactions using a graph model over memes, with people and content as nodes and meme postings as links. This allows us to define several measures of influence. These abstractions, applied to more than two million video shots from several large-scale event datasets, enable us to quantify and efficiently extract several important observations: over half of the videos contain remixed content, which appears rapidly; video view counts, particularly high ones, are poorly correlated with the virality of content; the influence of traditional news media versus citizen journalists varies from event to event; iconic single images of an event are easily extracted; and content that will have a long lifespan can be predicted within a day after it first appears. Visual memes can be applied to a number of social media scenarios: brand monitoring, social buzz tracking, and ranking content and users, among others.
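One simple way to represent the posting graph and an influence measure is sketched below. The bipartite structure (authors and memes as nodes, postings as links) follows the abstract, while the specific "credit the earliest poster with later reposts" score is an illustrative assumption, not the paper's formulation.

    # Sketch of a meme-diffusion graph with a toy influence score.
    from collections import defaultdict

    class MemeGraph:
        def __init__(self):
            self.postings = []          # (author, meme_id, timestamp)
            self.first_post = {}        # meme_id -> (timestamp, author)

        def add_posting(self, author, meme_id, timestamp):
            self.postings.append((author, meme_id, timestamp))
            if meme_id not in self.first_post or timestamp < self.first_post[meme_id][0]:
                self.first_post[meme_id] = (timestamp, author)

        def influence(self):
            # Toy measure: credit the earliest poster of each meme with all later reposts.
            score = defaultdict(int)
            for author, meme_id, ts in self.postings:
                t0, originator = self.first_post[meme_id]
                if ts > t0:
                    score[originator] += 1
            return dict(score)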
International Symposium on Multimedia | 2004
Tiecheng Liu; John R. Kender
E-learning is an emerging education approach that augments learning experiences by integrating multimedia and network technologies. As an integral part of e-learning, the lecture videos captured in classrooms contain most of the instructional content. Effective use of these videos, however, remains a challenging task. This paper reviews previous research on capturing, analyzing, indexing, and retrieving lecture (instructional) videos, and introduces ongoing research efforts related to instructional videos. It compares instructional video to other video genres and addresses its special issues and difficulties. We present the current challenges in content-based indexing and retrieval of instructional videos. Improving these techniques for lecture videos has significant educational and social benefits.
Graphical Models / Graphical Models and Image Processing / Computer Vision, Graphics, and Image Processing | 1983
Steven A. Shafer; Takeo Kanade; John R. Kender
Mackworth's gradient space has proved to be a useful tool for image understanding. However, descriptions of its important properties have been somewhat scattered in the literature. The fundamental properties of the gradient space under orthography and perspective, and for curved surfaces, are developed and summarized. While largely a recounting of previously published results, there are a number of new observations, particularly concerning the gradient space and perspective projection. In addition, the definition and use of vector gradients as well as surface gradients provide concise notation for several results. The properties explored include the orthographic and perspective projections themselves; the definition of gradients; the gradient-space consequences of vectors (edges) belonging to one or more surfaces, and of several vectors being contained on a single surface; and the relationships between vanishing points, vanishing lines, and the gradient space. The paper is intended as a study guide for learning about the gradient space, as well as a reference for researchers working with gradient space.
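For readers unfamiliar with the construction, the basic relations the abstract refers to can be stated compactly (standard textbook results in one common sign convention; not quoted from the paper):

    % Gradient-space basics, in one common sign convention.
    % For a smooth surface z = f(x, y):
    \[
      p = \frac{\partial z}{\partial x}, \qquad
      q = \frac{\partial z}{\partial y}, \qquad
      \mathbf{n} \propto (-p,\, -q,\, 1).
    \]
    % Under perspective projection onto the image plane z = f (camera at the origin),
    % a 3-D direction (a, b, c) has vanishing point
    \[
      (x_v,\, y_v) = \left( \frac{f\,a}{c},\ \frac{f\,b}{c} \right),
    \]
    % and the vanishing line of any plane with gradient (p, q) is
    \[
      p\,x + q\,y = f,
    \]
    % the kind of point-line relationship between image space and gradient space
    % that the paper develops in detail.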
International Conference on Multimedia and Expo | 2007
Alexander Haubold; John R. Kender
We introduce a novel and inexpensive approach for the temporal alignment of speech to highly imperfect transcripts from automatic speech recognition (ASR). Transcripts are generated for extended lecture and presentation videos, which in some cases feature more than 30 speakers with different accents, resulting in highly varying transcription quality. In our approach we detect a subset of phonemes in the speech track and align them to the sequence of phonemes extracted from the transcript. We report results for 4 speech-transcript sets ranging from 22 to 108 minutes. The alignment performance is promising, showing correct matching of phonemes within 10-, 20-, and 30-second error margins for more than 60%, 75%, and 90% of the text, respectively, on average. For perfect, manually generated transcripts, more than 75% of the text is correctly aligned within 5 seconds.
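The alignment step can be pictured as classic dynamic-programming sequence alignment between the detected phonemes and the transcript-derived phonemes. The sketch below shows that general idea only; the paper's phoneme subset, cost weights, and time-assignment step are not reproduced, and the unit costs are assumptions.

    # Edit-distance-style alignment of a detected phoneme sequence to a reference
    # sequence derived from the transcript. Costs are illustrative assumptions.
    def align_cost(detected, reference, sub_cost=1, gap_cost=1):
        n, m = len(detected), len(reference)
        # dp[i][j]: minimal cost of aligning detected[:i] with reference[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap_cost
        for j in range(1, m + 1):
            dp[0][j] = j * gap_cost
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = dp[i - 1][j - 1] + (0 if detected[i - 1] == reference[j - 1] else sub_cost)
                dp[i][j] = min(match, dp[i - 1][j] + gap_cost, dp[i][j - 1] + gap_cost)
        return dp[n][m]

    # Example: one phoneme missing from the detected stream costs one gap.
    print(align_cost(["AH", "L", "OW"], ["HH", "AH", "L", "OW"]))   # -> 1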
Conference on Image and Video Retrieval | 2007
Alexander Haubold; John R. Kender
In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multi-modal analysis and indexing of audio and video. We apply visual segmentation techniques to unedited video to determine likely changes of topic. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker index. Videos are augmented with time-aligned filtered keywords and phrases from highly inaccurate speech transcripts. Our experimental user interface, the VAST MM Browser (Video Audio Structure Text Multi Media Browser), combines streaming video with visual and textual indices for browsing and searching. We evaluate the UI and methods in a large engineering design course, reporting on observations and statistics collected over 4 semesters and 598 student participants. Results suggest that our video indexing and retrieval approach is effective, and that our continuing improvements are reflected in increased precision and recall on user-study tasks.
ACM Transactions on Multimedia Computing, Communications, and Applications | 2007
Tiecheng Liu; John R. Kender
Video key frame extraction is one of the most important research problems for video summarization, indexing, and retrieval. For a variety of applications such as ubiquitous media access and video streaming, the temporal boundaries between video key frames are required for synchronizing visual content with audio. In this article, we define temporal video sampling as a unified process of extracting video key frames and computing their temporal boundaries, and formulate it as an optimization problem. We first provide an optimal approach that minimizes temporal video sampling error using a dynamic programming process. The optimal approach retrieves a key frame hierarchy and all temporal boundaries in O(n^4) time and O(n^2) space. To further reduce computational complexity, we also provide a suboptimal greedy algorithm that exploits the data structure of a binary heap and uses a novel “look-ahead” computational technique, enabling all levels of key frames to be extracted with an average-case computational time of O(n log n) and memory usage of O(n). Both the optimal and the greedy methods are free of parameters, thus avoiding the threshold-selection problem that exists in other approaches. We empirically compare the proposed optimal and greedy methods with several existing methods in terms of video sampling error, computational cost, and subjective quality. An evaluation of eight videos of different genres shows that the greedy approach achieves performance very close to that of the optimal approach while drastically reducing computational cost, making it suitable for processing long video sequences in large video databases.
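The greedy variant can be illustrated by repeatedly merging the adjacent pair of segments that adds the least sampling error. The simplified scalar-feature version below conveys the idea only: the heap-based "look-ahead" bookkeeping that achieves O(n log n) is omitted, and the squared-error cost is an assumed stand-in for the authors' error measure.

    # Greedy temporal sampling on scalar frame features: merge adjacent segments
    # until k remain, then pick one key frame per segment and report boundaries.
    # This naive version rescans all adjacent pairs each step; a heap with
    # look-ahead (as in the paper) avoids that rescan.
    def greedy_sample(frames, k):
        segs = [[f] for f in frames]
        def sse(seg):
            mean = sum(seg) / len(seg)
            return sum((x - mean) ** 2 for x in seg)
        while len(segs) > k:
            # Merge the adjacent pair whose union increases total error the least.
            best = min(range(len(segs) - 1),
                       key=lambda i: sse(segs[i] + segs[i + 1]) - sse(segs[i]) - sse(segs[i + 1]))
            segs[best:best + 2] = [segs[best] + segs[best + 1]]
        keys, boundaries, start = [], [], 0
        for seg in segs:
            mean = sum(seg) / len(seg)
            # Key frame: the frame in the segment closest to the segment mean.
            keys.append(start + min(range(len(seg)), key=lambda j: abs(seg[j] - mean)))
            start += len(seg)
            boundaries.append(start)        # exclusive end index of the segment
        return keys, boundaries

    print(greedy_sample([0, 0, 1, 1, 5, 5, 5, 9], k=3))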
International Conference on Web Engineering | 2006
Hassan H. Malik; John R. Kender
This paper presents a new approach to clustering web images. Images are first processed to extract signal features such as color in HSV format and quantized orientation. Web pages referring to these images are processed to extract textual features (keywords), and feature reduction techniques such as stemming, stop-word elimination, and Zipf's law are applied. All visual and textual features are used to generate association rules. Hypergraphs are generated from these rules, with features used as vertices and discovered associations as hyperedges. Twenty-two objective interestingness measures are evaluated on their ability to prune non-interesting rules and to assign weights to hyperedges. A hypergraph partitioning algorithm is then used to generate clusters of features, and a simple scoring function is used to assign images to clusters. A tree-distance-based evaluation measure is used to evaluate the quality of image clustering with respect to manually generated ground truth. Our experiments indicate that combining textual and content-based features results in better clustering than signal-only or text-only approaches. Online steps are done in real time, which makes this approach practical for web images. Furthermore, we demonstrate that statistical interestingness measures such as Correlation Coefficient, Laplace, Kappa, and J-Measure result in better clustering than traditional association rule interestingness measures such as Support and Confidence.
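To make the measure comparison concrete, the sketch below computes three of the standard measures from a 2x2 contingency table for a candidate rule A -> B; the formulas are the usual textbook definitions, and the example counts are invented for illustration.

    # Support, confidence, and correlation (phi) coefficient for a rule A -> B,
    # computed from contingency counts. Standard definitions; example counts are invented.
    import math

    def rule_measures(n_ab, n_a_notb, n_nota_b, n_nota_notb):
        n = n_ab + n_a_notb + n_nota_b + n_nota_notb
        p_a, p_b, p_ab = (n_ab + n_a_notb) / n, (n_ab + n_nota_b) / n, n_ab / n
        support = p_ab
        confidence = p_ab / p_a
        correlation = (p_ab - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
        return {"support": support, "confidence": confidence, "correlation": correlation}

    print(rule_measures(n_ab=40, n_a_notb=10, n_nota_b=20, n_nota_notb=30))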