Publication


Featured research published by Spencer Cappallo.


International Conference on Multimedia Retrieval | 2015

Bag-of-Fragments: Selecting and Encoding Video Fragments for Event Detection and Recounting

Pascal Mettes; Jan C. van Gemert; Spencer Cappallo; Thomas Mensink; Cees G. M. Snoek

The goal of this paper is event detection and recounting using a representation of concept detector scores. Different from existing work, which encodes videos by averaging concept scores over all frames, we propose to encode videos using fragments that are discriminatively learned per event. Our bag-of-fragments splits a video into semantically coherent fragment proposals. From training video proposals we show how to select the most discriminative fragment for an event. An encoding of a video is in turn generated by matching and pooling these discriminative fragments to the fragment proposals of the video. The bag-of-fragments forms an effective encoding for event detection and is able to provide a precise, temporally localized event recounting. Furthermore, we show how bag-of-fragments can be extended to deal with irrelevant concepts in the event recounting. Experiments on challenging web videos show that i) our modest number of fragment proposals gives high sub-event recall, ii) bag-of-fragments is complementary to global averaging and provides better event detection, and iii) bag-of-fragments with concept filtering yields a desirable event recounting. We conclude that fragments matter for video event detection and recounting.
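
The encoding step can be illustrated with a minimal sketch. The proposal scheme, window sizes, and cosine matching below are illustrative assumptions rather than the paper's exact design; the sketch only shows the general pattern of matching learned discriminative fragments against a video's fragment proposals and max-pooling the similarities:

```python
import numpy as np

def fragment_proposals(frame_scores, window_sizes=(5, 10, 20)):
    """Split a video into overlapping fragment proposals.

    frame_scores: (num_frames, num_concepts) array of concept detector
    scores. Sliding windows are a stand-in for the paper's semantically
    coherent proposals.
    """
    proposals = []
    n = len(frame_scores)
    for w in window_sizes:
        for start in range(0, max(n - w + 1, 1), max(w // 2, 1)):
            proposals.append((start, min(start + w, n)))
    return proposals

def encode_video(frame_scores, discriminative_fragments):
    """Bag-of-fragments encoding: match each learned discriminative
    fragment against all proposals of the video and max-pool the
    cosine similarities into one vector."""
    props = fragment_proposals(frame_scores)
    prop_vecs = np.stack([frame_scores[s:e].mean(axis=0) for s, e in props])
    encoding = []
    for frag in discriminative_fragments:          # frag: (num_concepts,)
        sims = prop_vecs @ frag / (
            np.linalg.norm(prop_vecs, axis=1) * np.linalg.norm(frag) + 1e-8)
        encoding.append(sims.max())                # best-matching proposal
    return np.asarray(encoding)
```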


International Conference on Multimedia Retrieval | 2015

Latent Factors of Visual Popularity Prediction

Spencer Cappallo; Thomas Mensink; Cees G. M. Snoek

Predicting the popularity of an image on social networks based solely on its visual content is a difficult problem. One image may become widely distributed and repeatedly shared, while another similar image may be totally overlooked. We aim to gain insight into how visual content affects image popularity. We propose a latent ranking approach that takes into account not only the distinctive visual cues in popular images, but also those in unpopular images. This method is evaluated on two existing datasets collected from photo-sharing websites, as well as a new proposed dataset of images from the microblogging website Twitter. Our experiments investigate factors of the ranking model, the level of user engagement in scoring popularity, and whether the discovered senses are meaningful. The proposed approach yields state-of-the-art results, and allows for insight into the semantics of image popularity on social networks.
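
The latent ranking idea can be sketched as a pairwise, latent-SVM-style update. The optimizer, margin, and number of latent "senses" below are assumptions for illustration, not the paper's exact training procedure:

```python
import numpy as np

def score(x, W):
    """Latent ranking score: each row of W is one latent 'sense' of
    popularity; an image is scored by its best-matching sense."""
    return (W @ x).max()

def train_pairwise(pairs, dim, n_senses=5, lr=0.01, epochs=10, margin=1.0):
    """Hypothetical pairwise training loop: for each (popular, unpopular)
    feature pair, push the popular image's best sense above the unpopular
    image's by a margin."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(n_senses, dim))
    for _ in range(epochs):
        for x_pop, x_unpop in pairs:
            k_pop = np.argmax(W @ x_pop)
            k_unpop = np.argmax(W @ x_unpop)
            if score(x_pop, W) < score(x_unpop, W) + margin:
                W[k_pop] += lr * x_pop      # raise the popular image's sense
                W[k_unpop] -= lr * x_unpop  # lower the unpopular image's sense
    return W
```

Scoring by the maximum over senses is what lets the model capture several distinct visual "kinds" of popularity rather than a single linear direction.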


ACM Multimedia | 2015

Image2Emoji: Zero-shot Emoji Prediction for Visual Media

Spencer Cappallo; Thomas Mensink; Cees G. M. Snoek

We present Image2Emoji, a multi-modal approach for generating emoji labels for an image in a zero-shot manner. Different from existing zero-shot image-to-text approaches, we exploit both image and textual media to learn a semantic embedding for the new task of emoji prediction. We propose that the widespread adoption of emoji suggests a semantic universality which is well-suited for interaction with visual media. We quantify the efficacy of our proposed model on the MSCOCO dataset, and demonstrate the value of visual, textual and multi-modal prediction of emoji. We conclude the paper with three examples of the application potential of emoji in the context of multimedia retrieval.
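
A minimal sketch of the zero-shot scoring step, assuming pre-trained concept detectors and word embeddings (e.g. word2vec) for both concept and emoji names; the top-k weighting scheme is an illustrative choice:

```python
import numpy as np

def emoji_scores(image_concept_scores, concept_vecs, emoji_vecs, top_k=10):
    """Zero-shot emoji prediction sketch: project an image into a word
    embedding space through its top concept detector scores, then rank
    emoji by cosine similarity to that projection.

    concept_vecs: (num_concepts, dim) embeddings of concept names.
    emoji_vecs:   (num_emoji, dim) embeddings of emoji names.
    """
    top = np.argsort(image_concept_scores)[-top_k:]
    img_vec = (image_concept_scores[top, None] * concept_vecs[top]).sum(axis=0)
    img_vec /= np.linalg.norm(img_vec) + 1e-8
    emo = emoji_vecs / (np.linalg.norm(emoji_vecs, axis=1, keepdims=True) + 1e-8)
    return emo @ img_vec   # one similarity score per emoji
```

Because both concepts and emoji live in the same embedding space, no emoji-labeled training images are needed, which is what makes the prediction zero-shot.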


ACM Multimedia | 2015

Query-by-Emoji Video Search

Spencer Cappallo; Thomas Mensink; Cees G. M. Snoek

This technical demo presents Emoji2Video, a query-by-emoji interface for exploring video collections. Ideogram-based video search and representation presents an opportunity for an intuitive, visual interface and a concise, non-textual summary of video contents, in a form factor that is ideal for small screens. The demo allows users to build search strings composed of ideograms, which are used to query a large dataset of YouTube videos. The system returns a list of the top-ranking videos for the user query along with an emoji summary of the video contents, so that users may make an informed decision whether to view a video or refine their search terms. The ranking of the videos is done in a zero-shot, multi-modal manner that employs an embedding space to exploit semantic relationships between user-selected ideograms and the videos' visual and textual content.
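
A hypothetical sketch of the multi-modal ranking, assuming videos and emoji have already been embedded in a shared semantic space; the mixing weight alpha and the additive combination over query emoji are illustrative assumptions:

```python
import numpy as np

def rank_videos(query_emoji_vecs, video_visual_vecs, video_text_vecs, alpha=0.5):
    """Score every video against each emoji in the query within a shared
    embedding space, mixing visual and textual similarity with weight
    alpha, then sum over the query emoji."""
    def cos(A, b):
        return A @ b / (np.linalg.norm(A, axis=1) * np.linalg.norm(b) + 1e-8)
    scores = np.zeros(len(video_visual_vecs))
    for q in query_emoji_vecs:
        scores += alpha * cos(video_visual_vecs, q) \
                  + (1 - alpha) * cos(video_text_vecs, q)
    return np.argsort(-scores)   # indices of top-ranked videos first
```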


IEEE Transactions on Multimedia | 2018

The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

Spencer Cappallo; Stacey Svetlichnaya; Pierre Garrigues; Thomas Mensink; Cees G. M. Snoek

Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose treating these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded as well as the images which they resemble. As a new modality, emoji present rich novel possibilities for representation and interaction. In this paper, we explore the challenges that arise naturally from considering the emoji modality through the lens of multimedia research, specifically the ways in which emoji can be related to other common modalities such as text and images. To do so, we first present a large-scale data set of real-world emoji usage collected from Twitter. This data set contains examples of both text-emoji and image-emoji relationships within tweets. We present baseline results on the challenge of predicting emoji from both text and images, using state-of-the-art neural networks. Further, we offer a first consideration into the problem of how to account for new, unseen emoji—a relevant issue as the emoji vocabulary continues to expand on a yearly basis. Finally, we present results for multimedia retrieval using emoji as queries.
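
One plausible way to score a new, unseen emoji is to place it in the embedding space via its Unicode name and transfer scores from its nearest seen neighbors. The sketch below is an assumption for illustration, not necessarily the paper's method:

```python
import numpy as np

def unseen_emoji_score(per_image_logits, seen_vecs, new_vec, k=5):
    """Transfer a prediction to an unseen emoji.

    per_image_logits: (num_seen_emoji,) predicted scores for one image.
    seen_vecs: (num_seen_emoji, dim) embeddings of seen emoji names.
    new_vec:   (dim,) embedding of the new emoji's Unicode name.
    """
    sims = seen_vecs @ new_vec / (
        np.linalg.norm(seen_vecs, axis=1) * np.linalg.norm(new_vec) + 1e-8)
    sims = np.maximum(sims, 0.0)           # ignore dissimilar neighbors
    nearest = np.argsort(-sims)[:k]
    w = sims[nearest] / (sims[nearest].sum() + 1e-8)
    return float(w @ per_image_logits[nearest])  # similarity-weighted average
```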


British Machine Vision Conference | 2016

Video Stream Retrieval of Unseen Queries using Semantic Memory

Spencer Cappallo; Thomas Mensink; Cees G. M. Snoek

Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The online nature of the problem requires temporal evaluation, and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account for the breadth of possible queries, we adopt a no-example approach to query retrieval, which uses a query's semantic relatedness to pre-trained concept classifiers. To adapt to shifting video content, we propose memory pooling and memory welling methods that favor recent information over long-past content. We identify two stream retrieval tasks, instantaneous retrieval at any particular time and continuous retrieval over a prolonged duration, and propose means for evaluating them. Three large-scale video datasets are adapted to the challenge of stream retrieval. We report results for our search methods on the new stream retrieval tasks, as well as demonstrate their efficacy in a traditional, non-streaming video task.
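
The recency bias of memory pooling can be illustrated with a small sketch: a running, exponentially decaying maximum over per-frame concept scores, so that recent evidence dominates long-past content. The decay constant is an assumption, and memory welling is a related but distinct mechanism in the paper:

```python
import numpy as np

def memory_pool(frame_scores, decay=0.9):
    """Memory pooling sketch for streaming video.

    frame_scores: (num_frames, num_concepts) scores arriving in temporal
    order. The pooled memory decays each step, so a concept must keep
    re-appearing to stay strongly represented.
    """
    memory = np.zeros(frame_scores.shape[1])
    pooled = []
    for frame in frame_scores:
        memory = np.maximum(decay * memory, frame)  # decay, then refresh
        pooled.append(memory.copy())
    return np.stack(pooled)   # per-timestep stream representation
```

At query time, each timestep's pooled vector can be scored against the query's semantic relatedness to the concept vocabulary, enabling both instantaneous and continuous retrieval.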


ACM Multimedia | 2017

Future-Supervised Retrieval of Unseen Queries for Live Video

Spencer Cappallo; Cees G. M. Snoek

Live streaming video presents new challenges for retrieval and content understanding. Its live nature means that video representations should be relevant to current content, and not necessarily to past content. We investigate retrieval of previously unseen queries for live video content. Drawing from existing whole-video techniques, we focus on adapting image-trained semantic models to the video domain. We introduce the use of future frame representations as a supervision signal for learning temporally aware semantic representations on unlabeled video data. Additionally, we introduce an approach for broadening a query's representation within a pre-constructed semantic space, with the aim of increasing overlap between embedded visual semantics and the query semantics. We demonstrate the efficacy of these contributions for unseen query retrieval on live videos. We further explore their applicability to tasks such as no-example, whole-video action classification and no-example live video action prediction, and demonstrate state-of-the-art results.
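
A minimal PyTorch sketch of the future-supervision signal, assuming frame features from a pre-trained network and per-frame semantic representations from an image-trained model; the layer sizes and frame offset k are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FutureSupervised(nn.Module):
    """Learn a temporally aware mapping by regressing, from the current
    frame's features, the semantic representation of a frame k steps in
    the future. No labels are needed: the future supervises the past."""
    def __init__(self, feat_dim=2048, sem_dim=300):
        super().__init__()
        self.project = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, sem_dim))

    def loss(self, frame_feats, semantic_reps, k=25):
        # frame_feats:   (T, feat_dim) per-frame visual features.
        # semantic_reps: (T, sem_dim) from an image-trained semantic model.
        # k=25 assumes roughly one second ahead at 25 fps.
        pred = self.project(frame_feats[:-k])
        target = semantic_reps[k:].detach()   # frozen target representations
        return nn.functional.mse_loss(pred, target)
```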


TRECVID Workshop | 2014

MediaMill at TRECVID 2014: Searching Concepts, Objects, Instances and Events in Video

Cees G. M. Snoek; K.E.A. van de Sande; D. Fontijne; Spencer Cappallo; J.C. van Gemert; Amirhossein Habibian; Thomas Mensink; Pascal Mettes; Ran Tao; Dennis Koelma; Arnold W. M. Smeulders


TRECVID Workshop | 2015

Qualcomm Research and University of Amsterdam at TRECVID 2015: Recognizing Concepts, Objects, and Events in Video

Cees G. M. Snoek; Spencer Cappallo; D. Fontijne; D. Julian; Dennis Koelma; Pascal Mettes; K.E.A. van de Sande; A. Sarah; H. Stokman; R.B. Towal


Archive | 2018

Predicting Visual Trends

Spencer Cappallo

Collaboration


Dive into Spencer Cappallo's collaboration.

Top Co-Authors

Ran Tao (University of Amsterdam)