
Publications


Featured research published by Amirhossein Habibian.


ACM Multimedia | 2014

VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events

Amirhossein Habibian; Thomas Mensink; Cees G. M. Snoek

This paper proposes a new video representation for few-example event recognition and translation. Different from existing representations, which rely on either low-level features, or pre-specified attributes, we propose to learn an embedding from videos and their descriptions. In our embedding, which we call VideoStory, correlated term labels are combined if their combination improves the video classifier prediction. Our proposed algorithm prevents the combination of correlated terms which are visually dissimilar by optimizing a joint-objective balancing descriptiveness and predictability. The algorithm learns from textual descriptions of video content, which we obtain for free from the web by a simple spidering procedure. We use our VideoStory representation for few-example recognition of events on more than 65K challenging web videos from the NIST TRECVID event detection task and the Columbia Consumer Video collection. Our experiments establish that i) VideoStory outperforms an embedding without joint-objective and alternatives without any embedding, ii) The varying quality of input video descriptions from the web is compensated by harvesting more data, iii) VideoStory sets a new state-of-the-art for few-example event recognition, outperforming very recent attribute and low-level motion encodings. What is more, VideoStory translates a previously unseen video to its most likely description from visual content only.
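
To make the joint objective concrete, here is a minimal NumPy sketch of an embedding of this flavor, learned by alternating least squares over a descriptiveness term (term labels reconstructed from the embedding) and a predictability term (embedding reconstructed from video features). The matrix sizes, random data, and regularization weights are illustrative placeholders, not the paper's exact algorithm or settings.

```python
# Illustrative sketch (not the authors' exact algorithm): learn a latent
# embedding S shared by video features X and term labels Y, so that
# Y ~= S @ A.T (descriptiveness) and S ~= X @ W (predictability).
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, d_terms, k = 500, 128, 300, 32   # hypothetical sizes
X = rng.normal(size=(n, d_feat))            # video features
Y = (rng.random((n, d_terms)) < 0.05).astype(float)  # term occurrence matrix

A = rng.normal(scale=0.1, size=(d_terms, k))
W = rng.normal(scale=0.1, size=(d_feat, k))
lam, reg = 1.0, 1e-2                        # predictability weight, ridge term

for it in range(50):
    # Update the embedding S with both mappings fixed (closed-form ridge solve).
    lhs = A.T @ A + lam * np.eye(k)
    rhs = Y @ A + lam * (X @ W)
    S = rhs @ np.linalg.inv(lhs)
    # Update A: regress the term labels Y on the embedding S.
    A = np.linalg.solve(S.T @ S + reg * np.eye(k), S.T @ Y).T
    # Update W: regress the embedding S on the video features X.
    W = np.linalg.solve(X.T @ X + reg * np.eye(d_feat), X.T @ S)

# At test time a new video x maps into the embedding via x @ W,
# and to its most likely description terms via (x @ W) @ A.T.
```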


International Conference on Multimedia Retrieval | 2013

Recommendations for video event recognition using concept vocabularies

Amirhossein Habibian; Koen E. A. van de Sande; Cees G. M. Snoek

Representing videos using vocabularies composed of concept detectors appears promising for event recognition. While many have recently shown the benefits of concept vocabularies for recognition, the important question of what concepts to include in the vocabulary has been ignored. In this paper, we study how to create an effective vocabulary for arbitrary-event recognition in web video. We consider four research questions related to the number, the type, the specificity and the quality of the detectors in concept vocabularies. A rigorous experimental protocol using a pool of 1,346 concept detectors trained on publicly available annotations, a dataset containing 13,274 web videos from the Multimedia Event Detection benchmark, 25 event groundtruth definitions, and a state-of-the-art event recognition pipeline allows us to analyze the performance of various concept vocabulary definitions. From the analysis we arrive at the recommendation that for effective event recognition the concept vocabulary should i) contain more than 200 concepts, ii) be diverse by covering object, action, scene, people, animal and attribute concepts, iii) include both general and specific concepts, and iv) increase the number of concepts rather than improve the quality of the individual detectors. We consider the recommendations for video event recognition using concept vocabularies the most important contribution of the paper, as they provide guidelines for future work.
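
As a concrete illustration of the representation being studied, the sketch below represents each video by its vector of concept detector scores and trains a linear event classifier on top. The detector bank, features, and labels are random stand-ins, not the paper's actual pipeline or data.

```python
# Minimal sketch of event recognition on top of a concept vocabulary:
# each video is represented by the scores of a bank of concept detectors,
# and a linear event classifier is trained on those score vectors.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_videos, d_feat, n_concepts = 400, 256, 200   # >200 concepts, per the recommendation
video_feats = rng.normal(size=(n_videos, d_feat))
detector_bank = rng.normal(size=(n_concepts, d_feat))  # stand-in for trained detectors

# Concept vocabulary representation: one detector score per concept.
concept_scores = video_feats @ detector_bank.T          # shape: (n_videos, n_concepts)

event_labels = rng.integers(0, 2, size=n_videos)        # toy event ground truth
clf = LinearSVC(C=1.0).fit(concept_scores, event_labels)
print("train accuracy:", clf.score(concept_scores, event_labels))
```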


ACM Multimedia | 2013

Querying for video events by semantic signatures from few examples

Masoud Mazloom; Amirhossein Habibian; Cees G. M. Snoek

We aim to query web video for complex events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Since it is unknown what similarity metric and query fusion to use in such an event retrieval setting, we perform three experiments on unconstrained web videos from the TRECVID event detection task. It reveals that: retrieval with semantic signatures using normalized correlation as similarity metric outperforms a low-level bag-of-words alternative, multiple queries are best combined using late fusion with an average operator, and event retrieval is preferred over event classification when less than eight positive video examples are available.
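
A minimal sketch of this retrieval setting, assuming precomputed semantic signatures (random placeholders here): each database video is scored by normalized correlation against every query example, and the per-query scores are late-fused with an average.

```python
# Sketch of few-example event retrieval with semantic signatures:
# rank database videos by normalized correlation against each query
# signature, then late-fuse the per-query scores with an average.
import numpy as np

def normalized_correlation(a, b):
    """Correlation between two signatures after mean-centering."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
n_db, n_concepts = 1000, 1346
database = rng.random((n_db, n_concepts))      # semantic signatures of the archive
queries = rng.random((5, n_concepts))          # a handful of positive query examples

# Late fusion: average the similarity each database video gets across queries.
scores = np.mean(
    [[normalized_correlation(q, v) for v in database] for q in queries],
    axis=0,
)
top10 = np.argsort(scores)[::-1][:10]
print("top-ranked videos:", top10)
```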


Machine Vision and Applications | 2014

Evaluating multimedia features and fusion for example-based event detection

Gregory K. Myers; Ramesh Nallapati; Julien van Hout; Stephanie Pancoast; Ramakant Nevatia; Chen Sun; Amirhossein Habibian; Dennis Koelma; Koen E. A. van de Sande; Arnold W. M. Smeulders; Cees G. M. Snoek

Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as arithmetic mean, perform as well as or better than other, more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.
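
The fusion finding can be illustrated with a short sketch: per-modality detection scores (random placeholders below) are rescaled to a common range and combined with an arithmetic mean. The min-max normalization here is an assumption for illustration, not necessarily the rescaling used in SESAME.

```python
# Sketch of simple late fusion: detection scores from several single-modality
# event classifiers are rescaled to a common range and averaged.
import numpy as np

rng = np.random.default_rng(0)
n_videos = 200
modality_scores = {
    "visual": rng.normal(0.0, 1.0, n_videos),
    "motion": rng.normal(2.0, 5.0, n_videos),
    "audio":  rng.normal(-1.0, 0.5, n_videos),
    "asr":    rng.normal(0.5, 2.0, n_videos),
}

def minmax(x):
    """Rescale scores to [0, 1] so modalities are comparable before fusing."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

fused = np.mean([minmax(s) for s in modality_scores.values()], axis=0)
ranking = np.argsort(fused)[::-1]
print("highest-scoring videos:", ranking[:5])
```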


Computer Vision and Image Understanding | 2014

Recommendations for Recognizing Video Events by Concept Vocabularies

Amirhossein Habibian; Cees G. M. Snoek

Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, the characteristics of a universal concept vocabulary suited for representing events have been largely ignored. In this paper, we study how to create an effective vocabulary for arbitrary-event recognition in web video. We consider five research questions related to the number, the type, the specificity, the quality and the normalization of the detectors in concept vocabularies. A rigorous experimental protocol using a pool of 1,346 concept detectors trained on publicly available annotations, two large arbitrary web video datasets and a common event recognition pipeline allows us to analyze the performance of various concept vocabulary definitions. From the analysis we arrive at the recommendation that for effective event recognition the concept vocabulary should (i) contain more than 200 concepts, (ii) be diverse by covering object, action, scene, people, animal and attribute concepts, (iii) include both general and specific concepts, (iv) increase the number of concepts rather than improve the quality of the individual detectors, and (v) contain detectors that are appropriately normalized. We consider the recommendations for recognizing video events by concept vocabularies the most important contribution of the paper, as they provide guidelines for future work.
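
Recommendation (v) on normalization can be illustrated as follows. Per-detector z-scoring is used as one common choice; this is an assumption for illustration and may differ from the normalization studied in the paper.

```python
# Sketch of normalizing each concept detector's scores before using them as
# a video representation, so no single detector with a large dynamic range
# dominates the vocabulary.
import numpy as np

rng = np.random.default_rng(0)
raw_scores = rng.random((500, 1346)) * rng.random(1346) * 10  # uneven detector ranges

mean = raw_scores.mean(axis=0, keepdims=True)
std = raw_scores.std(axis=0, keepdims=True) + 1e-12
normalized_scores = (raw_scores - mean) / std   # each detector: zero mean, unit variance

print("per-detector std before:", raw_scores.std(axis=0)[:3])
print("per-detector std after: ", normalized_scores.std(axis=0)[:3])
```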


International Conference on Multimedia Retrieval | 2015

Discovering Semantic Vocabularies for Cross-Media Retrieval

Amirhossein Habibian; Thomas Mensink; Cees G. M. Snoek

This paper proposes a data-driven approach for cross-media retrieval by automatically learning its underlying semantic vocabulary. Different from the existing semantic vocabularies, which are manually pre-defined and annotated, we automatically discover the vocabulary concepts and their annotations from multimedia collections. To this end, we apply a probabilistic topic model on the text available in the collection to extract its semantic structure. Moreover, we propose a learning to rank framework, to effectively learn the concept classifiers from the extracted annotations. We evaluate the discovered semantic vocabulary for cross-media retrieval on three datasets of image/text and video/text pairs. Our experiments demonstrate that the discovered vocabulary does not require any manual labeling to outperform three recent alternatives for cross-media retrieval.
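
A rough sketch of the discovery pipeline, under stated assumptions: a topic model (LDA) over the collection's text yields concepts and soft annotations, and a classifier is then trained per concept on the paired visual features. A plain logistic regression stands in for the paper's learning-to-rank framework, and the captions and features are synthetic.

```python
# Sketch: discover a semantic vocabulary from collection text with LDA,
# then train one visual classifier per discovered concept.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "dog runs on the beach", "a man plays guitar on stage",
    "children playing soccer in a park", "dog catches a frisbee",
    "band performs a rock concert", "kids play football outdoors",
] * 20                                             # toy paired captions
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(len(texts), 64))   # paired image/video features

# 1) Discover the semantic vocabulary from text with a topic model.
counts = CountVectorizer(stop_words="english").fit_transform(texts)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_weights = lda.fit_transform(counts)          # soft concept annotations

# 2) Train a classifier per discovered concept on the visual features.
concept_models = []
for k in range(topic_weights.shape[1]):
    labels = (topic_weights.argmax(axis=1) == k).astype(int)
    if labels.min() == labels.max():
        continue  # skip degenerate concepts with only one class in this toy data
    concept_models.append(LogisticRegression(max_iter=1000).fit(visual_feats, labels))
print(f"trained {len(concept_models)} concept classifiers")
```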


International Conference on Multimedia Retrieval | 2015

Encoding Concept Prototypes for Video Event Detection and Summarization

Masoud Mazloom; Amirhossein Habibian; Dong Liu; Cees G. M. Snoek; Shih-Fu Chang

This paper proposes a new semantic video representation for few- and zero-example event detection and unsupervised video event summarization. Different from existing works, which obtain a semantic representation by training concepts over images or entire video clips, we propose an algorithm that learns a set of relevant frames as the concept prototypes from web video examples, without the need for frame-level annotations, and use them for representing an event video. We formulate the problem of learning the concept prototypes as seeking the frames closest to the densest region in the feature space of video frames from both positive and negative training videos of a target concept. We study the behavior of our video event representation based on concept prototypes by performing three experiments on challenging web videos from the TRECVID 2013 multimedia event detection task and the MED-summaries dataset. Our experiments establish that i) Event detection accuracy increases when mapping each video into concept prototype space. ii) Zero-example event detection accuracy increases when analyzing each frame of a video individually in concept prototype space, rather than considering videos holistically. iii) Unsupervised video event summarization using concept prototypes is more accurate than using video-level concept detectors.
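
The prototype-selection idea can be sketched as follows, using a kernel density estimate over frames from positive videos, relative to negatives, to pick a frame in the densest region. The density criterion and all data here are illustrative, not the paper's exact formulation.

```python
# Sketch: pick a concept prototype as the positive frame lying in the densest
# region of feature space, measured relative to frames from negative videos.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
positive_frames = rng.normal(loc=1.0, size=(300, 32))   # frames from positive videos
negative_frames = rng.normal(loc=-1.0, size=(300, 32))  # frames from negative videos

kde_pos = KernelDensity(bandwidth=1.0).fit(positive_frames)
kde_neg = KernelDensity(bandwidth=1.0).fit(negative_frames)

# Score each positive frame by its density among positives relative to negatives,
# and take the top frame as this concept's prototype.
relative_density = kde_pos.score_samples(positive_frames) - kde_neg.score_samples(positive_frames)
prototype_index = int(np.argmax(relative_density))
prototype = positive_frames[prototype_index]
print("prototype frame index:", prototype_index)
```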


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Video2vec Embeddings Recognize Events When Examples Are Scarce

Amirhossein Habibian; Thomas Mensink; Cees G. M. Snoek

This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between video features and term vectors. In our proposed embedding, which we call Video2vec, the correlations between the words are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the Video2vec embedding using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation. We also propose an event specific variant of Video2vec to learn a more accurate representation for the words, which are indicative of the event, by introducing a term sensitive descriptiveness loss. Our experiments on three challenging collections of web videos from the NIST TRECVID Multimedia Event Detection and Columbia Consumer Videos datasets demonstrate: i) the advantages of Video2vec over representations using attributes or alternative embeddings, ii) the benefit of fusing video modalities by an embedding over common strategies, iii) the complementarity of term sensitive descriptiveness and multimodal predictability for event recognition. By its ability to improve predictability of present day audio-visual video features, while at the same time maximizing their semantic descriptiveness, Video2vec leads to state-of-the-art accuracy for both few- and zero-example recognition of events in video.
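
One schematic way to write such a joint objective is shown below; the notation (shared embedding S, term mapping A, per-modality mappings W_m with weights lambda_m) is illustrative and not the paper's exact loss.

```latex
% Schematic joint objective (illustrative notation, not the paper's exact loss):
% S is the shared embedding, A maps it to term vectors Y (descriptiveness),
% and W_m maps each modality's features X_m (appearance, motion, audio) to S
% (multimodal predictability), balanced by weights \lambda_m.
\min_{S,\,A,\,\{W_m\}} \;
  \underbrace{\lVert Y - S A^{\top} \rVert_F^2}_{\text{descriptiveness}}
  \;+\; \sum_{m \in \{\text{app},\,\text{mot},\,\text{aud}\}}
  \lambda_m \underbrace{\lVert S - X_m W_m \rVert_F^2}_{\text{predictability}}
```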


ACM Multimedia | 2013

Video2Sentence and vice versa

Amirhossein Habibian; Cees G. M. Snoek

In this technical demonstration, we showcase a multimedia search engine that retrieves a video from a sentence, or a sentence from a video. The key novelty is our machine translation capability that exploits a cross-media representation for both the visual and textual modality using concept vocabularies. We will demonstrate the translations using arbitrary web videos and sentences related to everyday events. What is more, we will provide an automatically generated explanation, in terms of concept detectors, on why a particular video or sentence has been retrieved as the most likely translation.
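
A toy sketch of the cross-media idea, with a hypothetical six-concept vocabulary: a video and a sentence are both mapped into the same concept space, so either one can retrieve the other by nearest-neighbor search. Everything here is a placeholder, not the demo's actual detectors or vocabulary.

```python
# Sketch: map videos and sentences into a shared concept-vocabulary space
# and retrieve across media by similarity in that space.
import numpy as np

concepts = ["dog", "beach", "guitar", "stage", "soccer", "park"]  # toy vocabulary
rng = np.random.default_rng(0)
detector_bank = rng.normal(size=(len(concepts), 64))

def video_to_concepts(video_feat):
    return detector_bank @ video_feat                        # concept detector scores

def sentence_to_concepts(sentence):
    words = sentence.lower().split()
    return np.array([float(c in words) for c in concepts])   # bag of concepts

videos = rng.normal(size=(50, 64))
video_space = np.array([video_to_concepts(v) for v in videos])

query = sentence_to_concepts("a dog on the beach")
# Sentence -> video: rank videos by similarity in concept space.
video_rank = np.argsort(video_space @ query)[::-1]
# Video -> sentence would rank candidate sentences the same way.
print("best matching video:", video_rank[0])
```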


International Conference on Multimedia Retrieval | 2014

On-the-Fly Video Event Search by Semantic Signatures

Amirhossein Habibian; Masoud Mazloom; Cees G. M. Snoek

In this technical demonstration, we showcase an event search engine that facilitates instant access to an archive of web video. Different from many search engines which rely on high-dimensional low-level visual features to represent videos, we rely on our proposed semantic signature. We extract the semantic signature as the detection scores obtained by applying a vocabulary of 1,346 concept detectors to videos. The semantic signatures are compact, semantic and effective, as we will demonstrate for on-the-fly event retrieval using only a few positive examples. In addition, we will show how the signatures provide a crude interpretation of why a certain video has been retrieved.
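
A sketch of the on-the-fly pipeline under stated assumptions: archive signatures are precomputed by stacking detector scores, a query model is the average signature of a few positive examples, and both the ranking and a crude "why retrieved" explanation come from the concept dimensions. The detector bank and features are random stand-ins.

```python
# Sketch of on-the-fly event search with precomputed semantic signatures.
import numpy as np

rng = np.random.default_rng(0)
n_archive, d_feat, n_concepts = 2000, 128, 1346
detector_bank = rng.normal(size=(n_concepts, d_feat))       # stand-in for 1,346 detectors

archive_feats = rng.normal(size=(n_archive, d_feat))
archive_sigs = archive_feats @ detector_bank.T               # signatures, precomputed offline

query_sigs = rng.normal(size=(4, d_feat)) @ detector_bank.T  # a few positive examples
query_model = query_sigs.mean(axis=0)

cosine = (archive_sigs @ query_model) / (
    np.linalg.norm(archive_sigs, axis=1) * np.linalg.norm(query_model) + 1e-12
)
top = np.argsort(cosine)[::-1][:10]
# Crude interpretation: the concepts contributing most to the top match
# explain why that video was retrieved.
top_concepts = np.argsort(query_model * archive_sigs[top[0]])[::-1][:5]
print("top videos:", top, "most contributing concepts:", top_concepts)
```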

Collaboration


Dive into Amirhossein Habibian's collaborations.

Top Co-Authors

Chen Sun

University of Southern California