Stevan Rudinac
University of Amsterdam
Publication
Featured research published by Stevan Rudinac.
international conference on multimedia retrieval | 2011
Martha Larson; Mohammad Soleymani; Pavel Serdyukov; Stevan Rudinac; Christian Wartena; Vanessa Murdock; Gerald Friedland; Roeland Ordelman; Gareth J. F. Jones
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television broadcasts with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features.
IEEE Transactions on Multimedia | 2013
Stevan Rudinac; Alan Hanjalic; Martha Larson
In this paper, we present a novel approach for automatic visual summarization of a geographic area that exploits user-contributed images and related explicit and implicit metadata collected from popular content-sharing websites. By means of this approach, we search for a limited number of representative but diverse images to represent the area within a certain radius around a specific location. Our approach is based on a random walk with restarts over a graph that models relations between images, visual features extracted from them, associated text, as well as information on the uploaders and commentators. In addition to introducing a novel edge-weighting mechanism, we propose a simple but effective scheme for selecting the most representative and diverse set of images based on information derived from the graph. We also present a novel evaluation protocol that requires no human annotators, exploiting only the geographical coordinates accompanying the images to reflect the conditions an image set must fulfill for users to find it representative and diverse. Experiments performed on a collection of Flickr images, captured around 207 locations in Paris, demonstrate the effectiveness of our approach.
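A minimal sketch of the random-walk-with-restarts computation such a graph-based approach builds on is given below; the restart probability, tolerance, and the idea of restarting at a single query node are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def random_walk_with_restarts(A, restart_idx, alpha=0.15, tol=1e-8, max_iter=1000):
    """Stationary RWR scores on a graph with weighted adjacency matrix A.

    A           -- (n, n) non-negative edge-weight matrix
    restart_idx -- node the walker teleports back to (e.g., a query node)
    alpha       -- restart probability (illustrative value)
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Column-normalize A so each column is a transition distribution.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    P = A / col_sums
    # Restart distribution: all probability mass on the restart node.
    r = np.zeros(n)
    r[restart_idx] = 1.0
    p = np.full(n, 1.0 / n)                # initial distribution
    for _ in range(max_iter):
        p_next = (1 - alpha) * P @ p + alpha * r
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p                               # affinity of every node to the restart node
```

Image nodes with the highest stationary probability relative to the restart node are the natural candidates for the representative set, after which a diversity filter can be applied.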
IEEE Transactions on Multimedia | 2013
Stevan Rudinac; Martha Larson; Alan Hanjalic
In this paper we propose a novel approach to selecting images suitable for inclusion in visual summaries. The approach is grounded in insights about how people summarize image collections. We utilize the Amazon Mechanical Turk crowdsourcing platform to obtain a large number of manually created visual summaries, as well as information about the criteria for including an image in a summary. Based on these large-scale user tests, we propose an automatic image selection approach that jointly utilizes the analysis of image content, context, popularity, visual aesthetic appeal, and the sentiment derived from the comments posted on the images. In our approach we describe images not only by their own properties, but also in the context of semantically related images, which improves robustness and effectively enables propagation of sentiment, aesthetic appeal, and various inherent attributes associated with a particular group of images. We discuss the phenomenon of low inter-user agreement, which makes automatic evaluation of visual summaries a challenging task, and propose a solution inspired by the text summarization and machine translation communities. The experiments performed on a collection of geo-referenced Flickr images demonstrate the effectiveness of our image selection approach.
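How several per-image criteria might be fused into a single selection score can be sketched as below; the criterion names, min-max normalization, and equal weights are assumptions standing in for the paper's joint model.

```python
import numpy as np

def select_summary(scores, k=10):
    """Pick k images by a weighted sum of normalized criterion scores.

    scores -- dict mapping criterion name -> (n,) array, e.g. produced by
              content, popularity, aesthetics and comment-sentiment models
              (all hypothetical upstream components here).
    """
    names = sorted(scores)
    S = np.stack([scores[name] for name in names])      # (num_criteria, n)
    # Min-max normalize each criterion so they are comparable.
    mins = S.min(axis=1, keepdims=True)
    maxs = S.max(axis=1, keepdims=True)
    S = (S - mins) / np.maximum(maxs - mins, 1e-12)
    weights = np.full(len(names), 1.0 / len(names))     # equal weights, an assumption
    combined = weights @ S
    return np.argsort(combined)[::-1][:k]               # indices of top-k images
```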
acm multimedia | 2011
Stevan Rudinac; Alan Hanjalic; Martha Larson
This paper presents an automatic approach that uses community-contributed images to create representative and diverse visual summaries of specific geographic areas. Complex relations between images, extracted visual features, text associated with the images as well as users and their social network are modeled using a multimodal graph. To compute affinities between nodes in the graph we rely on the proven concept of random walk with restarts. The novelty of our approach lies in its use of the multimodal graph to create a diverse, yet representative, image set. Further, we introduce an edge-weighting mechanism for the fusion of heterogeneous modalities. We evaluate our summaries with a new protocol that tests for representativeness and diversity using image geo-coordinates and is independent of the need for human evaluators. The experiments, performed on a set of Flickr images, demonstrate the effectiveness of our approach.
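The fusion of heterogeneous modalities into one weighted graph might be sketched roughly as follows; the per-modality normalization and the uniform default weights are assumptions standing in for the paper's edge-weighting mechanism.

```python
import numpy as np

def fuse_modalities(similarities, weights=None):
    """Combine per-modality similarity matrices into one multimodal graph.

    similarities -- list of (n, n) similarity matrices, e.g. visual, textual
                    and social affinities (how these are computed is left
                    open here; the uniform weighting is an assumption).
    """
    similarities = [np.asarray(S, dtype=float) for S in similarities]
    if weights is None:
        weights = np.full(len(similarities), 1.0 / len(similarities))
    # Normalize each modality to [0, 1] so no single modality dominates.
    normed = []
    for S in similarities:
        rng = S.max() - S.min()
        normed.append((S - S.min()) / (rng if rng > 0 else 1.0))
    A = sum(w * S for w, S in zip(weights, normed))
    np.fill_diagonal(A, 0.0)   # no self-loops
    return A                   # adjacency matrix, ready for a random walk
```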
workshop on image analysis for multimedia interactive services | 2007
Stevan Rudinac; Marija Uscumlic; Maja Rudinac; Goran Zajic; Branimir Reljin
Global image search and regional image search are compared using a content-based image retrieval (CBIR) system with user relevance feedback, the expectation being that regional search can minimize the effect of the background on image retrieval. Images from the database are partitioned into regular rectangular regions: 4×4 non-overlapped (NOV) regions and 3×3 overlapped (OV) regions, and feature vectors are determined for whole images and for regions. Four CBIR scenarios are considered: global search, search based on the 4×4 NOV regions, search based on the 3×3 OV regions, and search based on an arbitrarily cropped part of a query image. The system is tested on images from the Corel 1K dataset.
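A rough sketch of the two grid partitionings follows; the 3×3 layout is assumed to use half-image regions with 50% overlap, since the abstract does not specify the exact overlap.

```python
def grid_regions(img, grid=4):
    """Split an image into grid x grid non-overlapping rectangular regions."""
    h, w = img.shape[:2]
    rh, rw = h // grid, w // grid
    return [img[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            for i in range(grid) for j in range(grid)]

def overlapped_regions(img, grid=3):
    """Split an image into grid x grid overlapping regions.

    Each region spans half the image and the stride is a quarter of the
    image, giving 3 x 3 positions with 50% overlap (an assumed layout).
    """
    h, w = img.shape[:2]
    rh, rw = h // 2, w // 2
    sh, sw = h // 4, w // 4
    return [img[i * sh:i * sh + rh, j * sw:j * sw + rw]
            for i in range(grid) for j in range(grid)]
```

A feature vector is then extracted per region and matched against the query's whole-image or per-region features, depending on the scenario.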
workshop on image analysis for multimedia interactive services | 2009
Stevan Rudinac; Martha Larson; Alan Hanjalic
In this paper we propose an approach that utilizes visual features and conventional text-based pseudo-relevance feedback (PRF) to improve the results of semantic-theme-based video retrieval. Our visual reranking method is based on an Average Item Distance (AID) score. AID-based visual reranking is designed to improve the suitability of items at the top of the initial results list, i.e., those feedback items selected for use in query expansion. Our method is intended to target feedback items representative of the visual regularity that typifies the semantic theme of the query. Experiments performed on the VideoCLEF 2008 data set and on a number of retrieval scenarios combining the inputs from speech-transcript-based (i.e., text-based) search and visual reranking demonstrate the benefits of using AID-based visual representatives to compensate for the inherent problems of PRF, such as topic drift.
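One plausible reading of the AID score, sketched below, rates each top-ranked item by its average distance to the other top-ranked items and promotes the most central ones; the Euclidean metric and the top-k cutoff are illustrative assumptions.

```python
import numpy as np

def aid_rerank(features, top_k=20):
    """Rerank the top of a results list by Average Item Distance (AID).

    features -- (n, d) visual feature vectors of the initially ranked
                items, ordered by the text-based retrieval score.
    Items whose average distance to the other top-k items is small are
    taken to typify the query's visual theme and are promoted, so that
    pseudo-relevance feedback draws on more representative items.
    """
    k = min(top_k, len(features))
    top = features[:k]
    # Pairwise Euclidean distances among the top-k items.
    diff = top[:, None, :] - top[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    aid = D.sum(axis=1) / (k - 1)            # mean distance to the other items
    return np.argsort(aid)                   # most "central" items first
```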
acm multimedia | 2016
Masoud Mazloom; Robert Rietveld; Stevan Rudinac; Marcel Worring; Willemijn van Dolen
Brand-related user posts on social networks are growing at a staggering rate, where users express their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this paper, we present an approach for identifying what aspects of posts determine their popularity. We hypothesize that brand-related posts may be popular due to several cues related to factual information, sentiment, vividness and entertainment parameters about the brand. We call the ensemble of cues engagement parameters. In our approach, we propose to use these parameters for predicting brand-related user post popularity. Experiments on a collection of fast food brand-related user posts crawled from Instagram show that: visual and textual features are complementary in predicting the popularity of a post; predicting popularity using our proposed engagement parameters is more accurate than predicting popularity directly from visual and textual features; and our proposed approach makes it possible to understand what drives post popularity in general as well as isolate the brand specific drivers.
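The two-stage idea (low-level features predict interpretable engagement parameters, which in turn predict popularity) could look roughly like this; the Ridge regressors and all variable names are placeholders, not the learners used in the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

def two_stage_popularity(X_train, E_train, y_train, X_test):
    """Two-stage prediction: features -> engagement parameters -> popularity.

    X_*     -- visual+textual post features
    E_train -- per-post engagement parameter annotations (e.g. sentiment,
               vividness scores); y_train -- observed popularity.
    """
    # Stage 1: one regressor per engagement parameter.
    stage1 = [Ridge().fit(X_train, E_train[:, j]) for j in range(E_train.shape[1])]
    E_hat_train = np.column_stack([m.predict(X_train) for m in stage1])
    E_hat_test = np.column_stack([m.predict(X_test) for m in stage1])
    # Stage 2: popularity from the (interpretable) engagement parameters.
    stage2 = Ridge().fit(E_hat_train, y_train)
    # The stage-2 coefficients indicate which parameters drive popularity.
    return stage2.predict(E_hat_test), stage2.coef_
```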
european conference on information retrieval | 2010
Stevan Rudinac; Martha Larson; Alan Hanjalic
We propose a technique that predicts both if and how expansion should be applied to individual queries. The prediction is made on the basis of the topical consistency of the top results of the initial results lists returned by the unexpanded query and several query expansion alternatives. We use the coherence score, known to capture the tightness of topical clustering structure, and also propose two simplified coherence indicators. We test our technique in a spoken content retrieval task, with the intention of helping to control the effects of speech recognition errors. Experiments use 46 semantic-theme-based queries defined by VideoCLEF 2009 over the TRECVid 2007 and 2008 video data sets. Our indicators make the best choice roughly 50% of the time. However, since they predict the right query expansion in critical cases, overall MAP improves. The approach is computationally lightweight and requires no training data.
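A minimal sketch of coherence-based expansion selection, assuming the thresholded pairwise-similarity form of the coherence score (the threshold value and cosine similarity are illustrative choices):

```python
import numpy as np

def coherence(features, threshold=0.8):
    """Fraction of top-result pairs whose cosine similarity exceeds a
    threshold -- one common form of the coherence score."""
    n = len(features)
    if n < 2:
        return 0.0
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = F @ F.T
    iu = np.triu_indices(n, k=1)            # each unordered pair once
    return (sims[iu] > threshold).mean()

def select_expansion(candidate_lists):
    """Pick the query (unexpanded or expanded) whose top results cluster
    most tightly.

    candidate_lists -- dict mapping expansion name -> (k, d) feature
    matrix of its top-k results.
    """
    return max(candidate_lists, key=lambda name: coherence(candidate_lists[name]))
```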
International Journal of Multimedia Information Retrieval | 2012
Stevan Rudinac; Martha Larson; Alan Hanjalic
In this paper, we present a novel approach that utilizes noisy shot-level visual concept detection to improve text-based video retrieval. As opposed to most of the related work in the field, we consider entire videos as the retrieval units and focus on queries that address a general subject matter (semantic theme) of a video. Retrieval is performed using a coherence-based query performance prediction framework. In this framework, we make use of video representations derived from the visual concepts detected in videos to select the best possible search result given the query, video collection, available search mechanisms and the resources for query modification. In addition to investigating the potential of this approach to outperform typical text-based video retrieval baselines, we also explore the possibility to achieve further improvement in retrieval performance through combining our concept-based query performance indicators with the indicators utilizing the spoken content of the videos. The proposed retrieval approach is data driven, requires no prior training and relies exclusively on the analyses of the video collection and different results lists returned for the given query text. The experiments are performed on the MediaEval 2010 datasets and demonstrate the effectiveness of our approach.
acm multimedia | 2010
Stevan Rudinac; Martha Larson; Alan Hanjalic
In this paper, we present a technique for unsupervised construction of concept vectors, concept-based representations of complete video units, from the noisy shot-level output of a set of visual concept detectors. We deploy these vectors to improve spoken-content-based video retrieval using Query Expansion Selection (QES). Our QES approach analyzes results lists returned in response to several alternative query expansions, applying a coherence indicator calculated on top-ranked items to choose the appropriate expansion. The approach is data driven, does not require prior training and relies solely on the analysis of the collection being queried and the results lists produced for the given query text. The experiments, performed on two datasets, TRECVID 2007/2008 and TRECVID 2009, demonstrate the effectiveness of our approach and show that a small set of well-selected visual concept detectors is sufficient to improve retrieval performance.
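The unsupervised construction of video-level concept vectors might be approximated by pooling shot-level detector scores, as sketched below; mean and max pooling are shown as two plausible, training-free aggregations, not necessarily the paper's exact formulation. The resulting vectors would then feed a coherence indicator like the one sketched after the ECIR 2010 abstract above.

```python
import numpy as np

def concept_vector(shot_scores, pooling="mean"):
    """Aggregate shot-level concept detector outputs into one video vector.

    shot_scores -- (num_shots, num_concepts) detector confidence scores
    for a single video.
    """
    shot_scores = np.asarray(shot_scores, dtype=float)
    if pooling == "mean":
        return shot_scores.mean(axis=0)     # average confidence per concept
    if pooling == "max":
        return shot_scores.max(axis=0)      # strongest detection per concept
    raise ValueError(f"unknown pooling: {pooling}")
```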