
Publications


Featured research published by Makarand Tapaswi.


Computer Vision and Pattern Recognition | 2013

Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

Martin Bäuml; Makarand Tapaswi; Rainer Stiefelhagen

We address the problem of person identification in TV series. We propose a unified learning framework for multi-class classification which incorporates labeled and unlabeled data, and constraints between pairs of features during training. We apply the framework to train multinomial logistic regression classifiers for multi-class face recognition. The method is completely automatic, as the labeled data is obtained by tagging speaking faces using subtitles and fan transcripts of the videos. We demonstrate our approach on six episodes each of two diverse TV series and achieve state-of-the-art performance.
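The constrained multi-class setup can be pictured with a toy sketch: multinomial logistic regression fit on labeled examples, plus a penalty that pulls the predicted distributions of must-link unlabeled pairs together. Everything here (function names, the penalty weight `lam`, the plain gradient loop) is illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train(X_lab, y_lab, X_unlab, must_link, n_classes, lam=1.0, lr=0.1, steps=500):
    """Multinomial logistic regression with must-link constraints between
    pairs of unlabeled samples (index pairs into X_unlab)."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X_lab.shape[1], n_classes))
    Y = np.eye(n_classes)[y_lab]
    for _ in range(steps):
        # supervised cross-entropy gradient
        P = softmax(X_lab @ W)
        grad = X_lab.T @ (P - Y) / len(X_lab)
        # must-link penalty: lam * ||p_a - p_b||^2 on predicted distributions
        for a, b in must_link:
            pa = softmax(X_unlab[a] @ W)
            pb = softmax(X_unlab[b] @ W)
            diff = pa - pb
            Ja = np.diag(pa) - np.outer(pa, pa)   # softmax Jacobian at a
            Jb = np.diag(pb) - np.outer(pb, pb)
            grad += lam * 2 * (np.outer(X_unlab[a], Ja @ diff)
                               - np.outer(X_unlab[b], Jb @ diff))
        W -= lr * grad
    return W
```

The constraint term only nudges paired predictions toward agreement; in the paper, such pairs come automatically from the editing structure of the video rather than manual labels.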


Computer Vision and Pattern Recognition | 2012

“Knock! Knock! Who is it?” probabilistic person identification in TV-series

Makarand Tapaswi; Martin Bäuml; Rainer Stiefelhagen

We describe a probabilistic method for identifying characters in TV series or movies. We aim at labeling every character appearance, and not only those where a face can be detected. Consequently, our basic unit of appearance is a person track (as opposed to a face track). We model each TV series episode as a Markov Random Field, integrating face recognition, clothing appearance, speaker recognition and contextual constraints in a probabilistic manner. The identification task is then formulated as an energy minimization problem. In order to identify tracks without faces, we learn clothing models by adapting available face recognition results. Within a scene, as indicated by prior analysis of the temporal structure of the TV series, clothing features are combined by agglomerative clustering. We evaluate our approach on the first 6 episodes of The Big Bang Theory and achieve an absolute improvement of 20% for person identification and 12% for face recognition.
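The energy-minimization idea can be illustrated with a minimal sketch using iterated conditional modes (ICM) on unary costs plus cannot-link penalties for temporally overlapping tracks. The paper's actual MRF combines face, clothing, and speaker cues; the penalty value and function names here are assumptions for illustration.

```python
import numpy as np

def identify_tracks(unary, cannot_link, n_iters=10):
    """ICM on a toy MRF: unary[i, c] = cost of naming track i as character c;
    cannot_link pairs (e.g. tracks overlapping in time) pay a penalty for
    receiving the same label."""
    n, k = unary.shape
    labels = unary.argmin(axis=1)       # start from the unary-only solution
    PEN = 10.0                          # arbitrary cannot-link penalty
    for _ in range(n_iters):
        changed = False
        for i in range(n):
            costs = unary[i].copy()
            for a, b in cannot_link:    # add penalties from neighbours' labels
                if a == i:
                    costs[labels[b]] += PEN
                elif b == i:
                    costs[labels[a]] += PEN
            new = costs.argmin()
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:                 # local minimum of the energy reached
            break
    return labels
```

ICM is the simplest energy minimizer that shows the effect; stronger inference (e.g. loopy belief propagation or graph cuts) would normally be used on a full episode-scale MRF.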


Computer Vision and Pattern Recognition | 2016

Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning

Ziad Al-Halah; Makarand Tapaswi; Rainer Stiefelhagen

Collecting training images for all visual categories is not only expensive but also impractical. Zero-shot learning (ZSL), especially using attributes, offers a pragmatic solution to this problem. However, at test time most attribute-based methods require a full description of attribute associations for each unseen class. Providing these associations is time consuming and often requires domain specific knowledge. In this work, we aim to carry out attribute-based zero-shot classification in an unsupervised manner. We propose an approach to learn relations that couple class embeddings with their corresponding attributes. Given only the name of an unseen class, the learned relationship model is used to automatically predict the class-attribute associations. Furthermore, our model facilitates transferring attributes across data sets without additional effort. Integrating knowledge from multiple sources results in a significant additional improvement in performance. We evaluate on two public data sets: Animals with Attributes and aPascal/aYahoo. Our approach outperforms state-of-the-art methods in both predicting class-attribute associations and unsupervised ZSL by a large margin.
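A toy version of predicting class-attribute associations from a class name: embed the unseen class name, find its nearest seen classes in word-embedding space, and transfer a weighted vote of their attribute signatures. The k-NN transfer and the 0.5 threshold are simplifying assumptions, not the relationship model the paper learns.

```python
import numpy as np

def predict_attributes(unseen_emb, seen_embs, seen_attrs, k=3):
    """Predict binary class-attribute associations for an unseen class from
    the attribute signatures of its k nearest seen classes."""
    # cosine similarity between the unseen class and every seen class
    a = unseen_emb / np.linalg.norm(unseen_emb)
    B = seen_embs / np.linalg.norm(seen_embs, axis=1, keepdims=True)
    sims = B @ a
    top = np.argsort(sims)[::-1][:k]
    w = sims[top] / sims[top].sum()
    # weighted vote over the neighbours' binary attribute vectors, then threshold
    scores = w @ seen_attrs[top]
    return (scores > 0.5).astype(int)
```

The point of the sketch is the direction of inference: from class name to attribute signature, with no human-provided associations for the unseen class.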


Computer Vision and Pattern Recognition | 2015

Book2Movie: Aligning video scenes with book chapters

Makarand Tapaswi; Martin Bäuml; Rainer Stiefelhagen

Film adaptations of novels often visually display in a few shots what is described in many pages of the source novel. In this paper we present a new problem: to align book chapters with video scenes. Such an alignment facilitates finding differences between the adaptation and the original source, and also acts as a basis for deriving rich descriptions from the novel for the video clips. We propose an efficient method to compute an alignment between book chapters and video scenes using matching dialogs and character identities as cues. A major consideration is to allow the alignment to be non-sequential. Our suggested shortest path based approach deals with the non-sequential alignments and can be used to determine whether a video scene was part of the original book. We create a new data set involving two popular novel-to-film adaptations with widely varying properties and compare our method against other text-to-video alignment baselines. Using the alignment, we present a qualitative analysis of describing the video through rich narratives obtained from the novel.
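The shortest-path formulation can be sketched as dynamic programming over per-scene states: each scene is assigned a chapter, or a NULL state for scenes not from the book, and non-adjacent chapter transitions pay a jump cost so the alignment may be non-sequential. The similarity matrix, cost values, and function name are illustrative assumptions.

```python
import numpy as np

def align_scenes(sim, jump_cost=0.2, null_cost=0.5):
    """Shortest path over states per scene: a chapter index, or -1 (NULL)
    for 'this scene was not in the book'. sim[t, c] scores scene t against
    chapter c (from dialogs and character identities in the paper)."""
    S, C = sim.shape
    states = list(range(-1, C))                       # -1 = NULL state
    cost = {s: (null_cost if s == -1 else -sim[0, s]) for s in states}
    back = [{}]
    for t in range(1, S):
        new, bk = {}, {}
        for s in states:
            emit = null_cost if s == -1 else -sim[t, s]
            best, arg = float("inf"), None
            for p in states:
                # staying put, moving via NULL, or stepping to an adjacent
                # chapter is free; longer chapter jumps pay jump_cost
                trans = 0.0 if (p == s or -1 in (p, s)) else jump_cost * (abs(p - s) > 1)
                if cost[p] + trans < best:
                    best, arg = cost[p] + trans, p
            new[s], bk[s] = best + emit, arg
        cost, back = new, back + [bk]
    s = min(cost, key=cost.get)                        # cheapest end state
    path = [s]
    for t in range(S - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    return path[::-1]
```

Because NULL transitions are free, a scene with weak support for every chapter naturally falls out of the book alignment, which is how the sketch decides whether a scene was part of the original novel.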


Computer Vision and Pattern Recognition | 2014

StoryGraphs: Visualizing Character Interactions as a Timeline

Makarand Tapaswi; Martin Bäuml; Rainer Stiefelhagen

We present a novel way to automatically summarize and represent the storyline of a TV episode by visualizing character interactions as a chart. We also propose a scene detection method that lends itself well to generating over-segmented scenes, which are used to partition the video. The positioning of character lines in the chart is formulated as an optimization problem that trades off the aesthetics of the chart against its functionality. Using automatic person identification, we present StoryGraphs for 3 diverse TV series encompassing a total of 22 episodes. We define quantitative criteria to evaluate StoryGraphs and also compare them against episode summaries to evaluate their ability to provide an overview of the episode.
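The aesthetics-versus-functionality trade-off can be sketched as a tiny layout search: per scene, pick a vertical ordering of character lines that keeps co-present characters in a contiguous band (functionality) while penalizing how far each line moves from its previous position (aesthetics). The brute-force search, cost weights, and function name are toy assumptions, not the paper's optimizer.

```python
import itertools

def storygraph_positions(scenes, n_chars):
    """Greedy per-scene layout: scenes is a list of sets of characters
    present in each scene; returns one {character: slot} dict per scene."""
    order = list(range(n_chars))          # initial top-to-bottom order
    layout = []
    for present in scenes:
        best, best_perm = float("inf"), None
        for perm in itertools.permutations(range(n_chars)):
            # functionality: co-present characters should occupy adjacent slots
            slots = sorted(perm.index(c) for c in present)
            spread = slots[-1] - slots[0] - (len(slots) - 1)
            # aesthetics: penalise line wiggles relative to the previous scene
            wiggle = sum(abs(perm.index(c) - order.index(c)) for c in range(n_chars))
            cost = 10 * spread + wiggle
            if cost < best:
                best, best_perm = cost, perm
        order = list(best_perm)
        layout.append({c: order.index(c) for c in range(n_chars)})
    return layout
```

Enumerating permutations is only feasible for a handful of characters; it merely makes the two competing cost terms concrete.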


Indian Conference on Computer Vision, Graphics and Image Processing | 2014

Total Cluster: A person agnostic clustering method for broadcast videos

Makarand Tapaswi; Omkar M. Parkhi; Esa Rahtu; Eric Sommerlade; Rainer Stiefelhagen; Andrew Zisserman

The goal of this paper is unsupervised face clustering in edited video material – where face tracks arising from different people are assigned to separate clusters, with one cluster for each person. In particular we explore the extent to which faces can be clustered automatically without making an error. This is a very challenging problem given the variation in pose, lighting and expressions that can occur, and the similarities between different people. The novelty we bring is threefold: first, we show that a form of weak supervision is available from the editing structure of the material – the shots, threads and scenes that are standard in edited video; second, we show that by first clustering within scenes the number of face tracks can be significantly reduced with almost no errors; third, we propose an extension of the clustering method to entire episodes using exemplar SVMs based on the negative training data automatically harvested from the editing structure. The method is demonstrated on multiple episodes from two very different TV series, Scrubs and Buffy. For both series it is shown that we move towards our goal, and also outperform a number of baselines from previous works.
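The within-scene stage can be sketched as constrained agglomerative clustering: two tracks that appear in the same frame must show different people, so such pairs are never merged. Complete linkage and the distance threshold below are illustrative choices, not the paper's exact procedure.

```python
def cluster_tracks(dist, cooccur, threshold=0.5):
    """Agglomerative clustering of face tracks; cooccur holds (i, j) pairs
    of tracks visible in the same frame, giving free cannot-link
    constraints from the editing structure."""
    clusters = [{i} for i in range(len(dist))]

    def link(a, b):          # complete-linkage distance between clusters
        return max(dist[i][j] for i in a for j in b)

    def violates(a, b):      # would merging join two co-occurring tracks?
        return any((i, j) in cooccur or (j, i) in cooccur for i in a for j in b)

    while True:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if violates(clusters[x], clusters[y]):
                    continue
                d = link(clusters[x], clusters[y])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, x, y)
        if best is None:     # no allowed merge below the threshold remains
            return clusters
        _, x, y = best
        clusters[x] |= clusters[y]
        del clusters[y]
```

The same cannot-link pairs are what the paper later harvests as free negative training data for its exemplar SVMs.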


International Conference on Multimedia Retrieval | 2014

Story-based Video Retrieval in TV series using Plot Synopses

Makarand Tapaswi; Martin Bäuml; Rainer Stiefelhagen

We present a novel approach to search for plots in the storyline of structured videos such as TV series. To this end, we propose to align natural language descriptions of the videos, such as plot synopses, with the corresponding shots in the video. Guided by subtitles and person identities the alignment problem is formulated as an optimization task over all possible assignments and solved efficiently using dynamic programming. We evaluate our approach on a novel dataset comprising the complete season 5 of Buffy the Vampire Slayer, and show good alignment performance and the ability to retrieve plots in the storyline.
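The alignment can be sketched as a monotone dynamic program: each synopsis sentence is matched to a shot so that sentence order follows shot order, maximizing a similarity score (which in the paper is guided by subtitles and person identities). The one-shot-per-sentence simplification is an assumption for illustration.

```python
import numpy as np

def align_synopsis(sim):
    """Monotone DP: sim[i, j] scores synopsis sentence i against shot j;
    returns one shot index per sentence with strictly increasing shots."""
    S, T = sim.shape
    dp = np.full((S, T), -np.inf)
    back = np.zeros((S, T), dtype=int)
    dp[0] = sim[0]
    for i in range(1, S):
        best, arg = -np.inf, 0
        for j in range(i, T):                 # sentence i needs i shots before it
            if dp[i - 1, j - 1] > best:       # best predecessor among shots < j
                best, arg = dp[i - 1, j - 1], j - 1
            dp[i, j] = best + sim[i, j]
            back[i, j] = arg
    # backtrack the chosen shot for each sentence
    j = int(dp[-1].argmax())
    idx = [j]
    for i in range(S - 1, 0, -1):
        j = back[i, j]
        idx.append(j)
    return [int(v) for v in idx[::-1]]
```

Once sentences are anchored to shots, a plot query can be answered by retrieving the shots aligned to the best-matching synopsis sentences.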


International Conference on Computer Vision | 2012

Fusion of speech, faces and text for person identification in TV broadcast

Hervé Bredin; Johann Poignant; Makarand Tapaswi; Guillaume Fortier; Viet Bac Le; Thibault Napoléon; Hua Gao; Claude Barras; Sophie Rosset; Laurent Besacier; Jakob J. Verbeek; Georges Quénot; Frédéric Jurie; Hazim Kemal Ekenel

The Repere challenge is a project aimed at evaluating systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss the QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as supervised monomodal ones (with several hundreds of identity models).


IEEE International Conference on Automatic Face and Gesture Recognition | 2015

Improved weak labels using contextual cues for person identification in videos

Makarand Tapaswi; Martin Bäuml; Rainer Stiefelhagen

Fully automatic person identification in TV series has been achieved by obtaining weak labels from subtitles and transcripts [11]. In this paper, we revisit the problem of matching subtitles with face tracks to obtain more assignments and more accurate weak labels. We perform a detailed analysis of the state-of-the-art showing the types of errors during the assignment and providing insights into their cause. We then propose to model the problem of assigning names to face tracks as a joint optimization problem. Using negative constraints between co-occurring pairs of tracks and positive constraints from track threads, we are able to significantly improve the speaker assignment performance. This directly influences the identification performance on all face tracks. We also propose a new feature to determine whether a tracked face is speaking and show further improvements in performance while being computationally more efficient.
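The joint optimization can be illustrated by brute force over all name-to-track assignments, rejecting those that violate negative (co-occurring tracks get different names) or positive (tracks in the same thread get the same name) constraints. The exhaustive search and score values are toy assumptions; the paper solves the assignment as a proper optimization problem.

```python
import itertools

def assign_names(scores, cannot_same, must_same):
    """scores[t][n] = affinity of face track t to name n (e.g. from
    subtitle timing); constraints are lists of (track, track) pairs."""
    n_tracks, n_names = len(scores), len(scores[0])
    best, best_assign = float("-inf"), None
    for assign in itertools.product(range(n_names), repeat=n_tracks):
        # negative constraints: co-occurring tracks cannot share a name
        if any(assign[a] == assign[b] for a, b in cannot_same):
            continue
        # positive constraints: threaded tracks must share a name
        if any(assign[a] != assign[b] for a, b in must_same):
            continue
        s = sum(scores[t][assign[t]] for t in range(n_tracks))
        if s > best:
            best, best_assign = s, assign
    return best_assign
```

Even in this toy form, the constraints change the answer: the second track below would greedily take the same name as the first, but the cannot-link pair forces the jointly better assignment.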


Spoken Language Technology Workshop | 2008

Multilingual spoken-password based user authentication in emerging economies using cellular phone networks

Amitava Das; Ohil K. Manyam; Makarand Tapaswi; Veeresh Taranalli

Mobile phones are playing an important role in changing the socio-economic landscape of emerging economies like India. Proper voice-based user authentication will help enable many new mobile-based applications, including mobile commerce and banking. We present our exploration and evaluation of an experimental set-up for user authentication in remote Indian villages using mobile phones and user-selected multilingual spoken passwords. We also present an effective speaker recognition method using a set of novel features called Compressed Feature Dynamics (CFD), which capture the speaker identity effectively from the speech dynamics contained in the spoken passwords. Early trials demonstrate the effectiveness of the proposed method in handling noisy cell-phone speech. Compared to conventional text-dependent speaker recognition methods, the proposed CFD method delivers competitive performance while significantly reducing storage and computational complexity - an advantage highly beneficial for cell-phone based deployment of such user authentication systems.

Collaboration


Dive into Makarand Tapaswi's collaborations.

Top Co-Authors

Rainer Stiefelhagen (Karlsruhe Institute of Technology)
Martin Bäuml (Karlsruhe Institute of Technology)
Hazim Kemal Ekenel (Istanbul Technical University)
Ziad Al-Halah (Karlsruhe Institute of Technology)
Hua Gao (Karlsruhe Institute of Technology)
Monica-Laura Haurilet (Karlsruhe Institute of Technology)