Tanaya Guha
Indian Institute of Technology Kanpur
Publications
Featured research published by Tanaya Guha.
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014
Rahul Gupta; Nikolaos Malandrakis; Bo Xiao; Tanaya Guha; Maarten Van Segbroeck; Matthew P. Black; Alexandros Potamianos; Shrikanth Narayanan
Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another challenge that stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims toward understanding the two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address the two problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Similarly, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous time. Our affect recognition system accounts for context during the frame-wise inference and performs a linear fusion of outcomes from the audio-visual systems. For both problems, our proposed systems outperform the video-feature-based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.
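A minimal NumPy sketch of the kind of frame-wise linear fusion described above; the fixed equal weights, the moving-average stand-in for "context", and the toy inputs are assumptions for illustration rather than the system used in the paper.

```python
import numpy as np

def linear_fusion(pred_audio, pred_video, w_audio=0.5, w_video=0.5):
    """Combine frame-wise affect predictions from two unimodal systems.

    pred_audio, pred_video : 1-D arrays of per-frame scores (e.g. arousal).
    The weights are illustrative; in practice they would be tuned on a
    development set.
    """
    pred_audio = np.asarray(pred_audio, dtype=float)
    pred_video = np.asarray(pred_video, dtype=float)
    return w_audio * pred_audio + w_video * pred_video

def smooth_with_context(pred, win=5):
    """Moving-average smoothing as a simple stand-in for accounting for
    temporal context around each frame."""
    kernel = np.ones(win) / win
    return np.convolve(pred, kernel, mode="same")

if __name__ == "__main__":
    audio_scores = np.random.randn(100)   # toy per-frame arousal from the audio system
    video_scores = np.random.randn(100)   # toy per-frame arousal from the video system
    fused = smooth_with_context(linear_fusion(audio_scores, video_scores))
    print(fused.shape)
```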
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015
Tanaya Guha; Naveen Kumar; Shrikanth Narayanan; Stacy L. Smith
In general, popular films and screenplays follow a well-defined storytelling paradigm that comprises three essential segments or acts: exposition (act I), conflict (act II) and resolution (act III). Deconstructing a movie into its narrative units can enrich semantic understanding of movies, and help in movie summarization, navigation and detection of key events. A multimodal framework for detecting this three-act narrative structure is developed in this paper. Various low-level features are designed and extracted from the video, audio and text channels of a movie so as to capture the pace and excitement of the movie's narrative. Information from the three modalities is combined to compute a continuous dynamic measure of the movie's narrative flow, referred to as the story intensity of the movie in this paper. Guided by the knowledge of film grammar, the act boundaries are detected, and compared against annotations collected from human experts. Promising results are demonstrated for nine full-length Hollywood feature films of various genres.
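A rough sketch of how per-scene cues from the three channels could be fused into a continuous "story intensity" curve and searched for act boundaries; the equal fusion weights, the smoothing window, and the boundary search windows are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)

def story_intensity(video_feat, audio_feat, text_feat, smooth_win=31):
    """Fuse z-scored per-scene features from the three channels into a
    single continuous intensity curve (equal weights assumed here)."""
    curve = (zscore(video_feat) + zscore(audio_feat) + zscore(text_feat)) / 3.0
    kernel = np.ones(smooth_win) / smooth_win
    return np.convolve(curve, kernel, mode="same")

def act_boundaries(intensity, win1=(0.2, 0.35), win2=(0.7, 0.9)):
    """Pick the act I/II and act II/III boundaries as intensity minima inside
    windows suggested by screenwriting heuristics (fractions of the runtime)."""
    n = len(intensity)
    def argmin_in(frac_lo, frac_hi):
        lo, hi = int(frac_lo * n), int(frac_hi * n)
        return lo + int(np.argmin(intensity[lo:hi]))
    return argmin_in(*win1), argmin_in(*win2)

if __name__ == "__main__":
    t = np.linspace(0, 1, 600)                        # toy per-scene timeline
    toy = np.sin(6 * t) + 0.3 * np.random.randn(600)  # toy per-scene feature
    b1, b2 = act_boundaries(story_intensity(toy, toy, toy))
    print("act boundaries at scene indices:", b1, b2)
```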
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015
Tanaya Guha; Zhaojun Yang; Anil Ramakrishna; Ruth B. Grossman; Darren Hedley; Sungbok Lee; Shrikanth Narayanan
Children with Autism Spectrum Disorder (ASD) are known to have difficulty in producing and perceiving emotional facial expressions. Their expressions are often perceived as atypical by adult observers. This paper focuses on data-driven ways to analyze and quantify atypicality in facial expressions of children with ASD. Our objective is to uncover those characteristics of facial gestures that induce the sense of perceived atypicality in observers. Using a carefully collected motion capture database, facial expressions of children with and without ASD are compared within six basic emotion categories employing methods from information theory, time-series modeling and statistical analysis. Our experiments show that children with ASD exhibit lower complexity in facial dynamics, with the eye regions contributing more than other facial regions towards the differences between children with and without ASD. Our study also notes that children with ASD exhibit lower left-right facial symmetry, and more uniform motion intensity across facial regions.
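A simplified sketch of two of the quantities discussed, left-right facial symmetry and uniformity of motion intensity across facial regions, computed from 3-D marker trajectories; the marker indices, the correlation-based symmetry score, and the inverse coefficient-of-variation uniformity measure are assumptions for illustration, not necessarily the paper's measures.

```python
import numpy as np

def motion_intensity(markers):
    """markers: array (T, M, 3) of marker positions over T frames.
    Returns the mean frame-to-frame displacement of each marker."""
    disp = np.linalg.norm(np.diff(markers, axis=0), axis=2)   # (T-1, M)
    return disp.mean(axis=0)                                  # (M,)

def left_right_symmetry(markers, left_idx, right_idx):
    """Correlation between mirrored left/right marker displacement series,
    averaged over marker pairs (higher = more symmetric motion)."""
    disp = np.linalg.norm(np.diff(markers, axis=0), axis=2)
    corrs = [np.corrcoef(disp[:, l], disp[:, r])[0, 1]
             for l, r in zip(left_idx, right_idx)]
    return float(np.mean(corrs))

def intensity_uniformity(markers, regions):
    """Inverse coefficient of variation of mean motion intensity across
    facial regions; larger values mean more uniform motion."""
    per_marker = motion_intensity(markers)
    region_means = np.array([per_marker[list(idx)].mean() for idx in regions])
    return float(region_means.mean() / (region_means.std() + 1e-8))

if __name__ == "__main__":
    mocap = np.random.randn(300, 10, 3).cumsum(axis=0)   # toy marker trajectories
    print(left_right_symmetry(mocap, left_idx=[0, 1, 2], right_idx=[3, 4, 5]))
    print(intensity_uniformity(mocap, regions=[(0, 1, 2), (3, 4, 5), (6, 7, 8, 9)]))
```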
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016
Ankit Goyal; Naveen Kumar; Tanaya Guha; Shrikanth Narayanan
This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. The rich emotion content in movies is inherently multimodal, where emotion is evoked through both audio (music, speech) and video modalities. To capture such affective information, we put forth a set of audio and video features that includes several novel features such as Video Compressibility and Histogram of Facial Area (HFA). We propose a Mixture of Experts (MoE)-based fusion model that dynamically combines information from the audio and video modalities for predicting the emotion evoked in movies. A learning module, based on the hard Expectation-Maximization (EM) algorithm, is presented for the MoE model. Experiments on a database of popular movies demonstrate that our MoE-based fusion method outperforms popular fusion strategies (e.g., early and late fusion) in the context of dynamic emotion prediction.
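A toy two-expert illustration of MoE-style fusion trained with a hard-EM loop; the linear experts, the nearest-centroid gate, and the stopping criteria below are simplifying assumptions and not the model proposed in the paper.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares expert: y ~ X @ w with a bias term appended."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

def moe_hard_em(X_audio, X_video, y, n_iter=20):
    """Toy two-expert mixture trained with hard EM: the E-step assigns each
    sample to the expert (audio or video) with the smaller squared error,
    the M-step refits each expert on its assigned samples."""
    X_all = np.hstack([X_audio, X_video])
    assign = np.arange(len(y)) % 2            # deterministic alternating init
    w_a = w_v = None
    for _ in range(n_iter):
        if assign.min() == assign.max():      # one expert owns everything; stop
            break
        w_a = fit_linear(X_audio[assign == 0], y[assign == 0])
        w_v = fit_linear(X_video[assign == 1], y[assign == 1])
        err_a = (predict_linear(X_audio, w_a) - y) ** 2
        err_v = (predict_linear(X_video, w_v) - y) ** 2
        new_assign = (err_v < err_a).astype(int)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    # Gate: nearest centroid of the concatenated features per expert
    centroids = np.stack([
        X_all[assign == k].mean(axis=0) if np.any(assign == k) else X_all.mean(axis=0)
        for k in (0, 1)])
    return w_a, w_v, centroids

def moe_predict(X_audio, X_video, w_a, w_v, centroids):
    """Route each sample to one expert via the nearest-centroid gate."""
    X_all = np.hstack([X_audio, X_video])
    d = np.linalg.norm(X_all[:, None, :] - centroids[None], axis=2)
    use_video = d[:, 1] < d[:, 0]
    return np.where(use_video,
                    predict_linear(X_video, w_v),
                    predict_linear(X_audio, w_a))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xa, Xv = rng.normal(size=(500, 4)), rng.normal(size=(500, 4))
    y = Xa @ rng.normal(size=4) + 0.1 * rng.normal(size=500)   # toy arousal targets
    w_a, w_v, cent = moe_hard_em(Xa, Xv, y)
    print(moe_predict(Xa, Xv, w_a, w_v, cent)[:5])
```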
IEEE Transactions on Affective Computing | 2018
Tanaya Guha; Zhaojun Yang; Ruth B. Grossman; Shrikanth Narayanan
Several studies have established that facial expressions of children with autism are often perceived as atypical, awkward or less engaging by typical adult observers. Despite this clear deficit in the quality of facial expression production, very little is understood about its underlying mechanisms and characteristics. This paper takes a computational approach to studying details of facial expressions of children with high functioning autism (HFA). The objective is to uncover those characteristics of facial expressions that are notably distinct from those of typically developing children and that are otherwise difficult to detect by visual inspection. We use motion capture data obtained from subjects with HFA and typically developing subjects while they produced various facial expressions. This data is analyzed to investigate how the overall and local facial dynamics of children with HFA differ from their typically developing peers. Our major observations include reduced complexity in the dynamic facial behavior of the HFA group arising primarily from the eye region.
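One way to quantify "complexity of dynamic facial behavior" is the sample entropy of marker displacement series, summarized per facial region; the choice of sample entropy and the region grouping here are illustrative assumptions, not necessarily the measures used in the paper.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D signal: lower values indicate more regular,
    less complex dynamics. The tolerance r is set to r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    def count_matches(dim):
        templ = np.array([x[i:i + dim] for i in range(len(x) - dim)])
        d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
        return (d <= r).sum() - len(templ)     # pairs within tolerance, minus self-matches
    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

def region_complexity(markers, regions):
    """Mean sample entropy of marker displacement series within each facial
    region (e.g. 'eyes', 'mouth'); markers has shape (T, M, 3)."""
    disp = np.linalg.norm(np.diff(markers, axis=0), axis=2)   # (T-1, M)
    return {name: float(np.mean([sample_entropy(disp[:, i]) for i in idx]))
            for name, idx in regions.items()}

if __name__ == "__main__":
    mocap = np.random.randn(200, 8, 3).cumsum(axis=0)         # toy marker data
    print(region_complexity(mocap, {"eyes": [0, 1, 2], "mouth": [3, 4, 5]}))
```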
International Workshop on Multimedia Signal Processing (MMSP) | 2016
Naveen Kumar; Tanaya Guha; Che-Wei Huang; Colin Vaz; Shrikanth Narayanan
The majority of computational work on emotion in music concentrates on developing machine learning methodologies to build new, more accurate prediction systems, and usually relies on generic acoustic features. Relatively less effort has been put into the development and analysis of features that are particularly suited to the task. The contribution of this paper is twofold. First, it proposes two features, named compressibility and sparse spectral components, that efficiently capture the emotion-related properties of music. These features are designed to capture the overall affective characteristics of music (global features). We demonstrate that they predict the emotional dimensions (arousal and valence) more accurately than generic audio features. Second, we investigate the relationship between the proposed features and the dynamic variation in the emotion ratings. To this end, we propose a novel Haar transform-based technique to predict dynamic emotion ratings using only global features.
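The abstract does not spell out how compressibility is computed, so the sketch below uses a generic compression-based proxy (zlib on quantized PCM) purely for illustration of the idea that more regular audio compresses better.

```python
import zlib
import numpy as np

def compressibility(audio):
    """Compression-based proxy for how predictable an audio signal is:
    quantize to 16-bit PCM, deflate with zlib, and return the ratio of
    compressed to raw size (lower ratio = more compressible = more regular)."""
    audio = np.asarray(audio, dtype=float)
    audio = audio / (np.max(np.abs(audio)) + 1e-12)   # normalize to [-1, 1]
    pcm = (audio * 32767).astype(np.int16).tobytes()
    return len(zlib.compress(pcm, level=9)) / len(pcm)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)     # highly regular signal
    noise = np.random.randn(sr)            # irregular signal
    print("tone  :", round(compressibility(tone), 3))
    print("noise :", round(compressibility(noise), 3))
```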
International Conference on Multimodal Interfaces (ICMI) | 2015
Tanaya Guha; Che-Wei Huang; Naveen Kumar; Yan Zhu; Shrikanth Narayanan
The goal of this paper is to enable an objective understanding of gender portrayals in popular films and media through multimodal content analysis. An automated system for analyzing gender representation in terms of screen presence and speaking time is developed. First, we perform independent processing of the video and the audio content to estimate the gender distribution of screen presence at the shot level, and of speech at the utterance level. A measure of the movie's excitement or intensity is computed using audiovisual features for every scene. This measure is used as a weighting function to combine the gender-based screen/speaking time information at the shot/utterance level to compute gender representation for the entire movie. Detailed results and analyses are presented on seventeen full-length Hollywood movies.
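A minimal sketch of the intensity-weighted aggregation idea: per-shot (or per-utterance) gender time is weighted by the intensity of its scene before computing a movie-level share. The helper name and the toy numbers are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def weighted_gender_share(female_time, male_time, intensity):
    """Combine per-shot (or per-utterance) gender screen/speaking time into a
    movie-level share, weighting each unit by the intensity of its scene.

    female_time, male_time : arrays of seconds per shot/utterance
    intensity              : non-negative per-shot/utterance weight
    """
    female_time = np.asarray(female_time, dtype=float)
    male_time = np.asarray(male_time, dtype=float)
    w = np.asarray(intensity, dtype=float)
    f = np.sum(w * female_time)
    m = np.sum(w * male_time)
    return f / (f + m + 1e-12)

if __name__ == "__main__":
    # toy example: three shots with female/male screen seconds and scene intensity
    print(weighted_gender_share([2.0, 0.5, 3.0], [1.0, 4.0, 1.0], [0.2, 0.9, 0.5]))
```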
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017
Atanu Samanta; Tanaya Guha
Non-verbal behavioral cues, such as head movement, play a significant role in human communication and affective expression. Although facial expressions and gestures have been extensively studied in the context of emotion understanding, head motion (which accompanies both) is relatively less understood. This paper studies the significance of head movement in adults' affect communication using videos from movies. These videos are taken from the Acted Facial Expression in the Wild (AFEW) database and are labeled with seven basic emotion categories: anger, disgust, fear, joy, neutral, sadness, and surprise. Considering the human head as a rigid body, we estimate the head pose at each video frame in terms of the three Euler angles, and obtain a time-series representation of head motion. First, we investigate the importance of the energy of angular head motion dynamics (displacement, velocity and acceleration) in discriminating among emotions. Next, we analyze the temporal variation of head motion by fitting an autoregressive model to the head motion time series. We observe that head motion carries sufficient information to distinguish any emotion from the rest with high accuracy, and that this information is complementary to that of facial expression, as it helps improve emotion recognition accuracy.
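A small NumPy sketch of the two analyses mentioned: the energy of head-motion displacement, velocity and acceleration from an Euler-angle time series, and a least-squares autoregressive fit summarizing its temporal variation. The AR order and the toy trajectory are assumptions for illustration.

```python
import numpy as np

def motion_energies(angles):
    """angles: (T, 3) Euler angles (yaw, pitch, roll) per frame.
    Returns the mean energy of displacement, velocity and acceleration."""
    disp = angles - angles.mean(axis=0)
    vel = np.diff(angles, axis=0)
    acc = np.diff(angles, n=2, axis=0)
    return {name: float(np.mean(sig ** 2))
            for name, sig in (("displacement", disp),
                              ("velocity", vel),
                              ("acceleration", acc))}

def fit_ar(x, order=4):
    """Least-squares fit of an AR(order) model to a 1-D series; the returned
    coefficients summarize the temporal variation of head motion."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

if __name__ == "__main__":
    T = 300
    euler = np.cumsum(0.01 * np.random.randn(T, 3), axis=0)   # toy yaw/pitch/roll track
    print(motion_energies(euler))
    print(fit_ar(euler[:, 0]))                                # AR coefficients for yaw
```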
Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) | 2016
Indra Kiran; Tanaya Guha; Gaurav Pandey
This paper addresses the problem of estimating the quality of an image as it would be perceived by a human. A well-accepted approach to assessing the perceptual quality of an image is to quantify its loss of structural information. We propose a blind image quality assessment method that aims at quantifying structural information loss in a given (possibly distorted) image by comparing its structures with those extracted from a database of clean images. We first construct a subspace from the clean natural images using (i) principal component analysis (PCA), and (ii) overcomplete dictionary learning with a sparsity constraint. While PCA provides mathematical convenience, an overcomplete dictionary is known to capture the perceptually important structures resembling the simple cells in the primary visual cortex. The subspace learned from the clean images is called the source subspace. Similarly, a subspace, called the target subspace, is learned from the distorted image. In order to quantify the structural information loss, we use a subspace alignment technique that transforms the target subspace into the source subspace by optimizing over a transformation matrix. This transformation matrix is subsequently used to measure the global and local (patch-based) quality scores of the distorted image. The quality scores obtained by the proposed method are shown to correlate well with the subjective scores obtained from human annotators. Our method achieves competitive results when evaluated on three benchmark databases.
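A bare-bones sketch of the subspace-alignment idea using PCA bases of image patches; the principal-angle-based score and the patch/PCA parameters are assumptions for illustration (the paper additionally learns an overcomplete dictionary, and its exact scoring from the transformation matrix may differ).

```python
import numpy as np

def pca_basis(patches, k=8):
    """Top-k PCA basis (as columns) of a patch matrix (n_patches, patch_dim)."""
    X = patches - patches.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T                                   # (patch_dim, k)

def alignment_quality(source_basis, target_basis):
    """Subspace alignment: M = S^T T maps the target basis into the source
    frame. The singular values of M are cosines of the principal angles
    between the two subspaces; their mean gives a [0, 1] similarity score
    (1 = structures fully preserved, lower = more structural loss)."""
    M = source_basis.T @ target_basis
    return float(np.mean(np.linalg.svd(M, compute_uv=False)))

def extract_patches(img, size=8, stride=8):
    """Non-overlapping patches flattened into rows."""
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.random((128, 128))
    distorted = clean + 0.3 * rng.normal(size=clean.shape)    # toy distortion
    S = pca_basis(extract_patches(clean))
    T = pca_basis(extract_patches(distorted))
    print("quality score:", alignment_quality(S, T))
```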
MediaEval | 2014
Naveen Kumar; Rahul Gupta; Tanaya Guha; Colin Vaz; Maarten Van Segbroeck; Jangwon Kim; Shrikanth Narayanan